## Abstract

Floods are becoming the most challenging hydrologic issue in the Kelantan River basin in Malaysia. All three flood characteristics, i.e. peak flow, flood volume and flood duration, are important when formulating actions and measures to manage flood risk. Therefore, estimating the multivariate designs and their associated return periods is an essential element of making informed risk-based decisions in this river basin. In this paper, the efficacy of a kernel density estimator is tested by assessing the adequacy of kernel functions for capturing flood marginal density of 50 years (from 1961 to 2016) of daily streamflow data collected at Gulliemard Bridge gauge station in the Kelantan River basin. Tests for stationarity or the existence of serial correlation within the flood series is often a pre-requisite before introducing the random samples into a univariate or a multivariate framework. It was found that homogeneity existed within the flood vector series. It was concluded therefore that time series of the flood vectors do not exhibit any significant trend. Based on analytically based fitness measures, it was concluded that it is likely that Triweight kernel function is the best-fitted distribution for defining the marginal distribution of peak flows, flood volumes and flood durations in the Kelantan River basin.

## INTRODUCTION

From the perspective of water resources operational planning in a river basin, the design and operation of flood management infrastructure often demands an accurate estimate of the flow exceedance probability for assessing hydrologic risk (i.e. Salvadori 2004; Xu *et al.* 2015). Flood frequency analysis (FFA) establishes an association between extreme event quantiles and their non-exceedance probabilities by fitting a univariate or multivariate probability distribution function (Cunnane 1988; Rao & Hameed 2000; Zhang 2005). A flood can be defined through three inter-correlated random vectors; that is, peak flow, flood volume and flood duration, which can lead univariate distribution analysis of return periods to underestimate or overestimate floods (Yue *et al.* 2001; Zhang & Singh 2006; Reddy & Ganguli 2012). This has led researchers to explore multivariate distribution analysis for the estimation of flood design quantiles under different notations of return periods based on joint distribution, conditional joint distribution or Kendall's distribution and the establishment of joint probability density functions (or PDFs) and joint cumulative distribution functions (or CDFs) for various possible combinations of the flood vectors (Yue 1999; Zhang & Singh 2006; Daneshkhan *et al.* 2016). Most of the earlier research applied traditional multivariate functions such as bivariate normal, gamma and lognormal distributions (i.e. Yue 1999, 2000, 2001). However, several modelling constraints and limitations have prompted the incorporation of multivariate copulas distribution analysis (i.e. De Michele & Salvadori 2003; Favre *et al.* 2004) which is frequently undertaken via a parametric framework, where both the marginal distribution of each targeted flood vector and their joint dependence structure are modeled under univariate parametric functions or copulas distribution (i.e. Favre *et al.* 2004; Zhang 2005; Veronika & Halmova 2014; Fan & Zheng 2016 and references therein). While some researchers have implemented copulas under semiparametric settings where marginal functions are approximated via nonparametric; that is, kernel density functions (KDE) or orthonormal series, their joint structures are still modelled under parametric settings (i.e. Karmakar & Simonovic 2008, 2009; Reddy & Ganguli 2012).

Selecting the most parsimonious univariate functions for defining flood marginal distributions is often a mandatory pre-requisite before establishing their joint dependence structure. Parametric functions have always imposed an assumption that the random samples are coming from known populations whose PDFs are pre-defined; that is, marginal distributions are assumed to follow a specific family of parametric probability functions. In reality, however, the best fitted marginal distributions may not be from the same probability distribution family (Adamowski 1985; Silverman 1986; Kim & Heo 2002; Botev *et al.* 2010). Dooge (1986) has already pointed out that no amount of statistical refinement can overcome the consequences of a lack of prior probability distribution information of the observed random samples. More especially, in the case of multimodal or skewed distributions, parametric distributions might be incompatible and lead to inconsistencies in the estimated quantiles. According to Bardsley (1988) and Bardsley & Manly (1987), the approximation of any distribution tail beyond the largest value under a parameter distribution can be a difficult task. In the last few decades, in the field of hydrologic or flood frequency analysis an attempt has been made to define bona fide density functions via kernel density estimators, or KDE, which are recognized as a flexible and very stable nonparametric data smoothing procedure for approximating or inferencing populations based on finite observational (Adamowski 1989, 2000; Guo 1991; Lall *et al.* 1993; Bowman & Azzalini 1997; Efromovich 1999; Kim *et al.* 2003; Ghosh & Mujumdar 2007). Alternate theoretical overviews for nonparametric setting have been described in the earlier literature such as Rosenblatt (1956), Parzen (1962) and Bartlett (1963). Nonparametric distributions do not require any prior distribution assumptions and have a higher level of flexibility in their univariate function in comparison with parametric density estimators (Adamowski 1989; Moon & Lall 1994).

Adamowski (1985) retrieved the flood frequencies curve using nonparametric kernel estimators while Adamowski & Labatiuk (1987) compared the difference between real and synthetic data (derived from a Monte Carlo procedure) and identified the modelling efficiency of nonparametric density simulations. Similarly, Adamowski (1989) performed comparative assessments between parametric functions (i.e. Weibull and log-Pearson type III) and nonparametric density functions based on the Monte Carlo simulation statistics and found nonparametric density functions demonstrations performed better than the parametric functions. Adamowski & Feluch (1990) were the first to identify the efficiency of kernel estimators through selecting an appropriate kernel function for the extrapolation of distribution samples beyond the available record. Adamowski (1996) applied the nonparametric procedure to drought modelling while Kim & Heo (2002) compared nonparametric distributions from parametric functions for annual maximum flood samples. Lall *et al.* (1993) postulated the different orientations required to select an appropriate shape representation for kernel functions and their bandwidth for different distribution scenarios such as symmetrical, asymmetrical or skewed and mixed data structures. Kim *et al.* (2003) estimated the return periods of low-flow or drought characteristics using a nonparametric distribution framework. Kim *et al.* (2006) established bivariate drought characteristics using the Palmer drought index in a nonparametric procedure. Karmakar & Simonovic (2008) approximated the flood marginal densities by introducing both parametric and nonparametric functions. In this experiment, nonparametric based kernel density functions as well as orthonormal series functions, which are demonstrated earlier in the literature; for example, Bowman & Azzalini (1997) and Efromovich (1999), were used to establish univariate density functions. Santhosh & Srinivas (2013) demonstrated a flood frequency relationship by employing a bivariate diffusion process based on the adaptive kernel functions; that is, D-kernel function, and compared this against parametric copulas functions. This demonstration revealed that the nonparametric methodology is a data-driven approach that poses the higher degree of flexibility when establishing probability density functions. Lall *et al.* (1996), Zhang & Karunamuni (1998), Kim *et al.* (2003), Karmakar & Simonovic (2008) and Santhosh & Srinivas (2013) all pointed out the flexibility of an interactive sets of kernel functions for modelling extreme hydrological episodes such as Gaussian or Normal, Epanechnikov, Triangular and Quadratic functions.

Floods are becoming the most challenging hydrologic issue in the Kelantan River basin in Malaysia, and particularly during the period of wet monsoons (DID 2004; MMD 2007). Recent extreme weather across the basin includes the intense and prolonged precipitation in the year 2002, which caused flooding and affected a total area of 1,640 km^{2} with a total population of 714,287, while in the year 2014 the worst flood ever recorded in history affected more than 200,000 people in the several parts of the east coast of the Kelantan River basin. The river is about 248 km long and drains a catchment area of 13,100 km^{2} The catchment is 150 km long and 140 km wide. During the wet season, between mid-October and mid-January, this basin receives a rainfall of about 2,500 mm. According to the studies of Hussain & Ismail (2013), the Gulliemard Bridge, Lebir and Galas gauge stations have higher flood frequency than the Nenggiri gauge station. In 2018, Alamgir *et al.* (2018) performed a multivariate analysis of floods under a parametric copulas distribution framework for the different gauge stations of the river basin, while Nashwan *et al.* (2018) performed flood susceptibility assessments at the different gauge stations, which revealed that the downstream area is under the highest risk of devastating floods. A number of studies have also identified the negative consequences of land use changes on the catchment's response (i.e. Wan 1996; Jamaliah 2007).

All three flood characteristics; that is, peak flow, flood volume and flood duration, are important when formulating actions and measures to manage flood risk. For example, when designing retention basins or the spillways for reservoirs or any other infrastructures designs where flood storage is involved, the estimation of flood volume is required as well as the peak discharge in order to calculate the impact of inflow on the storage (Gaál *et al.* 2015). Similarly, estimating the joint behaviour of peak flow-flood volume and flood volume-flood duration when assessing flood control or diversion options (i.e. Xu *et al.* 2015). Therefore, estimating the multivariate designs and their associated return periods is an essential element of making informed risk-based decisions in this river basin.

The case study is divided into two parts, with each part covered in a full paper. The objective of this part of the case study is to identify the best marginal distribution of the flood characteristics. In this paper, the efficacy of a kernel density estimator is tested by assessing the adequacy of an interactive set of kernel functions for capturing the flood marginal density. Demonstration of the efficacy of copulas functions for establishing joint distribution functions whose marginal distributions are approximated with nonparametric distribution or kernel density functions (which are also called semiparametric copulas distribution settings) is introduced separately in the companion paper. An at-site event-based or block (annual) maxima methodology based on 50 years (from 1961 to 2016) of daily stream flow data collected at Gulliemard Bridge gauge station in the Kelantan River basin is described. A brief mathematical overview of the nonparametric procedure in estimating univariate margins via kernel density function is provided below. The sampling procedure for estimating multivariate flood characteristics, estimating marginal distribution functions by employing a variety of univariate kernel functions and also via parametric functions are discussed in the third section of this paper.

## NONPARAMETRIC PROCEDURE FOR MARGINAL DISTRIBUTIONS

### Univariate kernel density estimators, or KDE

Rosenblatt (1956) introduced the concept of kernel estimators through smoothing kernel weights on each of the random observations. The univariate kernel density estimation (or KDE) is a nonparametric approach to approximate the PDFs, say of a given random observations of or flood characteristics such as peak flow, flood volume or flood duration, where inference about the flood populations is made based on finite samples. Therefore, univariate kernel functions are used to estimate the probability density of the random observations having the following statistical property.

^{th}observation and is the kernel density estimate. Table 1 lists five standard univariate kernel functions, which have been reported previously when determining the PDFs and CDFs of hydrologic or flood vectors (i.e. Moon & Lall 1994; Adamowski 2000; Miladinovic 2008; Karmakar & Simonovic 2008).

SI. no. . | Kernel function . | . |
---|---|---|

1 | Epanechnikov | |

2 | Triangular | |

3 | Bi-weight or Quartic | |

4 | Tri-weight | |

5 | Cosine |

SI. no. . | Kernel function . | . |
---|---|---|

1 | Epanechnikov | |

2 | Triangular | |

3 | Bi-weight or Quartic | |

4 | Tri-weight | |

5 | Cosine |

Algorithms . | Literature . |
---|---|

Rule of Thumb (ROT) | Silverman (1986) & Azzalini (1981) |

Least-squares cross-validation (LSCV) | Tarboton et al., (1998) |

Bandwidth factorized cross-validation | Kim & Heo (2002) |

Smoothed cross-validation | Hall (1992) |

Biased cross-validation | Scott & Terrell (1987) |

Maximum likelihood cross-validation (MLCV) | Duin (1976) |

Plug-in estimates | Wand & Jones (1995) |

Asymptotic mean integrated squared error (AMISE) | Ghosh & Huang (1992) |

Algorithms . | Literature . |
---|---|

Rule of Thumb (ROT) | Silverman (1986) & Azzalini (1981) |

Least-squares cross-validation (LSCV) | Tarboton et al., (1998) |

Bandwidth factorized cross-validation | Kim & Heo (2002) |

Smoothed cross-validation | Hall (1992) |

Biased cross-validation | Scott & Terrell (1987) |

Maximum likelihood cross-validation (MLCV) | Duin (1976) |

Plug-in estimates | Wand & Jones (1995) |

Asymptotic mean integrated squared error (AMISE) | Ghosh & Huang (1992) |

The efficiency of the estimated kernel density depends upon two factors: (1) an appropriate choice of the kernel bandwidth and (2) selection of the kernel function. Bandwidth estimation procedures are discussed below. The kernel functions listed in Table 1 are ideally suited to unbounded observations but in actuality most hydrologic data including rainfall, streamflow or humidity and so on, are either upper or lower bounded. This can lead to boundary leakage problems when applying standard kernel functions in the hydrological data domain. This issue has been tackled in a number of ways including using a Method of reflection (Silverman 1986), Methods of transformations (Marron & Ruppert 1994), Beta kernels (Chen 2000) and Adaptive kernels (Botev *et al.* 2010). Such boundary leakage issues during kernel estimation are beyond the scope of this paper.

*et al.*(1996) and Sharma

*et al.*(1998) provide an extended overview of these algorithms and their approaches. Several bandwidth estimation procedures are solely based on minimizing the estimates of the Mean square error or MSE (Shabri 2002). According to Miladinovic (2008), the asymptotic mean integrated square error or AMISE depends on four factors; that is, the kernel bandwidth, the sample size, the kernel function and targeted density function. Therefore, it could be possible to minimize the AMISE value by selecting justifiable kernel functions and their smoothing parameters ‘h’. According to Bowman & Azzalini (1997) and Kim

*et al.*(2003), the optimal bandwidth is typically estimated based on estimates of the integrated square error or ISE (Kim & Heo 2002). In other words, mean integrated square error or MISE, which is the expected value of ISE, determines the overall effectiveness of the estimated kernel density estimators during the asymptotically optimal choice of kernel bandwidth ‘h’ and can be derived as: where the terms and represent the integrated variance and integrated squared bias (Kim & Heo 2002). From Equation (4), it is clear that the second order derivatives of the density functions are required to estimate MISE, and are not defined or unknown. According to Silverman (1986), the rule of thumb or ROT was proposed to minimize the asymptotic MISE value. Therefore, Azzalini (1981) and Silverman (1986) estimated the optimal bandwidth h

_{0}based on a final distribution being Gaussian or symmetrical and can be formulated as: where σ = minimum {Sample standard deviation, (Interquartile range or IQR/1.349)}. Thus, in this paper, the kernel bandwidth is estimated by minimizing the mean integrated square error (MISE) via the optimal bandwidth (h

_{0}) algorithm. Plug-in bandwidth estimators that target the AMISE as the distance to be minimized are also very simple and promising (Wand & Jones 1995). Similarly, the performance of smoothed cross-validation becomes superior only for large sample size (Hall 1992). Beside this, readers are advised to follow the respective papers for visualizing the statistical significance of different bandwidth estimators, as listed in Table 2.

## APPLICATION TO ANNUAL MAXIMUM FLOOD SERIES: A CASE STUDY

### Data pre-processing stages: extraction of flood characteristics

Monsoonal floods seem to have increased in the Kelantan River basin in Malaysia in the last few decades in terms of both frequency and magnitude (DID 2004; MMD 2007). It is the longest river of Kelantan state, which rises in the Tahan mountain range and flows to the South China Sea in the north-eastern part of Peninsular Malaysia between the geographical location of Lat 4° 30′ N to 6° 15′ N and Long 101°E to 102° 45′ E. The Galas River and the Lebir River are the two major tributaries of the Kelantan River. The land use in the upper catchment is forest while agriculture including paddy farms, rubber and oil-palm plantations are the major land-use activities in the middle and lower areas of the catchment. The precipitation for this region typically varies between 0 mm (in the dry period) to 1,750 mm (in the wet or north-eastern monsoonal period) (DID 2004).

*et al.*1985). Daily streamflows were recorded by the Drainage and Irrigation Department, Malaysia, for the period 1961–2016. Peak flows were selected for each year based on the maximum flow record using Equation (7), while the flood volume and flood duration corresponding to each peak flow were estimated using the methodology described by Yue & Rasmussen (2002), Eckhardt (2005), Gonzales

*et al.*(2009), Xu

*et al.*(2015) and given in Equations (8) and (9) and illustrated in Figure 1. where ‘’ is the j

^{th}day streamflow magnitude in the i

^{th}year; ‘’

*and*‘Q

_{ie}’ are the streamflow magnitude at the start date ‘’ and end date ‘of the flood. The flood volume were determined after separating the base flow (i.e. low frequency component) from the direct runoff (i.e. high frequency component). The flood duration extraction was based on the time difference between the rising (SDi) and recession (EDi) limb of the target flood peak flow. A recursive digital filtering procedure in the form of either a single parameter digital filter (i.e. Eckhardt 2004) or a recursive filtering algorithm (Eckhardt 2005) are the two different ways of extracting low-frequency components or base flow separation. In this demonstration, we adopted the Eckhardt (2005) algorithm, which usually provides an effective way to discriminate base flow from direct-surface runoff and is significant for the wider verification of catchments to reveal consistent measures (i.e. Zhang

*et al.*2013). Flood peak flow often attains the maximum value but it is not necessary for flood volume and duration observations (Xu

*et al.*2015). Figure 2 illustrates the time series of annual flood characteristics for the period 1961–2016. The descriptive statistics of the flood characteristics are given in Table 3 and reveal that each flood vector exhibits a positively skewed distribution; that is, asymmetrical behaviour, which is also indicated from the histogram plots given in Figure 3. Figure 4(a) and 4(b) provide the normal quantile-quantile (or q-q) plot and box-whisker plot of the annual flood characteristics.

Descriptive measure . | P(m^{3}/sec)
. | V(m^{3})
. | D(days) . | Percentile . | P(m^{3}/sec)
. | V(m^{3})
. | D(days) . |
---|---|---|---|---|---|---|---|

Sample size | 50 | 50 | 50 | Min | 916.3 | 3,182.3 | 7 |

Range | 19,670 | 71,558 | 57 | 5% | 1,209.1 | 4,334.7 | 8 |

Mean | 6,078 | 19,122 | 19.04 | 10% | 1,647.1 | 4,811.7 | 9.1 |

Variance | 2.15E + 07 | 2.14E + 08 | 117.75 | 25% (Q1) | 2,671.8 | 8,668.5 | 12 |

Standard deviation | 4,639 | 14,623 | 10.851 | 50% (Median) | 4,961 | 15,959 | 16 |

Coefficient of variation | 0.76324 | 0.76473 | 0.56993 | 75% (Q3) | 7,711.7 | 24,476 | 25 |

Standard error | 656.05 | 2,068.1 | 1.5346 | 90% | 11,584 | 43,077 | 28.9 |

Skewness (Fisher) | 1.5532 | 1.6392 | 2.2793 | 95% | 18,581 | 47,790 | 43.35 |

Skewness (Pearson) | 1.506 | 1.590 | 2.210 | Max | 20,586 | 74,740 | 64 |

Kurtosis (Pearson) | 1.883 | 2.864 | 6.252 | ||||

Excess Kurtosis (Fisher) | 2.2158 | 3.3029 | 7.0557 | ||||

Standard error of the mean | 656.050 | 2,068.071 | 1.535 | ||||

Lower bound on mean (95%) | 4,759.628 | 14,966.495 | 15.956 | ||||

Upper bound on mean (95%) | 7,396.392 | 23,278.381 | 22.124 | ||||

Standard error of the variance | 4,347,713.616 | 43,203,375.975 | 23.790 |

Descriptive measure . | P(m^{3}/sec)
. | V(m^{3})
. | D(days) . | Percentile . | P(m^{3}/sec)
. | V(m^{3})
. | D(days) . |
---|---|---|---|---|---|---|---|

Sample size | 50 | 50 | 50 | Min | 916.3 | 3,182.3 | 7 |

Range | 19,670 | 71,558 | 57 | 5% | 1,209.1 | 4,334.7 | 8 |

Mean | 6,078 | 19,122 | 19.04 | 10% | 1,647.1 | 4,811.7 | 9.1 |

Variance | 2.15E + 07 | 2.14E + 08 | 117.75 | 25% (Q1) | 2,671.8 | 8,668.5 | 12 |

Standard deviation | 4,639 | 14,623 | 10.851 | 50% (Median) | 4,961 | 15,959 | 16 |

Coefficient of variation | 0.76324 | 0.76473 | 0.56993 | 75% (Q3) | 7,711.7 | 24,476 | 25 |

Standard error | 656.05 | 2,068.1 | 1.5346 | 90% | 11,584 | 43,077 | 28.9 |

Skewness (Fisher) | 1.5532 | 1.6392 | 2.2793 | 95% | 18,581 | 47,790 | 43.35 |

Skewness (Pearson) | 1.506 | 1.590 | 2.210 | Max | 20,586 | 74,740 | 64 |

Kurtosis (Pearson) | 1.883 | 2.864 | 6.252 | ||||

Excess Kurtosis (Fisher) | 2.2158 | 3.3029 | 7.0557 | ||||

Standard error of the mean | 656.050 | 2,068.071 | 1.535 | ||||

Lower bound on mean (95%) | 4,759.628 | 14,966.495 | 15.956 | ||||

Upper bound on mean (95%) | 7,396.392 | 23,278.381 | 22.124 | ||||

Standard error of the variance | 4,347,713.616 | 43,203,375.975 | 23.790 |

### Testing for stationarity within the flood characteristics

Tests for stationarity or the existence of serial correlation (or autocorrelation) within the flood series is often a pre-requisite before introducing random samples into a univariate or multivariate framework (Daneshkhan *et al.* 2016). Ljung & Box (1978) based hypothesis testing, which is also known as Q-statistics, were performed on each time series of observations. As indicated in Table 4(a), the tests found negligible or zero first-order autocorrelations for each of the flood vector series for different lag sizes (i.e. lag 20, lag 10, lag 5). A nonparametric rank-based Mann-Kendall (or M-K) test (Mann 1945; Kendall 1975) was also performed to test for the existence of any monotonic trend within the historical flood series. As indicated in Table 4(b), the test found zero monotonic trend at the 5% or 0.05 level of significance within the flood vector series. Testing for the existence of a homogenous environment between any two given time points was also investigated for each flood vector through application of a Pettit test (Pettitt 1979), a Buishand test (Buishand 1982), von Neumann's test (Jaiswal *et al.* 2015) and by undertaking Alexanderson's SNHT based hypothesis testing (Alexandersson 1986). As demonstrated in Table 4(c), it was found that homogeneity existed within the flood vector series. It was concluded therefore that the time series of the flood vector do not exhibit any significant trend.

(a) . | |||||
---|---|---|---|---|---|

Flood vectors . | . | Lag 20 . | Lag 10 . | Lag 5 . | . |

P | p-value | 0.78 | 0.466 | 0.43 | |

Q- statistics | 14.9 | 9.719 | 4.89 | ||

V | p-value | 0.92 | 0.678 | 0.99 | |

Q- statistics | 12 | 7.497 | 0.5 | ||

D | p-value | 0.73 | 0.801 | 0.69 | |

Q- statistics | 15.7 | 6.171 | 3.04 | ||

Note: | Critical value | 31.4104 | 18.307 | 11.070 | |

(b) . | |||||

Series/test . | P . | V . | D . | . | . |

Kendall's tau | 0.007 | 0.006533 | −0.041 | ||

S | 9.000 | 8 | −49.000 | ||

Var (S) | 7,541.387 | 8,474.018 | 8,985.326 | ||

p-value (two-tailed) | 0.917 | 0.939386 | 0.613 | ||

Alpha | 0.05 | 0.05 | 0.05 | ||

Sen's slope | 2.100 | 3.954 | 0.000 | ||

Risk for rejecting null hypothesis Ho | 91.75% | 93.94% | 61.26% | ||

(c) . | |||||

Test . | Statistics . | P . | V . | D . | Overall conclusion . |

Pettitt | K | 138.000 | 140 | 128 | |

T | 4 | 8 | 34 | ||

p-value (two-tailed) | 0.715 | 0.744 | 0.555 | Homogenous | |

Confidence interval@99% on p-value | ]0.704, 0.727 [ | ] 0.591, 0.616 [ | ] 0.542, 0.568 [ | ||

SNHT | T0 | 3.614 | 2.992 | 2.504 | |

T | 13 | 6 | 34 | ||

p-value (two-tailed) | 0.501 | 0.603 | 0.697 | Homogenous | |

Confidence interval @99% on p-value | ]0.488, 0.513 [ | 4.051 | ] 0.685, 0.708 [ | ||

Buishand's | Q | 5.956 | 4.015 | 5.273 | |

T | 13 | 6 | 34 | Homogenous | |

p-value (two-tailed) | 0.363 | 0.817 | 0.519 | ||

Confidence interval @99% on p-value | ] 0.351, 0.376 [ | ] 807, 0.827 [ | ] 0.506, 532 [ | ||

Von Nuemann's | N | 2.080 | 2.015 | 2.441 | |

p-value (two-tailed) | 0.592 | 0.501 | 0.970 | Homogenous | |

Confidence interval @99% on p-value | ] 0.580, 0.605 [ | ] 0.488, 0.513 [ | ] 0.965, 974 [ |

(a) . | |||||
---|---|---|---|---|---|

Flood vectors . | . | Lag 20 . | Lag 10 . | Lag 5 . | . |

P | p-value | 0.78 | 0.466 | 0.43 | |

Q- statistics | 14.9 | 9.719 | 4.89 | ||

V | p-value | 0.92 | 0.678 | 0.99 | |

Q- statistics | 12 | 7.497 | 0.5 | ||

D | p-value | 0.73 | 0.801 | 0.69 | |

Q- statistics | 15.7 | 6.171 | 3.04 | ||

Note: | Critical value | 31.4104 | 18.307 | 11.070 | |

(b) . | |||||

Series/test . | P . | V . | D . | . | . |

Kendall's tau | 0.007 | 0.006533 | −0.041 | ||

S | 9.000 | 8 | −49.000 | ||

Var (S) | 7,541.387 | 8,474.018 | 8,985.326 | ||

p-value (two-tailed) | 0.917 | 0.939386 | 0.613 | ||

Alpha | 0.05 | 0.05 | 0.05 | ||

Sen's slope | 2.100 | 3.954 | 0.000 | ||

Risk for rejecting null hypothesis Ho | 91.75% | 93.94% | 61.26% | ||

(c) . | |||||

Test . | Statistics . | P . | V . | D . | Overall conclusion . |

Pettitt | K | 138.000 | 140 | 128 | |

T | 4 | 8 | 34 | ||

p-value (two-tailed) | 0.715 | 0.744 | 0.555 | Homogenous | |

Confidence interval@99% on p-value | ]0.704, 0.727 [ | ] 0.591, 0.616 [ | ] 0.542, 0.568 [ | ||

SNHT | T0 | 3.614 | 2.992 | 2.504 | |

T | 13 | 6 | 34 | ||

p-value (two-tailed) | 0.501 | 0.603 | 0.697 | Homogenous | |

Confidence interval @99% on p-value | ]0.488, 0.513 [ | 4.051 | ] 0.685, 0.708 [ | ||

Buishand's | Q | 5.956 | 4.015 | 5.273 | |

T | 13 | 6 | 34 | Homogenous | |

p-value (two-tailed) | 0.363 | 0.817 | 0.519 | ||

Confidence interval @99% on p-value | ] 0.351, 0.376 [ | ] 807, 0.827 [ | ] 0.506, 532 [ | ||

Von Nuemann's | N | 2.080 | 2.015 | 2.441 | |

p-value (two-tailed) | 0.592 | 0.501 | 0.970 | Homogenous | |

Confidence interval @99% on p-value | ] 0.580, 0.605 [ | ] 0.488, 0.513 [ | ] 0.965, 974 [ |

*Note*: *p*-values are computed using 10,000 Monte Carlo simulations.

### Nonparametric estimations

Table 1 identified some standard univariate kernel functions, which are adopted when defining a best-fit flood marginal distribution. The bandwidth of the candidate kernel functions was estimated using the optimal bandwidth algorithm given in Equation (6) (Azzalini 1981; Silverman 1986). According to Kim *et al.* (2006), the nonparametric density approximations do not facilitate a closed form of the PDF and CDF, thus CDFs were estimated through an empirical procedure that is based on numerical integration (Kim & Heo 2002).

Some frequently applied parametric family functions such as the Log Pearson type III distribution (Bobee 1974), the Lognormal-2P function (Yue 2000), the Weibull-3P distribution (Heo *et al.* 2001), the Johnson SB-4P distribution (Keshtkaran & Torabihaghighi 2011), the Gamma-3P (Xu *et al.* 2015) and the Inverse Gaussian-3P functions (Daneshkhan *et al.* 2016) were also tested and compared to the nonparametric estimates. The vectors of unknown statistical parameters or model parameters were estimated using the Maximum likelihood estimators (MLE) and Methods of Moments estimators (MOM) and their estimated values are listed in Table 5.

Parametric functions . | Flood peak (P) . | Volume (V) . | Durations (D) . |
---|---|---|---|

Log-Pearson‐3P | a = 663.54, b = −0.02887, g = 27.608 | a = 1,781.0, b = −0.01771, g = 41.234 | a = 14.523, b = 0.12506, g = 1.0099 |

Lognormal‐2P | s = 0.7362, m = 8.4513 | s = 0.74093, m = 9.594 | s = 0.4717, m = 2.826 |

Weibull‐3P | a = 1.1175, b = 5,389.8, g = 899.42 | a = 1.0689, b = 16,369.0, g = 3,155.6 | a = 1.1951, b = 12.878, g = 6.9279 |

Johnson SB‐4P | g = 1.5161, d = 0.74495, l = 27,319.0, x = 1,304.2 | g = 2.2027, d = 1.0357, l = 1.3052E + 5, x = 961.8 | g = 2.5314, d = 0.92215, l = 118.81, x = 8.2791 |

Gamma‐3P | a = 1.2106, b = 4,290.0, g = 884.47 | a = 1.0848, b = 14,723.0, g = 3,150.8 | a = 1.4696, b = 8.3319, g = 6.7958 |

Inverse Gaussian-3P | l = 10,556.0, m = 6,320.9, g = −242.85 | l = 26,884.0, m = 19,086.0, g = 36.267 | l = 28.913, m = 14.81, g = 4.2297 |

Parametric functions . | Flood peak (P) . | Volume (V) . | Durations (D) . |
---|---|---|---|

Log-Pearson‐3P | a = 663.54, b = −0.02887, g = 27.608 | a = 1,781.0, b = −0.01771, g = 41.234 | a = 14.523, b = 0.12506, g = 1.0099 |

Lognormal‐2P | s = 0.7362, m = 8.4513 | s = 0.74093, m = 9.594 | s = 0.4717, m = 2.826 |

Weibull‐3P | a = 1.1175, b = 5,389.8, g = 899.42 | a = 1.0689, b = 16,369.0, g = 3,155.6 | a = 1.1951, b = 12.878, g = 6.9279 |

Johnson SB‐4P | g = 1.5161, d = 0.74495, l = 27,319.0, x = 1,304.2 | g = 2.2027, d = 1.0357, l = 1.3052E + 5, x = 961.8 | g = 2.5314, d = 0.92215, l = 118.81, x = 8.2791 |

Gamma‐3P | a = 1.2106, b = 4,290.0, g = 884.47 | a = 1.0848, b = 14,723.0, g = 3,150.8 | a = 1.4696, b = 8.3319, g = 6.7958 |

Inverse Gaussian-3P | l = 10,556.0, m = 6,320.9, g = −242.85 | l = 26,884.0, m = 19,086.0, g = 36.267 | l = 28.913, m = 14.81, g = 4.2297 |

### Goodness-of-fit test

*N*observations when the data are arranged in ascending order. Several fitness test statistics are incorporated such as using error indices statistics called the Mean Square Error (MSE) and the Root Mean Square Error (RMSE) (Moriasi

*et al.*2007), the Kullback-Leibler information measures (Kullback & Leibler 1951), statistics called the Akaike information criteria (AIC) (Akaike 1974), Schwartz's Bayesian information criteria (BIC) (Schwarz 1978) and the Hannan-Quinn criteria (HQC) (Hannan & Quinn 1979; Burnham & Anderson 2002). The lowest value of RMSE, MSE, AIC, BIC and the HQC statistics indicate the best fit. The AIC statistics include the lack of the fit of the model on one hand and the unreliability of the model due to the number of model parameters on the other hand (Zhang & Singh 2006). Therefore, maximizing the likelihood of fitted distributions or in the context of the maximized value of the likelihood functions, it can be mathematically estimated as:

Table 6 presents the estimated values of MSE, RMSE, AIC, BIC and the HQC statistics of all the nonparametric kernel distribution functions fitted to the flood vectors. It was found that the Triweight kernel outperformed other functions as it gave the lowest values of the fitness test statistics for all three flood distribution series; that is, MSE (0.00022), RMSE (0.01483), AIC (−419.0760), BIC (−417.164) and HQC (−418.348) for peak flow, MSE (0.00016), RMSE (0.01287), AIC (−433.27), BIC (−431.36) and HQC (−432.55) for flood volume series and MSE (0.00048), RMSE (0.02208), AIC (−379.27), BIC (−377.26) and HQC (−378.54) for flood duration. The performance of the Biweight and Triangular functions was also more effective than the targeted parametric functions. While the Epanechnikov kernel was less effective than the other candidate kernel functions, it was still better than the parametric functions as revealed in Table 6. Based on analytically based fitness measures, it was concluded that it is likely that Triweight kernel function is the best-fitted distribution for defining the marginal distribution of peak flows, flood volumes and flood durations in the Kelantan River basin.

Flood vector . | F(X) . | Error indices statistics . | Information criteria statistics . | |||
---|---|---|---|---|---|---|

MSE (or mean square error) . | RMSE (or root mean square error) . | AIC (or Akaike information criteria) . | BIC (or Bayesian information criteria) . | HQC (or Hannan-Quinn information criteria) . | ||

P | Epanechnikov | 0.00038 | 0.01957 | −391.37 | −389.45 | −390.64 |

Bi-weight or quartic | 0.00026 | 0.01620 | −410.25 | −408.34 | −409.52 | |

Triweight | 0.00022 | 0.01483 | −419.07 | −417.16 | −418.34 | |

Triangular | 0.00028 | 0.01686 | −406.26 | −404.35 | −405.54 | |

Cosine | 0.00032 | 0.01800 | −399.98 | −398.07 | −399.25 | |

LogPearson‐3P | 0.00045 | 0.02127 | −379.03 | −373.30 | −376.85 | |

Lognormal‐2P | 0.00046 | 0.02163 | −379.34 | −375.52 | −377.89 | |

Weibull‐3P | 0.01612 | 0.12699 | −200.89 | −194.62 | −200.90 | |

Johnson SB‐4P | 0.00093 | 0.03053 | −340.89 | −333.25 | −337.99 | |

Gamma‐3P | 0.01173 | 0.10828 | −216.30 | −210.56 | −214.12 | |

Inverse Gaussia‐3P | 0.01636 | 0.12792 | −199.63 | −193.89 | −197.45 | |

V | Epanechnikov | 0.00093 | 0.03060 | −346.66 | −344.75 | −345.93 |

Bi-weight or quartic | 0.00018 | 0.01350 | −428.44 | −426.53 | −427.71 | |

Triweight | 0.00016 | 0.01287 | −433.27 | −431.36 | −432.55 | |

Triangular | 0.00020 | 0.01426 | −423.01 | −421.10 | −422.29 | |

Cosine | 0.00022 | 0.01514 | −417.02 | −415.11 | −416.30 | |

Log-Pearson‐3P | 0.00048 | 0.02207 | −375.31 | −369.58 | −373.13 | |

Lognormal‐2P | 0.00055 | 0.02351 | −371.02 | −367.20 | −369.57 | |

Weibull‐3P | 0.00047 | 0.02182 | −376.47 | −370.74 | −374.29 | |

Johnson SB‐4P | 0.00041 | 0.02027 | −381.82 | −374.17 | −378.91 | |

Gamma‐3P | 0.01327 | 0.11520 | −210.10 | −204.37 | −207.92 | |

Inverse Gaussian‐3P | 0.00077 | 0.02783 | −352.16 | −346.42 | −349.98 | |

D | Epanechnikov | 0.00059 | 0.02430 | −369.69 | −367.77 | −368.96 |

Bi-weight or quartic | 0.00051 | 0.02265 | −376.71 | −374.80 | −375.99 | |

Triweight | 0.00048 | 0.02208 | −379.27 | −377.36 | −378.54 | |

Triangular | 0.00055 | 0.02357 | −372.74 | −370.83 | −372.01 | |

Cosine | 0.00062 | 0.02496 | −367.03 | −365.12 | −366.30 | |

Log-Pearson‐3P | 0.00100 | 0.03171 | −339.08 | −333.34 | −336.89 | |

Lognormal‐2P | 0.00132 | 0.03635 | −327.46 | −323.63 | −326.00 | |

Weibull‐3P | 0.01619 | 0.12726 | −200.15 | −194.41 | −197.97 | |

Johnson‐4P | 0.00972 | 0.09861 | −223.65 | −216.00 | −220.74 | |

Gamma‐3P | 0.00091 | 0.03012 | −343.62 | −337.88 | −341.43 | |

Inverse Gaussian‐3P | 0.00091 | 0.03027 | −343.74 | −338.00 | −341.55 |

Flood vector . | F(X) . | Error indices statistics . | Information criteria statistics . | |||
---|---|---|---|---|---|---|

MSE (or mean square error) . | RMSE (or root mean square error) . | AIC (or Akaike information criteria) . | BIC (or Bayesian information criteria) . | HQC (or Hannan-Quinn information criteria) . | ||

P | Epanechnikov | 0.00038 | 0.01957 | −391.37 | −389.45 | −390.64 |

Bi-weight or quartic | 0.00026 | 0.01620 | −410.25 | −408.34 | −409.52 | |

Triweight | 0.00022 | 0.01483 | −419.07 | −417.16 | −418.34 | |

Triangular | 0.00028 | 0.01686 | −406.26 | −404.35 | −405.54 | |

Cosine | 0.00032 | 0.01800 | −399.98 | −398.07 | −399.25 | |

LogPearson‐3P | 0.00045 | 0.02127 | −379.03 | −373.30 | −376.85 | |

Lognormal‐2P | 0.00046 | 0.02163 | −379.34 | −375.52 | −377.89 | |

Weibull‐3P | 0.01612 | 0.12699 | −200.89 | −194.62 | −200.90 | |

Johnson SB‐4P | 0.00093 | 0.03053 | −340.89 | −333.25 | −337.99 | |

Gamma‐3P | 0.01173 | 0.10828 | −216.30 | −210.56 | −214.12 | |

Inverse Gaussia‐3P | 0.01636 | 0.12792 | −199.63 | −193.89 | −197.45 | |

V | Epanechnikov | 0.00093 | 0.03060 | −346.66 | −344.75 | −345.93 |

Bi-weight or quartic | 0.00018 | 0.01350 | −428.44 | −426.53 | −427.71 | |

Triweight | 0.00016 | 0.01287 | −433.27 | −431.36 | −432.55 | |

Triangular | 0.00020 | 0.01426 | −423.01 | −421.10 | −422.29 | |

Cosine | 0.00022 | 0.01514 | −417.02 | −415.11 | −416.30 | |

Log-Pearson‐3P | 0.00048 | 0.02207 | −375.31 | −369.58 | −373.13 | |

Lognormal‐2P | 0.00055 | 0.02351 | −371.02 | −367.20 | −369.57 | |

Weibull‐3P | 0.00047 | 0.02182 | −376.47 | −370.74 | −374.29 | |

Johnson SB‐4P | 0.00041 | 0.02027 | −381.82 | −374.17 | −378.91 | |

Gamma‐3P | 0.01327 | 0.11520 | −210.10 | −204.37 | −207.92 | |

Inverse Gaussian‐3P | 0.00077 | 0.02783 | −352.16 | −346.42 | −349.98 | |

D | Epanechnikov | 0.00059 | 0.02430 | −369.69 | −367.77 | −368.96 |

Bi-weight or quartic | 0.00051 | 0.02265 | −376.71 | −374.80 | −375.99 | |

Triweight | 0.00048 | 0.02208 | −379.27 | −377.36 | −378.54 | |

Triangular | 0.00055 | 0.02357 | −372.74 | −370.83 | −372.01 | |

Cosine | 0.00062 | 0.02496 | −367.03 | −365.12 | −366.30 | |

Log-Pearson‐3P | 0.00100 | 0.03171 | −339.08 | −333.34 | −336.89 | |

Lognormal‐2P | 0.00132 | 0.03635 | −327.46 | −323.63 | −326.00 | |

Weibull‐3P | 0.01619 | 0.12726 | −200.15 | −194.41 | −197.97 | |

Johnson‐4P | 0.00972 | 0.09861 | −223.65 | −216.00 | −220.74 | |

Gamma‐3P | 0.00091 | 0.03012 | −343.62 | −337.88 | −341.43 | |

Inverse Gaussian‐3P | 0.00091 | 0.03027 | −343.74 | −338.00 | −341.55 |

A qualitative approach based on a graphically based visual inspection was also conducted for each flood vector for the probability density plot, the cumulative density plot, the probability- probability (or p-p) plot, the K-S test comparison cumulative fraction plot and the K-S test comparison percentile plot as illustrated in Figures 5 and 6(a)–6(c). It is noted that the Kolmogorov-Smirnov test (K-S) is a nonparametric distribution-free test that seeks to investigate the largest vertical gap between cumulative empirical and theoretical probabilities and also has the advantage of not assuming the distribution of data (Xu *et al.* 2015).

It was concluded that these plots clearly indicate the effectiveness of a nonparametric kernel structure and support the adoption of a Triweight kernel function for defining univariate flood marginal distributions.

## CONCLUSIONS

Floods are becoming the most challenging hydrologic issue in the Kelantan River basin in Malaysia, and particularly during the period of wet monsoons. All three flood characteristics; that is, peak flow, flood volume and flood duration, are important when formulating actions and measures to manage flood risk. Therefore, estimating the multivariate designs and their associated return periods is an essential element of making informed risk-based decisions in this river basin.

In this paper, the efficacy of a kernel density estimator is tested by assessing the adequacy of an interactive set of kernel functions for capturing the flood marginal density of 50 years (from 1961 to 2016) of daily stream flow data collected at Gulliemard Bridge gauge station in the Kelantan River basin.

Tests for stationarity or existence of serial correlation (or autocorrelation) within the flood series is often a pre-requisite before introducing the random samples into a univariate or a multivariate framework. It was found that homogeneity existed within the flood vector series. It was concluded therefore that the time series of the flood vectors do not exhibit any significant trend.

Based on analytically based fitness measures, it was concluded that it is likely that Triweight kernel function is the best-fitted distribution for defining the marginal distribution of peak flows, flood volumes and flood durations in the Kelantan River basin.

## ACKNOWLEDGEMENTS

Special thanks are extended to the Drainage and Irrigation Department (DID), Malaysia, for supplying streamflow data of the Kelantan river basin.

## REFERENCES

*Jabatan Matematik, UTM*

Hydrologic Frequency Modeling. Springer, Dordrecht, The Netherlands, pp. 97–106