Abstract

The analysis of precipitation data is extremely important for strategic planning and decision-making in various natural systems, as well as in planning and preparing for a drought period. Drought is responsible for several impacts on the economy of Northeast Brazil (NEB), mainly in the agricultural and livestock sectors. This study analyzed the fit of six 2-parameter distributions, gamma (GAM), log-normal (LNORM), Weibull (WEI), generalized Pareto (GP), Gumbel (GUM) and normal (NORM), to monthly precipitation data from 293 rainfall stations across NEB in the period 1988–2017. The maximum likelihood (ML) method was used to estimate the model parameters, and model selection was based on a modification of the Shapiro-Wilk statistic. The results showed the chosen 2-parameter distributions to be flexible enough to describe the studied monthly precipitation data. The GAM and WEI models showed the overall best fits, but the LNORM and GP models gave the best fits in certain months of the year and in regions that differed from the others in terms of their average precipitation.

HIGHLIGHTS

  • Real monthly precipitation data from 293 rainfall stations in Northeastern Brazil.

  • The selection of the model was based on a modification of the Shapiro-Wilk statistic.

  • The gamma and Weibull distributions showed the best fits compared to the others.

INTRODUCTION

The analysis of precipitation data is extremely important for strategic planning and decision-making in various natural and socio-economic systems, such as agricultural planning (Stern & Coe 1982; Hussain et al. 2010), civil engineering (Bjureland et al. 2019), hydrology (Bedient et al. 2008; Langat et al. 2019; Brendel et al. 2020), water resources management (Jain & Singh 2003), among others. The search for the probability distribution that best fits the precipitation data, for the highlighted systems, was the objective of several studies (Önöz & Bayazit 1995; Olofintoye et al. 2009; Alam et al. 2018) because by using the identified distribution, it is possible to predict future events, such as the probability of rain occurring in a given region (Şen & Eljadid 1999).

Low precipitation is one of the natural factors responsible for drought (Wilhite 2000). According to Mcglade et al. (2019), among the weather-related natural hazards, drought is probably the most complex and severe natural disaster due to its intrinsic nature and wide-ranging and cascading impacts. Several definitions of drought can be found in the literature; however, in general, drought is defined as a deficiency of precipitation over a long period, resulting in a water shortage for some activities (Eslamian & Eslamian 2017). Drought is a natural phenomenon that develops slowly, unlike tornadoes and hurricanes, which are immediately detectable. The difficulty of immediately detecting the beginning of a drought period can generate devastating costs for society in different sectors such as economy, agriculture, ecosystem, and infrastructure (WMO 2015; Marengo et al. 2017; Alvala et al. 2019).

Brazil is a country characterized by long periods of rain, as well as long periods of drought, in different regions at the same time (Marengo et al. 2013). Covering about 47.3% of South America, Brazil is subject to natural phenomena that directly influence its climate. The Northeast Region of Brazil (NEB) presents scarce rainfall and is characterized by long periods of drought (Palharini & Vila 2017). This region has a territory of 1,554,000 km² and a population of approximately 56 million inhabitants. The sectors that drive the economy are agriculture, livestock, industry and tourism (IBGE 2020).

Most states in the NEB have a semi-arid climate and high, seasonal and interannual, rainfall variability with extreme episodes of humidity and drought. Over the years, records of extreme drought have been reported in NEB (Marengo et al. 2017). These phenomena caused several impacts, mainly in the agriculture and livestock sectors. In the years 2012 and 2013, Brazil recorded losses of approximately US$ 1.6 billion for the economic sector and US$ 1.5 billion for livestock mortality, due to drought (Brito et al. 2018).

All over the world, many scientists have been investigating changes in patterns and amounts of precipitation, for example in Brazil (Rao et al. 1986; Coêlho et al. 2004; Gutiérrez et al. 2014), Nigeria (Olofintoye et al. 2009), Bangladesh (Ghosh et al. 2016; Alam et al. 2018), the United States (Ye et al. 2018), Burundi (Nkunzimana et al. 2019) and Australia (Hasan et al. 2019), among others. Most studies use precipitation data to analyze the severity of drought, since precipitation directly influences drought (Cavalcanti & Kousky 2004; Coelho et al. 2016; Brito et al. 2018).

Precipitation analysis using probability distribution models has already been investigated by several researchers in Brazil. Vieira et al. (2018) evaluated the fit of four probability distributions (gamma, Weibull, log normal and normal) to precipitation data in the southwest region of Paraná and concluded that the gamma and Weibull models presented a better fit to the data. Ozonur et al. (2020) applied eight probability distributions to model the monthly rainfall data of the pluviometric station of Campo Grande, Mato Grosso do Sul. The goodness-of-fit tools indicated that, although no distribution provides the best fit to the data for all months, the three-parameter log normal distribution shows a generally better fit than the other distributions. Martins et al. (2020) applied both the Generalized Pareto distribution (GP) and the exponential distribution (ED) to monthly rainfall data from the city of Uruguaiana, in the state of Rio Grande do Sul. Their results show that the GP and ED fit the data in all months. Through a simulation study, they found that the GP is more suitable in September and November, whereas the exponential model is more appropriate in January, March, April and August. Melo & Lima (2021) analyzed rainfall series obtained between 1910 and 2016 in 11 localities in the region of Catolé do Rocha, State of Paraíba, and concluded that the logistic distribution adequately represents the rainfall in the region.

Although several studies in different parts of Brazil have assessed precipitation data using probability distributions, few have analyzed a large area like the Northeast region. In this scenario, this study aimed to identify the probability models that best fit the precipitation data from the Northeast region of Brazil, in order to contribute to a better understanding of rainfall patterns in the region. Monthly rainfall data from 293 rainfall stations spread across the NEB for 30 years (1988–2017) were analyzed.

METHODS

Study area and rainfall data

The Northeast Brazil (NEB) region is located between the parallels of 01° 02′ 30″ north latitude and 18° 20′ 07″ south latitude and between the meridians of 34° 47′ 30″ and 48° 45′ 24″ west of Greenwich. It is the Brazilian region with the largest number of states (nine in total): Maranhão (MA), Piauí (PI), Ceará (CE), Rio Grande do Norte (RN), Paraíba (PB), Pernambuco (PE), Alagoas (AL), Sergipe (SE) and Bahia (BA). It has a territorial extension of 1,554,000 km². To the north and east it is bordered by the Atlantic Ocean; to the south by the states of Minas Gerais (MG) and Espírito Santo (ES); and to the west by the states of Pará (PA), Tocantins (TO), and Goiás (GO). The NEB has an estimated population of approximately 56 million people, and the economic sectors that stand out are agriculture, livestock, industry, and tourism (IBGE 2020). A graphical representation of NEB is given in Figure 1.

Figure 1

Spatial distribution of rainfall stations under study.

This study was conducted with data from the online platform of the National Water Agency (Agência Nacional de Águas – ANA) – Brazil. These data were collected at 293 rainfall stations spread across the NEB and consist of a historical series of monthly precipitation from 1988 to 2017 (30 years). The location of the stations can be seen in Figure 1.

Rainy season

The NEB rainy season can be divided into different periods. From Figure 2, it can be seen that from January to May the occurrence of rain is greater in the northern portion of the NEB, affecting mainly the states of Maranhão and Piauí, especially in March and April. This behavior may be related to the squall lines and the Intertropical Convergence Zone (ITCZ) (Palharini & Vila 2017).

Figure 2

Monthly mean precipitation in NEB in the period 1988–2017.

From June onwards, it is possible to observe a significant decrease in average precipitation in almost the entire NEB. The rainfall pattern changes almost completely compared to previous months, with the highest rainfall on the NEB coast. According to Kousky (1979), the rainy season on the NEB coast from April to July is mainly due to the circulation of the sea breeze and cold fronts that occur along the coast (Figure 2).

An increase of average precipitation in November and December is observed towards the southwest of the Northeast region of Brazil, affecting mainly the states of Bahia, southern Piauí, and southern Maranhão. The increase in rainfall may be related to the fact that the South Atlantic Convergence Zone shifts to the east in December (Carvalho & Jones 2009) (Figure 2).

Probability distributions

The procedure for the correct choice of the statistical model that best fits the data is given in three steps. The first is the choice of the model to be tested. In this step, graphs like histograms and dot plots are extremely useful to visualize the shape of the data. Once the model has been chosen, the second step is the estimation of the different parameters of the chosen model. In this step, it is necessary to use a parameter estimation method. In the third step, the quality of the model fitted to the data can be assessed, that is, how well it fits the observed data.

In the last 10 years, several authors have presented different model choices to represent monthly precipitation data (Li et al. 2013; Ashkar & Ba 2017; Sukrutha et al. 2018; Ye et al. 2018; Hasan et al. 2019; Nkunzimana et al. 2019; Salman et al. 2019; Mehdizadeh 2020); these models are listed in Table 1. Among them, the 2-parameter models presented a better fit. For this reason, we selected those that presented the best results for use in this study: Gamma (GAM), Gumbel (GUM), Normal (NORM), Log-normal (LNORM), Generalized Pareto (GP) and Weibull (WEI).

Table 1

The most used probability distributions to fit monthly rainfall data for the past 10 years

Distribution | Probability Density Function (PDF) | Cumulative Distribution Function (CDF) | Support | Parameters
Normal
Log-Normal
Weibull
Generalized Extreme Value (GEV)
Log-Pearson type III (a) (b) (c)
Gumbel
Gamma (b)
Generalized Pareto (GP)

a Γ(·) is the gamma function (Artin 2015).

b γ(·, ·) is the lower incomplete gamma function (Gautschi 1959).

c These are shape, scale and location parameters.

Estimation of parameters by maximum likelihood

In statistical modeling, all or most parameters of the probability distributions are unknown, so there is a need to estimate them from the data. Several methods of parameter estimation have been developed (moments, L-moments, maximum likelihood, among others), and many studies have focused on identifying the method, or methods, that are more efficient in estimating parameters for certain probability distributions. However, for most probability distributions and all sample sizes, it is rarely possible to identify a method that stands out as the best (Ashkar & Ba 2017). For this reason, in the present study, the maximum likelihood estimation method was chosen to estimate the parameters, since it performs well in parameter estimation for several probability distributions (Myung 2003; Fienberg & Rinaldo 2012).

The Maximum Likelihood (ML) method is the most widely used method for estimating parameters, as it presents a good performance for different probability distributions (Zong 2006; Ashkar & Tatsambon 2007; Park et al. 2009). In general, given a set of data and a statistical model, ML estimates the values of the parameters of a statistical model to maximize the probability of the observed data (that is, it seeks parameter values that maximize the likelihood function).

Let $X_1, \ldots, X_n$ be a random sample of size $n$ of the random variable $X$ with density function (or probability function) $f(x \mid \theta)$, with $\theta \in \Theta$, where $\Theta$ is the parameter space. The likelihood function of $\theta$ corresponding to the observed random sample $x = (x_1, \ldots, x_n)$ is given by:
$$L(\theta; x) = \prod_{i=1}^{n} f(x_i \mid \theta) \qquad (1)$$
The ML estimator of $\theta$ is the value $\hat{\theta} \in \Theta$ that maximizes the likelihood function $L(\theta; x)$ (Bolfarine & Sandoval 2001). In practice, the logarithm of $L(\theta; x)$ is generally used:
$$l(\theta; x) = \log L(\theta; x) = \sum_{i=1}^{n} \log f(x_i \mid \theta) \qquad (2)$$

The ML estimates are those that maximize $l(\theta; x)$, and $\hat{\theta}$ is called the ML estimator (MLE) of $\theta$. The ML estimation method has some essential properties that will not be presented here because they are outside the scope of this study; more details about the method can be found in Bolfarine & Sandoval (2001) and Zong (2006).
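As an illustration of Equation (2), the following is a minimal sketch of ML estimation for one of the six candidate models, the 2-parameter Weibull distribution. It is not the R code used in the study: the function names, the bisection bounds and the synthetic sample are assumptions made for this example, and the data are assumed to be strictly positive.

```python
import math
import random


def weibull_log_likelihood(data, shape, scale):
    """Log-likelihood (Equation 2) of a 2-parameter Weibull sample."""
    return sum(
        math.log(shape / scale)
        + (shape - 1.0) * math.log(x / scale)
        - (x / scale) ** shape
        for x in data
    )


def fit_weibull_ml(data, lo=0.01, hi=50.0, tol=1e-10):
    """Return ML estimates (shape, scale) for strictly positive data.

    Setting the derivatives of the log-likelihood to zero leaves a single
    increasing equation in the shape k, solved here by bisection on the
    assumed bracket [lo, hi]; the scale then follows in closed form.
    """
    if any(x <= 0 for x in data):
        raise ValueError("this sketch requires strictly positive data")
    n = len(data)
    top = max(data)
    scaled = [x / top for x in data]  # rescaling avoids overflow in x ** k
    mean_log = sum(math.log(x) for x in scaled) / n

    def score(k):
        # Profile score for the shape; it is zero at the ML estimate.
        weights = [x ** k for x in scaled]
        weighted_log = sum(w * math.log(x) for w, x in zip(weights, scaled))
        return weighted_log / sum(weights) - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = top * (sum(x ** shape for x in scaled) / n) ** (1.0 / shape)
    return shape, scale


# Hypothetical 30-value sample standing in for one station-month series.
random.seed(1)
sample = [random.weibullvariate(80.0, 1.5) for _ in range(30)]
shape_hat, scale_hat = fit_weibull_ml(sample)
print(shape_hat, scale_hat, weibull_log_likelihood(sample, shape_hat, scale_hat))
```

Bisection is used here only because it is simple and the profile score is monotone in the shape; a statistical package would normally use a general-purpose optimizer.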

TN.SW statistic

The Akaike information criterion (AIC) and the Anderson-Darling (AD) statistics are probably the most used methods in the literature to discriminate between probability distributions. However, these methods are biased, especially when used in small sample sizes (Ashkar & Ba 2017; Ashkar et al. 2019), which is the reality of most studies in the field of hydrology. Ashkar et al. (1997) developed the TN.SW statistic, which is a modified version of the famous Shapiro-Wilk test statistic (Shapiro & Wilk 1965).

The TN.SW statistic is much less biased than the AIC and AD methods and offers a clear advantage over Regularized Maximum Likelihood (RML)-based methods as it does not favor the selection of one model over the other, especially for small sample sizes (Ashkar & Ba 2017; Ashkar et al. 2019). Another advantage of using TN.SW statistic is the easy implementation of the method, as described by Ashkar & Ba (2017). The calculation of the statistic consists of just two steps:

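  • 1.
    Let $\Phi^{-1}$ be the quantile function of the standard normal distribution and assume model A, with cumulative distribution function $F_A(x; \theta_A)$, to be the true distribution for the sample $x_1, \ldots, x_n$. If $\theta_A$ were known, the following transformation would exactly transform $x_1, \ldots, x_n$ to a standard normal sample $z_1, \ldots, z_n$:
    $$z_i = \Phi^{-1}\left(F_A(x_i; \theta_A)\right), \quad i = 1, \ldots, n \qquad (3)$$

However, since the model A parameters are actually unknown, one uses $\hat{\theta}_A$ to estimate $\theta_A$, where $\hat{\theta}_A$ is obtained, for example, by ML. So Equation (3) becomes:
$$\hat{z}_i = \Phi^{-1}\left(F_A(x_i; \hat{\theta}_A)\right), \quad i = 1, \ldots, n \qquad (4)$$
where the $\hat{z}_i$ are now only approximately normally distributed, i.e. $\hat{z}_i \overset{\text{approx.}}{\sim} N(0, 1)$.
  • 2.
    Use the transformed sample obtained in Equation (4) to calculate the required TN.SW statistic:
    $$\mathrm{TN.SW}_A = \frac{\left(\sum_{i=1}^{n} a_i \hat{z}_{(i)}\right)^2}{\sum_{i=1}^{n} \left(\hat{z}_i - \bar{z}\right)^2} \qquad (5)$$
where $\hat{z}_{(i)}$ is the ith order statistic of $\hat{z}_1, \ldots, \hat{z}_n$, $\bar{z} = \frac{1}{n}\sum_{i=1}^{n} \hat{z}_i$, and the $a_i$ are coefficients approximated by a method proposed by Royston (1982a, 1982b, 1995). Repeat Equations (4) and (5), replacing Model A by Model B, to obtain $\mathrm{TN.SW}_B$. The decision rule is to choose Model A as the correct model if $\mathrm{TN.SW}_A > \mathrm{TN.SW}_B$, and to choose Model B otherwise.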

Given the advantages offered by TN.SW statistic, this method was chosen to be used in this study.
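The two steps above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' R implementation: the Shapiro-Wilk coefficients use a Royston-type polynomial approximation as commonly implemented (written here for n ≥ 6), the clipping of probabilities away from 0 and 1 is a numerical safeguard added for this example, and the function names and the synthetic sample are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

STD_NORMAL = NormalDist()


def royston_weights(n):
    """Approximate Shapiro-Wilk coefficients a_1..a_n (assumes n >= 6)."""
    if n < 6:
        raise ValueError("this approximation is written for n >= 6")
    m = [STD_NORMAL.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mm = sum(v * v for v in m)
    u = 1.0 / math.sqrt(n)
    a_last = (m[-1] / math.sqrt(mm) + 0.221157 * u - 0.147981 * u**2
              - 2.071190 * u**3 + 4.434685 * u**4 - 2.706056 * u**5)
    a_prev = (m[-2] / math.sqrt(mm) + 0.042981 * u - 0.293762 * u**2
              - 1.752461 * u**3 + 5.682633 * u**4 - 3.582633 * u**5)
    phi = (mm - 2 * m[-1]**2 - 2 * m[-2]**2) / (1 - 2 * a_last**2 - 2 * a_prev**2)
    a = [v / math.sqrt(phi) for v in m]
    a[-1], a[-2] = a_last, a_prev
    a[0], a[1] = -a_last, -a_prev
    return a


def tn_sw(sample, fitted_cdf):
    """TN.SW statistic (Equation 5) for a sample and a fitted model CDF."""
    eps = 1e-12  # keep probabilities strictly inside (0, 1)
    probs = [min(max(fitted_cdf(x), eps), 1.0 - eps) for x in sample]
    z = sorted(STD_NORMAL.inv_cdf(p) for p in probs)  # Equation (4), ordered
    a = royston_weights(len(z))
    z_bar = fmean(z)
    numerator = sum(ai * zi for ai, zi in zip(a, z)) ** 2
    denominator = sum((zi - z_bar) ** 2 for zi in z)
    return numerator / denominator


def normal_cdf_ml(data):
    """CDF of a normal model fitted by ML (mean and population std. dev.)."""
    return NormalDist(fmean(data), pstdev(data)).cdf


def lognormal_cdf_ml(data):
    """CDF of a log-normal model fitted by ML on the log scale."""
    logs = [math.log(x) for x in data]
    dist = NormalDist(fmean(logs), pstdev(logs))
    return lambda x: dist.cdf(math.log(x))


# Hypothetical skewed sample of 30 values (one value per year).
random.seed(42)
sample = [random.lognormvariate(4.0, 0.9) for _ in range(30)]
stat_lnorm = tn_sw(sample, lognormal_cdf_ml(sample))
stat_norm = tn_sw(sample, normal_cdf_ml(sample))
print("LNORM" if stat_lnorm > stat_norm else "NORM", stat_lnorm, stat_norm)
```

The same `tn_sw` function can be applied to any of the candidate models by passing the CDF evaluated at that model's ML estimates.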

RESULTS AND DISCUSSIONS

In this section, the results obtained from fitting the six probability distributions selected in this study are presented. The objective is to show which distribution provides the most appropriate fit for each month of the year, using monthly precipitation data from 293 rainfall stations spread around the Northeast Brazilian region from 1988 to 2017. The ML parameter estimates for each statistical distribution, as well as the TN.SW statistic, were obtained using the R software.

To assess the quality of the fit of the distributions, it is necessary to examine the p-value of the TN.SW statistic, since the TN.SW statistic itself allows comparison of the distributions' fits but does not indicate whether a fit is ‘good’ or ‘bad’ (Ashkar & Aucoin 2012). The higher the p-value produced by a model, the better its fit to the data, whereas low p-values indicate an inadequate fit. Figure 3 presents the boxplots of the p-values obtained from fitting the distributions under study.

Figure 3

Box plots of the p-value of the TN.SW statistic. Tvmax represents the highest p-value obtained among all the considered distributions.

In Figure 3, it can be seen that the majority of p-values were above 0.10. This means that, at a significance level of α = 0.05, for example, most of the models were not rejected. Globally, the gamma and Weibull models showed the best fits compared to the others, but the log normal model gave a comparable fit during part of the year, particularly in January and from June to October. The good performance of the gamma distribution in fitting precipitation data has already been identified in other regions of Brazil. Araújo et al. (2001) evaluated monthly precipitation data in Boa Vista, Roraima (Northern Brazil), and observed that the gamma distribution presented the best fits to the data for almost all months of the year. Ribeiro et al. (2007) carried out a similar study in Barbacena, Minas Gerais (Southeast Brazil), and also identified the gamma distribution as the one that presented the best fit to monthly precipitation data in that region.

The GP model gave a comparable fit from September to January. This classification is based on the medians of the p-values (Figures 3 and 4). For all months of the year, the Weibull and gamma distributions, despite having high variability, presented a first quartile (Q1) above 0.125 (Figure 4); this means that fewer than 25% of the fits of each of these models were rejected at a significance level of α = 0.125, and an even lower percentage was rejected at α = 0.05.
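The link between the quartiles of the p-values and the share of rejected fits can be checked with a short calculation. The sketch below uses made-up p-values (they are not results from this study) and hypothetical names; a fit is counted as rejected when its p-value falls below α.

```python
from statistics import quantiles


def rejection_rate(p_values, alpha):
    """Fraction of fits rejected at significance level alpha (p < alpha)."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical p-values for one model in one month (12 stations).
p_values = [0.02, 0.13, 0.20, 0.31, 0.45, 0.52,
            0.66, 0.71, 0.80, 0.88, 0.93, 0.97]

q1, median, q3 = quantiles(p_values, n=4)

# When Q1 lies above alpha, roughly a quarter of the fits at most can have
# p-values below alpha, and lowering alpha can only reduce that share.
print(q1, rejection_rate(p_values, 0.125), rejection_rate(p_values, 0.05))
```

The same reasoning applies to the median: a median above 0.25 implies that at most about half of the fits are rejected at α = 0.25.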

Figure 4

Box plots of the p-value of the TN.SW statistic, by the probability distribution.

The log normal model does not appear to be as interesting as the gamma and Weibull models when only Figures 3 and 4 are analyzed, but Figure 5 shows six months in which this model had the highest percentage of best fits; that is, in January and from June to October, the log normal model fitted better than the others. From Figure 4 it is possible to observe that from July to October the medians of the p-values were greater than 0.25, showing that this model was rejected less than 50% of the time at a significance level of α = 0.25. A comparison of gamma and log normal distributions for characterizing satellite rain rates was performed by Cho et al. (2004), who observed that the gamma fits outperform the log normal fits in wet regions, whereas the log normal fits are better than the gamma fits in dry regions.

Figure 5

Percentage of the distributions that showed the best fit to the data, selected by month.

As of September, and up to January, the GP model becomes more interesting, with a performance comparable to that of the Weibull and gamma models. In fact, from September to February, the medians of the GP p-values were greater than 0.25 (Figure 4), showing that this model was rejected less than 50% of the time at a significance level of α = 0.25, and an even lower percentage was rejected at α = 0.05. High variability was also observed among the p-values. In agreement with these results, Martins et al. (2020) observed that the GP distribution showed good fits to rainfall data for the Uruguaiana region (Southern Brazil). Their results showed that the GP distribution adequately fits the data in all months. Through a simulation study, they remarked that the GP model was more suitable in September and November.

The Gumbel distribution showed low p-values. In all months, the median of the p-values of the TN.SW statistic was well below 0.50 (Figure 4), and well below 0.25 outside of the period March-May. In addition, for all months the first quartile (Q1) was well below 0.25, which means that more than 25% of the data fits were rejected at a significance level of α = 0.25.

The normal distribution, in contrast to those mentioned in the previous paragraphs, presented the lowest p-values in all months analyzed. In most months, the median p-values of this distribution were very close to zero, as in January, October and December, indicating rejection at the significance level α = 0.05. The poor performance of the normal distribution for fitting rainfall data when compared to other distributions has already been observed by other researchers. When analyzing precipitation data from Bangladesh, Ghosh et al. (2016) observed that, among the six probability distributions under study, the normal distribution showed less satisfactory results than the others. Similar results were also observed by Olofintoye et al. (2009) when comparing six probability distributions fitted to rainfall data in some cities in Nigeria.

From Figure 5, it is possible to observe the percentage of distributions that showed the best fit to the data, selected by month. As in Figures 3 and 4, the distributions that showed good fitting potential were the gamma, Weibull, GP, and log normal. The log normal performed well from June to October. In August, the log normal model gave the best fit, compared to the other models, in 34.8% of the data sets. The gamma, Weibull and GP models presented the best fits during different periods of the year. From February to May, the Weibull and gamma distributions fitted the data better, with February and April being the months in which the Weibull model gave the best fit. Finally, in November and December, the GP model outperformed the others, although in the two preceding months (September and October) it also performed relatively well. The Gumbel and normal distributions did not outperform the others in any month.
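Percentages of this kind can be tallied as sketched below, assuming that the best-fitting model at a station is the one with the largest TN.SW value, in line with the decision rule of the Methods section. The station identifiers and TN.SW values are hypothetical and are used only to show the bookkeeping.

```python
from collections import Counter


def best_fit_percentages(scores_by_station):
    """Percentage of stations at which each model has the largest TN.SW."""
    winners = Counter(
        max(scores, key=scores.get) for scores in scores_by_station.values()
    )
    total = len(scores_by_station)
    return {model: 100.0 * count / total for model, count in winners.items()}


# Hypothetical TN.SW values for one month at four stations.
august_scores = {
    "station_001": {"GAM": 0.981, "WEI": 0.975, "LNORM": 0.962, "GP": 0.940},
    "station_002": {"GAM": 0.955, "WEI": 0.968, "LNORM": 0.949, "GP": 0.951},
    "station_003": {"GAM": 0.972, "WEI": 0.960, "LNORM": 0.965, "GP": 0.933},
    "station_004": {"GAM": 0.930, "WEI": 0.941, "LNORM": 0.977, "GP": 0.958},
}

print(best_fit_percentages(august_scores))
```

Repeating the tally for each month gives one bar group per month of the kind summarized in Figure 5.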

Figure 6 was created to facilitate the spatial visualization of the selected models for each of the 293 stations, per month. From this figure, it is possible to see some results also observed in the analysis of Figures 3–5.

Figure 6

Selected distributions by month and by station.

The selection of the GP distribution showed an interesting behavior. It is possible to observe that most of the stations, in which this model presented a better fit, are in a region with low average rainfall. In the months from February to April, there is a high concentration of GP distribution on the NEB coast. In addition, for September and October, the model was chosen more throughout the region, regardless of the area. This behavior can be justified, since the GP distribution is particularly suitable for fitting data with a thick right tail, and between February and April the NEB coast has low average rainfall, as well as September and October in almost the entire NEB.

The log normal and gamma models fitted well in any region of NEB and showed similar behavior in almost all months of the year. The only exception is the log normal distribution in January and October, when it was selected more often at stations on the coast of the region. The state of Maranhão had the highest percentage of selection for the Weibull model, mainly from December to April. These months are characterized by a higher average rainfall.

The Gumbel and Normal models were seldom selected in almost every month analyzed. The normal model presented the highest percentage of selection in March (6.5%), fitting better to the data coming from the northern region of the NEB, mainly in the state of Maranhão. The Gumbel model was concentrated in the regions with the highest average rainfall for each month of the year. As one would expect, the presence of a location parameter in the Gumbel model gives it a fitting advantage when the average rainfall is high, in comparison to other models that have no location parameter. The location parameter of the Gumbel model provides it with a flexibility to better fit the high rainfall events. On the other hand, as would also be expected, the absence of a shape parameter in the Gumbel model (i.e. low shape flexibility) lowers its overall performance. In January and April, there is a concentration of the Gumbel model in the state of Maranhão. Between July and August, this concentration moves to the coast, and finally, it returns to concentrate in the north of the NEB in December.

As previously mentioned, some distributions showed similar behavior in the analysis of the p-values, as well as in the fitting of the models to the data. For this reason, Figures 7 and 8 were created. In these figures, we compare the p-values of the models analyzed and present scatter plots showing the correlation between these variables (based on the Pearson method). These analyses were carried out for all months; however, their results were similar, so we use only the month of January as an example.

Figure 7

Comparison between the p-values of the TN.SW statistics and correlations, January (Gamma versus others).

Figure 8

Comparison between the p-values of the TN.SW statistics and correlations, January (GP versus others).

Figure 7 provides a comparison between the p-values of the gamma distribution and the other models. It is easy to see a possible positive linear relationship between the gamma and Weibull distribution (Figure 7(b)). The correlation analysis confirms that there is a strong and positive correlation (0.84) between these variables (Figure 7(f)).

When the p-values of the gamma distribution are compared with those of the GP, the association is strongest in the upper right portion of the scatter plot, i.e. the area where both models give a good data fit. Overall, the correlation analysis indicates only a moderate linear relationship between the p-values of the gamma and GP models (correlation coefficient Corr = 0.44).

When comparing the p-values of the gamma model with those of the remaining models, the points are more scattered on the graph, which indicates a weak relationship between the p-values. The calculated correlation coefficients confirm this (gamma × log normal: Corr = 0.01, gamma × Gumbel: Corr = 0.38, gamma × normal: Corr = 0.04).

Figure 8 provides a comparison between the p-values of the GP distribution and the other models. It is possible to observe a possible linear relationship between GP and gamma, as well as GP and Weibull (Figure 8(b) and 8(c)). Comparing the GP model with the other models, it is possible to notice random points in the graph (Figure 8(a), 8(d) and 8(e)), meaning that there is no linear relationship between the variables. The correlation chart shows a moderate linear relationship between the GP and gamma models (Corr = 0.44), as well as GP and Weibull (Corr = 0.53) (Figure 8(f)).
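For reference, the Pearson coefficient used in Figures 7 and 8 can be computed as in the sketch below, which pairs the p-values of two models station by station. The function name and the short p-value lists are hypothetical and do not reproduce the values reported above.

```python
import math


def pearson_corr(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical January p-values of two models at the same six stations.
p_gamma = [0.12, 0.35, 0.48, 0.61, 0.77, 0.90]
p_weibull = [0.10, 0.30, 0.55, 0.58, 0.70, 0.95]

print(pearson_corr(p_gamma, p_weibull))
```

A coefficient near 1 means that stations well fitted by one model also tend to be well fitted by the other, which is the pattern described for the gamma and Weibull models.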

An analysis of the skewness of the data sets, and of its effect on the p-values of the TN.SW statistic for the various models, was also performed. From Figure 9, it can be seen that most of the observed skew coefficient (Cs) values (marked simply as ‘skewness’ on the horizontal axes) are between 0.5 and 3. For the gamma, log normal, Weibull and GP models, the highest p-values occurred when the skewness was between about 1 and 2 (Figure 9(a)–9(d)); this means that these models fitted better to data with a moderate positive skewness. The Gumbel model presented the highest p-values (Figure 9(e)) when the skewness was around 1, because the skew coefficient of the Gumbel distribution itself is close to 1 ($C_s \approx 1.14$). On the other hand, the normal distribution showed the highest concentration of high p-values (Figure 9(f)) when the skewness was close to 0; this behavior can be justified by the symmetrical shape of this distribution ($C_s = 0$).
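A sample skew coefficient can be computed as sketched below. The study does not state which estimator of Cs was used, so the moment-based form here is an assumption, and the two short samples are hypothetical. The constant at the end evaluates the known theoretical skewness of the Gumbel distribution for comparison.

```python
import math


def skew_coefficient(data):
    """Moment-based sample skewness: m3 / m2**1.5 (assumed estimator)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# Theoretical Gumbel skewness: 12 * sqrt(6) * zeta(3) / pi**3.
ZETA_3 = 1.2020569031595943  # Apery's constant
GUMBEL_SKEW = 12.0 * math.sqrt(6.0) * ZETA_3 / math.pi ** 3

# Hypothetical monthly totals (mm): one symmetric, one right-skewed.
symmetric = [10.0, 20.0, 30.0, 40.0, 50.0]
right_skewed = [5.0, 8.0, 10.0, 12.0, 15.0, 20.0, 90.0]

print(skew_coefficient(symmetric), skew_coefficient(right_skewed), GUMBEL_SKEW)
```

Plotting such coefficients against the p-values of each model gives scatter plots of the kind shown in Figure 9.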

Figure 9

Comparison between the p-values of the TN.SW statistics and skewness, January.

CONCLUSION

The 2-parameter distributions log normal, Weibull, gamma and Generalized Pareto proved to be flexible enough to describe monthly precipitation data. The results presented in this study allowed us to observe that:

  • Globally, the gamma and Weibull distributions showed the best fits compared to the others. These models fitted well to data throughout the NEB territory and showed similar behavior for almost every month, except for January and October. Although the gamma and Weibull models presented the overall best fits to the data, the log normal model gave a comparable fit during a certain period of the year, particularly for January and June to October.

  • The GP model was better fitted to data in regions with low average precipitation, mainly on the NEB coast between February and April. In general, the GP model becomes more interesting than the others from September to January, where the medians of the p-values were greater than 0.25, showing that less than 50% of the time this model was rejected at a significance level α = 0.25 and a significantly lower percentage was rejected at α = 0.05.

  • The Weibull model, unlike the GP, fitted better to data in regions with high average rainfall, mainly between December and April. This model was highly selected in the state of Maranhão from January to April. In general, the Weibull model was most selected in February, March and April.

  • As expected, the absence of a shape parameter in the Gumbel and normal models (that is, low shape flexibility) lowers their performance compared to the other models.

  • A 1-parameter model such as the exponential model would not be flexible enough to provide a good fit for the data.

  • In most cases, there seems to be no need to pick a 3-parameter model (such as the log-Pearson type III or GEV) to fit the data because 2-parameter ones such as those considered in this study seem to be flexible enough to fit the majority of the observed data in the NEB region.

In general, this study identified which probability distributions best fit monthly precipitation data for the Northeastern region of Brazil, indicating the models that fit best in each region in a given period of the year. These results will be useful for studies related to drought in the region, as well as for the analysis of precipitation.

ACKNOWLEDGEMENT

The authors acknowledge the support of Brazilian agency CAPES for granting a scholarship in the first year of the research and the National Water Agency (Agência Nacional de Águas – ANA) for providing the data used in the study.

DATA AVAILABILITY STATEMENT

All relevant data are available from an online repository or repositories at https://www.snirh.gov.br/hidroweb/serieshistoricas.

REFERENCES

Alam M. A., Emura K., Farnham C. & Yuan J. 2018 Best-fit probability distributions and return periods for maximum monthly rainfall in Bangladesh. Climate 6 (1), 9. https://doi.org/10.3390/cli6010009
Alvala
R.
,
Cunha
A. P.
,
Brito
S. S.
,
Seluchi
M. E.
,
Marengo
J. A.
,
Moraes
O. L.
&
Carvalho
M. A.
2019
Drought monitoring in the Brazilian Semiarid region
.
Anais da Academia Brasileira de Ciências
91
.
https://doi.org/10.1590/0001-3765201720170209
Araújo
W. F.
,
Andrade Junior
A. S. D.
,
Medeiros
R. D. D.
&
Sampaio
R. A.
,
2001
Precipitação pluviométrica mensal provável em Boa Vista, estado de Roraima, Brasil (Probable monthly rainfall in Boa Vista, state of Roraima, Brazil)
.
Revista Brasileira de Engenharia Agrícola E Ambiental
5
,
3
, p.
563
567
.
https://doi.org/10.1590/S1415-43662001000300032
Artin
E.
2015
The Gamma Function
.
Courier Dover Publications
,
Belmont, Canada
.
Ashkar
F.
&
Aucoin
F.
2012
Choice between competitive pairs of frequency models for use in hydrology: a review and some new results
.
Hydrological Sciences Journal
57
(
6
),
1092
1106
.
https://doi.org/10.1080/02626667.2012.701746
Ashkar
F.
&
Ba
I.
2017
Selection between the generalized Pareto and kappa distributions in peaks-over-threshold hydrological frequency modelling
.
Hydrological Sciences Journal
62
(
7
),
1167
1180
.
https://doi.org/10.1080/02626667.2017.1302089
Ashkar
F.
&
Tatsambon
C. N.
2007
Revisiting some estimation methods for the generalized Pareto distribution
.
Journal of Hydrology
346
(
3–4
),
136
143
.
https://doi.org/10.1016/j.jhydrol.2007.09.007
Ashkar
F.
,
Arsenault
M.
&
Zoglat
A.
1997
On the discrimination between statistical distributions for hydrological frequency analysis
. In:
The 1997 Annual Conference of the Canadian Society for Civil Engineering. Part 3(of 7)
,
Sherbrooke, Can
,
05/27–30/97
, pp.
169
178
.
Ashkar
F.
,
Ba
I.
&
Dieng
B. B.
2019
Hydrological frequency analysis: some results on discriminating between the Gumbel or Weibull probability distributions and other competing models
. In:
World Environmental and Water Resources Congress 2019: Watershed Management, Irrigation and Drainage, and Water Resources Planning and Management
,
Reston, VA
,
American Society of Civil Engineers
. pp.
374
387
.
https://doi.org/10.1061/9780784482339.037
.
Bedient
P. B.
,
Huber
W. C.
&
Vieux
B. E.
2008
Hydrology and floodplain analysis (Vol. 816). Prentice Hall, Upper Saddle River, NJ
.
Bjureland
W.
,
Johansson
F.
,
Sjölander
A.
,
Spross
J.
&
Larsson
S.
2019
Probability distributions of shotcrete parameters for reliability-based analyses of rock tunnel support
.
Tunneling and Underground Space Technology
87
,
15
26
.
https://doi.org/10.1016/j.tust.2019.02.002
Bolfarine
H.
&
Sandoval
M. C.
2001
Introdução à inferência estatística (Introduction to Statistical Inference)
.
SBM
.
Brendel
C. E.
,
Dymond
R. L.
&
Aguilar
M. F.
2020
Integration of quantitative precipitation forecasts with real-time hydrology and hydraulics modeling towards probabilistic forecasting of urban flooding
.
Environmental Modelling & Software
134
,
104864
.
https://doi.org/10.1016/j.envsoft.2020.104864
Brito
S. S. B.
,
Cunha
A. P. M.
,
Cunningham
C. C.
,
Alvalá
R. C.
,
Marengo
J. A.
&
Carvalho
M. A.
2018
Frequency, duration and severity of drought in the Semiarid Northeast Brazil region
.
International Journal of Climatology
38
(
2
),
517
529
.
https://doi.org/10.1002/joc.5225
Carvalho
L. M. V.
,
Jones
C.
2009
Zona de convergência do atlântico sul (South Atlantic convergence zone)
. In:
Tempo E Clima no Brasil
, pp
95
109
(
Cavalcanti
I. F. A.
,
Ferreira
N. J.
,
Da Silva
M. G. A. J.
&
Silva Dias
M. A. F.
, eds).
Oficina de Textos
,
São Paulo
.
Cavalcanti
I. F. A.
&
Kousky
V. E.
2004
Drought in Brazil during summer and fall 2001 and associated atmospheric circulation features
.
Revista Climanálise Ano
2
(
01
).
Cho
H.
,
Bowman
K. P.
&
North
G. R.
2004
A comparison of gamma and lognormal distributions for characterizing satellite rain rates from the tropical rainfall measuring mission
.
Journal of Applied Meteorology and Climatology
43
(
11
),
1586
1597
.
https://doi.org/10.1175/JAM2165.1
Coêlho
A. E. L.
,
Adair
J. G.
&
Mocellin
J. S. P.
2004
Psychological responses to drought in northeastern Brazil
.
Revista Interamericana de Psicologia/Interamerican Journal of Psychology
38
(
1
).
https://doi.org/10.30849/rip/ijp.v38i1.845
Coelho
C. A. S.
,
Cardoso
D. H. F.
&
Firpo
M. A. F.
2016
Precipitation diagnostics of an exceptionally dry event in São Paulo, Brazil
.
Theoretical and Applied Climatology
125
(
3–4
),
769
784
.
doi:10.1007/s00704-015-1540-9
.
Eslamian
S.
&
Eslamian
F.
, (eds) (
2017
)
Handbook of Drought and Water Scarcity: Principles of Drought and Water Scarcity
, 1st edn. CRC Press.
https://doi.org/10.1201/9781315404219
.
Fienberg
S. E.
&
Rinaldo
A.
2012
Maximum likelihood estimation in log-linear models
.
The Annals of Statistics
40
(
2
),
996
1023
.
doi: 10.1214/12-AOS986
.
Gautschi
W.
1959
Some elementary inequalities relating to the gamma and incomplete gamma function
.
Journal of Mathematical Physics
38
(
1959/60
),
77
81
.
https://doi.org/10.1002/sapm195938177
Ghosh
S.
,
Roy
M. K.
&
Biswas
S. C.
2016
Determination of the best fit probability distribution for monthly rainfall data in Bangladesh
.
American Journal of Mathematics and Statistics
6
(
4
),
170
174
.
doi: 10.5923/j.ajms.20160604.05
.
Gutiérrez
A. P. A.
,
Engle
N. L.
,
De Nys
E.
,
Molejón
C.
&
Martins
E. S.
2014
Drought preparedness in Brazil
.
Weather and Climate Extremes
3
,
95
106
.
https://doi.org/10.1016/j.wace.2013.12.001
Hussain
Z.
,
Mahmood
Z.
&
Hayat
Y.
2010
Modeling the daily rainfall amounts of north-west Pakistan for agricultural planning
.
Sarhad Journal of Agriculture
27
(
2
),
313
321
.
IBGE – Instituto Brasileiro de Geografia e Estatística. Censo
2020
Retrieved from http://www.ibge.gov.br (accessed 05 July 2021)
Jain
S. K.
&
Singh
V. P.
2003
Water resources systems planning and management
.
Elsevier
, Amsterdam.
Kousky
V. E.
1979
Frontal influences on northeast Brazil
.
Monthly Weather Review
107
(
9
),
1140
1153
.
https://doi.org/10.1175/1520-0493(1979)107 < 1140:FIONB > 2.0.CO;2
Langat
P. K.
,
Kumar
L.
&
Koech
R.
2019
Identification of the most suitable probability distribution models for maximum, minimum, and mean streamflow
.
Water
11
(
4
),
734
.
https://doi.org/10.3390/w11040734
Li
Z.
,
Brissette
F.
&
Chen
J.
2013
Finding the most appropriate precipitation probability distribution for stochastic weather generation and hydrological modelling in Nordic watersheds
.
Hydrological Processes
27
(
25
),
3718
3729
.
https://doi.org/10.1002/hyp.9499
Marengo
J. A.
,
Alves
L. M.
,
Soares
W. R.
,
Rodriguez
D. A.
,
Camargo
H.
,
Riveros
M. P.
&
Pabló
A. D.
2013
Two contrasting severe seasonal extremes in tropical South America in 2012: flood in Amazonia and drought in northeast Brazil
.
Journal of Climate
26
(
22
),
9137
9154
.
https://doi.org/10.1175/JCLI-D-12-00642.1
Marengo
J. A.
,
Torres
R. R.
&
Alves
L.
2017
Drought in Northeast Brazil – past, present, and future
.
Theoretical and Applied Climatology
129
(
3–4
),
1189
1200
.
https://doi.org/10.1007/s00704-016-1840-8
Martins
A. L. A.
,
Liska
G. R.
,
Beijo
L. A.
,
de Menezes
F. S.
&
Cirillo
M. Â
, .
2020
Generalized Pareto distribution applied to the analysis of maximum rainfall events in Uruguaiana, RS, Brazil
.
SN Applied Sciences
2
.
https://doi.org/10.1007/s42452-020-03199-8
Mcglade
J.
,
Bankoff
G.
,
Abrahams
J.
,
Cooper-knock
S. J.
,
Cotecchia
F.
,
Desanker
P.
&
Hirsch
F.
2019
Global Assessment Report on Disaster Risk Reduction 2019
. .
Melo
V. D. S.
&
Lima
L. M.
2021
Rain characterization of Catolé do Rocha micro-region in the state of Paraíba based on applied statistics
.
Revista Brasileira de Meteorologia
36
,
97
106
.
https://doi.org/10.1590/0102-77863610006
Myung
I. J.
2003
Tutorial on maximum likelihood estimation
.
Journal of Mathematical Psychology
47
(
1
),
90
100
.
Nkunzimana
A.
,
Bi
S.
,
Jiang
T.
,
Wu
W.
&
Abro
M. I.
2019
Spatiotemporal variation of rainfall and occurrence of extreme events over Burundi during 1960 to 2010
.
Arabian Journal of Geosciences
12
(
5
),
1
22
.
https://doi.org/10.1007/s12517-019-4335-y
Olofintoye
O. O.
,
Sule
B. F.
&
Salami
A. W.
2009
Best–fit probability distribution model for peak daily rainfall of selected cities in Nigeria
.
New York Science Journal
2
(
3
),
1
12
.
Önöz
B.
&
Bayazit
M.
1995
Best-fit distributions of largest available flood samples
.
Journal of Hydrology
167
(
1–4
),
195
208
.
https://doi.org/10.1016/0022-1694(94)02633-M
Ozonur
D.
,
Pobocikova
I.
&
de Souza
A.
2020
Statistical analysis of monthly rainfall in Central West Brazil using probability distributions
.
Modeling Earth Systems and Environment
,
1
11
.
https://doi.org/10.1007/s40808-020-00954-z
Palharini
R. S. A.
&
Vila
D. A.
2017
Climatological behavior of precipitating clouds in the northeast region of Brazil
.
Advances in Meteorology
2017
.
https://doi.org/10.1155/2017/5916150
Park
J.
,
Seo
S.
&
Kim
T. Y.
2009
A kappa distribution with a hydrological application
.
Stochastic Environmental Research and Risk Assessment
23
(
5
),
579
586
.
https://doi.org/10.1007/s00477-008-0243-5
Rao
V. B.
,
Satyamurty
P.
&
Brito
J. I. O. B.
1986
On the 1983 drought in north-east Brazil
.
Journal of Climatology
6
(
1
),
43
51
.
https://doi.org/10.1002/joc.3370060105
Ribeiro
B. T.
,
Avanzi
J. C.
,
Mello
C. R. D.
,
Lima
J. M. D.
&
Silva
M. L. N.
2007
Comparação de distribuições de probabilidade e estimativa da precipitação provável para região de Barbacena, MG (Comparison of probability distributions and estimation of probable precipitation for the region of Barbacena, MG)
.
Ciência e agrotecnologia
1
(
5
),
1297
1302
.
https://doi.org/10.1590/S1413-70542007000500004
Royston
J. P.
1982a
Algorithm AS 181: the W test for normality
.
Journal of the Royal Statistical Society. Series C (Applied Statistics)
31
(
2
),
176
180
.
https://doi.org/10.2307/2347986
Royston
J. P.
1982b
An extension of Shapiro and Wilk's w test for normality to large samples
.
Journal of the Royal Statistical Society: Series C (Applied Statistics)
31
(
2
),
115
124
.
https://doi.org/10.2307/2347973
Royston
J. P.
1995
Remark AS R94: a remark on algorithm AS 181: the W-test for normality
.
Journal of the Royal Statistical Society. Series C (Applied Statistics)
44
(
4
),
547
551
.
https://doi.org/10.2307/2986146
Salman
S. A.
,
Shahid
S.
,
Ismail
T.
,
Al-Abadi
A. M.
,
Wang
X. J.
&
Chung
E. S.
2019
Selection of gridded precipitation data for Iraq using compromise programming
.
Measurement
132
,
87
98
.
https://doi.org/10.1016/j.measurement.2018.09.047
Şen
Z.
&
Eljadid
A. G.
1999
Rainfall distribution function for Libya and rainfall prediction
.
Hydrological Sciences Journal
44
(
5
),
665
680
.
https://doi.org/10.1080/02626669909492266
Shapiro
S. S.
&
Wilk
M. B.
1965
An analysis of variance test for normality (complete samples)
.
Biometrika
52
(
3/4
),
591
611
.
https://doi.org/10.2307/2333709
Stern
R. D.
&
Coe
R.
1982
The use of rainfall models in agricultural planning
.
Agricultural Meteorology
26
(
1
),
35
50
.
https://doi.org/10.1016/0002-1571(82)90056-5
Sukrutha
A.
,
Dyuthi
S. R.
&
Desai
S.
2018
Multimodel response assessment for monthly rainfall distribution in some selected Indian cities using best-fit probability as a tool
.
Applied Water Science
8
(
5
),
145
.
https://doi.org/10.1007/s13201-018-0789-4
Vieira
F. M. C.
,
Machado
J. M. C.
,
Vismara
E.
&
Possenti
J. C.
2018
Probability distributions of frequency analysis of rainfall at the southwest region of Paraná State, Brazil
.
Revista de Ciências Agroveterinárias
17
,
260
266
.
https://doi.org/10.5965/223811711722018260
Wilhite
D. A.
2000
Drought as a natural hazard: concepts and definitions
. In:
Published in Drought: A Global Assessment
, Vol.
I
(
Wilhite
D. A.
, ed.).
chap. 1
.
Routledge
,
London
, pp.
3
18
.
WMO
.
2015
Global Climate in 2014 Marked by Extreme Heat and Flooding
. .
Ye
L.
,
Hanson
L. S.
,
Ding
P.
,
Wang
D.
&
Vogel
R. M.
2018
The probability distribution of daily precipitation at the point and catchment scales in the United States
.
Hydrology and Earth System Sciences
22
(
12
),
6519
6531
.
Zong
Z.
2006
Information-Theoretic Methods for Estimating of Complicated Probability Distributions
.
Elsevier
,
Amsterdam
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).