Recreational water quality surveillance involves comparing bacterial levels to set threshold values to determine beach closure. Bacterial levels can be predicted through models which are traditionally based upon multiple linear regression. The objective of this study was to evaluate exceedance probabilities, as opposed to bacterial levels, as an alternate method to express beach risk. Data were incorporated into a logistic regression for the purpose of identifying environmental parameters most closely correlated with exceedance probabilities. The analysis was based on 7,422 historical sample data points from the years 2000–2010 for 15 South Florida beach sample sites. Probability analyses showed which beaches in the dataset were most susceptible to exceedances. No yearly trends were observed nor were any relationships apparent with monthly rainfall or hurricanes. Results from logistic regression analyses found that among the environmental parameters evaluated, tide was most closely associated with exceedances, with exceedances 2.475 times more likely to occur at high tide compared to low tide. The logistic regression methodology proved useful for predicting future exceedances at a beach location in terms of probability and modeling water quality environmental parameters with dependence on a binary response. This methodology can be used by beach managers for allocating resources when sampling more than one beach.
INTRODUCTION
The US Beaches Environmental Assessment, Closure, and Health (BEACH) Act was passed in 2000 for the purpose of developing beach surveillance programs (United States Environmental Protection Agency (US EPA) 2004). BEACH surveillance programs intend to advise and inform beach users of health risks such as gastrointestinal and other waterborne illnesses they might encounter while swimming at beach sites. Towards this end, beach-water quality monitoring is based upon measures of fecal indicator bacteria (FIB), with enterococci the recommended FIB for marine environments.
In the summer of 2000, the State began the Florida Healthy Beaches Program (FHBP) as authorized by state legislation (Senate Bill 1412 and House Bill 2145). Since 2000, the Florida Department of Health (FDOH) has run a surveillance program in all coastal counties. Samples within Miami-Dade County are collected once a week. Resampling occurs when the results exceed either the single-sample standard of 104 colony forming units (CFU) per 100 ml or the 30-day geometric mean of 35 CFU/100 ml (Florida Department of Health). The surveillance samples are analyzed for confirmed enterococci by membrane filtration, which requires a 24-hour incubation period (US EPA 2002).
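The two resampling triggers described above can be illustrated with a minimal sketch. The function names and the handling of zero counts (floored at 1 CFU before taking logarithms) are assumptions made here for illustration and are not part of the FDOH protocol.

```python
import math

SINGLE_SAMPLE_LIMIT = 104.0   # CFU/100 ml, single-sample standard
GEOMETRIC_MEAN_LIMIT = 35.0   # CFU/100 ml, 30-day geometric mean standard


def geometric_mean(values):
    """Geometric mean of enterococci counts (CFU/100 ml).

    Assumption: counts below 1 are floored at 1 so the logarithm is defined.
    """
    logs = [math.log(max(v, 1.0)) for v in values]
    return math.exp(sum(logs) / len(logs))


def needs_resample(latest_count, counts_last_30_days):
    """Return True when either standard is exceeded.

    counts_last_30_days is assumed to already include the latest count.
    """
    over_single = latest_count > SINGLE_SAMPLE_LIMIT
    over_geomean = geometric_mean(counts_last_30_days) > GEOMETRIC_MEAN_LIMIT
    return over_single or over_geomean
```

With weekly sampling, the 30-day window would typically hold only four or five values, so a single high count can dominate the geometric mean.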
The 24-hour incubation period results in a lag time between sample collection and beach advisories. This lag time increases the rate of error in issuing advisories (Fleisher 1990a; Boehm et al. 2002; Kim & Grant 2004; Wade et al. 2006) because the measured value represents the FIB density of the prior day's water. This error may also be due to the environmental variables that affect FIB concentrations (Enns et al. 2012) and growth over time (Desmarais et al. 2002), or to the extreme variability in FIB densities over short periods of time (Fleisher 1990a, 1990b; Boehm et al. 2002; Boehm 2007). This variability, in turn, limits the ability to issue accurate real-time advisories based upon a previous day's conditions. To better address lag times and the environmental variability in enterococci levels, this study illustrates the use of logistic regression as a means of predicting beach closures. Logistic regression is well suited for assessing beach closure scenarios because it predicts a binary outcome (in this case, beach closure or not) from a series of predictor variables. Logistic regression allows for the conversion of bacterial levels to exceedance probabilities. Consequently, the beach manager can better allocate resources when sampling more than one beach.
METHODS
Study sites
The sites are located in the subtropical region of southeastern Florida (Figure 1), which is characterized by hot summers (mean of 32 °C) and warm-to-mild winters (20 °C). Rainfall in this region follows a wet season from May to October (average 21.0 cm per month) and a dry season from November to April (average 4.9 cm per month). Tidal conditions at our sampling locations varied, so time- and date-specific tide tables available through the National Oceanic and Atmospheric Administration were used to estimate the tidal stage when each sample was taken. The tidal range in this area is 0.7 m. Historical rainfall records were available through the South Florida Water Management District, and averages for Miami-Dade County were used.
Statistical analysis
To determine an appropriate statistical model, the distribution of enterococci among beaches was tested against commonly used distributions. First, the data set was broken down into four categories: (a) the complete historical dataset, (b) by beach site, (c) by month, and (d) by year. Each category was then tested for normality using the Kolmogorov-Smirnov goodness-of-fit test, with the hypothesis of normality rejected for p values less than 0.05. If normality was rejected, enterococci density was tested against the following distributions: lognormal, exponential, and gamma. Each distribution was tested separately, first for the entire data set and then by beach, month, and year. The probability of exceedance was then evaluated for each of these categories. The yearly category was further evaluated to investigate the potential impacts from hurricane events.
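The goodness-of-fit step can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the SAS procedure used in the study: the function names are introduced here, the p value uses the asymptotic Kolmogorov series, and it is not corrected for the fact that the mean and standard deviation are estimated from the same data.

```python
import math
from statistics import mean, stdev


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma) variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(values, cdf):
    """Largest vertical gap between the empirical CDF and a candidate CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the candidate CDF with the empirical CDF just after
        # and just before the step at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue_asymptotic(d, n):
    """Approximate p value from the limiting Kolmogorov distribution."""
    t = d * math.sqrt(n)
    if t < 0.3:
        return 1.0  # the series converges slowly here; p is essentially 1
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
                 for k in range(1, 101))
    return max(0.0, min(1.0, 2.0 * series))


def check_normality(values, alpha=0.05):
    """Return (D, p, rejected) for a KS test against a fitted normal."""
    mu, sigma = mean(values), stdev(values)
    d = ks_statistic(values, lambda x: normal_cdf(x, mu, sigma))
    p = ks_pvalue_asymptotic(d, len(values))
    return d, p, p < alpha
```

Under the same assumptions, a lognormal fit can be checked by passing the logarithms of strictly positive counts to `check_normality`.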
The effects of environmental variables on exceedance were also explored. A logistic regression with backward selection was used, with variables retained in the model at alpha = 0.10. The environmental parameters evaluated were those documented by the FDOH sampling staff. These parameters included beach location, air temperature, water temperature, rainfall within 24 hours before sampling, rainfall within 3 days before sampling, rainfall within 7 days before sampling, prior hurricanes, and tide conditions (1 = High Tide, 2 = Slack Tide, and 3 = Low Tide). Beach location was used to determine whether exceedances differed geographically.
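For reference, the general form of a logistic regression with a three-level tide variable can be written as below. The notation is introduced here for illustration and is not taken from the original analysis: $p$ is the probability of an exceedance, $x_H$ and $x_S$ are indicator variables for high and slack tide, $z_j$ are the remaining environmental predictors, and low tide is assumed to be the reference level, consistent with the "1 vs 3" and "2 vs 3" contrasts reported in Table 1.

```latex
\[
\operatorname{logit}(p) = \ln\frac{p}{1-p}
  = \beta_0 + \beta_H x_H + \beta_S x_S + \sum_j \beta_j z_j
\]
\[
p = \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \beta_H x_H + \beta_S x_S
      + \sum_j \beta_j z_j\right)\right]},
\qquad
\mathrm{OR}_{\text{high vs low}} = e^{\beta_H},
\quad
\mathrm{OR}_{\text{slack vs low}} = e^{\beta_S}
\]
```

In this form, each odds ratio is the exponential of the corresponding coefficient, holding the other predictors fixed.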
Exceedance counts were based on the EPA single-sample cut-off of 104 CFU/100 ml. For the logistic regression analysis, each enterococci measurement was dichotomized relative to this cut-off: any data point below 104 CFU/100 ml was considered a non-exceedance and coded as 0, whereas any value above 104 CFU/100 ml was considered an exceedance and coded as 1. All statistical tests were performed using Statistical Analysis System software (SAS System) and Excel.
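The binary coding and the per-category probability of exceedance can be sketched as follows. The record layout (a beach label paired with a count) and the function names are hypothetical, and treating a value of exactly 104 CFU/100 ml as a non-exceedance is an assumption, since the text only specifies values below and above the cut-off.

```python
from collections import defaultdict

EXCEEDANCE_LIMIT = 104.0  # CFU/100 ml, EPA single-sample cut-off


def code_exceedance(cfu_per_100ml):
    """Return 1 for an exceedance and 0 otherwise.

    Assumption: a value exactly equal to the limit is coded as 0.
    """
    return 1 if cfu_per_100ml > EXCEEDANCE_LIMIT else 0


def exceedance_probability(records):
    """Fraction of samples in exceedance for each category.

    records: iterable of (category, cfu_per_100ml) pairs, where the
    category could be a beach, a month, or a year.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [exceedances, samples]
    for category, cfu in records:
        counts[category][0] += code_exceedance(cfu)
        counts[category][1] += 1
    return {c: exceeded / total for c, (exceeded, total) in counts.items()}
```

The same function covers the by-beach, by-month, and by-year breakdowns by changing which field is used as the category.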
RESULTS
Historical aspects of sampling
All goodness-of-fit tests for commonly used distributions showed a p value of 0.01 or less. Therefore, enterococci density did not follow any of the distributions assessed (normal, lognormal, exponential, or gamma).
Probability of exceedances
Predictor variables for logistic regression
The logistic regressions showed that, of the environmental predictors included in the model, only tidal conditions had a statistically significant influence on the presence or absence of an exceedance (Table 1). Enterococci exceedance was 2.475 times more likely to occur at high tide versus low tide, and 1.252 times more likely at slack tide versus low tide, although the 95% confidence interval for the slack versus low tide comparison includes 1.
Table 1. Odds ratio estimates
| Effect | Point estimate | 95% Wald lower confidence limit | 95% Wald upper confidence limit |
|---|---|---|---|
| Tidal Conditions 1 vs 3 | 2.475 | 1.661 | 3.687 |
| Tidal Conditions 2 vs 3 | 1.252 | 0.866 | 1.811 |
Tidal conditions as reported by the Florida Department of Health are coded as 1 = High Tide, 2 = Slack Tide, and 3 = Low Tide.
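The relationship between the reported odds ratios, the underlying logit coefficients, and exceedance probabilities can be illustrated with a short sketch. The function names are introduced here for illustration, and any baseline probability passed to `shift_probability` is a hypothetical input rather than a value reported in the study.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def coefficient_from_odds_ratio(odds_ratio, lower, upper):
    """Recover the logit coefficient and its standard error.

    A Wald interval is exp(beta +/- Z_95 * se), so its width on the
    log scale equals 2 * Z_95 * se.
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
    return beta, se


def interval_excludes_one(lower, upper):
    """True when the interval indicates an effect at the 5% level."""
    return lower > 1.0 or upper < 1.0


def shift_probability(baseline_prob, odds_ratio):
    """Apply an odds ratio to a baseline probability."""
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return odds / (1.0 + odds)


# Values from Table 1: high vs low tide and slack vs low tide.
beta_high, se_high = coefficient_from_odds_ratio(2.475, 1.661, 3.687)
high_significant = interval_excludes_one(1.661, 3.687)    # True
slack_significant = interval_excludes_one(0.866, 1.811)   # False
```

For the high versus low tide contrast, this gives a coefficient of roughly 0.91 with a standard error of roughly 0.20. Because exceedances are rare, an odds ratio of 2.475 corresponds to a little less than a 2.5-fold increase in probability.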
DISCUSSION
Historical and seasonal analysis
The overall historical data did not yield obvious patterns, except that enterococci remained at low background levels in the environment over time, with occasional exceedance values (3% of the time). We therefore concluded that Miami-Dade County beach waters have historically had low levels of exceedances and a low probability of exceedance both spatially and temporally. The sporadic exceedances could be due to nutrient loading (Boehm et al. 2002) or the high density of bathers or dogs (Wang et al. 2010). In addition, indicator bacteria have previously been found in the environment and can multiply in subtropical regions even when there is no known source (Toranzos 1991; Solo-Gabriele et al. 2000; Fleisher et al. 2010).
When evaluating the data by beach, probability analysis indicated that some beaches were more prone to exceedances than others. There was no apparent geographic trend from north to south, and no evidence that adjacent beaches were affecting each other or that one particular regional zone had more exceedances than another. Interestingly, the ‘bay’ beaches (Dog, Matheson, and Oleta Beaches) were on both ends of the exceedance spectrum. Dog Beach is one of two beaches in the county (the other being Haulover Beach) that allow dogs, which may explain in part the higher values at this beach.
When evaluating the data on a monthly basis, the probability of exceedance did not match seasonal patterns of rainfall, nor did it seem to respond adversely to hurricane impacts. This runs counter to studies that find that storm water inputs result in elevated beach bacteria levels. The difference in these observations is likely due to sampling intervals. The sampling interval for the Miami-Dade data archived through the Florida Healthy Beaches Program is weekly. Studies with shorter sampling intervals (on the order of hours) do show immediate impacts after storms (Enns et al. 2012), but these impacts are short-lived and can easily be missed by the weekly sampling strategy used to collect the long-term record for Miami-Dade. On a monthly basis, the highest indicator levels were observed during March and October. October coincides with the highest tides in Miami-Dade County. This is consistent with the logistic regression model, which found significant differences in enterococci exceedances between high and low tide, and with beach-specific studies conducted in Miami-Dade County showing a repetitive increase in enterococci levels with tide over the course of several consecutive days (Wright et al. 2011). The cause of the elevated exceedances in March is not known, but it may be associated with Spring Break, when schools and universities are on holiday and more bathers are at the beach.
The logistic regression model of the various environmental factors yielded tide as the only statistically significant predictor of an exceedance. This result is consistent with prior studies that identified beach sand as a source of the microbes in South Florida (Bonilla et al. 2007; Phillips et al. 2011). Studies of sand quality show that sand harbors elevated levels of microbes. These microbes can wash into the sand from the water column or can be deposited directly by beachgoers and animals (Elmir et al. 2009; Wright et al. 2009). The highest levels of microbes observed on the beach have been shown to occur just above the seaweed (or strand) line. This area is in contact with the shore water only during high tide, allowing for a pulse release of bacteria during times of high water levels (Abdelzaher et al. 2010). More work is needed to evaluate the role of seaweed and its influence on the persistence and release of enterococci. An alternate but complementary explanation of the effect of tide on enterococci levels is the net inflow of water coming into the beach at high tide, which can bring in enterococci from offshore sources. Determining which scenario is correct is a subject for further research, and the scenarios might not be mutually exclusive. Different beaches may have different mechanisms to explain the effect of tide. To further evaluate the causes of elevated bacterial levels at high tide, culturable organisms require further analysis. In addition, our laboratories have begun to evaluate microbial communities using next generation molecular tools as a means of identifying the presence or absence of the enterococci and other FIB (Cuvelier et al. 2014; Campbell et al. 2015; O'Connell unpublished).
CONCLUSIONS
This study illustrates the evaluation of data in terms of the probabilities of exceedance, and subsequently uses logistic regression to identify key environmental variables that are associated with exceedance values. With this information, the beach manager can determine which beach needs more extensive sampling, and identify environmental conditions during which samples should be collected. We recommend that beach managers distribute their resources to allow for more intense sampling at beaches characterized by higher probabilities of exceedance and balance the available resources by decreasing sampling frequency at beaches characterized by lower probabilities of exceedance.
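One possible way to turn exceedance probabilities into a sampling plan is a proportional allocation with a guaranteed minimum per beach. This rule is an illustrative assumption, not a procedure taken from this study, and the function name, beach labels, and probabilities are hypothetical.

```python
def allocate_samples(exceed_prob, total_samples, minimum_per_beach=1):
    """Split a sampling budget across beaches.

    Each beach first receives minimum_per_beach samples; the rest are
    shared in proportion to exceedance probability, with leftover
    samples assigned by largest fractional remainder.
    """
    beaches = list(exceed_prob)
    remaining = total_samples - minimum_per_beach * len(beaches)
    if remaining < 0:
        raise ValueError("budget is smaller than the guaranteed minimum")

    total_p = sum(exceed_prob.values())
    if total_p == 0:
        # No exceedances observed anywhere: share the remainder equally.
        raw = {b: remaining / len(beaches) for b in beaches}
    else:
        raw = {b: remaining * exceed_prob[b] / total_p for b in beaches}

    plan = {b: minimum_per_beach + int(raw[b]) for b in beaches}
    leftover = total_samples - sum(plan.values())
    by_remainder = sorted(beaches, key=lambda b: raw[b] - int(raw[b]),
                          reverse=True)
    for b in by_remainder[:leftover]:
        plan[b] += 1
    return plan


# Hypothetical example: three beaches and a budget of 13 samples.
plan = allocate_samples({"Beach A": 0.06, "Beach B": 0.03, "Beach C": 0.01}, 13)
```

The guaranteed minimum keeps low-probability beaches under surveillance, so that a change in their exceedance behavior would still be detected.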
This work also found that the logistic regression model was well suited for identifying key parameters associated with bacterial exceedances. The logistic regression model can be used in lieu of the more commonly used least squares linear regression. The logistic regression approach models beach opening versus closure (exceedances) which is more in line with the actual decisions that need to be taken by the beach manager. We recommend studies that compare the performance of a logistic regression approach as opposed to the more commonly used least squares linear regression approach.
We recommend that beach managers utilize ‘probability of exceedance’ information to allocate sampling resources and use logistic regression for the purpose of identifying key environmental parameters that result in beach closures. Moreover, we recommend that beach users be alerted with regard to the probability of exceedance as opposed to ‘open’ versus ‘closed’ status from the prior day's measures. This avoids the drawbacks associated with lag times due to sample analyses. We recommend that the risk to recreational beach use be simply expressed in terms of its exceedance probability.
ACKNOWLEDGEMENTS
We thank Ms Amy Doyle for copy editing early versions of this manuscript. We thank David Polk of the Florida Department of Health for the provision of data from the Florida Healthy Beaches Program.