Abstract

The reference conditions of Chlorophyll-a in lakes should be established in order to improve the water quality in these water bodies. A new method using segmental linear regression was developed to estimate the reference condition of Chlorophyll-a. The method can overcome the shortcomings of other methods, such as the quantile-selection-based method, which contains a certain level of subjectivity. The new method was used to estimate the annual reference condition of Chlorophyll-a in Taihu Lake. The log–log segmental regression results indicate that the distribution of specific volume (reciprocal of concentration) of Chlorophyll-a in Taihu Lake had a power-law tail. Both segmental regression and bootstrap approaches show two credible change points in the power-law relationship. This study's analysis shows that the value of 4.4 mg·m−3 was an appropriate annual reference value of Chlorophyll-a in Taihu Lake. Thus, the method would be useful in determining the numerical reference conditions of Chlorophyll-a for other shallow lakes.

INTRODUCTION

Water quality in lakes remains a serious problem in China, and establishing reference conditions for lakes is important for pollution control. Reference conditions are defined as the conditions of a lake that are least impacted by human activity, or that represent the best attainable water quality (US EPA 2000a). The numerical reference conditions of nutrients can be quantified as background values of the nutrient-related variables, which generally include Total Phosphorus (TP), Total Nitrogen (TN), Chlorophyll-a and Secchi Depth (SD) in lakes (US EPA 2000a, 2000b, 2010). The ultimate aim of improving lake water quality is to restore lakes to their original state. Thus, the reference conditions of lake nutrients play a basic role in water pollution control.

It is difficult to find lakes that fit the requirements that allow the determination of reference conditions, particularly in developed industrial and agricultural areas. Several methods have been proposed to calculate the numerical reference conditions of nutrients. The United States Environmental Protection Agency (US EPA) has recommended several types of methods, such as paleolimnological reconstruction and statistical approaches (regression models, quantile selection methods and change-point detection methods) (Huo et al. 2009, 2014b; Huo & Xi 2014) to obtain the value of lake water quality reference conditions.

Statistical approaches are the most popular methods for estimating water quality reference conditions. Regression models, such as models based on the morphoedaphic index (MEI) and stressor-response models, have been used by the US EPA (2000a, 2010) and the REBECCA project (Relationships Between Ecological and Chemical Status of Surface Waters) in Europe (Solheim 2005). Cardoso et al. (2007) studied the TP reference conditions of lakes in Europe using MEI regression models. Gu et al. (2013) improved the MEI model for shallow lakes and forecast the TP reference concentration in the Taihu Lake basin. Salerno et al. (2014) tested the uncertainty and accuracy of TP reference conditions obtained using the MEI, export coefficient and diatom-pigment-inferred TP model methods. Although MEI methods have been widely applied, the approach can only be used to estimate TP reference conditions. Thus, Huo et al. (2013a) determined the reference conditions for SD and Chlorophyll-a in the eastern plain ecoregion lakes in China using the multiple regression method. Huo et al. (2013b, 2014a, 2014b, 2015a) and Zhang et al. (2014) applied regression methods of stressor-response models to several ecoregions in China. The regression methods based on stressor-response models can be used for nutrients and SD but not for Chlorophyll-a: these methods use Chlorophyll-a as the response variable, so the reference conditions of Chlorophyll-a must already be known before the methods are applied.

Statistical approaches based on quantile selection are another common way of estimating the reference conditions of lakes. The US EPA recommended the lake population distribution approach to calculate lake reference conditions. Dodds et al. (2006) applied the trisection approach to determine the reference conditions of lakes and reservoirs in Kansas. Chen et al. (2010) applied the frequency analysis and trisection approach to determine the nutrient reference values of Chao Lake. Several descriptive statistical approaches were compared to derive the nutrient reference conditions in the ecoregion lakes and reservoirs of Yungui Plateau (Chen et al. 2012). Zheng et al. (2009) applied a frequency analysis to estimate the nutrient reference conditions of Taihu Lake. Hua & Wang (2013) studied the nutrient reference conditions of Taihu Lake based on extreme value statistics. Wang et al. (2014) also studied the reference conditions of Taihu Lake using nonparametric methods. These methods are easy to apply, but their results depend on researchers' methodological decisions: a quantile must be selected from observation data or model results, and there are no strict objective criteria to guide researchers in choosing an appropriate quantile.

To overcome the shortcomings of the aforementioned methods, the US EPA used change-point detection methods to derive numerical nutrient criteria objectively (US EPA 2010). Haggard et al. (2013) derived nutrient thresholds across the Red River Basin using classification and regression tree (CART) analysis. Huo & Xi (2014) and Huo et al. (2015b) developed new methods, namely, nonparametric change point analysis (nCPA) and Bayesian hierarchical modelling (BHM), to estimate lake nutrient thresholds in China. All the aforementioned studies were based on nonparametric statistics with change-point detection. The mathematical theories behind these methods, however, are complex and unfamiliar to environmental engineers, which has constrained their use and popularization.

Based on the change-point detection methods proposed by the US EPA, in this study, a new method based on the segmental linear regression approach, which is simpler than CART and nCPA, was used to derive the Chlorophyll-a annual reference values of lakes. The method was applied to the distribution of Chlorophyll-a in Taihu Lake, and its power-law relationship with the change points was determined. Moreover, the reference values of Chlorophyll-a in Taihu Lake were discussed and determined based on the change points.

MATERIALS AND METHODS

Data source

Taihu Lake is in the east of China, which is the country's most economically developed region. Taihu Lake is the third largest freshwater lake in China and is located next to the Shanghai metropolis. The area of the lake is approximately 2,445 km2, and its average depth is 1.9 m. The population in the Taihu Lake area, whose water supplies rely on Taihu Lake, is approximately 100,000,000 people. The water quality of this lake has seriously deteriorated in recent decades. Algae blooms often occur in the summer in most areas of the lake (Qin 2008). The nutrient levels in Taihu Lake have attracted the attention of many scientists because of the lake's key role in water supply, agriculture and fisheries in this region.

This study uses Chlorophyll-a observations from eight sites on Taihu Lake collected by the Chinese Ecosystem Research Network (CERN) and the Taihu Laboratory for Lake Ecosystem Research (Qin & Hu 2010). Figure 1 shows Taihu Lake and the site locations.

Figure 1

Locations of the sites in Taihu Lake.

These eight sites represent typical aquatic environments in Taihu Lake. There is no site 2 for historical reasons. The observation period was from January 1995 to December 2006. Observations were conducted once a month, and the experimental method has been described previously (Qin & Hu 2010). There were seven missing data points and 1,145 available data points.

Methods

The method is based on linear regression applied after some pretreatment of the data. The monthly Chlorophyll-a observations were sorted by concentration, and the number of observations with values less than x was denoted F(x). Values above 20 mg·m−3 were ignored, because the method focuses on the left tail and local features of the Chlorophyll-a distribution, so that large values do not affect the reference condition calculations. The probability distribution function of the Chlorophyll-a observations is calculated as follows:
$$P(x) = \frac{F(x)}{L} \qquad (1)$$
where L is the total number of observations, and P is the probability. To make the calculation consistent with the usual power-law formulation, let h be the reciprocal of x. Then, Equation (1) becomes:
$$P(h) = \frac{F(1/h)}{L} \qquad (2)$$
where h can be called the ‘specific volume of Chlorophyll-a’ according to its physical definition. Mathematically, the transformation maps the left tail of the Chlorophyll-a concentration distribution onto the right tail of the specific-volume distribution, which is consistent with the standard power-law calculation.
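The pretreatment described above can be sketched in Python, assuming the monthly observations are available as a plain array of concentrations in mg·m−3. The function name `specific_volume_distribution`, the parameter `upper`, and the choice to count every non-missing observation in L (including those above the cut-off) are assumptions made for illustration; the text does not state how L treats the ignored values. That choice only shifts log P by a constant, so it would not change a fitted log–log slope.

```python
import numpy as np

def specific_volume_distribution(conc, upper=20.0):
    """Return (h, P): specific volume h = 1/x and the fraction of
    observations with concentration strictly below x."""
    conc = np.asarray(conc, dtype=float)
    conc = conc[~np.isnan(conc)]            # drop missing observations
    total = conc.size                       # L in the text (assumed: all non-missing values)
    sorted_conc = np.sort(conc)
    x = np.unique(conc[conc <= upper])      # thresholds; values above the cut-off are ignored
    # F(x): number of observations strictly less than x
    F = np.searchsorted(sorted_conc, x, side="left")
    keep = F > 0                            # P = 0 cannot be log-transformed
    h = 1.0 / x[keep]
    P = F[keep] / total
    return h, P

# Hypothetical toy data, not the Taihu Lake observations
h, P = specific_volume_distribution([1.0, 2.0, 2.0, 4.0, 25.0, np.nan])
```

The smallest observed concentration is dropped because no observation lies below it, which would give P = 0 and an undefined logarithm.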
The power-law relationship, i.e., the Pareto law, states that P(h) must satisfy the following relation:
$$P(h) = C\,h^{-\gamma} \qquad (3)$$
where C and γ are constants; h should have a lower bound hmin, as required by the definition of the probability density function, and hmin is often estimated using the bootstrap method (Efron & Tibshirani 1993). Taking the logarithm of both sides, Equation (3) becomes Equation (4):
$$\log P(h) = \log C - \gamma \log h \qquad (4)$$
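As an illustration of the log-transformed fit, the following Python sketch estimates the two power-law constants by ordinary least squares. It assumes the power law is written as P(h) = C·h^(−γ); the symbol C, the function name `fit_power_law` and the use of natural logarithms are assumptions of this sketch rather than details confirmed by the text.

```python
import numpy as np

def fit_power_law(h, P):
    """OLS fit of ln P = ln C - gamma * ln h. Returns (C, gamma, r_squared)."""
    log_h = np.log(np.asarray(h, dtype=float))
    log_P = np.log(np.asarray(P, dtype=float))
    slope, intercept = np.polyfit(log_h, log_P, 1)   # straight line in log-log space
    fitted = intercept + slope * log_h
    ss_res = np.sum((log_P - fitted) ** 2)
    ss_tot = np.sum((log_P - log_P.mean()) ** 2)
    return np.exp(intercept), -slope, 1.0 - ss_res / ss_tot

# Synthetic check with known constants (illustrative values only)
h_demo = np.linspace(0.05, 0.7, 50)
C_hat, gamma_hat, r2 = fit_power_law(h_demo, 0.3 * h_demo ** -1.5)
```

The base of the logarithm affects only the intercept, not the estimated exponent.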
The parameters can be estimated using the ordinary least squares (OLS) method. If there are change points, Equation (4) should be extended to a segmental linear regression. For example, with one change point the segmental linear regression model (Ulm 1991; Muggeo 2003; Betts et al. 2007) is as follows:
$$\log P(h) = \beta_0 + \beta_1 \log h + \Delta\beta\,(\log h - \psi)\,I(\log h > \psi) \qquad (5)$$
where ψ is the value of the change point. The function I is shown in Equation (6):
$$I(\log h > \psi) = \begin{cases} 1, & \log h > \psi \\ 0, & \log h \le \psi \end{cases} \qquad (6)$$
With n−1 change points, Equation (5) separates into n linear equations (Table 1), so the segmental linear regression with two or more change points is easily obtained in this manner. The parameters in Equation (5) can be estimated using extended linear regression, which has been detailed by Muggeo (2003).
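A one-change-point segmental regression can be sketched in Python as follows. This is not the iterative estimation of Muggeo (2003); it substitutes a plain grid search over candidate change points with an OLS fit at each one, which is simpler to read but slower and limited to the grid resolution. The symbol `psi` for the change point, the function name `fit_one_change_point`, and the 5%–95% search window are assumptions of this sketch.

```python
import numpy as np

def fit_one_change_point(x, y, n_grid=200):
    """Fit y = b0 + b1*x + db*(x - psi)*I(x > psi) by grid search over psi."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep candidates inside the data so both segments contain points
    lo, hi = np.quantile(x, [0.05, 0.95])
    best = None
    for psi in np.linspace(lo, hi, n_grid):
        hinge = np.where(x > psi, x - psi, 0.0)          # (x - psi) * I(x > psi)
        X = np.column_stack([np.ones_like(x), x, hinge])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)     # OLS for this candidate
        sse = float(np.sum((y - X @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, psi, coef)
    sse, psi, (b0, b1, db) = best
    return {"psi": psi, "beta0": b0, "beta1": b1, "delta_beta": db, "sse": sse}

# Synthetic, noise-free example; the coefficients are illustrative inputs only
x_demo = np.linspace(-3.0, -0.3, 271)
y_demo = -2.41 - 0.65 * x_demo - 0.61 * np.clip(x_demo + 1.47, 0.0, None)
fit = fit_one_change_point(x_demo, y_demo)
```

Here `x` stands for log h and `y` for log P(h). A second change point could be handled by adding a second hinge column and searching over pairs of candidates.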
Table 1

Estimation results of Equation (5)

                 Value    Std error   Δβ       β0       β1
Change point 1   −1.47    0.013       −0.61    –        –
Change point 2   −2.35    0.017       −0.66    –        –
–                –        –           –        −2.41    −0.65

RESULTS

Figure 2 shows the log[P(h)] versus log(h) plot for the Chlorophyll-a specific volume (reciprocal of concentration). For h larger than 0.714 m3·mg−1, i.e., for Chlorophyll-a concentrations less than 1.4 mg·m−3, the data did not follow a regular pattern. A change point between log[P(h)] and log(h) is clearly visible at 0.714 m3·mg−1. Figure 2 also shows a power-law relationship between the two variables where Chlorophyll-a concentrations were larger than 1.4 mg·m−3, and indicates that the relationship between log[P(h)] and log(h) may have other change points.

Figure 2

Log–log plot of Chlorophyll-a specific volume (reciprocal of concentration) and P(h).

Repeated trials showed that a model with two change points met the requirements of statistical testing. Table 1 shows the main coefficient estimates of the segmental linear regression that corresponds to Equation (5).

The first change point corresponds to a concentration of approximately 4.35 mg·m−3, and the second to approximately 10.48 mg·m−3. All standard errors of the coefficients were small compared with the estimated values, and the significance level was p < 0.001. These data confirm the power-law relationships of Chlorophyll-a over the range from 1.4 to 20 mg·m−3 in Taihu Lake and show that the change points in the relationships are trustworthy. In addition, although the regression coefficients are quite distinct from the results shown above, the change points estimated for the sites in Meiliang Bay are approximately 4.71 mg·m−3 and 10.75 mg·m−3, which are not significantly different from those calculated for all sites. This finding implies that spatial heterogeneity is not a necessary consideration when determining the change points using this method.

The two change points divide Equation (5) into three equations of the form of Equation (4). The R2 values of these three equations were all larger than 0.99, suggesting that the results were credible. The values of γ are shown in Figure 3. The value of γ differed between successive stages by approximately 100% and 50%, which provides sufficient evidence for the presence of change points.
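The figures quoted above can be cross-checked against Table 1 with a few lines of arithmetic. The sketch below assumes that the change-point values in Table 1 are natural logarithms of h, that β1 is the slope of the segment with the smallest h, and that each Δβ is added to the slope once log h passes its change point (the parameterization used by Muggeo 2003). These are reading assumptions, not statements from the text; under them the tabulated values reproduce the reported concentrations and the roughly 100% and 50% differences.

```python
import math

# Table 1 values (change points in ln h, assumed natural log)
psi_1, psi_2 = -1.47, -2.35
beta1 = -0.65
delta_beta_at_psi_2 = -0.66
delta_beta_at_psi_1 = -0.61

# h = 1/x, so the concentration at a change point is exp(-psi)
conc_1 = math.exp(-psi_1)   # about 4.35 mg per cubic metre
conc_2 = math.exp(-psi_2)   # about 10.49 mg per cubic metre

# Slopes of the three stages, moving from small h to large h
slope_low = beta1
slope_mid = slope_low + delta_beta_at_psi_2
slope_high = slope_mid + delta_beta_at_psi_1
gammas = [-slope_low, -slope_mid, -slope_high]          # 0.65, 1.31, 1.92

# Relative change of gamma between successive stages
rel_changes = [(gammas[i + 1] - gammas[i]) / gammas[i] for i in range(2)]
```

Under these assumptions the relative changes come out near 1.02 and 0.47, in line with the approximate 100% and 50% stated in the text.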

Figure 3

Regression results of the log–log plot in three stages.

In the segmental regression method, the lower bound hmin needed to establish the power-law relationship was determined using the bootstrap method (Efron & Tibshirani 1993). The outcome of 5,000 bootstrap replicates is shown in Figure 4. The estimates of hmin fell into three separate areas, which implies that there may be two values of hmin and that these two values can be considered change points of the power law. The bootstrap and regression analyses thus confirm one another. The value of 4.4 mg·m−3 is close to the highest-frequency value obtained using the bootstrap in area II (c = 4.6 mg·m−3). In the third area, the highest-frequency bootstrap estimate of hmin corresponds to a Chlorophyll-a concentration of approximately 10.4 mg·m−3, which is nearly identical to the segmental regression result. Thus, the change point assessment using the regression method is credible. There is a credible power-law relationship with two change points, which had not been previously found in the monthly Chlorophyll-a concentration observations in Taihu Lake.
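The text does not describe how hmin is chosen within each bootstrap replicate, so the following Python sketch fills that step with a common choice that is an assumption here: for each resample, pick the candidate lower bound whose fitted Pareto tail has the smallest Kolmogorov–Smirnov distance to the data. The function names, the candidate grid and the minimum tail size of 10 are all hypothetical.

```python
import numpy as np

def ks_best_hmin(h, candidates, min_tail=10):
    """Return the candidate lower bound with the smallest KS distance
    between the empirical tail and a fitted Pareto tail."""
    h = np.asarray(h, dtype=float)
    best_hmin, best_d = None, np.inf
    for hmin in candidates:
        tail = np.sort(h[h >= hmin])
        n = tail.size
        if n < min_tail:
            continue
        log_sum = np.sum(np.log(tail / hmin))
        if log_sum <= 0:
            continue
        gamma = n / log_sum                           # MLE for P(H > h) = (h / hmin) ** -gamma
        model_cdf = 1.0 - (tail / hmin) ** (-gamma)
        upper = np.arange(1, n + 1) / n               # empirical CDF just after each point
        lower = np.arange(0, n) / n                   # empirical CDF just before each point
        d = max(np.max(upper - model_cdf), np.max(model_cdf - lower))
        if d < best_d:
            best_hmin, best_d = hmin, d
    return best_hmin

def bootstrap_hmin(h, candidates, n_boot=200, seed=0):
    """Resample h with replacement and collect the selected hmin per replicate."""
    rng = np.random.default_rng(seed)
    h = np.asarray(h, dtype=float)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choice(h, size=h.size, replace=True)
        est = ks_best_hmin(sample, candidates)
        if est is not None:
            estimates.append(est)
    return np.array(estimates)
```

A histogram of the returned estimates plays the role of Figure 4 in this sketch: clusters of estimates around distinct values would suggest more than one plausible lower bound.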

Figure 4

Bootstrap results of hmin.

Discussion on reference conditions

Table 2 summarizes the reference conditions of Chlorophyll-a and nutrients in Taihu Lake obtained using quantile selection methods.

Table 2

Reference conditions of Chlorophyll-a and nutrients based on quantile selection methods

Index                    Results
TN (mg/L)                0.60a, 0.71b, 0.66c, 0.78d
TP (mg/L)                0.030a, 0.025b, 0.023c, 0.030d
Chlorophyll-a (mg/m3)    4a, 1.81b, 1.27c, 2.63d

Quantiles ranging from 5% to 25% have all been used to estimate the reference conditions of Taihu Lake in quantile-selection-based methods, which led to notably different results, particularly for the reference condition of Chlorophyll-a. The difference between the maximum and minimum reference conditions for nutrients was approximately 30%, whereas the difference for Chlorophyll-a was approximately 230% across the different quantiles. These differences show that the results depend strongly on researchers' methodological decisions, especially for the reference conditions of Chlorophyll-a. Compared with the quantile selection method, the change-point detection method presented above determines the Chlorophyll-a reference values objectively.

The segmental regression and bootstrap results indicate that power-law relations between h and P(h) exist at Chlorophyll-a concentrations of 1.4–20 mg·m−3. Many studies, including those by the US EPA, have demonstrated the importance of change points as thresholds in lakes (US EPA 2010; Haggard et al. 2013; Huo & Xi 2014; Huo et al. 2015b). The change points and the beginning point of the segmental linear regression could therefore also be threshold points in the environmental system of Taihu Lake, and the two change points together with the beginning point are candidate reference values of Chlorophyll-a in Taihu Lake.

The beginning-point value of the power-law relationship was 1.4 mg·m−3, which is also the reference condition estimated for Taihu Lake using generalized extreme value (GEV) distribution models (Hua & Wang 2013). However, a Chlorophyll-a concentration of 1.4 mg·m−3 is notably low for Taihu Lake: less than 1% of the observations from 1995 to 2006 show values below 1.4 mg·m−3. It is therefore an unsuitable value for the reference condition of Chlorophyll-a, because water quality targets based on a baseline of 1.4 mg·m−3 would be very difficult to achieve. At the same time, the value of 10.48 mg·m−3 exceeds the levels that produce eutrophication in lakes and is certainly too large to be a reference condition for Taihu Lake.

Huo & Xi (2014) used nonparametric change-point analysis and a Bayesian change-point approach to obtain nutrient and Chlorophyll-a criteria for lakes in eastern China. With the nonparametric method, the Chlorophyll-a annual criteria were 5.2 mg·m−3 (corresponding to the TP–Chlorophyll-a relationship) and 3.5 mg·m−3 (corresponding to the TN–Chlorophyll-a relationship); their mean, 4.35 mg·m−3, is near the first change point. The results of the Bayesian approach were 4.3 mg·m−3 and 3.4 mg·m−3, the larger of which is also close to the first change point. This agreement provides additional evidence for using the change points as the numerical reference condition of Chlorophyll-a in Taihu Lake. Taken together, the results of this study indicate that the first change point from the regression is the appropriate reference condition for Taihu Lake. Thus, the most suitable reference value of Chlorophyll-a in Taihu Lake is 4.4 mg·m−3.

CONCLUSIONS

In this paper, the annual reference value of Chlorophyll-a in Taihu Lake was calculated using a change-point detection method based on segmental linear regression. The segmental linear regression and bootstrap approaches show a credible power-law relationship with two change points in the monthly Chlorophyll-a concentration observations in Taihu Lake, a relationship which has not been previously found. The method developed in this study can objectively estimate the numerical reference condition and overcome the drawbacks of quantile-selection-based methods. The segmental linear regression results and related discussion imply that the reference condition of Chlorophyll-a in Taihu Lake should be approximately 4.4 mg·m−3. Thus, the method has been shown to be effective and can be easily generalized to determine the numerical reference conditions of Chlorophyll-a in other shallow lakes.

ACKNOWLEDGEMENTS

This work was financially supported by the National Natural Science Foundation of China (Grant Nos. 51379060, 51739002), the Major Science and Technology Program for Water Pollution Control and Treatment (Grant No. 2012ZX07103-005), the Qing Lan Project and PAPD Project, Jiangsu postgraduate scientific research and innovation projects (CXZZ13_0271, KYLX15_0474) and the Fundamental Research Funds for the Central Universities (2016B21214, 2015B24714, 2015B36314).

REFERENCES

Betts, M. G., Forbes, G. J. & Diamond, A. W. 2007 Thresholds in songbird occurrence in relation to landscape structure. Conservation Biology 21 (4), 1046–1058.
Cardoso, A. C., Solimini, A., Premazzi, G., Carvalho, L., Lyche, A. & Rekolainen, S. 2007 Phosphorus reference concentrations in European lakes. Hydrobiologia 584 (1), 3–12.
Chen, Q., Huo, S. L., Xi, B. D., Zan, F. Y. & Li, X. 2010 Study on establishing lake reference condition for nutrient. Ecology and Environmental Sciences 19 (3), 544–549.
Chen, Q., Huo, S. L., Xi, B. D., Zan, F. Y. & He, Z. 2012 Study on total phosphorus and chlorophyll-a reference conditions in Yungui Plateau Ecoregion lakes and reservoirs. Journal of Environmental Engineering Technology 2 (3), 184–192.
Efron, B. & Tibshirani, R. J. 1993 An Introduction to the Bootstrap. Chapman & Hall/CRC, London, UK.
Gu, L., Li, Q. L., Hua, Z. L. & Hong, B. 2013 The improved MEI model for forecasting TP reference concentration in Lake Taihu basin. Journal of Lake Sciences 25 (3), 347–351.
Haggard, B. E., Scott, J. T. & Longing, S. D. 2013 Sestonic chlorophyll-a shows hierarchical structure and thresholds with nutrients across the Red River Basin, USA. Journal of Environmental Quality 42 (2), 437–445.
Hua, Z. L. & Wang, L. 2013 A new method for estimation the lake quality reference condition. Environmental Science 34 (6), 2134–2138.
Hua, Z. L., Wang, L., Gu, L. & Chu, K. J. 2014 Estimation of the lake quality reference condition based on the threshold extreme theory. China Environmental Science 34 (12), 3215–3222.
Huo, S. L. & Xi, B. D. 2014 Determining Nutrient Criteria by Stressor-Response Models and Case Study. Science Press, Beijing, China.
Huo, S. L., Chen, Q., Xi, B. D., Guo, X., Chen, Y. & Liu, H. 2009 A literature review for lake nutrient criteria development. Ecology and Environmental Sciences 18 (2), 743–748.
Huo, S. L., Xi, B. D., Su, J., Zan, F. Y., Chen, Q., Ji, D. & Ma, C. 2013a Determining reference conditions for TN, TP, SD and Chl-a in eastern plain ecoregion lakes, China. Journal of Environmental Sciences 25 (5), 1001–1006.
Huo, S. L., Xi, B. D., Ma, C. & Liu, H. 2013b Stressor–response models: a practical application for the development of lake nutrient criteria in China. Environmental Science & Technology 47 (21), 11922–11923.
Huo, S. L., Ma, C., Xi, B. D., Gao, R., Deng, X., Jiang, T., He, Z., Su, J., Wu, F. & Liu, H. 2014a Lake ecoregions and nutrient criteria development in China. Ecological Indicators 46, 1–10.
Huo, S. L., Ma, C., Xi, B. D., Tong, Z., He, Z., Su, J. & Wu, F. C. 2014b Determining ecoregional numeric nutrient criteria by stressor-response models in Yungui ecoregion lakes, China. Environmental Science and Pollution Research 21 (14), 8831–8846.
Huo, S. L., Ma, C., Xi, B. D., He, Z., Su, J. & Wu, F. C. 2015b Nonparametric approaches for estimating regional lake nutrient thresholds. Ecological Indicators 58, 225–234.
Muggeo, V. M. R. 2003 Estimating regression models with unknown break-points. Statistics in Medicine 22 (19), 3055–3071.
Qin, B. 2008 Lake Taihu, China: Dynamics and Environmental Change. Springer, London, UK.
Qin, B. & Hu, C. H. 2010 Chinese Ecosystem Positioning Observation and Research Data Sets: Taihu Lake. China Agriculture Press, Beijing, China.
Salerno, F., Viviano, G., Carraro, E., Manfredi, E. C., Lami, A., Musazzi, S., Marchetto, A., Guyennon, N., Tartari, G. & Copetti, D. 2014 Total phosphorus reference condition for subalpine lakes: a comparison among traditional methods and a new process-based watershed approach. Journal of Environmental Management 145, 94–105.
Solheim, A. 2005 Reference conditions of European lakes. Indicators and methods for the Water Framework Directive Assessment of Reference Conditions. ReBeCCA project, European Union.
US EPA 2000a Nutrient Criteria Technical Guidance Manual: Lakes and Reservoirs. EPA 822-B00-001. United States Environmental Protection Agency, Office of Water, Washington, DC, USA.
US EPA 2000b Ambient Water Quality Criteria Recommendation: Lakes and Reservoirs in Nutrient Ecoregion II. EPA 822-B-00-007. United States Environmental Protection Agency, Office of Water, Washington, DC, USA.
US EPA 2010 Using Stressor-Response Relationships to Derive Numeric Nutrient Criteria. EPA 820-S-10-001. United States Environmental Protection Agency, Office of Water, Washington, DC, USA.
Wang, L., Hua, Z. L., Gu, L. & Chu, K. J. 2014 Estimating the reference nutrient levels of the shallow lakes in east China using a combination with several non-parametric methods. Advances in Water Science 25 (5), 724–730.
Zhang, Y., Huo, S. L., Ma, C., Xi, B. D., Li, X. & Liu, H. 2014 Using stressor–response models to derive numeric nutrient criteria for lakes in the eastern plain ecoregion, China. CLEAN: Soil, Air, Water 42 (11), 1509–1517.