Prewhitened causality analysis for the chlorophyll-a concentration in the Yeongsan River system

Blooming of algae has been a primary issue of concern for heavily polluted aquatic ecosystems. The chlorophyll-a (Chl-a) concentration depends on various hydrological, biochemical and anthropogenic components, which makes prediction of algal blooms complicated. A river regulation project in Yeongsan River, South Korea, involving the construction of a weir, had substantially altered the flow regime. A prewhitened time series analysis is a useful method for delineation of a causal relationship between two environmental variables. This study explores the impact of river regulation on algal blooming using both the prewhitened cross-correlation method and principal factor analysis. Both individual and comprehensive causality structures were configured for the variation in Chl-a concentration. A prewhitened cross-correlation analysis indicates that the water quality response patterns of the river system were changed to those of a reservoir after the river regulation project. A principal factor analysis of correlations indicates that the weir construction had a stronger impact on algal concentration than both the hydro-meteorological factor and difference in sampling location. Variation in stochastic structures from nutrients and water quality factors to algal bloom was substantially reduced by the construction of a weir, which can be explained by the relatively uniform flow pattern throughout the river regulation practice. 
doi: 10.2166/wcc.2018.259 ://iwaponline.com/wqrj/article-pdf/54/2/161/682865/wqrjc0540161.pdf
Eunhyung Lee, Sanghyun Kim (corresponding author)
Department of Environmental Engineering, Pusan National University, Busandaehak-ro 63 beon-gil, Geumjung-gu, Busan 46241, Korea (ROK)
E-mail: kimsangh@pusan.ac.kr
Eunhye Na
Water Environment Research Department, Water Quality Assessment Research Division, Institute of Environmental Research, Ministry of Environment, Kyungseo-dong Seo-gu Incheon, Korea (ROK)
Kyunghyun Kim
Youngsan River Environmental Research Center, National Institute of Environmental Research, Ministry of Environment, Cheomdangwuagiro 208-5, Gwangju, Korea (ROK)


INTRODUCTION
It is well known that algal blooms not only result in the decline of specific species, such as invertebrates, but also lead to deterioration of fish habitats due to increasing turbidity. Significant algal blooms often result in unpleasant taste and odor problems in drinking water as well as issues with filter clogging in water treatment plants. Furthermore, some species of cyanobacteria produce toxins (cyanotoxins) that cause significant ecological and human health concerns (WHO ).

Algal bloom problems in major river systems in South Korea have become critical social issues since the Four Rivers Restoration Project was completed in 2011 (Park ). The effect of the large weirs constructed along the rivers during the project on the frequency and intensity of algal bloom is still controversial. Therefore, consideration of the existence of the weirs is particularly important when investigating the causality of algal blooms. Also, prediction of algal blooms is important because it is advantageous in the implementation of appropriate management practices, such as low flow augmentation (the release of water stored in a reservoir) for river systems.
The variation of algal blooms has been modeled using process-based approaches (Thomann & Mueller ). Alternatively, the prediction of the behavior of algae has been explored using several statistical methods or techniques. When a relationship is sought between an environmental variable and the chlorophyll-a (Chl-a) concentration, a common index of the amount of algal biomass, the conventional cross-correlation function often shows limitations due to the autocorrelation structure of each time series (Kim et al. ). This issue can be largely addressed through the introduction of a prewhitening method (Kim ). Case studies for the Elbe River in Germany and the Murray River in Australia also indicated that the actual correlation between two water quality variables can be delineated by filtering out the stochastic structure common to the input and output time series (Maier & Dandy ; Lehmann & Rode ).
The relationships among many variables can be elucidated by the introduction of factor analysis with minimum loss of information. The biplot of factor analysis efficiently describes the correlation among different variables and was found to be useful in evaluating water quality at monitoring points (Mohamed et al. ) and in identifying contaminant sources associated with heavy metals (Basamba et al. ).
To understand the possible variation in causality due to environmental factors that influence algal blooms in terms of Chl-a concentration before and after the river regulation project, various time series of meteorological, hydrological and water quality parameters and Chl-a concentration were collected upstream and downstream of the weir, which was constructed in 2012 in the Yeongsan River, South Korea. This study addresses two issues based on the analysis of measurements of hydro-meteorological, nutrient, and water quality parameters during both pre and post periods of a river regulation project: (i) how a prewhitened relationship (Granger causality) between various environmental parameters and Chl-a concentration can be obtained as a stochastic structure associated with common drivers by eliminating seasonality from each time series; (ii) how multiple relationships between hydro-meteorological, nutrient, and water quality factors and algal bloom can be expressed through factor analysis. The first and second questions address linear and comprehensive causality structures, respectively, for variation of Chl-a concentration with and without the impact of river regulation.
Finally, we analyzed how the construction of the weir changed the algal concentration response pattern in the context of various environmental factors.

Study area and data acquisition
Considering data availability during pre and post periods of the river regulation project, two locations in the Yeongsan River were selected for this study (see Figure 1). The Seungchon Weir was constructed between 2010 and 2011 under 'the Four Rivers Restoration Project of Korea', which aimed to address water-related issues such as preventing floods and storing more water for droughts (Woo ).
Therefore, we define the period from 2007 to 2009 as the pre period of the project, 2010 to 2011 as the construction period, and 2012 to 2014 as the post period. For those periods, water quality samples were collected and analyzed at weekly intervals at two locations, Kwangsan and Najoo, which are the upstream and downstream points of the Seungchon Weir, respectively (see Figure 1). The Ministry of Environment was responsible for obtaining data for 15 water quality variables: biochemical oxygen demand (BOD), chemical oxygen demand (COD), dissolved oxygen (DO), pH, suspended solids (SS), electrical conductivity (EC), water temperature (WT), Chl-a concentration, total nitrogen (TN), ammonium nitrogen (NH3-N), nitrate nitrogen (NO3-N), dissolved total nitrogen (DTN), total phosphorus (TP), dissolved total phosphorus (DTP), and phosphate phosphorus (PO4-P). Chl-a was extracted in acetone and its concentration was measured using spectrometry methods.
Meteorological data were obtained from datasets generated by the Korea Meteorological Administration. The dataset from the Kwangjoo Regional Meteorological Office (126°53′00″, 35°10′00″), which is located close to the Seungchon Weir, was selected as the representative meteorological data. In order to match the temporal resolution of the weekly water quality data for time series analysis, the meteorological data were converted to weekly interval datasets. The meteorological data used in this study were atmospheric temperature (AT), rainfall (RAIN), wind speed (WS), humidity (HU), cloud amount (WIND), solar radiation (SoR), and duration of sunshine (SuD). The temporal resolution of other hydrological data such as rainfall and flowrate was also converted to a weekly interval.
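The aggregation rule used for the weekly conversion is not stated here. As a minimal sketch, assuming consecutive 7-day blocks, averaging for state variables such as temperature and summation for rainfall (both choices are assumptions made for illustration, as are the function name `to_weekly` and the sample values), the conversion could look like this:

```python
def to_weekly(daily, how="mean"):
    """Aggregate a daily sequence into consecutive 7-day blocks.

    Trailing days that do not fill a whole week are dropped.
    `how` is "mean" (e.g. temperature) or "sum" (e.g. rainfall).
    """
    n_full = len(daily) - len(daily) % 7
    weeks = [daily[i:i + 7] for i in range(0, n_full, 7)]
    if how == "mean":
        return [sum(w) / len(w) for w in weeks]
    if how == "sum":
        return [sum(w) for w in weeks]
    raise ValueError("how must be 'mean' or 'sum'")


# Hypothetical daily records (illustrative values, not study data).
daily_rain = [0, 0, 12, 3, 0, 0, 0, 5, 0, 0, 0, 20, 1, 0, 2, 0]
weekly_rain = to_weekly(daily_rain, how="sum")
```

Any alignment of the weekly blocks with the water quality sampling dates would have to be handled separately.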

Procedure for the time series analysis
If the statistics of the data show high skewness, a significant trend, or periodicity, then proper pretreatment is required before the primary analysis. A suitable transformation of the time series can improve the normality of the series and eliminate trends and seasonality. The Box-Cox transformation (Box & Cox ) has been the most widely used method to remove non-normality from a time series, and can be defined as:

$$ z^{(\gamma)}(t) = \frac{(z(t) + C)^{\gamma} - 1}{\gamma}, \quad \gamma \neq 0 \qquad (1) $$

$$ z^{(\gamma)}(t) = \ln(z(t) + C), \quad \gamma = 0 \qquad (2) $$

where z(t) is the time series, and C and γ are constants.
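As a minimal sketch of this pretreatment step, assuming the common shifted form of the transformation, ((z(t) + C)^γ − 1)/γ for γ ≠ 0 and ln(z(t) + C) for γ = 0, the effect on skewness can be checked as follows. The function names and the sample series are hypothetical and are not taken from the study.

```python
import math


def box_cox(z, gamma, c=0.0):
    """Shifted Box-Cox transform of a sequence z.

    gamma and c play the roles of the constants γ and C in the text;
    z(t) + c must be strictly positive for every t.
    """
    out = []
    for value in z:
        shifted = value + c
        if shifted <= 0:
            raise ValueError("z(t) + c must be positive")
        if gamma == 0:
            out.append(math.log(shifted))
        else:
            out.append((shifted ** gamma - 1.0) / gamma)
    return out


def skewness(values):
    """Sample skewness coefficient (moment form, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed series (illustrative values, not study data).
series = [3.1, 4.0, 5.2, 7.9, 12.5, 30.4, 81.0, 18.3, 9.6, 6.2]
transformed = box_cox(series, gamma=0.0)
```

Comparing `skewness(series)` with `skewness(transformed)` for a few candidate values of γ and C is one simple way to carry out a heuristic parameter search.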
The cross-correlation function, which measures the degree of the linear relationship between two series at lag k, can be computed as:

$$ \rho_{xy}(k) = \frac{C_{xy}(k)}{\sqrt{C_x(0)\, C_y(0)}} \qquad (3) $$

where $C_{xy}(k)$ is the cross-covariance between the two series, x and y, at lag k, and $C_x(0)$ and $C_y(0)$ are the variance estimates for the x and y series, respectively.
The variance of the cross-correlation function takes different forms depending on the stochastic structure of the two series, i.e., whether they are mutually cross-correlated, autocorrelated, or both white noise processes.
In this approach, the confidence intervals of the cross-correlation function are estimated based on the assumption that the time series contain no underlying stochastic structure, i.e., that both are white noise processes.
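A minimal sketch of the sample cross-correlation function and of a white-noise confidence limit is given below. It assumes 1/n covariance estimates and the usual large-sample approximation of ±1.96/√n for two independent white noise series; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def cross_correlation(x, y, max_lag):
    """Sample cross-correlation for lags k = -max_lag..max_lag.

    Lag k pairs x(t) with y(t + k), so a peak at positive k means
    that x leads y by k time steps.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx = x - x.mean()
    dy = y - y.mean()
    # sqrt(C_x(0) * C_y(0)) with 1/n variance estimates
    denom = np.sqrt((dx @ dx / n) * (dy @ dy / n))
    lags = np.arange(-max_lag, max_lag + 1)
    ccf = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k >= 0:
            c_k = dx[: n - k] @ dy[k:] / n
        else:
            c_k = dx[-k:] @ dy[: n + k] / n
        ccf[i] = c_k / denom
    return lags, ccf


def white_noise_limit(n, z=1.96):
    """Approximate 95% limit for the CCF of two independent white noise series."""
    return z / np.sqrt(n)


# Synthetic check: y is x delayed by two steps, so the peak is at k = +2.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.roll(x, 2)
lags, ccf = cross_correlation(x, y, max_lag=5)
peak_lag = int(lags[np.argmax(ccf)])
```

When the series are autocorrelated, this limit is no longer appropriate, which is the motivation for the prewhitening step.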
To remove the stochastic signal from the pretreated series, an appropriate univariate model can be introduced.
The autocorrelation and partial autocorrelation structures of the reference series identify a suitable model structure (Box & Jenkins ). The time series model for the prewhitening process can be expressed as:

$$ \phi_x(B)\, x(t) = \theta_x(B)\, u(t) \qquad (4) $$

where B is the backshift operator defined as $B^k x(t) = x(t - k)$, k is a positive lag, $\phi_x(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_n B^n$, and $\theta_x(B) = 1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_m B^m$. Once the structure of the model has been determined, the identical model is used to remove the stochastic structure of the other series as:

$$ \phi_x(B)\, y(t) = \theta_x(B)\, v(t) \qquad (5) $$

Therefore, the prewhitened series, u(t) and v(t), can be obtained as the residual estimates from Equations (4) and (5), respectively. The independence of the residuals can be checked by computing the autocorrelation functions (ACFs) of the residuals and comparing them with the confidence intervals.
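The prewhitened causality between the two series can be checked using the cross-correlation function of the model residuals as follows:

$$ \rho_{uv}(k) = \frac{C_{uv}(k)}{\sqrt{C_u(0)\, C_v(0)}} \qquad (6) $$

where $C_{uv}(k)$ is the cross-covariance estimate at lag k, which can be calculated as:

$$ C_{uv}(k) = \frac{1}{N} \sum_{t=1}^{N-k} \left( u(t) - \bar{u} \right) \left( v(t+k) - \bar{v} \right), \quad k \geq 0 \qquad (7) $$

with N the number of observations and $\bar{u}$ and $\bar{v}$ the sample means, and $C_u(0)$ and $C_v(0)$ are the variance estimates for the prewhitened series.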
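The prewhitening procedure can be illustrated with a simplified sketch. It assumes a pure autoregressive model fitted by least squares instead of a full ARMA model identified from the ACF and PACF, so it is a simplification of the procedure described here and not the model used in this study. All function names and the synthetic series are hypothetical.

```python
import numpy as np


def fit_ar(x, order):
    """Least-squares fit of an AR(order) model to the centered series x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Row for time t holds x(t-1), ..., x(t-order).
    design = np.column_stack(
        [x[order - j: len(x) - j] for j in range(1, order + 1)]
    )
    phi, *_ = np.linalg.lstsq(design, x[order:], rcond=None)
    return phi


def ar_filter(series, phi):
    """Residuals after applying the filter 1 - phi_1 B - ... - phi_p B^p."""
    s = np.asarray(series, dtype=float)
    s = s - s.mean()
    p = len(phi)
    resid = s[p:].copy()
    for j in range(1, p + 1):
        resid -= phi[j - 1] * s[p - j: len(s) - j]
    return resid


def ccf_at(u, v, k):
    """Cross-correlation between u(t) and v(t + k), 1/n covariance estimates."""
    n = len(u)
    du, dv = u - u.mean(), v - v.mean()
    if k >= 0:
        c_k = du[: n - k] @ dv[k:] / n
    else:
        c_k = du[-k:] @ dv[: n + k] / n
    return c_k / np.sqrt((du @ du / n) * (dv @ dv / n))


# Synthetic example: both series share an AR(1) structure, and the
# innovation of y depends on the innovation of x one step earlier.
rng = np.random.default_rng(1)
n = 500
e = rng.normal(size=n)
drive_y = 0.6 * np.roll(e, 1) + 0.5 * rng.normal(size=n)
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + e[t]
    y[t] = 0.8 * y[t - 1] + drive_y[t]

phi = fit_ar(x, order=1)   # model identified on the reference series x
u = ar_filter(x, phi)      # prewhitened x
v = ar_filter(y, phi)      # same filter applied to y
lag1 = ccf_at(u, v, 1)
lag0 = ccf_at(u, v, 0)
```

In this synthetic case the residual cross-correlation is concentrated at lag +1, whereas the cross-correlation of the raw series x and y is spread over many lags by their shared autocorrelation.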

PRINCIPAL FACTOR ANALYSIS
Principal factor analysis (PFA) is a multivariate analysis method to describe clustered relations among several variables. It is also referred to as a dimensional reduction method, which uses simplified factors to represent substantial amounts of information. Using CCFs obtained through prewhitened causality analysis, datasets were categorized into multiple time series depending on the location and sampling time. We characterized stochastic structures of the time series with the spatial distribution of factor analysis through biplots of factors such as meteorological, water quality, and nutrient components.
If X is a random vector of p variables, the mean vector, μ, of X can be defined. Since it is impossible to visually illustrate p dimensions, the dimension of the vector needs to be reduced. An application of PFA reduces the dimension of the vector to s, which is lower than p. Equation (9) represents PFA for this reduction as:

$$ X - \mu = L F + \varepsilon \qquad (9) $$

where L is the p × s matrix of factor loadings ($\lambda_{ij}$), F is the common factor vector, $(f_1, f_2, \dots, f_s)^T$, with zero mean and covariance $I_{s \times s}$, and ε is the specific factor vector, $(\varepsilon_1, \varepsilon_2, \dots, \varepsilon_p)^T$, with zero mean and variances $(\varphi_1, \varphi_2, \dots, \varphi_p)^T$.
The total variance in PFA can be separated into the common factor variance and the specific variance. The common factor variance is shared with other variables, whereas the specific variance is unique to an individual variable (Wrigley ).
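The covariance matrix (Σ) can be defined as the sum of the product of the factor loading matrix with its transpose and a specific covariance matrix (Ψ) through a factor decomposition as:

$$ \Sigma = L L^{T} + \Psi \qquad (10) $$

The variance of each variable $X_i$ can be expressed as the sum of its squared factor loadings and a specific variance as:

$$ \mathrm{Var}(X_i) = \lambda_{i1}^2 + \lambda_{i2}^2 + \dots + \lambda_{is}^2 + \varphi_{ii} = h_i^2 + \varphi_{ii} \qquad (11) $$

where $h_i^2$ is a common variance and $\varphi_{ii}$ is a specific variance which is independent of other variables.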
Singular value decomposition (SVD) was used to estimate L and ψ using a reduced sample covariance matrix.
A standardized matrix (Y) can be calculated as: where v k is the eigenvector obtained through the spectral decomposition of the sample covariance matrix.
The relationship between sample covariance matrix (S) and standardized matrix (Y) can be expressed as: where n is the total number of environmental variables.
The covariance matrix can be estimated through spectral decomposition as:

$$ S = \sum_{k=1}^{p} \lambda_k v_k v_k^{T} \qquad (14) $$

If n is known, Y, $\lambda_k$, and $v_k$ can be obtained from Equation (12). The matrix of factor loadings can be calculated as:

$$ \tilde{L}_{(s)} = \left[ \sqrt{\lambda_1}\, v_1, \sqrt{\lambda_2}\, v_2, \dots, \sqrt{\lambda_s}\, v_s \right] \qquad (15) $$

The total variance is used to estimate the specific variance matrix (Ψ) using the expression $S - \tilde{L}_{(s)} \tilde{L}_{(s)}^{T}$. Therefore, all variables can be expressed in terms of their loadings on factor 1 and factor 2, which can be plotted as the corresponding coordinates in a biplot. The relationship among different variables can be expressed through their spatial distributions (i.e., position and angle) in the biplot of factor analysis (Choi & Byun ).
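The loading computation can be sketched as follows, assuming the principal component method: loadings are the leading eigenvectors of the sample covariance matrix scaled by the square roots of their eigenvalues, and specific variances are the diagonal of the remainder. The function name and the random input matrix are hypothetical; how the prewhitened CCFs are arranged into rows and columns is not specified here and is left open.

```python
import numpy as np


def principal_factor_loadings(data, n_factors=2):
    """Factor loadings and specific variances by the principal component method.

    data: one row per observation, one column per variable.
    Returns (loadings of shape (p, n_factors), specific variances of shape (p,)).
    """
    data = np.asarray(data, dtype=float)
    s = np.cov(data, rowvar=False)               # sample covariance matrix S
    eigvals, eigvecs = np.linalg.eigh(s)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]
    lam = np.clip(eigvals[order], 0.0, None)     # guard against tiny negatives
    loadings = eigvecs[:, order] * np.sqrt(lam)  # columns sqrt(lambda_k) * v_k
    specific = np.diag(s - loadings @ loadings.T)
    return loadings, specific


# Hypothetical input: 30 observations of 5 variables (not study data).
rng = np.random.default_rng(2)
data = rng.normal(size=(30, 5))
loadings, specific = principal_factor_loadings(data, n_factors=2)
# Each row of `loadings` gives the biplot coordinates of one variable.
```

The sign of each eigenvector is arbitrary, so a factor axis may appear mirrored between runs or libraries; angles between variable vectors are unaffected by such a flip.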
In order to quantify the comprehensive relationships, the angles between vectors in the biplots can be used to evaluate the similarity between two vectors. A vector in a biplot can be expressed through its magnitude and direction, representing variance and relative location, respectively. Similarity between two vectors in a biplot can be mathematically defined in terms of the angle between them, which can be calculated as:

$$ \theta = \cos^{-1} \left( \frac{v_i \cdot v_j}{\| v_i \| \, \| v_j \|} \right) \qquad (16) $$

where θ is the angle between two vectors, and $\| v_i \|$ is the norm of vector $v_i$. The smaller the angle between two vectors, the stronger the relationship between the two factors.
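The angle between two biplot vectors can be computed from their coordinates with a few lines of code. This sketch assumes each vector is given by its (factor 1, factor 2) loadings; the function name and the example coordinates are hypothetical.

```python
import math


def biplot_angle(v_i, v_j):
    """Angle in degrees between two biplot vectors."""
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm_i = math.sqrt(sum(a * a for a in v_i))
    norm_j = math.sqrt(sum(b * b for b in v_j))
    cos_theta = dot / (norm_i * norm_j)
    # Rounding can push the cosine slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Hypothetical loadings on (factor 1, factor 2) for two variables.
angle = biplot_angle((0.8, 0.3), (0.6, 0.5))
```

Because the arccosine returns values between 0° and 180°, the result does not depend on the order of the two vectors or on their lengths.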

Prewhitening time series analysis
In order to investigate the underlying stochastic structures of the datasets, the univariate time series modeling procedure (i.e., Equation (4)) can be applied. However, statistics of the datasets such as the skewness coefficient indicated that proper pretreatment using the Box-Cox transformation was necessary (Box & Cox ). A heuristic approach was used to find the most appropriate parameters, C and γ in Equations (1) and (2), for our datasets, and in most cases the transformation improved the normality of the data. We investigated the stochastic structures of the datasets by evaluating their autocorrelation function (ACF) and partial autocorrelation function (PACF). The results of prewhitening are demonstrated in Figure 4 for data from the upstream and downstream points.   (Figure 4(c)) and TP did not show any causality structure related to Chl-a concentration (Figure 4(d)).
The significant negative CCF at lag 0 is explained by the increase in flowrate due to rainfall, which can flush algae out of the river. Considering the change in flowrate after rainfall in the river, the reduction of Chl-a concentration seems physically meaningful. Also, the significant positive CCF at lag 1 can be explained by the nutrient inflow from agricultural areas, forest, and livestock farms after rainfall, which is the main cause of algal growth. The absence of a significant relationship between TP and Chl-a can be explained by the generally high TP concentration in all datasets. That is, the TP concentration is high enough that it does not limit algal growth.
Both the nitrogen and phosphorus balances per hectare of agricultural land in Korea were three to four times the corresponding average values for OECD countries (OECD ).
Concentrations of TP did not control the algal blooming process for major rivers in South Korea (Kim et al. ).
Using prewhitening together with the cross-correlation method, the causality between environmental factors and Chl-a concentration can be evaluated more accurately.
In particular, comparing the lag-0 correlations from the conventional and prewhitened methods shows how prewhitening improves the analysis of algal responses. Figure 5

Correlation analysis of Chl-a concentration in the Yeongsan River
The results of prewhitened correlation at lag 0 between various environmental factors and Chl-a concentrations are summarized in Table 1. Correlations at lag 0 are considered meaningful when their magnitude exceeds 0.28, which is the criterion for the 95% confidence level. Positive and negative correlations indicate the increasing tendency of Chl-a concentrations to increase and    Correlations between WT and Chl-a concentrations also showed more frequent and stronger causality during the post-construction period than during the pre-construction period (see Table 1). As flow residence time increases after the weir construction, the surface water temperature is more subject to change. WT not only affects the size of the phytoplankton through controlling enzymatic reactions but also plays an important role as a limiting factor in controlling proliferation and microbial growth (Kormas et al. ). WT between 15°C and 25°C is known to be appropriate for a rapid growth rate of algae (Davis et al. ), but at WT below 13°C and above 30°C, the growth rates tend to decrease (Reynolds ; Chu & Rienzo ). As WT increases and remains high, algal growth increases accordingly.
For phosphorus components (TP, DTP, PO4-P), the trend differences including negative correlation after weir  The reduction of flow velocity originating from the backwater impact of the weir can be strongly associated with the hydrometeorological factor, which results in algal blooming.
Significant correlations of negative lags for the downstream point are summarized in Table 3. The downstream     (Figure 7(a)).
The differences between the pre and post periods of weir construction indicate that the temporal impact is more important than the impact of spatial location, i.e., upstream (KS) or downstream (NJ).
The biplot of factor analysis for the hydro-meteorological factor is presented in Figure 7 (Figure 7(c)). This means that the weir construction introduces a notable similarity in the causality structure between the nutrient factors and algal blooms. The diversity of algal variation in the river system can be switched to a less variable nutrient process to algal bloom in a system similar to a reservoir. Furthermore, the differences between the upstream point (KS) and the downstream point (NJ) are reduced after the construction of the weir.   autocorrelation and periodicity produce a better explanation than the conventional approach.
The angles were therefore calculated from the vector coordinates using Equation (16). Figure 8 presents angles between vectors from biplots using conventional CCFs, representing temporal and spatial differences in various environmental factors (e.g., meteorological, nutrient, and water quality factors). Figure 8(a) indicates that the spatial difference between upstream and downstream of the weir is minor and the relationship due to nutrient is negligible. The location differences between Kwangsan and Najoo are not significant. Figure 8(b) shows that the temporal difference between pre and post weir construction is significant for the meteorology factor. Except for the meteorology factor, the angle differences between pre and post weir construction are less than 20°, which is small considering the maximum possible 180° in a biplot. Figure 9 presents the angle differences for spatial and temporal perspectives of total, hydrometeorological, water quality, and nutrient factors. The angle differences between Kwangsan and Najoo (Figure 9(a)) are similar to those for Figure 8(a). However, the angle differences from the weir construction (pre and post) were substantial (Figure 9(b)). The angles before the weir construction range from 32.9° to 81.3°, while those after the weir construction range from 9.4° to 15.2°. The angles after weir construction were more than two times smaller than the angles before weir construction for all environmental

Water temperature, oxygen related parameters, and phosphorus showed strong concurrent relationships to Chl-a concentration during the post river regulation period. The construction of the weir was responsible for significant causalities between hydro-meteorological components or nitrogen in negative lags and algae concentration.
By taking spatial and temporal water system characteristics as factors in biplots, the introduction of factor analysis into the datasets of prewhitened cross-correlations can express the multi-relationship among the changes of water system characteristics clearly, while the traditional method fails to identify the impact of the weir in causality analysis.
According to the axes and coordinates in the factor analysis biplots, the impact of hydro-meteorological factors on algal blooming was smaller than that of the river regulation project. The weir construction introduces a substantial similarity in the causality structures between nutrient or water quality factors and algal bloom for both the upstream and downstream points. Evaluation of angle differences between biplot vectors of prewhitened CCFs provides quantitative insight into the relationships between different environmental factors and Chl-a concentration.