Studies in water quality management have indicated significant relationships between land use/land cover (LULC) variables and water quality parameters. Understanding this linkage is therefore essential in protecting and developing water resources. This article extends the conventional geographically weighted regression (GWR) to a temporal version in order to take both the spatial and temporal variations of such linkages into account, which have been ignored by many previous efforts. The approach has been evaluated using the total concentration of nitrates and nitrites as the case study. For this, observations from 45 water quality sampling stations were examined over a time interval of 20 years (1992–2011), and the linkages between LULC variables and NO2 + NO3 concentration were extracted through the Pearson correlation coefficient as a global regression model, the conventional geographically weighted regression, and the proposed spatio-temporal weighted regression (STWR). Comparing the results based on two global criteria, goodness of fit (R2) and residual sum of squares (RSS), verifies that the simultaneous consideration of spatial and temporal variations by STWR substantially improves the results.
The quality of water resources has been continuously influenced by human activities (e.g., urbanization, industrialization, and cultivation) and natural phenomena (e.g., hydrological characteristics, atmospheric changes, soil erosion, and precipitation patterns) (Woli et al. 2004; Karimipour et al. 2005; Schoonover et al. 2005; Li & Zhang 2008; Tu 2011a, 2011b; Hao et al. 2013). Therefore, studying the effect of landscape on water quality and examining the relation between water quality parameters and land use/land cover (LULC) classes provides knowledge in order to monitor changes in the quality of water resources by considering LULC variables (Monteagudo et al. 2012; Fiquepron et al. 2013; Tu 2013; Giri & Qiu 2016; Kundu et al. 2017).
The linkages between LULC and water quality parameters have been examined by different researchers. Sliva & Williams (2001) investigated three catchments in southern Ontario, Canada, and identified urban land use as the most influential factor on water quality. Ahearn et al. (2005) examined the water quality of the Cosumnes River in California by studying how LULC influences nitrate-N and total suspended solids loads in dry seasons. With the aim of supporting sustainable land use activities that protect the upstream source of the Han River, Li et al. (2009) used correlation analysis, principal components analysis, and stepwise least squares multiple regression to determine the spatio-temporal variability of water quality variables and, in particular, their correlations with LULC in the 100 m riparian zone along the stream network. They found that the basin, in general, has better water quality in the dry season than in the rainy season, as indicated by the primary pollutants, including CODMn and nitrogen. Major ion compositions display large spatial and seasonal differences and are significantly related to land use and land cover in the riparian zone, while the riparian landscape could not explain most of the water quality variability in T, pH, turbidity, SPM, and CODMn. Seeboonruang (2012) statistically assessed the impact of land uses on surface water quality indexes. Finally, Mouri et al. (2011) investigated the effects of land cover and human impact on spatial and temporal variation in nutrient parameters in stream water.
Several methods have been suggested to examine the effect of LULC on the quality of water resources. For example, Schoonover et al. (2005) used multivariate statistical analyses such as principal component analysis (PCA) and factor analysis (FA). They found that nutrient and fecal coliform concentrations within watersheds with an impervious surface >5% often exceeded those in other watersheds during both base flow and storm flow. Also, fecal coliform bacteria in more urbanized areas often exceeded the US EPA's standard for recreational waters. Xian et al. (2007) and Monteagudo et al. (2012) studied the intended relationship through regression modeling. It is suggested, however, that the spatial autocorrelation and spatial non-stationarity of water quality parameters must be taken into consideration (Chang 2008; Tu & Xia 2008). Based on Tobler's first law of geography (everything is related to everything else, but near things are more related than distant things (Tobler 1970)), water quality parameters at neighboring stations may be spatially correlated due to, for example, the effect of a certain pollution source (Tu & Xia 2008). The situation is even more complicated because spatial autocorrelation varies in space (i.e., changes from place to place), which is called non-stationarity (Pfeiffer et al. 2008). This complication can be partially handled by deploying local spatial regression models such as geographically weighted regression (GWR) (Miller 2012); however, due to the time-varying changes in the landscape and water quality parameters, the linkages between LULC and the quality of water resources also change from time to time (Uuemaa et al. 2005). Therefore, it would be more appropriate to study such linkages over a long-term time interval (Dixon & Chiswell 1996; Halliday et al. 2012). In this case, time-related issues, e.g., temporal trend, cyclic movement, and seasonal movement, must be taken into account by the model used (Han et al. 2006).
Among these, the temporal trend can be an ascending or descending trend in water quality characteristics (e.g., concentrations, color, pH, and conductivity), and the seasonal movement can reflect the effect of temperature and precipitation changes on water quality parameters (Kepner et al. 2004; Jarvie et al. 2008).
As a recent effort, Pratt & Chang (2012) tried to consider seasonal changes by separately analyzing the measurements of water quality parameters within a long-term time interval for wet and dry seasons. Although this can partially involve seasonal changes in the computations, the mentioned temporal issues were not fully incorporated in the regression model.
This research provides a local temporal-geographical regression model by extending the conventional GWR to a temporal version that involves all the temporal variations in the estimation model. We explain the proposed model and evaluate it for total nitrates and nitrites' concentration as the running case study. For this, the section below introduces the study area and materials, followed by a section describing the processing steps as well as the proposed model. The next section presents the results of evaluating the proposed model and compares them with the Pearson correlation coefficient (PCC) as a global regression model and the conventional GWR. Finally, conclusions and proposed ideas for future work are presented.
STUDY AREA AND MATERIALS
The study area is located in Washington State (Figure 1), which has a geographical extent of 45°32′–49°00′ N and 116°57′–124°48′ W, and is surrounded by Canada to the north, Oregon State to the south, Idaho State to the east, and the Pacific Ocean coast to the west. Washington is the eighteenth largest state in the USA, with an area of about 184.67 thousand km2, more than 93% (172.45 thousand km2) of which is land and only about 7% (12.22 thousand km2) is water.
The state of Washington has a mild, wet climate due to the coastal winds and the Cascade Mountains. As indicated by the WWDT (West Wide Drought Tracker) database and the monthly average temperature of the state over an interval of 20 years (1992–2011), December (5.44 °C) and July (19.1 °C) are respectively the coldest and warmest months of the year in Washington State. Furthermore, during this period, November (31.06 cm) and July (3.53 cm) have respectively the highest and lowest precipitation (Figure 2).
Water quality data
Observations of nitrite (NO2) and nitrate (NO3) at 45 sampling stations in the study area over a period of 20 years (1992–2011) were acquired (Figure 3) from two databases. The observations of nine sampling stations were obtained from the King County Water and Land Resources Database, and were collected under the King County Stream Monitoring Program (KCSMP; http://green2.kingcounty.gov/StreamsData/DataDownload.aspx). This program has existed since 1950 with the goal of monitoring the water quality of lakes and rivers in this area in order to ensure that the surface water resources of King County are healthy and that water quality is improving. The observations for the other 36 sampling stations were obtained from the Department of Ecology of Washington State (http://www.ecy.wa.gov/programs/eap/fw_riv/rv_main.html) and were collected with the aim of studying water quality trends and conditions in Washington State.
In such an area with significant variations in precipitation and air temperature, it is essential to consider the changes in the concentration of pollutants in water with variations of runoff and water flow (Kang et al. 2010). Therefore, the whole year was divided into four seasons (November to January, February to October, April to June, and July to September) based on the wettest and driest months by investigating the changes in atmospheric precipitation and their effects on NO2 + NO3 concentration (Figure 3(a)). Accordingly, the water quality data for each sampling station were classified into the four seasons (Figure 3(b)).
LULC data
LULC data were provided by the National Oceanic and Atmospheric Administration (NOAA; http://csc.noaa.gov/ccapftp/) of the USA. This database belongs to a NOAA program that aims to study and analyze changes in LULC. It provides 25 LULC classes based on satellite images with 30 m spatial resolution for the years 1992, 1996, 2001, 2006, and 2011, which are merged in this article into five classes: urban, agriculture, forests, grassland, and wetlands.
The catchment of each sampling station was delineated in ArcGIS 9.3 Spatial Analyst using the digital elevation model (DEM) of the study area, obtained from SRTM satellite images with a spatial resolution of 30 m. Moreover, an index for each LULC class at each of the 45 sampling stations was calculated (Table 1) as the number of pixels of that class expressed as a percentage of the total number of pixels in the catchment of the sampling station. As the water quality at each sampling station represents the water quality of its catchment, these indices can be used to examine the relationships between the LULC classes of a catchment and its water quality (Mehaffey et al. 2005; Yang & Jin 2010).
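The index described above can be sketched in a few lines. This is only an illustration, assuming that the pixels of a catchment are available as a flat list of class labels; the function name `lulc_index` and the toy pixel counts are hypothetical and are not taken from Table 1.

```python
from collections import Counter

def lulc_index(catchment_pixels):
    """Return {class: percentage of catchment pixels} for one catchment.

    catchment_pixels: iterable of class labels, one per 30 m pixel inside
    the delineated catchment (hypothetical input format).
    """
    counts = Counter(catchment_pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("catchment contains no pixels")
    return {cls: 100.0 * n / total for cls, n in counts.items()}

# Toy catchment of 10 pixels (made-up values, not from Table 1).
pixels = ["urban"] * 4 + ["forest"] * 5 + ["wetland"]
index = lulc_index(pixels)
```

By construction, the indices of all classes present in a catchment sum to 100%.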
Finally, the LULC indices for the years between the mapped dates were calculated through simple linear interpolation (Figure 4). This enables us to consider the time dimension in the model, as well as to study the changes in the LULC classes of the sampling stations' catchments.
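As a minimal sketch of this interpolation step, the function below estimates an LULC index for any year between two mapped years. The function name `interpolate_lulc` and the index values are hypothetical; only the mapped years (1992, 1996, 2001, 2006, 2011) come from the text.

```python
def interpolate_lulc(known, year):
    """Linearly interpolate an LULC index for `year` from mapped years.

    known: dict {year: index in percent} for the years with LULC maps.
    """
    years = sorted(known)
    if not years[0] <= year <= years[-1]:
        raise ValueError("year outside the mapped range")
    for y0, y1 in zip(years, years[1:]):
        if y0 <= year <= y1:
            frac = (year - y0) / (y1 - y0)
            return known[y0] + frac * (known[y1] - known[y0])
    return known[years[0]]  # only one mapped year was supplied

# Made-up urban index (percent) for one catchment at the five mapped years.
urban = {1992: 10.0, 1996: 12.0, 2001: 17.0, 2006: 18.0, 2011: 20.0}
urban_1994 = interpolate_lulc(urban, 1994)
urban_1999 = interpolate_lulc(urban, 1999)
```

Mapped years are returned unchanged, so the interpolated series passes through the observed indices.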
PROCESSING AND MODELING
Common statistical analyses, e.g., ordinary least squares regression, usually presume that observations are independent, which is not usually the case for spatial data due to their spatial autocorrelation and non-stationarity (Davidson & MacKinnon 2004; Monteagudo et al. 2012). Therefore, a number of processing steps are needed to examine the data for such behaviors.
Spatial processing determines the correlations between sampling stations (i.e., spatial autocorrelation) and examines the variations of these correlations across the study area (i.e., spatial non-stationarity), which concerns both the spatial data and the parameters of the spatial model. In such a situation, deploying a local spatial statistical method, instead of a global one, improves the results and enables a better interpretation (Pfeiffer et al. 2008; Tu & Xia 2008; Tu 2011; Miller 2012; Pratt & Chang 2012).
Temporal processing identifies the temporal variations of NO2 + NO3 concentration. Generally, temporal variations have four components: time trend (T), cyclic changes (C), seasonal changes (S), and irregular changes (I). The trend component shows the general direction of the time series within a time interval, and can be estimated using a trend line or curve. The long-term fluctuations around this line or curve indicate cyclic changes, which can happen periodically. Seasonal changes are systematic or calendar-related movements, and irregular changes are disordered changes considered as random events (Han et al. 2006). Empirically, the time series is modeled as the product of T, C, S, and I (Han et al. 2006).
When time-series data are autocorrelated, the time series may be non-stationary. As with the spatial processing, such conditions violate the assumptions of common statistical analyses and hence disrupt the models (Dougherty 2007). Intuitively, a stationary time series has no specific trend in its mean and variance, and contains no cyclic changes (Chatfield 2003).
A seasonal index smaller than 1 means that the pollutant concentration in that season is below the annual average of the observations. Conversely, a seasonal index greater than 1 means that the pollutant concentration is above the annual average of the observations.
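The interpretation of the seasonal index can be illustrated with a simple ratio-to-mean calculation under the multiplicative view of the time series. This is a hedged sketch: the exact definition used in the article's equation is not reproduced in this text, so the formula below (seasonal mean divided by the mean of all seasonal means) is an assumption, and the season labels and concentrations are made up.

```python
def seasonal_indices(obs):
    """obs: dict {season: list of concentrations pooled over the years}.

    Returns {season: seasonal mean / mean of all seasonal means}.
    An index below 1 marks a season below the annual average,
    an index above 1 a season above it.
    """
    season_means = {s: sum(v) / len(v) for s, v in obs.items()}
    overall = sum(season_means.values()) / len(season_means)
    return {s: m / overall for s, m in season_means.items()}

# Made-up NO2 + NO3 concentrations for four hypothetical seasons.
obs = {
    "S1": [1.6, 1.6],
    "S2": [1.2, 1.2],
    "S3": [0.8, 0.8],
    "S4": [0.4, 0.4],
}
idx = seasonal_indices(obs)
```

In this toy example the annual average is 1.0, so seasons S1 and S2 lie above it and S3 and S4 below it.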
Temporal extension of the geographically weighted regression
In estimating regression coefficients, which reflect how the LULC classes affect water quality parameters, GWR (as a local spatial regression) accounts for the spatial autocorrelation of measurements by weighting observations according to their distance from each other. In addition, it allows a better understanding of spatially varying relations by producing local coefficients while minimizing the variances at each station (Brunsdon et al. 1996; Fotheringham & Brunsdon 2003). Despite the ability of GWR to estimate spatially varying relationships between landscape and water quality parameters (Tu 2011a, 2011b; Pratt & Chang 2012), it pays no attention to temporal components such as the temporal trend and seasonal variations in water quality observations, which are forms of temporal non-stationarity, and this can lead to misleading or even wrong results (Burt et al. 2009).
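The distance-based weighting at the core of GWR can be sketched as a weighted least squares fit centred on one station. This is a simplified illustration rather than the implementation used in the article: it assumes a single predictor, a Gaussian kernel with a fixed bandwidth, and planar coordinates, and the function names and data are hypothetical.

```python
import math

def gaussian_weight(d, bandwidth):
    """Gaussian distance-decay kernel (one common choice of GWR kernel)."""
    return math.exp(-0.5 * (d / bandwidth) ** 2)

def gwr_local_fit(coords, x, y, i, bandwidth):
    """Weighted least squares of y on x, centred on station i.

    Returns the (intercept, slope) that are local to station i.
    """
    u0, v0 = coords[i]
    w = [gaussian_weight(math.hypot(u - u0, v - v0), bandwidth)
         for u, v in coords]
    sw = sum(w)
    mx = sum(wk * xk for wk, xk in zip(w, x)) / sw  # weighted mean of x
    my = sum(wk * yk for wk, yk in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wk * (xk - mx) ** 2 for wk, xk in zip(w, x))
    sxy = sum(wk * (xk - mx) * (yk - my) for wk, xk, yk in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up stations: a western cluster where y rises with x and an
# eastern cluster where y falls with x.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
x = [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
y = [20.0, 40.0, 60.0, 50.0, 40.0, 30.0]
_, slope_west = gwr_local_fit(coords, x, y, 0, bandwidth=2.0)
_, slope_east = gwr_local_fit(coords, x, y, 5, bandwidth=2.0)
```

A single global regression on these data would average the two opposing relations, whereas the local fits recover a positive slope in the west and a negative slope in the east.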
This research follows the framework and general principles of GWR and considers the regression coefficients as polynomials of time, based on which the temporal components of the water quality data are modeled. Therefore, by extending GWR to the time dimension, the extracted relations simultaneously take into account the spatial and temporal dynamics of the water quality data. The extended model is called spatio-temporal weighted regression (STWR).
Hence, the relations between time differences of dependent and independent variables are investigated.
Similar to the conventional GWR, the local Student's t statistic and R2 (goodness of fit) can be calculated here and used to interpret the relations.
In order to manage the time components, variables in Equation (7) are considered as a polynomial, whose terms are determined by considering the existent temporal components of the water quality dataset.
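The idea of treating regression coefficients as polynomials of time can be sketched as follows. This is not the article's formulation, whose equations are not reproduced in this text; it is a generic illustration that assumes one predictor, first-degree polynomials by default, a Gaussian spatial kernel, and hypothetical function names. Each coefficient b(t) = b_0 + b_1 t + ... is estimated by expanding the design matrix with powers of t and solving the spatially weighted normal equations.

```python
import math

def solve(a, b):
    """Solve a.z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def stwr_local_fit(records, focal_xy, bandwidth, degree=1):
    """records: list of (u, v, t, x, y) = coords, time, LULC index, concentration.

    Returns (intercept_poly, slope_poly): polynomial coefficients in t
    (constant term first) that are local to focal_xy.
    """
    p = degree + 1
    n = 2 * p
    ata = [[0.0] * n for _ in range(n)]
    atb = [0.0] * n
    for u, v, t, x, y in records:
        d = math.hypot(u - focal_xy[0], v - focal_xy[1])
        w = math.exp(-0.5 * (d / bandwidth) ** 2)  # spatial weight
        row = [t ** k for k in range(p)] + [x * t ** k for k in range(p)]
        for i in range(n):
            atb[i] += w * row[i] * y
            for j in range(n):
                ata[i][j] += w * row[i] * row[j]
    coef = solve(ata, atb)
    return coef[:p], coef[p:]

def poly_eval(coefs, t):
    """Evaluate a polynomial whose constant term comes first."""
    return sum(c * t ** k for k, c in enumerate(coefs))

# Synthetic check (made-up data): y = (1 + 0.1 t) + (2 - 0.05 t) * x
stations = [(0, 0, 10.0, 0.5), (1, 0, 20.0, 1.0), (0, 1, 35.0, 0.2), (1, 1, 50.0, 0.0)]
records = []
for u, v, base, growth in stations:
    for t in range(6):
        x = base + growth * t
        records.append((u, v, t, x, (1 + 0.1 * t) + (2 - 0.05 * t) * x))

intercept_poly, slope_poly = stwr_local_fit(records, (0, 0), bandwidth=1.0)
slope_at_t4 = poly_eval(slope_poly, 4)
```

In this sketch the fitted slope polynomial describes how the local effect of the LULC index on concentration drifts over time; higher polynomial degrees or periodic terms could be added to represent other temporal components.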
RESULTS AND DISCUSSION
This section presents the evaluation results of the STWR model proposed in the above section, which was implemented in Matlab 2011, and compares them with the PCC as a global regression model, and the conventional GWR. The illustrations are produced by ArcGIS and SPSS 17.
Spatial and temporal variations of NO2 + NO3
As Figure 5(a) shows, the spatial variations of NO2 + NO3 concentration reveal similar pollutant levels at nearby monitoring stations. In addition, NO2 + NO3 pollution is more prevalent in the central areas. Likewise, the LISA index indicates some clusters in the central area (Figure 5(b)). Finally, Moran's index of 0.41 confirms spatial autocorrelation and similar behavior at adjacent stations.
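For readers unfamiliar with Moran's index, the sketch below computes the global statistic for a small made-up example. The weighting scheme used in the article is not stated in this text, so the binary distance-threshold weights here are an assumption, as are the function name and the data.

```python
import math

def morans_i(values, coords, threshold):
    """Global Moran's I with binary weights: w_ij = 1 if stations i and j
    are distinct and no further apart than `threshold`, else 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(coords[i][0] - coords[j][0],
                              coords[i][1] - coords[j][1])
            if dist <= threshold:
                num += dev[i] * dev[j]
                w_sum += 1.0
    return (n / w_sum) * num / sum(e * e for e in dev)

# Six made-up stations on a line, one unit apart.
line = [(k, 0) for k in range(6)]
clustered = morans_i([1, 1, 1, 5, 5, 5], line, threshold=1.0)
alternating = morans_i([1, 5, 1, 5, 1, 5], line, threshold=1.0)
```

Positive values indicate that neighboring stations carry similar concentrations, values near zero indicate no spatial pattern, and negative values indicate that neighbors tend to differ.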
In terms of temporal variations, after an upward trend in the early years, NO2 + NO3 concentration shows a gradual decline (Figure 6(a)). In addition to this trend, a specific seasonal variation pattern can be seen in the study area, in which the range of variations is obtained based on Equation (5) for the whole area (Figure 6(b) and 6(c)); however, because of the effects of spatial variations, the temporal trend and seasonal variations have been investigated separately for each catchment. According to Figure 6(d), most stations located in the west of the study area have an ascending trend. These stations belong to catchments that have undergone growth in deforestation and agricultural activities in recent years (Figure 4(b) and 4(c)).
Spatial and temporal changes in the landscape
In order to study the spatial and temporal changes in the landscape, LULC indices of the first and last year were used. To better understand these variations, landscape was represented as a continuous surface using spatial analysis tools in ArcGIS 9.3.
As Table 1 shows, spatial and temporal variations in the LULC index are quite evident (Figure 4). Among these, urban land use, ranging from 0.31% to 74.46% (SD = 22.71) in 1992 and from 0.40% to 81.91% (SD = 24.91) in 2011, has the highest spatial variation. Next are the forest lands, with variations from 4.05% to 85.05% (SD = 21.30) in 1992 and from 4.92% to 84.58% (SD = 44.66) in 2011 (Figure 4(c)). A degree of spatial variation is also observed in grassland and agricultural lands.
According to Figure 4, the central and eastern areas have greater urban density, with urban as the dominant type of land use (Figure 4(a)); however, moving away from the central and eastern parts, the densities of agricultural and forest lands increase (Figure 4(b) and 4(c)). Grasslands are mostly located in the northern parts of the region (Figure 4(d)). Finally, the wetlands, which show less spatial variation than the other classes, are slightly denser in the western and eastern territories.
Similar to spatial variations, temporal variations can also be detected in the landscape of the catchments. Except for agricultural land use, which changed only marginally, the LULC variables have changed noticeably between 1992 and 2011, and the rate and range of these variations vary across classes and stations. During this time period, urban and grassland land uses have increased (Figure 4(a) and 4(d)). In contrast, the extent of forests and wetlands has decreased (Figure 4(c) and 4(e)). Agricultural activities have also expanded during recent years, but at a lower rate compared to the variations in other classes (Figure 4(b)).
Spatio-temporal linkage between land use/cover and NO2 + NO3 concentration
We first extracted the overall relations between NO2 + NO3 concentration and the five LULC classes of urban, agricultural, forests, grassland, and wetlands using the PCC as a global regression model, calculated between the LULC indices in 2001 and the mean NO2 + NO3 concentration over the entire time interval (1992 to 2011). The values in Table 2 imply a significant positive correlation between urban land use and NO2 + NO3 concentration and a significant negative correlation between grasslands and NO2 + NO3 concentration. Nevertheless, the relations between the other LULC classes (agriculture, forests, and wetlands) and NO2 + NO3 concentration are not significant.
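The global PCC step can be sketched as below: one correlation coefficient per LULC class, computed across stations, together with the usual t statistic for testing whether the correlation differs from zero. The function names and the five data pairs are hypothetical and are not values from Table 2.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Made-up pairs: LULC index of one class vs. mean concentration per station.
lulc = [1.0, 2.0, 3.0, 4.0, 5.0]
conc = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(lulc, conc)
t = t_statistic(r, len(lulc))
```

Because a single coefficient summarizes all stations at once, it cannot show whether the relation changes sign or strength from place to place, which is the limitation the local models address.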
Then, the proposed STWR model was deployed to examine the intended linkages, and its results were evaluated through two local criteria, the ‘goodness-of-fit index’ and the ‘Student's t-test statistic’, at the 95% level of confidence (Figures 7 to 11).
Although PCC indicates (Table 2) a significant positive correlation between urban land use and NO2 + NO3 concentration, STWR (Figure 7) refines this and shows that this linkage could be locally positive or negative; that is, the linkage depends on both location and time. Nevertheless, the results are less reliable in the central areas, which have greater urban land use density (Figure 4(a)). Moreover, examining the significance of the relations using Student's t-test (Figure 7(b)) shows a significant negative correlation between urban land use and NO2 + NO3 concentration at most stations.
Although PCC indicates (Table 2) no significant relation between agricultural land use and NO2 + NO3 concentration, STWR (Figure 8) shows that this linkage could be locally positive or negative. Moreover, the results are more reliable in the south, west, and part of the east (Figure 8(a)), where agricultural land is denser (Figure 4(b)). This is in line with the results of Student's t-test of significance (Figure 8(b)): southern and western stations located in agricultural lands have significant positive relations with NO2 + NO3 concentration, while in other areas, where agricultural land use is less dense, a significant negative relation is obtained.
The spatial variation pattern of the linkage between forest land cover and NO2 + NO3 concentration is very similar to that of agricultural land use. Although PCC (Table 2) shows no significant relation between forest lands and NO2 + NO3 concentration, STWR (Figure 9) indicates considerable spatial non-stationarity, such that this linkage could be locally positive or negative. In addition, according to Figure 9(b), forest land cover at most of the stations in the area is positively correlated with NO2 + NO3 concentration, which means that growth in forest land cover is associated with an increase in NO2 + NO3 concentration; however, in the southern areas, where agricultural activities have increased in recent years (Figure 4(b)), the relationship between forest land cover and NO2 + NO3 concentration is significantly negative.
According to Table 2, PCC indicates a significant negative correlation between grassland cover and NO2 + NO3 concentration; however, STWR (Figure 10) shows that this linkage, from place to place, could be significantly positive, significantly negative, or non-significant. In particular, in the northern area, where grassland cover is denser (Figure 4(d)), this linkage is significantly negative, meaning that NO2 + NO3 concentration decreases as grassland cover grows. On the other hand, in the southern areas, where the density of grassland cover has been declining due to agricultural activities in recent years (Figure 4(b)), there is a significant positive linkage between grassland cover and NO2 + NO3 concentration. Again, there is high spatial non-stationarity both in the model predictability and in the linkage significance.
Finally, the PCC (Table 2) finds no significant linkage between wetlands and NO2 + NO3 concentration. STWR (Figure 11) yields the same result. Although the area of wetlands in the central strip has decreased in recent years (Figure 4(e)), no significant linkage was detected with NO2 + NO3 concentration (Figure 11(b)).
In general, considerable spatial and temporal non-stationarity is evident in all of the above cases, which is the result of changes in the LULC and the effects of adjacent stations on each other. This confirms that the proposed local STWR model provides more reliable statistical results compared to a global regression model (Fotheringham & Brunsdon 2003; Tu 2013).
We also extracted the intended linkages using the conventional GWR and compared the results with the STWR model through two global indicators, goodness of fit (R2) and residual sum of squares (RSS) (Tables 3 and 4), which indicate an improvement in model quality. For GWR, the global R2 was calculated from the mean over the entire time period and the LULC index of 2001, while for STWR the estimated regression coefficients were obtained from the data for the period 1992 to 2010. Moreover, since the proposed model uses observations of the entire time period, the global RSS measure (the sum of squared differences between predicted and observed values) was used to compare the predictive ability of the models (Table 4). Again, for GWR the predicted values of NO2 + NO3 concentration were obtained from the observations of the year 2001 (as the middle year), while for STWR the observations of the entire time period of 1992 to 2010 were used (Chatterjee & Hadi 2013).
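The two global indicators follow their standard definitions and can be sketched as below. The function names and the observed and predicted values are made up for illustration and are not taken from Tables 3 and 4.

```python
def rss(observed, predicted):
    """Residual sum of squares: sum of squared prediction errors."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def r_squared(observed, predicted):
    """Goodness of fit: 1 - RSS / total sum of squares about the mean."""
    mean = sum(observed) / len(observed)
    tss = sum((o - mean) ** 2 for o in observed)
    return 1.0 - rss(observed, predicted) / tss

# Made-up observed and predicted NO2 + NO3 concentrations.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
model_rss = rss(observed, predicted)
model_r2 = r_squared(observed, predicted)
```

A model that fits better has a lower RSS and an R2 closer to 1, which is the sense in which the two models are compared.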
CONCLUSIONS AND FUTURE WORK
This article extended the conventional GWR to a temporal version to better examine the linkage between LULC and water quality parameters. The model was evaluated using observations of NO2 + NO3 concentration at 45 sampling stations over a 20-year period. The results showed that by incorporating the time dimension, more reliable linkages for NO2 + NO3 concentration are extracted, which are both location- and time-dependent; however, this would need to be investigated with other water quality parameters in the future.
Water quality parameters and LULC indices may vary over short or long time intervals. For example, as investigated here, seasonal variations of NO2 + NO3 concentration occur over shorter time intervals than its temporal trend; likewise, agricultural activities decline in cold seasons as short-term changes, while urbanization and deforestation are long-term and mostly irreversible changes. As the results indicate, spatial variations as well as short- and long-term temporal changes affect the extracted linkages.
The dependence of the linkage between LULC and water quality parameters on scale is an important issue to be considered (Buck et al. 2004; Uuemaa et al. 2005) because it specifies the areas to be examined (Pratt & Chang 2012). In other words, as LULC may change at different scales, the effect of LULC on water quality varies with the scale (Zhou et al. 2012). Therefore, a multiscale method is suggested to identify the pollutant source (Chang 2008; Pratt & Chang 2012).