Abstract
A small-scale water harvesting structure known as a sand dam has gained popularity across East Africa due to the efforts of non-governmental organizations. A sand dam is a subsurface water reservoir that stores water between sand grains. Stored thus, the water is filtered and protected from evaporation. This study uses remotely sensed data to investigate the impact of these structures on water storage and vegetative growth. The relationship between sand dams and water storage was modeled using a binary sand dam factor, climate data from the Famine Early Warning Systems Network Land Data Assimilation System (FLDAS), and water storage data measured by the Gravity Recovery and Climate Experiment (GRACE) twin satellites. The analysis revealed that GRACE largely fails to detect a statistically significant impact of sand dams on regional water storage. However, analysis of the Normalized Difference Vegetation Index (NDVI) indicated that sand dams have a significant impact on regional vegetation. Vegetative growth is correlated with groundwater levels, indicating that sand dams have a positive impact on water storage albeit on a smaller scale than GRACE can regularly detect. Notably, this study shows that NDVI data can be used effectively to study small-scale, regional changes in vegetation and water storage.
HIGHLIGHTS
The impact of small-scale water harvesting structures on regional water resources is explored through spatial generalized linear models of remote sensing products.
Sand dams increase regional vegetation, which suggests groundwater levels are also positively impacted.
Sand dams may have a smaller impact on groundwater levels than believed.
NDVI and, to a lesser extent, GRACE record the impact of sand dams.
INTRODUCTION
Water insecurity, one of the world's grand challenges, is particularly acute in developing nations where resources and infrastructure are limited. This is especially true in poorly connected rural communities, where women and children might spend 3–4 h a day collecting water (Henry et al. 2015). No single solution to water security exists for the developing world, because climate, hydrology, and geology vary from place to place; each solution must be adapted to the unique requirements of the land and local community. Examples include johads, or small earthen dams, in India (Gupta 2011), fog-harvesting meshes in Peru and Chile (Qadir et al. 2018), and sand dams in sub-Saharan Africa (Borst & de Haas 2006). Despite the wide-ranging implementation of small-scale solutions to water security, very few studies have examined their effectiveness and ability to increase water availability.
Effective small-scale solutions to water security are essential to sustaining rural communities in developing countries. Rural communities are particularly vulnerable to climate change, and current climate projections indicate that rainfall in the developing world will either decrease or be concentrated in fewer, higher-intensity events (Reyer et al. 2017). Small-scale water harvesting technologies are increasingly recognized as a viable response to rural water insecurity and climate change. However, most efforts to spread or intensify water harvesting technologies have had limited success (Bouma et al. 2016), despite the considerable time and money spent bringing water harvesting projects to fruition. For example, one sand dam constructed with volunteer labor takes 3 weeks to build and costs an average of 12,000 USD (Lasage et al. 2008; Lasage & Verburg 2015). Despite this infusion of resources, approximately 50% of sand dams are not functioning properly (Viducich 2015; de Trincheria et al. 2018). The widespread issues with sand dam effectiveness are a relatively new revelation (de Trincheria et al. 2015, 2018; Viducich 2015; Eisma & Merwade 2020), and similar issues may pervade other well-known water harvesting technologies.
Few studies have assessed the effectiveness of small-scale water harvesting technologies in rural communities. Thus far, evaluation of water harvesting has primarily been performed via simulation (Hut et al. 2008; Quilis et al. 2009; Lasage et al. 2015), field studies and surveys (Ngigi et al. 2005; Previati et al. 2010; Quinn et al. 2019; Eisma & Merwade 2020), meta-analysis of the existing literature (Lasage & Verburg 2015; Bouma et al. 2016), or multi-criteria analysis of potential projects (Jaber & Mohsen 2001; Garfì et al. 2011). While such techniques can offer valuable information regarding the effectiveness of specific water harvesting projects, their ability to provide a broad perspective is limited. Conversely, an assessment methodology based on remotely sensed data has the potential to provide insight unhindered by site accessibility, cost, and selection bias. Such a methodology must match a quantifiable expected impact of water harvesting structures with a remote sensing product that records that same impact. For example, sand dams supposedly raise the groundwater table (Borst & de Haas 2006; Excellent Development 2019). If this claim is true, a high density of sand dams should have a measurable impact on regional groundwater levels.
No previous studies have assessed the impact of water harvesting structures on regional water availability. This study aims to develop a methodology to assess the ability of water harvesting structures to improve regional water availability. The methodology is developed and applied to study whether sand dams in south-eastern Kenya are functioning as reported. Considering that most community-based water harvesting structures exist in developing nations with limited publicly available local data, the methodology in this study is developed using globally available remotely sensed data to enable broader applicability in other regions. While this study uses sand dams in Kenya as a test case, the methodology presented here can be applied globally to assess the impact of local water harvesting schemes on regional water resources. For example, sand dams function as a type of managed aquifer recharge, and this methodology could be applied to study the effectiveness of various managed aquifer recharge schemes (Standen et al. 2020). Coastal urban centers and reclaimed islands have recently begun using sandy deposits to develop fresh groundwater lenses in a process similar to that of sand dams (Yao et al. 2019), which may also be examined by the methodology outlined here.
Sand dams
Sand dams are small dams built atop impermeable riverbeds in arid regions with infrequent, high-intensity rainfall (see Figure 1). The high-intensity rain washes sand overland, and the sand builds up behind the dam. After a few rainy seasons, the accumulated sand begins storing water between its grains, where the water is naturally filtered and protected from evaporation. The increased subsurface storage also provides recharge that helps raise the groundwater level (Borst & de Haas 2006; Hut et al. 2008; Quilis et al. 2009).
Intensified sand dam construction began in Kenya in 1979, and estimates today suggest Kenya is home to over 3,000 sand dams (Viducich 2015; de Trincheria et al. 2018). Most of these sand dams have been constructed by a myriad of NGOs using private funds and volunteer labor from local communities. This decentralized construction effort has led to a dearth of publicly available information, because few NGOs provide details of their sand dam projects. Most sand dam information comes from a handful of researchers investigating one or two ‘ideal’ sand dams (Borst & de Haas 2006; Aerts et al. 2007; Hut et al. 2008; Quilis et al. 2009). Researchers have not yet examined whether sand dams are impacting the environment at a regional scale.
Most of Kenya's 3,000+ sand dams are in three counties: Kitui, Makueni, and Machakos (de Trincheria et al. 2015), hereafter referred to as the sand dam counties (SDC). This study hypothesizes that the additional water storage provided by the high density of sand dams in the SDC leads to a higher regional groundwater table and increased vegetative cover. One study found that a sand dam can influence the groundwater level up to 350 m upstream and downstream of the dam (Quilis et al. 2009), whereas another reports an area of influence extending up to 2 km upstream of the dam and 500 m to each side of the stream (Ryan & Elsner 2016). Given the 3,000+ sand dams in the SDC, anywhere from 3.4 to 17.2% of the SDC's surface area may be affected. An area of influence this large should be detectable via remote sensing.
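The 3.4–17.2% range above follows from multiplying a per-dam footprint by the number of dams and dividing by the SDC area. The sketch below reproduces that arithmetic under two labeled assumptions: the dams' footprints do not overlap, and the upper-bound footprint is a 2 km × 1 km rectangle (2 km upstream, 500 m to each side of the stream). Because the footprint behind the 3.4% lower bound is not stated, the sketch back-calculates the per-dam area that bound implies rather than guessing a shape. The function names are hypothetical.

```python
# Sketch of the SDC "affected area" arithmetic.
# Assumptions (not stated in the source): dam footprints do not overlap,
# and the upper-bound footprint is a 2 km x 1 km rectangle.

SDC_AREA_KM2 = 34_670  # SDC surface area from the Study area section
N_DAMS = 3_000         # lower bound on the dam count ("3,000+")


def impacted_fraction(area_per_dam_km2: float,
                      n_dams: int = N_DAMS,
                      region_km2: float = SDC_AREA_KM2) -> float:
    """Fraction of the region covered if every dam influences area_per_dam_km2."""
    return n_dams * area_per_dam_km2 / region_km2


def implied_area_per_dam(fraction: float,
                         n_dams: int = N_DAMS,
                         region_km2: float = SDC_AREA_KM2) -> float:
    """Per-dam footprint (km^2) needed to reach a given affected fraction."""
    return fraction * region_km2 / n_dams


# Upper bound: 2 km upstream x (0.5 km + 0.5 km) across the stream.
upper = impacted_fraction(2.0 * 1.0)
print(f"upper bound: {upper:.1%}")

# Per-dam footprint implied by the reported 3.4% lower bound.
print(f"implied lower-bound footprint: {implied_area_per_dam(0.034):.3f} km^2")
```

Under these assumptions the upper bound comes out at about 17.3%, within rounding of the reported 17.2%, and the lower bound implies a footprint of roughly 0.39 km² per dam.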
STUDY AREA AND DATA
Study area
The SDC in south-eastern Kenya (see Figure 2) cover 34,670 km². This study compares water storage changes in the SDC to those in a buffer area of nearly equal size (34,537 km²). Mean annual rainfall within the SDC is approximately 700 mm compared to 750 mm in the buffer area (Fick & Hijmans 2017). Elevation in the SDC ranges from 246 to 1,803 m above sea level with a mean elevation of 830 m (Fischer et al. 2008). Similarly, elevation in the buffer area ranges from 180 to 1,991 m with a mean of 853 m (Fischer et al. 2008). Most SDC soil (92.4%) is well-drained sandy clay loam (Ulsaker & Kilewe 1983; Hengl et al. 2017). The buffer area is also primarily sandy clay loam but includes 27.0% sandy loam (Hengl et al. 2017). The SDC and buffer area are mainly covered with natural and semi-natural terrestrial vegetation (77.0 and 72.8%, respectively), and 21.1% of each area is cultivated terrestrial land (FAO 2002).
Data
Gravity Recovery and Climate Experiment
Gravity Recovery and Climate Experiment (GRACE) measures Earth's gravity field via twin satellites (Tapley et al. 2004). Changes in water storage can be derived from changes in the Earth's gravity field, because most time-variable gravity field fluctuations result from shifting water volumes. Therefore, the GRACE data provides an estimate of vertically integrated total water storage, combining surface water, soil moisture, and groundwater. The satellites record the gravity field at a resolution ranging from 400 to 40,000 km every 30 days (Tapley et al. 2004). The mission, started in March 2002, exceeded its planned lifetime of 5 years and remained operational until October 2017. The GRACE dataset is nearly continuous, with some months missing from 2011 onwards due to battery management efforts (Herman et al. 2012). The raw data from GRACE is processed by three research centers: Center for Space Research (CSR) at the University of Texas at Austin, Jet Propulsion Laboratory (JPL) at California Institute of Technology, and GeoForschungsZentrum (GFZ) Potsdam in Germany (Tapley et al. 2004). The centers each produce monthly spherical harmonic gravity models that JPL converts to gridded data of the change in total water storage relative to a time-mean baseline from 2004 to 2009 at a resolution of 1°, or approximately 111 km. For the study area, the GRACE data has an estimated error of approximately 3.8 cm (Landerer & Swenson 2012).
GRACE data has been used successfully in many studies to explore general trends in water storage at a variety of scales: 45,000 km² (Hachborn et al. 2017), 55,000 km² (Henry et al. 2011), 766,000 km² (Huang et al. 2012), and 2,200,000 km² (Cao et al. 2015), up to the global scale. GRACE data may contain signal leakage errors, but leakage errors are assumed to be homogeneous at the small scale of this study (Chinnasamy & Ganapathy 2018). A GRACE Follow-On mission began in May 2018, providing an opportunity for future studies to apply the methodology developed here with updated and improved data. This study attempts to discern the consequences of man-made structures on water storage by analyzing trends in the GRACE total water storage data.
Normalized Difference Vegetation Index
Normalized Difference Vegetation Index (NDVI) is calculated from spectral bands recorded by the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument every 1–2 days. Vegetation absorbs most of the blue (470 nm) and red (670 nm) wavelengths while reflecting most of the near-infrared radiation. NDVI exploits the contrast between absorbed red and reflected near-infrared (NIR) radiation, computed as (NIR − Red)/(NIR + Red), to produce a grid of NDVI values for Earth's entire land surface (Didan et al. 2015). NDVI, ranging from negative one to one, serves as an indicator of the vegetation present on the land surface. Values close to one indicate dense, healthy vegetation; values near zero indicate bare ground or dead vegetation; and negative values typically indicate water, snow, or clouds. MODIS NDVI grids are available in 16-day and monthly increments at resolutions ranging from 250 m to 1 km. For the purposes of this study, monthly MODIS NDVI at a resolution of 1 km is used (Didan 2015).
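The NDVI ratio can be made concrete with a few lines of code. The sketch below applies NDVI = (NIR − Red)/(NIR + Red) element-wise to made-up surface reflectances for three typical surfaces. The reflectance values and the function name `ndvi` are illustrative assumptions; the sketch shows only the ratio and does not reproduce the MODIS processing chain (atmospheric correction, compositing, quality screening).

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), computed element-wise.

    Inputs are surface reflectances (0-1). Pixels where NIR + Red == 0
    are returned as NaN instead of raising a division error.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out


# Made-up reflectances: dense vegetation, bare soil, open water.
nir = [0.50, 0.30, 0.02]
red = [0.08, 0.25, 0.05]
print(ndvi(nir, red).round(2))  # high for vegetation, near 0 for soil, negative for water
```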
Famine Early Warning Systems Network Land Data Assimilation System
Famine Early Warning Systems Network Land Data Assimilation System (FLDAS) is a set of models designed to provide accurate climate estimates for drought monitoring in data-sparse regions susceptible to food and water insecurity (McNally et al. 2017). FLDAS currently provides daily and monthly estimates of 25 different variables for sub-Saharan Africa. Research-quality FLDAS simulation C results are used for this study (Anderson et al. 2012; Yilmaz et al. 2014; McNally et al. 2017). Simulation C uses the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) and the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) as forcing data (McNally et al. 2017). FLDAS provides simulation C outputs for the Noah Land Surface Model (Noah) at a resolution of 0.1° and Variable Infiltration Capacity (VIC) at a resolution of 0.25° (NASA/GSFC/HSL 2016).
METHODS
The methodology outlined here includes two techniques for quantifying the impact of sand dams: one using GRACE total water storage and one using NDVI. Both techniques are based on developing a generalized linear model that seeks to define the relationship between the presence of sand dams and a magnitude of change in GRACE data or NDVI data. In both instances, covariates such as weather, time of year, and population are included to ensure that the observed changes in GRACE and NDVI can be attributed to sand dams. A significant (p < 0.05) coefficient for the sand dam indicator variable indicates that sand dams have a measurable effect on total water storage or vegetation.
Relating GRACE water storage to the presence of sand dams
Processing GRACE
JPL produces GRACE monthly mass grids from the monthly spherical harmonic gravity models independently generated by CSR, JPL, and GFZ. The monthly mass grids have been processed and are provided relative to the time-mean baseline of the gravity field from 2004 to 2009, inclusive (Swenson & Wahr 2006; Swenson et al. 2008; Cheng et al. 2011; Geruo et al. 2013). The resultant product is a monthly grid of the equivalent water depth of total water storage relative to a time-mean baseline. Variations in the monthly mass grids are attenuated at small scales by the filtering applied during processing. To restore the lost signal, a grid of multiplicative scale factors is applied to the GRACE data. The scale factors minimize the difference between the filtered and unfiltered variations in total water storage (Landerer & Swenson 2012).
The three available GRACE solutions (CSR, JPL, and GFZ) have slightly different error structures. Rather than selecting one solution to analyze, the ensemble mean of the three solutions was used, which reduces the noise in the scatter of the solutions by 5–10 mm root mean square (Sakumura et al. 2014).
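As a rough illustration of the scaling and averaging described above, the sketch below averages the three solutions and applies the gain-factor grid. The array names, shapes, and the function `ensemble_tws` are hypothetical; reading the gridded files is not shown, and NaN-aware averaging is an assumption about how a gap in any one solution would be handled. Assuming a single scale-factor grid is shared by all three solutions, scaling before or after averaging gives the same result.

```python
import numpy as np


def ensemble_tws(csr, jpl, gfz, scale):
    """Scaled ensemble-mean TWS anomaly from the three GRACE solutions.

    csr, jpl, gfz: arrays of shape (n_months, n_lat, n_lon) holding
        equivalent water thickness anomalies relative to the 2004-2009 mean.
    scale: array of shape (n_lat, n_lon) of multiplicative gain factors.
    """
    stack = np.stack([csr, jpl, gfz])   # (3, n_months, n_lat, n_lon)
    mean = np.nanmean(stack, axis=0)    # ensemble mean per month and grid cell
    return mean * scale                 # scale grid broadcasts over months


# Tiny synthetic example: 2 months on a 1 x 2 grid.
csr = np.ones((2, 1, 2))
jpl = 2.0 * csr
gfz = 3.0 * csr
scale = np.array([[1.0, 2.0]])
print(ensemble_tws(csr, jpl, gfz, scale))
```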
The GRACE solutions were obtained for every available month from the beginning of the GRACE mission to the end of 2016 (April 2002 to December 2016; TELLUS 2012). During this period, GRACE data is unavailable for 21 of the potential 177 months, leaving 156 months for this analysis. Eighteen of the 21 missing months occurred during battery management efforts from 2011 to 2017 (Herman et al. 2012).
The Level-3 GRACE data has a resolution of 1°, while the FLDAS dataset, employed as a proxy for climate data in this research, has a resolution of 0.1°. The GRACE data was resampled using bilinear interpolation to a resolution of 0.1°, or approximately 11 km by 11 km, to allow for a one-to-one relationship between the two datasets (Cao et al. 2015).
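The software used for resampling is not stated. A minimal NumPy sketch of bilinear interpolation on a regular latitude–longitude grid is shown below, assuming both source axes are sorted in increasing order; `bilinear_resample` is a hypothetical name. Because bilinear interpolation on a rectilinear grid is separable, it can be done as linear interpolation along longitude followed by linear interpolation along latitude.

```python
import numpy as np


def bilinear_resample(grid, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinear interpolation of a 2-D grid onto new lat/lon axes.

    grid has shape (len(src_lat), len(src_lon)); src_lat and src_lon must be
    increasing. Points outside the source axes are clamped to the edge
    values, which is how np.interp behaves.
    """
    grid = np.asarray(grid, dtype=float)
    # Step 1: interpolate every source row (fixed latitude) onto dst_lon.
    rows = np.array([np.interp(dst_lon, src_lon, row) for row in grid])
    # Step 2: interpolate every resulting column (fixed longitude) onto dst_lat.
    cols = np.array([np.interp(dst_lat, src_lat, col) for col in rows.T])
    return cols.T  # shape (len(dst_lat), len(dst_lon))


# Usage: a field that is itself bilinear is reproduced exactly.
lat = np.array([0.0, 1.0])
lon = np.array([0.0, 1.0])
field = np.array([[0.0, 1.0], [2.0, 3.0]])  # f(lat, lon) = 2*lat + lon
fine = bilinear_resample(field, lat, lon, np.linspace(0, 1, 11), np.linspace(0, 1, 11))
print(fine.shape)
```

In the study's setting, the source axes would be the 1° GRACE cell centers and the destination axes the 0.1° FLDAS cell centers.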
Processing FLDAS for GRACE
As mentioned above, the GRACE data provides total water storage values relative to a time-mean baseline from 2004 to 2009, inclusive. To create a comparable FLDAS dataset, the gridded average value from 2004 to 2009 was calculated for each climate variable and subtracted from the raw data.
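A minimal sketch of this baseline removal is shown below, assuming each FLDAS variable is held as a (month, lat, lon) array together with the calendar year of each month; `anomaly_vs_baseline` is a hypothetical helper name, and treating the baseline as all months of 2004 through 2009 is an assumption.

```python
import numpy as np


def anomaly_vs_baseline(data, years, start=2004, end=2009):
    """Subtract the per-cell mean over years [start, end] (inclusive).

    data: array (n_months, n_lat, n_lon) of one FLDAS variable.
    years: array (n_months,) giving the calendar year of each time step.
    """
    data = np.asarray(data, dtype=float)
    years = np.asarray(years)
    in_base = (years >= start) & (years <= end)
    baseline = np.nanmean(data[in_base], axis=0)  # (n_lat, n_lon) time-mean grid
    return data - baseline                        # broadcasts over months


# Synthetic example: one grid cell, four months.
data = np.array([1.0, 2.0, 3.0, 10.0]).reshape(4, 1, 1)
years = [2003, 2004, 2009, 2010]
print(anomaly_vs_baseline(data, years).ravel())
```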
Validating FLDAS for the study area
Equation (3) can be used to test the validity of substituting FLDAS data for climate data in the study area. FLDAS includes the data necessary to solve Equation (3) for total water storage, which can then be compared with the GRACE-provided total water storage in the region. FLDAS data includes evapotranspiration, storm surface runoff (Qs), baseflow-groundwater runoff (Qsb), and precipitation.
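Equation (3) is not reproduced in this section. Assuming it takes the standard water-balance form ΔS = P − ET − Qs − Qsb, a FLDAS-based storage anomaly comparable to GRACE could be sketched as below. The flux series are assumed to be monthly totals in mm; if they are supplied as rates, they must first be multiplied by the length of each month. The function name is hypothetical.

```python
import numpy as np


def tws_anomaly_from_fluxes(p, et, qs, qsb, years, start=2004, end=2009):
    """Water-balance TWS anomaly, assuming dS = P - ET - Qs - Qsb each month.

    p, et, qs, qsb: 1-D monthly series in mm/month (convert rates first).
    years: calendar year of each month, used to select the baseline period.
    """
    ds = np.asarray(p) - np.asarray(et) - np.asarray(qs) - np.asarray(qsb)
    storage = np.cumsum(ds)  # storage relative to an unknown starting value
    years = np.asarray(years)
    base = storage[(years >= start) & (years <= end)].mean()
    return storage - base    # anomaly relative to the baseline mean, as in GRACE


# Synthetic example: constant 5 mm/month net gain.
anom = tws_anomaly_from_fluxes([10] * 4, [5] * 4, [0] * 4, [0] * 4,
                               [2004, 2004, 2009, 2010])
print(anom)
```

Because the unknown initial storage shifts every value of the cumulative sum equally, it cancels when the baseline mean is subtracted, so the anomaly does not depend on it.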
Generalized linear model for GRACE
To determine whether the 3,000+ sand dams built in the SDC have a collective impact on groundwater storage at a regional scale, monthly generalized linear models (GLMs) were developed using the GRACE data as the dependent variable. As predictor variables, the GLM includes initial water storage, sand dam presence, and a suite of climate variables via principal components. The principal component analysis (PCA) was performed on 15 FLDAS climate variables: evapotranspiration, specific humidity, storm surface runoff, baseflow-groundwater runoff, total precipitation rate, soil moisture 0–10, 10–40, 40–100, and 100–200 cm underground, soil temperature 0–10, 10–40, 40–100, and 100–200 cm underground, near-surface air temperature, and near-surface wind speed. A sensitivity analysis conducted via stepwise regression indicated the contribution of each predictor variable to the overall adjusted R² (Wang et al. 2016).
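The text does not say whether the stepwise regression ran forward, backward, or in both directions. The sketch below assumes greedy forward selection on an ordinary least squares fit and records the gain in adjusted R² as each predictor enters. The function names are hypothetical, and a categorical predictor such as month, represented by several dummy columns, would need to enter as a group rather than column by column as it does here.

```python
import numpy as np


def adjusted_r2(y, X):
    """Adjusted R^2 of an OLS fit of y on X (an intercept column is added)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def forward_stepwise_contributions(y, X, names):
    """Greedy forward selection; returns [(name, gain in adjusted R^2), ...]."""
    remaining = list(range(X.shape[1]))
    chosen, contributions, current = [], [], 0.0  # intercept-only model: 0
    while remaining:
        scores = {j: adjusted_r2(y, X[:, chosen + [j]]) for j in remaining}
        best = max(scores, key=scores.get)
        contributions.append((names[best], scores[best] - current))
        current = scores[best]
        chosen.append(best)
        remaining.remove(best)
    return contributions


# Synthetic example: y depends strongly on x1 and weakly on x2.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 3 * x1 + 0.1 * x2 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x1, x2])
print(forward_stepwise_contributions(y, X, ["x1", "x2"]))
```

The gains telescope, so they sum to the adjusted R² of the full model.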
Principal component analysis
The Kaiser–Guttman criterion was used to determine the appropriate number of principal components to retain such that information loss is minimized. The Kaiser–Guttman criterion dictates that every principal component with an eigenvalue greater than unity is retained. While this is not the most sophisticated technique available, the Kaiser–Guttman criterion is sufficient for datasets with a high communality (Yeomans & Golder 1982).
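A minimal sketch of PCA with the Kaiser–Guttman rule is shown below, assuming the climate variables are standardized so that the eigen-decomposition is performed on their correlation matrix, the setting in which the eigenvalue-greater-than-one threshold applies. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def kaiser_guttman_pca(X):
    """PCA on the correlation matrix; keep components with eigenvalue > 1.

    X: array (n_samples, n_variables), e.g. the 15 FLDAS variables.
    Returns (scores, eigenvalues_kept, loadings_kept).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize columns
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)            # ascending order
    order = np.argsort(eigvals)[::-1]                  # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                               # Kaiser-Guttman rule
    scores = Z @ eigvecs[:, keep]
    return scores, eigvals[keep], eigvecs[:, keep]


# Synthetic example: six variables driven by two independent latent factors.
rng = np.random.default_rng(1)
n = 500
f = rng.normal(size=(n, 2))
X = np.column_stack([f[:, 0] + 0.1 * rng.normal(size=n) for _ in range(3)]
                    + [f[:, 1] + 0.1 * rng.normal(size=n) for _ in range(3)])
scores, ev, loadings = kaiser_guttman_pca(X)
print(scores.shape, ev.round(2))
```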
Finalized GRACE model
To develop training and validation data for each monthly GLM, 43 or 44 points were randomly sampled from each available month (e.g. 43 or 44 points from each available December, for a total of 564 December points). The sampled points were randomly split in half into training and validation sets. The coefficients of Equation (5) were estimated separately for each calendar month, yielding one GLM for every month of the year, and each GLM was validated via cross-validation.
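Equation (5) is not reproduced in this section. Assuming it is a Gaussian GLM with an identity link (equivalent to ordinary least squares) of GRACE water storage on initial water storage, the sand dam indicator, and the retained principal components, one monthly split, fit, and validate pass could look like the sketch below. The data are synthetic, the function names are hypothetical, and NMSE is defined here as the validation residual sum of squares divided by the sum of squared deviations of the validation observations; the paper does not give its NMSE formula, and other normalizations exist. p-values are omitted because they need a t distribution (for example from SciPy).

```python
import numpy as np


def fit_ols(y, X):
    """OLS fit (Gaussian GLM, identity link). Returns coefs, std. errors, t."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se


def split_fit_validate(y, X, rng, train_frac=0.5):
    """Random split; fit on training rows, NMSE on validation rows."""
    idx = rng.permutation(len(y))
    n_train = int(round(train_frac * len(y)))
    tr, va = idx[:n_train], idx[n_train:]
    beta, se, t = fit_ols(y[tr], X[tr])
    pred = X[va] @ beta
    nmse = np.sum((y[va] - pred) ** 2) / np.sum((y[va] - y[va].mean()) ** 2)
    return beta, se, t, nmse


# Synthetic demo (made-up data): 564 points, as in the December example.
rng = np.random.default_rng(42)
n = 564
init_tws = rng.normal(size=n)
sd = rng.integers(0, 2, size=n).astype(float)  # 1 = sand dam area, 0 = buffer
pc1, pc2 = rng.normal(size=(2, n))
y = 0.5 + 0.8 * init_tws + 0.04 * sd + 0.3 * pc1 + rng.normal(scale=0.1, size=n)
X = np.column_stack([np.ones(n), init_tws, sd, pc1, pc2])
beta, se, t, nmse = split_fit_validate(y, X, rng)
print(f"SD coefficient {beta[2]:.3f} (t = {t[2]:.2f}), validation NMSE {nmse:.3f}")
```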
Relating MODIS NDVI to the presence of sand dams
Processing MODIS NDVI and FLDAS
MODIS NDVI, hereafter NDVI, was obtained for the same period as the GRACE total water storage data, April 2002 to December 2016. Invalid NDVI data were omitted, and the data were scaled by 0.0001 (Didan et al. 2015). NDVI has a resolution of 1 km, so the FLDAS dataset was resampled using bilinear interpolation to a resolution of 1 km to allow for a one-to-one relationship with NDVI.
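The scaling and screening step can be sketched as follows. The scale factor of 0.0001 comes from the text; the valid range (−2000 to 10000) and fill value (−3000) reflect our understanding of the MOD13 product documentation and should be checked against the metadata of the files actually used. The constant and function names are hypothetical.

```python
import numpy as np

NDVI_SCALE = 0.0001           # scale factor stated in the text
NDVI_VALID = (-2000, 10000)   # assumed valid range of the stored integers
NDVI_FILL = -3000             # assumed fill value


def scale_modis_ndvi(raw):
    """Convert stored MODIS NDVI integers to NDVI; invalid values become NaN."""
    raw = np.asarray(raw)
    valid = (raw != NDVI_FILL) & (raw >= NDVI_VALID[0]) & (raw <= NDVI_VALID[1])
    return np.where(valid, raw * NDVI_SCALE, np.nan)


print(scale_modis_ndvi([-3000, -2000, 0, 5000, 10000, 12000]))
```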
Generalized linear model for NDVI
To determine whether the 3,000+ sand dams built in the SDC have a collective impact on vegetation, a GLM was developed using NDVI data as the dependent variable. As predictor variables, the GLM includes sand dam presence, month of year, standardized population count, a suite of climate variables via principal components, and an autocovariate to account for spatial autocorrelation. As with the GRACE model, a sensitivity analysis of the predictor variables was conducted via stepwise regression.
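The exact autocovariate in Equation (6) is not shown in this section. A common form, and the one assumed in the sketch below, is an inverse-distance-weighted mean of the response at neighboring points within a fixed radius. The radius, the projected coordinates, and the function name are hypothetical.

```python
import numpy as np


def autocovariate(coords, values, radius):
    """Inverse-distance-weighted mean of neighboring values within `radius`.

    coords: (n, 2) array of projected x, y positions (e.g. km).
    values: (n,) response values (e.g. standardized NDVI).
    Points with no neighbor inside `radius` get an autocovariate of 0.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    w = np.zeros_like(d)
    near = (d > 0) & (d <= radius)  # neighbors only; excludes the point itself
    w[near] = 1.0 / d[near]
    row_sum = w.sum(axis=1)
    out = np.zeros(len(values))
    has_nb = row_sum > 0
    out[has_nb] = (w[has_nb] @ values) / row_sum[has_nb]
    return out


# Three points on a line; the third is out of range of the first.
coords = [(0, 0), (1, 0), (3, 0)]
print(autocovariate(coords, [1.0, 2.0, 5.0], radius=2.5))
```

The full distance matrix needs O(n²) memory, so for many thousands of points a KD-tree neighbor search (for example SciPy's `cKDTree.query_ball_point`) would scale better.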
Spatial autocorrelation
Analyzing spatial data introduces the issue of spatial autocorrelation. Spatial autocorrelation occurs because a point is likely to be more similar to nearby points than to distant ones (Tobler 1970). This phenomenon produces residuals that violate the assumption of independently and identically distributed errors, which may cause incorrect rejection of a true null hypothesis (Anselin 2002). Spatial autocorrelation can be detected via the calculation of Moran's I (Dormann et al. 2007). If Moran's I is statistically significant and/or if the Moran's I plot depicts a strong trend, then spatial autocorrelation may be an issue.
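Global Moran's I and a permutation (Monte Carlo) test of the kind reported for the NDVI residuals can be sketched as follows. The spatial weight matrix is a user-supplied assumption (this section does not specify one), the pseudo p-value tests for positive autocorrelation using the common (count + 1)/(permutations + 1) convention, and the function names are hypothetical. In practice, `values` would be the model residuals.

```python
import numpy as np


def morans_i(values, w):
    """Global Moran's I for values (n,) and spatial weights w (n, n), w_ii = 0."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    w = np.asarray(w, dtype=float)
    n = len(z)
    return (n / w.sum()) * (z @ w @ z) / (z @ z)


def morans_i_mc(values, w, n_perm=999, seed=0):
    """Permutation test: pseudo p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    observed = morans_i(values, w)
    sims = np.array([morans_i(rng.permutation(values), w) for _ in range(n_perm)])
    p = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, p


# Four points on a line, each linked to its immediate neighbors.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(morans_i([1, -1, 1, -1], w))  # alternating pattern: negative I
print(morans_i([1, 1, -1, -1], w))  # clustered pattern: positive I
```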
Finalized NDVI model
The training and validation datasets for the NDVI analysis were developed similarly to the GRACE training and validation datasets. However, the higher resolution of the NDVI data provides a much denser network of points. From each month, 457 or 458 points were randomly sampled, and the sampled points were randomly split in half into training and validation sets.
RESULTS
Sand dam impact on total water storage
GRACE water storage anomalies in the SDC and buffer area are similar. The mean (p = 0.92) and variance (p = 0.49) of the water storage anomalies in the two areas are not statistically different. However, in individual months, the water storage anomalies of the SDC and buffer area do differ significantly (see Figure 3): 60 of the 156 available months display a significant difference between the two regions. Interestingly, a significant difference is most often observed as the region shifts from the dry season into the wet season. This may be a signature of the water retained in sand dams at the end of the dry season, but climatic differences must also be considered.
The validity of using FLDAS data as a proxy for climate data is determined by comparing FLDAS water storage anomalies calculated via Equation (3) with GRACE water storage anomalies in the SDC. The smoothed results (see Figure 4) indicate that the two datasets provide similar information regarding the magnitude and direction of water storage anomalies in the SDC. The FLDAS water storage anomalies largely fall within or near the GRACE error estimated by Landerer & Swenson (2012). The most glaring exception occurs from 2013 to 2014, likely caused by the battery management data gaps in GRACE.
The GLM designed in Equation (5) indicates that the SDC store a significantly different amount of water than the buffer area during 3 months of the year, and that this difference can likely be attributed to the presence of sand dams (see Table 1). The sand dam indicator variable is a significant factor for the months of January (p = 0.025), April (p = 0.037), and November (p = 0.015) while considering the confounding factors of climate and initial water storage. Interestingly, these 3 months showed relatively low fractions of significance in Figure 3, indicating that the difference between total water storage in the SDC and buffer area cannot be attributed solely to climate and initial conditions. In January, sand dams collectively decrease the amount of total water storage by 4.3%, while in April and November, sand dams increase the amount of total water storage by 4.6 and 3.6%, respectively. January, the first month of a dry season, may see lower levels of total water storage in the SDC than in the buffer area due to increased evapotranspiration potential resulting from added water storage during the preceding rainy season. This increased rate of evapotranspiration may reduce the amount of water stored in the SDC to levels below those of the buffer area. April and November, the wettest months of the two rainy seasons, see increased levels of water storage in the SDC.
Model | SD coefficient | SD standard error | SD t-statistic | SD p-value | Model adjusted R² |
---|---|---|---|---|---|
January | −0.043 | 0.02 | −2.26 | 0.025* | 0.68 |
February | −0.003 | 0.02 | −0.18 | 0.858 | 0.43 |
March | 0.019 | 0.01 | 1.50 | 0.134 | 0.61 |
April | 0.046 | 0.02 | 2.10 | 0.037* | 0.76 |
May | 0.027 | 0.01 | 1.85 | 0.066 | 0.79 |
June | −0.005 | 0.01 | −0.31 | 0.754 | 0.75 |
July | 0.005 | 0.01 | 0.34 | 0.731 | 0.73 |
August | −0.020 | 0.01 | −1.37 | 0.171 | 0.69 |
September | −0.001 | 0.01 | −0.08 | 0.937 | 0.76 |
October | 0.006 | 0.01 | 0.57 | 0.570 | 0.79 |
November | 0.036 | 0.01 | 2.45 | 0.015* | 0.83 |
December | −0.005 | 0.02 | −0.21 | 0.836 | 0.62 |
*Significance at p < 0.05.
The sensitivity analysis revealed that had the largest impact on the monthly adjusted R², contributing an average of 0.61. The remaining five predictors have a much smaller average influence on adjusted R², listed from highest to lowest: (0.045), (0.013), (0.013), (0.006), and (0.001). While the categorical sand dam indicator variable contributes little to overall model adjusted R², the variable is sometimes a significant predictor (see Table 1).
Overall, the monthly GLMs fit well, with adjusted R² values ranging from 0.43 to 0.83 and averaging 0.70. In the SDC, the inclusion of the sand dam indicator variable generally results in an improved normalized mean square error (NMSE, see Figure 5). However, the inclusion of the sand dam indicator variable results in a slightly higher NMSE for the southern and northeastern portions of the SDC. These areas likely have fewer functioning sand dams or a lower density of sand dams, reducing the water storage signal. Alternatively, the southern and northeastern regions of the SDC may not show improvement due to aliasing of the GRACE signal (Winsemius et al. 2006) or possibly signal attenuation during GRACE pre-processing (Landerer & Swenson 2012).
Sand dam impact on vegetation
NDVI in the SDC and buffer area is similar in magnitude, but the mean (p = 0.001) and variance (p = 0.001) of NDVI in the two areas are statistically different. Sand dams may collectively have a significant impact on vegetation, but the relationship must be examined in conjunction with confounding factors.
The spatial autocorrelation in NDVI could cause issues with model validity and interpretability. The autocovariate function included in Equation (6) accounts for most of the spatial autocorrelation in the NDVI data (see Figure 6). Including an autocovariate in the model reduces Moran's I of the residuals from 0.4559 to 0.0419, more than a 90% decrease in spatial autocorrelation. A Monte Carlo simulation of Moran's I for the model with an autocovariate indicated that spatial autocorrelation is not an issue (p = 0.01).
The GLM defined by Equation (7) indicates that the SDC has significantly more vegetation than the buffer area, and that this difference can likely be attributed to sand dams. The sand dam indicator variable is significant (p < 0.001) when the GLM accounts for variations in climate, time of year, and population (see Table 2). The presence of sand dams increases NDVI by 0.07 standard deviations. The confounding factors included in the GLM also significantly impact NDVI. The amount of water present in the atmosphere, represented by PC2, has the greatest positive impact on NDVI; for every unit increase in PC2, NDVI increases by 0.19 standard deviations. The month of October has the greatest negative impact on NDVI: NDVI is 0.87 standard deviations lower in October than in January. October falls right at the end of the dry season and the beginning of the rainy season, so the vegetation is expected to be in poor condition in October.
Predictor | Coefficient | Standard error | t-statistic | p-value |
---|---|---|---|---|
Intercept | 0.32 | 0.01 | 30.45 | <0.001* |
PC1 | −0.11 | 0.00 | −77.66 | <0.001* |
PC2 | 0.19 | 0.00 | 83.64 | <0.001* |
PC3 | −0.13 | 0.00 | −41.02 | <0.001* |
PC4 | −0.14 | 0.00 | −48.55 | <0.001* |
A | 0.08 | 0.00 | 138.13 | <0.001* |
January | 0.00 | – | – | – |
February | −0.28 | 0.01 | −20.16 | <0.001* |
March | −0.55 | 0.01 | −38.94 | <0.001* |
April | −0.42 | 0.01 | −28.92 | <0.001* |
May | −0.06 | 0.01 | −4.08 | <0.001* |
June | −0.11 | 0.01 | −7.63 | <0.001* |
July | −0.27 | 0.02 | −17.44 | <0.001* |
August | −0.46 | 0.02 | −29.10 | <0.001* |
September | −0.57 | 0.02 | −36.99 | <0.001* |
October | −0.87 | 0.01 | −61.57 | <0.001* |
November | −0.61 | 0.01 | −41.03 | <0.001* |
December | −0.03 | 0.01 | −2.21 | 0.027* |
Pop | −0.01 | 0.00 | −2.03 | 0.042* |
SD no (0) | 0.00 | – | – | – |
SD yes (1) | 0.07 | 0.01 | 12.18 | <0.001* |
*Significance at p < 0.05.
The sensitivity analysis revealed that the categorical factor had the largest impact on the model adjusted R², contributing 0.392, followed closely by the autocovariate, A, contributing 0.303. The remaining six predictors have a much smaller influence on adjusted R², listed from highest to lowest: (0.095), (0.066), (0.031), (0.017), (0.001), and (0.000). While the categorical sand dam indicator variable contributes little to overall model adjusted R², the variable is a significant predictor (see Table 2).
The vegetation GLM simulates the standardized NDVI based on climate, time of year, population, spatial autocorrelation, and sand dam presence very well, resulting in an adjusted R² of 0.69. Throughout the SDC, inclusion of the sand dam indicator variable results in an improved NMSE (see Figure 7). The overall model fit is good and establishes confidence that the model structure can provide valuable insight into the relationship between the collective impact of sand dams and NDVI.
DISCUSSION
Role of GRACE data in recording sand dam impact
The authors hypothesized that the sand dam's large area of influence coupled with the 3,000+ sand dams in the SDC would lead to substantial groundwater recharge that GRACE could detect. A monthly test of means revealed that some months of the year are more likely to produce differences in water storage in the SDC and buffer area. When accounting for climate factors and initial conditions, the GRACE GLMs indicated that GRACE total water storage data is only sufficient for detecting a significant regional impact of sand dams on water storage during 3 months of the year. The sand dams have a positive impact on total water storage during the months that experience the highest rainfall during each rainy season, April and November. However, the months following April and November do not exhibit a significant increase in SDC total water storage compared to the buffer area. This pattern indicates that sand dams capture more water, but they do not necessarily store significantly more water in their watershed for long periods. The inability to store water long-term can likely be attributed to seepage under the sand dam and high evapotranspiration (de Trincheria et al. 2018; Eisma & Merwade 2020). A study of 30 sand dams in Kenya found that up to 37% of sand dams have severe seepage issues, and up to 87% of sand dams lose over half of their stored water to evaporation (de Trincheria et al. 2015).
Most rainfall in the SDC is lost either to baseflow-groundwater runoff or to evapotranspiration (Figure 8). A properly sited and constructed sand dam minimizes the amount of stored water lost to baseflow-groundwater runoff because it is built atop a near-impermeable streambed such as bedrock or consolidated clay. However, sand dams are most often located in rural communities with limited or no access to subsurface surveying technology, so many sand dams may be constructed upon fractured bedrock, which could lead to significant water loss as water infiltrates into the bedrock (Kosugi et al. 2006). Furthermore, farmers cultivate the riparian zone and irrigate their crops with water from the sand dam, increasing the amount of water lost from the sand dam via evapotranspiration. Water loss through baseflow-groundwater runoff and/or evapotranspiration may therefore further explain why GRACE total water storage data does not confirm a positive, collective impact of sand dams on regional water storage throughout the year.
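As a rough framing of the loss pathways just described, the water stored in a single sand dam over a time step can be written as a simple water balance. The symbols are introduced here for illustration and are not the study's notation; the balance ignores domestic abstraction and counts irrigation withdrawals as part of evapotranspiration.

```latex
\Delta S = Q_{\mathrm{in}} - ET - Q_{\mathrm{bg}}
```

Here ΔS is the change in stored water, Q_in is the rainfall-driven inflow captured by the dam, ET is evapotranspiration from the sand and the irrigated riparian zone, and Q_bg is baseflow-groundwater runoff, including seepage into fractured bedrock. A dam sited on truly impermeable bedrock drives Q_bg toward zero, leaving ET as the dominant dry-season loss; a dam on fractured bedrock loses water through both terms.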
Further complicating the ability of GRACE to detect a year-round, significant impact of sand dams on total water storage is the 1° resolution of the GRACE data. Sand dams have a positive impact on their local environment (Eisma & Merwade 2020), but these impacts may be occurring at too small a scale to be regularly observed via GRACE. The sand dams in the SDC may be constructed too far from each other to have overlapping areas of influence, decreasing the magnitude of their water storage signal. If approximately 50% of sand dams are not functioning properly due to issues of siltation and/or seepage (Viducich 2015; de Trincheria et al. 2018), the issue of GRACE's low resolution is exacerbated.
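A short calculation illustrates why the signal may be diluted at 1° resolution. The sketch computes the area of a 1° latitude-longitude cell on a sphere and the share of that cell covered by the influence areas of functioning sand dams. The dam count per cell, the 1 km² influence area per dam, and the latitude band are hypothetical values chosen for illustration, not figures from the study; the no-overlap assumption makes the result an upper bound.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def grid_cell_area_km2(lat_south, lat_north, width_deg=1.0):
    """Area of a lat-lon cell on a sphere: R^2 * dlon * (sin(lat_north) - sin(lat_south))."""
    dlon = math.radians(width_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (
        math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    )


def influence_fraction(n_dams, influence_area_km2, functioning_share, cell_area_km2):
    """Upper-bound share of a cell covered by functioning dams, assuming no overlap."""
    return min(1.0, n_dams * functioning_share * influence_area_km2 / cell_area_km2)


if __name__ == "__main__":
    cell = grid_cell_area_km2(-2.0, -1.0)  # hypothetical cell between 2°S and 1°S
    frac = influence_fraction(n_dams=100, influence_area_km2=1.0,
                              functioning_share=0.5, cell_area_km2=cell)
    print(f"cell area = {cell:,.0f} km^2, covered share = {frac:.2%}")
```

With these hypothetical inputs, a cell of roughly 12,360 km² would have only about 0.4% of its area under the influence of functioning dams, so even a strong local effect would be averaged over a much larger, largely unaffected footprint.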
Link between increased vegetation and sand dams in the SDC
While the water storage GLM was unable to discern a significant impact of sand dams on regional water storage throughout the year, the vegetation GLM indicated that sand dams have a small, positive impact on the health and density of vegetative cover. Because vegetative cover and groundwater are linked, this result allows conclusions to be drawn about the small-scale impact of sand dams on water storage (Le Maitre et al. 1999; Aguilar et al. 2012; Fu & Burgher 2015; Shemsanga et al. 2018; Eisma & Merwade 2020). This impact is restricted to the influence area of the sand dams and thus occurs at a smaller scale than GRACE can detect in most months.
NDVI is linked to groundwater depth, particularly in arid regions (Fu & Burgher 2015). Vegetation undergoes a marked decrease in height and structural complexity as groundwater retreats from the land surface (Le Maitre et al. 1999), leading to a decrease in NDVI (Fu & Burgher 2015). Sand dams store water just below the land surface, resulting in pseudo-perched groundwater. The pseudo-perched groundwater seeps through the stream banks and has the potential to raise the local groundwater table, supporting increased vegetative growth (Manzi & Kuria 2011; Quinn et al. 2019; Eisma & Merwade 2020). Farmers also grow crops irrigated with sand dam water on the streambanks, further increasing NDVI. The increased natural vegetation and cultivated crops in turn increase transpiration, which lowers the groundwater table and shortens the period over which stored water persists into the dry season.
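For reference, NDVI is the normalized difference of near-infrared (NIR) and red reflectance: dense, healthy vegetation reflects strongly in the NIR and absorbs red light, so NDVI rises with vegetation vigour. The GLM works with standardized NDVI; the exact standardization is not restated here, so the sketch assumes a simple z-score of a pixel's NDVI time series. Reflectance values and function names are hypothetical.

```python
from statistics import fmean, pstdev


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), computed from surface reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / total


def standardize(series):
    """Z-score a time series (assumed standardization); a constant series maps to zeros."""
    mu, sd = fmean(series), pstdev(series)
    if sd == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sd for x in series]


if __name__ == "__main__":
    # Hypothetical reflectances: sparse dry-season cover vs. green riparian vegetation.
    dry, wet = ndvi(nir=0.25, red=0.18), ndvi(nir=0.45, red=0.06)
    print(f"dry-season NDVI = {dry:.2f}, wet-season NDVI = {wet:.2f}")
    print(standardize([dry, wet, 0.5 * (dry + wet)]))
```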
The interaction between the sand dam's influence on the local groundwater table and local vegetation creates a boom–bust dynamic for the vegetation (see Figure 9). During the rainy season, rainfall is stored in the sand dam and raises the local groundwater table, and vegetation grows in response. The vegetation transpires water from the sand dam, drawing down the level of stored water. When the rainy season ends and the sand dam dries up, the vegetation senesces and NDVI decreases. The cycle repeats each rainy season. Figure 9 reflects this cycle: the GLM coefficient is lowest for the month at the beginning of a rainy season and highest at the end of a rainy season.
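The boom–bust feedback described above can be illustrated with a toy monthly model, which is a hypothetical construction rather than the study's method. Rain recharges the dam up to a capacity, a fixed fraction seeps away, vegetation transpires in proportion to its vigour, and vigour relaxes toward a level set by the remaining storage. Every parameter and the rainfall series are invented for illustration.

```python
def simulate_boom_bust(monthly_rain, capacity=100.0, seepage_frac=0.05,
                       max_transpiration=40.0, response=0.5):
    """Toy monthly model of one sand dam and the vegetation it supports.

    storage: water held in the dam's sand (mm), capped at `capacity`
    veg:     vegetation vigour in [0, 1], used here as a stand-in for NDVI
    All parameter values are hypothetical.
    """
    storage, veg = 0.0, 0.0
    history = []
    for rain in monthly_rain:
        storage = min(capacity, storage + rain)        # rainfall recharges the dam
        storage -= seepage_frac * storage              # seepage losses
        transpired = min(storage, max_transpiration * veg)
        storage -= transpired                          # vegetation draws down storage
        target = storage / capacity                    # wetter sand supports more vegetation
        veg += response * (target - veg)               # vegetation responds with a lag
        history.append((storage, veg))
    return history


# Hypothetical bimodal rainfall (mm/month, Jan..Dec) with two rainy seasons.
RAIN = [0, 0, 120, 150, 40, 0, 0, 0, 0, 60, 140, 30]

if __name__ == "__main__":
    for month, (s, v) in zip("JFMAMJJASOND", simulate_boom_bust(RAIN)):
        print(f"{month}: storage={s:6.1f} mm  veg={v:.2f}")
```

Over the first simulated year, vegetation vigour peaks in May and December, at the ends of the two rainy seasons, and after the first rains it reaches its lowest value in September, just before the second rainy season begins. This qualitatively mirrors the pattern in the text; the model is far too simple to say anything about magnitudes.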
The NDVI GLM revealed that sand dams collectively impact NDVI and, by extension, the amount of available water in the region. The methodology was applied regionally in this study but could likely also be applied to other water harvesting technologies, each assessed individually. However, neither NDVI nor GRACE will always be the appropriate remote sensing product for this methodology; a dataset relevant to the function and/or impacts of the technology must be selected. For example, earthen dams create surface water storage. The Normalized Difference Water Index (NDWI), calculated from Landsat green and near-infrared reflectance (McFeeters 1996), could be applied with this methodology to determine whether earthen dams are significantly contributing to available surface water. High-resolution remotely sensed data can be an effective tool for exploring and quantifying the impacts of small-scale water harvesting structures.
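As a hedged illustration of swapping in a different response variable, the sketch computes McFeeters' NDWI from green and near-infrared reflectance (for example, Landsat 8 OLI bands 3 and 5) and the share of pixels classified as open water using a threshold of zero. That share could then serve as the response in a GLM analogous to the NDVI model. The pixel values, threshold default, and function names are assumptions for illustration.

```python
def ndwi(green, nir):
    """McFeeters (1996) NDWI = (Green - NIR) / (Green + NIR); open water tends to be positive."""
    total = green + nir
    if total == 0:
        raise ValueError("Green + NIR must be non-zero")
    return (green - nir) / total


def water_fraction(pixels, threshold=0.0):
    """Share of (green, nir) reflectance pairs classed as open water (NDWI > threshold)."""
    if not pixels:
        raise ValueError("pixels must be non-empty")
    flags = [ndwi(g, n) > threshold for g, n in pixels]
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Hypothetical reflectances: two water pixels behind a dam and two vegetated pixels.
    scene = [(0.08, 0.03), (0.07, 0.02), (0.06, 0.30), (0.05, 0.35)]
    print(f"open-water share = {water_fraction(scene):.2f}")
```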
Future opportunities in the application of methodology
While every effort was made to develop a generalizable methodology that leveraged widely available datasets and well-documented techniques, future applications may benefit from an alternative approach. For example, GRACE data may not be an ideal dataset for small, regional studies. Future applications should consider using available local groundwater data to represent changes in total water storage, assuming that surface storage changes are negligible (Oiro et al. 2020). Furthermore, if GRACE data is appropriate for the selected study area and local groundwater data is available, the GRACE data should be downscaled to improve its resolution and accuracy (Miro & Famiglietti 2018; Seyoum et al. 2019). In addition, principal component analysis (PCA) is inherently linear and may fail to represent trends in select GLM predictors. In such cases, alternative machine learning techniques designed to reduce the complexity of model predictors without losing information, such as genetic programming, should be explored (Meshgi et al. 2014, 2015). Genetic programming can also induce an acceptable model from the available data and a set of model building blocks, which may represent climate variability more effectively than a PCA-style approach (Chadalawada et al. 2020).
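The linearity limitation of PCA can be made concrete with a small sketch. It computes the leading principal component of standardized predictors by power iteration on their correlation matrix; the component score is a weighted sum of the predictors, so a purely non-linear dependence between predictors is invisible to it. This is a generic illustration, not the study's PCA implementation, and the function names and toy data are hypothetical.

```python
import math
from statistics import fmean, pstdev


def standardize(column):
    """Z-score one predictor column (a constant column would divide by zero)."""
    mu, sd = fmean(column), pstdev(column)
    return [(x - mu) / sd for x in column]


def first_principal_component(columns, iterations=200):
    """Leading PCA loadings and explained-variance share, via power iteration.

    The component score is the linear combination sum_j w_j * z_j of the
    standardized predictors z_j, which is why non-linear structure can be missed.
    """
    z = [standardize(c) for c in columns]
    n, p = len(z[0]), len(z)
    corr = [[sum(z[a][i] * z[b][i] for i in range(n)) / n for b in range(p)] for a in range(p)]
    w = [1.0] * p  # starting vector; a pathological start orthogonal to PC1 would fail
    for _ in range(iterations):
        v = [sum(corr[a][b] * w[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    eigenvalue = sum(w[a] * sum(corr[a][b] * w[b] for b in range(p)) for a in range(p))
    return w, eigenvalue / p  # the trace of a correlation matrix equals p
```

For two perfectly linearly related predictors, the first component explains all of the variance. For x = [-2, -1, 0, 1, 2] and x², which is fully determined by x, the correlation is zero and the first component explains only half of the variance, so PCA offers no reduction even though one predictor is redundant.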
CONCLUSIONS
A methodology was developed and tested using two different remote sensing products. The methodology detected and quantified the effectiveness of a water harvesting technology common in south-eastern Kenya. The study revealed that sand dams likely do not significantly affect regional groundwater levels beyond a few months of the year. The GRACE total water storage data indicates that sand dams capture and store additional water during the rainy season, but that this additional storage is not significantly greater than water storage in an area without sand dams beyond the 2 months of the year with the most rainfall. GRACE's spatial resolution may limit the detection of increased water storage provided by sand dams, even when there are thousands of such structures in an area. MODIS NDVI, however, can detect the impact of sand dams on vegetative cover throughout the year. NDVI shows increased vegetation in an area with a high density of sand dams. This increased vegetation suggests higher groundwater levels resulting from the water storage added by sand dams. While GRACE and NDVI datasets were leveraged to quantify the regional impact of sand dams on water storage in this study, application of this methodology is not limited to GRACE and NDVI. Given the wide range of freely available remotely sensed data, spatial GLMs show potential as a low-cost, adaptable technique for assessing water harvesting structures.
ACKNOWLEDGEMENTS
This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1333468.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories (GRACE: https://podaac.jpl.nasa.gov/; MODIS: https://lpdaac.usgs.gov/; FLDAS: https://disc.sci.gsfc.nasa.gov/).