Abstract
A novel methodology for suitable site selection for groundwater development based on river capture, pumping cost, and groundwater potential has been proposed for better groundwater utilisation. River capture and cost map have been generated from a calibrated groundwater model, simulated with forecasted hydrological time series data. The groundwater potential has been calculated with weighted overlay analysis. These three variables have been used to classify the model domain into five zones of groundwater development by K-Means clustering. The area with lower river capture, low cost of pumping, and high groundwater potential is found to be the best location for groundwater extraction. The methodology has been applied to the lower Ain River basin, in France.
HIGHLIGHTS
LSTM is a superior algorithm as compared to NAR for hydrological time-series forecasting.
The capture map has been utilised as an aquifer response to pumping in groundwater potential zonation.
Low proximity to streams does not justify high groundwater potential.
INTRODUCTION
One of the most significant and essential natural resources is groundwater (GW), which is kept in the underground geological formations of the earth's crust. It provides water for home, commercial, industrial, agricultural, and other development projects (Ayazi et al. 2010; Nampak et al. 2014). The presence and spread of GW are influenced by various anthropogenic and natural variables. Therefore, studying potential GW resources is essential for their wise development and usage (Bai et al. 2022). Studies on the effects of climate change indicate that there won't be much change in GW recharge in the foreseeable future (Crosbie et al. 2013; Taylor et al. 2013). The comparison of anthropogenic pumping and climate-driven factors showed that the decrease in GW resources is mainly caused by the combined effects of excessive pumping and climatic changes, albeit the contribution of pumping could potentially be much more significant than the natural replenishment (Wu et al. 2020). As the land and water resource development intensifies, it is becoming clear that the development of either GW or surface water affects the other. Therefore, it is crucial to identify the best locations for GW development projects to lessen the strain on surface water resources and GW storage.
GW potential zonation (determination of best locations for GW extraction) methodologies have come a long way with the use of physical methods (like borehole drilling and pumping tests), weighted overlay analysis (WOA) (Srivastava & Bhattacharya 2006; Pandey et al. 2014) and machine learning methods (Kumar et al. 2021; Bai et al. 2022). All procedures are based on the data collected by remote sensing, except for physical methods, which are expensive and time-consuming. These approaches do not consider the aquifer's response to the anticipated GW development (WOA and Machine Learning). If the aquifer's parameters are known, modelling the GW flow process can analyse the impact of pumping on the aquifer's capacity to support the expected GW development.
A combined effect of GW pumping on the aquifer and the river has been determined using the concept of a capture fraction map/capture map. The term ‘capture map’ was first articulated by Theis (1940), where he mentioned that water withdrawal by a well is balanced by a loss somewhere. He also mentioned that the cone of depression around a well could expand to recharge and discharge areas, thus altering the inflow and outflow patterns. Later, in 1954, Robert E. Glover and Glenn G. Balmer developed an analytical solution to study the transient capture of surface water from a river by a constant rate withdrawal well (Glover & Balmer 1954). In the last two decades, hypothetical models (Leake et al. 2010; Nadler et al. 2018), transient models based on previous data (Barlow & Dickerman 2001; Cosgrove & Johnson 2004), and steady-state models (Cosgrove & Johnson 2004) were used to generate capture maps. Leake et al. (2010) described the capture as fractions in terms of both volume and rates, which can be instantaneous or cumulative.
The objectives of this research work are as follows:
To forecast the input time-series data up to 40 years with LSTM and NAR.
To generate the river capture and cost map in the problem domain from the outputs of a calibrated GW flow model, simulated for the forecasted time-series data.
The calculation of GWPI with WOA.
To find the optimum pumping locations based on river capture, pumping cost, and GWPI.
METHODOLOGY
Data forecasting
A forecasting simulation model must have future time-series input data as its primary component. The accuracy of anticipated time series for various hydrological parameters, such as rainfall, evapotranspiration, and river discharge is always a determining factor in the capture map's accuracy. This study uses LSTM and NAR to predict stream discharge, rainfall, and stream water level. LSTM is a special type of Recurrent Neural Network (RNN) in which repeating unit (memory units) has a complex structure to ‘remember’ large sequences. These memory units also eliminate the problem of vanishing gradient encountered during backpropagation to obtain the weights of the neural network (Jozefowicz et al. 2015). NAR neural network applies a discrete nonlinear autoregressive network model for forecasting data (Ruiz et al. 2016). The network consists of multiple hidden layers (n-layers) that perform the nonlinear operations and an output layer that performs the regression operation. The hyperparameters of LSTM and NAR have been tuned with Linear Bayesian Optimisation (LBO) to achieve the best prediction accuracy. The Root-Mean-Square Error (RMSE) has been used as the accuracy metric as it combines the size of prediction errors for different data points into a single indicator of prediction accuracy. Each error has an impact on RMSE proportional to the size of the squared error; as a result, more significant errors have an outsized influence on RMSE. After comparing the RMSE of LSTM and NAR on test data, the most accurate method has been utilised to forecast the time series data.
GW modelling and simulation
A transient GW flow model is required to generate the river capture and cost maps. The GW model has been conceptualised in GMS (Aquaveo) environment with MODFLOW 2005 (Harbaugh 2005). The model parameters have been calibrated with the Parameter ESTimation (PEST) program (Doherty 2015). After calibration, the forecasted time-series data have been used to simulate the GW flow process for future stress periods.
Capture map and cost map
Groundwater potential index
Here, is the normalised weight of GW-influencing factors calculated using Saaty's AHP method (Saaty 2014) and is the weight assigned to each subclass of the GW-influencing factor. The thematic raster for different hydrological parameters has been prepared in the GIS environment. Initially, 10 GW-influencing parameters (elevation, Topographic Wetness Index, slope, distance from river, DD, land use map, surface geomorphology, soil HC, multi-resolution ridge top flatness, and MRVBF) have been selected for WOA. Expert analysis and previous research on the study area have shown that the five parameters (elevation, HC, MRVBF, DFR, and DD) have more importance than others. Therefore, these five hydrological parameters have been used to calculate GWPI.
The topography significantly influences the GW recharge and its movement, as the GW table frequently reflects the local terrain (Weiss 2001). TWI is used to quantify the topographical control of the hydrological processes. It is defined as log(As/tan(β)), where As is the local upslope area per unit contour length and tanβ is the steepness (slope) in degree (Moore et al. 1991). GW potential is greatly influenced by soil HC, which is a primary driver of GW recharge. The MRVBF index represents the flatness and lowness of the valley bottoms. When compared to channels, valley bottoms offer a good indication of the breadth of a subsurface flow path in flat terrain and hence a good GW potential (Gallant & Dowling 2003). Since the active river aquifer exchanges can replenish the withdrawn GW, proximity to rivers and streams increases GW potential.
K-Means clustering
K-Means clustering (MacQueen 1967) is an unsupervised learning algorithm that groups the unlabelled dataset into different clusters, such that the sum of distances between the data point and their corresponding cluster centres is minimised. The three independent variables (dimension), namely river capture, pumping cost, and GWPI have been used to cluster the corresponding locations into five clusters, representing areas in different zones for GW development. The elbow method is utilised to determine the optimal number of five clusters. The elbow method is an iterative process in which the WCSS (within-cluster sum of the square) is minimised as per the number of clusters. The optimum number of clusters is selected corresponding to the highest change of WCSS, represented by a drastic change in the WCSS vs. the number of cluster plots (the elbow). The K-Means clustering has been implemented with the sci-kit learn library (Pedregosa et al. 2011) in a Python environment.
IMPLEMENTATION
Study area
Time series data forecasting
. | . | . | RMSE . | ||
---|---|---|---|---|---|
. | Hyperparameters . | Optimum range . | Series 1 . | Series 2 . | Series 3 . |
LSTM | Shape of input features (lags) | 5–70 | 25.3 | 36.33 | 47.38 |
Maximum epochs | 60–230 | ||||
Initial learning rate | 0.001–0.05 | ||||
Mini-Batch Size | 11–60 | ||||
Regularisation weight | 10−7 to 10−5 | ||||
Number of hidden units | 5–70 | ||||
Stacked LSTM | Same as above | Same as above | 24.94 | 37.17 | 55.7 |
NAR | Shape of input features (lags) | 6–42 | 382.8 | 734.4 | 917.9 |
Number of hidden units | 5–28 |
. | . | . | RMSE . | ||
---|---|---|---|---|---|
. | Hyperparameters . | Optimum range . | Series 1 . | Series 2 . | Series 3 . |
LSTM | Shape of input features (lags) | 5–70 | 25.3 | 36.33 | 47.38 |
Maximum epochs | 60–230 | ||||
Initial learning rate | 0.001–0.05 | ||||
Mini-Batch Size | 11–60 | ||||
Regularisation weight | 10−7 to 10−5 | ||||
Number of hidden units | 5–70 | ||||
Stacked LSTM | Same as above | Same as above | 24.94 | 37.17 | 55.7 |
NAR | Shape of input features (lags) | 6–42 | 382.8 | 734.4 | 917.9 |
Number of hidden units | 5–28 |
The selected algorithm for different time series data has been highlighted corresponding to the lowest RMSE value.
GW modelling
The model conceptualisation has been done using available geological and hydrological data. The modelled aquifer thickness varies from 240 to 550 m, as observed in the lithology data by BRGM (Bureau de Recherches Géologiques et Minières), where SRTM data were used to get the surface elevation of the area. For the underlying sediments, a two-layered model was created, with each layer considered homogenous, horizontal, and isotropic. For the older sediment, the initial estimates of HC were taken to be 155.5 m/day, and for the younger alluvium, 259.2 m/s. The aquifer HC was estimated from the pumping test and the literature data obtained by BRGM. The area has been modelled with 250 m square grids and two layers. Two types of boundary conditions (no-flow boundary and specified flow boundary) were used to define the model domain. The inflow and outflow flux has been calculated by the observed GW heads using Darcy's flux (Darcy 1856). The Ain River is modelled as head-dependent boundary with stream package (STR1) in GMS. Figure 2(b) demonstrates how the watershed division line and the no-flow barrier were used to determine the eastern and western boundaries.
Further, the river stage data, measured at five locations in the study area, was obtained from Banque Hydro France. The river stage data from 2002 to 2015 was also examined to understand the data trend. This analysis showed a fluctuation of 2.54 m at Chazey-sur-Ain and 1.67 m at Pont-d'Ain, data from other tributaries, such as the Albarine River, also show a maximum fluctuation of 1.36 m. The water table variation is observed to be in the range of 1.2–3.7 m, which shows a stable GW scenario.
Rainfall was solely taken into account as a way to recharge the GW system in the area. The recharge input values were computed using rainfall data from the Météo France database and applied equally throughout the recharge polygons built in the model domain based on land use. The starting value of GW recharge was taken as 50, 10, 50, 60, and 80% of the rainfall (average 1,650 mm/year) for the six land use classes, namely vegetation, built-up, agriculture, fellow land, and sandy area respectively, which was further calibrated. With an extinction depth of 2 m, the projected annual evapotranspiration is 638 mm. The 10-year monthly rainfall average was taken for the future forecast, and corresponding recharge values were used. The model was calibrated from 2008 to 2010 using all observation wells, then again from 2010 to 2012 using the remaining wells, depending on the available data. Data from 2012 to 2015, which were available for four observation wells, were used to validate the model. The GW modelling of the area is further discussed in Bajpai et al. (2022).
GW capture and pumping cost
The calibrated GW model was run for future stress periods by taking the forecasted time-series data as input. The model for calculating the capture fraction was coupled with MODFLOW by modifying the hdf5 input files in MATLAB. The GW model has been used to identify the effects of long-run pumping (for the next 40 years) on water budgeting by considering the discharge well on the cell by cell basis. The cells lying in the alluvium zone around the river were only considered for the analysis.
GWPI
S/N . | Parameters . | Normalised weights . | Subclasses . | Weights . |
---|---|---|---|---|
1 | DFR (m) | 0.092 | <1,230.42 | 0.559 |
1,230.42–2,605.61 | 0.243 | |||
2,605.61–4,089.36 | 0.111 | |||
4,089.36–5,754.06 | 0.052 | |||
>5,754.06 | 0.035 | |||
2 | DD (km/km2) | 0.062 | <0.67 | 0.049 |
0.67–1.48 | 0.074 | |||
1.48–2.38 | 0.143 | |||
2.38–3.52 | 0.247 | |||
> 3.52 | 0.488 | |||
3 | MRVBF | 0.112 | < 1.32 | 0.037 |
1.32–3.4 | 0.064 | |||
3.4–5.21 | 0.129 | |||
5.21–7.15 | 0.265 | |||
>7.15 | 0.506 | |||
4 | Elevation (m) | 0.204 | <211 | 0.589 |
211–233.94 | 0.234 | |||
233.94–273.36 | 0.093 | |||
273.94–296.17 | 0.052 | |||
>296.17 | 0.033 | |||
5 | HC (m/day) | 0.53 | <18.53 | 0.032 |
18.53–72.67 | 0.047 | |||
72.67–126.8 | 0.135 | |||
126.8–830.53 | 0.233 | |||
>830.53 | 0.553 |
S/N . | Parameters . | Normalised weights . | Subclasses . | Weights . |
---|---|---|---|---|
1 | DFR (m) | 0.092 | <1,230.42 | 0.559 |
1,230.42–2,605.61 | 0.243 | |||
2,605.61–4,089.36 | 0.111 | |||
4,089.36–5,754.06 | 0.052 | |||
>5,754.06 | 0.035 | |||
2 | DD (km/km2) | 0.062 | <0.67 | 0.049 |
0.67–1.48 | 0.074 | |||
1.48–2.38 | 0.143 | |||
2.38–3.52 | 0.247 | |||
> 3.52 | 0.488 | |||
3 | MRVBF | 0.112 | < 1.32 | 0.037 |
1.32–3.4 | 0.064 | |||
3.4–5.21 | 0.129 | |||
5.21–7.15 | 0.265 | |||
>7.15 | 0.506 | |||
4 | Elevation (m) | 0.204 | <211 | 0.589 |
211–233.94 | 0.234 | |||
233.94–273.36 | 0.093 | |||
273.94–296.17 | 0.052 | |||
>296.17 | 0.033 | |||
5 | HC (m/day) | 0.53 | <18.53 | 0.032 |
18.53–72.67 | 0.047 | |||
72.67–126.8 | 0.135 | |||
126.8–830.53 | 0.233 | |||
>830.53 | 0.553 |
RESULTS
The series with the least RMSE in testing was used to develop the MODFLOW model for water consumption-based capture maps and cost map generation. It is observed from Figure 3 that, unlike NAR, the LSTM-based model can capture complex features for a long-term forecast. However, in the case of predictions averaged over 16 weeks, both LSTM and NAR models provide similar results. Either single LSTM or stacked LSTM has the least RMSE. For the hyper-tuned LSTM and NAR models, the optimum range of hyperparameters has been tabulated in Table 1, along with the RMSE values. Most of the long-term forecast is highly uncertain, but the 4-month interval average RMSE was low. In addition, the granularity of time series data for forecasting required for the MODFLOW model was high; hence these predictions were used to generate capture maps.
Once the forecasted data were created up to the year 2060, GW flow was simulated with these data. Predicted heads through the model were found stable (standard deviation – 0.375 m) with a slight upward trend (1.8 cm per year). Further, capture maps related to river-inflow and river-outflow were generated with the highest capture of 99.78% and the lowest of 36.39%. Another type of map was also developed by considering the cost of pumping, which is the function of water table depth.
The cost map represents the pumping cost which is assumed to be proportional to the depth of the GW. For the area close to the river within a 1 km buffer, the water table is below 10 m of depth from the surface, mostly limited to around 1 m. On the lower left side of the domain, some area in the vicinity of the river has large depth. This can be attributed to the hilly region and inaccurate modelling of the area. Nearly all the points on the right side of Ain River have a water table depth of less than 17 m. The GWPI values were extracted for the model domain for clustering along with river capture and cost.
K-Means clustering with three variables (river capture, cost, and GWPI) has been used to classify the model domain in five zones of GW development. Capture map of the river Ain helps to identify the location with less impact on the river flow. It is also affected by less GW inflow into the river or excessive water extraction through the river due to excessive pumping. The clusters have been analysed as
1. Low cost, high capture, and high GW potential: | These cells represent high river water capture, which has high GW potential. The pumping cost in these areas is low due to readily available river water. However, the dependency on the river in these areas can be detrimental to the river flow. |
2. Low cost, moderate capture, and low GW potential | These areas represent good pumping zones for short durations since the GW potential is low. |
3. Low cost, high capture, and low GW potential | These areas have a low cost of pumping, but most of the water is from the river only, which makes the area of low GW potential. |
4. Low cost, low capture, and high GW potential | These areas are best for new GW development projects, with low cost of pumping and low river capture. |
5. High cost, high capture, and mixed GW potential | These highly scattered zones are the worst locations for GW development works due to the high cost of pumping and immense strain on river water. |
1. Low cost, high capture, and high GW potential: | These cells represent high river water capture, which has high GW potential. The pumping cost in these areas is low due to readily available river water. However, the dependency on the river in these areas can be detrimental to the river flow. |
2. Low cost, moderate capture, and low GW potential | These areas represent good pumping zones for short durations since the GW potential is low. |
3. Low cost, high capture, and low GW potential | These areas have a low cost of pumping, but most of the water is from the river only, which makes the area of low GW potential. |
4. Low cost, low capture, and high GW potential | These areas are best for new GW development projects, with low cost of pumping and low river capture. |
5. High cost, high capture, and mixed GW potential | These highly scattered zones are the worst locations for GW development works due to the high cost of pumping and immense strain on river water. |
DISCUSSION
The prediction capabilities of LSTM and NAR have shown variation based on the temporal scale of the time series data. LSTM has proven superior in predicting temporal variation with daily time series data compared to NAR. However, they both performed more or less similarly for the 4-month average time series data. The GW model is an essential part of the whole research work, and the accuracy of the capture map depends on the GW model. Therefore, the model should be calibrated well before applying it for long-term prediction. In forecasting, volatility, a granularity of time series, and lead-time of prediction play a significant role in the selection of forecasting techniques. The study incorporates the prediction of time series through the forecasting method. The NAR neural network is computationally inexpensive compared to LSTM and is suitable for non-volatile time series with low lead time. Although LSTM-based networks are computationally expensive, they can predict a much more volatile time series with a higher lead time. Stochastic forecasting techniques can be used instead of deep learning-based for more accurate long-term predictions with wide uncertainty intervals.
The concept of capture has proven very useful in understanding and managing the water resource potential of an area at the reach scale as it represents the aquifer response to the pumping. A balanced abstraction of GW, considering the extent of stream capture, can prove to be a better way to manage our existing water resources. A calibrated physical-based GW model with accurate aquifer properties can provide a realistic and accurate aquifer response to a given pumping stress. Incorporating the capture map with GW potential mapping can reduce the uncertainty in the delineation of suitable pumping locations. As evident from Figure 6, the proximity to the river does not justify the development of GW extraction projects (as in the lower part of the river, there are many cells which are not suitable for pumping).
The lower Ain River basin has a very stable GW table. Increased industrial and agricultural activities in the lower Ain River basin have, in turn, increased the stresses on the area's water resources in the future. In the present study, an effort was made to analyse the long-run impact of increasing water pumping on river water budgeting by developing a capture map and cost map for the lower Ain Valley. Developed maps are applied to identify the potential of the area. The capture map suggested that nearly 90 percentiles of the considered area satisfies nearly 65.54% of the total water demand from the river. The river capture in the middle part of the study area, near Chazey-sur-Ain, was more affected by pumping, whereas the upper part near Pont-d'Ain was found to be less affected by pumping.
CONCLUSION
A novel methodology to determine suitable GW pumping locations based on the concept of capture fraction and GW potential has been proposed in this research work. The pumping cost was also considered in the unsupervised clustering (K-Means) to find the best locations to minimise the cost and the capture from the river in high GW potential zones. The LSTM and NAR have been utilised to predict time-series data to simulate the GW model up to 2060. This research is expected to help the decision makers, where the models are used for evaluating and testing the management outcomes related to GW and surface water resources. This research provides a novel constraint for the future development of GW based on balanced GW and surface water resource management. The water quality is also a constraint that should be included in holistic GW resource management. Although the best pumping locations have been derived by minimising the river capture, the aquifer's vulnerability to contamination due to pumping can be integrated with this methodology in future research work.
ACKNOWLEDGEMENT
This work was performed within the framework of the EUR H2O'Lyon (ANR-17-EURE-0018) of Université de Lyon (UdL) through the ‘Investissements d'Avenir’ programme operated by the French National Research Agency (ANR) and through a joint Indian-French International Research Project (IRP-CNRS) on ‘Effects of River-Aquifer Exchanges on Riverine Ecosystem Resilience to Global Change Comparative Approach of the Ganga and Rhône River Basin Networks’ (2021–2026).
FUNDING
No funding received.
AUTHORS CONTRIBUTIONS
S.G., A.O., and P.H. drafted and devised the idea of the work and supervised its timely progress and guided in the technical and literature aspects. G.S. and O.A. created the groundwater models. M.B. carried out the model simulations and forecasting. R.K. drafted the results and formatted the literature accordingly. All authors discussed the results and contributed to the final manuscript.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.