Abstract

Land subsidence in response to declining water table at plains under sparse data is investigated using ALPRIFT, introduced recently by the authors at the stage of its proof-of-concept. ALPRIFT is a framework, which pools consensually together seven general-purpose data layers with a scoring system of prescribed rates accounting for local variations and prescribed weights accounting for their relative importance. It is a subsidence vulnerability indexing (SVI) approach, which estimates relative values and is subject to inherent subjectivities. The paper treats the transformation of SVI into a risk indexing (RI) capability through a scheme, in which ALPRIFT breaks down into ALRIF, characterising passive local effects and into water-driven PT, characterising active system-wide effects. The addition of passive and active processes renders total vulnerability but their products render a measure of risk index. A modelling strategy is formulated for SVI and RI at two levels to treat inherent subjectivities and involves data fusion by using catastrophe theory. The strategy is applied to an aquifer subject to decline in water table at the coast of Lake Urmia, with sparse data. The results provide evidence for the proof-of-concept on SVI and RI using ROC/AUC performance metrics.

HIGHLIGHTS

  • Use ALPRIFT for subsidence risk and vulnerability indexing by two levels of learning.

  • Break ALPRIFT to ALRIF (Passive Vulnerability Index-PVI) & PT (Active – AVI).

  • PVI + AVI give Total Vulnerability Index (TVI) and PVI*AVI give Risk Indexing (RI).

  • At Levels 1 and 2, PVI/AVI/TVI/RI are mapped by GA and ANN respectively.

  • Results for ALPRIFT, PVI, AVI, TVI and RI are fit-for-purpose and match with InSAR.

Graphical Abstract

Graphical Abstract
Graphical Abstract

INTRODUCTION

A decline in water table is widely known to induce subsidence due to a subsequent loss of hydrostatic forces in the form of land settlements including both gradual downward movements and sudden sinking without significant horizontal movements. Subsidence is a highly variable process and has a wide range of impacts including adverse effects on safety, interruptions of services and transportation, as well as environmental impacts and damage. Research on subsidence is topical and some recent works use experimental techniques to understand the complexity of the problem. For instance, Li et al. (2014) reports on a set of experiments on the deformation behaviour of sands under abstraction–recharge cycles. Ma et al. (2010) identify the hysteresis effect or time-dependent lag in deformation when the water level recovers. Wei et al. (2017) investigate experimentally the deformation of samples taken from a confined aquifer and aquitard with lithology of silty clay, silt and fine sands and they characterise elastic–plastic deformation for all samples. Studies of this nature contribute to the better understanding of subsidence and complement earlier investigations, e.g., Bouwer (1977), who shows through simple calculations that a decline of 10 m in water table can induce as much as 5–50 cm subsidence and hence this may be taken as a rule-of-thumb. The diversity of topical research on subsidence is outlined below but the primary aim of the paper is to develop risk management tools for land subsidence when data availability is sparse.

Existing research trends are summarised in Table 1 in broad terms, which categorise them as: (i) mathematical models use modelling capabilities of groundwater flows to predict the decline in water table and thereby to facilitate studies on subsidence and their impacts (e.g., Modoni et al. 2013); (ii) remote sensing techniques are used to monitor land subsidence to investigate effects of hydrogeological settings (e.g., Gao et al. 2016); and (iii) use GIS techniques to study subsidence problems. The latter group is suitable for cases with sparse data but Nadiri et al. (2018a) argue that there is not much published work yet. They introduce ALPRIFT as a new technique, where past techniques are often simplistic, e.g., the hazard risk assessment index techniques (Huang et al. 2012). The paper precludes the techniques in the first and second categories but is focused on innovating ALPRIFT, as detailed below.

Table 1

Examples of studies on groundwater-induced land subsidence for a few international occurrences

GroupReferenceModels or dataModel or softwareLocation
Empirical equation and mathematical models Modoni et al. (2013)  Ground deformation Classical principle of Terzaghi (1943) Bologna (Italy) 
 Subsoil characterisation   
 Groundwater regime   
Faunt et al. (2016)  Surface-water inflows CVHMa California (USA) 
 Land-use maps   
 Climate data   
Shrestha et al. (2017)  Physical characteristics SHETRAN for GWL predictions Kathmandu Valley (Nepal) 
 Hydro-meteorology GIS processing for overlaying compressibility, Soil thickness, GWL fluctuation rasters  
 GW abstraction   
 Compressibility raster   
 Soil thickness raster   
 GWL fluctuation   
Deng et al. (2018)  Hydrological data such as: MODFLOW Tianjin (China) 
 Rainfall infiltration coefficient SUB package  
 The extinction depth of evaporation   
 Permeability coefficient   
 Specific storage   
 Elastic skeleton storage   
 Non-elastic skeleton storage rate   
 Vertical anisotropy   
Sundell et al. (2019)  Borehole data Geostatistical soil-stratification model A railway shaft and tunnel in Stockholm and Varberg (Sweden) 
  Hydrogeological data Inverse calibrated groundwater model  
  Damage data Elasto-plastic model of subsidence  
   A model to estimate damages  
InSAR processing Gao et al. (2016)  Envisat ASAR images DORISb Beijing (China) 
Chen et al. (2016)  Envisat ASAR images DORIS Beijing (China) 
 TerraSAR-X stripmap   
Hu et al. (2019)  GPS data GAMMA software Beijing (China) 
 Sentinel-1   
Zhou et al. (2019)  ENVISAT ASAR GAMMA software Beijing (China) 
  RADARSAT-2   
GIS based Hu et al. (2009)  Hazard data layers: 1. Groundwater exploitation intensity; 2. Cumulative subsidence volume; 3. Land subsidence velocity Overlay analysis Tianjin (China) 
 Vulnerability data layers: 1. Construction land proportion; 2. Population density; 3. Gross Domestic Product per km2   
 Disaster reducing data layers: 1. Urbanisation level; 2. Volume of reducing GW exploitation; 3. Length of level survey per km2   
Huang et al. (2012)  1. Annual recovery rate of GWL; 2. Thickness of the confined aquifer; 3. Thickness of the soft clay Hazard risk assessment of land subsidence Wuxi city (China) 
Pradhan et al. (2014)  1. Altitude; 2. Slope; 3. Aspect; 4. Lithology; 5. Distance from fault; 6. Distance from river; 7. NDVIc; 8. Soil; 9. Stream power index; 10. Topographic wetness index EBFd; FRe Kinta Valley (Malaysia) 
Ghorbanzadeh et al. (2018)  1. Land use; 2. GWL; 3. Aspect; 4. DEM; 5. Distance to streams; 6. Rainfall; 7. Density of wells; 8. Slope; 9. Lithology ANFISf Amol (Iran) 
Nadiri et al. (2018a)  ALPRIFT ALPRIFT; SFLg Shabestar (Iran) 
Sadeghfam et al. (2020a)  ALPRIFT Fuzzy Catastrophe Scheme Marand (Iran) 
Nadiri et al. (2020)  ALPRIFT A modelling strategy at two levels Ardabil (Iran) 
Present study ALPRIFT A modelling strategy at two levels; Data fusion by catastrophe theory Tasuj (Iran) 
GroupReferenceModels or dataModel or softwareLocation
Empirical equation and mathematical models Modoni et al. (2013)  Ground deformation Classical principle of Terzaghi (1943) Bologna (Italy) 
 Subsoil characterisation   
 Groundwater regime   
Faunt et al. (2016)  Surface-water inflows CVHMa California (USA) 
 Land-use maps   
 Climate data   
Shrestha et al. (2017)  Physical characteristics SHETRAN for GWL predictions Kathmandu Valley (Nepal) 
 Hydro-meteorology GIS processing for overlaying compressibility, Soil thickness, GWL fluctuation rasters  
 GW abstraction   
 Compressibility raster   
 Soil thickness raster   
 GWL fluctuation   
Deng et al. (2018)  Hydrological data such as: MODFLOW Tianjin (China) 
 Rainfall infiltration coefficient SUB package  
 The extinction depth of evaporation   
 Permeability coefficient   
 Specific storage   
 Elastic skeleton storage   
 Non-elastic skeleton storage rate   
 Vertical anisotropy   
Sundell et al. (2019)  Borehole data Geostatistical soil-stratification model A railway shaft and tunnel in Stockholm and Varberg (Sweden) 
  Hydrogeological data Inverse calibrated groundwater model  
  Damage data Elasto-plastic model of subsidence  
   A model to estimate damages  
InSAR processing Gao et al. (2016)  Envisat ASAR images DORISb Beijing (China) 
Chen et al. (2016)  Envisat ASAR images DORIS Beijing (China) 
 TerraSAR-X stripmap   
Hu et al. (2019)  GPS data GAMMA software Beijing (China) 
 Sentinel-1   
Zhou et al. (2019)  ENVISAT ASAR GAMMA software Beijing (China) 
  RADARSAT-2   
GIS based Hu et al. (2009)  Hazard data layers: 1. Groundwater exploitation intensity; 2. Cumulative subsidence volume; 3. Land subsidence velocity Overlay analysis Tianjin (China) 
 Vulnerability data layers: 1. Construction land proportion; 2. Population density; 3. Gross Domestic Product per km2   
 Disaster reducing data layers: 1. Urbanisation level; 2. Volume of reducing GW exploitation; 3. Length of level survey per km2   
Huang et al. (2012)  1. Annual recovery rate of GWL; 2. Thickness of the confined aquifer; 3. Thickness of the soft clay Hazard risk assessment of land subsidence Wuxi city (China) 
Pradhan et al. (2014)  1. Altitude; 2. Slope; 3. Aspect; 4. Lithology; 5. Distance from fault; 6. Distance from river; 7. NDVIc; 8. Soil; 9. Stream power index; 10. Topographic wetness index EBFd; FRe Kinta Valley (Malaysia) 
Ghorbanzadeh et al. (2018)  1. Land use; 2. GWL; 3. Aspect; 4. DEM; 5. Distance to streams; 6. Rainfall; 7. Density of wells; 8. Slope; 9. Lithology ANFISf Amol (Iran) 
Nadiri et al. (2018a)  ALPRIFT ALPRIFT; SFLg Shabestar (Iran) 
Sadeghfam et al. (2020a)  ALPRIFT Fuzzy Catastrophe Scheme Marand (Iran) 
Nadiri et al. (2020)  ALPRIFT A modelling strategy at two levels Ardabil (Iran) 
Present study ALPRIFT A modelling strategy at two levels; Data fusion by catastrophe theory Tasuj (Iran) 

aCentral Valley Hydrologic Model; bDelft Object oriented Radar Interferometric Software; cNormalized Difference Vegetation Index; dEvidential Belief Function; eFrequency Ratio; fAdaptive Neuro Fuzzy Inference System; gSugeno Fuzzy Logic.

The paper builds on vulnerability mapping by the ALPRIFT framework as presented by Nadiri et al. (2018a) and Sadeghfam et al. (2020a), and tests different strategies of risk mapping problem, where mapping produces relative values suitable to cases with sparse data using largely general-purpose data not collected for such studies. The success of the concept of risk is often indicated by the diversification of its application and adaptation of its definitions in a variety of ways. Generally, some of the approaches to calculate risk include: (i) probabilistic approach which quantifies risk by multiplying the consequences of hazard with their likelihood, as reviewed by Khatibi (2011); (ii) combination of load and resistance, as reviewed by Tung et al. (2005) and Sadeghfam et al. (2018); and (iii) indexing risk as a product of vulnerability and hazard (as reviewed in Living with risk (2004)). The probabilistic techniques providing absolute values of risk are often referred to as risk quantification techniques, which use the probabilistic approaches, but these are precluded in the paper for their extensive data requirements. For sparse data, risk indexing (mapping) techniques are used and the paper transforms vulnerability indexing by ALPRIFT into a risk indexing problem.

ALPRIFT is the acronym of seven data layers, which comprise: aquifer media (A), land use (L), pumping of groundwater, recharge (R), aquifer thickness impact (I), fault distance (F) and decline of water table (T). A further data layer comprises results of Interferometric Synthetic Aperture Radar (InSAR). Sadeghfam et al. (2020a) examine closely these data layers and note that the ALRIF data layers (five of seven) have the capacity only to prevail at local scales in accounting for geogenic and natural processes; whereas the PT data layers (the remaining two) prevail at the system-wide scale owing to the hydraulic properties of water acting as a conduit for the processes stemming from anthropogenic activities. For instance, the data layer A is bound to vary at a local scale without any correlation between its occurrence throughout a study area; whereas the P data layer has the capacity to impact system-wide under the right conditions through hydraulic processes. Notably, the paper refers to ALRIF as passive vulnerability index (PVI), whereas PT is referred to as active vulnerability index (AVI).

The distinction on passive and active processes inherent in ALPRIFT (i.e., PVI and AVI) is a justification for developing a risk indexing capability but this needs to be tested under different conditions. The concepts underlying PVI and AVI lead to formulating two problems: (i) the additive model of PVI (ALRIF data layers) and AVI (PT data layers) to serve as the total vulnerability indexing (TVI) problem; (ii) the multiplicative PVI and AVI model is developed as a novel risk index (RI) problem. A critical examination of these new indices is vital to ensure that PVI, AVI, TVI and RI are properties contained in ALPRIFT data layers but not noise. The paper formulates a modelling strategy at two levels. At Level 1, genetic algorithm (GA) is used to optimise the weights of data layers without testing its performance. At Level 2, supervised artificial neural network (ANN) is used, in which the target values are ascertained through data fusion to serve the training and testing phases. The data fusion approach presented by the paper is also and is based on catastrophe theorem.

Besides building evidence for PVI, AVI, TVI and RI that represent significant signals, further novelties in the paper are as follows. (i) A modelling strategy is formulated at two levels but with different selections as normally carried out by the author's ongoing programme of research, e.g., vulnerability indexing of land subsidence (Nadiri et al. 2018a, 2020; Sadeghfam et al. 2020a), vulnerability to groundwater contamination (Nadiri et al. 2018b), groundwater level prediction (Moazamnia et al. 2019). (ii) The data fusion in the paper applies to combining PVI, AVI or RI with InSAR results to produce more consistent results, the algorithm for which is through the innovative use of catastrophe theory, deriving its basis from the decision theory given by Cheng et al. (1996). Data fusion is used widely and the best example is the human mind capable of combining and making sense from different senses. Notably, artificial intelligence (AI) techniques (GA, ANN, fuzzy logic) are now established modelling tools and therefore the paper suffices just to their specification. The strategy is investigated at Tasuj aquifer, which suffers from decline of water table and consequently land subsidence.

METHODOLOGY

This section presents the rationale for transforming ALPRIFT, presented by Nadiri et al. (2018a), into a risk mapping technique, which comprises a new thinking on ALPRIFT.

The thinking on transforming ALPRIFT into a risk mapping technique

ALPRIFT is treated through a new thinking, the rationale for which comprises: (i) a breakdown of the ALPRIFT data layers into PVI (ALRIF data layers) and AVI (PT data layers); (ii) the additive model of PVI and AVI provides the TVI problem and their product the RI problem.

Basic ALPRIFT

Subsidence risk mapping presented by the paper builds on the ALPRIFT framework presented by Nadiri et al. (2018a) for mapping subsidence vulnerability indices (SVI) over aquifer areas, where mapping or indexing techniques produce relative values. As a mirror of the DRASTIC framework by Aller et al. (1987), ALPRIFT is a collective contribution of seven data layers defined in Table 2, which are: aquifer media (A), land use (L), pumping of groundwater, recharge (R), aquifer thickness impact (I), fault distance (F) and decline of water table (T). The rationale for ALPRIFT comprises: (i) divide a study area into square grids (pixel) and associate their GIS model with seven data layers; (ii) use a scoring system, in which local physical variations are accounted for by assigning rates values in the range from 1 to 10, and relative importance of the data layers are accounted for by assigning rate values in the range from 1 to 5. SVI for each pixel is processed, as follows:
formula
(1)
where the r-subscript denotes rate as specified in Table 3, originally given by Nadiri et al. (2018a), and w-subscript denotes weight with the following prescribed values: , ; , , , , .
Table 2

Definition, required raw data and process for seven data layers

Passive Vulnerability Indexing (PVI) Aquifer media (A) Definition Soil texture (mixture of gravel, sand, silt, clay) forms the porous media and has direct proportionality with vulnerability of land subsidence 
Raw data 28 geological logs at the location of observation wells 
Process Assigning rate based on recommendation by Nadiri et al. (2018a); Interpolate by IDW techniques 
Land use (L) Definition Human activities can affect directly vulnerability. Nadiri et al. (2018a) categorised land uses to different groups as: mining/resources extraction, irrigated farming, dam construction, built-up, dry farming/grassland, barren land 
Raw data Sentinel-2 image satellite on 1 July 2017 with 20 m resolution 
Process Image processing by ENVI software 
Recharge (R) Definition High recharge can mitigate the effect of water decline and has indirect proportionality with vulnerability of land subsidence 
Raw data Slope: obtained by Landsat satellite image; Soil permeability: obtained by geological logs; Precipitation: obtained by Tasuj meteorological station 
Process Assign rates based on Piscopo (2001); Overlay 
Impact of aquifer thickness (I) Definition The higher aquifer thickness (saturated/unsaturated), greater subsidence potential 
Raw data Geological logs; Geo-electrical profiles 
Process Interpolate by IDW techniques 
Fault distance (F) Definition Faults and related tectonic movement can amplify subsidence of granular material. Hence, distance from faults is indirectly proportional to subsidence vulnerability 
Raw data Location of faults regarding geological map 
Process Calculate Euclidian distance by GIS software 
Active Hazard Indexing (AVI) Pumpage of groundwater (P) Definition Groundwater pumpage to meet the demand is an external factor that can induce land subsidence, therefore, it has direct proportionality with subsidence 
Raw data Annual volume of groundwater discharge at abstraction wells 
Process Draw Thiessen polygon for observation wells; Calculate the equivalent height of groundwater pumpage for each polygon and assign the obtained values for the corresponding observation well; Interpolate by inverse distance weighted (IDW) technique 
Water table decline (T) Definition Water table decline increases effective stress and induces subsidence 
Raw data Groundwater level time series at 28 observation wells 
Process Calculate trend (slope) of water table decline/increase for each observation well; Interpolate the obtained trend by IDW techniques 
Passive Vulnerability Indexing (PVI) Aquifer media (A) Definition Soil texture (mixture of gravel, sand, silt, clay) forms the porous media and has direct proportionality with vulnerability of land subsidence 
Raw data 28 geological logs at the location of observation wells 
Process Assigning rate based on recommendation by Nadiri et al. (2018a); Interpolate by IDW techniques 
Land use (L) Definition Human activities can affect directly vulnerability. Nadiri et al. (2018a) categorised land uses to different groups as: mining/resources extraction, irrigated farming, dam construction, built-up, dry farming/grassland, barren land 
Raw data Sentinel-2 image satellite on 1 July 2017 with 20 m resolution 
Process Image processing by ENVI software 
Recharge (R) Definition High recharge can mitigate the effect of water decline and has indirect proportionality with vulnerability of land subsidence 
Raw data Slope: obtained by Landsat satellite image; Soil permeability: obtained by geological logs; Precipitation: obtained by Tasuj meteorological station 
Process Assign rates based on Piscopo (2001); Overlay 
Impact of aquifer thickness (I) Definition The higher aquifer thickness (saturated/unsaturated), greater subsidence potential 
Raw data Geological logs; Geo-electrical profiles 
Process Interpolate by IDW techniques 
Fault distance (F) Definition Faults and related tectonic movement can amplify subsidence of granular material. Hence, distance from faults is indirectly proportional to subsidence vulnerability 
Raw data Location of faults regarding geological map 
Process Calculate Euclidian distance by GIS software 
Active Hazard Indexing (AVI) Pumpage of groundwater (P) Definition Groundwater pumpage to meet the demand is an external factor that can induce land subsidence, therefore, it has direct proportionality with subsidence 
Raw data Annual volume of groundwater discharge at abstraction wells 
Process Draw Thiessen polygon for observation wells; Calculate the equivalent height of groundwater pumpage for each polygon and assign the obtained values for the corresponding observation well; Interpolate by inverse distance weighted (IDW) technique 
Water table decline (T) Definition Water table decline increases effective stress and induces subsidence 
Raw data Groundwater level time series at 28 observation wells 
Process Calculate trend (slope) of water table decline/increase for each observation well; Interpolate the obtained trend by IDW techniques 
Table 3

The prescribed rates for ALPRIFT frame work by Nadiri et al. (2018a) 

Aquifer media (A)a
Land use (L)
Pumping of groundwater (P)
Recharge (R)
Impacts of aquifer thickness (I)
Fault distance (F)
Decline of water table (T)
RangeRateRangeRateRangeRateRangeRateRangeRateRangeRateRangeRate
Clay 8–10 Mining/resources extraction 9–10 <0.0001 0 ≤ r < 4 10 0 ≤ r < 25 0 ≤ r < 1 10 0 ≤ r < 0.2 
Silt 8–9 Irrigated farming 7–9 0.0001 ≤ r < 0.005 4 ≤ r < 9 25 ≤ r < 55 1 ≤ r < 2 0.2 ≤ r < 0.5 
Karstic formations 6–8 Dam construction 6–9 0.005 ≤ r < 0.01 9 ≤ r < 14 55 ≤ r <90 2 ≤ r < 3 0.5 ≤ r <0.9 
Sand 3–5 Built-up (residential, etc.) 4–8 0.01 ≤ r < 0.5 14 ≤ r < 19 90 ≤ r <130 3 ≤ r < 4 0.9 ≤ r < 1.4 
Gravel 2–3 Transportation 3–4 0.5 ≤ r < 1 19 ≤ r < 24 130 ≤ r < 175 4 ≤ r < 5 1.4 ≤ r < 2 
Rock types, e.g., sedimentary, etc. 1–3 Dry farming/grassland 1–2 1 ≤ r < 5 >24 175 ≤ r < 225 >5 2 ≤ r < 2.7 
Oxidised organic soil 8–10 Barren land 5 ≤ r < 20   225 ≤ r < 280   2.7 ≤ r < 3.5 
    20 ≤ r < 40   280 ≤ r < 340   3.5 ≤ r < 4.4 
    40 ≤ r < 65   340 ≤ r < 405   4.4 ≤ r < 5.4 
    >65 10   >405 10   >5.4 10 
Aquifer media (A)a
Land use (L)
Pumping of groundwater (P)
Recharge (R)
Impacts of aquifer thickness (I)
Fault distance (F)
Decline of water table (T)
RangeRateRangeRateRangeRateRangeRateRangeRateRangeRateRangeRate
Clay 8–10 Mining/resources extraction 9–10 <0.0001 0 ≤ r < 4 10 0 ≤ r < 25 0 ≤ r < 1 10 0 ≤ r < 0.2 
Silt 8–9 Irrigated farming 7–9 0.0001 ≤ r < 0.005 4 ≤ r < 9 25 ≤ r < 55 1 ≤ r < 2 0.2 ≤ r < 0.5 
Karstic formations 6–8 Dam construction 6–9 0.005 ≤ r < 0.01 9 ≤ r < 14 55 ≤ r <90 2 ≤ r < 3 0.5 ≤ r <0.9 
Sand 3–5 Built-up (residential, etc.) 4–8 0.01 ≤ r < 0.5 14 ≤ r < 19 90 ≤ r <130 3 ≤ r < 4 0.9 ≤ r < 1.4 
Gravel 2–3 Transportation 3–4 0.5 ≤ r < 1 19 ≤ r < 24 130 ≤ r < 175 4 ≤ r < 5 1.4 ≤ r < 2 
Rock types, e.g., sedimentary, etc. 1–3 Dry farming/grassland 1–2 1 ≤ r < 5 >24 175 ≤ r < 225 >5 2 ≤ r < 2.7 
Oxidised organic soil 8–10 Barren land 5 ≤ r < 20   225 ≤ r < 280   2.7 ≤ r < 3.5 
    20 ≤ r < 40   280 ≤ r < 340   3.5 ≤ r < 4.4 
    40 ≤ r < 65   340 ≤ r < 405   4.4 ≤ r < 5.4 
    >65 10   >405 10   >5.4 10 

aAt each grid point with complex set of strata, a scoring system is used for the A data layer, according to which the depth of each stratum is multiplied by the prescribed rate. The sum of all the scores is used as a representation of the aquifer media data layer.

The appropriate general-purpose and site-specific data for each of the seven data layers at each pixel are processed at the preliminary stage towards forming datasets, in which the minimum possible SVI is 23 and its maximum possible value is 230. The best practice procedure for the data is given later.

An insight into ALPRIFT – a rationale for risk mapping

The domain of the influence of each of the ALPRIFT data layers in Equation (1) can be local or system-wide. Similar to Sadeghfam et al. (2020a), the paper argues that a local domain of influence would refer to the correlation among a number of pixels as a measure of PVI of the ALRIF data layers. Their common properties include: (i) prescribed ratings for ALRIF are such that higher SVI values are indicative of a greater potential for subsidence; (ii) they are related to natural geogenic processes; and (iii) while it is reasonable to expect correlation at local scales, owing to a diversity of interplaying factors, system-wide correlations are unlikely. The remaining two data layers of PT are related to the action of the water table, in which P is an anthropogenic factor and T is a direct response to it. These together have a potential for system-wide domain of effects and account for activating hydrostatic pressure through withdrawing from any control volume. Withdrawing hydrostatic pressure triggers active subsidence processes and hence AVI, but can also be referred to as active load or pressure. Notably, the R data layer has also the potential for system-wide effects but it is included in ALRIF, as it accounts for natural processes.

The potential of the ALPRIFT data layers for local and system-wide effects may be represented as follows:
formula
(2a)
formula
(2b)
AVI may prevail under annual fluctuation of water table regimes but these are likely to be small; whereas under persistent declining water table, AVI is likely to be system-wide. The two quantities of PVI and AVI invoke similarities with the concept of risk in terms of hazard and likelihood, as in the classical definition of risk. Generally, risk quantification refers to the scope of tools capable of estimating absolute values of risk defined in terms of mathematical products of consequences of hazards and their likelihoods, expressed as follows (Almoussawi & Christian 2005; Khatibi 2011):
formula
(3a)

The likelihood dimension concerns the loss of hydrostatic pressure and amplifies the potential for compaction and consolidation towards subsidence. This is accounted for by P (Pumpage) and T (water table). These effects are acted spontaneously to trigger subsidence.

The key to Equation (3a) is that the concept of risk is open-ended and the definition is adapted to each problem area. For instance, for relative risks leading to RI or mapping the risk concept is often expressed as (Living with Risk 2004):
formula
(3b)
Notably, vulnerability in Equation (3b) matches consequences of hazard in Equation (3a) and hazard that of likelihood. The semantic mismatch should not prevent the application of either of them but it could overshadow the underlying concepts. As a result, the paper puts emphasis on the term passive and active as the key for any equation expressing risk, in which the potentials for adverse effects (or harm) are passive, but an agent such as water is needed to ensure that the potential adverse effects reach system-wide. These are expressed as follows:
formula
(3c)
If AVI is limited to the normal annual fluctuation, the impact of AVI is likely to be confined to local areas and therefore an additive model is also appropriate to produce TVI, expressed as:
formula
(3d)

Processing InSAR – interferometric synthetic-aperture radar

Although vulnerability or risk index is not dimensionally equivalent to subsidence measured in length, each of these quantities are normalised in the range of 0–1 so that they can be mapped onto one another. In this way, the measured subsidence, either in situ or by satellite, is treated as proxy for either vulnerability or risk. Sentinel-1 satellite SAR data are observed information records on Earth's surface, which go back to 2010. They are processed by InSAR using phase difference between two SAR observations. Phase is defined as the fraction of one complete sine wave cycle and is often highly correlated with terrain topography and therefore SAR images together with the phase information contain amplitude information related to the strength of radar responses.

The interferometric wide (IW) swath products in this study comprise a 250 km swath with a spatial resolution of 5 m × 20 m, which include three sub-swaths captured by terrain observation with progressive scans SAR (TOPSAR). This technique provides homogeneous image quality throughout the swath with uniform signal-to-noise ratio (SNR) and distributed target ambiguity ratio (DTAR). The procedures in InSAR processing lead to producing surface elevations in eight steps and are detailed in Figure 1.

Figure 1

Data layer preparation, InSAR possessing and Level 1 strategy to mapping PVI, AVI, TVI and RI.

Figure 1

Data layer preparation, InSAR possessing and Level 1 strategy to mapping PVI, AVI, TVI and RI.

Primary concepts and tools used for modelling strategy

The paper uses GA, ANN and data fusion together with the cusp type of catastrophe functions and therefore they are specified in this sub-section for serving as building blocks for the formulation of a modelling strategy on PVI, AVI, LVI and RI and these are below.

Genetic algorithm (GA)

The paper uses GA, originally given by Holland (1992), as an optimisation technique to identify the values of the weights. It generates high-quality solutions to optimisation and search problems and is a widely used bio-inspired optimisation method by emulating such genetic operators as mutation, crossover and selection (Mitchell-Olds 1996). Its search process uses four evolutionary processes: (i) initialisation, (ii) selection, (iii) crossover, (iv) mutation (Davis 1991). The paper uses a best practice procedure, in which the initial population evolves as follows: (i) the population in each iteration forms a generation with an assessment of the fitness of each individual in the population; (ii) the genome of each individual evolves and forms a new generation through a modification (recombined and mutated randomly); (iii) the new generation of candidate solutions is used in the next iteration of the algorithm; and (iv) the process is terminated at the maximum number of generations or when the modelled fitness level reaches a designated value. The formulated objective function to identify the weights is given later.

Artificial neural network (ANN)

ANNs emulate the working processes in the brain and are parallel information processing systems through a set of neurons for transforming input signals to output signals by using a system of weights expressed as activation functions. The neurons in a layer are not connected to each other but each one in a layer is connected to all the neurons of the next layer. The paper uses a multi-layer feed-forward perceptron (MLP) network, where its topology comprises an input layer, a hidden layer and an output layer. Learning the weight parameters between the input–hidden and hidden–output layers uses the Levenberg–Marquardt (LM) algorithm, hence MLP-LM. The topology of the hidden layer is identified through a trial-and-error procedure. Implementations of ANN are detailed by Haykin (1999) and ASCE (2000).

Data fusion

The modelling strategy formulated by the paper innovates by introducing a data fusion technique that uses catastrophe theory rooted in the decision theory by Cheng et al. (1996). Data fusion is defined by See & Abrahart (2001) as the amalgamation of information from different data sources, which is widely practised in electrical engineering. It is characterised at signal level, pixel level, feature level and decision level (Jiang et al. 2011), each with its own definition and procedures. Abdelgawad & Bayoumi (2012) present another categorisation as low-level fusion, medium-level fusion, high-level fusion and multilevel fusion.

Data fusion models are formulated for inference, estimation, feature maps, aggregation, abstract sensors, classification and compression (Abdelgawad & Bayoumi 2012). A variety of techniques supports data fusion and they include statistical matching, Bayesian inference (as reviewed by Chen & Han 2016), moving average filters (Rodriguez et al. 2015) and grey relational analysis (Bai et al. 2016). The paper innovates by using cusp catastrophe to combine results of a lower level model for PVI, AVI or RI by pairing them with InSAR results as inputs. The outputs of PVI, AVI or RI serve as targets for the Level 2 model.

Catastrophe theory serving data fusion

Catastrophe theory, developed by Thom (1977), serves as a basis to the multi-criteria decision theory with wide applications to different fields of groundwater, such as groundwater potential field (Sadeghfam et al. 2016), groundwater intrinsic/specific vulnerability analysis to pollution (Nadiri et al. 2018b) and saltwater intrusion (Sadeghfam et al. 2020b). Catastrophe theory is characterised by control parameters and state variables with a range of characteristic functions. In this study, only the cusp catastrophe function is used, which gives the following function:
formula
(4)
where a is taken as PVI, AVI or RI and b is taken as InSAR results. For further information on types of catastrophe functions, reference may be made to Cheng et al. (1996).

Model learning strategy for vulnerability and risk indexing at two levels

While the paper uses the scoring system in ALPRIFT, it treats the inherent subjectivity in weights by introducing a modelling strategy at two levels. They comprise: Level 0: pre-process ALPRIFT data layers and prepare the InSAR layer; Level 1: learn the weights associated with each data layer using GA and prepare PVI, AVI, TVI or RI maps; and Level 2: carry out data fusion by using PVI, AVI or RI and InSAR results to produce target data for ANN models; train and test ANNs using ALPRIFT data layers but under direct supervision.

Level 0: pre-processing

The paper refers to the activities prior to the implementation of the modelling strategy as Level 0 and these comprise any set of activities prior to the risk/vulnerability mapping tasks. Activities at Level 0 are concerned with addressing data availability; deciding the scope of the study and decisions made on transforming raw data into datasets, as well as formulating a modelling strategy.

Level 1: using GA for optimisation

The modelling strategy at Level 1 employs GA, as specified in the section ‘Genetic algorithm’ and uses all the data points in the identification process without testing the performance of the model. The objective function is formulated as the sum of absolute difference between risk index and normalised subsidence obtained by InSAR and is expressed as follows:
formula
(5)
where Z is objective function; are the unknown weight values to be identified; i denotes the cell counter and N the total number of cells and this is minimised by GA. Once the weights of the data layers are identified, the values of PVI, AVI, TVI and RI are processed at each and all pixels.

Level 2 using ANN for learning

The modelling strategy at Level 2 is rather complex and comprises the following steps: (i) data fusion for the production of target quantities of PVI, AVI and RI; (ii) formulating three supervised ANN models; and (iii) using the three ANN models for mapping PVI, AVI and RI. These are outlined below and illustrated in Figure 2.

Figure 2

Level 2 strategy to mapping PVI, AVI, TVI and RI.

Figure 2

Level 2 strategy to mapping PVI, AVI, TVI and RI.

Step 1 – Data fusion: A novel data fusion technique is introduced, which uses Equation (4) and comprises two input parameters of a and b as follows: (i) the a-parameter takes the variable values of PVI, AVI or RI at each and all pixels fed from GA modelling at Level 1; and (ii) the values of the b-parameter take the InSAR value at each and all pixels. These are expressed as:
formula
(6a)
formula
(6b)
formula
(6c)

It is noted that each of Equations (6a)–(6c) has an output, which use the normalised value of PVI, AVI, RI and InSAR at each pixel but are adjusted through the nth root. The choice of the 2nd root for a and the 3rd root for b signify one scheme but other schemes are worth exploring.

Step 2 – Supervised ANN models: Three ANN models are constructed for mapping PVI, AVI and RI. The inputs to these three ANN models use the rated data layers and the target values are fed from Step 1 at Level 2. The models are then trained and tested through the usual trial-and-error procedure to identify their appropriate topology.

Step 3 – Mapping: The maps of the three ANN models are produced for PVI, AVI and RI but TVI mapping is carried out as the sum of PVI and AVI.

Performance metrics

The performances of each mapping activity for PVI, AVI, TVI and RI (ALPRIFT, GA and three ANN models) are studied by the receiver operating characteristic (ROC) curve and the area under curve (AUC). Swets (1988) developed this matric to measure the accuracy of a diagnostic system. It divides the events regarding the diagnosis into four groups as true positive (TP), true negative (TN), false positive (FP) and false negative (FN). In the subsidence problem, an event refers to hazard, vulnerability and risk indices, while diagnosis refers to Earth's displacement. The ROC curve plots FP proportion versus TP proportion at various threshold settings. A preferable ROC curve is the one with deviation towards the upper left corner. AUC quantifies the accuracy of the ROC curve, defined as the proportion of the area beneath the ROC curve to the total area. Thus, AUC varies between 0.5 and 1. AUC = 0.5 indicates no discrimination or equal FP and TP proportion while its ROC curve lies along the major diagonal line. On the other hand, AUC = 1 indicates perfect discrimination and its ROC curve follows the upper and left axes.

The performances of the ANN models for the mapping of PVI, AVI, TVI and RI use determination coefficient (R2) and root mean square error (RMSE) metrics. The value of R2 for ‘perfect’ model fits is 1 but when its value approaches zero, its performance become poorer as an indication of no correlation. The best RMSE value is 0 but has no upper limit, although the lower the value the better the performance.

Jenks optimisation method

The actual values of PVI, AVI, TVI and RI at each pixel are variables but this hampers extracting the patterns for an understanding of their inherent variabilities. Therefore, the results are communicated by their classifications into bands using the Jenks optimisation method. It is a data clustering method to identify the optimum arrangement for different classes through finding the minimum average deviation from the class mean by maximising the deviation of each class from the means of other classes. It reduces the variance within the classes and maximises the variance between classes as the main goals of the Jenks optimisation method (Jenks 1967). This paper uses five bands as follows: Band 1 (low), Band 2 (relatively low), Band 3 (moderate), Band 4 (relatively high), Band 5 (high). Using this technique reduces the subjectivity associated with indices’ classification by expert judgement.

STUDY AREA

System description

Tasuj (also known as Tesi) plain is located in the west of East Azerbaijan province, northwest Iran and forms the northern coast of Lake Urmia (see Figure 3(a)), currently distressed due to water level decline over recent years since 2000. The plain is bounded by Mishov and Qatir Uchan Mountains at its north, Shabestar plain at its east and Salmas plain at its west. The aquifer and plain formed from alluvial deposits and cover an area of 295 km2 (see Figure 3(b) and 3(c)). In general, the diversity of the formations in the basin is wide and all three groups of rocks including igneous, metamorphosed and sedimentary are observable in the geological formations from Precambrian to the recent era. Figure 3(c) also illustrates the spatial location of fault in the south-east of the study area.

Figure 3

Study area: (a) location map, river and observation wells; (b) groundwater flow direction and abstraction wells; (c) lithological map.

Figure 3

Study area: (a) location map, river and observation wells; (b) groundwater flow direction and abstraction wells; (c) lithological map.

This aquifer is unconfined and serves as the main source for domestic and agricultural uses in the city of Tasuj and the villages within the plain. The aquifer and seasonal watercourses flow to Lake Urmia in the direction of north-to-south (see Figure 3(a) and 3(b)). Notably, the general slope of the plain decreases from the north-east, Qatir Uchan Mountains, to south-west, shoreline of Lake Urmia. There are 69 qanats, 41 springs and 144 deep and semi-deep tube wells which discharge an estimated annual volume of 16 × 106 m3 from the aquifer (East Azerbaijan Water Authority (EAWA) 2012). According to the Emberger (1930) method, the study area is characterised as an arid and cold climate zone, and the maximum and minimum temperature are, respectively, 33 °C and −11 °C at Tasuj station in a ten-year period (EAWA 2008–2018). The average annual precipitation is 290 mm during a ten-year period (provided by Meteorological Organization of Iran 2008–2018).

The decline slopes are 2.3 and 8.6 cm per month for Lake Urmia and Tasuj aquifer, respectively, using linear regression analysis. The concern over the rapid decline of groundwater levels at the aquifer is attributable to policy failure with serious environmental and ecological impacts, e.g., saltwater intrusion and land subsidence.

Data preparation

ALPRIFT data layers

Best practice procedure for the seven ALPRIFT data layers is summarised in Table 2 along with the specification of raw data sources and required processing, similar to Nadiri et al. (2018a). The preparation of data layers L and T (land use and water table) in the paper differs from Nadiri et al. (2018a), as they used the decline in one-year period, whereas the paper uses the trend of water table decline as a more representative measure of decline.

Land use was prepared by image processing through Envi software in three steps, pre-processing, processing and post-processing. The pre-processing step applies geometric and atmospheric corrections by the following steps: (i) use normalized difference vegetation index (NDVI) to identify different land uses; (ii) define the determinable land uses as representative of local knowledge and image interpretations by the supervised classification approach to classify different land uses based on the maximum likelihood method. The results are evaluated using local knowledge and expert judgement with further modifications, if necessary.

InSAR images

Table 4 represents six InSAR single look complex (SLC) images from Sentinel-1A satellite acquired during April 2017 to April 2018. These images were processed five times as illustrated in the last two columns. The final result of InSAR processing is cumulative displacement for five interferometry analyses as discussed in the section ‘Processing InSAR’.

Table 4

Details of SAR images

Acquisition dateAcquisition timeOrbit numberOrbital headingTemporal base line (days)Master imageSlave image
2017.04.22 03:00:50 16,251 Descending 60 2017.04.22 2017.06.21 
2017.06.21 07:24:45 17,126 Descending 71 2017.06.21 2017.09.01 
2017.09.01 06:17:39 18,176 Descending 83 2017.09.01 2017.11.23 
2017.11.12 06:22:22 19,226 Descending 61 2017.11.23 2018.01.23 
2018.01.23 07:17:00 20,276 Descending 71 2018.01.23 2018.04.05 
2018.04.05 03:00:55 21,326 Descending    
Acquisition dateAcquisition timeOrbit numberOrbital headingTemporal base line (days)Master imageSlave image
2017.04.22 03:00:50 16,251 Descending 60 2017.04.22 2017.06.21 
2017.06.21 07:24:45 17,126 Descending 71 2017.06.21 2017.09.01 
2017.09.01 06:17:39 18,176 Descending 83 2017.09.01 2017.11.23 
2017.11.12 06:22:22 19,226 Descending 61 2017.11.23 2018.01.23 
2018.01.23 07:17:00 20,276 Descending 71 2018.01.23 2018.04.05 
2018.04.05 03:00:55 21,326 Descending    

Datasets

The square grid sizes selected in this study are 20 m × 20 m and this renders the total number of pixels to be 736,740. The dataset for the three ANN models were divided randomly into the training and testing phases in the proportion of 80 and 20%.

RESULTS

Four sets of results are presented: (i) at Level 0, the results comprise the individual rated values (see Figure 4), as well as the processed InSAR results; (ii) at Level 1, the GA results are presented for PVI, AVI, TVI and RI; (iii) at Level 2, the ANN results are presented for PVI, AVI, TVI and RI; and (iv) these results are compared.

Figure 4

Rated data layers: (a) aquifer media; (b) land use; (c) recharge; (d) impact of aquifer thickness; (e) fault distance; (f) pumpage of groundwater; (g) trends in decline of water table.

Figure 4

Rated data layers: (a) aquifer media; (b) land use; (c) recharge; (d) impact of aquifer thickness; (e) fault distance; (f) pumpage of groundwater; (g) trends in decline of water table.

Level 0: processing the data layers and the benchmark by basic ALPRIFT

The seven ALPRIFT data layers were prepared by processing the available raw data as summarised in Table 2 based on the flowchart in Figure 1. The rated output data layers are presented in Figure 4 for the study area, as well as Figure 5(a), which displays land subsidence in cm/annum for the period from 2017.06.21 to 2018.04.05. Attention is drawn to Figure 4(f) and 4(g), which reveal the variations in P and T data layer with hydraulically connected strips at their overlaps with expected high AVI, as discussed in due course.

Figure 5

Results: (a) InSAR subsidence map in cm/annum; (b) ALPRIFT; GA results: (c1) PVI; (c2) AVI; (c3) TVI; (c4) RI; ANN results: (d1) PVI; (d2) AVI; (d3) TVI; (d4) RI.

Figure 5

Results: (a) InSAR subsidence map in cm/annum; (b) ALPRIFT; GA results: (c1) PVI; (c2) AVI; (c3) TVI; (c4) RI; ANN results: (d1) PVI; (d2) AVI; (d3) TVI; (d4) RI.

The mapping results are classified into five bands by using the Jenks optimisation method (Jenks 1967), in which Band 5 represents high vulnerability/risk, and Band 1, that of low vulnerability/risk. A visual comparison of ALPRIFT in Figure 5(b) with InSAR results in Figure 5(a) indicates areas of strong convergences but also strong divergences, particularly at the strips of PT displaying some hydraulic connectivity. Based on the performance metrics of ROC/AUC in Figure 6 with AUC = 0.56, the performance of ALPRIFT is fit-for-purpose and good enough to serve as the basis for benchmarking the improvements at Levels 1 and 2, although improvements are needed to reduce inherent noise for defensible decision-making. Level 1 modelling results are presented in the next section.

Figure 6

ROC curve and AUC metrics: (a) Level 1 GA result; (b) Level 2 ANN result: Level 1 – a1: PVI; a2: AVI; a3: TVI; a4: RI; Level 2 – b1: PVI; b2: AVI; b3: TVI; b4: RI.

Figure 6

ROC curve and AUC metrics: (a) Level 1 GA result; (b) Level 2 ANN result: Level 1 – a1: PVI; a2: AVI; a3: TVI; a4: RI; Level 2 – b1: PVI; b2: AVI; b3: TVI; b4: RI.

Level 1: modelling results for mapping using GA

The results at Level 1 use GA, the default parameters for which are given in Table 5 together with the identified weight values for each data layer. The comparison in the table is not directly possible, as the weights of ALPRIFT are assumed to change in the range of 1–5 but those of GA between 1 and 7; notably, ALPRIFT data layers are quite divergent and this may be a reflection of the subjectivity in the scoring system.

Table 5

Optimised weight of incorporated data layers for GA and number of neurons and performance criteria for trained ANNs

GA at Level 1Data layerPassive vulnerability
Active vulnerability
ALRIFPT
 ALPRIFT weights 
 GA weight 4.5 
Default GA Parameters: The number of populations, crossover percentage and mutation percentage are 40%, 80% and 30% respectively. Maximum iteration number was set into 200 by experience, as the optimisation behaviour reached its asymptotic behaviour with no further reduction in the cost function 
GA at Level 1Data layerPassive vulnerability
Active vulnerability
ALRIFPT
 ALPRIFT weights 
 GA weight 4.5 
Default GA Parameters: The number of populations, crossover percentage and mutation percentage are 40%, 80% and 30% respectively. Maximum iteration number was set into 200 by experience, as the optimisation behaviour reached its asymptotic behaviour with no further reduction in the cost function 
ANNs at Level 2ANN modelsNumber of neurons
R2
RMSE
Input layerHidden layerOutput layerTotalTrainingTestingTotalTrainingTesting
 ANN 1 (PVI) 0.829 0.829 0.828 0.027 0.029 0.027 
ANN 2 (AVI) 0.773 0.774 0.767 0.029 0.027 0.029 
ANN 3 (RI) 0.869 0.870 0.869 0.026 0.026 0.026 
ANNs at Level 2ANN modelsNumber of neurons
R2
RMSE
Input layerHidden layerOutput layerTotalTrainingTestingTotalTrainingTesting
 ANN 1 (PVI) 0.829 0.829 0.828 0.027 0.029 0.027 
ANN 2 (AVI) 0.773 0.774 0.767 0.029 0.027 0.029 
ANN 3 (RI) 0.869 0.870 0.869 0.026 0.026 0.026 

GA is used to reduce inherent noise for a defensible decision-making and the results presented in Figure 5(c1)5(c4) for values of PV, AVI, TVI and RI at Level 1, respectively, are compared visually with both Figure 5(a) (InSAR) and 5(b) (the ALPRIFT benchmark). Even a visual comparison indicates significant convergences particularly along strips of hydraulic connectivity due to PT data layers and some divergences.

The convergence between Figure 5(c2) (AVI values) and 5(c4) (RI values) and those of Figure 5(a) (InSAR) are evident; as well as those between Figure 5(c2) (AVI values) and 5(c4) (RI values) with Figure 4(f) (pumpage). The visual evidence for the role of water to act in strips of the study area to actuate risk is evident and so is the successful role of data fusion. A visual comparison of the results in Figure 5(c2) (AVI values) and 5(c4) (RI values) shows that AVI strongly influences RI along a certain area but in the remaining parts the concordance between PVI and TVI is stronger. Arguably, existence of AVI is a significant learning emerging from the methodology presented by the paper and the role of data fusion is demonstrable.

The curvature of the ROC curve and the performance metrics of AUC in Figure 6 shows improvements in the AUC value for PVI, AVI, TVI and RI to make them defensible in decision-making. The results provide significant evidence for existence of PVI and AVI.

Level 2: ANN modelling results for mapping

The results at Level 2 are presented in Table 5 using ANN, for which the topology of the three ANN models were decided through a trial-and-error procedure. It also presents the performance metrics of the selected topology for PVI, AVI and RI, according to which there is a slight deterioration in the quality of the performance metric from the training phase to the testing phase, but this is expected. Overall, the models are fit-for-purpose in terms of RMSE and R2 performance metrics for studying the performances of PVI, AVI, TVI and RI.

The ANN mapping results, classified into five bands using the Jenks optimisation method (similar to Level 1 models) are presented in Figure 5(d1)5(d4) for values of PVI, AVI, TVI and RI at Level 2, respectively, and compared visually with both Figure 5(a) (InSAR) and 5(b) (ALPRIFT), as well as with Figure 5(c1)5(c4). They show notably more convergences and more reduction in divergences. In particular, the convergences between Figure 5(d2) and 5(d4) and those of Figure 5(c2) and 5(c4) are evident, as well as those between Figure 5(d2) and 5(d4) with Figure 4(f). The visual evidence for the role of water to act in strips of the study area as the agent of actuating risk is evident and this is reinforced by the curvature of ROC and the performance metrics of AUC in Figure 6, which shows improvements in the AUC value for PVI, AVI, TVI and RI to make them defensible in decision-making. A visual comparison of the results in Figure 5(d2) (AVI) and 5(d4) (RI) further shows that AVI at Level 2 strongly influences RI at Level 2 along the strip but in the remaining parts, the concordance between PVI and TVI is stronger, as per Figure 5(d1) and 5(d3). The results provide significant evidence for existence of PVI and AVI.

Inter-comparison of the results

The overview of Figure 6 is that it captures the emerging narrative on the signals inherent in ALPRIFT and the two levels of modelling to improve on the signals as follows. (i) Level 0: the signals in basic ALPRIFT, despite containing subjectivity in rates and weights, are significant as per AUC value of 0.56 and its concave curvature but better information extraction techniques are needed for more defensible decision-making and hence the modelling strategy at two levels. (ii) Level 1: GA renders AUC values of 0.61, 0.63, 0.67, 0.67 for PVI, AVI, TVI and RI, respectively; and the curvatures of ROC become more concaved. (iii) At Level 2, ANNs render AUC values of 0.68, 0.66, 0.70 and 0.71 for PVI, AVI, TVI and RI, respectively; and the curvatures of ROCs become even more concaved. Hence, the modelling strategy makes a difference in a defensible way to extract inherent correlations within the ALPRIFT data layers.

The proportions of the areas swept by Band 1–Band 5 are delineated and presented for each set of results and displayed in Figure 7. The results evidently show that the worst case of Band 5 is currently significant as per each set of results but increased learning tends to unravel a greater proportion for adverse bands. Therefore, searching for more improved modelling strategy is important. The majority of the study area seems to be located in Bands 1–2, but an ongoing investigation by the authors (yet to be published) shows that the situation is worsening.

Figure 7

Percentage of areas swept by different bands of results.

Figure 7

Percentage of areas swept by different bands of results.

The convergences and divergences between SVI using ALPRIFT and InSAR results are also the focus of attention, but the reasons for the divergence are not clear-cut. However, it is noted that the correspondence between PVI and TVI, as well as that between AVI and RI are obvious. Notably, the central strip along the north–south direction acting as a hydraulic corridor to actuate risk is in evidence. A further patch at the north-west seems to be high in AVI. The rest of the areas are rather low in PVI and TVI and therefore low in AVI and RI.

The performance metrics including ROC/AUC signify convergences and divergences between InSAR results and PVI, AVI, TVI and RI values. A further measure is ideal to provide a visual aid to understand the convergences and divergences and these are presented in terms of the scatter diagrams for the ANN results, as well as those for the residuals. The results are presented in Figure 8(a1)8(a3) for the scatter diagram and Figure 8(b1)8(b3) for the scatter diagram of the residuals. The three scatter diagrams (Figure 8(a1)8(a3)) display similar shapes, approximately elongated triangular in the range of 0.2–0.7. Values in the vicinity of the 45° line mark better performances but those bulging out of the line display significant divergences that require some explanations. Likewise, three residual scatter diagrams (Figure 8(b1)8(b3)) display three similar shapes and in the range of 0.2 to −0.3. Values in the vicinity of the line of 0.0 residuals mark better performances but those bulging out of the vicinity show high divergence normally where InSAR values are high. The figures for TVI do not follow the above patterns.

Although the improvements from Level 0 to Level 2 are notable, as per results in Figures 6 and 8, there is room for improvement and one to achieve this is through testing different strategies, as well as more improved measured data. The pixels at low bands are liable for more divergences. The exception for the TVI shapes are likely to stem from not developing an ANN model but deducing it from PVI and AVI.

Figure 8

Scatter and residual error diagram and for trained ANNs.

Figure 8

Scatter and residual error diagram and for trained ANNs.

The field evidence for land subsidence is the basis for ground truthing and a glimpse of the situation on the ground is displayed in Figure 9. As the vicinities of observation wells are often filled by granular material (gravel) after subsidence, the reported values of subsidence from field observation do not provide sufficiently reliable data to test the developed methodology. Additionally, there are other uncertainties associated with the material of the piezometers.

Figure 9

RI at Level 2 and field evidence of subsidence documented on 26 October 2018.

Figure 9

RI at Level 2 and field evidence of subsidence documented on 26 October 2018.

DISCUSSION

The programme of research by the authors in the paper is towards developing a GIS-based ‘working tool’ for mapping subsidence at plains over aquifers. Arguably, the current status of the reported research on ALPRIFT, PVI, AVI, TVI and RI related to subsidence is a strong proof-of-concept, as classified by NASA (see: https://www.nasa.gov/sites/default/files/trl.png) corresponding to TRL4 or higher. The future direction of the authors’ work is to raise these techniques to Technical Readiness Level 9 (TRL9).

The rates and weights of seven ALPRIFT data layers are prescribed that were suggested by Nadiri et al. (2018a). These values were set to cover different hydrogeological settings of study areas. However, the values are inevitably subjective as they are set by expert judgement and this is a normal, as acknowledged by Nadiri et al. (2018a) and the present study. This explains why the problem of enhancing model accuracy by learning from observed values is topical for all frameworks, such as DRASTIC and ALPRIFT is no exception. The present study aims to reduce inherent subjectivities by a modelling strategy. Evidently, the ALPRIFT framework is generic and applies to all sites; model enhancement strategies are also generic and can be applied to other sites; but the identified rates and weights are site-specific because they are learned from a set of site-specific data.

In addition to the generic nature of the ALPRIFT framework, it is in its infancy and the study areas investigated are yet to expand. It is necessary to investigate more areas with complex hydrological settings such as confined or multiple aquifers. In these cases, although prescribed weights and rates are still generic, some data layers should be prepared in different ways. For example, water table decline is substituted by the decline in piezometric level, or where aquifer crops out should be considered to prepare the recharge data layer.

The authors are exploring different ways of dividing ALPRIFT data layers into PVI and AVI and testing various modelling strategies. Their ongoing research, both reported in the paper and by Sadeghfam et al. (2020a), confirms that the inherent signals are significant in the general purpose ALPFRIFT data layers for having AUC values in the range of 0.6–0.7 to support existence of PVI, AVI, TVI and RI as new quantities. They are currently investigating ways of improving the strength of the identified signals by testing different modelling strategies in two layers. The basis of the techniques developed in the paper may be captured by two terms, heuristics and framework. The term heuristics refers to any broader techniques where their rational justification is judged by the efficacy of the results to enable modellers to discover or learn about data to gain a new insight but normally without any theoretical or empirical basis. The particular choices made in the paper are all on a heuristic basis and without any theoretical or empirical justifications. Also, the term framework signifies the cases, where there is no theory, empirical knowledge or even measurements to justify the selected data layers.

PVI (ALRIF) extracts information on land subsidence with local significance and AVI (PT) on that of anthropogenic activities with system-wide significance. Although ROC/AUC indicates that signals are significant, other combinations of data layers are also feasible, e.g., ALIF and RPT with respect to the role of water creating a system-wide impact.

The characterisation of the result as fit-for-purpose and defensible is discussed in detail by Khatibi et al. (2020) to invoke criticality that modelling results are not the end but means to an end and therefore the underlying assumptions have to be examined, as follows. At Level 0, the authors had directly observed subsidence values but the observation wells were not associated with GPS facilities and therefore the observation values were found unsuitable for ground truthing. At Level 1, the paper uses GA for the optimisation problem as it is widely used, but other techniques should also be investigated. At Level 2, data fusion was investigated by one possible scheme without sensitivity studies, but other techniques should also be investigated. Khatibi et al. (2020) refer to modelling strategies of this nature as ‘inclusivity in multiple models’ (IMM) and argue that there is a mathematical condition to ensure the improvements in the model accuracy. There are a wide range of strategies possible and this itself is the subject of research. The choice for the modelling strategy presented in the paper was a demonstration of the feasibility without maximising the extracted information from the data.

Overall, with the benefit of hindsight, the treatment of TVI in terms of a simple addition of PVI and AVI seems a poor choice simply based on the scatter diagram and the residuals displayed in Figure 8(a3) (scatter diagram for RI) and 8(b3) (scatter diagram of residuals for RI). This does not undermine the modelling results produced in the paper but just that TVI values at Level 2 can be better. The authors’ future plans include testing the performances of learning from inclusive multiple models with different data fusion techniques.

CONCLUSION

The paper presents evidence on the proof-of-concept for transforming SVI into a RI capability for plains over aquifers, particularly for where there is sparse data availability including uncertain or imprecision inherent in data. The methodology is formed by collective contribution of ALPRIFT data layers using a scoring system of prescribed rates and prescribed weights. The paper divides the data layers into two groups: (i) the five ALRIF data layers account for PVI measuring local resistance to variations; (ii) the two PT data layers account for AVI measuring system-wide effects; (iii) the additive model of PVI and AVI renders the TVI capability equivalent to basic ALPRIFT; and (iv) the multiplicative model of PVI and AVI renders the RI capability.

The methodology is implemented through an IMM strategy at two levels, as follows: (i) at Level 0, preliminary decisions are made; (ii) at Level 1, GA is used as an optimisation technique to identify the weight values for vulnerability and risk mapping of the study area; and (iii) at Level 2, a three-step modelling procedure is devised, which comprises: Step 1: the data fusion of vulnerability and risk indices with InSAR results produce outputs that are fairly placed to serve as target values; Step 2: supervised ANN models are constructed for vulnerability and risk mapping, which use the rated ALPRIFT data layers together with the weights produced at Level 1 and target values as per Step 1 and thereby to produce a trained and tested models; Step 3: the models are applied to produce maps of the study area for vulnerability and risk indices.

A study of the results provides further evidence that in spite of subjectivity in ALPRIFT, it is capable of capturing significant signals from site-specific data but they need to be improved for a defensible decision-making. The two-level modelling strategy formulated by the paper provides evidence that the splitting of the ALPRIFT data layers into two groups for mapping PVI and AVI produce improved signals in vulnerability and risk mapping. As such, there are strips of the study area where the effects of PT are transmitted beyond local pixels and where risk mapping is more reliable; whereas there are areas, where active PT effects are not significant and remain limited to local impacts.

DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

REFERENCES

REFERENCES
Abdelgawad
A.
Bayoumi
M.
2012
Resource-Aware Data Fusion Algorithms for Wireless Sensor Networks
.
Springer
,
New York, Dordrecht Heidelberg
,
London
.
Aller
L.
Bennett
T.
Lehr
J. H.
Petty
R. J.
Hackett
G.
1987
DRASTIC: A Standardized System for Evaluating Ground Water Pollution Potential Using Hydrogeologic Settings
.
Robert S. Kerr Environmental Research Laboratory, Office of Research and Development, US Environmental Protection Agency
,
Ada, OK
,
USA
.
Almoussawi
R.
Christian
C.
2005
Fundamentals of quantitative risk analysis
.
Journal of Hydroinformatics
7
(
2
),
61
77
.
ASCE Task Committee on Application of Artificial Neural Networks in Hydrology
2000
Artificial neural networks in hydrology. I: preliminary concepts
.
Journal of Hydrologic Engineering
5
(
2
),
115
123
.
Bai
Y.
Xie
J.
Wang
X.
Li
C.
2016
Model fusion approach for monthly reservoir inflow forecasting
.
Journal of Hydroinformatics
18
(
4
),
634
650
.
Chen
Y.
Han
D
.
2016
Big data and hydroinformatics
.
Journal of Hydroinformatics
18
(
4
),
599
614
.
Chen
M.
Tomás
R.
Li
Z.
Motagh
M.
Li
T.
Hu
L.
Gong
H.
Li
X.
Yu
J.
Gong
X.
2016
Imaging land subsidence induced by groundwater extraction in Beijing (China) using satellite radar interferometry
.
Remote Sensing
8
(
6
),
468
.
Cheng
C. H.
Liu
Y. H.
Lin
Y.
1996
Evaluating a weapon system using catastrophe series based on fuzzy scales
. In
Soft Computing in Intelligent Systems and Information Processing. Proceedings of the 1996 Asian Fuzzy Systems Symposium
.
IEEE
. pp.
212
217
.
Davis
L.
1991
Handbook of Genetic Algorithms
.
Van Nostrand Reinhold
,
New York
,
USA
.
Emberger
L.
1930
Climate on a formula applicable in botanical geography
.
Comptes Rendus de l'Académie des Sciences
1991
,
389
390
.
Faunt
C. C.
Sneed
M.
Traum
J.
Brandt
J. T.
2016
Water availability and land subsidence in the Central Valley, California, USA
.
Hydrogeology Journal
24
(
3
),
675
684
.
Gao
M.
Gong
H.
Chen
B.
Zhou
C.
Chen
W.
Liang
Y.
Shi
M.
Si
Y.
2016
InSAR time-series investigation of long-term ground displacement at Beijing Capital International Airport, China
.
Tectonophysics
691
,
271
281
.
Ghorbanzadeh
O.
Blaschke
T.
Aryal
J.
Gholaminia
K.
2020
A new GIS-based technique using an adaptive neuro-fuzzy inference system for land subsidence susceptibility mapping
.
Journal of Spatial Science
32
(
3
),
401
418
.
Haykin
S.
1999
Neural Networks: A Comprehensive Foundation
, 2nd edn.
Prentice Hall
,
Upper Saddle River, NJ
,
USA
.
Holland
J. H.
1992
Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence
.
MIT Press
,
Cambridge, MA
,
USA
.
Hu
B.
Zhou
J.
Wang
J.
Chen
Z.
Wang
D.
Xu
S.
2009
Risk assessment of land subsidence at Tianjin coastal area in China
.
Environmental Earth Sciences
59
(
2
),
269
.
Hu
L.
Dai
K.
Xing
C.
Li
Z.
Tomás
R.
Clark
B.
Shi
X.
Chen
M.
Zhang
R.
Qiu
Q.
Lu
Y
.
2019
Land subsidence in Beijing and its relationship with geological faults revealed by Sentinel-1 InSAR observations
.
International Journal of Applied Earth Observation and Geoinformation
82
,
101886
.
Jenks
G. F.
1967
The data model concept in statistical mapping
.
International Yearbook of Cartography
7
,
186
190
.
Jiang
D.
Zhuang
D.
Huang
Y.
Fu
J
2011
Survey of multispectral image fusion techniques in remote sensing applications. Image Fusion, Its Applications
. (
Zheng
Y.
, ed.).
IntechOpen
,
London
, pp.
1
23
.
doi:10.5772/10548
.
Khatibi
R.
2011
Evolutionary systemic modelling of practices on flood risk
.
Journal of Hydrology
401
(
1–2
),
36
52
.
Living with Risk
2004
A Global Review of Disaster Reduction Initiatives
.
Inter-Agency Secretariat of the International Strategy for Disaster Reduction (UN/ISDR)
.
Ma
Z. L.
Chen
G. F.
Zhao
R.
Chen
H. F.
Xia
S. L.
2010
A new indoor simulating system for ground settlement due to pumping
.
Water Resources and Hydropower Engineering
41
(
4
),
39
43
.
Modoni
G.
Darini
G.
Spacagna
R. L.
Saroli
M.
Russo
G.
Croce
P.
2013
Spatial analysis of land subsidence induced by groundwater withdrawal
.
Engineering Geology
167
,
59
71
.
Nadiri
A. A.
Taheri
Z.
Khatibi
R.
Barzegari
G.
Dideban
K.
2018a
Introducing a new framework for mapping subsidence vulnerability indices (SVIs): ALPRIFT
.
Science of the Total Environment
628
,
1043
1057
.
Nadiri
A. A.
Khatibi
R.
Khalifi
P.
Feizizadeh
B.
2020
A study of subsidence hotspots by mapping vulnerability indices through innovatory ‘ALPRIFT’ using artificial intelligence at two levels
.
Bulletin of Engineering Geology and the Environment
79
,
3989
4003
.
Pradhan
B.
Abokharima
M. H.
Jebur
M. N.
Tehrany
M. S.
2014
Land subsidence susceptibility mapping at Kinta Valley (Malaysia) using the evidential belief function model in GIS
.
Natural Hazards
73
(
2
),
1019
1042
.
Rodriguez
A.
Bermudez
M.
Rabunal
J. R.
Puertas
J.
2015
Fish tracking in vertical slot fishways using computer vision techniques
.
Journal of Hydroinformatics
17
(
2
),
275
292
.
Sadeghfam
S.
Ehsanitabar
A.
Khatibi
R.
Daneshfaraz
R.
2018
Investigating ‘risk'of groundwater drought occurrences by using reliability analysis
.
Ecological Indicators
94
,
170
184
.
Sadeghfam
S.
Khatibi
R.
Dadashi
S.
Nadiri
A. A.
2020a
Transforming subsidence vulnerability indexing based on ALPRIFT into risk indexing using a new fuzzy-catastrophe scheme
.
Environmental Impact Assessment Review
82
,
106352
.
Sadeghfam
S.
Khatibi
R.
Daneshfaraz
R.
Rashidi
H. B.
2020b
Transforming vulnerability indexing for saltwater intrusion into risk indexing through a fuzzy catastrophe scheme
.
Water Resources Management
34
(
1
),
175
194
.
See
L.
Abrahart
R. J.
2001
Multi-model data fusion for hydrological forecasting
.
Computers & Geosciences
27
(
8
),
987
994
.
Shrestha
P. K.
Shakya
N. M.
Pandey
V. P.
Birkinshaw
S. J.
Shrestha
S.
2017
Model-based estimation of land subsidence in Kathmandu Valley, Nepal
.
Geomatics, Natural Hazards and Risk
8
(
2
),
974
996
.
Sundell
J.
Haaf
E.
Tornborg
J.
Rosén
L.
2019
Comprehensive risk assessment of groundwater drawdown induced subsidence
.
Stochastic Environmental Research and Risk Assessment
33
(
2
),
427
449
.
Swets
J. A
.
1988
Measuring the accuracy of diagnostic systems
.
Science
240
(
4857
),
1285
1293
.
Tung
Y. K.
Yen
B. C.
Melching
C. S.
2005
Hydrosystems Engineering Reliability Assessment and Risk Analysis
.
McGraw-Hill Professional
,
New York
,
USA
.
Zhou
C.
Gong
H.
Chen
B.
Li
X.
Li
J.
Wang
X.
Gao
M.
Si
Y.
Guo
L.
Shi
M.
Duan
G.
2019
Quantifying the contribution of multiple factors to land subsidence in the Beijing Plain
.
China with Machine Learning Technology. Geomorphology
335
,
48
61
.