Abstract
Groundwater (GW) plays a key role in water supply in basins. As global warming and climate change affect groundwater level (GWL), it is important to predict it for planning and managing water resources. This study investigates the GWL of the Yazd-Ardakan Plain basin in Iran for the base period of 1979–2005 and predicts for periods of 2020–2059 and 2060–2099. Lagged temperature and rainfall are used as inputs to hybrid and standalone artificial neural network (ANN) models. In this study, the rat swarm algorithm (RSA), particle swarm optimisation (PSO), salp swarm algorithm (SSA), and genetic algorithm (GA) are used to adjust ANN models. The outcomes of these models are then entered into an inclusive multiple model (IMM) as an ensemble model. In this study, the output of climate models is also inserted into the IMM model to improve the estimation accuracy of temperature, rainfall, and GWL. The monthly average temperature for the base period is 12.9 °C, while average temperatures for 2020–2059 under RCP 4.5 and RCP 8.5 scenarios are 14.5 and 15.1 °C, and for 2060–2099 they are 16.41 and 18.5 °C under the same scenarios, respectively. In future periods, rainfall is low in comparison with the base period. Lagged rainfall and temperature of the base period are inserted into ANN-RSA, ANN-SSA, ANN-PSO, ANN-GA, and ANN models to predict GWL for the base period. Outputs of IMM, ANN, and the five hybrid models (ANN-RSA, ANN-SSA, ANN-PSO, and ANN-GA) indicate that root mean square errors (RMSE) are 2.12, 3.2, 4.58, 6.12, 6.98, and 7.89 m, respectively, in the testing level. It is found that GWL depletion in 2020–2059 under RCP 4.5 and RCP 8.5 scenarios are 0.60–0.88 m and 0.80–1.16 m, and in 2060–2099 under the same scenarios they are 1.49–1.97 m and 1.75–1.98 m, respectively. The results highlight the need to prevent overexploitation of GW in the Ardakan-Yazd Plain to avoid water shortages in the future.
HIGHLIGHTS
Introducing a new ensemble model for integrating outputs of general circulation models.
Introducing a new ensemble model for predicting GWL.
Introducing a new feature selection method for choosing the best inputs.
Graphical Abstract
INTRODUCTION
Population growth, droughts, and water scarcity are challenges faced by decision-makers in terms of water supply (Farrokhi et al. 2021; Ghanbari-Adivi et al. 2022). Groundwater (GW) is an important global water supply source subject to increasing overexploitation. Predicting variations in the groundwater levels (GWL) driven by climate change is an important topic to ensure water supply and appropriate management at basin scale for future scenarios (Banadkooki et al. 2020). Sustainable water resource management is an important topic for managers (Chitsazan et al. 2015; Ghazi et al. 2021; Gong et al. 2018; Gong et al. 2016; Rezaie-balf et al. 2017; Yoon et al. 2016). Tapoglou et al. (2014) conducted a study to investigate the effects of different climate scenarios on the GWL of a basin in Greece and concluded that decreasing precipitation will reduce the basin's GWL. Shrestha et al. (2016) used MODFLOW and combined model intercomparison project phase 5 (CMIP5) to predict GWL in future periods for a basin in Vietnam. They reported that the temperature would increase by 1.5 and 4.9 °C by the end of the 21st century under representative concentration pathway (RCP) 4.5 and 8.5, respectively, and that GWL would drop in short, medium, and long-terms. Chang et al. (2016) reported that under climate change, some aquifers in the USA might not supply the required water for future periods. Zhou et al. (2020) also showed the impact of climate change on the GW flow dynamics in China and the changing pattern in GWL. Patil et al. (2020) showed that temperature and rainfall increase by 2.59 °C and 81.50%, respectively, expected for the period 2021–2050, will lead to a GW recharge increase of 24.91% in the Hiranyakeshi watershed. Based on general circulation models and the MODFLOW model, Shrestha et al. (2020) also predicted a future GWL decline in Nepal. Based on Hamidov et al. (2020), climate change is expected to decrease GWL from 1.75 m in 2050 to 1.79 m by 2100 in north-west Uzbekistan. In the lake Tana basin, in Ethiopia, Tigabu et al. (2021) predicted surface runoff increases and the GWL declines under RCP 4.5 and RCP 8.5 scenarios.
The top three best input combination of temperature and rainfall to predict GWL, based on GT test and associated and Vratio indices (TE, temperature; RA, rainfall)
Inputs . | ![]() | Vratio . |
---|---|---|
TE (t − 1), RA (t − 1), TE (t − 2), RA (t − 3), RA (t − 2) | 0.002 | 0.0008 |
TE (t − 1), R (t − 1), TE (t − 2), RA (t − 3), RA (t − 2), TE (t − 3) | 0.005 | 0.0022 |
TE (t − 1), R (t − 1), TE (t − 2), RA (t − 3), RA (t − 2), TE (t − 3), TE (T − 4) | 0.007 | 0.0031 |
Inputs . | ![]() | Vratio . |
---|---|---|
TE (t − 1), RA (t − 1), TE (t − 2), RA (t − 3), RA (t − 2) | 0.002 | 0.0008 |
TE (t − 1), R (t − 1), TE (t − 2), RA (t − 3), RA (t − 2), TE (t − 3) | 0.005 | 0.0022 |
TE (t − 1), R (t − 1), TE (t − 2), RA (t − 3), RA (t − 2), TE (t − 3), TE (T − 4) | 0.007 | 0.0031 |
(a) The levels of modelling process for predicting GWL, (b) name of GCMs included in CMIP5, (c) definition of RCP scenarios investigated, (d) structure of the ensemble model using outputs of CMIP5 models to predict temperature and rainfall (circles shows neurons of each layer).
(a) The levels of modelling process for predicting GWL, (b) name of GCMs included in CMIP5, (c) definition of RCP scenarios investigated, (d) structure of the ensemble model using outputs of CMIP5 models to predict temperature and rainfall (circles shows neurons of each layer).
The main aim of this paper is to predict monthly GWL for future medium- (2020–2059) and long-term periods (2060–2099) based on rainfall and temperature from a base period (1979–2005) in one of Iran's key basins. Future GWL is based on climate change conditions based on CMIP5 using RCP 4.5 and RCP 8.5 scenarios. The study addresses the following research gaps related to the prediction of GWL:
- 1.
Individual climate models may not give accurate outputs. In this study, an ensemble framework is suggested to provide temperature and rainfall data based on individual climate models (Sharafati et al. 2020a). Temperature and rainfall data are provided based on climate models under RCP scenarios, using downscaling methods and outputs are then inserted into an ANN model as an ensemble model. The temperature and rainfall outputs of the ensemble models are the weighted outputs of the climate models.
Although ANN models have great potential for predicting GWL (Table 1A), robust optimisation algorithms should be used to adjust ANN parameters. In this paper, a new evolutionary algorithm, the rat swarm algorithm (RSA), is applied to train ANN models. RSA is easy to implement and is a robust optimisation algorithm for finding an accurate solution (Dhiman et al. 2021). To assess RSA's ability to train ANN, RSA is benchmarked against multiple optimisation algorithms (Dhiman et al. 2021). Even though there are tools like deep learning for simulation, their modelling process is complex. Preparing structures and parameters for deep learning models may be complex and time-consuming. Thus, hybrid and optimised ANN models are robust and reliable for predicting variables.
- 2.
While previous studies only used individual ANN models for predicting GWL, this study introduces an ensemble model, namely the inclusive multiple model (IMM)-ANN, to predict GWL. The IMM-ANN uses outputs of ANN-PSO, ANN-RSA and ANN-GA to achieve the final outputs of GWL. The IMM model reduces computational errors during the modelling process.
- 3.
One of the research gaps identified in previous studies is the lack of a proposal for a global strategy to determine inputs. In this article, lagged temperature and rainfall data are utilised as inputs to the models for predicting GWL. To determine the time lags of inputs, a new test, namely the hybrid gamma test-RSA (GT-RSA), is used in this study.
A key objective of this project is to predict GWL under climate change conditions. Modelling GL under climate change is challenging. Modellers need GCM outputs to predict temperature and rainfall. Incorporating the outputs of GCM models into an ensemble model can improve accuracy. Next, we need to predict GWL. Because the GWL depends on different factors, robust models are needed. Our study introduces a new ensemble for predicting GWL under climate change. Using the outputs of the GCM model, the new ensemble model predicts temperature and rainfall. Unlike other ensemble models, the new model does not require complicated computations. As an example, the Bayesian model averaging requires prior and posterior distributions. Furthermore, the paper introduces a new method for selecting inputs based on climate parameters. Modellers can use this method when they have a large number of inputs. In the final step, we build new MLP models for predicting the GWL and climate parameters using new optimisation algorithms. Our paper presents a new ensemble model, a new hybrid GT, and new MLP models for predicting GWL and climate parameters.
MATERIAL AND METHODS
Outline of the current study
The outline of the current study is as follows:
- 1.
As a first step, rainfall, temperature, and GWL data were collected for the base period (1979–2005). The RCP 4.5 and 8.5 scenarios from CMIP5 models were used to simulate temperature and rainfall for future medium- (2020–2059) and long-term periods (2060–2099). As the outputs of the models are large scale, we downscaled them (see section 2.2) to provide more reliable estimations. While it may be possible to obtain high-resolution downscaled datasets, this study used large-scale GCM outputs to illustrate the application of downscaling methods for downscaling data.
- 2.
The downscaled data of GCM models, i.e. temperature and rainfall, were inserted into an ANN model as an IMM model to improve accuracy. Using the IMM model and individual GCM models, the best model for predicting future temperature and rainfall is determined. The IMM models are explained in section 2.3.7.
- 3.
RSA is coupled with the GT to choose the best inputs and reduce the complexity of gamma tests. Section 3.1 presents the structure of the hybrid GT. This test is used to determine the best input data for predicting targets.
- 4.
Optimisation algorithms are coupled with ANN models to create optimised ANN models.
- 5.
ANN-RSA, ANN-SSA, ANN-PSO, ANN-GA, ANN-ANN, and ANN models are used to predict GWL, temperature, and rainfall (outputs).
- 6.
For the future and base period, the outputs of hybrid individual ANN models were inserted into the IMM model.
- 7.
Finally, the status of GWL for the future and base period is examined.
Climate models and scenarios
In this study, two RCP scenarios, namely RCP 4.5 and RCP 8.5, are used, along with multiple climate models (CMIP5), summarised in Figure 1(b). The definition of RCP scenarios is given in Figure 1(c). Figure 1(d) shows the structure of IMM as an ensemble model for predicting temperature and rainfall. Decision-makers requested climate models (CMIP5) and RCP scenarios for water resource planning and management. CMIP6 models can be used for simulations in future studies. In this study, the CMIP5 models were used. In the next studies the results of the CMPIP5 and CMPI6 models will be compared.










Machine-learning models used to predict GWL
Structure of ANN models
(a) Mathematical model of ANN, (b) structure of ANN model (I: input, f: activation function, output, and β: bias) (Jalali et al. 2020).
(a) Mathematical model of ANN, (b) structure of ANN model (I: input, f: activation function, output, and β: bias) (Jalali et al. 2020).
Modellers need fast calculation and convergence when training ANNs. A high degree of accuracy and fast computation are two main reasons for applying different optimisation algorithms (Mohanty et al. 2015). Ideally, algorithms should have fewer random parameters. PSO, RSA, SSA, and GA were chosen for this study due to their high accuracy, fast convergence, and good balance between exploration and exploitation. In addition, the algorithms are easy to develop. Two parameters of RSA are unknown: the size of the population and the number of iterations. Therefore, RSA does not have many unknown parameters. The modelling uncertainty decreases when the algorithm does not have many random parameters. SSA leaders guide the other solutions towards optimal solutions. As a result, SSA can find the best solutions earlier.
Rat swarm algorithm






Salpswarm algorithm









Particle swarm optimisation










Genetic algorithm
In optimisation, genetic algorithm (GA) is known as a useful evolutionary algorithm. GA, based on three operators, namely mutation, crossover, and selection, provides high-quality solutions (Li et al. 2021). The use of mutation operators increases the diversity of solutions. For this reason, GA can have a high ability to solve complex problems (Abdullah et al. 2021). Chromosomes are agents of GA and the quality of each chromosome is determined by computing the fitness function. Afterwards, different operators of GA are used to update the solutions (Li et al. 2021). A selection operator is applied to identify the best chromosomes with the best values of the objective function. In GA, crossover is utilised to combine the genetic information of two parents to generate a new offspring. To maintain the diversity of chromosomes, a mutation operator is used. In a mutation, a gene on the chromosomes is randomly changed. The optimisation process finishes when the stop condition is met. However, the optimisation algorithms are reliable tools for adjusting ANN parameters. The next section explains the structure of hybrid ANN models using optimisation algorithms.
Hybrid optimisation algorithms and ANN models
In this research, the ANN parameters are defined as decision variables of optimisation algorithms. The ANN parameters are obtained using optimisation algorithms as follows:
- 1.
ANN models are applied to estimate the data in the training phase.
- 2.
If the stop condition is satisfied, testing data is used to run ANN models in the testing phase, otherwise ANN models go to the next phase.
- 3.
Parameters of ANN models, including weight and bias, are considered as the initial population of optimisation algorithms.
- 4.
The precision of ANN models is determined by the computation of fitness functions. In this study, root mean square error (RMSE) is utilised as the fitness function. For each set of parameters, the ANN model run is computed using training data and the objective function.
- 5.
Operators of different optimisation algorithms are applied to update the positions of agents, which represent values of decision variables. Thus, changes in positions of agents in an optimisation algorithm mean updates of ANN parameters.
- 6.
The stop condition is monitored. If it is satisfied, the algorithms go to stage 3 and the model moves to the next phase; otherwise the model moves to stage 5.
Inclusive multiple models
Data analysis
Several error indices are utilised for determining the precision of models. Different indicators are used to evaluate the models comprehensively. We can identify the best model using these indicators.
- 1.
- 2.
- 3.
- 4.
The accuracy of a model cannot be evaluated with a single error index. Different indices were used to determine the best model for predicting temperature, rainfall, and GWL. Using different error indices, the best model should be superior to the others. Different characteristics of each error index aid modellers in making better judgments.
CASE STUDY
(a) Location of the case study, (b) average monthly temperature for the base period (1979–2005), (c) average monthly rainfall for the base period, and (d) GWL time series over the base period.
(a) Location of the case study, (b) average monthly temperature for the base period (1979–2005), (c) average monthly rainfall for the base period, and (d) GWL time series over the base period.
Gamma test














RESULTS AND DISCUSSION
Temperature and rainfall prediction
Comparison of climate models for predicting temperature (a) and rainfall (b) for the base period (1979–2005), based on RMSE, MAE, NSE, and PBIAS.
Comparison of climate models for predicting temperature (a) and rainfall (b) for the base period (1979–2005), based on RMSE, MAE, NSE, and PBIAS.
Predicted rainfall (a) and temperature (b) for future short/medium- (2020–2059) and long-term (2060–2099) periods.
Predicted rainfall (a) and temperature (b) for future short/medium- (2020–2059) and long-term (2060–2099) periods.
Selection of inputs
In this study, temperature and rainfall are used as inputs to predict monthly GWL. Lagged rainfall and temperature are chosen by a hybrid GT test. Table 1 shows the first to third best input combinations. The best input combination includes TE(t − 1), TE (t − 2), RA (t − 1), RA (t − 2), RA (t − 3). It is obtained automatically, therefore modellers do not have to test many input combinations. A correlation test can determine significant input parameters but does not give the best input scenario.
Heat map correlation of (a) rainfall with GWL and (b) temperature and GWL.
Sensitivity analysis to determine random parameters
Values of random parameters
Algorithms . |
---|
SSA, ![]() ![]() |
PSO: ![]() ![]() |
GA: mutation probability: 0.6, crossover probability: 0.40 |
Algorithms . |
---|
SSA, ![]() ![]() |
PSO: ![]() ![]() |
GA: mutation probability: 0.6, crossover probability: 0.40 |
Investigation of the precision of models for predicting GWL
Investigation of the accuracy of models based on (a) RMSE, (b) MAE, (c) PBIAS, and (d) NSE.
Investigation of the accuracy of models based on (a) RMSE, (b) MAE, (c) PBIAS, and (d) NSE.
FURTHER DISCUSSION
Predicting GWL for future periods
In the future, GWL variations will follow a similar pattern to the base period. GWL starts at a high level and ends at a low level. Variations caused by climate parameters in the future and the base period are the main causes of these variations. Global warming, increasing temperatures and decreasing rainfall are reasons for the decrease in GWL. Decision-makers will face a real challenge in relation to water supply in this basin. Decreasing GWL in the future may be detrimental to the economy and agriculture in the basin. The overexploitation of GWL in the current period has intensified water shortages in the future, as predicted for other areas worldwide. The use of renewable energy in the future could decrease water consumption for the generation of power. The status of GWL in the short, medium, and long-term shows that decision-makers should limit overexploitation of GW resources in the Yazd-Ardakan Plain basin. Furthermore, changes in cultivation patterns are an effective way of managing water resources in the basin. Figure 13 shows some peaks for GWL in the base and future periods. Wet durations and anomalous rainfall patterns may result in sudden fluctuations in GWL during study periods.
Box plots of GWL for periods (a) 2020–2059 and (b) 2060–2099 based on RCP 8.5.
Figure 1A in Appendix A shows GWL depletion in the future under both investigated RCP scenarios. To compute GWL depletion, GWL for future periods is compared with that of the base period. As can be observed in Figure 1(a), GWL depletion in the basin in future periods of 2020–2059 under RCP 4.5, 2020–2059 under RCP 8.5, 2060–2099 under RCP 4.5, and 2060–2099 under RCP 8.5 is 0.60 to 0.88 m, 0.80 to 1.16 m, 1.49 to 1.97 m, and 1.75 to 1.98 m, respectively. Thus, there is a fall in GWL in the future. Future studies can use other climate models such as CMIP6 for predicting temperature and rainfall in the future periods and compare GWL changes with those estimated in the current study.
However, results generally indicate that the status of GWL in future periods will decrease significantly. Thus, it is essential to suggest strategies for GW management based on the following points:
- 1.
Desalination of water is an effective solution to decrease water consumption for agriculture and irrigation. Some greenhouses use solar radiation for freshwater production (Panahi et al. 2021). Desalinated water can be used to meet irrigation and agriculture demands. Water desalination can be a cheaper solution than water transformation plans for water supply (Ehteram et al. 2021b). Decision-makers suggested that water be transferred from the Persian Gulf (south of Iran) to Yazd, but it would be expensive.
- 2.
The use of reclaimed water can be considered as an effective solution for the artificial recharge of aquifers (Sharafati et al. 2020b). Moreover, reclaimed water can be used for water supply in the agriculture sector. In this regard, a wastewater treatment plant can help with water treatment. Decision-makers might consider constructing ponds and transferring the treated municipal effluents into the ponds.
- 3.
The installation of smart meters on the wells could be another effective solution for preventing overexploitation.
- 4.
It is suggested to prevent the drilling of unauthorised wells by imposing strict rules.
- 5.
Although agriculture is a source of income, encouraging people to use non-renewable energy for monetisation is another way to reduce GW consumption.
- 6.
Raising public awareness and warning about the water crisis through the media is another solution.
- 7.
The current study provides spatial and temporal maps of GWL for future and current periods. As a result, policymakers will have a deeper understanding of how GWL varies under climate change conditions. The current strategies would be ineffective for managing water resources in the future, and so policymakers can better design new strategies rooted in deeper knowledge of the local impacts of climate change. Nevertheless, implementing new policies is complex because they must satisfy all stakeholders. Based on the study's results, it may be recommended that cultivation patterns are changed to reduce irrigation requirements. In the current study, the GWL maps help decision-makers identify critical regions. Additionally, they can prepare new plans for agriculture and irrigation management in the basin based on the predicted rainfall, temperature, and GWL.
- 8.
The study selected the ANN model because this model can be easily implemented. Also, the ANN parameters can be easily adjusted. Although there are models such as deep learning, these models have many unknown parameters. The authors explored convolutional neural networks and long short-term memory neural networks as robust deep learning tools for predicting GWL. However, the number of hidden layers, cell states, kernel functions, batch size, and number of epochs are unknown in the structure of the models. Thus, the implementation of these models may be complex. Also, this study used robust optimisers to develop the old version of ANN models.
- 9.
Since the policymakers had defined the project based on the outputs of CMIP5 models, the modellers used these models in the current study. However, we suggested the CMPI6 models for predicting temperature and rainfall in future periods.
The current study had two main innovations: the IMM model and the hybrid GT. The GCM models were individual models, but the IMM improved the accuracy of the outputs by integrating the GCM outputs. Additionally, the hybrid test in the current study is a robust alternative to other feature available methods. Gamma tests automatically select the best inputs, while other methods, such as deep learning, require complex strategies. Additionally, the IMM model provides reliable results for GWL predictions. For predicting different variables, the IMM model is a robust ensemble method.
This study provides a deep understanding of the available water resources. Based on predicted GWL, decision-makers can plan for water supply in future periods. Moreover, the decreasing GWL of the future periods is a concern for decision-makers and policymakers. Water consumption can be reduced by changing the cultivation pattern. However, different suggestions for water supply have been presented by decision-makers. The water transfer plan from the Persian Gulf is one of the ideas for the water supply of the Yazd province. However, it should be considered that the water transfer plans are complex and have environmental effects.
The managed aquifer recharge (MAR) method for improving GW security is becoming increasingly popular worldwide (Ross & Hasnain 2018). The use of flood flows for irrigation and recharge, and the development of GW reserves in drought situations have been investigated and proposed by researchers. The use of conjunctive management and MAR to link surface water and GW is growing as a result of the development of sophisticated and creative methods. In this study, temporal and spatial maps of GWL were provided by the models. These maps can support decision-makers in identifying the critical regions with the lowest GWL.
CONCLUSIONS
GWL prediction is important for water resource management in different basins. This research uses IMM and multiple hybrid ANN models to predict GWL in the Ardakan-Yazd Plain. Climate models are applied to predict temperature and rainfall in the short, medium, and long-term periods of 2020–2059 and 2060–2099, respectively. The monthly average rainfall for 2020–2059 under RCP 4.5 and RCP 8.5, and 2060–2099 under RCP 4.5 and RCP 8.5 is 5.91 mm, 4.25 mm, 5.25 mm, and 3.41 mm, respectively. Results reveal that the temperature in future periods will be higher than in the base period. The performance of models reveals that IMM models based on climate data enhance the precision of climate models for predicting these variables. The results reveal that the NSE of IMM and hybrid ANN models are very good (0.96 and 0.76–0.95, respectively). The investigation of GWL for the future and base periods indicates that GWL in the future is lower than that in the base period. GWL depletions for 2020–2059 under RCP 4.5 and RCP 8.5, and 2060–2099 under RCP 4.5 and RCP 8.5 are 0.60 to 0.88, 0.80 to 1.16, 1.49–1.97, and 1.75 to 1.98, respectively. This study can be developed using different models. In future studies, other methods such as machine-learning models can be applied for downscaling climate data. Besides, the potential of IMM can be benchmarked against other ensemble models.
AUTHOR CONTRIBUTIONS
Mohammad Ehteram, Zahra Kalntari, and Carla Sofia Ferreira conceptualised the work together; Kwok-Wing Chau applied formal techniques to analyse the data; Mohammad Ehteram, Zahra Kalntari, and Carla Sofia Ferreira involved in the design of methodology; Kwok-Wing Chau and Mohammad-Kazem Emami validated the data.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.