## ABSTRACT

This study presents the ‘Dual Path CNN-MLP’, a novel hybrid deep neural network (DNN) architecture that merges the strengths of convolutional neural networks (CNNs) and multilayer perceptrons (MLPs) for regional groundwater flow simulations. This model stands out from previous DNN approaches by managing mixed input types, including both imagery and numerical vectors. Such flexibility allows the diverse nature of groundwater data to be efficiently utilized without the need to convert it into a uniform format, which often leads to oversimplification or unnecessary expansion of the dataset. When applied to the northeast Qatar aquifer, the model demonstrates high accuracy in simulating transient groundwater flow fields, benchmarked against the well-established MODFLOW model. The model's efficacy is confirmed through *k*-fold cross-validation, showing an error margin of less than 12% across all examined locations. The study also examines the model's ability to perform uncertainty analysis using Monte Carlo simulations, finding that it achieves around 1% average absolute percentage error in estimating the mean hydraulic head. Errors are mostly found in areas with significant variations in the hydraulic head. Switching to this machine learning model from the conventional MODFLOW simulator boosts computational efficiency by about 99%, showcasing its advantage for tasks like uncertainty analysis in repetitive groundwater simulations.

## HIGHLIGHTS

This study focuses on machine learning–based regional groundwater flow modeling.

The ‘Dual Path CNN-MLP’ architecture for mixed input data types is proposed here.

The model's efficacy in predicting the northeast Qatar aquifer's hydraulic head is assessed.

Alignment with MODFLOW outcomes are demonstrated in the study.

## INTRODUCTION

Numerical models of regional groundwater flow are vital tools for understanding groundwater systems and predicting how they react to groundwater extraction, climate variability, land use changes, and other influential factors (Zhou & Li 2011; Omar *et al.* 2020; Liu *et al.* 2022; Sun *et al.* 2023). To calculate the hydraulic head, these models solve an implicit system of equations derived from discretized versions of the partial differential equations governing groundwater flow and transport. This discretization is often achieved through finite difference or finite element methods, and the process is iteratively repeated at each time step. These numerical models are extensively employed in aquifer resource management (Omar *et al.* 2020; Miro *et al.* 2021; Ostad-Ali-Askari & Shayannejad 2021), design of aquifer storage and recovery schemes (LaHaye *et al.* 2021; Tiwari *et al.* 2022, 2023), drought risk management (Shivakoti *et al.* 2019; Wossenyeleh *et al.* 2021), and aquifer pollution control (Robinson *et al.* 2009; Guo *et al.* 2021; Panjehfouladgaran & Rajabi 2022; Shakeri *et al.* 2023). However, the computational expense of numerical methods used in regional groundwater simulations can pose a significant obstacle for tasks that involve repeated simulations, such as uncertainty analysis and simulation-based optimization. To overcome this issue, a solution is to use a limited number of numerical simulations to train data-driven surrogate models that can replace numerical models in such tasks.

Data-driven surrogate models utilized in this context are primarily defined within the scope of univariate (Tian *et al.* 2016; Mirarabi *et al.* 2019; Sarma & Singh 2022) or multivariate (Rizeei *et al.* 2018; Najafzadeh *et al.* 2022) time series forecasting (Gong *et al.* 2018; Azizpour *et al.* 2022; Eslami *et al.* 2022). These models typically utilize the historical time series of head data, occasionally along with auxiliary data such as precipitation, evaporation, discharge, and irrigation, to predict future heads. Table 1 lists various models previously applied in this context, accompanied by their respective references.

Model type . | References . |
---|---|

Feed-forward neural networks | Kouziokas et al. (2018) |

Gaussian processes emulators | Kopsiaftis et al. (2019), Rajabi (2019) |

Polynomial chaos expansion | Laloy et al. (2013), Rajabi (2019) |

Kriging | Clifton & Neuman (1982), Asadi & Adhikari (2022) |

Radial basis functions | Nourani et al. (2017) |

Support vector machines | Yoon et al. (2011), Lal & Datta (2018) |

DNNs | Mo et al. (2020), Payne et al. (2022), Zheng et al. (2023), Xia et al. (2023) |

Model type . | References . |
---|---|

Feed-forward neural networks | Kouziokas et al. (2018) |

Gaussian processes emulators | Kopsiaftis et al. (2019), Rajabi (2019) |

Polynomial chaos expansion | Laloy et al. (2013), Rajabi (2019) |

Kriging | Clifton & Neuman (1982), Asadi & Adhikari (2022) |

Radial basis functions | Nourani et al. (2017) |

Support vector machines | Yoon et al. (2011), Lal & Datta (2018) |

DNNs | Mo et al. (2020), Payne et al. (2022), Zheng et al. (2023), Xia et al. (2023) |

Surrogate models have been widely utilized in regional groundwater flow studies for various purposes, including pumping optimization (Christelis *et al.* 2019; Han *et al.* 2020), uncertainty quantification (Sreekanth & Datta 2011; Asher *et al.* 2015; Gadd *et al.* 2019), and sensitivity analysis (Miao *et al.* 2019; Chen *et al.* 2021). Although these models are effective in predicting pointwise head values, they face challenges in accurately representing the spatially continuous nature of groundwater hydraulic heads, which are better conceptualized as maps rather than discrete points. Traditional interpolation methods between pointwise estimates may oversimplify and distort reality, especially in areas with complex head distributions. Addressing this, image-to-image regression emerges as a robust solution, allowing for the direct prediction of hydraulic head distributions without the need for interpolation, thus preserving spatial details (Zhu & Zabaras 2018). This approach effectively captures the spatial relationships and patterns influenced by the region's geological and hydrological features (Celik & Aslan 2020), offering significant advantages over traditional methods by producing higher-quality outcomes (Rajabi *et al.* 2022).

Several recent studies, as summarized in Table 2, have utilized image-to-image regression for groundwater flow and contaminant transport simulations. These studies predominantly employed encoder–decoder convolutional deep neural networks (ED-CNNs), such as the attention U-Net, to perform image-to-image regression (Kontos *et al.* 2022; Xia *et al.* 2023; Zheng *et al.* 2023). Through a data-driven supervised learning approach, these models have demonstrated their ability to produce accurate results for both forward and inverse simulations, while significantly reducing computational time compared to state-of-the-art numerical solvers. Although most of these studies have focused on steady-state flow conditions (Taccari *et al.* 2022; Xia *et al.* 2023; Zheng *et al.* 2023), a few have also considered transient flow (e.g., Mo *et al.* 2019). In the case of transient flow, a typical approach involves using the output from the previous time step as an input to the network for predicting the current time step's output.

Reference . | Test case . | Input images . | Output images . | Model . | |
---|---|---|---|---|---|

Zheng et al. (2023) | Steady-state groundwater flow through a single-layer, heterogeneous confined aquifer, affected by a contaminant source with time-varying strength^{a} | Hydraulic conductivity, contaminant source strength | Hydraulic head, contaminant concentration | Combines a generative adversarial network and a convolutional neural network | |

Xia et al. (2023) | Single-layer, heterogeneous aquifer with multiple time-varying source strengths | Hydraulic conductivity field, contaminant source | Hydraulic head, contaminant concentration | Convolutional encoding-decoding NN | |

Taccari et al. (2023) | Single-layer, heterogeneous confined aquifer a pumping well | Pumping well location hydraulic conductivity | Steady-state heads distribution | DeepONet (vanilla, and extended) | |

Kontos et al. (2022) | With two pumping wells and six suspected possible instantaneous contaminant sources | Scenario-specific factors encompass: the pollution status of each node, the specific day on which each node becomes polluted, the pollution duration at each node, and the hydraulic head drawdown observed at each node | Contaminant source location | Convolutional encoder–decoder | |

Taccari et al. (2022) | Single-layer, heterogeneous confined aquifer | Head at the boundaries, location of the boundaries, hydraulic conductivity field | Steady-state head distribution | Data-driven Attention U-Net | |

Jiang et al. (2021) | Time-dependent multiphase flow in a single-layer, channelized geological system | The binary channelized formation, saturation, and pressure maps at time | Saturation and pressure maps at time | Autoregressive residual U-net | |

Zhou & Tartakovsky (2021) | Heterogeneous aquifer with a pointwise contaminant source^{a} | Initial contaminant concentration distribution | Concentration distribution | Convolutional encoder–decoder NN | |

Mo et al. (2019) | Single-layer, heterogeneous aquifer with a pointwise contaminant source^{b} | Hydraulic conductivity field, source terms | Hydraulic head, contaminant concentration | Convolutional encoder–decoder NN | |

Pan et al. (2022) | Single-layer, heterogeneous unconfined aquifer | Hydraulic conductivity | Hydraulic head | Convolutional-cycle generative adversarial NN |

Reference . | Test case . | Input images . | Output images . | Model . | |
---|---|---|---|---|---|

Zheng et al. (2023) | Steady-state groundwater flow through a single-layer, heterogeneous confined aquifer, affected by a contaminant source with time-varying strength^{a} | Hydraulic conductivity, contaminant source strength | Hydraulic head, contaminant concentration | Combines a generative adversarial network and a convolutional neural network | |

Xia et al. (2023) | Single-layer, heterogeneous aquifer with multiple time-varying source strengths | Hydraulic conductivity field, contaminant source | Hydraulic head, contaminant concentration | Convolutional encoding-decoding NN | |

Taccari et al. (2023) | Single-layer, heterogeneous confined aquifer a pumping well | Pumping well location hydraulic conductivity | Steady-state heads distribution | DeepONet (vanilla, and extended) | |

Kontos et al. (2022) | With two pumping wells and six suspected possible instantaneous contaminant sources | Scenario-specific factors encompass: the pollution status of each node, the specific day on which each node becomes polluted, the pollution duration at each node, and the hydraulic head drawdown observed at each node | Contaminant source location | Convolutional encoder–decoder | |

Taccari et al. (2022) | Single-layer, heterogeneous confined aquifer | Head at the boundaries, location of the boundaries, hydraulic conductivity field | Steady-state head distribution | Data-driven Attention U-Net | |

Jiang et al. (2021) | Time-dependent multiphase flow in a single-layer, channelized geological system | The binary channelized formation, saturation, and pressure maps at time | Saturation and pressure maps at time | Autoregressive residual U-net | |

Zhou & Tartakovsky (2021) | Heterogeneous aquifer with a pointwise contaminant source^{a} | Initial contaminant concentration distribution | Concentration distribution | Convolutional encoder–decoder NN | |

Mo et al. (2019) | Single-layer, heterogeneous aquifer with a pointwise contaminant source^{b} | Hydraulic conductivity field, source terms | Hydraulic head, contaminant concentration | Convolutional encoder–decoder NN | |

Pan et al. (2022) | Single-layer, heterogeneous unconfined aquifer | Hydraulic conductivity | Hydraulic head | Convolutional-cycle generative adversarial NN |

*Note*: , Forward mapping; , Direct inverse mapping.

^{a}Groundwater contaminant source identification is done by employing the surrogate model in the context of the iterative local updating ensemble smoother algorithm.

^{b}The surrogate model is employed in the Markov Chain Monte Carlo algorithm for contaminant source identification.

In the studies reviewed above, there is a common practice of transforming all the inputs of the deep neural network (DNN) models into images to perform image-to-image regression. However, certain aspects, such as constant head or specified flux boundary conditions, pumping and injection rates, and total recharge rate, could potentially be more effectively represented as values or vectors rather than images. The simplification of homogenizing inputs using images is primarily driven by the unique challenges posed by mixed data obtained from groundwater flow and transport modeling, where each data type may require different preprocessing steps, including scaling, normalization, and feature engineering. The treatment of mixed data remains an active area of research, heavily influenced by the specific task and the desired outcome. Therefore, it is crucial to explore and develop approaches that can adequately handle the diverse nature of data types encountered in groundwater flow and contaminant simulations.

In addition, all studies reviewed in Table 2 focus solely on two-dimensional (2D) hypothetical test cases with square domains. While the surrogate models proposed in these studies have demonstrated promising results for addressing subsurface flow problems, the current models assume relatively simple heterogeneity, often relying on Gaussian or bimodal distributions (Zheng *et al.* 2023). Incorporating input data based on real geological heterogeneities would impose more demanding requirements on the image-to-image regression models (Jiang *et al.* 2021). This is an aspect that needs to be thoroughly addressed to ensure the models can effectively handle the complexities introduced by real-world geological conditions.

The current study aims to address the above research gaps by proposing an innovative hybrid DNN model capable of handling mixed inputs consisting of both images and numeric values. The primary objective is to leverage this model to accurately estimate the full hydraulic head distribution within a real-world aquifer characterized by two distinct layers and a heterogeneous hydraulic conductivity. This application will provide insights into the model's ability to handle the complexities introduced by real-world geological conditions, including layered structures, and varying hydraulic conductivity values. The accuracy and reliability of the model's predictions are assessed by comparing them against the outputs of an established numerical solver.

## METHODS

In this section, we present our test case and describe the process of numerical simulation used to generate training, validation, and test datasets for the hybrid DNN model. Subsequently, we outline the framework of the supervised training task and provide a detailed description of the model architecture and training approach.

### Study area

^{2}or 13% of the entire surface area of Qatar. The area is mostly flat, and surface elevation ranges from 0 at the sea boundary to 48 m above the mean sea level further inland. The topography is characterized by sunken formations that result from the dissolution of limestone, creating karst features that are beneficial for replenishing the groundwater supply (Baalousha 2016a).

The climate is generally hot and arid, with hot summers and short, mild winters. Rainfall is highly erratic and mostly occurs between November and March. The average temperature in summer can exceed 40°C, while in winter it can drop to around 15°C. Apart from flash floods, surface water is nonexistent. The average annual rainfall is 80 mm per year. Recharge, in the context of Qatar's hydrogeology, predominantly takes place via terrestrial depressions where rainwater runoff accumulates following precipitation events. Recharge from rainfall is estimated to be 7–10% of annual rainfall (Baalousha *et al.* 2015). Flash floods resulting from thunderstorms account for much of the recharge potential. Hence, there is high uncertainty in natural recharge rates.

The aquifer in this area is unconfined, extensively karstic, and fractured (Eccleston *et al.* 1981; Ahmad & Al-Ghouti 2020). Due to high salinity, the groundwater in Qatar is generally not used directly as drinking water for the public, but it is widely used for agricultural and domestic consumption (Ahmad & Al-Ghouti 2020). The study area of northeast Qatar has a relatively high density of pumping wells. According to Schlumberger Water Services (2009), the freshwater lens in the north-central part of Qatar has been reduced from 15% of the country's area in 1971 to less than 2% in 2009.

### Numerical simulations

For the numerical simulations, the widely used MODFLOW model has been chosen. Several previous studies (e.g., Baalousha 2016a, 2016b; Ahmad & Al-Ghouti 2020) have focused on the numerical simulation of the Qatar aquifer or regions that overlap with our study area. Building on our understanding of this previous work, we modeled the aquifer as a two-layer system. Layer 1 represents the uppermost layer, with a thickness range of 16.34–103.58 m, while layer 2 constitutes the lower layer, with a thickness range of 339.73–356.83 m. The model boundaries consist of the sea to the east and north, representing constant head boundaries, while the western boundary accounts for lateral regional flow. According to previous studies, this lateral inflow is estimated to be approximately 2 million m^{3}/year (Schlumberger Water Services 2009; Baalousha 2016b).

A quasi-3D model domain, which encompasses 713,878 cells, each with dimensions of 100 × 100 m, was set up in this study. These cells are structured into two layers, with each layer containing 356,939 cells. The MODFLOW model was developed using the spatial recharge distribution data from Baalousha *et al.* (2018). Groundwater recharge from rainfall varies spatially, influenced by factors such as rainfall data, surface topography, and soil type. In Qatar, recharge predominantly takes place within land depressions where rainwater gathers post-rainstorm events (Baalousha *et al.* 2018). Consequently, most of the recharge occurs in specific areas with limited surface extent.

The model was calibrated under steady-state conditions using groundwater level measurements from the predevelopment era of the 1950s. The calibration process involved employing the parameter estimation (PEST) code (Welter *et al.* 2015) to estimate the distribution of heterogeneous hydraulic conductivity. To account for the spatial variability of hydraulic conductivity and reduce the number of parameters, we adopt the pilot points approach. A total of 51 pilot points were randomly selected and utilized. During model calibration, total groundwater recharge is kept constant at 8,227.86 m^{3}/day (about 3 million m^{3}/year). This amount is calculated based on a soil water budget model that was developed by Baalousha *et al.* (2018) to determine the distribution of net recharge from rainfall.

The calibrated steady-state numerical model was then utilized to construct a transient model. For this purpose, the steady-state groundwater conditions served as the initial conditions for the subsequent simulations. The transient model was designed to simulate 30 years, encompassing two stress periods per year, namely, the months with higher and lower precipitation. Specifically, in each year, the months from November through March experienced marginally higher precipitation, while the remaining seven months had slightly lower rainfall. We will refer to these two periods as ‘wet’ and ‘dry’ months, although it is important to note that the region experiences annual rainfall consistently less than 80 mm/year. Therefore, when we use the term ‘wet period’, we are referring to months with marginally higher precipitation, which are not truly ‘wet’ by typical standards.

### Creating the training and validation dataset for model development

We integrated the transient MODFLOW model with the Python FloPy package (Bakker *et al.* 2016) to streamline model simulation and post-processing. Utilizing FloPy, we developed a custom code to automatically generate MODFLOW input files, execute the model, and extract hydraulic head distributions. To generate the training dataset for the hybrid DNN model, two model parameters are varied in the simulations, as follows:

(1)

*Total annual recharge*: The total annual recharge is randomly varied between 2.7 and 12 m^{3}/year using a uniform probability distribution. This range is equivalent to 0.9–4 times the annually calibrated recharge value. The sampled recharge values are then distributed between the wet stress period months (November through March) and the dry stress period months of the year, with ratios of 0.8 and 0.2, respectively. As the spatial distribution of the total recharge remains consistent with the description provided in Section 2.2, the hybrid DNN model takes the total annual recharge as a numerical input.(2)

*Hydraulic conductivity*: The model domain is divided into 100 rectangular zones, each measuring 4,100 × 8,800 m. The hydraulic conductivity values for each zone are derived by multiplying the calibrated hydraulic conductivity values with random samples from a normal distribution, which has a mean value of 1.925 and a standard deviation of 0.4. This process was consistently applied to both layers, resulting in the creation of a checkerboard pattern. As this process changes the hydraulic conductivity distribution, the hydraulic conductivity is an image input to the machine learning (ML) model.

The above parameters, including the recharge rates and hydraulic conductivity values, are loosely based on existing literature and empirical data. It is important to emphasize that these parameters are utilized as illustrative values within the context of our study. Given the inherent variability and complexity of hydrogeological systems, the exact values for these parameters are challenging to determine precisely. The primary aim of employing these values is not to assert their absolute accuracy but rather to demonstrate the efficacy and applicability of our novel ML-based modeling approach to a real-case scenario.

The regional groundwater model is executed over a 30-year period, capturing the initial head distribution and the head distribution at the end of each year. The objective is to develop an ML model that can accurately simulate the head changes occurring during a 1-year period. For this purpose, we gather the hydraulic conductivity maps for the two layers (as two images), the head distribution at the start of the year (as an image), and the total recharge rate during that year (as a numeric value) – all as inputs. With this input data, the hybrid DNN aims to predict the head distribution at the end of the wet and dry period of that year as two image outputs. All input and output images utilized in the model are configured to a resolution of 876 × 404 pixels. The simulations were executed on a computer equipped with an Intel Core i7-7700HQ CPU, clocked at 2.80GHz, and supported by 12 GB of RAM. Our study entailed the completion of 1,000 numerical simulations, each generating data of about 105 MB across 63 distinct CSV files, cumulatively resulting in a dataset approaching 100 GB in total storage requirement.

### Surrogate modeling based on mixed inputs

*et al.*2022). It requires learning a function that maps both the input images and the numerical inputs to an output image. The image inputs are denoted as , and the scalar inputs as . We seek a mapping function from these inputs to an output image such that:where ( is the

*n*th input channel,

*H*is the image height, and

*W*is the image width), , and ( is the output channels, denoting predictions for the wet and dry periods). The loss function minimized is the mean squared error (MSE) (Chen

*et al.*2022; Taccari

*et al.*2022):where is the total number of pixels in the output image. This function

*f*is trained on a dataset of matching input–output image pairs along with their associated scalar values and, once trained, can be used to predict the output image from a new set of input images and scalars.

### Network architecture

*K*is the kernel. The convolution operation captures essential spatial information in the form of feature maps. A 3 × 3 kernel offers a good balance between computational efficiency and the ability to capture relevant spatial features. This choice is informed by prevalent practices in deep learning, where 3 × 3 kernels are widely adopted for their efficiency in various pretrained models and architectures, such as Visual Geometry Group (VGG) and ResNet.

Layer . | Specification . | Activation . | Output shape . |
---|---|---|---|

Image input | Shape = (219, 101, 3) | None | (None, 219, 101, 3) |

Conv2D | Filters = 16, kernel size = 3, padding = ‘same’ | ReLU | (None, 219, 101, 16) |

Conv2D | Filters = 16, kernel size = 3, padding = ‘same’ | ReLU | (None, 219, 101, 16) |

Conv2D | Filters = 32, kernel size = 3, padding = ‘same’ | ReLU | (None, 219, 101, 32) |

Conv2D | Filters = 64, kernel size = 3, padding = ‘same’ | ReLU | (None, 219, 101, 64) |

MaxPooling2D | Pool size = (2, 2) | None | (None, 109, 50, 64) |

Dropout | Rate = 0.5 | None | (None, 109, 50, 64) |

Flatten | – | None | (None, 348800) |

Numerical _input | Shape = (1,) | None | (None, 1) |

Concatenate | – | None | (None, 348801) |

Dense | Units = 32 | ReLU | (None, 32) |

Dense | Units = 64 | ReLU | (None, 64) |

Dense | Units = 128 | ReLU | (None, 128) |

Dropout | Rate = 0.5 | None | (None, 128) |

Dense | Units = 219 × 101 | ReLU | (None, 219,101) |

Reshape | Target shape = (219, 101, 1) | None | (None, 219, 101, 1) |

Layer . | Specification . | Activation . | Output shape . |
---|---|---|---|

Image input | Shape = (219, 101, 3) | None | (None, 219, 101, 3) |

Conv2D | Filters = 16, kernel size = 3, padding = ‘same’ | ReLU | (None, 219, 101, 16) |

Conv2D | Filters = 16, kernel size = 3, padding = ‘same’ | ReLU | (None, 219, 101, 16) |

Conv2D | Filters = 32, kernel size = 3, padding = ‘same’ | ReLU | (None, 219, 101, 32) |

Conv2D | Filters = 64, kernel size = 3, padding = ‘same’ | ReLU | (None, 219, 101, 64) |

MaxPooling2D | Pool size = (2, 2) | None | (None, 109, 50, 64) |

Dropout | Rate = 0.5 | None | (None, 109, 50, 64) |

Flatten | – | None | (None, 348800) |

Numerical _input | Shape = (1,) | None | (None, 1) |

Concatenate | – | None | (None, 348801) |

Dense | Units = 32 | ReLU | (None, 32) |

Dense | Units = 64 | ReLU | (None, 64) |

Dense | Units = 128 | ReLU | (None, 128) |

Dropout | Rate = 0.5 | None | (None, 128) |

Dense | Units = 219 × 101 | ReLU | (None, 219,101) |

Reshape | Target shape = (219, 101, 1) | None | (None, 219, 101, 1) |

The Dual Path CNN-MLP model was developed to integrate both image-based and vector-based inputs after recognizing the limitations of a purely MLP-based approach. Initially, the MLP model, designed to handle numerical data, was fed hydraulic conductivity values, head distribution, and recharge rates. However, due to the high dimensionality (63,328 dimensions), it struggled with underfitting, failed to capture complex spatial patterns, and ultimately made inaccurate predictions of head distribution. These results are not included here because the model performed so poorly that presenting them would be irrelevant. To address this limitation, the Dual Path CNN-MLP model leverages the spatial pattern recognition capabilities of CNNs to handle hydraulic conductivity as image data. This improves the prediction accuracy by utilizing spatial correlations and ensures computational efficiency through convolution operations that reduce dimensionality while preserving essential features. The approach significantly outperforms the MLP model, offering much more accurate predictions and faster convergence on solutions, optimizing training time and computational resources.

### Model training and validation

To efficiently alleviate the computational demands during training of the Dual Path CNN-MLP model, we implement average pooling, employing a (4,4) pool size to down-sample the image data from 876 × 404 to 219 × 101 while retaining essential spatial patterns. In addition, the input data underwent min–max normalization to ensure robust model training. The dataset was segmented into training and validation sets, employing the *k*-fold cross-validation technique, a widely recognized method for evaluating deep learning models to achieve more dependable outcomes (Khan *et al.* 2021). This technique, known for its efficacy in mitigating overfitting and ensuring the broad applicability of the dataset (Garre *et al.* 2020; Nguyen *et al.* 2021; Vu *et al.* 2022), involves distributing the entire dataset into *k* equal parts. Each part, or fold, is alternately used for validation, with the remaining folds allocated for training, a strategy that fosters diverse training experiences across the models (Saud *et al.* 2020). This cycle results in the training of *k* distinct models on slightly varied data segments. The efficacy of each model is then evaluated based on its performance on the validation fold, employing metrics like the correlation coefficient (*R*^{2}) and root mean squared error (RMSE). The model exhibiting the highest average accuracy and lowest average error across all validation folds is selected. Its performance is further assessed on a test dataset to obtain an unbiased estimation of its capabilities. This comprehensive methodology enhances the stability and reliability of our model's performance estimates, minimizing variability arising from random data partitioning. The model development process was carried out within a Jupyter Notebook environment, making use of the TensorFlow and Keras libraries.

*et al.*2021; Li

*et al.*2021). These three criteria are calculated as follows:where , is the actual value of sample

*i*, is the predicted value of sample

*i*, is the average of actual value for sample

*i*, and

*N*is the sample size.

## RESULTS AND DISCUSSION

In this section, the numerical simulation outputs are presented, and the Dual Path CNN-MLP surrogate model is validated. The model is then employed for uncertainty propagation analysis, and the resulting estimates are compared with the known target values.

### Numerical simulation results

*et al.*2019; Ajjur & Al-Ghamdi 2022). These studies consistently highlight higher hydraulic conductivities in Qatar's northern regions. Moreover, within our study area, hydraulic conductivity demonstrates an incremental increase from the southwest, progressing eastward and northward toward the sea, a pattern also observed and corroborated by Baalousha

*et al.*(2019) for the same region.

### Training results of the Dual Path CNN-MLP model

The selection of hyper-parameters for the Dual Path CNN-MLP model was carried out through iterative experimentation, with their specific values detailed in Table 4. The Adam optimizer was selected for its efficiency, coupled with the MSE as the chosen loss function to guide the training process. We configured the training with a batch size of 16, spanning over 60 epochs, and applied a learning rate of 0.001 to ensure precise model adjustments during optimization. The dataset encompassed 6,000 training samples, alongside 1,500 reserved for testing purposes.

Hyperparameter . | Value . |
---|---|

Optimizer | Nadam |

Loss function | MSE |

Samples | 6,000 |

Test samples | 1,500 |

Validation split | 0.166 |

Batch size | 16 |

Epochs | 60 |

Learning rate | 0.001 |

Hyperparameter . | Value . |
---|---|

Optimizer | Nadam |

Loss function | MSE |

Samples | 6,000 |

Test samples | 1,500 |

Validation split | 0.166 |

Batch size | 16 |

Epochs | 60 |

Learning rate | 0.001 |

*k*-fold cross-validation with , iterating each fold through 60 epochs using a similar subset size of approximately 6,000 samples. This method allowed for training on five-folds while designating one-fold exclusively for validation purposes, facilitating the training of the model under six distinct conditions to enhance generalizability and robustness. The MSE variations during each validation phase are depicted in Figure 4, which also illustrates the average MSE across all training and validation folds. This visualization confirms the model's consistent performance improvement across folds, underscoring its capacity to achieve high accuracy and indicating effective overfitting mitigation for a more dependable performance assessment. The validation accuracy achieved across the

*k*-fold cross-validation process was notably high, with scores of 0.998, 0.998, 0.996, 0.996, 0.997, and 0.996 for each fold. In parallel, the validation RMSE values recorded for each fold were 0.012, 0.012, 0.018, 0.018, 0.013, and 0.018, showcasing the model's consistent predictive precision. Based on these results, the model demonstrating the optimal balance of the highest average accuracy and minimal error was selected for further evaluation.

The higher MSE observed in training compared to validation accuracy in Figure 4 can be attributed to several factors. First, using five-sixth of the data for training exposes the model to a wider variety of cases and noise, thereby increasing the variability in error calculation. This contrasts with the validation phase, which uses only one-sixth of the data and potentially captures less complexity and noise, resulting in a lower MSE. In addition, since each fold is validated only once without being directly trained on, the model can better generalize to this unseen data subset, which may lead to lower MSE during validation phases. Moreover, the use of a logarithmic scale to display MSE exaggerates these differences, making the discrepancies between training and validation MSE more pronounced than they might appear on a linear scale.

To rigorously assess the selected model's ability to predict groundwater head values during both wet and dry periods, we utilized MAPE, RMSE, and EVS as key performance indicators across the training, validation, and testing phases. Detailed in Table 5, the results highlight the model's high degree of accuracy and precision. Specifically, MAPE values fall within a narrow range of 1.977–2.028%, demonstrating the model's consistent accuracy and minimal average error in estimating groundwater levels. RMSE figures, lying between 0.016 and 0.018 m, further affirm the model's precision in its predictions. Moreover, EVS scores approaching 0.996 illustrate the model's exceptional capacity to account for the variance observed in numerical simulations, especially during wet months, showcasing its predictive reliability.

Months . | Periods . | MAPE (%) . | RMSE (m) . | EVS . |
---|---|---|---|---|

Wet | Training | 2.028 | 0.018 | 0.996 |

Validation | 1.977 | 0.016 | 0.997 | |

Testing | 2.025 | 0.018 | 0.996 | |

Dry | Training | 1.53 | 0.0045 | 0.997 |

Validation | 1.47 | 0.0043 | 0.997 | |

Testing | 1.52 | 0.0045 | 0.997 |

Months . | Periods . | MAPE (%) . | RMSE (m) . | EVS . |
---|---|---|---|---|

Wet | Training | 2.028 | 0.018 | 0.996 |

Validation | 1.977 | 0.016 | 0.997 | |

Testing | 2.025 | 0.018 | 0.996 | |

Dry | Training | 1.53 | 0.0045 | 0.997 |

Validation | 1.47 | 0.0043 | 0.997 | |

Testing | 1.52 | 0.0045 | 0.997 |

#### Head time series prediction

#### Analyzing error distribution across the head map

To assess the performance of the neural network in simulating head distribution, we employed a test data sample. In Figure 7(a), the neural network's input parameters are presented, encompassing hydraulic conductivity maps in layers 1 and 2, along with the recharge time series. Figure 7(b) depicts the reference hydraulic head distribution obtained through numerical simulation. In parallel, Figure 7(c) showcases the hydraulic head estimates produced by the Dual Path CNN-MLP model, visualized as a contour plot. To demonstrate disparities, Figure 7(d) exhibits the spatial distribution of absolute percentage error between the numerical and DNN model outcomes. As demonstrated, the spatial distribution of hydraulic head values generated by the numerical model and the DNN model are similar. The percentage error at various points varies between 0 and 12%, with the majority of points having errors less than 4%. The most pronounced errors belong to regions characterized by the steepest spatial gradients of head change.

### Uncertainty propagation analysis

*f*represents the system's mathematical function or model. This process is repeated many times, generating a dataset of outcomes from which statistical properties, such as mean (), standard deviation (), and the probability density function , are estimated. These properties provide insights into the expected behavior of the system, its variability, and the probabilistic relationship between inputs and the outcome. MCS is particularly useful for assessing risk and evaluating the impact of uncertain parameters (Baalousha 2016c; Wang

*et al.*2020).

Within the presented figures, a notable agreement between and can be recognized based on the outputs of DNN and the numerical model during wet and dry months. Specifically, during wet months, the alignment between mean maps of the numerical model and DNN model (Figure 8(b) and 8(c)) significantly surpasses that of the standard deviation maps (Figure 8(e) and 8(d)). Conversely, during dry months, the agreement between standard deviation maps the numerical model and DNN model deviations (Figure 8(g) and 8(h)) is more pronounced compared to the alignment with mean maps (Figure 8(j) and 8(k)). Maximum absolute errors stand at 0.009 and 0.008 for and 0.01 and 0.004 for estimations during wet and dry months, respectively, primarily concentrated near regions of high head values. The average absolute percentage error for mean head values between the numerical model and DNN model is 0.61% for wet months and 0.81% for dry months, underscoring a higher degree of agreement with the target maps during dry months.

### Computation efficiency

This reduction in computation time illustrates the significant efficiency gains achievable with the proposed ML model.

## CONCLUSIONS

In this study, we developed and tested the ‘Dual Path CNN-MLP’, a novel hybrid model that combines CNNs and MLPs to address the challenge of simulating regional groundwater flow. By effectively handling a mix of input types, including imagery and numerical data, this model overcomes the traditional limitations associated with the homogenization of data formats, preserving the integrity of the diverse groundwater data and enhancing the model's utility in real-world applications. Our methodology involved applying this model to simulate transient groundwater flow within the northeast Qatar aquifer, comparing its performance against the established MODFLOW model through rigorous validation techniques, including *k*-fold cross-validation. Key findings from our study reveal that the Dual Path CNN-MLP model achieves remarkable precision in predicting groundwater flow, with an error margin of less than 12% across all tested locations. The model notably excels in uncertainty analysis, evidenced by its ability to estimate the mean hydraulic head with an average absolute percentage error of about 1%, and demonstrates substantial computational efficiency, offering a 99% reduction in computation time over the MODFLOW simulations.

The proposed model has the capacity to integrate additional factors such as porosity and surface topology, among others, either as images or vector inputs. However, in this investigation, we deliberately chose not to expand the model's input spectrum. This was based on the observation that our current methodology achieves notable accuracy without needing to factor in these additional elements. The decision on whether to include a more comprehensive set of parameters is dependent on the specific requirements of each study and presents an opportunity for further exploration in future research. Given that this study primarily explores 2D simulations, there is potential for extending our work into three-dimensional groundwater modeling. Such expansion would allow for a better understanding of aquifer behaviors and geological complexities. Future research could consider the model's adaptability to varied hydrogeological scenarios, its ability to process a broader array of data types, and ways to enhance its architecture for improved predictive performance. In addition, integrating physical loss functions, akin to those used in Physics-Informed Neural Networks, could offer a pathway to increase the physical realism of the simulations, thereby enhancing their feasibility and applicability in real-world settings.

## ACKNOWLEDGEMENTS

The authors would like to acknowledge the support of Hamed Bin Khalifa University-Qatar, and Sultan Qaboos University (SQU), Oman, for the support received through the awarded grants NPRP13S-0129-200198 (for SQU, EG/DVC/WRC/21/01). The support of the research group DR/RG/017 is also appreciated.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES

*Computational intelligence and neuroscience, 2019*, 2859429

*arXiv preprint arXiv:2304.12299*

*Approaches in highly parameterized inversion – PEST*

*++ Version 3, a Parameter ESTimation and uncertainty analysis software suite optimized for large environmental model*s (No. 7-C12). US Geological Survey