Abstract
Knowledge of hydraulic conductivity (K) is essential for subsurface flow and aquifer studies. Hydrologists and groundwater researchers increasingly employ data-driven techniques to estimate K indirectly from porous media characteristics as an alternative to direct measurement. This study examines the ability of the Adaptive Neuro-Fuzzy Inference System (ANFIS), with two membership functions (MFs), i.e., triangular and Gaussian, and of the support vector machine (SVM), with four kernel functions, i.e., linear, quadratic, cubic, and Gaussian, to predict the K of porous media. Both techniques used easily measurable parameters, namely effective and mean grain size, uniformity coefficient, and porosity, as input variables. The dataset was split 70:30 for training and testing of the models, respectively. The correlation coefficient (R) and root mean square error (RMSE) were used to evaluate the models. The Gaussian MF-based ANFIS model (R = 0.9661, RMSE = 0.0010) outperformed the triangular MF-based ANFIS model (R = 0.9532, RMSE = 0.0015), whereas the quadratic kernel-based SVM model (R = 0.9520, RMSE = 0.0015) performed better than the other SVM models. Based on the evaluation of the ANFIS and SVM models, the study establishes the efficacy of the Gaussian MF-based ANFIS model in estimating the K of porous media.
HIGHLIGHTS
The study develops a hydraulic conductivity model based on easily measurable grain-size parameters using two data-driven techniques, i.e., ANFIS and support vector machine.
The data-driven approaches allow quick estimation of hydraulic conductivity, which is useful in determining groundwater recharge.
INTRODUCTION
Hydraulic conductivity (K) is a simple yet critical porous media parameter that governs the flow of fluids through porous media, as described by Darcy's law. The K of porous media depends on the physical properties of the flowing fluid and of the transmission medium, such as particle size, porosity, and pore connectivity (Chandel et al. 2021). Darcy's law directly relates the seepage of fluid to K, and knowledge of K is important for groundwater recharge, landslide, and soil stability analysis. Soil K values can be measured or predicted using direct and indirect approaches (Chapuis 2012; Elbisy 2015). Considerable effort has been invested in estimating K because of the complex geometry of soil particles and the multiscale pore structure of porous media (Jougnot et al. 2021).
The direct approach consists of the K measurement in the laboratory or field. Laboratory methods include constant-head and falling-head tests, while the field methods include ring infiltrometer, instant profile, test basins (Williams & Ojuri 2021; Chandel et al. 2022a), auger hole, tension infiltrometer (Raoof et al. 2011), borehole permeameter, and pressure infiltrometer (Deb & Shukla 2012). A detailed review of various laboratory and field methods for determining the K of soil is covered by Amoozegar & Warrick (1986) and Wang et al. (2017). Direct measurement is expensive and time-consuming and becomes infeasible due to temporal and geographical variations. It also needs sophisticated instruments and competent operators (Arshad et al. 2013; Chandel & Shankar 2022). This resulted in the invention and widespread use of indirect techniques for estimating the K from more easily and inexpensively measured soil parameters such as porosity, specific gravity, and the percentage of sand, silt, and clay content (More et al. 2022). The indirect approaches involve the estimation of K using empirical equations and models developed using data-driven techniques based on correlation analysis, which relates the K to relevant contributing factors. The existing empirical methods have the significant benefit of being able to estimate the K value more quickly than the direct measurement (Williams & Ojuri 2021; Chandel et al. 2022b). As a result of their domain-specific development, empirical equations cannot be applied outside of those boundary conditions (Chandel & Shankar 2021).
Recent research shows the effective use of data-driven techniques in different fields of engineering (Das et al. 2012). Learning, modelling, and extracting patterns from experimental data are complex tasks that data-driven techniques can perform accurately. The data-driven techniques can be either single, such as artificial neural networks (ANNs), fuzzy logic, and support vector machine (SVM), or hybrid, such as ANFIS and genetic algorithm-ANN (Williams & Ojuri 2021). Particle swarm optimization and Monte Carlo analysis can also be used for model optimization and uncertainty analysis (Torabi Podeh et al. 2022). These techniques have been applied effectively to water resource engineering problems (Ghorbani et al. 2016; Kumar et al. 2020) and have a better prediction efficiency as compared to the direct approaches (Sihag et al. 2021). Ekhmaj (2010) developed ANN and multilinear regression (MLR) models to estimate the infiltration rate for different types of Libyan soil and suggested that ANN has better prediction efficiency than MLR. In estimating the K value for clay liners, Das et al. (2012) found that the SVM model was more effective than the neural network (NN) model. Yilmaz et al. (2012) used ANNs and ANFIS to estimate the K of coarse-grained soils and found the ANFIS model to be more reliable. Arshad et al. (2013) predicted K using multi-layer perceptron neural network (MLPNN), MLR, ANFIS, and radial basis function neural network (RBFNN) models and found that ANFIS and RBFNN were efficient and accurate approaches for K prediction. Elbisy (2015) studied SVM for estimating the K of sandy soil and found that SVM based on the radial basis function (RBF) kernel had a higher level of accuracy than the linear and sigmoid-based models. Sihag (2018) developed fuzzy logic and ANN-based models for estimating the K of soil, and the results showed that the ANN approach performed well. Sihag et al. (2018) used ANN, SVM, Gaussian process (GP), random forest (RF), M5P model tree, and two conventional models to estimate the infiltration rate of fly-ash-mixed soils, and the results showed that SVM with the RBF kernel was the best-fit modelling technique. Boadu (2020) explored support vector regression (SVR) and MLR to predict K and found that the SVR models were more accurate and achieved better performance. Singh et al. (2021a) used SVM, RF, and MLR models to estimate the K of the soil and concluded that the SVM model has better prediction efficiency. Singh et al. (2021a) investigated ANN, MLR, RF, and M5P tree-based models to predict the infiltration rate. The results showed that all models have relatively good prediction capability, with RF-based models outperforming the others.
The literature review shows that several data-driven techniques are useful for estimating the K of porous media from easily measurable soil properties; however, ANFIS and SVM showed better prediction performance than the other techniques in the respective studies. ANFIS uses the Takagi–Sugeno fuzzy inference system, which has been proven to be an effective tool for capturing non-linear relationships between inputs and outputs (Nawaz et al. 2015). SVMs reduce overfitting because they are based on the principle of structural risk minimization and on statistical learning theory, and they have gained popularity because of their better generalization (Das et al. 2012; Elbisy 2015). To the best of our knowledge, the ANFIS and SVM techniques have not been used together to examine which technique is more effective in estimating the K of porous media. Based on the literature review and the potential of ANFIS and SVM, the current study is conducted to examine and compare the potency of both techniques in developing prediction models for estimating the K of porous media. The main objectives of the study are given as follows:
1. To estimate the hydraulic conductivity of porous media by developing the ANFIS and SVM models and evaluate their efficacy via statistical indicators, i.e., R and RMSE.
2. To compare the outcomes of the ANFIS and SVM models in terms of prediction performance capability.
Theoretical overview of ANFIS and SVM
Adaptive Neuro-Fuzzy Inference System
Layer 1 – Every single node produces membership grades of an input variable. The shapes of the membership functions, such as generalized bell-shaped, Gaussian, triangle-shaped, and trapezoidal-shaped functions, are used to categorize the fuzzy set associated with each node.
Layer 2 – Every node is a fixed node labelled as ∏, and the output of every node is the product of the incoming signals, which represents the firing strength of a rule.
Layer 3 – Every node is a fixed node labelled as N and computes the normalized firing strengths.
Layer 4 – Every node i is an adaptive node and computes the contribution of the ith rule to the model output using its node function.
Layer 5 – Computes the overall output of the ANFIS by summing outputs of all incoming signals.
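The five layers can be illustrated with a minimal sketch of a single forward pass through a first-order Takagi–Sugeno ANFIS. This is not the configuration used in the study: for brevity it assumes two inputs instead of four, two MFs per input combined in a grid of four rules, and hypothetical MF parameters and rule consequents supplied by the caller. The function and argument names are illustrative only.

```python
import math


def gaussian_mf(x, center, sigma):
    """Gaussian membership grade of x."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def triangular_mf(x, a, b, c):
    """Triangular membership grade of x with feet a, c and peak b (a < b < c)."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def anfis_forward(x1, x2, mf, mf_params, consequents):
    """One forward pass of a two-input, first-order Sugeno ANFIS (illustrative).

    mf          : membership function, called as mf(x, *params)
    mf_params   : {"x1": [params, params], "x2": [params, params]} (assumed values)
    consequents : one (p, q, r) tuple per rule, so that f_i = p*x1 + q*x2 + r
    """
    # Layer 1: membership grades of each input variable
    mu1 = [mf(x1, *p) for p in mf_params["x1"]]
    mu2 = [mf(x2, *p) for p in mf_params["x2"]]
    # Layer 2: firing strength of each rule (product of incoming signals)
    w = [a * b for a in mu1 for b in mu2]
    # Layer 3: normalized firing strengths (assumes at least one rule fires)
    total = sum(w)
    w_norm = [wi / total for wi in w]
    # Layer 4: contribution of each rule to the model output
    rule_out = [wn * (p * x1 + q * x2 + r)
                for wn, (p, q, r) in zip(w_norm, consequents)]
    # Layer 5: overall output as the sum of all incoming signals
    return sum(rule_out), w_norm
```

In a trained ANFIS, the MF parameters and the consequent coefficients are the quantities adjusted during learning; here they are simply passed in, so the sketch shows only how an output is computed from given parameters.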
Support vector machine
SVM was developed by Vapnik and is based on statistical learning theory. In support vector regression (SVR), the input x is projected into a high-dimensional feature space using a kernel function, and a linear model is then developed in this feature space (Das et al. 2012).
In this formulation, the non-linear mapping function transforms the original input space to a higher-dimensional feature space, w is the weight vector of the regression function, and b is the bias of the regression function.

n is the number of support vectors, i.e., the patterns that contain all the information necessary to solve the learning task at hand.
C is a regularization constant that determines the trade-off between the training error and the generalization performance.
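To make the role of the kernel concrete, the following minimal sketch evaluates the four kernel types considered in this study (linear, quadratic, cubic, and Gaussian) and the kernel form of an SVR prediction, i.e., a weighted sum of kernel values over the support vectors plus the bias. The kernel constants (the offset of 1 in the polynomial kernel and `gamma` in the Gaussian kernel), the function names, and the dual coefficients are assumptions for illustration; the source does not state the settings used, and the sketch does not train a model.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, degree):
    """Polynomial kernel; degree=2 is quadratic, degree=3 is cubic.
    The offset of 1.0 is an assumed constant."""
    return (dot(u, v) + 1.0) ** degree


def gaussian_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel; gamma is an assumed width parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, bias, kernel):
    """Kernel-form SVR prediction: sum_i beta_i * K(x_i, x) + b,
    where the sum runs over the n support vectors."""
    return sum(beta * kernel(sv, x)
               for sv, beta in zip(support_vectors, dual_coefs)) + bias
```

For example, `svr_predict(x, svs, coefs, b, lambda u, v: polynomial_kernel(u, v, 2))` would evaluate a quadratic-kernel model once the support vectors, coefficients, and bias have been obtained from training.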

MATERIALS AND METHODOLOGY
Experimental study
Model development
The second phase of the study used the observations obtained from the experimental investigations for estimating the K of porous media using ANFIS and SVM techniques. The agreement between the predicted and observed K values was checked statistically by calculating R and RMSE. The modelling strategies used and performance evaluation parameters are summarized in the following section.
ANFIS modelling strategy
SVM modelling strategy
Model performance evaluation
The quantities used in R and RMSE are the experimentally observed K value, the model-predicted K value, the mean of the experimental observations, and the mean of the model predictions (Elbisy 2015).
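As a sketch of how the two indicators can be computed from the quantities defined above, the functions below use the standard Pearson correlation coefficient and the standard root mean square error. The function and argument names are illustrative, and it is an assumption that the study used exactly these standard forms.

```python
import math


def correlation_coefficient(observed, predicted):
    """Pearson correlation coefficient R between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    # Covariance-like numerator built from deviations about each mean
    num = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    den_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed))
    den_pred = math.sqrt(sum((p - mean_pred) ** 2 for p in predicted))
    return num / (den_obs * den_pred)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq_err / len(observed))
```

An R close to 1 together with an RMSE close to 0 indicates close agreement between predicted and observed K values; R alone is insensitive to a constant scale or offset error, which is why the two indicators are reported together.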
RESULTS AND DISCUSSION
The observations from the laboratory experiments and the findings of the developed ANFIS and SVM models for estimating K are analysed and discussed in the following sections. The first section describes the statistical analysis of the porous media dataset, and the subsequent sections present the performance analysis of the developed models.
Statistical analysis
The porous media properties obtained in the laboratory include fines (%), fine sand (%), medium sand (%), coarse sand (%), fine gravel (%), d10, d50, d60, Cu, n, and K. The maximum, minimum, mean, and standard deviation of each porous media property are presented in Table 1. The observed K values for the 56 porous media samples have maximum and minimum values of 0.315 and 0.006 cm/s, respectively. The correlation coefficients of d10, d50, Cu, and n with K are presented in Table 2. From Table 2, it is evident that d10 and n have the strongest influence on K, with correlation coefficients of 0.94 and 0.77, respectively, whereas d50 and Cu have a moderate influence on K, with correlation coefficients of 0.58 and 0.52, respectively.
Table 1. Physical properties of the samples
Property | Maximum | Minimum | Mean | Standard deviation |
---|---|---|---|---|
Fines (%) | 3.256 | 0.000 | 0.479 | 0.745 |
Fine sand (%) | 91.234 | 11.354 | 44.662 | 22.302 |
Medium sand (%) | 88.646 | 8.579 | 49.250 | 20.614 |
Coarse sand (%) | 13.529 | 0.000 | 4.046 | 4.763 |
Fine gravel (%) | 20.944 | 0.000 | 1.563 | 4.689 |
d10 (mm) | 0.410 | 0.090 | 0.213 | 0.073 |
d50 (mm) | 1.184 | 0.249 | 0.472 | 0.191 |
d60 (mm) | 1.906 | 0.273 | 0.566 | 0.315 |
Cu (a) | 5.580 | 1.793 | 2.883 | 0.806 |
n (a) | 0.427 | 0.316 | 0.377 | 0.030 |
K (cm/s) | 0.315 | 0.006 | 0.071 | 0.065 |
(a) Dimensionless property.
Table 2. Correlation coefficients of various soil parameters with K
Parameter | d10 | d50 | Cu | n | K |
---|---|---|---|---|---|
d10 | 1.00 | | | | |
d50 | 0.52 | 1.00 | | | |
Cu | 0.36 | 0.83 | 1.00 | | |
n | 0.77 | 0.51 | 0.33 | 1.00 | |
K | 0.94 | 0.58 | 0.52 | 0.77 | 1.00 |
Dataset for ANFIS and SVM
The success of a model in estimating the K of porous media depends on the extent of the training dataset. Of the 56 samples, 40 were chosen for training, while the remaining 16 were used for testing the models. The input variables are d10, d50, Cu, and n, whereas K is the output variable. Table 3 shows the range of the input and output parameters used in the ANFIS and SVM techniques.
Table 3. Characteristics of training and testing data
Variable | Role | Training maximum | Training minimum | Training mean | Testing maximum | Testing minimum | Testing mean |
---|---|---|---|---|---|---|---|
d10 (mm) | Input | 0.410 | 0.090 | 0.218 | 0.296 | 0.136 | 0.200 |
d50 (mm) | Input | 1.184 | 0.249 | 0.503 | 0.871 | 0.300 | 0.395 |
Cu (a) | Input | 5.580 | 1.793 | 3.033 | 4.261 | 1.946 | 2.508 |
n (a) | Input | 0.427 | 0.316 | 0.381 | 0.415 | 0.321 | 0.367 |
K (cm/s) | Output | 0.315 | 0.006 | 0.080 | 0.132 | 0.018 | 0.050 |
(a) Dimensionless property.
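The partition of the 56 samples into 40 training and 16 testing samples can be sketched as follows. The source does not say how the samples were assigned to each subset, so the seeded random shuffle used here is an assumption, as are the function name and the `seed` parameter.

```python
import random


def split_dataset(samples, n_train, seed=0):
    """Split samples into training and testing subsets.

    A seeded shuffle is assumed for illustration; the original study may
    have selected the subsets differently.
    """
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # reproducible permutation
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test
```

Calling `split_dataset(samples, 40)` on a list of 56 records yields 40 training and 16 testing records, which corresponds to the roughly 70:30 split reported for the study.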
Estimation of K using ANFIS
Performance evaluation parameters of training and testing datasets for the ANFIS model
Studied model | No. of MFs | Training R | Training RMSE | Testing R | Testing RMSE |
---|---|---|---|---|---|
Triangular (ANFIS) | 2 | 0.9726 | 0.0007 | 0.9532 | 0.0015 |
Gaussian (ANFIS) | 2 | 0.9884 | 0.0002 | 0.9661 | 0.0010 |
Scatter plots of triangular (ANFIS) model for predicting K during (a) training and (b) testing stages.
Scatter plots of Gaussian (ANFIS) model for predicting K during (a) training and (b) testing stages.
Estimation of K using SVM
Performance evaluation parameters of training and testing datasets for the SVM model
Studied model | Training R | Training RMSE | Testing R | Testing RMSE |
---|---|---|---|---|
Linear (SVM) | 0.9741 | 0.0008 | 0.9296 | 0.0021 |
Gaussian (SVM) | 0.9781 | 0.0006 | 0.9404 | 0.0016 |
Cubic (SVM) | 0.9724 | 0.0011 | 0.9352 | 0.0028 |
Quadratic (SVM) | 0.9831 | 0.0004 | 0.9520 | 0.0015 |
Scatter plots of linear (SVM) model for predicting K during (a) training and (b) testing stages.
Scatter plots of Gaussian (SVM) model for predicting K during (a) training and (b) testing stages.
Scatter plots of cubic (SVM) model for predicting K during (a) training and (b) testing stages.
Scatter plots of quadratic (SVM) model for predicting K during (a) training and (b) testing stages.
Comparison of models
Comparison of experimental K values with the predicted K values using Gaussian (ANFIS) and quadratic (SVM) models.
CONCLUSIONS
The study focuses on estimating the K of porous media using two data-driven techniques, i.e., ANFIS and SVM. Based on the correlation analysis, d10, d50, Cu, and n were found to be the influential parameters for predicting the K of porous media. The results indicate that both techniques are effective in estimating the K values for the dataset used. Among the ANFIS models, the Gaussian (ANFIS) model, with R and RMSE values of 0.9661 and 0.0010 for the testing dataset, outperforms the triangular (ANFIS) model. Among the SVM models, the quadratic (SVM) model, with R and RMSE values of 0.9520 and 0.0015 for the testing dataset, performs better than the linear (SVM), Gaussian (SVM), and cubic (SVM) models. The comparison of the performance evaluation parameters shows that the Gaussian (ANFIS) approach predicts K values better than the SVM approaches. The prediction performance of these techniques should be checked with larger datasets, and the application of these approaches in other hydrological studies could be the subject of in-depth study in the future.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.