## Abstract

Accurate hydraulic conductivity (*K*) estimation of porous media is crucial in hydrological studies. Recently, groundwater investigators have utilized neural computing techniques to indirectly estimate soil sample *K* instead of time-consuming direct methods. The present study utilizes easily measurable characteristics, i.e., grain size at 10 and 50% finer by weight, porosity, and uniformity coefficient as input variables to examine the efficacy of feed-forward neural network (FFNN), Kohonen self-organizing maps (KSOM), and multiple linear regression (MLR) models in estimating the *K* of soil samples. Model development and validation used 70 and 30% of datasets, respectively. The determination coefficient (*R*^{2}), root mean square error (RMSE), and mean bias error (MBE) were used to compare the model performance with the measured *K* values. The study's outcome indicates that the FFNN and KSOM models better estimate the *K* value, while the MLR model performs merely satisfactorily. Overall, during validation, the FFNN model correlates better with the measured values having *R*^{2}, RMSE, and MBE of 0.94, 0.016, and 0.006, whereas the corresponding values for KSOM are 0.91, 0.024, and −0.004, and that for MLR are 0.87, 0.024, and 0.013, respectively. Notably, the FFNN model exhibits superior prediction performance and can be employed in aquifers for precise *K* estimation.

## HIGHLIGHTS

- The study examines the efficacy of FFNN, KSOM, and MLR models in estimating the *K* of porous media.

- The statistical indicators reveal that the FFNN model performs better in estimating the *K* of porous media and can be employed in sub-surface flow studies for precise *K* estimation.

## INTRODUCTION

Understanding and accurate prediction of hydraulic conductivity (*K*) substantially impact the sub-surface flow, solute transport through the porous media, and hydrogeology and aquifer studies (Pliakas & Petalas 2011; Lee *et al.* 2015). The *K* of porous media is a measure of how easily a fluid particle can permeate through the network of interconnected pores (Lu *et al.* 2012; Chandel *et al.* 2021). The porous media characteristics, i.e., grain size, porosity, and uniformity of soil particles, are the fundamental independent variables, which control the *K* of porous media (Yilmaz *et al.* 2012; Wang *et al.* 2017). Apart from this, sorting of the grain size plays a vital role in the *K* estimation. The considerable spatial variability in *K* estimation implies that the large porous medium samples should be tested to assess the hydraulic characteristics (More & Deka 2018). Due to multiple pore structures and the non-uniform configuration of porous media particles, numerous efforts have been performed by various investigators to estimate the *K* of porous media (Zieba 2017; Jougnot *et al.* 2021).

Various direct and indirect methods can be employed to compute the *K* of porous media. The direct methods are a set of field and experimental techniques that include the Guelph permeameter, pumping test, tension and ring infiltrometers, and falling and constant head permeameter tests (Pucko & Verbovsek 2015; Thakur *et al.* 2022). Owing to limited knowledge of the aquifer and its hydraulic boundaries, estimating hydraulic conductivity in the field is difficult. Field estimation also requires prolonged testing durations and large investments, making field methods less reliable (Chandel *et al.* 2022b). The experimental *K* measurement techniques, in turn, present difficulties in collecting representative porous media samples (Riha *et al.* 2018). Direct methods of measuring *K* are time consuming and less efficient due to spatial and temporal variations (Arshad *et al.* 2013). Consequently, indirect methods evolved and became prominent for computing the *K* of porous media from easily measurable characteristics, i.e., grain size, porosity, percentage of gravel, sand, and silt, uniformity of porous media, and specific gravity (Akbulut 2005; More & Deka 2018). The indirect methods include empirical methods and data-driven techniques. The empirical methods correlate *K* with various grain-size parameters and predict the *K* value more rapidly than direct measurement (Odong 2007). Empirical methods are restricted to their respective domains because they were developed under specified boundary conditions, and they may introduce random errors in the hydraulic conductivity values (Chandel & Shankar 2022).

Data-driven techniques, frequently known as ‘machine learning methods’, have found use in several areas of water resources engineering during the past few decades. These techniques can precisely accomplish the complex task of training, modelling, and validating the data points obtained from experimental work (Williams & Ojuri 2021). They are based either on supervised learning, i.e., with specified input and output data points, or on unsupervised learning, i.e., without specified input and output data points. The supervised learning techniques include artificial neural networks (ANNs), support vector machines (SVMs), and the adaptive neuro-fuzzy inference system (ANFIS), while the Kohonen self-organizing map (KSOM) is an unsupervised learning technique (Sen *et al.* 2019). These techniques have gained wide applicability in several subfields of groundwater and irrigation engineering (Yilmaz *et al.* 2012; Kumar *et al.* 2020) because they predict more efficiently than the traditional direct approaches.

Akbulut (2005) investigated the performance of the ANN approach in computing the *K* of 95 coarse porous media samples. The *K* values based on ANN were compared to the multiple linear regression (MLR) and two empirical methods. The results show that ANN is a powerful prediction tool that outperforms the other approaches in predicting *K* values. Yilmaz *et al.* (2012) developed neural computing models using ANN and ANFIS to predict the *K* of coarser porous media. Due to a better correlation between grain size and *K*, different grain sizes, i.e., d_{10}, d_{30}, and d_{60}, were used to develop the model. The ANFIS model provided more reliable outcomes of *K* value than the ANN model. Arshad *et al.* (2013) evaluated three data-driven techniques, i.e., MLR, ANFIS, and ANN, to predict the *K* of porous media. The *K* values predicted by the ANFIS model were better than the other two approaches. More & Deka (2018) predicted the *K* values of soil samples based on measurable parameters, i.e., specific gravity, porosity, and percentage of gravel, sand, and silt using ANN, fuzzy-logic, and ANFIS. The values of statistical indicators infer that the ANFIS outperforms the other two approaches. Naganna & Deka (2019) evaluated the ANN, ANFIS, and SVM techniques to estimate the *K* of porous media. The study concluded that ANN and ANFIS models perform satisfactorily in computing the *K* values. Williams & Ojuri (2021) examined the performance of feed-forward neural network (FFNN) and MLR to estimate the *K* value of porous media. The study employs six input variables for model development, and the statistical indicators reveal that FFNN outperforms MLR in predicting *K* values. Despite the unencumbered prediction capability of the supervised algorithms, the missing values and outliers in the data points can reduce their prediction efficacy (Kumar *et al.* 2020).

On the other hand, the unsupervised learning-based KSOM converts high-dimensional data into a small grid map by clustering (Kohonen *et al.* 1996) and reveals the intrinsic correlation among the parameters. Owing to clustering, missing values can be effectively replaced using the map characteristics, so they do not hinder model prediction (Kumar *et al.* 2020). Because of its adaptability, the KSOM has been utilized in water quality and evapotranspiration modelling (Rustum & Adeloye 2007; Adeloye *et al.* 2011) and in irrigation and soil moisture studies (Ohana-Levi *et al.* 2019; Kumar *et al.* 2020). A review of the scientific literature indicates that the KSOM has not previously been used to estimate the hydraulic conductivity of porous media. Apart from KSOM, the FFNN has been utilized by a few investigators to estimate the *K* value based on grain-size characteristics. The grain size, uniformity coefficient, and porosity are the key factors influencing *K* and are hence considered independent variables for model development. The study evaluates the performance of supervised (FFNN) and unsupervised (KSOM) learning techniques and of MLR in estimating the *K* value. The objectives of this study are as follows:

1. To develop and validate the FFNN, KSOM, and MLR models to estimate the *K* of porous media based on easily measurable grain-size characteristics.
2. To compare the prediction performance of the FFNN, KSOM, and MLR models using different statistical indicators.

## MATERIALS AND METHODOLOGY

### Materials and experimental methodology

#### Soil samples and grain-size analysis

In the present study, 165 representative soil samples were collected from the Beas riverbank in Kangra district (31°43′N, 75°32′E), Himachal Pradesh, India, for the experimental investigations. The soil samples were obtained using a thin-walled sampler with a penetrating length of 100 cm and a cross-sectional diameter of 8.5 cm. Located in the northwest Himalayas, the study area has an elevation of 3,200 m above mean sea level. The collected soil samples were composed of fine gravel (4.75–10 mm); coarse, medium, and fine sand (0.075–4.75 mm); and a small proportion of silt (<0.075 mm). Initially, the soil samples were subjected to grain-size analysis following ASTM (2007) to determine the grain size at 10%, 30%, 50%, and 60% finer by weight (*d*_{10}, *d*_{30}, *d*_{50}, and *d*_{60}) and the uniformity coefficient. The pycnometer method was used to determine the specific gravity of the soil samples, which is needed for the porosity determination.

#### Direct determination of hydraulic conductivity

The *K* measurement setup includes permeameters, manometer tubes, and an overhead water supply tank. The permeameters are built from galvanised iron pipe and have a total length of 1 m and a test length of 0.46 m. To measure the head difference, each permeameter is equipped with pressure-tapping points along its periphery, spaced 0.46 m apart. To maintain a constant head, an overhead water tank at a height of 2.60 m above the ground receives water continuously from the recirculating tank at the bottom. The standard methodology described in ASTM (2006) and Chandel *et al.* (2022a) was used to estimate the *K* of the collected soil samples. In *K* determination, measurement uncertainty commonly arises from the discharge measurement equipment, sample preparation, and fluctuations in head difference (Charbeneau *et al.* 2011). To minimize the experimental uncertainty, five to six observations of discharge and head difference were recorded for each sample, and their average was used to determine *K*. During the experiments, the water temperature was recorded using a digital thermometer, and Darcy's equation was used to compute the hydraulic conductivity of the soil samples (Qiu & Wang 2015):

$$K = \frac{QL}{Ah} \tag{1}$$

where *Q* = discharge rate (m^{3}/s), *h* = manometer head difference (m), *A* = cross-sectional area of sample (m^{2}), and *L* = test length (m).
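The computation described above can be sketched in a few lines, assuming the usual constant-head form of Darcy's law, *K* = *QL*/(*Ah*), and averaging over the repeated readings of one sample. The function names, the readings, and the choice of the 10.16 cm permeameter are illustrative assumptions, not the study's data.

```python
import math

def darcy_k(q_m3s, head_diff_m, area_m2, test_length_m):
    """Hydraulic conductivity K (m/s) from Darcy's law, K = Q*L / (A*h)."""
    if head_diff_m <= 0 or area_m2 <= 0:
        raise ValueError("head difference and area must be positive")
    return q_m3s * test_length_m / (area_m2 * head_diff_m)

def mean_k(readings, area_m2, test_length_m):
    """Average K over repeated (Q, h) readings taken on one sample."""
    values = [darcy_k(q, h, area_m2, test_length_m) for q, h in readings]
    return sum(values) / len(values)

# Hypothetical readings: (discharge in m^3/s, head difference in m).
readings = [(2.0e-5, 0.50), (2.4e-5, 0.60), (2.8e-5, 0.70),
            (3.2e-5, 0.80), (3.6e-5, 0.90)]
diameter_m = 0.1016                      # 10.16 cm permeameter
area = math.pi * diameter_m ** 2 / 4     # cross-sectional area (m^2)
k_m_per_s = mean_k(readings, area, test_length_m=0.46)
print(f"K = {k_m_per_s * 100:.3f} cm/s")  # converted from m/s to cm/s
```

With SI inputs the result is in m/s; the factor of 100 converts it to the cm/s used in the tables of this study.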

The dataset obtained from the experimental work was used to develop the FFNN, KSOM, and MLR models. The grain sizes (*d*_{10} and *d*_{50}), uniformity coefficient (*U*), and porosity of the packed bed (*n*) were used as input variables to predict the *K* values obtained from three permeameters. For a given soil sample, the *d*_{10}, *d*_{50}, and *U* values are the same, whereas the porosity varies with the permeameter diameter, which corresponds to the different *K* values obtained from the 5.08, 10.16, and 15.24 cm permeameters. In total, 495 observations were obtained from the experimental work on 165 soil samples. The datasets corresponding to 115 soil samples (70%) and 50 soil samples (30%) were used for model development and validation, respectively. The data-driven techniques used in the study are described in the following sections.
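Because each soil sample contributes three observations (one per permeameter), the split is made by sample, so all observations of a sample fall in the same subset. A minimal sketch of such a split follows; how the samples were actually assigned to the two subsets is not stated, so the random shuffle and the seed are assumptions made for illustration.

```python
import random

def split_by_sample(sample_ids, n_train, seed=0):
    """Split observation indices so that each sample stays in one subset."""
    ids = sorted(set(sample_ids))
    rng = random.Random(seed)            # fixed seed: reproducible, hypothetical
    rng.shuffle(ids)
    train_ids = set(ids[:n_train])
    train_idx = [i for i, s in enumerate(sample_ids) if s in train_ids]
    valid_idx = [i for i, s in enumerate(sample_ids) if s not in train_ids]
    return train_idx, valid_idx

# 165 samples x 3 permeameters = 495 observations.
sample_ids = [s for s in range(165) for _ in range(3)]
train_idx, valid_idx = split_by_sample(sample_ids, n_train=115)
print(len(train_idx), len(valid_idx))  # 345 150
```

Splitting by sample rather than by observation avoids the same *d*_{10}, *d*_{50}, and *U* values appearing in both the development and validation subsets.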

#### Feed-forward neural network

The *K* of soil samples was determined using a multilayer perceptron FFNN. Sigmoid and linear activation functions were used in the neurons of the hidden and output layers, respectively. The best-performing network was selected by a trial-and-error method, i.e., by varying the number of neurons in the hidden layer up to 10. The purpose of the trial-and-error method is to reduce the error and attain a better correlation for the training data points while using a minimum number of hidden neurons to prevent overfitting (Kumar *et al.* 2020). For the present study, Figure 2 depicts the best-performing FFNN architecture with five hidden neurons. The Levenberg–Marquardt learning algorithm was applied to train the network because of its superior convergence and low residuals. The FFNN model was developed and validated in Matlab R2022a using the neural network toolbox, based on the methodology depicted in Figure 3.
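The structure of the selected network (four inputs, five sigmoid hidden neurons, one linear output) can be illustrated with a plain forward pass. This is only a sketch of the architecture: the weights below are random placeholders rather than trained values, Levenberg–Marquardt training is not implemented, and in practice the inputs would normally be scaled first (an assumption, since the scaling used in the study is not described).

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def ffnn_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: sigmoid hidden layer, single linear output neuron."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

rng = random.Random(1)                   # placeholder weights, not trained
n_inputs, n_hidden = 4, 5
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
            for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = rng.uniform(-1, 1)

x = [0.211, 0.667, 4.493, 0.362]         # mean d10, d50, U, n from Table 1
k_hat = ffnn_forward(x, w_hidden, b_hidden, w_out, b_out)

# Adjustable parameters: hidden weights and biases plus output weights and bias.
n_params = n_hidden * (n_inputs + 1) + (n_hidden + 1)
print(n_params)  # 31
```

A 4-5-1 network therefore has 31 adjustable parameters, which is small relative to the number of training observations and is consistent with the aim of limiting overfitting.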

#### Kohonen self-organizing maps

The KSOM is an unsupervised learning technique (Kohonen *et al.* 1996). It uses clustering to transform multidimensional data into a simple relationship. Unsupervised competitive learning is utilized to optimize the input signal with the neurons of the Kohonen map. The clustering of the input data is carried out to obtain an identical pattern similar to the output or its neighbouring unit (Rustum & Adeloye 2007). The KSOM includes two layers, i.e., input (high dimensional) and output (low dimensional). Figure 4 indicates that both layers are interconnected. The output layer is represented by a two-dimensional grid having *M* neurons. Each neuron carries a vector with the same set of variables as the input vector. The clustering process of the KSOM algorithm also reduces dimensionality, as the number of neurons on the map is usually less than the number of input data points. As a result, high-dimensional data can be analysed effectively.

The number of neurons on the map (*M* value) is determined using the following equation (Garcia & Gonzalez 2004):

$$M = 5\sqrt{N} \tag{2}$$

where *N* = total data points in the training dataset. Using the *M* value, the number of rows and columns in the map is computed using the following equation (Garcia & Gonzalez 2004):

$$\frac{L_1}{L_2} = \sqrt{\frac{E_1}{E_2}} \tag{3}$$

where *E*_{1} and *E*_{2} are the largest and second largest eigenvalues of the training dataset, and *L*_{1} and *L*_{2} are the number of rows and columns in the map, respectively.

The Euclidean distance between the input vector and each code vector is computed as follows:

$$D_i = \sqrt{\sum_{j=1}^{n} m_j \left(x_j - w_{ij}\right)^2} \tag{4}$$

where *D*_{i} = Euclidean distance between the input vector and code vector *i*; *x*_{j} and *w*_{ij} = *j*th element of the current input and weight vector, respectively; *m*_{j} = mask, whose value is 0 when *x*_{j} is missing in the input vector and 1 otherwise; and *n* = size of the input vector. The neuron corresponding to the least *D*_{i} value is referred to as the best matching unit (BMU) or winning neuron, as depicted in Figure 4. Further, to enhance the agreement with the input data, the code vectors of the BMU and its neighbouring units are modified using the following equation (Adeloye *et al.* 2011):

$$w_i(t+1) = w_i(t) + \alpha(t)\, h_{ci}(t) \left[x(t) - w_i(t)\right] \tag{5}$$

where *w*_{i} = *i*th code vector, *t* = time, and *α(t)* and *h*_{ci}(t) = learning rate and neighbourhood function at time *t*. As a result, each node on the map develops the ability to identify input vectors comparable to itself. This characteristic is described as self-organizing because the classification is attained without providing any external output (Penn 2005). The process repeats until a specified error or the optimum number of epochs is reached. The *α(t)* and *h*_{ci}(t) determine the learning efficacy of the KSOM and hence must be selected carefully. The expressions to determine *α(t)* and *h*_{ci}(t) (Kumar *et al.* 2021) are as follows:

$$\alpha(t) = \alpha_0 \left(\frac{0.005}{\alpha_0}\right)^{t/T} \tag{6}$$

$$h_{ci}(t) = \exp\left(-\frac{\lVert r_c - r_i \rVert^2}{2\beta^2(t)}\right) \tag{7}$$

where *α*_{0} = initial learning rate; *T* = convergence training length, equal to 250/(*N*)^{0.5} (Adeloye *et al.* 2011); *r*_{c} and *r*_{i} = positions of nodes *c* and *i*, respectively, on the KSOM map; and *β(t)* = neighbourhood radius.
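The BMU search with a mask for missing components, followed by one update of the BMU and its neighbours, can be sketched as below. This is an illustration rather than the SOM toolbox implementation: the Gaussian neighbourhood is an assumed (common) choice, missing components are simply skipped in the update, and the 2 × 2 map, code vectors, learning rate, and radius are hypothetical.

```python
import math

def masked_distance(x, w):
    """Euclidean distance that skips missing (None) input components."""
    return math.sqrt(sum((xj - wj) ** 2
                         for xj, wj in zip(x, w) if xj is not None))

def find_bmu(x, codebook):
    """Index of the code vector closest to x (best matching unit)."""
    return min(range(len(codebook)),
               key=lambda i: masked_distance(x, codebook[i]))

def neighbourhood(pos_c, pos_i, radius):
    """Gaussian neighbourhood between two grid positions (assumed form)."""
    d2 = sum((a - b) ** 2 for a, b in zip(pos_c, pos_i))
    return math.exp(-d2 / (2.0 * radius ** 2))

def update(codebook, positions, x, alpha, radius):
    """Move the BMU and its neighbours towards x; return the BMU index."""
    c = find_bmu(x, codebook)
    for i, w in enumerate(codebook):
        h = neighbourhood(positions[c], positions[i], radius)
        for j, xj in enumerate(x):
            if xj is not None:           # masked components stay unchanged
                w[j] += alpha * h * (xj - w[j])
    return c

# Hypothetical 2 x 2 map with three-component code vectors.
positions = [(0, 0), (0, 1), (1, 0), (1, 1)]
codebook = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.5],
            [0.0, 1.0, 0.5], [1.0, 1.0, 1.0]]
x = [0.9, 0.8, None]                     # third component is missing
bmu = update(codebook, positions, x, alpha=0.5, radius=1.0)
print(bmu, codebook[bmu])
```

In a full training run, `alpha` and `radius` would shrink with the iteration count, and the update would be repeated over all training vectors until the stopping criterion is met.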

The *α(t)* and *h*_{ci}(t) reduce monotonically as the number of iterations increases. The quality of the trained KSOM depends on the topographic (*t*_{e}) and quantization (*q*_{e}) errors, which are determined as follows:

$$q_e = \frac{1}{N}\sum_{i=1}^{N} \lVert x_i - w_c \rVert \tag{8}$$

$$t_e = \frac{1}{N}\sum_{i=1}^{N} u(x_i) \tag{9}$$

where *x*_{i} = *i*th input vector, *w*_{c} = prototype vector corresponding to the BMU for *x*_{i}, and *u* = binary integer equal to 1 if the first and second BMUs are not adjacent and 0 otherwise. The KSOM can be used for nonlinear interpolation, model identification, prediction, and generalization of data points (Kumar *et al.* 2021). In this study, the KSOM is utilized for prediction, as shown in Figure 5.
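The two map-quality measures can be sketched as follows, taking the quantization error as the mean distance between each input vector and its BMU, and the topographic error as the share of input vectors whose first and second BMUs are not adjacent. For simplicity the sketch treats units as adjacent when they are direct neighbours on a rectangular grid, which is an assumption (the map in this study uses hexagonal units); the one-dimensional map and data are hypothetical.

```python
import math

def distance(x, w):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, w)))

def two_best_units(x, codebook):
    """Indices of the closest and second-closest code vectors."""
    order = sorted(range(len(codebook)),
                   key=lambda i: distance(x, codebook[i]))
    return order[0], order[1]

def map_quality(data, codebook, positions):
    """Return (quantization error, topographic error) of a trained map."""
    q_sum, t_sum = 0.0, 0
    for x in data:
        first, second = two_best_units(x, codebook)
        q_sum += distance(x, codebook[first])
        (r1, c1), (r2, c2) = positions[first], positions[second]
        adjacent = abs(r1 - r2) + abs(c1 - c2) == 1   # rectangular grid
        t_sum += 0 if adjacent else 1
    return q_sum / len(data), t_sum / len(data)

# Hypothetical 1 x 3 map whose code vectors are deliberately out of order.
positions = [(0, 0), (0, 1), (0, 2)]
codebook = [[0.0], [2.0], [1.0]]
data = [[0.1], [1.9]]
q_e, t_e = map_quality(data, codebook, positions)
print(round(q_e, 3), t_e)  # 0.1 0.5
```

Low values of both measures indicate that the code vectors represent the data closely and that neighbouring units on the map correspond to neighbouring regions of the data.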

The input vectors from the validation dataset, whose *K* values were either missing or deliberately removed (as depicted in Figure 5), are presented to the KSOM to determine the BMU. The number of BMUs is proportional to the number of input vectors in the validation dataset. The missing *K* values were determined from the corresponding values in the BMU. The KSOM model was developed and validated in Matlab R2022a using the SOM toolbox, based on the methodology presented in Figure 6.
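The prediction step can be illustrated with a small example: the input vector is presented with *K* removed, the BMU is found using only the available components, and the *K* entry of the BMU is taken as the estimate. The three code vectors below are hypothetical rather than taken from the trained map, and the variables are left unscaled for readability, whereas in practice they would usually be normalised before computing distances (an assumption).

```python
import math

def masked_distance(x, w):
    """Euclidean distance over the components of x that are not None."""
    return math.sqrt(sum((xj - wj) ** 2
                         for xj, wj in zip(x, w) if xj is not None))

def predict_missing(x, codebook):
    """Fill the None entries of x with the matching entries of its BMU."""
    bmu = min(codebook, key=lambda w: masked_distance(x, w))
    return [wj if xj is None else xj for xj, wj in zip(x, bmu)]

# Hypothetical code vectors ordered as (d10, d50, U, n, K).
codebook = [
    [0.12, 0.35, 3.0, 0.40, 0.020],
    [0.20, 0.65, 4.5, 0.36, 0.065],
    [0.38, 1.40, 8.0, 0.30, 0.200],
]
x = [0.21, 0.70, 4.4, 0.36, None]        # K removed, as for validation
filled = predict_missing(x, codebook)
print(filled[-1])  # 0.065
```

The same routine fills any missing component, not only *K*, which is why missing values in the inputs do not prevent a prediction.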

#### Multiple linear regression

Apart from FFNN and KSOM, the MLR technique was also used to develop a model using easily measurable grain-size characteristics. A detailed explanation of MLR and its application is given by Williams & Ojuri (2021) and Elbisy (2015) and is hence not repeated here. The data analysis toolbox in Microsoft Excel was used to develop the MLR model. A regression equation was generated using the training dataset, with *d*_{10}, *d*_{50}, *U*, and *n* as the independent variables and *K* as the dependent variable. The performance of the developed equation was evaluated using the validation dataset.
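For readers who prefer a scripted route to the spreadsheet fit, the same kind of regression can be sketched with ordinary least squares solved through the normal equations. This is a stand-in for, not a reproduction of, the Excel analysis: the rows and coefficients below are synthetic placeholders, and the fitted values are not the regression equation of this study.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_mlr(rows, y):
    """Least-squares coefficients [b0, b1, ...] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coeffs, row):
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))

# Synthetic rows (d10, d50, U, n) and a made-up linear rule for K.
rows = [[0.10, 0.28, 2.2, 0.42], [0.15, 0.40, 3.1, 0.40],
        [0.21, 0.67, 4.5, 0.36], [0.30, 0.90, 5.0, 0.33],
        [0.41, 1.50, 9.6, 0.29], [0.25, 0.55, 3.8, 0.38],
        [0.18, 0.75, 6.2, 0.31], [0.35, 1.10, 4.1, 0.41]]
made_up = [0.02, 0.6, 0.03, -0.004, -0.05]
y = [predict(made_up, r) for r in rows]
coeffs = fit_mlr(rows, y)
print([round(predict(coeffs, r) - yi, 6) for r, yi in zip(rows, y)])
```

Because the synthetic *K* values follow the made-up linear rule exactly, the fitted model reproduces them to within rounding, which serves as a check that the fitting routine behaves as intended.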

### Statistical analysis

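The performance of the models in predicting *K* was evaluated using different statistical indicators, i.e., root mean square error (RMSE), determination coefficient (*R*^{2}), mean absolute error (MAE), mean bias error (MBE), scatter index (SI), and agreement index (AI) (Naeej *et al.* 2017). The ANOVA, i.e., analysis of variance statistics, was used to examine the significance of the regression line. The equations to compute the statistical indicators are as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(p_i - z_i\right)^2} \tag{10}$$

$$R^2 = \left[\frac{\sum_{i=1}^{m}\left(z_i - \bar{z}\right)\left(p_i - \bar{p}\right)}{\sqrt{\sum_{i=1}^{m}\left(z_i - \bar{z}\right)^2 \sum_{i=1}^{m}\left(p_i - \bar{p}\right)^2}}\right]^2 \tag{11}$$

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|p_i - z_i\right| \tag{12}$$

$$\mathrm{MBE} = \frac{1}{m}\sum_{i=1}^{m}\left(p_i - z_i\right) \tag{13}$$

$$\mathrm{SI} = \frac{\mathrm{RMSE}}{\bar{z}} \tag{14}$$

$$\mathrm{AI} = 1 - \frac{\sum_{i=1}^{m}\left(p_i - z_i\right)^2}{\sum_{i=1}^{m}\left(\left|p_i - \bar{z}\right| + \left|z_i - \bar{z}\right|\right)^2} \tag{15}$$

where *z*_{i} and *p*_{i} = the actual and model-predicted values, respectively; $\bar{z}$ and $\bar{p}$ = mean of actual and model-predicted values, respectively; and *m* = total dataset used. The values of RMSE, MAE, and SI vary from 0 to ∞, MBE can be negative or positive, and AI and *R*^{2} vary from 0 to 1 (Chandel *et al.* 2022b). *R*^{2} and AI values closer to 1, lower values of RMSE, MAE, and SI, and MBE values close to 0 indicate a better correlation between the actual and model-predicted values (Naeej *et al.* 2017).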
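The six indicators can be computed together as sketched below. The definitions are the commonly used ones and are stated here as assumptions: *R*^{2} as the squared Pearson correlation, SI as RMSE divided by the mean of the actual values, AI as Willmott's index of agreement, and MBE as predicted minus actual, so that a negative MBE corresponds to under-prediction. The sample values are hypothetical.

```python
import math

def indicators(z, p):
    """RMSE, R2, MAE, MBE, SI and AI for actual z and predicted p."""
    m = len(z)
    z_bar, p_bar = sum(z) / m, sum(p) / m
    err = [pi - zi for zi, pi in zip(z, p)]          # predicted - actual
    sse = sum(e * e for e in err)
    rmse = math.sqrt(sse / m)
    szz = sum((zi - z_bar) ** 2 for zi in z)
    spp = sum((pi - p_bar) ** 2 for pi in p)
    szp = sum((zi - z_bar) * (pi - p_bar) for zi, pi in zip(z, p))
    ai_den = sum((abs(pi - z_bar) + abs(zi - z_bar)) ** 2
                 for zi, pi in zip(z, p))
    return {
        "RMSE": rmse,
        "R2": szp ** 2 / (szz * spp),
        "MAE": sum(abs(e) for e in err) / m,
        "MBE": sum(err) / m,
        "SI": rmse / z_bar,
        "AI": 1.0 - sse / ai_den,
    }

# Hypothetical measured and predicted K values (cm/s).
measured = [0.02, 0.05, 0.10, 0.20]
predicted = [0.03, 0.06, 0.11, 0.21]     # constant over-prediction of 0.01
stats = indicators(measured, predicted)
print({k: round(v, 3) for k, v in stats.items()})
```

A constant offset leaves *R*^{2} at 1 while RMSE, MAE, and MBE all equal the offset, which illustrates why a correlation measure alone cannot reveal bias.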

## RESULTS AND DISCUSSION

The study utilizes the easily measurable grain-size characteristics and experimental observations to develop FFNN, KSOM, and MLR models. The first section of the study includes the statistical analysis and dataset used for model development and validation, whereas the second section comprises the *K* estimation using different models and their quantitative evaluation via statistical performance indicators.

### Statistical analysis

Initially, grain-size analysis was conducted on the collected soil samples to determine the grain-size characteristics, i.e., *d*_{10}, *d*_{30}, *d*_{50}, *d*_{60}, and *U*. In addition, the gravel, sand, and silt percentages and the *n* and *K* values were determined from the experimental investigations. The statistical summary, i.e., the minimum, maximum, mean, and standard deviation of the experimental observations, is presented in Table 1. The *n* and *K* values for the collected soil samples vary from 0.286 to 0.426 and from 0.010 to 0.342 cm/s, respectively.

Table 1. Statistical summary of the experimental observations

| Characteristic | Minimum | Maximum | Mean | Standard deviation |
|---|---|---|---|---|
| *d*_{10} (mm) | 0.100 | 0.409 | 0.211 | 0.089 |
| *d*_{30} (mm) | 0.193 | 0.821 | 0.393 | 0.154 |
| *d*_{50} (mm) | 0.280 | 1.496 | 0.667 | 0.308 |
| *d*_{60} (mm) | 0.372 | 2.101 | 0.936 | 0.474 |
| *U*^{a} | 2.150 | 9.567 | 4.493 | 1.497 |
| *n*^{a} | 0.286 | 0.426 | 0.362 | 0.029 |
| Gravel (%) | 2.200 | 31.240 | 7.050 | 4.430 |
| Sand (%) | 66.780 | 95.460 | 88.880 | 4.450 |
| Silt (%) | 1.130 | 7.340 | 4.070 | 1.430 |
| *K* (cm/s) | 0.010 | 0.342 | 0.069 | 0.061 |

^{a}Dimensionless characteristic.

Further, to check the correlation between the independent parameters (*d*_{10}, *d*_{30}, *d*_{50}, *d*_{60}, *U*, and *n*) and the dependent parameter (*K*), a correlation matrix was examined. As presented in Table 2, the grain-size characteristics *d*_{10} and *d*_{50} show a better correlation with the *K* value (0.94 and 0.80), whereas *d*_{30} and *d*_{60} show a poor correlation (0.40 and 0.34). The uniformity coefficient and porosity show a relatively good negative and positive correlation (−0.58 and 0.60) with the *K* value, respectively. Therefore, based on the correlation values, the parameters *d*_{10}, *d*_{50}, *U*, and *n* are considered the input variables for model development.

Table 2. Correlation matrix of the independent and dependent parameters

| | *d*_{10} | *d*_{30} | *d*_{50} | *d*_{60} | *U* | *n* | *K* |
|---|---|---|---|---|---|---|---|
| *d*_{10} | 1.00 | | | | | | |
| *d*_{30} | 0.93 | 1.00 | | | | | |
| *d*_{50} | 0.89 | 0.96 | 1.00 | | | | |
| *d*_{60} | 0.82 | 0.91 | 0.96 | 1.00 | | | |
| *U* | −0.08 | 0.15 | 0.31 | 0.47 | 1.00 | | |
| *n* | 0.02 | −0.21 | −0.37 | −0.50 | −0.96 | 1.00 | |
| *K* | 0.94* | 0.40 | 0.80* | 0.34 | −0.58* | 0.60* | 1.00 |

\*Statistically significant correlation (*p* < 0.05).

In this study, experimental observations of 165 soil samples were used to build the dataset for model development and validation. The datasets corresponding to 115 soil samples (70%) and 50 soil samples (30%) were utilized for model development and validation, respectively. Table 3 presents the statistical summary of the input (*d*_{10}, *d*_{50}, *U*, and *n*) and output (*K*) parameters used for model development and validation.

Table 3. Statistical summary of the input and output parameters used for model development and validation

| Parameter | Model development | | | | Model validation | | | |
|---|---|---|---|---|---|---|---|---|
| | Minimum | Maximum | Mean | Standard deviation | Minimum | Maximum | Mean | Standard deviation |
| *d*_{10} (mm) | 0.100 | 0.409 | 0.203 | 0.090 | 0.115 | 0.395 | 0.228 | 0.084 |
| *d*_{50} (mm) | 0.280 | 1.450 | 0.637 | 0.298 | 0.390 | 1.496 | 0.738 | 0.320 |
| *U*^{a} | 2.150 | 9.567 | 4.466 | 1.544 | 2.498 | 8.323 | 4.554 | 1.387 |
| *n*^{a} | 0.286 | 0.426 | 0.363 | 0.029 | 0.297 | 0.415 | 0.360 | 0.028 |
| *K* (cm/s) | 0.010 | 0.342 | 0.067 | 0.065 | 0.011 | 0.221 | 0.072 | 0.052 |

^{a}Dimensionless parameter.

### FFNN model for *K* estimation

The data points corresponding to the 115 soil samples were utilized for training the FFNN model. A trial-and-error search over the number of neurons in the hidden layer was performed to identify the best-performing FFNN model. Based on the analysis, the best-performing FFNN model consists of four input variables (*d*_{10}, *d*_{50}, *U*, and *n*), one hidden layer (five neurons), and one output variable (*K*). The efficacy of the FFNN model in predicting *K* was evaluated using statistical indicators and a scatter plot. The statistical indicators computed for the FFNN model are presented in Table 4.

Table 4. Statistical indicators for the FFNN, KSOM, and MLR models during model development and validation

| Statistical indicator | FFNN model | | KSOM model | | MLR model | |
|---|---|---|---|---|---|---|
| | Development | Validation | Development | Validation | Development | Validation |
| RMSE | 0.012 | 0.016 | 0.018 | 0.024 | 0.022 | 0.024 |
| *R*^{2} | 0.964 | 0.943 | 0.937 | 0.909 | 0.891 | 0.868 |
| MAE | 0.006 | 0.007 | 0.010 | 0.012 | 0.013 | 0.017 |
| MBE | 0.000 | 0.006 | −0.002 | −0.004 | −0.003 | 0.013 |
| SI | 0.178 | 0.229 | 0.266 | 0.276 | 0.311 | 0.331 |
| AI | 0.980 | 0.977 | 0.975 | 0.963 | 0.969 | 0.950 |

Figure 7 depicts the scatter plots between the measured and FFNN-predicted *K* values during model development and validation. The *K* values predicted using the FFNN model correlated well with the measured values, with *R*^{2} values of 0.96 and 0.94 during model development and validation, respectively. During validation, the performance of the FFNN model was good for lower *K* values (<0.15 cm/s) but only moderate for *K* values greater than 0.15 cm/s, as illustrated in Figure 7(b). This implies that the FFNN model could not efficiently learn *K* values greater than 0.15 cm/s from the training dataset. The prediction results of the FFNN model were similar to the outcomes of Yilmaz *et al.* (2012), who employed the FFNN model to estimate *K* using three grain-size parameters. The linear regression between the measured and FFNN-predicted *K* values shows a uniform scatter along the 1-1 line in Figure 7. During model development and validation, the slope of the regression line was not different (*p* > 0.05) from the 1-1 line, implying that the FFNN model has a negligible bias in predicting the *K* value of soil samples. This is further confirmed by the low MBE values reported in Table 4.

### KSOM model for *K* estimation

The KSOM model for estimating *K* was developed and validated using the data points obtained from the 115 and 50 soil samples, respectively. The default values of the learning rate (*α* = 0.5) and neighbourhood radius (*β* = max(*L*_{1}, *L*_{2})/4) were used initially to train the KSOM model, where *L*_{1} and *L*_{2} indicate the map dimensions, determined by Equation (3). The final map size is determined by the SOM toolbox as per Equation (2), and the number of neurons on the final map is adjusted so that it is equal to the product of *L*_{1} and *L*_{2}. The map obtained from the KSOM model has dimensions of 12 × 8 and comprises 96 units. The topographic and quantization errors of the map are 0.094 and 0.197, respectively. A significant characteristic of the KSOM is the ability to visually examine the correlation between various parameters through the component planes. Figure 8 represents the KSOM component planes for the parameters used in this study.

The component plane for each parameter is composed of 96 hexagonal units, and their values are denoted by the adjacent colour coding. High, medium, and low values on a component plane are indicated by yellow, light blue, and dark blue, respectively. The component planes therefore make it easier to visually identify the regions where a parameter is low, medium, or high. The parameters *d*_{10}, *d*_{50}, *U*, *n*, and *K* on the component planes vary over 0.107–0.362 mm, 0.324–1.30 mm, 2.62–8.43, 0.302–0.403, and 0.015–0.202 cm/s, respectively.
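As a rough consistency check on the 12 × 8 map reported above, the map size can be estimated from the size of the training dataset. The sketch below assumes the common heuristic of about 5√*N* units, takes *N* = 345 (115 soil samples × 3 permeameters), and uses a side ratio of 1.5 as a stand-in for the eigenvalue-based ratio, which is not reported. The rounding scheme is a simplification and not the procedure of the SOM toolbox.

```python
import math

def map_size(n_obs, side_ratio):
    """Approximate rows and columns for a map with about 5*sqrt(N) units."""
    m = 5.0 * math.sqrt(n_obs)                       # target number of units
    cols = max(1, round(math.sqrt(m / side_ratio)))
    rows = max(1, round(side_ratio * cols))
    return m, rows, cols

# Assumed inputs: N = 115 samples x 3 permeameters; ratio 1.5 is hypothetical.
m, rows, cols = map_size(n_obs=345, side_ratio=1.5)
print(round(m, 1), rows, cols)  # 92.9 12 8
```

Under these assumptions the target of about 93 units is rounded up to a 12 × 8 grid of 96 units, matching the reported map.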

The visual examination of Figure 8 reveals that the colour gradient of the *d*_{10} and *d*_{50} component planes is parallel to that of the *K* component plane, i.e., low *d*_{10} and *d*_{50} values are correlated with low *K* values and vice versa. The component planes of the *U* and *n* parameters are inversely correlated at the top left and bottom right sides. However, the colour gradient is similar, mainly in the central region of these parameters. Some observations can be drawn from the component planes of *U*, *n*, and *K*. On the top left side, low *K* values are correlated with low values of *n* and high values of *U*. In contrast, on the top right side, higher *K* values are correlated with medium values of *n* and *U*. Similarly, more observations can be drawn from the component planes. A part-wise assessment of the component planes shows varying correlations between the parameters; however, a general assessment cannot be made.

The developed KSOM model was used to estimate the *K* of soil samples. Table 4 presents the values of the statistical indicators computed for the KSOM model during development and validation. Figure 9 depicts the model development and validation scatter plots between the measured and KSOM-predicted *K* values.

Based on the scatter plot, the efficacy of the KSOM model in predicting *K* was relatively good, with *R*^{2} values of 0.94 and 0.91 during model development and validation, respectively. During validation, the KSOM model performed better for lower *K* values (<0.06 cm/s) and only satisfactorily for higher *K* values (>0.06 cm/s), as shown in Figure 9(b). The slope of the regression line during model development and validation was not different (*p* > 0.05) from the 1-1 line, suggesting a low bias in the KSOM model in predicting the *K* value. This is further verified by the low negative MBE values reported in Table 4. The negative MBE arises because the *K* values predicted by the KSOM model are, on average, lower than the measured values.

### MLR model for *K* estimation

The MLR model was developed using the data points corresponding to the 115 soil samples, with *d*_{10}, *d*_{50}, *U*, and *n* as the input parameters and *K* as the output parameter. Further, the data points corresponding to the 50 soil samples were utilized to validate the MLR model. Equation (16) represents the developed MLR model for predicting the *K* of soil samples.

The efficacy of the MLR model in predicting *K* was evaluated using scatter plots and statistical indicators. The statistical indicators computed for the MLR model are presented in Table 4. Figure 10 shows the scatter plot between the measured and MLR-predicted *K* values. The predictability of the developed MLR model was satisfactory during development, with an *R*^{2} of 0.89, and merely decent during validation, with an *R*^{2} of 0.87. During model validation, the slope of the regression line was significantly different (*p* < 0.05) from the 1-1 line, suggesting a bias in the MLR model. This is further verified by the MBE of 0.013 for the MLR model in predicting the *K* of soil samples.

### Comparison of FFNN, KSOM, and MLR models

This section compares the performance of the FFNN, KSOM, and MLR models in estimating the *K* of porous media. For the validation data points, Figure 11 shows the variation of the *K* values predicted by the FFNN, KSOM, and MLR models against the measured values. The data points corresponding to the FFNN model followed the trend of the measured values more closely than those of the KSOM and MLR models, indicating the better performance of the FFNN model in estimating the *K* value.

Further, the statistical performance indicators (Table 4) show that the FFNN model estimates the *K* of soil samples better than the KSOM and MLR models. During validation, the *R*^{2} and AI values are 0.943 and 0.977 for the FFNN model, 0.909 and 0.963 for the KSOM model, and 0.868 and 0.950 for the MLR model. For the FFNN model, the *R*^{2} and AI values are closest to 1, indicating a better correlation with the measured values and signifying that the independent variables in the model effectively account for the variation in the dependent variable. The FFNN model also has the lowest MAE, RMSE, and SI during validation (0.007, 0.016, and 0.229, respectively), indicating less variability with respect to the measured values than the KSOM and MLR models; its MBE (0.006) is comparable in magnitude to that of the KSOM model (−0.004) and lower than that of the MLR model (0.013). This further substantiates the efficacy of the FFNN model in estimating the *K* of soil samples. The prediction efficacy of the FFNN, KSOM, and MLR models is influenced by the information they learn during training. Notably, prediction efficacy improves when more information within a given input range is provided. In the present study, *K* values ranging from 0.01 to 0.07 cm/s appeared more frequently in the training dataset, resulting in better performance of the developed models in the lower range than for higher *K* values during validation.
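The comparison can be tabulated directly from the validation columns of Table 4. The sketch below picks the best model per indicator, treating higher values as better for *R*^{2} and AI and smaller magnitudes as better for the error measures; that ranking criterion is an assumption made for illustration.

```python
# Validation indicators transcribed from Table 4.
validation = {
    "FFNN": {"RMSE": 0.016, "R2": 0.943, "MAE": 0.007,
             "MBE": 0.006, "SI": 0.229, "AI": 0.977},
    "KSOM": {"RMSE": 0.024, "R2": 0.909, "MAE": 0.012,
             "MBE": -0.004, "SI": 0.276, "AI": 0.963},
    "MLR": {"RMSE": 0.024, "R2": 0.868, "MAE": 0.017,
            "MBE": 0.013, "SI": 0.331, "AI": 0.950},
}
higher_is_better = {"R2", "AI"}

def best_model(indicator):
    """Model with the most favourable value of the given indicator."""
    if indicator in higher_is_better:
        return max(validation, key=lambda m: validation[m][indicator])
    # Error measures: the smallest magnitude is the most favourable.
    return min(validation, key=lambda m: abs(validation[m][indicator]))

ranking = {ind: best_model(ind) for ind in validation["FFNN"]}
print(ranking)
```

The FFNN model is the most favourable on five of the six indicators; only the MBE magnitude is marginally smaller for the KSOM model.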

## CONCLUSIONS

The present study examines the efficacy of FFNN, KSOM, and MLR models in predicting the hydraulic conductivity of porous media based on easily measurable soil characteristics. The correlation analysis between the dependent and independent parameters identifies the grain size (*d*_{10} and *d*_{50}), uniformity coefficient, and porosity as the main factors influencing *K*. The *K* values predicted using the FFNN, KSOM, and MLR models were compared with the measured values via different statistical indicators and scatter plots. The FFNN model performs better in estimating the *K* of soil samples during both model development and validation and is therefore the best-performing model. In contrast, the KSOM and MLR models perform relatively well when estimating the *K* value during model development, but their performance during validation is merely adequate. During validation, the *R*^{2} and AI values of the FFNN model are 0.943 and 0.977, respectively, and its MAE, RMSE, and SI (0.007, 0.016, and 0.229, respectively) are the lowest of the three models, with a small MBE of 0.006, which substantiates the efficacy of the FFNN model in estimating the *K* value. The comparison of the models demonstrates that neural computing techniques can reduce uncertainties and provide new approaches that eliminate correlation inconsistency. The study encourages future research into these techniques for estimating *K* for more porous media samples with different independent parameters.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.