Accurate hydraulic conductivity (K) estimation of porous media is crucial in hydrological studies. Recently, groundwater investigators have utilized neural computing techniques to indirectly estimate soil sample K instead of time-consuming direct methods. The present study utilizes easily measurable characteristics, i.e., grain size at 10 and 50% finer by weight, porosity, and uniformity coefficient as input variables to examine the efficacy of feed-forward neural network (FFNN), Kohonen self-organizing maps (KSOM), and multiple linear regression (MLR) models in estimating the K of soil samples. Model development and validation used 70 and 30% of datasets, respectively. The determination coefficient (R2), root mean square error (RMSE), and mean bias error (MBE) were used to compare the model performance with the measured K values. The study's outcome indicates that the FFNN and KSOM models better estimate the K value, while the MLR model performs merely satisfactorily. Overall, during validation, the FFNN model correlates better with the measured values having R2, RMSE, and MBE of 0.94, 0.016, and 0.006, whereas the corresponding values for KSOM are 0.91, 0.024, and −0.004, and that for MLR are 0.87, 0.024, and 0.013, respectively. Notably, the FFNN model exhibits superior prediction performance and can be employed in aquifers for precise K estimation.

  • The study examines the efficacy of FFNN, KSOM, and MLR models in estimating the K of porous media.

  • The statistical indicators reveal that the FFNN model performs better in estimating the K of porous media and can be employed in sub-surface flow studies for precise K estimation.

Understanding and accurate prediction of hydraulic conductivity (K) substantially impact the sub-surface flow, solute transport through the porous media, and hydrogeology and aquifer studies (Pliakas & Petalas 2011; Lee et al. 2015). The K of porous media is a measure of how easily a fluid particle can permeate through the network of interconnected pores (Lu et al. 2012; Chandel et al. 2021). The porous media characteristics, i.e., grain size, porosity, and uniformity of soil particles, are the fundamental independent variables, which control the K of porous media (Yilmaz et al. 2012; Wang et al. 2017). Apart from this, sorting of the grain size plays a vital role in the K estimation. The considerable spatial variability in K estimation implies that the large porous medium samples should be tested to assess the hydraulic characteristics (More & Deka 2018). Due to multiple pore structures and the non-uniform configuration of porous media particles, numerous efforts have been performed by various investigators to estimate the K of porous media (Zieba 2017; Jougnot et al. 2021).

Various direct and indirect methods can be employed to compute the K of porous media. The direct methods are a set of field and experimental techniques that include the Guelph permeameter, pumping test, tension and ring infiltrometers, and falling and constant head permeameter tests (Pucko & Verbovsek 2015; Thakur et al. 2022). Because knowledge of the aquifer and its hydraulic boundaries is limited, estimating hydraulic conductivity in the field is difficult. Field tests also require prolonged testing durations and large investments, which makes them less reliable (Chandel et al. 2022b). The experimental K measurement techniques, in turn, present difficulties in collecting representative porous media samples (Riha et al. 2018). Direct methods of measuring K are therefore time consuming and less efficient owing to spatial and temporal variations (Arshad et al. 2013). Consequently, indirect methods evolved and became prominent for computing the K of porous media from easily measurable characteristics, i.e., grain size, porosity, percentage of gravel, sand, and silt, uniformity of porous media, and specific gravity (Akbulut 2005; More & Deka 2018). The indirect methods include empirical methods and data-driven techniques. The empirical methods correlate K with various grain-size parameters and predict the K value more rapidly than direct measurement (Odong 2007). Empirical methods are restricted to their respective domains because they were developed under specified boundary conditions, and they may introduce random errors in the hydraulic conductivity values (Chandel & Shankar 2022).

Data-driven techniques, frequently known as 'machine learning methods', have found use in several areas of water resources engineering during the past few decades. These techniques can accomplish the complex task of training, modelling, and validating the data points obtained from experimental work (Williams & Ojuri 2021). They are based either on supervised learning, i.e., with specified input and output data points, or on unsupervised learning, i.e., without specified input and output data points. The supervised learning techniques include artificial neural networks (ANNs), support vector machine (SVM), and adaptive neuro-fuzzy inference system (ANFIS), while the Kohonen self-organizing map (KSOM) is an unsupervised learning technique (Sen et al. 2019). These techniques have gained wide applicability in several subfields of groundwater and irrigation engineering (Yilmaz et al. 2012; Kumar et al. 2020) owing to their higher prediction efficiency compared with the traditional direct approaches.

Akbulut (2005) investigated the performance of the ANN approach in computing the K of 95 coarse porous media samples. The K values based on ANN were compared to the multiple linear regression (MLR) and two empirical methods. The results show that ANN is a powerful prediction tool that outperforms the other approaches in predicting K values. Yilmaz et al. (2012) developed neural computing models using ANN and ANFIS to predict the K of coarser porous media. Due to a better correlation between grain size and K, different grain sizes, i.e., d10, d30, and d60, were used to develop the model. The ANFIS model provided more reliable outcomes of K value than the ANN model. Arshad et al. (2013) evaluated three data-driven techniques, i.e., MLR, ANFIS, and ANN, to predict the K of porous media. The K values predicted by the ANFIS model were better than the other two approaches. More & Deka (2018) predicted the K values of soil samples based on measurable parameters, i.e., specific gravity, porosity, and percentage of gravel, sand, and silt using ANN, fuzzy-logic, and ANFIS. The values of statistical indicators infer that the ANFIS outperforms the other two approaches. Naganna & Deka (2019) evaluated the ANN, ANFIS, and SVM techniques to estimate the K of porous media. The study concluded that ANN and ANFIS models perform satisfactorily in computing the K values. Williams & Ojuri (2021) examined the performance of feed-forward neural network (FFNN) and MLR to estimate the K value of porous media. The study employs six input variables for model development, and the statistical indicators reveal that FFNN outperforms MLR in predicting K values. Despite the unencumbered prediction capability of the supervised algorithms, the missing values and outliers in the data points can reduce their prediction efficacy (Kumar et al. 2020).

On the other hand, the unsupervised learning-based KSOM converts high-dimensional data into a small grid map by clustering (Kohonen et al. 1996) and captures the intrinsic correlation among the parameters. Owing to the clustering, missing values can be effectively replaced using the map characteristics, so they do not hinder the model prediction (Kumar et al. 2020). Because of its adaptability, the KSOM has been utilized in water quality and evapotranspiration modelling (Rustum & Adeloye 2007; Adeloye et al. 2011) and in irrigation and soil moisture studies (Ohana-Levi et al. 2019; Kumar et al. 2020). From the scientific literature, it has been found that the KSOM has not been used previously to estimate the hydraulic conductivity of porous media. Apart from KSOM, the FFNN has been utilized by a few investigators to estimate the K value based on grain-size characteristics. The grain size, uniformity coefficient, and porosity are the key factors influencing K and are hence considered as independent variables for model development. The study evaluates the performance of a supervised learning technique (FFNN), an unsupervised learning technique (KSOM), and MLR in estimating the K value. The objectives of this study are as follows:

  • 1. To develop and validate the FFNN, KSOM, and MLR models to estimate the K of porous media based on easily measurable grain-size characteristics.

  • 2. To compare the prediction performance of the FFNN, KSOM, and MLR models using different statistical indicators.

Materials and experimental methodology

Soil samples and grain-size analysis

In the present study, 165 representative soil samples were collected from the Beas riverbank of Kangra district (31°43′N and 75°32′E) in Himachal Pradesh, India, to carry out experimental investigations. The soil samples were obtained using a thin-walled sampler with a penetrating length and cross-sectional diameter of 100 and 8.5 cm, respectively. Located in the northwest Himalayas, the study area has an elevation of 3,200 m above mean sea level. The collected soil samples were composed of fine gravel (4.75–10 mm), coarse, medium, and fine sand (0.075–4.75 mm), and a small proportion of silt (<0.075 mm). Initially, the soil samples were subjected to grain-size analysis following ASTM (2007) to determine the grain size at 10%, 30%, 50%, and 60% finer by weight (d10, d30, d50, and d60) and the uniformity coefficient. The pycnometer method was used to compute the specific gravity of the soil samples, which is required for the porosity determination.
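As a rough illustration of how d10, d50, d60, and the uniformity coefficient (U = d60/d10) can be read from a grading curve, the sketch below interpolates between sieve sizes. The sieve data are hypothetical, and the log-linear interpolation is an assumed convention rather than a procedure stated in the study.

```python
import math

def grain_size_at(percent_finer, sieve_mm, passing_pct):
    """Interpolate the grain size (mm) at a given percent finer by weight.

    Assumes sieve_mm is sorted ascending with matching cumulative passing_pct,
    and interpolates linearly in log10(grain size) (an assumed convention).
    """
    points = list(zip(sieve_mm, passing_pct))
    for (d_lo, p_lo), (d_hi, p_hi) in zip(points, points[1:]):
        if p_lo <= percent_finer <= p_hi and p_hi > p_lo:
            frac = (percent_finer - p_lo) / (p_hi - p_lo)
            log_d = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_d
    raise ValueError("percent_finer lies outside the measured curve")

# Hypothetical sieve analysis, for illustration only
sieve_mm = [0.075, 0.15, 0.30, 0.60, 1.18, 2.36, 4.75]
passing_pct = [3.0, 8.0, 25.0, 52.0, 78.0, 93.0, 100.0]

d10 = grain_size_at(10, sieve_mm, passing_pct)
d50 = grain_size_at(50, sieve_mm, passing_pct)
d60 = grain_size_at(60, sieve_mm, passing_pct)
U = d60 / d10  # uniformity coefficient

print(f"d10={d10:.3f} mm, d50={d50:.3f} mm, d60={d60:.3f} mm, U={U:.2f}")
```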

Direct determination of hydraulic conductivity

The soil samples were tested for hydraulic conductivity with a constant head permeameter of different diameters (5.08, 10.16, and 15.24 cm), as indicated in Figure 1.
Figure 1

Experimental soil hydraulic conductivity measuring setup.

Figure 1 depicts the line diagram of the K measurement setup, which includes permeameters, manometer tubes, and an overhead water supply tank. The permeameters are built from galvanised iron pipe and have total and test lengths of 1 and 0.46 m, respectively. To measure the head difference, each permeameter is equipped with pressure-tapping points along its periphery, spaced 0.46 m apart. To maintain a constant head, an overhead water tank at a height of 2.60 m above the ground receives water continuously from the recirculating tank at the bottom. The standard methodology described in ASTM (2006) and Chandel et al. (2022a) was used to estimate the K of the collected soil samples. In K determination, measurement uncertainty commonly arises from the discharge measurement equipment, sample preparation, and fluctuations in head difference (Charbeneau et al. 2011). To minimize this experimental uncertainty, five to six observations of discharge and head difference were recorded for each sample, and the average value was used to determine K. During the experiments, the temperature of the water was recorded using a digital thermometer, and Darcy's equation was used to compute the hydraulic conductivity of the soil samples (Qiu & Wang 2015).
$$K = \frac{Q L}{A h} \tag{1}$$
where Q = discharge rate (m3/s), h = manometer head difference (m), A = cross-sectional area of sample (m2), and L = test length (m).
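A minimal sketch of this calculation is given below, using Darcy's law in the form K = QL/(Ah) with the variables defined above. The readings are hypothetical, and averaging the K values of the individual observations (rather than averaging Q and h first) is an assumption, since the text does not specify the order.

```python
import math

def hydraulic_conductivity(discharge_m3s, head_diff_m, diameter_m, test_length_m):
    """Darcy's law for a constant head permeameter: K = Q * L / (A * h), in m/s."""
    area = math.pi * diameter_m ** 2 / 4.0  # cross-sectional area of the sample
    return discharge_m3s * test_length_m / (area * head_diff_m)

def mean_conductivity(observations, diameter_m, test_length_m=0.46):
    """Average K over repeated (Q, h) observations of one sample (assumed order)."""
    values = [hydraulic_conductivity(q, h, diameter_m, test_length_m)
              for q, h in observations]
    return sum(values) / len(values)

# Hypothetical (Q in m3/s, h in m) readings for the 10.16 cm permeameter
readings = [(1.0e-5, 0.100), (1.2e-5, 0.121), (0.8e-5, 0.079),
            (1.1e-5, 0.110), (0.9e-5, 0.091)]
k_ms = mean_conductivity(readings, diameter_m=0.1016)
print(f"K = {k_ms * 100:.3f} cm/s")
```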

The dataset obtained from the experimental work has been utilized to develop the FFNN, KSOM, and MLR models. The grain size (d10 and d50), uniformity coefficient (U), and porosity of packed bed (n) were used as input variables to predict the K value obtained from three permeameters. In this study, for one soil sample, d10, d50, and U values are the same, whereas the porosity is varied with the change in the permeameter diameter, which corresponds to different K values obtained from the 5.08, 10.16, and 15.24 cm permeameter, respectively. A total number of 495 observations have been obtained from the experimental work for 165 soil samples. The dataset corresponding to 115 soil samples (70%) and 50 soil samples (30%) was used for model development and validation, respectively. The modelling details regarding the data-driven techniques used in the study have been discussed in the following section.
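The split described above keeps the three permeameter observations of each soil sample together. The sketch below shows one way to do this; the random assignment and the seed are assumptions, since the text does not state how the 115 development samples were selected.

```python
import random

def split_by_sample(sample_ids, n_train, seed=0):
    """Split unique sample IDs into development and validation sets."""
    ids = sorted(set(sample_ids))
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    rng.shuffle(ids)
    return set(ids[:n_train]), set(ids[n_train:])

diameters_cm = (5.08, 10.16, 15.24)
# One (sample_id, permeameter_diameter) record per observation: 165 x 3 = 495
observations = [(s, d) for s in range(1, 166) for d in diameters_cm]

train_ids, valid_ids = split_by_sample([s for s, _ in observations], n_train=115)
train = [obs for obs in observations if obs[0] in train_ids]
valid = [obs for obs in observations if obs[0] in valid_ids]

print(len(train), len(valid))
```

Splitting on sample IDs rather than on individual observations avoids placing the same d10, d50, and U values in both sets.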

Feed-forward neural network

The most widely used artificial neural network is the FFNN, in which the network is trained via various forward and backward passes until a specified number of epochs or minimum error is attained (Dawson & Wilby 2001). The FFNN is the supervised learning approach, in which both input and output are provided to obtain the synaptic weights during the model development (Jain & Kumar 2007). The FFNN includes three layers, i.e., input, output, and hidden layers, as presented in Figure 2. During training, the model stored information consisting of the network structure and the synaptic weights. The model utilizes the learning experience of the training to predict the output when new input data are provided. Before presenting the input data to the network, the variables were normalized between −1 and +1.
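The normalization step can be sketched as a min-max mapping onto [−1, +1]. The mapping itself is an assumption (the text only states the target range), and the limits used in the example are the d10 development-set minimum and maximum reported in Table 3.

```python
def to_unit_range(value, lo, hi):
    """Map value from [lo, hi] onto [-1, +1] (assumed min-max scaling)."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0

def from_unit_range(scaled, lo, hi):
    """Inverse mapping, used to convert a scaled prediction back to its units."""
    return (scaled + 1.0) * (hi - lo) / 2.0 + lo

d10_min, d10_max = 0.100, 0.409  # development-set limits (mm)
scaled = to_unit_range(0.211, d10_min, d10_max)
restored = from_unit_range(scaled, d10_min, d10_max)
print(round(scaled, 3), round(restored, 3))
```

The limits are taken from the development data only, so validation inputs are scaled with the same constants the network saw during training.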
Figure 2

Representation of FFNN used in the present study.

In this study, the K of soil samples was determined using a multilayer perceptron FFNN. The sigmoid and linear activation functions were used in the neurons of the hidden and output layers, respectively. The best-performing network was selected by a trial-and-error method, i.e., by varying the number of neurons in the hidden layer up to 10. The purpose of the trial-and-error method is to reduce the error and attain a better correlation for the training data points while using the minimum number of hidden neurons to prevent overfitting (Kumar et al. 2020). For the present study, Figure 2 depicts the best-performing FFNN architecture with five hidden neurons. The Levenberg–Marquardt learning algorithm was applied to train the network because of its superior convergence and low residuals. The FFNN model was developed and validated in Matlab R2022a using the neural network toolbox, following the methodology depicted in Figure 3.
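The forward pass of the 4-5-1 architecture described above can be sketched as follows. The weights are random placeholders rather than the trained values, the logistic form of the sigmoid is an assumption, and the Levenberg–Marquardt training itself is not reproduced here.

```python
import math
import random

def sigmoid(x):
    """Logistic sigmoid (assumed form of the hidden-layer activation)."""
    return 1.0 / (1.0 + math.exp(-x))

def ffnn_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer with sigmoid units, followed by a linear output unit."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

rng = random.Random(1)  # placeholder weights, not the trained network
n_inputs, n_hidden = 4, 5  # inputs: d10, d50, U, n; five hidden neurons
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = rng.uniform(-1, 1)

x = [-0.2, 0.1, 0.4, -0.6]  # hypothetical normalized input vector
k_scaled = ffnn_forward(x, w_hidden, b_hidden, w_out, b_out)

# Number of adjustable weights and biases in a 4-5-1 network
n_params = n_hidden * n_inputs + n_hidden + n_hidden + 1
print(n_params, k_scaled)
```

With only 31 adjustable parameters against 345 development observations, the small hidden layer is consistent with the stated aim of limiting overfitting.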
Figure 3

Flowchart of FFNN modelling approach.


Kohonen self-organizing maps

The KSOM, frequently referred to as the Kohonen map or feature map, is a popular unsupervised neural network (Kohonen et al. 1996). It uses clustering to transform multidimensional data into a simple relationship. Unsupervised competitive learning is utilized to match the input signal with the neurons of the Kohonen map. The input data are clustered so that similar input patterns are mapped to the same output unit or to its neighbouring units (Rustum & Adeloye 2007). The KSOM includes two layers, i.e., input (high dimensional) and output (low dimensional). Figure 4 indicates that both layers are interconnected. The output layer is represented by a two-dimensional grid having M neurons. Each neuron holds a code vector with the same set of variables as the input vector. The clustering process of the KSOM algorithm also reduces dimensionality, as the number of neurons on the map is usually smaller than the number of input data points. As a result, high-dimensional data can be analysed effectively.
Figure 4

Schematic representation of the BMU in a KSOM.

The optimum number of neurons in the output layer (M value) is determined using the following equation (Garcia & Gonzalez 2004):
$$M = 5\sqrt{N} \tag{2}$$
where N = total data points in the training dataset. By using the M value, the number of rows and columns in the map is computed using the following equation (Garcia & Gonzalez 2004):
$$\frac{L_1}{L_2} = \sqrt{\frac{E_1}{E_2}} \tag{3}$$
where E1 and E2 are the largest and second largest eigenvalue of the training dataset, and L1 and L2 are the number of rows and columns in the map, respectively.
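A rough sketch of the map sizing is given below. It assumes the widely used heuristic M ≈ 5√N for the number of neurons and a side ratio L1/L2 = √(E1/E2). The value N = 345 (115 samples × 3 permeameters) and the eigenvalues are illustrative assumptions, not values reported in the study.

```python
import math

def som_map_size(n_points, eig1, eig2):
    """Return (rows, cols) for a map with about 5 * sqrt(N) neurons.

    The side ratio rows / cols follows sqrt(eig1 / eig2); both rules are
    the assumed forms of the sizing equations.
    """
    m_target = 5.0 * math.sqrt(n_points)
    ratio = math.sqrt(eig1 / eig2)
    cols = max(1, round(math.sqrt(m_target / ratio)))
    rows = max(1, round(m_target / cols))
    return rows, cols

# Hypothetical eigenvalues of the training data with E1 / E2 = 2.25
rows, cols = som_map_size(345, 2.25, 1.0)
print(rows, cols, rows * cols)
```

Because the sides are rounded to integers, the final number of neurons is the product of the two sides and can differ slightly from 5√N.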
During the initial stages of KSOM training, the input data points are normalized to ensure that all variables contribute equally to developing the map. Then a normalized input vector is selected randomly and delivered to every neuron, which is presented on the map. To find the code vector most comparable to the input vector, the KSOM utilizes the concept of Euclidean distance, which is determined as follows:
$$D_i = \sqrt{\sum_{j=1}^{n} m_j \left(x_j - w_{ij}\right)^2} \tag{4}$$
where Di = Euclidean distance between the input vector and code vector i; xj and wij = jth element of the current input vector and of the ith code (weight) vector, respectively; mj = mask, equal to 0 when xj is missing from the input vector and 1 otherwise; and n = size of the input vector. The neuron with the smallest Di value is referred to as the best matching unit (BMU) or winning neuron, as depicted in Figure 4. Further, to improve the agreement with the input data, the code vectors of the BMU and its neighbouring units are modified using the following equation (Adeloye et al. 2011):
$$w_i(t+1) = w_i(t) + \alpha(t)\, h_{ci}(t) \left[x(t) - w_i(t)\right] \tag{5}$$
where wi = ith code vector, t = time, α(t) and hci(t) = learning rate and neighbourhood function at time t. As a result, each node on the map develops the ability to identify the input vector comparable to itself. This characteristic is described as self-organizing because the classification is attained without providing any external output (Penn 2005). The process repeats itself until a specific error or optimum epoch is achieved. The α(t) and hci(t) determine the learning efficacy of the KSOM and hence must be selected carefully. The expression to determine α(t) and hci(t) (Kumar et al. 2021) are as follows:
$$\alpha(t) = \alpha_0 \left(\frac{0.005}{\alpha_0}\right)^{t/T} \tag{6}$$
$$h_{ci}(t) = \exp\left(-\frac{\lVert r_c - r_i \rVert^2}{2\beta^2(t)}\right) \tag{7}$$
where α0 = initial learning rate, T = convergence training length, equal to 250/√N (Adeloye et al. 2011), rc and ri = positions of nodes c and i, respectively, on the KSOM map, and β(t) = neighbourhood radius. α(t) and hci(t) decrease monotonically as the number of iterations increases. The quality of the trained KSOM depends on the topographic (te) and quantization (qe) errors, which are determined as follows:
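$$qe = \frac{1}{N}\sum_{i=1}^{N} \lVert x_i - w_c \rVert \tag{8}$$
$$te = \frac{1}{N}\sum_{i=1}^{N} u(x_i) \tag{9}$$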
where xi = ith input vector, wc = prototype vector corresponding to BMU for xi, and u = binary integer and is equal to 1 if the first and second BMUs are not adjacent, else equal to 0. The KSOM can be used for nonlinear interpolation, model identification, prediction, and generalization of data points (Kumar et al. 2021). However, in this study, the KSOM is utilized for prediction purposes as shown in Figure 5.
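The training quantities described above (learning rate, neighbourhood function, code-vector update, and quantization error) can be sketched as follows. The power-law decay towards 0.005 and the Gaussian neighbourhood are assumed forms, and the input values are hypothetical.

```python
import math

def learning_rate(t, total_steps, alpha0=0.5, alpha_end=0.005):
    """Assumed power-law decay from alpha0 towards alpha_end over total_steps."""
    return alpha0 * (alpha_end / alpha0) ** (t / total_steps)

def neighbourhood(pos_c, pos_i, radius):
    """Assumed Gaussian weight between the map positions of nodes c and i."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(pos_c, pos_i))
    return math.exp(-dist_sq / (2.0 * radius ** 2))

def update_code_vector(w, x, alpha, h):
    """Move code vector w towards input x by the fraction alpha * h."""
    return [wj + alpha * h * (xj - wj) for wj, xj in zip(w, x)]

def quantization_error(inputs, code_vectors):
    """Mean Euclidean distance between each input and its closest code vector."""
    total = 0.0
    for x in inputs:
        total += min(math.dist(x, w) for w in code_vectors)
    return total / len(inputs)

# First update of the BMU itself (distance 0 on the map, radius 3)
alpha = learning_rate(0, 100)
h = neighbourhood((0, 0), (0, 0), 3.0)
w_new = update_code_vector([0.0, 0.0], [1.0, 2.0], alpha, h)
print(w_new)
```

The neighbourhood weight equals 1 for the BMU and falls off with map distance, so distant nodes are barely moved by a given input.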
Figure 5

Prediction of the missing variable of the input vector via KSOM.

Initially, the KSOM model is developed using the input vectors of the training dataset. Then, the input vectors of the validation dataset, from which the K values are either missing or deliberately removed (as depicted in Figure 5), are presented to the KSOM to determine the BMU. The number of BMUs identified is proportional to the number of input vectors in the validation dataset. The missing K values are determined from the corresponding values in the BMU. The KSOM model was developed and validated in Matlab R2022a using the SOM toolbox, following the methodology presented in Figure 6.
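The prediction step can be sketched as a masked nearest-code-vector lookup: the distance ignores the missing K component, and the K value is then read from the BMU. The three code vectors below are hypothetical stand-ins for a trained map, and the variable ordering is an assumption.

```python
import math

def masked_distance(x, w):
    """Euclidean distance over the components of x that are not missing (None)."""
    return math.sqrt(sum((xj - wj) ** 2 for xj, wj in zip(x, w) if xj is not None))

def best_matching_unit(x, code_vectors):
    """Index of the code vector closest to x under the masked distance."""
    return min(range(len(code_vectors)),
               key=lambda i: masked_distance(x, code_vectors[i]))

def predict_missing(x, code_vectors):
    """Fill the missing components of x with the values held by its BMU."""
    bmu = code_vectors[best_matching_unit(x, code_vectors)]
    return [wj if xj is None else xj for xj, wj in zip(x, bmu)]

# Hypothetical normalized code vectors ordered as [d10, d50, U, n, K]
code_vectors = [
    [-0.8, -0.7, 0.5, -0.4, -0.9],
    [0.0, 0.1, 0.0, 0.0, -0.2],
    [0.7, 0.8, -0.5, 0.5, 0.6],
]
x = [0.6, 0.7, -0.4, 0.4, None]  # validation vector with K removed
completed = predict_missing(x, code_vectors)
print(best_matching_unit(x, code_vectors), completed[-1])
```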
Figure 6

Flowchart of KSOM modelling approach.


Multiple linear regression

Apart from FFNN and KSOM, the MLR technique was also used to develop a model using easily measurable grain-size characteristics. The detailed explanation related to MLR and its application has been discussed by Williams & Ojuri (2021) and Elbisy (2015) and hence not discussed here. The data analysis toolbox in Microsoft Excel was used to develop the MLR model. A regression equation was generated using the training dataset, which comprises d10, d50, U, and n as the independent variables and K as the dependent variable. The performance of the developed equation was evaluated using the validation dataset.
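For readers who want to reproduce the regression step outside a spreadsheet, the sketch below fits K = b0 + b1·d10 + b2·d50 + b3·U + b4·n by ordinary least squares via the normal equations. The data and coefficients are hypothetical, and this is a stand-in for, not a reproduction of, the Excel analysis used in the study.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit_mlr(rows, targets):
    """Least-squares coefficients [b0, b1, ...] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def predict_mlr(coeffs, row):
    """Evaluate the fitted regression equation for one input row."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))

# Hypothetical [d10, d50, U, n] rows and coefficients, for illustration only
rows = [[0.10, 0.30, 2.5, 0.40], [0.20, 0.30, 2.5, 0.40], [0.10, 0.60, 2.5, 0.40],
        [0.10, 0.30, 5.0, 0.40], [0.10, 0.30, 2.5, 0.30], [0.30, 0.90, 6.0, 0.35],
        [0.40, 1.20, 8.0, 0.32]]
true_coeffs = [0.01, 0.5, 0.02, -0.003, 0.05]
targets = [predict_mlr(true_coeffs, r) for r in rows]
coeffs = fit_mlr(rows, targets)
print([round(c, 4) for c in coeffs])
```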

Statistical analysis

The quantitative performance of the FFNN, KSOM, and MLR models in predicting K was evaluated using different statistical indicators, i.e., root mean square error (RMSE), determination coefficient (R2), mean absolute error (MAE), mean bias error (MBE), scatter index (SI), and agreement index (AI) (Naeej et al. 2017). Analysis of variance (ANOVA) statistics were used to examine the significance of the regression line. The equations to compute the statistical indicators are as follows:
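$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(p_i - z_i\right)^2} \tag{10}$$
$$R^2 = \frac{\left[\sum_{i=1}^{m}\left(z_i - \bar{z}\right)\left(p_i - \bar{p}\right)\right]^2}{\sum_{i=1}^{m}\left(z_i - \bar{z}\right)^2 \sum_{i=1}^{m}\left(p_i - \bar{p}\right)^2} \tag{11}$$
$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|p_i - z_i\right| \tag{12}$$
$$\mathrm{MBE} = \frac{1}{m}\sum_{i=1}^{m}\left(p_i - z_i\right) \tag{13}$$
$$\mathrm{SI} = \frac{\mathrm{RMSE}}{\bar{z}} \tag{14}$$
$$\mathrm{AI} = 1 - \frac{\sum_{i=1}^{m}\left(p_i - z_i\right)^2}{\sum_{i=1}^{m}\left(\left|p_i - \bar{z}\right| + \left|z_i - \bar{z}\right|\right)^2} \tag{15}$$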
where zi and pi = the actual and model-predicted values, respectively; z̄ and p̄ = means of the actual and model-predicted values, respectively; and m = total number of data points used. The values of RMSE, MAE, and SI vary from 0 to ∞, MBE can take either sign, and AI and R2 vary from 0 to 1 (Chandel et al. 2022b). R2 and AI values closer to 1, lower values of RMSE, MAE, and SI, and MBE values closer to 0 indicate better agreement between the actual and model-predicted values (Naeej et al. 2017).
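The indicators can be computed as sketched below. The formulas follow the common definitions (squared Pearson correlation for R2, RMSE divided by the mean measured value for SI, and Willmott's index for AI), which are assumed here to match the ones used in the study. The numbers are hypothetical.

```python
import math

def performance_indicators(actual, predicted):
    """Return RMSE, R2, MAE, MBE, SI, and AI for paired actual/predicted values."""
    m = len(actual)
    z_bar = sum(actual) / m
    p_bar = sum(predicted) / m
    errors = [p - z for z, p in zip(actual, predicted)]  # predicted minus actual
    sq_err = sum(e * e for e in errors)
    rmse = math.sqrt(sq_err / m)
    cov = sum((z - z_bar) * (p - p_bar) for z, p in zip(actual, predicted))
    var_z = sum((z - z_bar) ** 2 for z in actual)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    spread = sum((abs(p - z_bar) + abs(z - z_bar)) ** 2
                 for z, p in zip(actual, predicted))
    return {
        "RMSE": rmse,
        "R2": cov ** 2 / (var_z * var_p),
        "MAE": sum(abs(e) for e in errors) / m,
        "MBE": sum(errors) / m,   # negative when the model under-predicts
        "SI": rmse / z_bar,
        "AI": 1.0 - sq_err / spread,
    }

# Hypothetical measured and predicted K values (cm/s)
measured = [0.020, 0.045, 0.070, 0.110, 0.180]
predicted = [0.022, 0.041, 0.074, 0.105, 0.171]
stats = performance_indicators(measured, predicted)
print({name: round(value, 4) for name, value in stats.items()})
```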

The study utilizes the easily measurable grain-size characteristics and experimental observations to develop FFNN, KSOM, and MLR models. The first section of the study includes the statistical analysis and dataset used for model development and validation, whereas the second section comprises the K estimation using different models and their quantitative evaluation via statistical performance indicators.

Statistical analysis

Initially, the grain-size analysis was conducted on the collected soil samples to determine the grain-size characteristics, i.e., d10, d30, d50, d60, and U. In addition, the gravel, sand, and silt percentages and the n and K values were determined from the experimental investigations. The statistical summary, i.e., the minimum, maximum, mean, and standard deviation of the experimental observations, is given in Table 1. The n and K values for the collected soil samples vary from 0.286 to 0.426 and from 0.010 to 0.342 cm/s, respectively.

Table 1

Statistics summary of the basic characteristics of soil samples

Characteristics   Minimum   Maximum   Mean     Standard deviation
d10 (mm)          0.100     0.409     0.211    0.089
d30 (mm)          0.193     0.821     0.393    0.154
d50 (mm)          0.280     1.496     0.667    0.308
d60 (mm)          0.372     2.101     0.936    0.474
U (a)             2.150     9.567     4.493    1.497
n (a)             0.286     0.426     0.362    0.029
Gravel (%)        2.200     31.240    7.050    4.430
Sand (%)          66.780    95.460    88.880   4.450
Silt (%)          1.130     7.340     4.070    1.430
K (cm/s)          0.010     0.342     0.069    0.061

(a) Dimensionless characteristics.

Further, to check the correlation between the independent parameters (d10, d30, d50, d60, U, and n) and dependent parameter (K), a correlation matrix has been examined. The grain-size characteristics, i.e., d10 and d50 indicate a better correlation (0.94 and 0.80) and d30 and d60 show a poor correlation (0.40 and 0.34) with the K value as presented in Table 2. However, the uniformity coefficient and porosity show a relatively good negative and positive correlation (−0.58 and 0.60) with the K value, respectively. Therefore, based on the correlation value, the parameters, i.e., d10, d50, U, and n, are considered the input variables for the model development.

Table 2

Correlation matrix between dependent and independent parameters

       d10      d30      d50      d60      U        n        K
d10    1.00
d30    0.93     1.00
d50    0.89     0.96     1.00
d60    0.82     0.91     0.96     1.00
U      −0.08    0.15     0.31     0.47     1.00
n      0.02     −0.21    −0.37    −0.50    −0.96    1.00
K      0.94*    0.40     0.80*    0.34     −0.58*   0.60*    1.00

*Statistically significant correlation (p < 0.05).

In this study, experimental observations of 165 soil samples have been determined to develop a dataset for model development and validation. The datasets corresponding to 115 soil samples (70%) and 50 soil samples (30%) were utilized for model development and validation, respectively. Table 3 represents the statistical summary of the input (d10, d50, U, and n) and output (K) parameters used for the model development and validation.

Table 3

Statistical summary of the dataset used

             Model development                                Model validation
Parameter    Minimum   Maximum   Mean    Standard deviation   Minimum   Maximum   Mean    Standard deviation
d10 (mm)     0.100     0.409     0.203   0.090                0.115     0.395     0.228   0.084
d50 (mm)     0.280     1.450     0.637   0.298                0.390     1.496     0.738   0.320
U (a)        2.150     9.567     4.466   1.544                2.498     8.323     4.554   1.387
n (a)        0.286     0.426     0.363   0.029                0.297     0.415     0.360   0.028
K (cm/s)     0.010     0.342     0.067   0.065                0.011     0.221     0.072   0.052

(a) Dimensionless parameters.

FFNN model for K estimation

The data points corresponding to the 115 soil samples were utilized for training the architecture of the FFNN model. The trial-error method for various neuron configurations in the hidden layer was performed to identify the best-performing FFNN model. Based on the analysis, the best-performing FFNN model consists of four input variables (d10, d50, U, and n), one hidden layer (five neurons), and an output variable (K). The efficacy of the FFNN model to predict K was evaluated using statistical indicators and a scatter plot. The statistical indicators computed for the FFNN model are presented in Table 4.

Table 4

Statistical indicators for FFNN, KSOM, and MLR models

                         FFNN model                 KSOM model                 MLR model
Statistical indicators   Development   Validation   Development   Validation   Development   Validation
RMSE                     0.012         0.016        0.018         0.024        0.022         0.024
R2                       0.964         0.943        0.937         0.909        0.891         0.868
MAE                      0.006         0.007        0.010         0.012        0.013         0.017
MBE                      0.000         0.006        −0.002        −0.004       −0.003        0.013
SI                       0.178         0.229        0.266         0.276        0.311         0.331
AI                       0.980         0.977        0.975         0.963        0.969         0.950

Figure 7 represents the scatter plot between the experimentally measured and FFNN-predicted K values during model development and validation. The K values predicted using the FFNN model correlated well with the measured values, with R2 values of 0.96 and 0.94 during model development and validation, respectively. During validation, the performance of the FFNN model was good for lower K values (<0.15 cm/s) but only moderate for K values greater than 0.15 cm/s, as illustrated in Figure 7(b). This implies that the FFNN model could not efficiently learn K values greater than 0.15 cm/s from the training dataset. The prediction results of the FFNN model were similar to the outcomes of Yilmaz et al. (2012), who employed the FFNN model to estimate K using three grain-size parameters. Figure 7 depicts a uniform scatter along the 1-1 line, which was assessed through the linear regression between the measured and FFNN-predicted K values. During model development and validation, the slope of the regression line was not significantly different (p > 0.05) from that of the 1-1 line, implying that the FFNN model has a negligible bias in predicting the K value of soil samples. This is further confirmed by the low MBE values given in Table 4.
Figure 7

Scatter plot between measured and FFNN-predicted K values: (a) model development and (b) model validation.


KSOM model for K estimation

The KSOM model to predict K was developed and validated corresponding to the data points obtained from the 115 and 50 soil samples, respectively. The default values of the learning rate (α = 0.5) and neighbourhood radius (β = max(L1, L2)/4) were used initially to train the KSOM model, where L1 and L2 indicate the map dimensions, determined by Equation (3). The final map size is determined by the SOM toolbox as per Equation (2), and the number of neurons on the final map is adjusted so that it is equal to the product of L1 and L2. The map size obtained from the KSOM model has dimensions of 12 × 8 and comprises 96 units. The map's errors, i.e., topographic and quantization, are 0.094 and 0.197, respectively. The significant characteristic of KSOM is the ability to visually examine the correlation between various parameters through the development of the component planes. Figure 8 represents the KSOM component planes for the parameters used in this study.
Figure 8

Component planes in the KSOM for different parameters. Please refer to the online version of this paper to see this figure in colour: http://dx.doi.org/10.2166/ws.2023.143.


The component plane for each parameter is composed of 96 hexagonal units, and their values are denoted by the adjacent colour coding. The component plane's high, medium, and low values are indicated by the yellow, light blue, and dark blue colours, respectively. As a result, the component planes make it easier to visually identify the regions, where the parameter is low, high, or medium. The parameters, i.e., d10, d50, U, n, and K, on the component planes vary from 0.107–0.362 mm, 0.324–1.30 mm, 2.62–8.43, 0.302–0.403, and 0.015–0.202 cm/s, respectively.

The visual examination of Figure 8 reveals that the colour gradient of the d10 and d50 component plane is parallel with the K component plane, i.e., low d10 and d50 values are correlated with the low K values and vice-versa. The component planes of U and n parameters are inversely correlated at the top left and bottom right sides. However, the colour gradient is similar, mainly in the central region of these parameters. Notably, some observations have been drawn by looking at the component plane of U, n, and K. On the top left side, the low K values are correlated with the low and high values of n and U, respectively. In contrast, on the top right side, higher K values are correlated with the medium values of n and U. Similarly, more observations can be drawn from the component planes. Part-wise assessment of component planes shows varying correlations between the parameters; however, a general assessment cannot be made.

The scatter plots and statistical indicators were used to evaluate the performance of the KSOM model to predict the K of soil samples. Table 4 represents the values of statistical indicators computed for the KSOM model during development and validation. Figure 9 depicts the model development and validation scatter plots between the measured and KSOM-predicted K values.
Figure 9

Scatter plot between measured and KSOM-predicted K values: (a) model development and (b) model validation.


Based on the scatter plot, the efficacy of the KSOM model in predicting K was relatively good, with R2 values of 0.94 and 0.91 during model development and validation, respectively. During validation, the KSOM model performed better for lower K values (<0.06 cm/s) and only satisfactorily for higher K values (>0.06 cm/s), as shown in Figure 9(b). The slope of the regression line during model development and validation was not significantly different (p > 0.05) from that of the 1-1 line, suggesting a low bias in the KSOM model in predicting the K value. This is further verified by the low negative MBE values given in Table 4. The negative MBE arises because the K values predicted by the KSOM model are, on average, lower than the measured values.

MLR model for K estimation

The MLR model was developed using the data points of 115 soil samples. It takes d10, d50, U, and n as input parameters and K as the output parameter. The data points corresponding to the remaining 50 soil samples were used to validate the MLR model. Equation (16) represents the developed MLR model for predicting the K of soil samples.
(16)
The performance of the MLR model in predicting K was evaluated using scatter plots and statistical indicators. The statistical indicators computed for the MLR model are presented in Table 4. Figure 10 shows the scatter plots between the measured and MLR-predicted K values. The MLR model predicted K satisfactorily during development, with an R2 of 0.89, whereas its performance during validation was only adequate, with an R2 of 0.87. During model validation, the slope of the regression line was significantly different (p < 0.05) from the 1:1 line, suggesting a bias in the MLR model. This is consistent with the positive MBE of 0.013 obtained for the MLR model in predicting the K of soil samples.
Figure 10

Scatter plot between measured and MLR-predicted K values: (a) model development and (b) model validation.
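The way an MLR model of this form is fitted can be sketched with ordinary least squares. The example below assumes a linear model K = b0 + b1·d10 + b2·d50 + b3·U + b4·n solved through the normal equations; the sample rows and the coefficients used to generate the synthetic K values are hypothetical and are not the coefficients of Equation (16).

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_mlr(rows, targets):
    """Return [b0, b1, ..., bk] minimising the squared error of a linear model."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_mlr(coeffs, row):
    """Evaluate the fitted linear model for one sample (d10, d50, U, n)."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))


# Hypothetical samples (d10, d50, U, n); NOT data from the study.
ROWS = [
    (0.2, 0.5, 3.0, 0.36), (0.4, 0.9, 2.4, 0.38), (0.6, 1.4, 2.0, 0.40),
    (0.3, 0.8, 2.9, 0.35), (0.5, 1.0, 2.1, 0.42), (0.8, 1.7, 1.8, 0.39),
]
# Hypothetical coefficients used only to generate noise-free synthetic K values.
ASSUMED_COEFFS = [0.01, 0.08, 0.02, -0.004, 0.05]
TARGETS = [predict_mlr(ASSUMED_COEFFS, r) for r in ROWS]

if __name__ == "__main__":
    fitted = fit_mlr(ROWS, TARGETS)
    for row, target in zip(ROWS, TARGETS):
        print(row, round(target, 4), round(predict_mlr(fitted, row), 4))
```

In practice, the coefficients would be fitted on the development samples only, and the validation samples would be passed through `predict_mlr` without refitting.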


Comparison of FFNN, KSOM, and MLR models

The results of the FFNN, KSOM, and MLR models were evaluated statistically and graphically to identify the best model for estimating the K of porous media. For the validation data points, Figure 11 shows the K values predicted by the FFNN, KSOM, and MLR models alongside the measured values. The data points corresponding to the FFNN model follow the trend of the measured values more closely than those of the KSOM and MLR models, indicating the better performance of the FFNN model in estimating the K value.
Figure 11

Comparison of FFNN, KSOM, and MLR model-based K with the measured K values.


Further, the statistical performance indicators (Table 4) show that the FFNN model performs better than the KSOM and MLR models in estimating the K of soil samples. During validation, the R2 and AI values are 0.94 and 0.97 for the FFNN model, 0.91 and 0.96 for the KSOM model, and 0.87 and 0.95 for the MLR model, respectively. For the FFNN model, the R2 and AI values are closest to 1, indicating a better correlation with the measured values and signifying that the independent variables in the model effectively capture the variation in the dependent variable. The lower MAE, MBE, RMSE, and SI values for the FFNN model during validation, i.e., 0.007, 0.006, 0.016, and 0.229, respectively, indicate a smaller deviation from the measured values compared with the KSOM and MLR models. This further substantiates the efficacy of the FFNN model in estimating the K of soil samples. The prediction efficacy of the FFNN, KSOM, and MLR models is influenced by the information they learn during model development (training). Notably, prediction efficacy improves when more information within a given input range is provided. In the present study, K values ranging from 0.01 to 0.07 cm/s appeared more frequently in the training dataset, resulting in better performance of the developed models in this lower range than for higher K values during validation.
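The indicators compared above can be computed from paired measured and predicted K values as in the following minimal sketch. The definitions used here are assumptions and may differ from those adopted in the study: R2 is taken as the squared Pearson correlation, AI as Willmott's index of agreement, SI as RMSE divided by the mean measured value, and MBE as the mean of (predicted − measured), so that a negative MBE corresponds to under-prediction. The function name is hypothetical.

```python
from math import sqrt


def performance_indicators(measured, predicted):
    """Return R2, AI, MAE, MBE, RMSE and SI for paired K values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    errors = [p - m for m, p in zip(measured, predicted)]

    mae = sum(abs(e) for e in errors) / n
    mbe = sum(errors) / n                   # negative when K is under-predicted
    sq_err = sum(e * e for e in errors)
    rmse = sqrt(sq_err / n)
    si = rmse / mean_m                      # assumed: RMSE scaled by mean measured K

    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov * cov / (var_m * var_p)        # assumed: squared Pearson correlation

    # Assumed: Willmott's index of agreement.
    potential = sum((abs(p - mean_m) + abs(m - mean_m)) ** 2
                    for m, p in zip(measured, predicted))
    ai = 1.0 - sq_err / potential

    return {"R2": r2, "AI": ai, "MAE": mae, "MBE": mbe, "RMSE": rmse, "SI": si}


if __name__ == "__main__":
    # Hypothetical K values in cm/s, for illustration only.
    measured = [0.01, 0.03, 0.05, 0.08]
    predicted = [0.012, 0.028, 0.047, 0.074]
    for name, value in performance_indicators(measured, predicted).items():
        print(f"{name}: {value:.4f}")
```

Because MBE lets positive and negative errors cancel, it is read alongside MAE and RMSE: a model with a small MBE can still show a large scatter about the measured values.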

The present study examines the efficacy of FFNN, KSOM, and MLR models in predicting the hydraulic conductivity of porous media based on easily measurable soil characteristics. The correlation analysis between the dependent and independent parameters identifies the grain size (d10 and d50), uniformity coefficient, and porosity as the main factors influencing K. The K values predicted using the FFNN, KSOM, and MLR models were compared with the measured values via different statistical indicators and scatter plots. The FFNN model performs better in estimating the K of soil samples during both model development and validation, making it the best-performing model. In contrast, the KSOM and MLR models perform reasonably well when estimating the K value during model development, but their performance during validation is merely adequate. During validation, the R2 and AI values of the FFNN model (0.94 and 0.97, respectively) and its lower MAE, MBE, RMSE, and SI values (0.007, 0.006, 0.016, and 0.229, respectively) compared with the KSOM and MLR models substantiate the efficacy of the FFNN model in estimating the K value. The comparison of the different models demonstrates that neural computing techniques can reduce uncertainties and provide new approaches that eliminate correlation inconsistency. The study encourages future research into these techniques for estimating K for more porous media samples with different independent parameters.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).