Abstract
Accurate hydraulic conductivity (K) estimation of porous media is crucial in hydrological studies. Recently, groundwater investigators have utilized neural computing techniques to estimate the K of soil samples indirectly instead of relying on time-consuming direct methods. The present study utilizes easily measurable characteristics, i.e., grain size at 10 and 50% finer by weight, porosity, and uniformity coefficient, as input variables to examine the efficacy of feed-forward neural network (FFNN), Kohonen self-organizing maps (KSOM), and multiple linear regression (MLR) models in estimating the K of soil samples. Model development and validation used 70 and 30% of the dataset, respectively. The determination coefficient (R2), root mean square error (RMSE), and mean bias error (MBE) were used to compare the model predictions with the measured K values. The study's outcome indicates that the FFNN and KSOM models estimate the K value better, while the MLR model performs merely satisfactorily. Overall, during validation, the FFNN model correlates better with the measured values, with R2, RMSE, and MBE values of 0.94, 0.016, and 0.006, whereas the corresponding values for KSOM are 0.91, 0.024, and −0.004, and those for MLR are 0.87, 0.024, and 0.013, respectively. Notably, the FFNN model exhibits superior prediction performance and can be employed in aquifers for precise K estimation.
HIGHLIGHTS
The study examines the efficacy of FFNN, KSOM, and MLR models in estimating the K of porous media.
The statistical indicators reveal that the FFNN model performs better in estimating the K of porous media and can be employed in sub-surface flow studies for precise K estimation.
INTRODUCTION
Accurate knowledge and prediction of hydraulic conductivity (K) are essential for analysing sub-surface flow, solute transport through porous media, and hydrogeological and aquifer behaviour (Pliakas & Petalas 2011; Lee et al. 2015). The K of a porous medium is a measure of how easily a fluid can permeate through its network of interconnected pores (Lu et al. 2012; Chandel et al. 2021). The characteristics of the porous medium, i.e., grain size, porosity, and uniformity of the soil particles, are the fundamental independent variables that control K (Yilmaz et al. 2012; Wang et al. 2017). In addition, the sorting of the grains plays a vital role in K estimation. The considerable spatial variability of K implies that large porous media samples should be tested to assess the hydraulic characteristics (More & Deka 2018). Because of the multiple pore structures and the non-uniform configuration of porous media particles, numerous investigators have attempted to estimate the K of porous media (Zieba 2017; Jougnot et al. 2021).
Various direct and indirect methods can be employed to determine the K of porous media. Direct methods comprise field and experimental techniques, including the Guelph permeameter, pumping tests, tension and ring infiltrometers, and falling- and constant-head permeameter tests (Pucko & Verbovsek 2015; Thakur et al. 2022). Estimating hydraulic conductivity in the field is difficult because knowledge of the aquifer and its hydraulic boundaries is limited. Field tests also require prolonged testing durations and large investments, making them less reliable (Chandel et al. 2022b). The experimental K measurement techniques, in turn, present difficulties in collecting representative porous media samples (Riha et al. 2018). Direct methods of measuring K are time consuming and less efficient because of spatial and temporal variations (Arshad et al. 2013). Consequently, indirect methods have gained prominence; they compute the K of porous media from easily measurable characteristics, i.e., grain size, porosity, percentage of gravel, sand, and silt, uniformity of porous media, and specific gravity (Akbulut 2005; More & Deka 2018). The indirect methods include empirical methods and data-driven techniques. The empirical methods correlate K with various grain-size parameters and predict the K value more rapidly than direct measurement (Odong 2007). However, empirical methods are restricted to their respective domains because they were developed under specified boundary conditions, and they may result in random errors in the hydraulic conductivity values (Chandel & Shankar 2022).
Data-driven techniques, frequently known as ‘machine learning methods’, have found use in several areas of water resources engineering during the past few decades. These techniques can accomplish the complex task of training, modelling, and validating the data points obtained from experimental work (Williams & Ojuri 2021). They are based either on supervised learning, in which specific input and output data points are provided, or on unsupervised learning, in which no specific input and output data points are designated. The supervised learning techniques include artificial neural networks (ANNs), support vector machines (SVMs), and the adaptive neuro-fuzzy inference system (ANFIS), while Kohonen self-organizing maps (KSOM) are an unsupervised learning technique (Sen et al. 2019). These techniques have been widely applied in several subfields of groundwater and irrigation engineering (Yilmaz et al. 2012; Kumar et al. 2020) because of their higher prediction efficiency compared to the traditional direct approaches.
Akbulut (2005) investigated the performance of the ANN approach in computing the K of 95 coarse porous media samples. The K values from the ANN were compared with those from multiple linear regression (MLR) and two empirical methods. The results showed that the ANN is a powerful prediction tool that outperforms the other approaches in predicting K values. Yilmaz et al. (2012) developed neural computing models using ANN and ANFIS to predict the K of coarser porous media. Because grain size correlates well with K, different grain sizes, i.e., d10, d30, and d60, were used to develop the models. The ANFIS model provided more reliable K values than the ANN model. Arshad et al. (2013) evaluated three data-driven techniques, i.e., MLR, ANFIS, and ANN, to predict the K of porous media. The K values predicted by the ANFIS model were better than those of the other two approaches. More & Deka (2018) predicted the K values of soil samples from measurable parameters, i.e., specific gravity, porosity, and percentage of gravel, sand, and silt, using ANN, fuzzy logic, and ANFIS. The statistical indicators showed that ANFIS outperformed the other two approaches. Naganna & Deka (2019) evaluated the ANN, ANFIS, and SVM techniques to estimate the K of porous media. The study concluded that the ANN and ANFIS models perform satisfactorily in computing the K values. Williams & Ojuri (2021) examined the performance of the feed-forward neural network (FFNN) and MLR in estimating the K value of porous media. The study employed six input variables for model development, and the statistical indicators revealed that the FFNN outperforms MLR in predicting K values. Despite the strong prediction capability of the supervised algorithms, missing values and outliers in the data can reduce their prediction efficacy (Kumar et al. 2020).
On the other hand, the unsupervised learning-based KSOM converts high-dimensional data into a small grid map by clustering (Kohonen et al. 1996) and reveals the intrinsic correlations among the parameters. Owing to this clustering, missing values can be effectively replaced using the map characteristics, so they do not hinder model prediction (Kumar et al. 2020). Because of its adaptability, the KSOM has been utilized in water quality and evapotranspiration modelling (Rustum & Adeloye 2007; Adeloye et al. 2011) and in irrigation and soil moisture studies (Ohana-Levi et al. 2019; Kumar et al. 2020). A review of the scientific literature indicates that the KSOM has not previously been used to estimate the hydraulic conductivity of porous media. Apart from KSOM, the FFNN has been utilized by a few investigators to estimate the K value based on grain-size characteristics. Grain size, uniformity coefficient, and porosity are the key factors influencing K and are hence considered as independent variables for model development. The study evaluates the performance of the supervised (FFNN) and unsupervised (KSOM) learning techniques and the MLR technique in estimating the K value. The objectives of this study are as follows:
1. To develop and validate the FFNN, KSOM, and MLR models to estimate the K of porous media based on easily measurable grain-size characteristics.
2. To compare the prediction performance of the FFNN, KSOM, and MLR models using different statistical indicators.
MATERIALS AND METHODOLOGY
Materials and experimental methodology
Soil samples and grain-size analysis
In the present study, 165 representative soil samples were collected from the Beas riverbank of Kangra district (31°43′N and 75°32′E) in Himachal Pradesh, India, to carry out experimental investigations. The soil samples were obtained using a thin-walled sampler with a penetrating length and cross-sectional diameter of 100 and 8.5 cm, respectively. Located in the northwest Himalayas, the study area has an elevation of 3,200 m above mean sea level. The collected soil samples were composed of fine gravel (4.75–10 mm), coarse, medium, and fine sand (0.075–4.75 mm), and a small proportion of silt (<0.075 mm). Initially, the soil samples were subjected to grain-size analysis following ASTM (2007) to determine the grain size at 10%, 30%, 50%, and 60% finer by weight (d10, d30, d50, and d60) and the uniformity coefficient. The pycnometer method was used to determine the specific gravity of the soil samples, which is required for the porosity determination.
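As an illustration of how the grain-size characteristics above could be derived from a sieve analysis, the following minimal sketch interpolates d10, d50, and d60 from a grain-size curve and then computes the uniformity coefficient with the standard definition U = d60/d10. The sieve data, the log-linear interpolation, the dry density, and the specific gravity are hypothetical assumptions for illustration and are not taken from the study; the porosity relation n = 1 − ρd/(Gs·ρw) is the usual phase relation and is assumed here to be the one implied by the pycnometer-based procedure.

```python
import math

def grain_size_at(percent_finer, sieve_mm, passing_pct):
    """Interpolate the grain size (mm) at a given percent finer by weight.

    Interpolation is linear in log10(grain size), a common convention for
    grain-size curves (an assumption, not stated in the study).
    """
    points = sorted(zip(passing_pct, sieve_mm))
    for (p_lo, d_lo), (p_hi, d_hi) in zip(points, points[1:]):
        if p_lo <= percent_finer <= p_hi:
            if p_hi == p_lo:
                return d_lo
            frac = (percent_finer - p_lo) / (p_hi - p_lo)
            log_d = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_d
    raise ValueError("percent_finer lies outside the measured curve")

def porosity(dry_density, specific_gravity, water_density=1.0):
    """Porosity from dry density (g/cm3) and specific gravity of solids."""
    return 1.0 - dry_density / (specific_gravity * water_density)

# Hypothetical sieve analysis of one sample (opening in mm, % passing).
sieve_mm = [0.075, 0.15, 0.30, 0.60, 1.18, 2.36, 4.75]
passing_pct = [4.0, 9.0, 28.0, 55.0, 78.0, 92.0, 100.0]

d10 = grain_size_at(10, sieve_mm, passing_pct)
d50 = grain_size_at(50, sieve_mm, passing_pct)
d60 = grain_size_at(60, sieve_mm, passing_pct)
U = d60 / d10  # uniformity coefficient (dimensionless)

# Hypothetical packing: dry density 1.70 g/cm3, specific gravity 2.65.
n = porosity(dry_density=1.70, specific_gravity=2.65)

print(f"d10={d10:.3f} mm, d50={d50:.3f} mm, d60={d60:.3f} mm, U={U:.2f}, n={n:.3f}")
```

Because porosity depends on the packed dry density, the same sample can yield different n values in different permeameters while d10, d50, and U stay fixed.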
Direct determination of hydraulic conductivity
The dataset obtained from the experimental work was utilized to develop the FFNN, KSOM, and MLR models. The grain size (d10 and d50), uniformity coefficient (U), and porosity of the packed bed (n) were used as input variables to predict the K value obtained from three permeameters. For a given soil sample, the d10, d50, and U values are the same, whereas the porosity varies with the permeameter diameter, which corresponds to the different K values obtained from the 5.08, 10.16, and 15.24 cm permeameters. A total of 495 observations were obtained from the experimental work for the 165 soil samples. The data corresponding to 115 soil samples (70%) and 50 soil samples (30%) were used for model development and validation, respectively. The modelling details of the data-driven techniques used in the study are discussed in the following sections.
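The bookkeeping behind this split can be sketched as follows. The sketch assumes that the split is made by soil sample, so that the three permeameter observations of one sample stay in the same subset; the random shuffling and the seed are hypothetical, since the text does not state how the 115 and 50 samples were chosen.

```python
import random

N_SAMPLES = 165
PERMEAMETER_DIAMETERS_CM = (5.08, 10.16, 15.24)

def split_by_sample(n_samples, train_fraction=0.70, seed=0):
    """Split sample IDs so that all observations of a sample stay together.

    The random assignment is an assumption; the study only reports the
    resulting subset sizes.
    """
    ids = list(range(1, n_samples + 1))
    random.Random(seed).shuffle(ids)
    n_train = int(n_samples * train_fraction)  # 165 * 0.70 -> 115 samples
    return sorted(ids[:n_train]), sorted(ids[n_train:])

def observations_for(sample_ids):
    """One observation per sample and permeameter diameter."""
    return [(s, d) for s in sample_ids for d in PERMEAMETER_DIAMETERS_CM]

train_ids, val_ids = split_by_sample(N_SAMPLES)
train_obs = observations_for(train_ids)
val_obs = observations_for(val_ids)

print(len(train_ids), len(val_ids))  # samples per subset
print(len(train_obs), len(val_obs))  # observations per subset
```

Splitting by sample rather than by observation avoids placing nearly identical inputs (same d10, d50, and U) in both the development and validation subsets.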
Feed-forward neural network
Kohonen self-organizing maps
Multiple linear regression
Apart from FFNN and KSOM, the MLR technique was also used to develop a model using easily measurable grain-size characteristics. MLR and its application have been explained in detail by Williams & Ojuri (2021) and Elbisy (2015) and are therefore not repeated here. The data analysis toolbox in Microsoft Excel was used to develop the MLR model. A regression equation was generated using the training dataset, which comprises d10, d50, U, and n as the independent variables and K as the dependent variable. The performance of the developed equation was evaluated using the validation dataset.
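For readers who prefer a scripted equivalent of the spreadsheet regression, the sketch below fits an equation of the form K = b0 + b1·d10 + b2·d50 + b3·U + b4·n by ordinary least squares, solving the normal equations with Gauss-Jordan elimination. It is a minimal illustration only: the function names are hypothetical, the data are synthetic values generated from made-up coefficients, and none of the numbers correspond to the regression equation of the study.

```python
import random

def fit_mlr(X, y):
    """Ordinary least squares; returns [b0, b1, ..., bp] (b0 = intercept)."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented normal equations: (A^T A | A^T y)
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]

def predict_mlr(coeffs, x):
    """Evaluate the fitted regression equation for one input vector."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))

# Synthetic, noise-free data from made-up coefficients (NOT the study's).
TRUE_COEFFS = [0.02, 0.5, 0.03, -0.004, 0.1]
rng = random.Random(1)
X = [
    [rng.uniform(0.10, 0.41), rng.uniform(0.28, 1.50),
     rng.uniform(2.15, 9.57), rng.uniform(0.29, 0.43)]
    for _ in range(40)
]
y = [predict_mlr(TRUE_COEFFS, x) for x in X]

coeffs = fit_mlr(X, y)
print([round(c, 4) for c in coeffs])
```

On real measurements the fit would be made on the development subset only, and the resulting equation would then be applied unchanged to the validation subset.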
Statistical analysis
RESULTS AND DISCUSSION
The study utilizes the easily measurable grain-size characteristics and experimental observations to develop FFNN, KSOM, and MLR models. The first section of the study includes the statistical analysis and dataset used for model development and validation, whereas the second section comprises the K estimation using different models and their quantitative evaluation via statistical performance indicators.
Statistical analysis
Initially, grain-size analysis was conducted on the collected soil samples to determine the grain-size characteristics, i.e., d10, d30, d50, d60, and U. In addition, the gravel, sand, and silt content percentages and the n and K values were determined from the experimental investigations. The statistical summary, i.e., the minimum, maximum, mean, and standard deviation of the experimental observations, is given in Table 1. The n and K values for the collected soil samples vary from 0.286 to 0.426 and from 0.010 to 0.342 cm/s, respectively.
Characteristics | Minimum | Maximum | Mean | Standard deviation |
---|---|---|---|---|
d10 (mm) | 0.100 | 0.409 | 0.211 | 0.089 |
d30 (mm) | 0.193 | 0.821 | 0.393 | 0.154 |
d50 (mm) | 0.280 | 1.496 | 0.667 | 0.308 |
d60 (mm) | 0.372 | 2.101 | 0.936 | 0.474 |
U^a | 2.150 | 9.567 | 4.493 | 1.497 |
n^a | 0.286 | 0.426 | 0.362 | 0.029 |
Gravel (%) | 2.200 | 31.240 | 7.050 | 4.430 |
Sand (%) | 66.780 | 95.460 | 88.880 | 4.450 |
Silt (%) | 1.130 | 7.340 | 4.070 | 1.430 |
K (cm/s) | 0.010 | 0.342 | 0.069 | 0.061 |
^a Dimensionless characteristic.
Further, a correlation matrix was examined to check the correlation between the independent parameters (d10, d30, d50, d60, U, and n) and the dependent parameter (K). The grain-size characteristics d10 and d50 show a strong correlation with the K value (0.94 and 0.80), whereas d30 and d60 show a poor correlation (0.40 and 0.34), as presented in Table 2. The uniformity coefficient and porosity show a relatively good negative and positive correlation (−0.58 and 0.60) with the K value, respectively. Therefore, based on the correlation values, the parameters d10, d50, U, and n are considered as the input variables for model development.
Parameter | d10 | d30 | d50 | d60 | U | n | K |
---|---|---|---|---|---|---|---|
d10 | 1.00 | | | | | | |
d30 | 0.93 | 1.00 | | | | | |
d50 | 0.89 | 0.96 | 1.00 | | | | |
d60 | 0.82 | 0.91 | 0.96 | 1.00 | | | |
U | −0.08 | 0.15 | 0.31 | 0.47 | 1.00 | | |
n | 0.02 | −0.21 | −0.37 | −0.50 | −0.96 | 1.00 | |
K | 0.94* | 0.40 | 0.80* | 0.34 | −0.58* | 0.60* | 1.00 |
*The statistically significant correlation (p < 0.05).
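The screening step described above can be sketched in a few lines. The function `pearson_r` computes the usual Pearson correlation coefficient, and the selection is then reproduced directly from the K row of Table 2. The cut-off of 0.5 on the absolute correlation is a hypothetical stand-in for illustration; the study itself reports statistical significance (p < 0.05) rather than a numerical threshold.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_inputs(correlation_with_k, threshold):
    """Keep the parameters whose |r| with K reaches the threshold."""
    return [name for name, r in correlation_with_k.items() if abs(r) >= threshold]

# Correlations with K as reported in Table 2.
reported_r = {"d10": 0.94, "d30": 0.40, "d50": 0.80,
              "d60": 0.34, "U": -0.58, "n": 0.60}

# Hypothetical cut-off; the study marks significance at p < 0.05 instead.
selected = select_inputs(reported_r, threshold=0.5)
print(selected)

# pearson_r on toy data, to show how one matrix entry would be obtained.
print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]))
```

Using the absolute value keeps negatively correlated inputs such as U, which carry as much information as positively correlated ones.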
In this study, the experimental observations from 165 soil samples were compiled into a dataset for model development and validation. The data corresponding to 115 soil samples (70%) and 50 soil samples (30%) were utilized for model development and validation, respectively. Table 3 presents the statistical summary of the input (d10, d50, U, and n) and output (K) parameters used for model development and validation.
Parameter | Development minimum | Development maximum | Development mean | Development standard deviation | Validation minimum | Validation maximum | Validation mean | Validation standard deviation |
---|---|---|---|---|---|---|---|---|
d10 (mm) | 0.100 | 0.409 | 0.203 | 0.090 | 0.115 | 0.395 | 0.228 | 0.084 |
d50 (mm) | 0.280 | 1.450 | 0.637 | 0.298 | 0.390 | 1.496 | 0.738 | 0.320 |
U^a | 2.150 | 9.567 | 4.466 | 1.544 | 2.498 | 8.323 | 4.554 | 1.387 |
n^a | 0.286 | 0.426 | 0.363 | 0.029 | 0.297 | 0.415 | 0.360 | 0.028 |
K (cm/s) | 0.010 | 0.342 | 0.067 | 0.065 | 0.011 | 0.221 | 0.072 | 0.052 |
^a Dimensionless parameter.
FFNN model for K estimation
The data points corresponding to the 115 soil samples were utilized to train the FFNN model. A trial-and-error procedure over various neuron configurations in the hidden layer was performed to identify the best-performing FFNN model. Based on this analysis, the best-performing FFNN model consists of four input variables (d10, d50, U, and n), one hidden layer with five neurons, and one output variable (K). The efficacy of the FFNN model in predicting K was evaluated using statistical indicators and a scatter plot. The statistical indicators computed for the FFNN model are presented in Table 4.
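To make the 4-5-1 architecture concrete, the sketch below builds a network of that shape and runs one forward pass. Only the layer sizes and the scaling bounds (the development-set minima and maxima of Table 3) come from the text; the tanh hidden activation, the linear output, the min-max scaling, and the random untrained weights are assumptions for illustration, so the printed output is not a K prediction.

```python
import math
import random

N_INPUTS, N_HIDDEN = 4, 5  # d10, d50, U, n -> five hidden neurons -> K

# Development-set minima and maxima from Table 3 (d10, d50, U, n).
LOWS = [0.100, 0.280, 2.150, 0.286]
HIGHS = [0.409, 1.450, 9.567, 0.426]

def init_network(seed=0):
    """Random, untrained weights (illustrative only)."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
               for _ in range(N_HIDDEN)],
        "b1": [0.0] * N_HIDDEN,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)],
        "b2": 0.0,
    }

def min_max_scale(x, lows, highs):
    """Scale each input to [0, 1]; the scaling scheme is an assumption."""
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(x, lows, highs)]

def forward(net, x):
    """One forward pass: tanh hidden layer, linear output (assumed)."""
    hidden = [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(net["W1"], net["b1"])
    ]
    return sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"]

net = init_network()
n_params = (sum(len(row) for row in net["W1"]) + len(net["b1"])
            + len(net["W2"]) + 1)

# Development-set means from Table 3 as an example input.
x_scaled = min_max_scale([0.203, 0.637, 4.466, 0.363], LOWS, HIGHS)
output = forward(net, x_scaled)
print(n_params, output)
```

A network of this size has 31 adjustable parameters (20 + 5 hidden-layer weights and biases, 5 + 1 output-layer weights and bias), which is small relative to the 345 development observations.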
Statistical indicator | FFNN development | FFNN validation | KSOM development | KSOM validation | MLR development | MLR validation |
---|---|---|---|---|---|---|
RMSE | 0.012 | 0.016 | 0.018 | 0.024 | 0.022 | 0.024 |
R2 | 0.964 | 0.943 | 0.937 | 0.909 | 0.891 | 0.868 |
MAE | 0.006 | 0.007 | 0.010 | 0.012 | 0.013 | 0.017 |
MBE | 0.000 | 0.006 | −0.002 | −0.004 | −0.003 | 0.013 |
SI | 0.178 | 0.229 | 0.266 | 0.276 | 0.311 | 0.331 |
AI | 0.980 | 0.977 | 0.975 | 0.963 | 0.969 | 0.950 |
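The six indicators in Table 4 can be computed as in the sketch below. Their formulas are not reproduced in the text here, so the sketch assumes common definitions: MBE as the mean of predicted minus measured values, R2 as the squared Pearson correlation, SI (scatter index) as RMSE divided by the mean measured value, and AI (agreement index) as Willmott's index of agreement. The SI assumption is at least consistent with the tables: the FFNN development RMSE of 0.012 divided by the mean development K of 0.067 gives about 0.18, close to the tabulated 0.178. The measured and predicted values used below are made-up numbers.

```python
import math

def indicators(measured, predicted):
    """RMSE, R2, MAE, MBE, SI, and AI under the assumed definitions."""
    n = len(measured)
    mean_o = sum(measured) / n
    mean_p = sum(predicted) / n
    errors = [p - o for o, p in zip(measured, predicted)]  # predicted - measured

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mbe = sum(errors) / n  # negative when the model under-predicts on average

    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(measured, predicted))
    var_o = sum((o - mean_o) ** 2 for o in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov ** 2 / (var_o * var_p)  # squared Pearson correlation

    si = rmse / mean_o  # scatter index
    ai = 1.0 - sum(e * e for e in errors) / sum(
        (abs(p - mean_o) + abs(o - mean_o)) ** 2
        for o, p in zip(measured, predicted)
    )  # Willmott's index of agreement
    return {"RMSE": rmse, "R2": r2, "MAE": mae, "MBE": mbe, "SI": si, "AI": ai}

# Made-up K values in cm/s, for illustration only.
measured = [0.02, 0.05, 0.10, 0.20]
predicted = [0.03, 0.04, 0.10, 0.19]

result = indicators(measured, predicted)
perfect = indicators(measured, measured)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With this sign convention a negative MBE means that the model predicts lower K values than measured on average, which matches the interpretation given for the KSOM model.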
KSOM model for K estimation
The component plane for each parameter is composed of 96 hexagonal units, whose values are indicated by the adjacent colour scale. High, medium, and low values on a component plane are indicated by yellow, light blue, and dark blue colours, respectively. The component planes therefore make it easier to visually identify the regions where a parameter is low, medium, or high. The parameters d10, d50, U, n, and K on the component planes range over 0.107–0.362 mm, 0.324–1.30 mm, 2.62–8.43, 0.302–0.403, and 0.015–0.202 cm/s, respectively.
Visual examination of Figure 8 reveals that the colour gradients of the d10 and d50 component planes parallel that of the K component plane, i.e., low d10 and d50 values are correlated with low K values and vice versa. The component planes of the U and n parameters are inversely correlated at the top left and bottom right sides, whereas their colour gradients are similar mainly in the central region. Some specific observations can be drawn from the component planes of U, n, and K. On the top left side, low K values are correlated with low values of n and high values of U. In contrast, on the top right side, higher K values are correlated with medium values of n and U. Similar observations can be drawn from other regions of the component planes. Part-wise assessment of the component planes thus shows varying correlations between the parameters; however, a general assessment cannot be made.
Based on the scatter plot, the KSOM model predicted K reasonably well, with R2 values of 0.94 and 0.91 during model development and validation, respectively. During validation, the KSOM model performed better for lower K values (<0.06 cm/s) and merely satisfactorily for higher K values (>0.06 cm/s), as shown in Figure 7(b). The slope of the regression line during model development and validation was not significantly different (p > 0.05) from the 1:1 line, suggesting a low bias in the KSOM model's K predictions. This is further supported by the small negative MBE values in Table 4. The negative MBE arises because the K values predicted by the KSOM model are lower, on average, than the measured values.
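One common way to obtain predictions from a self-organizing map is sketched below: the map is trained on five-dimensional vectors (d10, d50, U, n, K), and K for a new sample is read from the codebook vector of the best-matching unit found using the four input components only. This is a minimal sketch under stated assumptions rather than the procedure of the study: the 8 × 12 layout of the 96 units, the rectangular (not hexagonal) neighbourhood, the learning-rate and radius schedules, and the synthetic pre-scaled data are all hypothetical.

```python
import math
import random

ROWS, COLS = 8, 12  # 96 units; the 8 x 12 layout is an assumption

def best_matching_unit(codebook, x, dims):
    """Index of the unit closest to x, comparing only the given dimensions."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((codebook[i][k] - x[k]) ** 2 for k in dims),
    )

def train_som(data, epochs=10, seed=0):
    """Online SOM training with a Gaussian neighbourhood on a rectangular grid."""
    rng = random.Random(seed)
    units = [(r, c) for r in range(ROWS) for c in range(COLS)]
    codebook = [list(rng.choice(data)) for _ in units]  # initialise from data
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            frac = step / total
            lr = 0.5 * (1.0 - frac)              # decaying learning rate
            radius = 1.0 + 5.0 * (1.0 - frac)    # shrinking neighbourhood
            bmu_r, bmu_c = units[best_matching_unit(codebook, x, range(len(x)))]
            for idx, (r, c) in enumerate(units):
                dist2 = (r - bmu_r) ** 2 + (c - bmu_c) ** 2
                h = math.exp(-dist2 / (2.0 * radius ** 2))
                w = codebook[idx]
                for k in range(len(x)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return codebook

def predict_k(codebook, inputs):
    """Find the BMU from the inputs alone and read K (last component) off it."""
    bmu = best_matching_unit(codebook, inputs, range(len(inputs)))
    return codebook[bmu][len(inputs)]

# Synthetic vectors already scaled to [0, 1]; the K relation is made up.
rng = random.Random(42)
data = []
for _ in range(60):
    x = [rng.random() for _ in range(4)]
    data.append(x + [0.7 * x[0] + 0.3 * x[1]])

codebook = train_som(data)
k_pred = predict_k(codebook, [0.5, 0.5, 0.5, 0.5])
print(len(codebook), round(k_pred, 3))
```

The same best-matching-unit lookup over the available components is how a trained map can supply a replacement for a missing value in an incomplete record.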
MLR model for K estimation
Comparison of FFNN, KSOM, and MLR models
Further, the statistical performance indicators (Table 4) show that the FFNN model estimates the K of soil samples better than the KSOM and MLR models. During validation, the R2 and AI values are 0.94 and 0.97 for the FFNN model, 0.91 and 0.96 for the KSOM model, and 0.87 and 0.95 for the MLR model. For the FFNN model, the R2 and AI values are closer to 1, indicating a better correlation with the measured values and signifying that the independent variables in the model effectively capture the variation in the dependent variable. The lower MAE, MBE, RMSE, and SI values for the FFNN model, i.e., 0.007, 0.006, 0.016, and 0.229, indicate less deviation from the measured values compared to the KSOM and MLR models during validation. This further substantiates the efficacy of the FFNN model in estimating the K of soil samples. The prediction efficacy of the FFNN, KSOM, and MLR models is influenced by the information they learn during training. Notably, prediction efficacy improves when more information within a given input range is provided. In the present study, K values ranging from 0.01 to 0.07 cm/s appeared more frequently in the training dataset, so the developed models performed better in this lower range than for higher K values during validation.
CONCLUSIONS
The present study examined the efficacy of FFNN, KSOM, and MLR models in predicting the hydraulic conductivity of porous media based on easily measurable soil characteristics. The correlation analysis between the dependent and independent parameters indicates that the grain size (d10 and d50), uniformity coefficient, and porosity are the main factors influencing K. The K values predicted using the FFNN, KSOM, and MLR models were compared with the measured values via different statistical indicators and scatter plots. The FFNN model performs best in estimating the K of soil samples during both model development and validation. In contrast, the KSOM and MLR models perform reasonably well in estimating the K value during model development, but their performance during validation is merely adequate. The statistical indicators R2 and AI for the FFNN model are 0.94 and 0.97, respectively, and the lower values of MAE, MBE, RMSE, and SI (0.007, 0.006, 0.016, and 0.229, respectively) compared to the KSOM and MLR models during validation substantiate the efficacy of the FFNN model in estimating the K value. The comparison of the different models demonstrates that neural computing techniques can reduce uncertainties and provide new approaches that eliminate correlation inconsistency. The study encourages future research into these techniques for estimating K for more porous media samples with different independent parameters.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.