ABSTRACT
Infiltration is crucial in the hydrological cycle, serving as the primary process that increases soil moisture. This study investigates soil infiltration rate (IR) prediction using various techniques, including GMDH, Gaussian Process, SVM, ANN, and MARS. A total of 190 field observations were collected from the Alashtar sub-watersheds in Lorestan, Iran. Of these, 70% were used for model preparation and 30% for validation. The input variables for the study are Time, Sand, Clay, Silt, pH, Electrical Conductivity, Moisture Content, Soil Bulk Density, Porosity, Calcium Carbonate, Phosphorus, Organic Carbon, Organic Matter, Nitrogen, and Temperature, while IR is the output variable. The results indicate that the ANN has a higher accuracy, with correlation coefficient values of 0.9366 and 0.8624, mean absolute error values of 0.0607 and 0.1000, Nash–Sutcliffe model efficiency values of 0.8732 and 0.7350, scattering index values of 0.3108 and 0.5003, and Legates and McCabe's Index values of 0.6585 and 0.5654 for the training and testing datasets, respectively. A sensitivity analysis highlighted that time is the parameter that most influences the estimation of the IR. The study underscores the precision of ANN in predicting soil infiltration rates and the need for AI-based models in hydrological modeling to improve accuracy and reliability in IR prediction.
HIGHLIGHTS
The infiltration rate (IR) of the soil is estimated using the group method of data handling, Gaussian process, support vector machine, artificial neural networks (ANNs), and multivariate adaptive regression splines.
Inter-comparison of different AI-based models revealed that ANN is the most efficient model.
Sensitivity analysis suggested that time is the most influential parameter on estimating the IR.
INTRODUCTION
Infiltration is crucial in the hydrological cycle, serving as the primary process that increases soil moisture. It occurs when water enters and moves within the soil's layers. The difference in energy levels propels the movement of water through this porous material. This process's key factors include gravitational pull, capillary action, and water's tendency to adhere to surfaces. The infiltration rate (IR), a vital concept in this context, measures the speed at which water penetrates the soil from the surface, reflecting the soil's ability to take in water over a given time frame. Infiltration, a key element impacting soil's water retention capacity, is crucial for various professionals, including hydrologists, irrigation and agricultural engineers, and soil scientists. It aids in determining multiple aspects, such as soil moisture levels, runoff dynamics, and the movement of sediments and solutes.
IR is a crucial process in the hydrological cycle, regulating water movement and impacting soil moisture, plant growth, and groundwater recharge. It regulates surface runoff, erosion, and soil moisture content, ensuring precipitation is not lost to runoff (Seiler & Gat 2007). It also affects soil properties and water retention, with balanced soils maintaining water movement and retention. It also influences evapotranspiration and climate feedback, with regions with adequate infiltration supporting a balanced cycle (Farmer et al. 2003). Infiltration also has environmental implications, regulating ecosystem health and supporting biodiversity. Proper land management practices, such as no-till farming and cover cropping, are essential for maintaining soil health and moisture retention (Reicosky 2020).
IR is essential for estimating artificial groundwater recharge, designing irrigation and drainage systems, and water balance models (Parhi et al. 2007; Ma & Shao 2008; Bayabil et al. 2019). Accurate measurement of the IR is vital for developing effective hydrological models (Shirmohammadi & Skaggs 1984). However, this quantification faces challenges due to the time and space variability of soil hydraulic properties. Such variability often stems from changes in land use and soil characteristics (Muñoz et al. 2017). Research indicates that land-use patterns significantly impact infiltration, influenced by factors such as soil management strategies, tillage practices, and vegetation types (Brown et al. 2005; García-Ruiz et al. 2008; Wang et al. 2016). Additionally, soil properties such as texture, structure, porosity, hydraulic conductivity, existing moisture conditions, suction head, temperature, humidity, rainfall intensity, and water quality also play a significant role in determining IR (Liu et al. 2011). Various physical soil properties influence the infiltration characteristics. Among these are soil texture, moisture content, and density, which significantly affect infiltration (Angelaki et al. 2013). Soil texture plays a pivotal role in influencing infiltration. The soil's ability to hold water, crucial for water accessibility, is determined by its texture and structure (Al-Azawi 1985). IRs are typically higher in unsaturated soils and decrease over time, eventually stabilizing at a constant rate. Infiltration characteristics show significant variation due to differences in soil texture, type, and various soil conditions. Conducting experimental infiltration measurements is complex and challenging, often described as labor-intensive, tedious, and time-consuming (Vand et al. 2018). The assessment of the infiltration process is complicated by spatial and temporal variations, making it a complex field of study (Pandey & Pandey 2018).
Furthermore, because of their significant dependence on soil physical properties, variations in IR can be attributed to the various parameters of infiltration models. The accuracy of IR determination can be enhanced by quantifying the spatial variability of soil properties.
Numerous studies have suggested using conventional infiltration models as alternatives to experimental observations (Mishra et al. 2003; Singh et al. 2018). However, employing any specific model requires a thorough understanding of its boundary conditions and assumptions. Soil water researchers have introduced various models like Kostiakov, Horton, Philip, Holtan, Green-Ampt, and Modified Kostiakov for estimating infiltration (Richards 1931; Philips 1957; Mishra et al. 2003; Sihag et al. 2017). Mishra et al. (2003) categorized these models into physical, semi-empirical, and empirical. Most of these models are based on assumptions such as homogeneous water absorption, constant ponding head, and steady IR, which are rarely observed in actual field conditions, potentially leading to inaccurate predictions.
Quantification of IR is complex because of the variability in soil hydraulic properties. These properties vary significantly over time and space, leading to spatial heterogeneity and temporal variability. Soil texture, land use, and microtopography contribute to the spatial variations, while temporal variability is driven by changes in soil moisture, precipitation events, temperature, and biological activity. Surface sealing and crusting can also affect IRs. Measurement limitations include small-scale measurements, disturbance of soil structure, and temporal inconsistency. Vegetation and land cover also influence IRs, but their type and density can vary widely over time and space. AI-based models face their own limitations in addressing these interactions, including data requirements, overfitting, and parameter sensitivity. Addressing these challenges requires robust data collection methods and advanced modeling techniques, such as AI-based models.
The use of AI-based models in civil engineering and water resources engineering problems is increasing day by day. Various researchers used AI-based models to solve their complex problems (Singh et al. 2017; Arora et al. 2019; Sihag et al. 2019; Pandhiani et al. 2020; Singh 2020; Aradhana et al. 2021; Bhoria et al. 2021; Sihag et al. 2021; Sepahvand et al. 2021a, b; Singh et al. 2021a, b; Singh et al. 2022; 2023; Nivesh et al. 2022; Sihag et al. 2022; Arora et al. 2024; Singh & Minocha 2024a, b, c). Some researchers have also employed AI-based models to estimate the infiltration process, focusing on soil properties. These AI-based models have shown high precision in infiltration prediction, as evidenced in studies by Singh et al. (2017), Sihag et al. (2017), which demonstrate that soil physical properties and elapsed time can be effectively used to estimate the infiltration process with greater accuracy. Therefore, in this study, AI-based models are employed to enhance the precision and reliability of the models. These include the group method of data handling (GMDH), support vector machine (SVM), Gaussian process (GP), multivariate adaptive regression splines (MARS), and artificial neural network (ANN). The study aims to develop empirical models for accurately estimating the IR, thereby contributing to enhanced water management and environmental modeling. It includes providing insights and tools that can be used for effective water resource management and addressing ecological challenges in the Alashtar sub-watersheds and similar regions.
Research significance
The study conducted in the Alashtar sub-watersheds in Lorestan, Iran, uniquely contributes to hydrological science and environmental management by addressing critical challenges of climate change and population growth. The finding that the ANN model excels in predicting IRs in a region with a complex geological and varied climatic profile is of paramount importance. This model's superiority, supported by advanced statistical evaluations, establishes ANN as a vital tool for precise hydrological analysis in similar environments. The practical implications of this research are extensive, benefiting sectors like agriculture, water resource management, and environmental conservation, where accurate infiltration data is crucial. Methodologically, the study sets a new benchmark in hydrological research by employing advanced statistical tools and graphical models for model comparison, enhancing the clarity and reliability of findings and serving as a guide for future research. Additionally, the sensitivity analysis provides insights into the influence of time on IRs and offers valuable information for optimizing resource use in environmental planning. Overall, this research marks a significant stride in advancing hydrological science, offering a more accurate and reliable methodology for managing ecological challenges in regions undergoing similar changes.
RESEARCH METHODOLOGY
In the study at the Alashtar sub-watersheds, Lorestan, Iran, the methodology included collecting 190 field observations to develop empirical models for estimating the soil IR. It involved integrating soil characteristics such as sand, clay, and pH. AI-based models, namely GP, SVM, GMDH, MARS, and ANN, were used to estimate the IR. The models' effectiveness was assessed using metrics such as the correlation coefficient (CC) and mean absolute error, offering insights for hydrological science and environmental management under climate change and population growth.
Gaussian process
Support vector machine
Group method of data handling
In the mid-1960s, Ivakhnenko, a notable Russian mathematician and cyberneticist, introduced a groundbreaking method for modeling complex systems without the necessity of understanding their internal mechanisms. This method, recognized as the GMDH, described by Ziari et al. (2016), focuses on creating self-organizing models, specifically high-order polynomials based on input variables. This model has been widely applied in various domains, including prediction, classification, and control composition. The GMDH algorithm stands out due to its inductive strategy for modeling multi-parametric datasets mathematically. Its key characteristic is the complete automation of structural and parametric optimization of models. This aspect makes GMDH particularly valuable in data mining, knowledge discovery, modeling of complex systems, optimization, and pattern recognition.
A key feature of GMDH algorithms is their systematic evaluation of progressively intricate polynomial models in an inductive manner. The selection of the best model is based on an external criterion. Moreover, using the Volterra functional series, these algorithms can approximate the relationship between input and output variables. As noted by Anastasakis & Mort (2001), the discrete equivalent of this series is the Kolmogorov–Gabor polynomial. This attribute enhances the GMDH's adaptability for modeling intricate relationships across diverse scientific and engineering disciplines.
Within GMDH methodologies, simple partial models are standard, typically encompassing functions up to the second degree. Such inductive approaches are frequently called polynomial neural networks (Madala 2019). The development of GMDH revealed a notable correlation between the challenges of modeling noisy data and the concept of signal transmission through a noisy channel. This understanding paved the way for the establishment of a theory of noise-immune modeling.
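For reference, the Kolmogorov–Gabor polynomial and a second-degree partial model of the kind described above are commonly written as shown below. The symbols ($m$ inputs $x_i$, output $y$, coefficients $a$) are notation introduced here for illustration, and the ordering of the six terms in the partial model is an assumption rather than something stated in this section.

```latex
% Kolmogorov–Gabor polynomial (discrete analogue of the Volterra series)
y = a_0 + \sum_{i=1}^{m} a_i x_i
        + \sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij}\, x_i x_j
        + \sum_{i=1}^{m}\sum_{j=1}^{m}\sum_{k=1}^{m} a_{ijk}\, x_i x_j x_k + \cdots

% Second-degree partial model of one GMDH neuron with two inputs x_i, x_j
% (term order assumed)
\hat{y} = a_0 + a_1 x_i + a_2 x_j + a_3 x_i^2 + a_4 x_j^2 + a_5 x_i x_j
```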
A fundamental concept of this theory is that the complexity of an optimal predictive model should be matched to the amount of uncertainty inherent in the data. Essentially, this means that greater uncertainty in the data, often caused by noise, calls for a simpler optimal model characterized by fewer estimated parameters. Consequently, GMDH theory evolved as an inductive approach that automatically adjusts the model's complexity to suit the noise variance in fuzzy data. This adaptive feature of GMDH, aligning model complexity with data uncertainty, led to its recognition as one of the pioneering information technologies for extracting knowledge from experimental data. This conceptualization and its practical application were notably discussed by Ivakhnenko & Stepashko (1985), who highlighted the significance of GMDH in data analysis and modeling.
Multivariate adaptive regression splines
MARS, a non-parametric regression method developed by Friedman (1991), expands upon conventional linear models by automatically integrating non-linear impacts and interactions among variables. This method is particularly valued for its ability to articulate complex non-linear relationships between predictor variables and the response variable. A notable feature of the MARS model is its utilization of both forward and backward stepwise procedures. As de Andrés et al. (2011b) outlined, the forward stepwise approach in MARS is akin to selecting an appropriate set of input variables. This step incrementally builds the model by adding variables and their interactions that significantly improve performance.
In the MARS algorithm framework, v denotes the knot's location, while bn−(u, v) and bn+(u, v) refer to the paired spline basis functions (BFs) on either side of the knot. The algorithm operates in three distinct phases: first, a forward stepwise approach is employed to select spline BFs; second, a backward stepwise process iteratively removes BFs until an optimal set is identified; and third, a smoothing technique is applied to enhance the consistency of the final MARS model approximation. A generalized cross-validation (GCV) criterion is used to prioritize BFs for elimination based on their minimal contribution, as detailed by Dutta et al. (2018). The model produced by the forward pass fits the estimation data well, but its ability to predict new instances is limited. To refine its predictive capabilities, surplus BFs are methodically discarded in the backward stepwise pass. The inclusion of a BF in the model is guided by the GCV. The GCV value is calculated by inflating the mean squared residual error with a penalty that grows with the model's complexity (De Andrés et al. 2011a).
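The hinge functions and the GCV criterion described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function names are hypothetical, and the effective number of parameters is taken as the number of basis functions plus a penalty of 3 per knot, which is a common textbook convention assumed here.

```python
def hinge_pair(u, v):
    """Return the mirrored hinge functions max(0, u - v) and max(0, v - u) for knot v."""
    return max(0.0, u - v), max(0.0, v - u)


def gcv(observed, predicted, n_terms, n_knots, penalty=3.0):
    """Generalized cross-validation score: mean squared residual inflated by model complexity.

    n_terms : number of basis functions in the model (including the intercept)
    n_knots : number of knots selected in the forward pass
    penalty : cost charged per knot (assumed value, not taken from the source)
    """
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    effective_params = n_terms + penalty * n_knots
    if effective_params >= n:
        raise ValueError("model too complex for the number of observations")
    return mse / (1.0 - effective_params / n) ** 2


# Two models with identical residuals: the more complex one receives the worse score.
obs = [float(i) for i in range(20)]
pred = [o + (0.1 if i % 2 == 0 else -0.1) for i, o in enumerate(obs)]
gcv_simple = gcv(obs, pred, n_terms=2, n_knots=1)
gcv_complex = gcv(obs, pred, n_terms=4, n_knots=3)
```

During the backward pass, the basis function whose removal yields the lowest GCV would be dropped at each step.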
Artificial neural networks
METHODOLOGY
Parameters | Units | Training mean | Training standard deviation | Training min | Training max | Testing mean | Testing standard deviation | Testing min | Testing max |
---|---|---|---|---|---|---|---|---|---|
Time | min | 26.40 | 19.26 | 2.5 | 60 | 25.26 | 20.38 | 2.5 | 70 |
Sand | % | 48.03 | 12.81 | 26 | 69 | 46.27 | 12.11 | 26 | 68.15 |
Clay | % | 13.43 | 7.92 | 6.7 | 35.55 | 13.66 | 8.52 | 6.7 | 35.55 |
Silt | % | 38.48 | 8.87 | 24 | 54.33 | 40.00 | 7.70 | 25.15 | 54.33 |
pH | – | 7.97 | 0.11 | 7.74 | 8.14 | 7.98 | 0.09 | 7.74 | 8.14 |
EC | – | 305.40 | 74.92 | 176.5 | 501 | 311 | 75.60 | 176.5 | 501 |
Moisture content | % | 2.18 | 1.02 | 1.21 | 5.6 | 2.08 | 0.97 | 1.21 | 5.6 |
Soil bulk density | g/cm3 | 1.48 | 0.21 | 1.1 | 1.87 | 1.51 | 0.18 | 1.1 | 1.87 |
Porosity | % | 53.56 | 3.65 | 48 | 62 | 53.78 | 4.11 | 48 | 62 |
CaCO3 | % | 25.10 | 10.24 | 4 | 48.25 | 23.47 | 11.28 | 4 | 48.25 |
P | mg/kg | 1.10 | 0.62 | 0.55 | 3.06 | 1.07 | 0.54 | 0.55 | 3.06 |
OC | % | 1.09 | 0.65 | 0.11 | 2.76 | 0.98 | 0.58 | 0.12 | 2.10 |
OM | % | 1.88 | 1.12 | 0.20 | 4.77 | 1.69 | 1.00 | 0.20 | 3.63 |
N | % | 0.11 | 0.06 | 0.01 | 0.25 | 0.10 | 0.06 | 0.01 | 0.24 |
Temperature | °C | 31.48 | 0.50 | 31 | 32 | 31.43 | 0.50 | 31 | 32 |
IR | cm/min | 0.28 | 0.24 | 0.01 | 1.44 | 0.30 | 0.29 | 0.01 | 1.28 |
Study area
Model assessment
RESULT AND DISCUSSION
Evaluation of the GP model
Models | CC | MAE | RMSE | NSE | SI | LMI |
---|---|---|---|---|---|---|
Training stage | ||||||
GP–PUK | 0.9781 | 0.0189 | 0.0517 | 0.9567 | 0.1817 | 0.8938 |
GP–RBF | 0.9631 | 0.0357 | 0.0669 | 0.9274 | 0.2352 | 0.7991 |
SVM–PUK | 0.8846 | 0.0454 | 0.1174 | 0.7763 | 0.4128 | 0.7444 |
SVM–RBF | 0.8027 | 0.0597 | 0.1495 | 0.6368 | 0.5260 | 0.6638 |
Testing stage | ||||||
GP–PUK | 0.7620 | 0.1531 | 0.2130 | 0.4671 | 0.7094 | 0.3346 |
GP–RBF | 0.8273 | 0.1109 | 0.1789 | 0.6240 | 0.5959 | 0.5180 |
SVM–PUK | 0.7604 | 0.1257 | 0.1986 | 0.5366 | 0.6615 | 0.4537 |
SVM–RBF | 0.7851 | 0.1149 | 0.1838 | 0.6029 | 0.6124 | 0.5007 |
Evaluation of SVM
SVM, in its regression application, was implemented with an RBF kernel and a PUK kernel, alongside adjustable parameters such as ‘L,’ gamma (γ), precision to a certain number of decimal places, and O and S. Table 2 compares the SVM–PUK and SVM–RBF models across the training and testing stages. The SVM–PUK model fitted the training data more closely (CC of 0.8846 against 0.8027), but the SVM–RBF model predicted the IR more accurately on the testing data. For the training and testing stages, respectively, the SVM–RBF model recorded CC values of 0.8027 and 0.7851, MAE values of 0.0597 and 0.1149, RMSE values of 0.1495 and 0.1838, NSE values of 0.6368 and 0.6029, SI values of 0.5260 and 0.6124, and LMI values of 0.6638 and 0.5007. In the testing stage, these compare with a CC of 0.7604, an MAE of 0.1257, an RMSE of 0.1986, an NSE of 0.5366, an SI of 0.6615, and an LMI of 0.4537 for the SVM–PUK model. Figures 4(c) and 4(d) present scatter plots of the observed versus predicted values for the SVM–PUK and SVM–RBF models. The concentration of data points around the line of perfect agreement indicates the degree of alignment between observed and predicted values.
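The six metrics reported in Table 2 can be computed with a short helper like the one below. It is a sketch using the standard textbook definitions; the function name is hypothetical, and the scattering index is assumed to be the RMSE divided by the mean observed value, which is consistent with the tabulated values to within rounding of the mean IR.

```python
import math


def evaluate(observed, predicted):
    """Return CC, MAE, RMSE, NSE, SI and LMI for paired observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(o - p) for o, p in zip(observed, predicted))
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    rmse = math.sqrt(sq_err / n)
    return {
        "CC": cov / math.sqrt(var_o * var_p),   # Pearson correlation coefficient
        "MAE": abs_err / n,                     # mean absolute error
        "RMSE": rmse,                           # root mean square error
        "NSE": 1.0 - sq_err / var_o,            # Nash-Sutcliffe efficiency
        "SI": rmse / mean_o,                    # scattering index (assumed: RMSE / mean observed)
        "LMI": 1.0 - abs_err / sum(abs(o - mean_o) for o in observed),  # Legates and McCabe's index
    }


# A constant bias of +1 leaves the correlation perfect but penalises every other metric.
biased = evaluate([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

The example illustrates why CC alone is not a sufficient measure: a systematically biased model can still score a CC of 1.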
Evaluation of GMDH
Layers | Neurons | a0 | a1 | a2 | a3 | a4 | a5 |
---|---|---|---|---|---|---|---|
1 | 1 | −7.62902 | −0.00048 | 0.272431 | 0.000206 | −0.00223 | −0.00032 |
1 | 2 | 0.908387 | −0.0218 | −0.03955 | 0.000247 | 0.000575 | 0.000196 |
1 | 3 | 0.011921 | −0.01107 | 0.007782 | 0.000239 | 6.09E-05 | −0.00017 |
1 | 4 | 7.447267 | −0.28382 | −0.89401 | 0.002707 | −0.03294 | 0.020808 |
1 | 5 | 7.413539 | −0.28259 | −1.53771 | 0.002696 | −0.09802 | 0.035807 |
2 | 1 | 0.193914 | −0.91559 | −0.60218 | 2.677236 | 2.198473 | 0.473449 |
2 | 2 | 0.192743 | −0.91687 | −0.59041 | 2.679242 | 2.177974 | 0.473984 |
2 | 3 | 0.09301 | −0.20286 | −0.20651 | 1.073433 | 0.893554 | 1.298476 |
2 | 4 | 0.09191 | −0.20317 | −0.19571 | 1.074827 | 0.87471 | 1.296485 |
2 | 5 | 0.145266 | −0.17225 | −0.0065 | −4.77314 | −5.36011 | 12.38166 |
3 | 1 | 0.048078 | −0.38963 | 0.92695 | 2.258161 | 0.001442 | −1.64676 |
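As an illustration of how one row of the coefficient table might be applied, the sketch below evaluates a single two-input GMDH neuron. The term order (a0, then the two linear terms, the two squared terms, and the cross term) is an assumption, since this section does not give the polynomial explicitly, and the input pairing of each neuron is likewise not stated. The function name is hypothetical.

```python
def gmdh_neuron(coeffs, x_i, x_j):
    """Evaluate a second-degree partial model with two inputs.

    Assumed term order (not confirmed by the source):
    a0 + a1*x_i + a2*x_j + a3*x_i**2 + a4*x_j**2 + a5*x_i*x_j
    """
    a0, a1, a2, a3, a4, a5 = coeffs
    return a0 + a1 * x_i + a2 * x_j + a3 * x_i ** 2 + a4 * x_j ** 2 + a5 * x_i * x_j


# Coefficients of the single neuron in layer 3, copied from the table above.
layer3_neuron1 = (0.048078, -0.38963, 0.92695, 2.258161, 0.001442, -1.64676)

# With both inputs at zero the neuron returns its intercept a0.
at_origin = gmdh_neuron(layer3_neuron1, 0.0, 0.0)

# With both inputs at one the result is the sum of all six coefficients,
# which holds for any ordering of the non-constant terms.
at_unit = gmdh_neuron(layer3_neuron1, 1.0, 1.0)
```

In a full network, the two inputs of a layer-3 neuron would be the outputs of two layer-2 neurons, which in turn take layer-1 outputs.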
Evaluation of MARS
Basis function of the MARS model for IR | Expression |
---|---|
BF-1 | max(0, Time − 5) |
BF-2 | max(0, 5 − Time) |
BF-3 | max(0, 58 − Porosity) |
BF-4 | max(0, 0.093 − N) |
BF-5 | BF-2 × max(0, 55 − Sand) |
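The basis functions listed above can be evaluated for a given sample as sketched below. The weights that combine them into the final IR equation are not given in this section, so the sketch stops at the basis-function values. The function and argument names are hypothetical, and BF-5 is read as the product of BF-2 and the Sand hinge.

```python
def mars_basis_functions(time_min, porosity_pct, nitrogen_pct, sand_pct):
    """Evaluate the five tabulated MARS basis functions for one observation."""
    bf1 = max(0.0, time_min - 5.0)
    bf2 = max(0.0, 5.0 - time_min)
    bf3 = max(0.0, 58.0 - porosity_pct)
    bf4 = max(0.0, 0.093 - nitrogen_pct)
    bf5 = bf2 * max(0.0, 55.0 - sand_pct)  # interaction term: active only when Time < 5 and Sand < 55
    return {"BF-1": bf1, "BF-2": bf2, "BF-3": bf3, "BF-4": bf4, "BF-5": bf5}


# Illustrative sample values chosen within the ranges of the dataset summary table.
early = mars_basis_functions(time_min=2.5, porosity_pct=53.0, nitrogen_pct=0.11, sand_pct=48.0)
late = mars_basis_functions(time_min=10.0, porosity_pct=53.0, nitrogen_pct=0.11, sand_pct=48.0)
```

Note that the Time knot at 5 min splits the response into an early regime (BF-2 and BF-5 active) and a later regime (BF-1 active).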
Models | CC | MAE | RMSE | NSE | SI | LMI |
---|---|---|---|---|---|---|
Training stage | ||||||
GP–RBF | 0.9631 | 0.0357 | 0.0669 | 0.9274 | 0.2352 | 0.7991 |
SVM–RBF | 0.8027 | 0.0597 | 0.1495 | 0.6368 | 0.5260 | 0.6638 |
MARS | 0.8171 | 0.1061 | 0.1430 | 0.6677 | 0.5031 | 0.4027 |
GMDH | 0.8096 | 0.2170 | 0.1459 | 0.6543 | 0.5131 | 0.3884 |
ANN | 0.9366 | 0.0607 | 0.0884 | 0.8732 | 0.3108 | 0.6585 |
Testing stage | ||||||
GP–RBF | 0.8273 | 0.1109 | 0.1789 | 0.6240 | 0.5959 | 0.5180 |
SVM–RBF | 0.7851 | 0.1149 | 0.1838 | 0.6029 | 0.6124 | 0.5007 |
MARS | 0.7445 | 0.1440 | 0.1978 | 0.5404 | 0.6588 | 0.3738 |
GMDH | 0.7702 | 0.1357 | 0.1861 | 0.5930 | 0.6200 | 0.4100 |
ANN | 0.8624 | 0.1000 | 0.1502 | 0.7350 | 0.5003 | 0.5654 |
According to the data presented in Table 5, the MARS model demonstrated reliable performance in predicting IR. The model's efficacy is evidenced by its statistical metrics: it achieved CC values of 0.8171 and 0.7445, MAE values of 0.1061 and 0.1440, RMSE values of 0.1430 and 0.1978, NSE values of 0.6677 and 0.5404, the SI values of 0.5031 and 0.6588, and the LMI values of 0.4027 and 0.3738 for the training and testing phases, respectively. The scatter plot in Figure 4(g), (h) visually represents the correlation between the observed and the MARS model-predicted values for the IR. The alignment of the data points with the line of perfect agreement in this graph suggested a strong concordance between the observed and predicted values, indicating the model's potential for accurate and reliable predictions in estimating the IR.
Evaluation of ANN
An ANN-based model was developed using a multilayer perceptron framework through an iterative process. Numerous trials were conducted with various combinations of user-defined parameters to find the best configuration, characterized by the highest CC value and the lowest errors for the training and testing datasets. The best combination of user-defined parameters, i.e., learning rate, momentum, hidden layers, neurons per hidden layer, and iterations, was 0.2, 0.1, 1, 20, and 1,000, respectively. The performance evaluation metrics for the best trial are detailed in Table 5. The ANN-based model gave the best testing-stage performance in predicting the IR among all applied models. This is reflected in its statistical metrics: for the training and testing stages, respectively, the model registered CC values of 0.9366 and 0.8624, MAE values of 0.0607 and 0.1000, RMSE values of 0.0884 and 0.1502, Nash–Sutcliffe efficiency values of 0.8732 and 0.7350, SI values of 0.3108 and 0.5003, and LMI values of 0.6585 and 0.5654. These results indicated a higher accuracy of the ANN model in predicting the IR. Figure 4(i), (j) presents scatter plots comparing observed values with those predicted by the ANN-based model. The close alignment of these points with the line of perfect agreement further confirmed the model's ability to match observed and predicted IR values.
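To make the roles of the learning rate and momentum concrete, the sketch below applies the classical momentum update to a one-dimensional quadratic loss using the reported values (0.2, 0.1, and 1,000 iterations). It is an illustration of the update rule only, not the network trained in the study; the exact update used by the authors' software is not stated here, and the function name and toy loss are hypothetical.

```python
def sgd_momentum(grad, w0, learning_rate=0.2, momentum=0.1, iterations=1000):
    """Minimise a function from its gradient using gradient descent with classical momentum."""
    w = w0
    velocity = 0.0
    for _ in range(iterations):
        # The new step blends the previous step (momentum) with the current gradient step.
        velocity = momentum * velocity - learning_rate * grad(w)
        w += velocity
    return w


# Toy loss (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
w_final = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

In a multilayer perceptron the same rule is applied to every weight, with the gradient obtained by backpropagation.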
COMPARISON AMONG APPLIED MODELS
Statistic | Observed | GP–RBF | SVM–RBF | GMDH | MARS | ANN |
---|---|---|---|---|---|---|
Minimum | 0.0120 | −0.0710 | 0.0020 | 0.0945 | −0.0237 | −0.0150 |
Maximum | 1.2800 | 1.3130 | 0.9290 | 1.0477 | 1.0889 | 1.0150 |
First quartile | 0.1000 | 0.0830 | 0.1020 | 0.1237 | 0.1156 | 0.0920 |
Median | 0.1600 | 0.1770 | 0.1690 | 0.2023 | 0.1908 | 0.1400 |
Third quartile | 0.4000 | 0.4150 | 0.3420 | 0.3832 | 0.3759 | 0.4070 |
IQR | 0.3000 | 0.3320 | 0.2400 | 0.2594 | 0.2603 | 0.3150 |
Statistic | GP–RBF | SVM–RBF | GMDH | MARS | ANN |
---|---|---|---|---|---|
Minimum | −0.5730 | −0.5700 | −0.4163 | −0.5041 | −0.4310 |
Maximum | 0.4390 | 0.4840 | 0.4987 | 0.5063 | 0.4540 |
First quartile | −0.0400 | −0.0200 | −0.0947 | −0.0906 | −0.0470 |
Median | 0.0080 | 0.0010 | −0.0197 | 0.0148 | 0.0150 |
Third quartile | 0.0480 | 0.0850 | 0.0989 | 0.0968 | 0.0730 |
Mean | −0.0026 | 0.0329 | 0.0024 | 0.0192 | 0.0228 |
The ANN surpassed the SVM, GP, GMDH, and MARS models in predicting soil IR on the testing data. This is attributed to its ability to handle intricate, non-linear interactions, its flexible structure, and its proficiency in capturing temporal fluctuations. In addition, the ANN was tuned by trial and error, which significantly improved its accuracy. Although SVM and the other AI-based models were effective, their performance in this work was constrained by structural limitations, which reduced their adaptability to the intricate and dynamic characteristics of the IR data. The ANN model can therefore be used for predicting the IR of soil and could serve as an alternative to field experimentation, as it requires less time and effort. Overall, the advantages of ANN, together with careful parameter optimization and strategies to address its limitations, position it as a dependable alternative to experimental approaches for predicting soil IRs. Further refinement of the computational algorithms may improve its scalability and range of application.
Taylor diagram
Sensitivity analysis
A sensitivity analysis was conducted to identify the most influential input variables for predicting the IR. The analysis used the best-performing model, the ANN-based one, and examined the impact of different combinations of input parameters. A total of 16 scenarios were created: the first included all 15 input variables, while each of the remaining 15 was created by removing one variable in turn (shown along the diagonal of Table 8). The effectiveness of each scenario was evaluated using CC, MAE, and RMSE. As presented in Table 8, time (T) emerged as the most critical factor in accurately predicting the IR: when T was excluded from the model inputs, the CC dropped from 0.8624 to 0.6113, while the MAE and RMSE rose from 0.1000 and 0.1502 to 0.2328 and 0.2662, respectively. The remaining scenarios showed low sensitivity, with the CC varying only from 0.8610 to 0.8365 against 0.8624 when all inputs were used.
T | S | C | Si | pH | EC | MC | Sd | Po | CaCO3 | P | OC | OM | N | Temp | CC | MAE | RMSE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.8624 | 0.1000 | 0.1502 | |
T | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.6113 | 0.2328 | 0.2662 | |
✓ | S | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.8500 | 0.0995 | 0.1568 | |
✓ | ✓ | C | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.8610 | 0.0959 | 0.1510 | |
✓ | ✓ | ✓ | Si | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.8548 | 0.0989 | 0.1538 | |
✓ | ✓ | ✓ | ✓ | pH | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.8365 | 0.1034 | 0.1633 | |
✓ | ✓ | ✓ | ✓ | ✓ | EC | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.8490 | 0.1064 | 0.1579 | |
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | MC | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.8470 | 0.1002 | 0.1571 | |
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Sd | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.8411 | 0.0986 | 0.1591 | |
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | PO | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.8436 | 0.1084 | 0.1591 | |
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CaCO3 | ✓ | ✓ | ✓ | ✓ | ✓ | 0.8488 | 0.1011 | 0.1567 | |
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | P | ✓ | ✓ | ✓ | ✓ | 0.8584 | 0.1066 | 0.1549 | |
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | OC | ✓ | ✓ | ✓ | 0.8539 | 0.1005 | 0.1558 | |
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | OM | ✓ | ✓ | 0.8537 | 0.1004 | 0.1559 | |
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | N | ✓ | 0.8563 | 0.0985 | 0.154 | |
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Temp | 0.8460 | 0.1026 | 0.159 |
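The ranking implied by Table 8 can be reproduced with simple arithmetic. The following Python sketch is an illustration only and is not the authors' code: it copies the testing-phase CC, MAE, and RMSE values from the table and orders the removed inputs by the drop in CC relative to the all-inputs scenario. The names `BASELINE`, `ABLATION`, and `rank_by_cc_drop` are hypothetical and introduced here for the example.

```python
# Testing-phase metrics copied from Table 8 as (CC, MAE, RMSE).
BASELINE = (0.8624, 0.1000, 0.1502)  # scenario with all 15 inputs

# Key = the single input removed in that scenario.
ABLATION = {
    "T": (0.6113, 0.2328, 0.2662),
    "S": (0.8500, 0.0995, 0.1568),
    "C": (0.8610, 0.0959, 0.1510),
    "Si": (0.8548, 0.0989, 0.1538),
    "pH": (0.8365, 0.1034, 0.1633),
    "EC": (0.8490, 0.1064, 0.1579),
    "MC": (0.8470, 0.1002, 0.1571),
    "Sd": (0.8411, 0.0986, 0.1591),
    "Po": (0.8436, 0.1084, 0.1591),
    "CaCO3": (0.8488, 0.1011, 0.1567),
    "P": (0.8584, 0.1066, 0.1549),
    "OC": (0.8539, 0.1005, 0.1558),
    "OM": (0.8537, 0.1004, 0.1559),
    "N": (0.8563, 0.0985, 0.154),
    "Temp": (0.8460, 0.1026, 0.159),
}


def rank_by_cc_drop(baseline, ablation):
    """Return (variable, CC drop, RMSE increase), largest CC drop first."""
    base_cc, _, base_rmse = baseline
    rows = [
        (name, round(base_cc - cc, 4), round(rmse - base_rmse, 4))
        for name, (cc, _, rmse) in ablation.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranking = rank_by_cc_drop(BASELINE, ABLATION)
for name, cc_drop, rmse_rise in ranking:
    print(f"{name:>6}: CC drop = {cc_drop:.4f}, RMSE increase = {rmse_rise:.4f}")
```

With the tabulated values, removing T lowers CC by 0.2511, whereas the next largest drop (pH removed) is 0.0259, which is consistent with time being the dominant input.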
CONCLUSION
This investigation identifies the optimal model for predicting IR among several computing approaches, namely GP, SVM, GMDH, MARS, and ANN-based models. Fifteen input variables were used: time, sand, clay, silt, pH, EC, moisture content, soil bulk density, porosity, CaCO3, P, OC, OM, N, and temperature. Six key performance metrics were used to assess the efficacy of the predictive models: CC, MAE, RMSE, Nash–Sutcliffe efficiency (NSE), SI, and LMI. The evaluation results highlighted the superior performance of the ANN-based model in predicting the IR. For the training and testing phases, respectively, this model achieved CC values of 0.9366 and 0.8624, MAE values of 0.0607 and 0.1000, RMSE values of 0.0884 and 0.1502, NSE values of 0.8732 and 0.7350, SI values of 0.3108 and 0.5003, and LMI values of 0.6585 and 0.5654, indicating low error rates.
The scatter plot analysis further reinforced the ANN model's reduced error margins and optimal prediction fit. Visual representations such as box plots and violin plots corroborated the model's efficacy in predicting the IR with minimal errors. In the Taylor diagram, the ANN model outperformed the other models, confirming its suitability for accurately predicting the IR. Additionally, the sensitivity analysis highlighted the significant impact of the time (T) variable on the IR, underscoring its importance for the model's predictive accuracy.
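For readers who wish to compute the six metrics on their own predictions, the following Python sketch may be useful. It assumes the commonly used definitions (SI as RMSE divided by the mean observed value, and LMI as one minus the ratio of summed absolute errors to summed absolute deviations of the observations from their mean); the formulas are not restated in this section, so these definitions are an assumption. The function name `evaluation_metrics` and the sample values are hypothetical and are not the study's data.

```python
import math


def evaluation_metrics(observed, predicted):
    """Compute CC, MAE, RMSE, NSE, SI, and LMI for paired values.

    Definitions are the commonly used ones and are assumed here.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    errors = [o - p for o, p in zip(observed, predicted)]
    sq_err = sum(e * e for e in errors)
    abs_err = sum(abs(e) for e in errors)
    sq_dev_obs = sum((o - mean_obs) ** 2 for o in observed)
    sq_dev_pred = sum((p - mean_pred) ** 2 for p in predicted)
    abs_dev_obs = sum(abs(o - mean_obs) for o in observed)
    covariance = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )

    rmse = math.sqrt(sq_err / n)
    return {
        "CC": covariance / math.sqrt(sq_dev_obs * sq_dev_pred),
        "MAE": abs_err / n,
        "RMSE": rmse,
        "NSE": 1.0 - sq_err / sq_dev_obs,
        "SI": rmse / mean_obs,
        "LMI": 1.0 - abs_err / abs_dev_obs,
    }


# Made-up values purely for illustration; they are not field observations.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 5.0]
metrics = evaluation_metrics(obs, pred)
for key, value in metrics.items():
    print(f"{key}: {value:.4f}")
```

Higher CC, NSE, and LMI and lower MAE, RMSE, and SI indicate a better fit, which is the sense in which the ANN results above are interpreted.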
FUNDING
No funding was reported for this research.
HUMAN PARTICIPANTS AND/OR ANIMALS
This article contains no studies with human participants or animals performed by any of the authors.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.