## Abstract

In this study, the vote algorithm is used to improve the performance of three machine-learning models, M5Prime (M5P), random forest (RF), and random tree (RT), yielding the ensembles V-M5P, V-RF, and V-RT. The developed models were tested for forecasting soil temperature (*T _{S}*) 1, 2, and 3 days ahead at depths of 5 and 50 cm. All models were developed and evaluated using different climatic variables: mean, minimum, and maximum air temperatures; sunshine hours; evaporation; and solar radiation. Correlation coefficients of 0.95 for the V-M5P model, 0.95 for the V-RF model, and 0.91 for the V-RT model were recorded for both 1- and 2-day ahead forecasting at a depth of 5 cm. For 3-day ahead forecasting, V-RF was the superior model with a Nash–Sutcliffe efficiency (NSE) of 0.85, compared to V-M5P's value of 0.81 and V-RT's value of 0.81. The results at a depth of 5 cm indicate that V-RT was the least effective model. At a depth of 50 cm, forecasted *T _{S}* was in good agreement with measurements, and V-RF was slightly superior. Among the limitations of the current work is that the models were unable to improve their performances by increasing the forecasting horizon.

## HIGHLIGHTS

Modelling soil temperature at different depths based on meteorological variables using the vote algorithm.

Forecasting soil temperature at 1, 2, and 3 days ahead at depths of 5 and 50 cm using machine learning models.

M5P, RF, and RT are applied.

An ensemble approach with the vote algorithm (i.e. V-M5P, V-RF, and V-RT) is also proposed.

## INTRODUCTION

Soil temperature (*T _{S}*) is a very important factor in the earth environment; it features in many studies of meteorology, soil water dynamics, water evaporation, deep and slow water infiltration, and the quantification of water requirements for irrigation. However, despite these important roles of *T _{S}*, important challenges remain in its estimation, including exact quantification over time and space, which is mainly related to the behaviour of the soil–plant–atmosphere system (Taheri *et al.* 2023). Soil temperature is affected by various external factors, especially climatic variables, plants, and soil structure (Ma *et al.* 2023). Owing to the complexity of soil structure and composition, three approaches are mainly adopted for *T _{S}* estimation: (i) direct *in situ* measurement, (ii) remote sensing estimation, and (iii) model application (Taheri *et al.* 2023).

Machine-learning methods, especially artificial neural networks (ANNs), have been found to provide reliable forecasts for hydrological and meteorological variables, for example in evapotranspiration modelling (Adnan *et al.* 2021), water quality indices (Abba *et al.* 2021), and irrigation system modelling (Kisi *et al.* 2021). ANNs are the most commonly used models in hydrological and meteorological modelling, and numerous studies have employed them to estimate *T _{S}*. ANN models have been used to estimate daily and annual *T _{S}*, and they have demonstrated strong performance in forecasting the spatial variations of *T _{S}* (Mihalakakou 2002). Tabari *et al.* (2011) found that ANN methods perform better than multivariate linear regression when predicting daily *T _{S}* at six soil depths, and that the most important variables for these predictions were air temperature and relative humidity. Bilgili (2010) compared linear regression, nonlinear regression, and ANN models for estimating monthly *T _{S}* at five soil depths and found that the ANN model was the most effective. Kisi *et al.* (2015) estimated monthly *T _{S}* at several soil depths using multi-layer perceptron neural networks (MLPNNs), radial basis neural networks (RBNNs), generalized regression neural networks (GRNNs), and multiple linear regression (MLR). The results indicated that the RBNN model was best for predicting *T _{S}* at shallow depths (i.e., 5 and 10 cm), but the MLR and GRNN models were better at deeper soil depths (50 and 100 cm, respectively). Tabari *et al.* (2015) used an ANN model to forecast *T _{S}* 1 day in advance at six depths and achieved good accuracy in that short-term prediction.

Kim & Singh (2014) evaluated the adaptive neuro-fuzzy inference systems (ANFIS) and multi-layer perceptron (MLP) models for calculating daily *T _{S}*. An extreme learning machine (ELM) was optimized by self-adaptive evolutionary modelling to predict daily *T _{S}* at six depths (Nahvi *et al.* 2016). Both models produced accurate estimates, but the optimized ELM model performed marginally better than the original ELM model. Talaee (2014) used a co-active neuro-fuzzy inference system (CANFIS) to model *T _{S}* in arid and semi-arid regions and found that CANFIS produced accurate results. Abyaneh *et al.* (2016) utilized both ANN and CANFIS modelling to estimate *T _{S}* in moderate air temperature conditions in humid and dry climates and found that they were more effective models in arid settings. Monthly *T _{S}* was predicted by ANN and ANFIS models using data collected at 31 stations in Iran, and ANFIS was found to be the more effective model (Mehdizadeh *et al.* 2017). Several other models have been tested as well, including genetic programming (GP) (Kisi *et al.* 2017), gene expression programming (Samadianfard *et al.* 2018), and genetic-based neural networks (Kazemi *et al.* 2018). Khosravi *et al.* (2022a) developed several ML models, including KStar, the instance-based K-nearest learner, and the locally weighted learner, coupled with bagging (BA) and dagging (DA) for *T _{S}* prediction in Iran. They reported that BA-KStar was superior at a soil depth of 5 cm, while DA-KStar outperformed the other algorithms at 50 cm. Malik *et al.* (2022) applied support vector machine (SVM), MLP, and ANFIS models optimized with the slime mould algorithm, particle swarm optimization, and the spotted hyena optimizer for *T _{S}* prediction in a semi-arid region of Punjab, India. Their findings showed the higher performance of the SVM models at multiple depths.

Ozbek (2023) compared long short-term memory (LSTM) deep learning, ANFIS with the fuzzy c-means clustering algorithm, the autoregressive integrated moving average, and the autoregressive moving average models for forecasting 1-h-ahead soil temperature, and reported that the LSTM was more accurate than the other models at different sites. Farhangmehr *et al.* (2023) used the convolutional neural network (CNN) and MLPNN models for predicting hourly soil temperature from a large set of features, namely precipitation, surface pressure, evaporation, wind gust, dewpoint temperature, surface solar radiation, surface thermal radiation, and air temperature. The authors identified air temperature as the most relevant feature and surface thermal radiation as the least relevant; furthermore, the CNN was significantly more accurate than the MLPNN model. Ebtehaj *et al.* (2023) introduced a new machine-learning model called the emotional neural network (ENN) for predicting *T _{S}* at 10 and 20 cm depths. They developed the modelling framework according to two scenarios, namely (i) modelling *T _{S}* using climatic variables, i.e., air temperature, wind speed, and solar radiation, and (ii) time series modelling. A comparison with the least square support vector machine, GP, and multivariate adaptive regression splines models demonstrated the superiority of the ENN. Alizamir *et al.* (2020) compared the ELM, the MLPNN, classification and regression trees, and the group method of data handling for modelling monthly *T _{S}* at different depths using four climatic variables, namely air temperature, solar radiation, wind speed, and relative air humidity. The authors reported that at 5, 10 and 50 cm, the ELM model was more accurate, and the best performances can be obtained using only air temperature; however, at 10 cm depth, it is necessary to include solar radiation and wind speed. Sattari *et al.* (2020) compared three ensemble machine-learning models, namely decision tree (DT), gradient boosted trees (GBT), and hybrid DT-GBT models, for predicting daily soil temperature at 5, 10, and 20 cm depths. The proposed models were developed using mean, maximum, and minimum air temperature, sunshine duration, and precipitation. The hybrid DT-GBT was found to be the most accurate at 5 cm depth, while the DT exhibited high accuracy at 10 and 20 cm depths.

Though researchers have attempted predictions of *T _{S}* at daily or monthly intervals, predicting values for shorter periods (half an hour, for instance) is uncommon. There are questions regarding the appropriateness of machine-learning modelling of *T _{S}* over short intervals, but such models may yield valuable high-resolution insight for modelling water resources and agronomy to improve crop yields (Singh *et al.* 2018; Xing *et al.* 2018). There has been a supportive discussion of these opportunities (Sanikhani *et al.* 2018). On the one hand, measuring *T _{S}* is time-consuming; on the other hand, a reliable, practical, and cost-effective model is required. This study investigates the potential of (1) three standalone machine-learning models: M5Prime (M5P), random forest (RF), and random tree (RT); (2) their ensembles with the vote (V) algorithm: V-M5P, V-RF, and V-RT; (3) the effect of different input scenarios; and (4) the use of different easily available input variables for forecasting *T _{S}* at two depths (5 and 50 cm) 1–3 days in advance. To the best of the authors' knowledge, the vote algorithm is rarely used in geoscience, and its ensembles with tree-based algorithms are new and can offer high performance in predicting geoscience phenomena.

## STUDY AREA AND DATA

### Study area and data collection

The data (including soil temperature, *T _{S}*) were collected from the Isfahan Regional Water Authority (IRWA) in Iran (Figure 1); the dataset was previously used by Sattari *et al.* (2017) and recently by Khosravi *et al.* (2022a). Daily soil temperature (*T _{S}*) measured at 5 and 50 cm depths was modelled using various meteorological variables, namely mean air temperature (*T _{M}*), minimum air temperature (*T _{N}*), maximum air temperature (*T _{X}*), evaporation (*E*_{pan}), sunshine hours (*S _{H}*), and solar radiation (*S _{R}*). All datasets cover a period ranging from June 1992 to December 2005 (Table 1). The first 10 years were used for training (70%), while the remaining four years were used for validation (30%). More details about the data can be found in Sattari *et al.* (2017) and Khosravi *et al.* (2022a).
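The data preparation described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say exactly how input and target days were paired, so the sketch assumes that the inputs of day *t* are paired with *T _{S}* of day *t* + lead, and that the 70/30 split is chronological. The function names `make_lead_pairs` and `chronological_split` are hypothetical and are not part of WEKA.

```python
def make_lead_pairs(inputs, target, lead):
    """Pair the inputs of day t with the target of day t + lead.

    Assumption (not stated in the text): a lead of 1, 2 or 3 days means
    the meteorological inputs of day t are used to forecast T_S on day
    t + lead.
    """
    if lead < 1 or lead >= len(target):
        raise ValueError("lead must be between 1 and len(target) - 1")
    n = len(target) - lead
    X = [inputs[t] for t in range(n)]
    y = [target[t + lead] for t in range(n)]
    return X, y


def chronological_split(X, y, train_fraction=0.7):
    """Split without shuffling: earliest records train, latest validate."""
    cut = int(round(len(X) * train_fraction))
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


# Toy example with 10 days of made-up values (not the Isfahan data).
toy_inputs = [[20.0 + d, 28.0 + d] for d in range(10)]  # e.g. [T_M, T_X]
toy_target = [25.0 + d for d in range(10)]              # e.g. T_S at 5 cm

X, y = make_lead_pairs(toy_inputs, toy_target, lead=3)
(train_X, train_y), (valid_X, valid_y) = chronological_split(X, y)
```

Keeping the split chronological avoids using later observations to forecast earlier ones, which would give an optimistic picture of forecast skill.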

Table 1. Descriptive statistics of the input and output variables.

| Role | Variable | Maximum | Minimum | Mean | Standard deviation | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|
| Inputs | *T _{M}* (°C) | 34.80 | −3.30 | 20.78 | 7.50 | −0.48 | −0.73 |
| | *T _{N}* (°C) | 28.80 | −7.80 | 13.04 | 7.14 | −0.39 | −0.73 |
| | *T _{X}* (°C) | 43 | 1.20 | 28.53 | 8.20 | −0.6 | −0.51 |
| | *E*_{pan} (mm) | 30 | 0.1 | 8.11 | 4.12 | −0.02 | −0.48 |
| | *S _{H}* (h) | 13.8 | 0.1 | 9.98 | 2.75 | −1.65 | 2.74 |
| | *S _{R}* (Cal/cm^{2}) | 9,695 | 25 | 1,845 | 1,122 | 1.38 | 8.49 |
| Output | *T _{S}* at 5 cm (°C) | 45.53 | 0.70 | 26.24 | 9.69 | −0.48 | −0.89 |
| | *T _{S}* at 50 cm (°C) | 35 | 7.33 | 24.37 | 6.62 | −0.53 | −0.92 |

### Most effective input combination

Pearson's correlation coefficients (*r*-values) between the models' input variables (*T _{M}*, *T _{X}*, *T _{N}*, *E*_{pan}, *S _{R}*, and *S _{H}*) and the output variable (*T _{S}* at two depths) were used to develop the list of input combinations. This simple *r*-value approach is commonly used to construct candidate model inputs (Khosravi *et al.* 2020, 2021a, 2021b, 2021c, 2021d, 2021e; Kargar *et al.* 2021; Meshram *et al.* 2021; Panahi *et al.* 2021). Ranking the variable-output pairs by the strength of their relationships (Table 3) suggested six input combinations for modelling (Table 2). The most effective input combination was then determined and used for model training.
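One way to read the procedure above is that the variables are ranked by *r* and then added one at a time, strongest first. The sketch below illustrates that reading using the 1-day-ahead *r*-values at 50 cm from Table 3; the function name `nested_input_scenarios` is hypothetical, and the ranking rule is an assumption, since the text does not spell it out.

```python
def nested_input_scenarios(r_values):
    """Build nested input sets by adding variables in order of |r|.

    Scenario k contains the k variables most strongly correlated with
    the output. This is an assumed reading of the r-value approach.
    """
    ranked = sorted(r_values, key=lambda name: abs(r_values[name]),
                    reverse=True)
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


# r-values for T_S at 50 cm, 1 day ahead (Table 3).
r_50cm_1day = {
    "T_N": 0.918,
    "T_X": 0.938,
    "S_R": 0.305,
    "S_H": 0.613,
    "E_pan": 0.791,
    "T_M": 0.953,
}

scenarios = nested_input_scenarios(r_50cm_1day)
for number, names in enumerate(scenarios, start=1):
    print(number, ", ".join(names))
```

With these values the ranking reproduces the order of the combinations listed in Table 2 (*T _{M}*, then *T _{X}*, *T _{N}*, *E*_{pan}, *S _{H}*, and *S _{R}*).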

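Table 2. Input combinations (scenarios) considered for modelling.

| No. | Inputs | Output |
|---|---|---|
| 1 | *T _{M}* | *T _{S}* |
| 2 | *T _{M}*, *T _{X}* | *T _{S}* |
| 3 | *T _{M}*, *T _{X}*, *T _{N}* | *T _{S}* |
| 4 | *T _{M}*, *T _{X}*, *T _{N}*, *E*_{pan} | *T _{S}* |
| 5 | *T _{M}*, *T _{X}*, *T _{N}*, *E*_{pan}, *S _{H}* | *T _{S}* |
| 6 | *T _{M}*, *T _{X}*, *T _{N}*, *E*_{pan}, *S _{H}*, *S _{R}* | *T _{S}* |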

Table 3. Pearson's correlation coefficients (*r*) between each input variable and *T _{S}*.

| Soil depth | Ahead days | *T _{N}* | *T _{X}* | *S _{R}* | *S _{H}* | *E*_{pan} | *T _{M}* |
|---|---|---|---|---|---|---|---|
| 5 cm | +1 | 0.887 | 0.879 | 0.301 | 0.564 | 0.788 | 0.903 |
| 5 cm | +2 | 0.891 | 0.885 | 0.300 | 0.586 | 0.790 | 0.915 |
| 5 cm | +3 | 0.916 | 0.938 | 0.299 | 0.615 | 0.796 | 0.948 |
| 50 cm | +1 | 0.918 | 0.938 | 0.305 | 0.613 | 0.791 | 0.953 |
| 50 cm | +2 | 0.910 | 0.939 | 0.296 | 0.597 | 0.705 | 0.951 |
| 50 cm | +3 | 0.907 | 0.941 | 0.275 | 0.588 | 0.749 | 0.946 |

## THE MODELS

In the present study, all models were implemented in the Waikato Environment for Knowledge Analysis (WEKA 3.9). WEKA was developed at the University of Waikato, New Zealand, and is free software licensed under the GNU General Public License. Three machine-learning algorithms, namely M5P, RF, and RT, were used together with the vote (V) ensemble algorithm. These models have been broadly reported in the literature as having high forecasting capabilities (Hanoon *et al.* 2021; Irwan *et al.* 2023).

### M5Prime

M5P grows a regression tree by choosing, at each node, the split that maximizes the standard deviation reduction (SDR):

$$\mathrm{SDR} = \mathrm{SD}(K) - \sum_{i} \frac{|K_{i}|}{|K|}\,\mathrm{SD}(K_{i})$$

where *K* and *K _{i}* denote the set of training examples that reach the node and the subsets that result from splitting the node, respectively, and SD is the standard deviation. Among all candidate splits, the one that maximizes SDR is selected. A large number of branches can make a model very complicated and may weaken its generalization. To resolve this, the tree is pruned, and pruned subtrees are replaced with linear regression models. Finally, smoothing is applied to the values predicted by M5P at the leaves. More information about M5P can be found in Quinlan (1992) and Wang & Witten (1996).
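The split criterion referred to above can be illustrated with a short sketch. It assumes the usual M5 form of the standard deviation reduction, SDR = SD(*K*) − Σ (|*K _{i}*|/|*K*|) SD(*K _{i}*), and a binary split on a single numeric attribute; the helper names `sd`, `sdr`, and `best_threshold` are hypothetical, and this is not WEKA's implementation.

```python
import math


def sd(values):
    """Population standard deviation (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def sdr(parent, subsets):
    """Standard deviation reduction obtained by splitting `parent`."""
    n = len(parent)
    return sd(parent) - sum(len(s) / n * sd(s) for s in subsets)


def best_threshold(x, y):
    """Scan midpoints between distinct sorted x values; return the
    (threshold, SDR) pair with the largest SDR."""
    pairs = sorted(zip(x, y))
    targets = [t for _, t in pairs]
    best = (None, float("-inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal attribute values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [t for a, t in pairs if a <= threshold]
        right = [t for a, t in pairs if a > threshold]
        gain = sdr(targets, [left, right])
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Toy data: two clearly separated groups of target values.
attribute = [1, 2, 3, 10, 11, 12]
target = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
threshold, gain = best_threshold(attribute, target)
```

In this toy case the split between 3 and 10 leaves two subsets with zero spread, so the reduction equals the standard deviation of the parent node.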

### Vote (V) algorithm

### Random forest

### Random tree

### Model parameter identification

In the current study, all model parameters were identified in the WEKA 3.9 software by trial and error. First, each model is trained with the default parameter values. Second, values lower and higher than the defaults are generated, and each model is trained again with them. This procedure continues until the optimum value of each parameter is determined. The root mean square error (RMSE) is used to compare the candidate values: the lower the RMSE, the better the parameter value.
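The trial-and-error procedure described above can be sketched as a one-parameter search that starts at the default and keeps stepping in whichever direction lowers the RMSE. This is a minimal sketch, not the procedure actually run in WEKA: `tune_parameter` is a hypothetical helper, and `toy_rmse` is a made-up stand-in for training a model and computing its validation RMSE.

```python
def tune_parameter(rmse_for, default, step, lower, upper):
    """Search around `default` for the value with the lowest RMSE.

    rmse_for: callable returning the RMSE obtained with a given
              parameter value (in practice, train and evaluate a model).
    The search stops when neither neighbour improves on the current best.
    """
    best_value = default
    best_rmse = rmse_for(default)
    while True:
        candidates = [v for v in (best_value - step, best_value + step)
                      if lower <= v <= upper]
        if not candidates:
            break
        cand_rmse, cand_value = min((rmse_for(v), v) for v in candidates)
        if cand_rmse >= best_rmse:
            break  # no neighbour is better: keep the current value
        best_value, best_rmse = cand_value, cand_rmse
    return best_value, best_rmse


def toy_rmse(value):
    """Made-up RMSE curve with its minimum at 40 (illustration only)."""
    return 3.0 + 0.001 * (value - 40) ** 2


best_value, best_rmse = tune_parameter(toy_rmse, default=100, step=10,
                                       lower=10, upper=200)
```

A search of this kind finds a local optimum for one parameter at a time; it does not guarantee the best joint setting when parameters interact.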

### Model evaluation

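Model performance was evaluated using the root mean square error (RMSE), the mean absolute error (MAE), the Nash–Sutcliffe efficiency (NSE), and the correlation coefficient (*R*) (Moriasi *et al.* 2007; Khosravi *et al.* 2022b):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(T_{S,i}^{o} - T_{S,i}^{e}\right)^{2}}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|T_{S,i}^{o} - T_{S,i}^{e}\right|$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(T_{S,i}^{o} - T_{S,i}^{e}\right)^{2}}{\sum_{i=1}^{N}\left(T_{S,i}^{o} - \bar{T}_{S}^{o}\right)^{2}}$$

$$R = \frac{\sum_{i=1}^{N}\left(T_{S,i}^{o} - \bar{T}_{S}^{o}\right)\left(T_{S,i}^{e} - \bar{T}_{S}^{e}\right)}{\sqrt{\sum_{i=1}^{N}\left(T_{S,i}^{o} - \bar{T}_{S}^{o}\right)^{2}\,\sum_{i=1}^{N}\left(T_{S,i}^{e} - \bar{T}_{S}^{e}\right)^{2}}}$$

where $T_{S,i}^{o}$ and $T_{S,i}^{e}$ represent the observed and estimated daily soil temperature for the *i*th observation, *N* is the number of data points, and $\bar{T}_{S}^{o}$ and $\bar{T}_{S}^{e}$ are the mean measured and mean estimated *T _{S}*. NSE is a dimensionless metric that varies between −∞ and 1; the optimum model's NSE = 1. The lower the RMSE and the MAE and the higher the *R*, the better the model's prediction power (Osman *et al.* 2022).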
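For readers who wish to reproduce the evaluation, the four criteria can be computed as in the sketch below. It assumes the standard definitions of RMSE, MAE, NSE, and Pearson's *R*; the function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(observed, estimated):
    """Return RMSE, MAE, NSE and R for paired observed/estimated values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n

    sq_err = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    abs_err = sum(abs(o - e) for o, e in zip(observed, estimated))
    ss_o = sum((o - mean_o) ** 2 for o in observed)   # spread of observations
    ss_e = sum((e - mean_e) ** 2 for e in estimated)  # spread of estimates
    cov = sum((o - mean_o) * (e - mean_e)
              for o, e in zip(observed, estimated))

    return {
        "RMSE": math.sqrt(sq_err / n),
        "MAE": abs_err / n,
        "NSE": 1.0 - sq_err / ss_o,
        "R": cov / math.sqrt(ss_o * ss_e),
    }


# Made-up soil temperatures in degrees Celsius (illustration only).
observed = [10.0, 20.0, 30.0]
estimated = [12.0, 18.0, 33.0]
metrics = evaluate(observed, estimated)
```

Note that NSE penalizes bias as well as scatter, so a model can have a high *R* and still a modest NSE if its estimates are systematically shifted.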

## RESULTS

This study compared three ensemble machine-learning models for forecasting soil temperature (*T _{S}*) at depths of 5 and 50 cm. The models, V-M5P, V-RF, and V-RT, were trained and then evaluated using four performance metrics: RMSE, MAE, NSE, and *R*. The results are examined for each depth in turn.

### Input variable effectiveness

Based on the correlation results (Table 3), *T _{M}* has the highest impact on *T _{S}* for all three forecasting horizons at both depths, while *S _{R}* is the variable with the lowest impact. In addition, input scenario No. 2, which involves *T _{M}* and *T _{X}*, was identified as the most effective input scenario, having the lowest RMSE value. Only the results from this best input scenario are considered for further modelling and analysis.

### 1-, 2-, and 3-day forecasts of *T*_{S} at 5 cm

For the 1-day forecast of *T _{S}*, RMSE ranged from 3.31 to 4.38 °C (mean 3.697 °C) and MAE ranged from 2.53 to 3.35 °C (mean 2.83 °C). The lowest errors were produced by V-RF; V-M5P generated the second-best results, and V-RT produced the highest errors and was therefore the least accurate. The differences between V-RF and V-M5P were slight: their *R* values were equal and their NSE values nearly so. V-RT, however, was clearly worse: relative to V-RT, RMSE and MAE were approximately 22.37 and 22.09% lower for V-M5P and 24.43 and 24.48% lower for V-RF (Table 4). V-RT also produced the lowest *R* and NSE values, 0.91 and 0.80. The scatterplot of measured versus 1-day forecasted *T _{S}* at 5 cm (Figure 6) and the corresponding time variation plots (Figure 7) exhibit the superiority of V-RF. The results for the 2-day forecasts are essentially the same (Table 4, Figures 6 and 7). The best fit was obtained with V-RF and V-M5P (*R* = 0.95 and NSE = 0.88), while the lowest errors were produced by the V-RF model (RMSE = 3.34 °C and MAE = 2.56 °C). The V-RT model remained the least effective in terms of both fit (*R* = 0.91 and NSE ≈ 0.81) and errors (RMSE = 4.37 °C and MAE = 3.31 °C).
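The percentages quoted above appear to be relative differences computed against the V-RT errors. The sketch below reproduces them from the 1-day values in Table 4, assuming the formula 100 × (reference − value) / reference; the function name `percent_lower` is hypothetical.

```python
def percent_lower(reference, value):
    """Percentage by which `value` is lower than `reference`."""
    return 100.0 * (reference - value) / reference


# 1-day forecast errors at 5 cm depth (Table 4), in degrees Celsius.
rmse_1day = {"V-M5P": 3.40, "V-RF": 3.31, "V-RT": 4.38}
mae_1day = {"V-M5P": 2.61, "V-RF": 2.53, "V-RT": 3.35}

for model in ("V-M5P", "V-RF"):
    rmse_gain = percent_lower(rmse_1day["V-RT"], rmse_1day[model])
    mae_gain = percent_lower(mae_1day["V-RT"], mae_1day[model])
    print(f"{model}: RMSE {rmse_gain:.2f}% lower, "
          f"MAE {mae_gain:.2f}% lower than V-RT")
```

Under this assumption, 22.37 and 22.09% correspond to V-M5P versus V-RT, and 24.43 and 24.48% correspond to V-RF versus V-RT.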

Table 4. Performance of the models for 1-, 2-, and 3-day forecasts of *T _{S}* at 5 cm depth.

| Algorithm | 1-day *R* | 1-day NSE | 1-day RMSE (°C) | 1-day MAE (°C) | 2-day *R* | 2-day NSE | 2-day RMSE (°C) | 2-day MAE (°C) | 3-day *R* | 3-day NSE | 3-day RMSE (°C) | 3-day MAE (°C) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| V-M5P | 0.95 | 0.88 | 3.40 | 2.61 | 0.95 | 0.88 | 3.46 | 2.64 | 0.91 | 0.81 | 4.26 | 3.19 |
| V-RF | 0.95 | 0.89 | 3.31 | 2.53 | 0.95 | 0.88 | 3.34 | 2.56 | 0.93 | 0.85 | 3.84 | 2.79 |
| V-RT | 0.91 | 0.80 | 4.38 | 3.35 | 0.91 | 0.81 | 4.37 | 3.31 | 0.91 | 0.80 | 4.44 | 3.36 |

The 3-day forecasts showed decreased performance for V-RF and V-M5P and essentially no change in performance for V-RT (Table 4, Figures 6 and 7). V-RF's RMSE and MAE were 9.86 and 12.54% lower than those of V-M5P, and 13.51 and 16.96% lower than those of V-RT. Comparing the models' 1- and 3-day forecasts, V-M5P's values for all metrics worsened: *R* by 4.21%, NSE by 7.95%, RMSE by 20.18%, and MAE by 18.18%. V-RF's scores worsened at lower rates: *R* by 2.10%, NSE by 4.49%, RMSE by 13.80%, and MAE by 9.32%.

### 1-, 2-, and 3-day forecasts of *T*_{S} at 50 cm

In contrast to the forecasts of *T _{S}* at the 5-cm depth, the forecasts of *T _{S}* at the 50-cm depth improved from the 1-day to the 3-day horizon in terms of both error and fit (Table 5, Figures 8 and 9). Mean RMSE and MAE improved by 4.30 and 3.20%, and mean *R* and NSE improved slightly. The 1-day forecast exhibits the deficiency of V-RT compared to V-M5P and V-RF: relative to V-RT, RMSE and MAE were 16.35 and 21.05% lower for V-M5P and 20.44 and 25.91% lower for V-RF. The best accuracy was achieved by V-RF, which had the highest *R* and NSE values (≈0.94 and ≈0.86). The 2-day forecasts made by all three models were less accurate than both the 1-day and 3-day forecasts. V-RF produced the most accurate 2-day forecast, with *R* = 0.93 and NSE = 0.85 and the lowest errors (RMSE = 2.62 °C and MAE = 1.91 °C). Statistics for the 3-day forecasts show that V-RF had the least error, with 9.72% better RMSE and 9.42% better MAE than V-M5P. V-RT had the largest errors: V-RF's RMSE and MAE were 25.64 and 28.21% lower than V-RT's (Table 5, Figures 8 and 9). V-RF and V-M5P fitted the observations almost equally well, with equal *R* values and NSE values that differed only slightly (0.87 versus 0.85), and both were better than V-RT.

Table 5. Performance of the models for 1-, 2-, and 3-day forecasts of *T _{S}* at 50 cm depth.

| Algorithm | 1-day *R* | 1-day NSE | 1-day RMSE (°C) | 1-day MAE (°C) | 2-day *R* | 2-day NSE | 2-day RMSE (°C) | 2-day MAE (°C) | 3-day *R* | 3-day NSE | 3-day RMSE (°C) | 3-day MAE (°C) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| V-M5P | 0.93 | 0.84 | 2.66 | 1.95 | 0.93 | 0.84 | 2.74 | 2.02 | 0.94 | 0.85 | 2.57 | 1.91 |
| V-RF | 0.94 | 0.86 | 2.53 | 1.83 | 0.93 | 0.85 | 2.62 | 1.91 | 0.94 | 0.87 | 2.32 | 1.73 |
| V-RT | 0.90 | 0.78 | 3.18 | 2.47 | 0.89 | 0.78 | 3.21 | 2.50 | 0.90 | 0.79 | 3.12 | 2.41 |

## DISCUSSION

In this paper, we present a new method to forecast *T _{S}* at two different depths and apply it to *T _{S}* measured in the Isfahan region of Iran. The method is based on a comparison of three single models and their hybrid counterparts built with the vote algorithm, i.e., V-M5P, V-RF, and V-RT. It was inspired by several previous studies in which *T _{S}* was modelled using meteorological variables, especially the mean, maximum, and minimum air temperatures. As noted in the results section, *T _{S}* was mostly influenced by air temperature, whereas combining the six meteorological variables produced different results at different depths and forecasting horizons. Compared to the results reported in the literature, our *T _{S}* estimations are consistent and encouraging. Note that air temperature alone appears to be sufficient for an accurate prediction of soil temperature, as previously confirmed at the same site using the same data by Sattari *et al.* (2017) and Khosravi *et al.* (2022a). Comparing our numerical results with those reported in the literature, the inclusion of two temperatures, i.e., the mean and the maximum, gives a lower error at all depths and all forecasting horizons. Moreover, given the high correlation coefficients reported in Table 3, both variables can be considered to have a positive effect on soil temperature forecasting. In addition, this modelling study demonstrates the importance of the vote algorithm in improving model performance. Among the proposed methods, V-RF performed better than V-M5P and V-RT for the 1-, 2-, and 3-day forecasts. Additionally, the proposed vote method produced lower forecast errors (RMSE and MAE) at 50 cm depth than at 5 cm depth for all three forecasting horizons. Overall, the results show that the hybrid vote method proposed in the present study can forecast soil temperature with NSE values as high as approximately 0.89.

## CONCLUSION

This study investigated the use of three ensemble machine-learning models for 1-, 2-, and 3-day forecasts of *T _{S}* at depths of 5 and 50 cm. The three models compared were V-M5P, V-RF, and V-RT. The novelty of this study is the use of the vote ensemble algorithm. The *T _{S}* forecasting models were validated using *in situ* measurements from one station. The model inputs included daily air temperature measurements (i.e., minimum, maximum, and mean), solar radiation, evaporation, and sunshine hours. Though each model forecasted *T _{S}* accurately, demonstrating that each could be a powerful tool, the accuracy of the forecasts varied from one model to another and with the forecasting horizon. At the depth of 5 cm, the error statistics of all three models increased as the forecast horizon increased from 1 to 3 days: mean RMSE and MAE for the 1-day forecasts were 3.697 and 2.830 °C but increased by 11.56 and 9.10% in the 3-day forecast. This increase in error did not appear in the *T _{S}* forecasts at 50 cm depth, where the forecasts of the three models improved with increasing forecast horizon: RMSE and MAE decreased in the 3-day forecast by 4.30 and 3.20%, respectively. Though the performance metrics of V-M5P and V-RF were similar (V-RF showed slight superiority), both exceeded the performance of V-RT. The strong performances of all three models, however, depend largely on the input variables included in the models, and the number of variables may be relatively high. It would be beneficial to test models that could achieve similar or better results using fewer input variables. It is also highly recommended that the effectiveness of other potential input variables that may have a high impact on *T _{S}* prediction, such as atmospheric pressure, precipitation, relative humidity, and wind speed, be investigated.

## ACKNOWLEDGEMENTS

The publication has been prepared with the support of the RUDN University Strategic Academic Leadership Program.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.