## Abstract

Accurate forecasting of carbon dioxide (CO_{2}) emissions plays a significant role in China's carbon peaking and carbon neutrality policies. A novel two-stage forecast procedure based on support vector regression (SVR), random forest (RF), ridge regression (Ridge), and artificial neural network (ANN) is proposed and evaluated against the single-stage forecast procedure. Data on nine independent variables (study period: 1985–2020) are used to forecast CO_{2} emissions in China. Our results reveal that, as the time gap *h* increases from 1 to 8, the average root mean squared error (RMSE) and mean absolute error (MAE) of SVR–SVR, SVR–RF, SVR–Ridge, and SVR–ANN are almost uniformly lower than the errors of their respective single-stage versions. Among the two-stage models, SVR–ANN exhibits the lowest forecast errors, whereas SVR–RF exhibits the highest. The mean percentage decreases in forecast errors of SVR–SVR vs. SVR, SVR–RF vs. RF, SVR–Ridge vs. Ridge, and SVR–ANN vs. ANN are 36.06, 5.98, 43.05, and 14.81% for RMSE, and 36.06, 6.91, 43.27, and 15.35% for MAE, respectively. The two-stage procedure is also suitable for forecasting other variables, such as fossil fuel and renewable energy consumption.

## HIGHLIGHTS

A novel two-stage forecast procedure is proposed and evaluated.

Four hybrids of machine learning models based on SVR, RF, Ridge, and ANN are constructed to provide an accurate forecast of CO_{2} emissions in China.

SVR–ANN gives the lowest forecast errors in terms of RMSE and MAE.

SVR–Ridge shows the highest performance improvement over Ridge.

### Graphical Abstract

## INDEX OF NOTATIONS AND ABBREVIATIONS

- ANFIS
adaptive neuro-fuzzy inference system

- ANN
artificial neural network

- ARIMA
autoregressive integrated moving average

- BLSTM
bi-directional long short-term memory

- CART
Classification And Regression Tree

- CO_{2}
carbon dioxide

- CV
coefficient of variation

- G20
Group of Twenty

- GDP
gross domestic product

- GM
grey model

- HHO
Harris hawk optimization

- KF
Kalman filter

- KLS
acronym of KF, LSTM, and SVM

- KKT
Karush–Kuhn–Tucker

- LSTM
long short-term memory

- MAE
mean absolute error

- MAPE
mean absolute percentage error

- ML
machine learning

- RBF
radial basis function

- ReLU
rectified linear unit

- RF
random forest

- Ridge
ridge regression

- RMSE
root mean squared error

- SD
standard deviation

- SOM
self-organizing map

- SVD
singular value decomposition

- SVM
support vector machine

- SVR
support vector regression

## INTRODUCTION

According to the Global Carbon Budget 2021 (Friedlingstein *et al.* 2022), carbon dioxide (CO_{2}) emitted into the atmosphere by human activities keeps growing at an unprecedented rate. Of all CO_{2} emissions during the last 70 years, 70% were emitted after the 1960s, and 33% were emitted in this millennium. The astronomical amount of CO_{2} emitted is exacerbating global warming. Consequently, considerable impacts on the ecological environment are introduced, including effects on crop production, rainfall, water scarcity, and ocean acidification, which, in turn, cause severe harm to human beings (Valipour 2017; Bhatt & Hossain 2019; Rehman *et al.* 2021).

In the last 10 years (2012–2021), around 89% of total anthropogenic CO_{2} emissions originated from fossil fuel combustion (Friedlingstein *et al.* 2022). Consumption of fossil fuels, such as coal, oil, and natural gas, is a substantial factor that affects CO_{2} emissions. Therefore, it is imperative to promote the transformation of the energy consumption structure from fossil fuels to renewable energy such as hydropower (Kuriqi *et al.* 2019, 2020, 2021; Malka *et al.* 2022), solar energy (Rabaia *et al.* 2021; Djaafari *et al.* 2022), and wind energy (Sadorsky 2021).

China takes its share of responsibility in tackling the climate change issue. At the 75th United Nations General Assembly in 2020, China stated its dual carbon commitment to achieve carbon peaking before 2030 and carbon neutrality before 2060. To make informative decisions in achieving these ambitious goals, it is necessary to make an accurate forecast of CO_{2} emissions coming from fossil fuel consumption.

### Literature review of forecasting models

There is rich literature addressing the issue of CO_{2} emission forecasting, with various forecasting methods proposed. Classical methods such as linear regression (Al-Mulali *et al.* 2016; Wang *et al.* 2019), grey models (Lu *et al.* 2009; Lin *et al.* 2011; Lotfalipour *et al.* 2013), and time-series models, e.g., the autoregressive integrated moving average (ARIMA) (Nyoni & Bonga 2019; Yang & O'Connell 2020; Ning *et al.* 2021), are frequently used. It is widely recognized that the performance of linear regression can be compromised by violation of the independence or normality assumptions (Gallo *et al.* 2014). Grey models focus on dealing with small data and poor information, and their forecasting performance may degrade when there are many independent variables or observations. Moreover, the often-used grey model GM(1,1), together with ARIMA, is by definition unworkable for this forecasting task when multiple independent variables are included.

In contrast, machine learning (ML) methods are more flexible and show great potential in modeling nonlinear and complex patterns. For example, Stamenković *et al.* (2015) use a back-propagation neural network and a general regression neural network to forecast emissions of methane, observing that both neural network models are more effective than multiple linear regression. Saleh *et al.* (2016) utilize support vector regression (SVR) to predict CO_{2} emissions based on energy consumption in Yogyakarta and achieve a low prediction error. Acheampong & Boateng (2019) establish an artificial neural network (ANN) model to forecast the carbon emission intensity for Australia, Brazil, China, India, and the USA. Their study also identifies the variable to which each country's carbon emission intensity is most sensitive.

Many studies suggest that a combination of two or more models exhibits better forecasting performance than a single model in various applications. For instance, Mardani *et al.* (2020) propose a multi-stage method to forecast CO_{2} emissions in G20 (Group of Twenty) countries. They use singular value decomposition (SVD) to forecast missing data, a self-organizing map (SOM) to cluster data, and ANN and adaptive neuro-fuzzy inference system (ANFIS) models to forecast CO_{2} emissions based on gross domestic product (GDP) and energy consumption. Compared against methods using only ANN, ANFIS, or multiple linear regression, the multi-stage approach yields the lowest forecast error. Wang & Zhu (2021) use the Johansen cointegration test and the neural network autoregression model to forecast China's CO_{2} emissions based on assumptions of three levels of GDP growth rate, which result in different increases in natural gas consumption and production. Morshed-Bozorgdel *et al.* (2022) find that a two-level ensemble of ML algorithms captures the high variation in wind speed. Farzin *et al.* (2022) use a bi-directional long short-term memory (BLSTM) model, with the Harris hawk optimization (HHO) algorithm tuning its hyperparameters, to forecast the underground water table in Iran. They find that their BLSTM–HHO method is superior to benchmark methods on assessment criteria such as mean absolute error (MAE), root mean squared error (RMSE), and forecast variance.

Li (2020) develops the KLS (acronym of KF, LSTM, and SVM) method, a fusion of the Kalman filter (KF), long short-term memory (LSTM), and support vector machine (SVM), to forecast China's carbon emissions, with ridge regression (Ridge) used to select independent variables. The KLS method integrates time-series forecasting and variable selection and has been confirmed to be more accurate than four existing methods. Geevaretnam *et al.* (2022) apply three ML models, namely random forest (RF), SVM, and ANN, to forecast global CO_{2} emissions and compare forecast performance in terms of MAE, RMSE, and mean absolute percentage error (MAPE). Their results reveal that SVM produces the lowest forecast errors.

Accordingly, we will consider a hybrid of ML models to improve the forecast accuracy of CO_{2} emissions in China.

### Contributions

We observe that, in the literature on CO_{2} emission forecasting, few studies explicitly discuss the time domain of their dependent and independent variables (Saleh *et al.* 2016; Acheampong & Boateng 2019; Mardani *et al.* 2020). It is therefore possible that some of them train models on data sets of the form {(*x*_{t}, *y*_{t})}_{t=1,…,T}, where *T* is the observation length of the time series. Then, for the testing set, the value of the independent variables at time *t* + *h*, namely *x*_{t+h}, is substituted into the trained model to forecast the value of the dependent variable at time *t* + *h*, yielding *ŷ*_{t+h}. Such a procedure is suitable when investigating the dynamic links between independent and dependent variables, but it is inappropriate and unnecessary for forecasting CO_{2} emissions from independent variables in the same time domain. This is because the released data of production-based CO_{2} emissions are calculated by multiplying activity data (i.e. fuel consumption data in the industrial sector) by the corresponding emission factors for each fuel type (Liu *et al.* 2020; Friedlingstein *et al.* 2022). That means that when the independent variables, i.e. the activity data at time *t* + *h*, are released, the CO_{2} emission data at time *t* + *h* are also released.

In practical applications, policymakers are more interested in forecasting CO_{2} emissions at time *t* + *h* from independent variables at time *t*. That is, models should be trained on data sets of the form {(*x*_{t}, *y*_{t+h})}_{t=1,…,T−h}, where the forecasted value *ŷ*_{t+h} is obtained from a given *x*_{t}. Because there is a time gap *h* in the dependent variable, data sets of this form will be called lagged data sets hereafter. For instance, Hou *et al.* (2022) aim to forecast China's short-term CO_{2} emissions at time *t* + *w* via independent variables at time *t*. They convert multivariate time-series data into labeled sample data using the sliding window method and test different shallow learning approaches to find the best one. Faruque *et al.* (2022) also train several deep-learning models on lagged data sets in the second step of their procedure to forecast Bangladesh's CO_{2} emissions.

We will explore the functional relationship between *x*_{t} and *y*_{t+h}. However, when forecasting CO_{2} emissions at time *t* + *h* using independent variables at time *t*, the forecast errors may increase compared with using independent variables at time *t* + *h*. To our knowledge, no work has focused on lowering the forecast errors caused by the time gap *h* in the CO_{2} emission forecasting field. We propose a novel two-stage forecast procedure, adapting the procedure in Patel *et al.* (2015) with the necessary modifications to reduce the forecast errors. The first stage forecasts the independent variables at time *t* + *h* from their historical values up to time *t*, i.e. *x̂*_{t+h} is calculated from *x*_{1}, …, *x*_{t}. The second stage forecasts CO_{2} emissions at time *t* + *h* from the forecasted independent variables at time *t* + *h*, i.e. *ŷ*_{t+h} = *f*(*x̂*_{t+h}). We make an effort to contribute to the existing knowledge in the following ways:

Clarify the necessity to train models on lagged data sets when forecasting CO_{2} emissions.

Propose a novel two-stage forecast procedure based on ML models to reduce the forecast errors caused by the time gap in the dependent variable of lagged data sets. It is shown that the four hybrids of ML models in the two-stage procedure almost all have lower forecast errors than the corresponding individual ML models in the single-stage procedure.

Develop a forecasting tool for CO_{2} emissions from a real-case data set in China. A model with high accuracy could provide a guiding tool for the formulation of China's carbon emission reduction policy.

There are several novelties of the proposed two-stage forecast procedure compared to that of Patel *et al.* (2015):

The historical values of the independent variables are used to forecast their future counterparts in the first stage. Thus, this method introduces no exogenous variables to update the independent variables and greatly simplifies the forecasting procedure of Patel *et al.* (2015).

Patel *et al.* (2015) forecast stock prices in the financial field, while we apply a two-stage forecast procedure based on ML models to data sets dealing with macroeconomics and climatology, expanding the application domain of the two-stage forecast procedure.

Four ML models, i.e. SVR, RF, Ridge, and ANN, are applied in this article, while Patel *et al.* (2015) only tried SVR, RF, and ANN.

## DATA AND EVALUATION MEASURES

Nine variables related to CO_{2} emissions are chosen as independent variables, as shown in Table 1. Raw data of these 10 time series, i.e. the nine independent variables as well as CO_{2} emissions, are in annual form ranging from 1985 to 2020. To develop accurate models, the Chow–Lin method (Chow & Lin 1971) is used to convert the annual data into quarterly form, obtaining 144 quarterly samples for each variable; the sum of every four quarterly samples equals the raw annual observation. Hence, the quarterly data ranging from 1985Q1 to 2020Q4 are used for this study. Each variable *X* is min–max normalized as below:

*X*′ = (*X* − *X*_{min})/(*X*_{max} − *X*_{min})
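As a concrete illustration of the min–max normalization above, the following sketch applies the formula in Python. The three input values are illustrative only (the minimum, mean, and maximum of the CO_{2} series), not the actual quarterly data:

```python
import numpy as np

def min_max_normalize(x):
    """Min-max normalization: x' = (x - min(x)) / (max(x) - min(x))."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Illustrative values only (min, mean, max of CO2 emissions, million tons)
co2 = np.array([495.59, 1432.60, 2675.07])
print(min_max_normalize(co2))  # smallest value maps to 0, largest to 1
```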

| Variable names | Variable explanations and units | Data sources |
|---|---|---|
| CO_{2} emissions | Total production-based emissions of carbon dioxide, including fossil fuels and cement production, excluding land-use change (million tons). | Global Carbon Project^{a} |
| Foreign direct investment | The net inflows of investment to acquire a lasting management interest in an enterprise operating in an economy other than that of the investor. This series shows net inflows (new investment inflows less disinvestment) from foreign investors, divided by GDP (%). | The World Bank^{b} |
| Industry, value added | Value added in industrial activity, comprising mining, manufacturing, construction, electricity, water, and gas, divided by GDP (%). | The World Bank |
| Trade | The sum of exports and imports of goods and services measured as a share of GDP (%). | The World Bank |
| Urban population | Proportion of urban population to total population (%). | The World Bank |
| GDP per capita | Gross domestic product divided by total population (current US$). | The World Bank |
| Energy consumption | Total energy consumption, including coal, oil, natural gas, primary electricity, and other energy sources (10,000 tons of standard coal). | China Statistical Yearbook^{c} |
| Coal | Proportion of coal consumption to total energy consumption (%). | China Statistical Yearbook |
| Oil | Proportion of oil consumption to total energy consumption (%). | China Statistical Yearbook |
| Gas | Proportion of natural gas consumption to total energy consumption (%). | China Statistical Yearbook |


The unit of each variable is marked in parentheses.

Figures 1 and 2 display the time series of the nine independent variables and CO_{2} emissions. Urban population, GDP per capita, energy consumption, and gas all have an approximately monotonically increasing trend similar to that of CO_{2} emissions. This phenomenon may affect the forecast accuracy of the RF method, and we will discuss this problem in the Results and discussion section. Table 2 also summarizes descriptive statistics of each variable.

| Variable | Count | Min | Max | Mean | SD | CV |
|---|---|---|---|---|---|---|
| CO_{2} emissions (million tons) | 144 | 495.59 | 2,675.07 | 1,432.60 | 782.79 | 0.545 |
| Foreign direct investment (%) | 144 | 0.13 | 1.69 | 0.74 | 0.39 | 0.635 |
| Industry, value added (%) | 144 | 9.44 | 11.92 | 11.03 | 0.68 | 0.416 |
| Trade (%) | 144 | 4.85 | 16.18 | 9.92 | 2.97 | 0.584 |
| Urban population (%) | 144 | 5.69 | 15.41 | 10.11 | 3.01 | 0.679 |
| GDP per capita (current US$) | 144 | 62.06 | 2,624.95 | 787.73 | 849.54 | 1.172 |
| Energy consumption (10,000 tons of standard coal) | 144 | 19,022.43 | 124,913.29 | 61,547.91 | 35,913.87 | 0.844 |
| Coal (%) | 144 | 14.17 | 19.07 | 17.54 | 1.37 | 0.393 |
| Oil (%) | 144 | 4.06 | 5.53 | 4.58 | 0.39 | 0.764 |
| Gas (%) | 144 | 0.44 | 2.12 | 0.86 | 0.49 | 1.194 |


SD, standard deviation; CV, coefficient of variation.

## METHODOLOGY

### Model building blocks

Four supervised ML models, namely SVR, RF, Ridge, and ANN, are introduced as the building blocks of our forecast procedure. Denote a training set as {(**x**_{i}, *y*_{i})}_{i=1,…,n}, where **x**_{i} ∈ ℝ^{D}, *D* is the number of features, and *n* is the sample size.

#### Support vector regression

The value of the ε-insensitive loss function in Equation (4) is zero when the deviation between the observed value and the forecasted value is no greater than ε. Equation (4) makes SVR robust to outliers and can prevent overfitting to some extent. It has also been verified that SVR performs excellently in regression and time-series forecasting applications (Smola & Schölkopf 2004).

Here, **w** and *b* represent the weights and bias of the hyperplane *f*(**x**) = **w**·**x** + *b*. The regularization parameter *C* determines the trade-off between the flatness of *f* and the value of the ε-insensitive loss function, and ξ_{i}, ξ_{i}* are slack variables. The optimization problem expressed in (5) and (6) is then transformed into a dual optimization problem by the Lagrange multiplier method. The parameters are calculated by the sequential minimal optimization algorithm (Platt 1998) and the KKT (Karush–Kuhn–Tucker) conditions (Bishop & Nasrabadi 2006). Finally, the hyperplane is represented as *f*(**x**) = Σ_{i}(α_{i} − α_{i}*)*K*(**x**_{i}, **x**) + *b*, where *K*(·,·) is the kernel function and α_{i}, α_{i}* are Lagrange multipliers. The polynomial kernel *K*(**x**, **z**) = (**x**·**z** + 1)^{d} and the radial basis function (RBF) kernel *K*(**x**, **z**) = exp(−γ‖**x** − **z**‖^{2}) are used as alternative kernel functions for training.

After setting ε, a grid search method (Mselmi *et al.* 2017; Papouskova & Hajek 2019) is conducted to determine the optimal parameters that give the smallest test error by trying all possible value combinations in Table 3.

| Parameters | Tested values |
|---|---|
| Kernel function | Polynomial kernel, RBF |
| Polynomial kernel degree, *d* | {1, 2, 3, 4, 5} |
| Regularization parameter, *C* | 100 numbers equally spaced in [0.01, 10] |

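The grid search over Table 3 can be sketched with scikit-learn as below. The data are synthetic stand-ins for the nine normalized predictors; ε = 0.1 and the 20-point *C* grid (thinned from the paper's 100 points for speed) are our assumptions:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.random((144, 9))           # stand-in for the nine normalized predictors
y = X @ rng.random(9) + 0.05 * rng.standard_normal(144)

# Candidate grid mirroring Table 3 (C grid thinned to 20 points for speed)
param_grid = [
    {"kernel": ["poly"], "degree": [1, 2, 3, 4, 5], "C": np.linspace(0.01, 10, 20)},
    {"kernel": ["rbf"], "C": np.linspace(0.01, 10, 20)},
]
search = GridSearchCV(SVR(epsilon=0.1), param_grid,
                      cv=TimeSeriesSplit(n_splits=4),   # chronological folds
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)
print(search.best_params_["kernel"], round(float(search.best_params_["C"]), 2))
```

Using `TimeSeriesSplit` inside the search keeps the chronological order of the samples, consistent with the cross-validation scheme described in the Methodology section.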

#### Random forest

Ensemble learning combines the power of multiple ML models trained on the same set of observations. Therefore, ensemble models can significantly improve forecast accuracy compared to individual ML models. RF is one of the most popular decision-tree-based ensemble models (Ramasubramanian & Singh 2017) due to its high forecasting accuracy (Boulesteix *et al.* 2012; Belgiu & Drăguţ 2016; Ouedraogo *et al.* 2019). RF randomly fits multiple decision trees and then takes the average of the trees' forecasts as the forest's forecast. The RF used in our research is implemented as follows:

1. Create bootstrap samples by sampling *n* elements from a given training set with replacement.
2. Use the Classification And Regression Tree (CART) algorithm (Loh 2011) to train a decision tree with randomly selected features on each bootstrap sample.
3. Calculate the forecasted value of each trained decision tree on the test set and average them as the ultimate forecasted value.
The number of features and the number of decision trees are hyperparameters that need to be determined through experiments. A comprehensive set of experiments is carried out by varying the parameter values as shown in Table 4.

| Parameters | Tested values |
|---|---|
| Number of features | {3, 4, 5, 6, 7, 8, 9} |
| Number of decision trees | {10, 20, 30, …, 200} |

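A sketch of the hyperparameter search in Table 4 using scikit-learn's `RandomForestRegressor`, which internally performs the bootstrap sampling, CART fitting, and averaging of steps 1–3 above. The data are synthetic, and the tree grid is thinned from {10, …, 200} to keep the demo fast:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.random((132, 9))
y = X[:, :4].sum(axis=1) + 0.1 * rng.standard_normal(132)
X_train, X_test = X[:120], X[120:]     # chronological split, no shuffling
y_train, y_test = y[:120], y[120:]

best = None
for m in range(3, 10):                 # number of features tried per split
    for n_trees in range(10, 51, 10):  # thinned from {10,...,200} for speed
        rf = RandomForestRegressor(n_estimators=n_trees, max_features=m,
                                   bootstrap=True, random_state=0)
        rf.fit(X_train, y_train)
        rmse = float(np.sqrt(np.mean((rf.predict(X_test) - y_test) ** 2)))
        if best is None or rmse < best[0]:
            best = (rmse, m, n_trees)
print(best)  # (lowest test RMSE, best number of features, best number of trees)
```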

#### Ridge regression

Ridge adopts a regularized loss function that shrinks the linear regression coefficients, reducing the variance but increasing the bias of the coefficient estimators. Compared to linear regression, this method can avoid overfitting, and the ridge estimator is particularly effective at enhancing the least-squares estimate when there is multicollinearity (Arashi *et al.* 2019).

The ridge estimator is **β̂** = (**X**ᵀ**X** + λ**I**)⁻¹**X**ᵀ**y**, where **X** is the *n* × *D* design matrix whose *i*th row is **x**_{i}ᵀ, and **I** is a *D* × *D* identity matrix. The regularization parameter λ needs to be determined through experiments; 100 numbers equally spaced in [0.001, 10] are tested to find the optimal value of λ.
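A sketch comparing the closed-form ridge estimator (**X**ᵀ**X** + λ**I**)⁻¹**X**ᵀ**y** with scikit-learn's `Ridge` on synthetic data; λ = 0.5 is an arbitrary illustrative choice, not a tuned value:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.random((144, 9))
y = X @ np.arange(1.0, 10.0) + 0.01 * rng.standard_normal(144)

lam = 0.5
# Closed-form ridge estimator: (X'X + lam*I)^{-1} X'y
beta = np.linalg.solve(X.T @ X + lam * np.eye(9), X.T @ y)

# scikit-learn solves the same penalized least-squares problem
# when no intercept is fitted
model = Ridge(alpha=lam, fit_intercept=False).fit(X, y)
print(np.allclose(beta, model.coef_))
```

In practice, λ would be selected by testing the 100 candidate values in [0.001, 10], e.g. via `RidgeCV(alphas=np.linspace(0.001, 10, 100))`.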

#### Artificial neural network

The ANN consists of an input layer fed with the nine independent variables, two hidden layers, and an output layer that forecasts CO_{2} emissions. The two hidden layers are set to share the same activation function, and the output layer has no activation function. It has been theoretically verified that a three-layer neural network can approximate any continuous relationship with the desired precision (Han *et al.* 2019).
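This architecture maps directly onto scikit-learn's `MLPRegressor`, whose output layer is linear (no activation) by construction. The hidden-layer sizes below are illustrative assumptions, not the paper's configuration, and the data are synthetic:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.random((144, 9))               # nine normalized predictors
y = X.mean(axis=1)

# Two hidden layers sharing one activation; linear output layer
ann = MLPRegressor(hidden_layer_sizes=(16, 16), activation="relu",
                   max_iter=5000, random_state=0)
ann.fit(X, y)
print(ann.predict(X[:3]).shape)
```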

### Two-stage forecast procedure

The single-stage forecast procedure, with the nine independent variables at time *t* as input and CO_{2} emissions at time *t* + *h* as output, is illustrated in Figure 4. The functional relationship between input and output is fitted by the four ML models, respectively; that is, each of the four ML models is trained on lagged data sets.

The single-stage procedure does not explicitly take the time gap *h* into account. To reduce the forecast errors caused by the time gap, this study designs a novel two-stage forecast procedure, as shown in Figure 5.

The first stage forecasts each independent variable at time *t* + *h* using its historical values up to time *t*. This is completed through the sliding window method (Brownlee 2017), which builds a supervised learning frame: previous time steps serve as input variables and the next time step as the output variable, so that the chronological order is preserved. We set the window width to 3 based on previous research (Faruque *et al.* 2022) and reorganize the time sequence of each independent variable as shown in Figure 6. Taking *h* = 1 for example (*h* can be set to any positive integer), the number of columns of the input matrix is 3, and each value in the output is at the next time step after the last input element in the same row. Thus, we use the values at *t* − 2, *t* − 1, and *t* to forecast the value at *t* + 1.
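The sliding-window reorganization of Figure 6 can be written as a small helper; a sketch (the function name is ours):

```python
import numpy as np

def sliding_window(series, width=3, h=1):
    """Reorganize a univariate series so that the values at
    t-2, t-1, t (for width=3) forecast the value at t+h."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(width - 1, len(series) - h):
        X.append(series[t - width + 1 : t + 1])   # window ending at time t
        y.append(series[t + h])                   # target at time t+h
    return np.array(X), np.array(y)

X, y = sliding_window(np.arange(10), width=3, h=1)
print(X[0], y[0])  # [0. 1. 2.] 3.0
```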

Next, the outputs of the first stage, i.e. the forecasted values of nine independent variables at time *t* + *h*, serve as the inputs of the second stage in Figure 5. The output of the second stage is the forecasted value of CO_{2} emissions at time *t* + *h*, which is the same as the output of the single-stage procedure in Figure 4. This indicates that models used in the second stage need to fit the relationship between nine forecasted independent variables at time *t* + *h* and CO_{2} emissions at the same time domain.

Briefly, the first stage of the two-stage procedure updates the value of the independent variables at time *t* to time *t* + *h*, and the second stage forecasts CO_{2} emissions at time *t* + *h* through forecasted independent variables at time *t* + *h*.
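Putting the two stages together, a minimal sketch on synthetic data, using SVR for stage one and an MLP for stage two with illustrative settings; as a simplification, the stage-two model here is trained on contemporaneous pairs, standing in for the paper's fit between forecasted variables and emissions in the same time domain:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor

def two_stage_forecast(X_hist, y_hist, h=1, width=3):
    """Stage 1: one SVR per independent variable forecasts its value at t+h
    from its last `width` observations.  Stage 2: a model maps the
    forecasted variables at t+h to CO2 emissions at t+h."""
    n, d = X_hist.shape
    x_next = np.empty(d)
    for j in range(d):
        s = X_hist[:, j]
        # lagged pairs for variable j via the sliding window
        Xw = np.array([s[t - width + 1 : t + 1] for t in range(width - 1, n - h)])
        yw = np.array([s[t + h] for t in range(width - 1, n - h)])
        stage1 = SVR().fit(Xw, yw)
        x_next[j] = stage1.predict(s[-width:].reshape(1, -1))[0]
    stage2 = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000,
                          random_state=0).fit(X_hist, y_hist)
    return float(stage2.predict(x_next.reshape(1, -1))[0])

rng = np.random.default_rng(4)
X = rng.random((60, 9))
y = X.mean(axis=1)
print(two_stage_forecast(X, y, h=1))
```

Swapping the stage-two model class (SVR, RF, Ridge, or ANN) yields the four hybrids SVR–SVR, SVR–RF, SVR–Ridge, and SVR–ANN compared in the Results section.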

The time-series split method (Pedregosa *et al.* 2011) is applied to cross-validate the four hybrids of ML models in the two-stage procedure and the four individual ML models in the single-stage procedure. It is a variation of *k*-fold cross-validation that returns the first *k* folds as the training set and the (*k* + 1)th fold as the testing set. In this experiment, the data set is split into eight pairs of training and testing sets as shown in Figure 7. The yellow rectangles denote the training sets, followed by the blank rectangles that represent the testing sets. Testing indices are therefore higher than training indices in each split, and successive training sets are supersets of the ones that come before them. The sample size of each testing set is fixed at 12, and the indices of each training and testing set are annotated in Figure 7.

We build a model on each training set and test it on the corresponding testing set, respectively. The final forecast error is averaged over the forecast errors of the eight testing sets.
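The expanding-window splits of Figure 7 correspond to scikit-learn's `TimeSeriesSplit` (an assumption consistent with the description; the `test_size` argument requires scikit-learn ≥ 0.24):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.zeros((144, 9))                 # 144 quarterly samples, 9 predictors
tscv = TimeSeriesSplit(n_splits=8, test_size=12)
for i, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
    print(f"split {i}: train 0-{train_idx[-1]}, "
          f"test {test_idx[0]}-{test_idx[-1]}")
```

Each successive training set extends the previous one by 12 quarters, and every testing set holds the next 12 quarters, so no future observation ever leaks into training.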

## RESULTS AND DISCUSSION

Tables 5 and 6 report the average forecast errors of the single-stage and two-stage models as *h* increases from 1 to 8. Figures 8 and 9 present the contents of Tables 5 and 6 in line-chart form. In each figure, the horizontal axis is the number of periods *h*, and the vertical axis is the RMSE or MAE.

| Model | RMSE (h=1) | MAE (h=1) | RMSE (h=2) | MAE (h=2) | RMSE (h=3) | MAE (h=3) | RMSE (h=4) | MAE (h=4) |
|---|---|---|---|---|---|---|---|---|
| SVR | 0.0266 | 0.0233 | 0.0291 | 0.0252 | 0.0329 | 0.0286 | 0.0369 | 0.0323 |
| SVR–SVR | 0.0149 | 0.0123 | 0.0194 | 0.0169 | 0.0214 | 0.0188 | 0.0210 | 0.0184 |
| RF | 0.0934 | 0.0860 | 0.0871 | 0.0791 | 0.0932 | 0.0855 | 0.0928 | 0.0849 |
| SVR–RF | 0.0868 | 0.0789 | 0.0880 | 0.0799 | 0.0884 | 0.0805 | 0.0865 | 0.0780 |
| Ridge | 0.0243 | 0.0209 | 0.0236 | 0.0203 | 0.0291 | 0.0248 | 0.0341 | 0.0299 |
| SVR–Ridge | 0.0162 | 0.0140 | 0.0177 | 0.0149 | 0.0146 | 0.0125 | 0.0162 | 0.0141 |
| ANN | 0.0133 | 0.0109 | 0.0138 | 0.0122 | 0.0164 | 0.0143 | 0.0136 | 0.0116 |
| SVR–ANN | 0.0099 | 0.0087 | 0.0106 | 0.0091 | 0.0124 | 0.0101 | 0.0122 | 0.0103 |


| Model | RMSE (h=5) | MAE (h=5) | RMSE (h=6) | MAE (h=6) | RMSE (h=7) | MAE (h=7) | RMSE (h=8) | MAE (h=8) |
|---|---|---|---|---|---|---|---|---|
| SVR | 0.0414 | 0.0361 | 0.0451 | 0.0391 | 0.0497 | 0.0424 | 0.0522 | 0.0426 |
| SVR–SVR | 0.0282 | 0.0242 | 0.0304 | 0.0260 | 0.0321 | 0.0271 | 0.0390 | 0.0352 |
| RF | 0.0906 | 0.0824 | 0.0888 | 0.0807 | 0.0927 | 0.0833 | 0.0894 | 0.0803 |
| SVR–RF | 0.0867 | 0.0781 | 0.0821 | 0.0734 | 0.0813 | 0.0722 | 0.0853 | 0.0762 |
| Ridge | 0.0360 | 0.0312 | 0.0383 | 0.0325 | 0.0403 | 0.0339 | 0.0428 | 0.0357 |
| SVR–Ridge | 0.0170 | 0.0140 | 0.0189 | 0.0161 | 0.0207 | 0.0171 | 0.0251 | 0.0217 |
| ANN | 0.0160 | 0.0139 | 0.0153 | 0.0131 | 0.0164 | 0.0138 | 0.0173 | 0.0157 |
| SVR–ANN | 0.0154 | 0.0130 | 0.0148 | 0.0124 | 0.0148 | 0.0128 | 0.0160 | 0.0137 |


It is clear that SVR–RF vs. RF with *h* = 2 is the only scenario where the two-stage procedure shows a slightly larger average forecast error than the single-stage procedure. In all other cases, the average forecast errors of the two-stage procedure are smaller than those of the single-stage procedure.

Figures 8 and 9 reveal that, except for the RF and SVR–RF scenarios, the average forecast errors of all models display a slowly increasing trend as *h* increases. There is, however, an apparent reduction in forecast error in the two-stage procedure compared with the single-stage procedure, which indicates that the two-stage forecast procedure proposed in this study indeed improves the forecasting accuracy of CO_{2} emissions.

In addition, among these four two-stage models, SVR–ANN exhibits the smallest forecast error, whereas SVR–RF gives the greatest forecast error. The forecast errors of RF and SVR–RF show significant fluctuation as *h* increases. This can be explained by the characteristics of the RF algorithm, which consists of decision trees. According to Figures 1 and 2, four out of nine independent variables resemble the increasing trend of CO_{2} emissions. Therefore, the decision tree may be randomly fitted by a single independent variable during training, which is insufficient for the forecasting task. This also implies that models involving decision trees may not be suitable for the data in this paper.

The forecast errors of ANN and SVR–ANN remain close for *h* ranging from 1 to 8. This means that ANN already does a good job in the single-stage procedure, so there is little room left for improvement in the two-stage procedure. As for RF and SVR–RF, both exhibit high forecast errors, once again indicating that models involving decision trees are not appropriate for the data we considered, in either the single-stage or the two-stage procedure.

Finally, the error reduction rates of the four two-stage models relative to their corresponding single-stage models, with *h* ranging from 1 to 8, are averaged, resulting in Table 7. It indicates that the performance of SVR–Ridge and SVR–SVR is significantly improved over Ridge and SVR, respectively; the performance of SVR–ANN is moderately improved over ANN, and that of SVR–RF is slightly improved over RF. This demonstrates the advantages of the proposed two-stage procedure.

| Models | RMSE | MAE |
|---|---|---|
| SVR–SVR vs. SVR (%) | 36.06 | 36.06 |
| SVR–RF vs. RF (%) | 5.98 | 6.91 |
| SVR–Ridge vs. Ridge (%) | 43.05 | 43.27 |
| SVR–ANN vs. ANN (%) | 14.81 | 15.35 |

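The mean percentage decrease in Table 7 can be reproduced approximately from the tabulated errors; a sketch for the SVR–SVR vs. SVR RMSE column (the result differs slightly from the 36.06% in Table 7 because the table entries are rounded to four decimals):

```python
import numpy as np

# RMSE of SVR (single-stage) and SVR-SVR (two-stage) for h = 1..8,
# taken from the rounded entries of Tables 5 and 6
svr     = np.array([0.0266, 0.0291, 0.0329, 0.0369,
                    0.0414, 0.0451, 0.0497, 0.0522])
svr_svr = np.array([0.0149, 0.0194, 0.0214, 0.0210,
                    0.0282, 0.0304, 0.0321, 0.0390])

# Percentage decrease at each h, then averaged over h
mean_decrease = float(np.mean((svr - svr_svr) / svr) * 100)
print(round(mean_decrease, 2))
```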

In conclusion, SVR–Ridge gives the largest performance improvement over Ridge, but SVR–ANN has the lowest forecast error among all models. Therefore, SVR–ANN is an appropriate choice for forecasting China's CO_{2} emissions.

In addition to the hybrids of SVR with the four ML models, alternative hybrids, i.e. RF–SVR, RF–RF, RF–Ridge, RF–ANN; Ridge–SVR, Ridge–RF, Ridge–Ridge, Ridge–ANN; and ANN–SVR, ANN–RF, ANN–Ridge, ANN–ANN, are also tested in the study. The resulting figures of RMSE, MAE, and reduction rates are provided in the Supplementary Material. It is observed that the two-stage models all have better prediction performance than their corresponding single-stage models.

## CONCLUSION

Nine relevant independent variables have been adopted to forecast production-based CO_{2} emissions in China. In the literature, it is found that some studies may not train their models on lagged data sets, and some train models on lagged data sets but ignore the forecast errors caused by the time gap, *h*, in the dependent variable. We develop a novel two-stage procedure based on ML models in order to reduce the forecast errors identified. The hybrids of ML models in the two-stage procedure are compared against the corresponding individual models in the single-stage procedure.

Our experiments show that the average forecast errors of the four two-stage methods, SVR–SVR, SVR–RF, SVR–Ridge, and SVR–ANN, are almost consistently smaller than those of their single-stage versions. Other combinations of SVR, RF, Ridge, and ANN are also compared against the corresponding single-stage procedures, and our two-stage procedure again improves forecasting performance. An accurate forecast of CO_{2} emissions can help policymakers schedule carbon emission reduction plans more efficiently and promote the realization of the carbon peaking and carbon neutrality goals.

The window width used in the first stage of the two-stage procedure is fixed at 3; other widths may be worth trying to obtain more accurate forecasts. Since ML models have the advantage of handling large data sets with many inputs, another direction for future study is to consider more independent variables that affect CO_{2} emissions, such as the consumer price index and environmental policy stringency (Ahmed & Ahmed 2018).

Finally, the two-stage procedure is also suitable for other forecast tasks. For example, it is possible to forecast the consumption of the different types of fossil fuels and renewable energy listed above.

## DATA AVAILABILITY STATEMENT

All relevant data are available from an online repository or repositories.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES

*Theory of Ridge Regression Estimation with Applications*

CO_{2} emission

CO_{2} emissions in Iran using grey and ARIMA models

CO_{2} emissions in India using ARIMA models