ABSTRACT
Climate change, driven by greenhouse gas (GHG) emissions, causes extreme weather events that impact ecosystems, biodiversity, population health, and the economy. Predicting GHG emissions is crucial for mitigating these impacts and planning sustainable policies. This research proposes a novel machine learning model for GHG emissions forecasting. Our model, Meta-Learning Applied to a Multivariate Single-Step Fusion Model, uses historical GHG emissions from Brazil over the past 60 years to predict CO2 and CH4 emissions. The model employs a unique combination of two time series forecasting techniques: (i) in the Fusion Model, each substance is individually extracted and trained on a specific decision task and then integrated into the same feature space; (ii) Meta-Learning allows the model to learn from past prediction tasks, leading to better generalization. Our model was compared with state-of-the-art time series models on the same dataset. The results show that our approach reduces the mean absolute percentage error by 49.06% with 95% confidence compared to the Transformer-based TFT model, demonstrating superior performance with low estimated emissions of 0.01 kg CO2eq. Furthermore, the model's flexibility allows it to be adapted to other environmental studies and to general time series forecasting.
HIGHLIGHTS
Multivariate fusion model and meta-learning for GHG emissions forecasting.
BiLSTM for data extraction and Reptile for model optimization.
Analyses Brazil's CO2 and CH4 emissions over the last 60 years.
Summarizes the main Brazilian legislation for climate change.
Simple model architecture adaptable for other substances.
INTRODUCTION
Globalization promotes local economic growth and investment in emerging countries, helping to reduce poverty and hunger, stimulate education, scientific research, and the economy, and improve public health. Likewise, the growth of anthropogenic activities in one country contributes to increased greenhouse gas (GHG) emissions that can affect the climate in other regions or continents.
The effects of global warming are responsible for extreme heatwaves, floods, droughts, hurricanes, and biodiversity loss in several countries (Rolnick et al. 2022). However, emerging countries in the southern hemisphere, including Brazil, are the most affected by climate change (Zhang et al. 2023) due to the lack of sanitation infrastructure, urban planning, natural disaster mitigation measures, and investment in sustainable development.
A historic flash flood that occurred in the southern region of Brazil in 2008 affected the most vulnerable population in rural areas (Wink Junior et al. 2023). The extreme precipitation in the mountain region of Rio de Janeiro in 2011 caused landslides, resulting in irreparable losses for the most vulnerable population (Lopez et al. 2023).
Despite the economic difficulties and climate impacts suffered in recent years, Brazil is committed to contributing to the low-carbon emission agenda with the following goals: (i) achieving 45% renewable sources in the overall energy mix by 2030 (Werner & Lazaro 2023) and (ii) achieving net-zero emissions by 2050 (UNFCCC 2023). According to the EDGAR (Emissions Database for Global Atmospheric Research) data provided by the European Commission, Brazil's 2022 GHG emissions were 91.64% lower than China's, 78.22% lower than the United States', and 66.77% lower than India's, the countries that contribute most to GHG emissions (Crippa et al. 2021).
Moreover, its geographic, climatic, and natural resource diversity allowed Brazil to have an electricity mix that was 83% renewable in 2020, with hydropower accounting for 60.7% (De Toledo et al. 2023). However, due to climate variability such as low precipitation levels and droughts, the share of hydropower decreased to 53.4% in 2021 (Werner & Lazaro 2023).
Efforts to address climate change from a legal perspective are taking place in Brazil. Law 12187 (BRAZIL 2009b) establishes the National Policy on Climate Change (PNMC) providing guidelines to mitigate the global warming impacts, preserve ecosystems, and promote sustainable development. The Brazilian Congress is analyzing the bill PL 412/2022 (BRAZIL 2022c), which regulates the trade of carbon credits in Brazil. The approval of this bill could create legal instruments to encourage transactions with carbon assets, contributing to the country's sustainable development.
In addition to creating the National Policy on Climate Change and regulating carbon credit trade, it is essential to monitor GHG emissions. Joint monitoring and regulatory action can allow Brazil to achieve the target of a 45% renewable share in the overall energy mix by 2030 and keep contributing to the low-carbon agenda. GHG forecasting can help analyze, plan, and adopt a sustainable agenda to promote development, encourage carbon credit policy, and mitigate the impacts of climate change.
The problem of forecasting GHG emissions can be addressed through time series modeling, which has been applied in various fields. Lv et al. (2024) proposed a multi-temporal correlation feature fusion network for machinery fault diagnosis in the manufacturing sector. In the same year, Wang et al. (2024) presented a deep learning framework for acoustic modulation for autonomous underwater vehicles. Turning to food science, Natsume & Okamoto (2024) used the Echo State Network (ESN) to predict changes in food preferences. In the field of climate change, Misra et al. (2024) used a Convolutional Long Short-Term Memory network for rainfall prediction.
In this research, we devised the Meta-Learning Applied to a Multivariate Single-Step Fusion Model for Greenhouse Gas Emission (ML4GHG) Forecasting. ML4GHG analyses Brazil's CO2 and CH4 emissions over the past 60 years, using multivariate GHG emission data from Brazil to learn time series forecasting. In the training process, the Fusion Model learns the multivariate series, and the Meta-Learning algorithm helps ML4GHG generalize, outperforming the baseline models.
The contributions of this work are:
Develop and evaluate a novel model based on a multivariate single-step approach leveraging the combination of the Fusion Model for data alignment and optimization-based Meta-Learning tailored for time series;
Adapt the Reptile Meta-Learning algorithm to improve the model generalization;
Develop a model that can be easily adapted to other substances;
Evaluate the proposed model against other models from the literature.
The remainder of this work is organized as follows. Section 2 describes the main Brazilian legislation related to GHG emissions; Section 3 describes recent machine learning models applied to climate change. Section 4 presents the methodology used in ML4GHG Forecasting. Section 5 details the experiment results, and Section 6 details the evaluation analysis. Finally, Section 7 presents the conclusion and indicates future works.
BRAZILIAN LEGISLATION
This section summarizes the main Brazilian legislation related to climate change, which creates incentives for renewable energy, preserves ecosystems, and guides Brazil toward zero emissions by 2050. The main federal and state laws enacted in recent years are:
Law 12114 of December 9, 2009 (BRAZIL 2009a) establishes the National Fund on Climate Change (FNMC) to finance studies and measures to reduce and repair the effects of climate change;
Law 12187 of December 29, 2009 (BRAZIL 2009b) establishes the National Policy on Climate Change (PNMC), providing the institutional framework and outlines to mitigate the global warming impacts, preserve and restore ecosystems, and promote sustainable development;
Law 12305 of August 2, 2010 (BRAZIL 2010) establishes the National Policy on Disposal of Solid Waste, which is essential for the reduction of CH4 emitted in the final disposal of organic waste;
Law 14300 of January 6, 2022 (BRASIL 2022a) establishes tariff benefits for distributed small-scale renewable electricity production;
Brazil's Constitution was amended to mandate that biofuels (BRAZIL 2022b) and low-carbon hydrogen (BRAZIL 2023) be taxed less than fossil fuels;
Several state laws including (RJ 2015), provide tax reductions or exemptions for electric, hybrid or natural gas-powered vehicles; and
The legislation regarding the limit of pollutant emissions by motor vehicles is increasingly strict (CNMA 2018).
The Brazilian Congress is analyzing the bill PL 412/2022 (BRAZIL 2022c), which regulates the trade of carbon credits in Brazil. The approval of this bill could create legal instruments to encourage transactions with carbon assets, contributing to the country's sustainable development. Other bills that provide relevant changes are PL 639/2015 (BRAZIL 2015) and PLS 302/18 (BRAZIL 2018), which create incentives for the generation of energy in landfills resulting in large capture and utilization of CH4 for electricity generation.
RELATED WORKS
This section describes traditional methodologies and state-of-the-art recent machine learning models for time series forecasting applied to climate change and GHG emissions.
We can find several approaches in the literature for time series forecasting using statistical methods. In the last decade, deep learning models have gained attention in Natural Language Processing, computer vision, and sequential signal processing, and have successfully been applied to time series forecasting. More recently, transformer-based methods have emerged as a promising option when computational resources and financial costs are less constrained. Alternatively, meta-learning-based models have been used to improve time series model generalization with simple architectures and low computational resource consumption.
Statistical methods designed for time series, such as Auto-Regressive (AR) models and their extensions, Auto-Regressive Moving Average (ARMA) and Auto-Regressive Integrated Moving Average (ARIMA) (Hillmer & Tiao 1982), have been used successfully. Time series forecasting relies on stationary data to efficiently predict the future behavior of a given feature. However, real-world data may have trends or seasonality, which must be preprocessed and transformed into stationary data. This preprocessing becomes very challenging when a trend or seasonality in the data is disrupted, as happened with an unprecedented global threat: COVID-19. CO2 emissions in 2020 were at their lowest level compared to prior decades (Meng & Noman 2022), causing a never-before-seen disruption in the data trend. Meng & Noman (2022) used a statistical approach with the Seasonal Auto-Regressive Integrated Moving Average (SARIMA) to forecast CO2 emissions in China for the post-COVID-19 period. COVID-19 had a strong impact on air quality around the world. Gupta et al. (2023) used SARIMAX (SARIMA modeling with exogenous factors) to predict air quality improvement in India during the nationwide lockdown imposed by the COVID-19 pandemic. Teggi et al. (2020) proposed InFORM to forecast the daily weather (temperature, humidity, and visibility) in Bangalore (India) using the ARIMA statistical method.
In recent years, deep learning models, such as Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber 1997), Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN), have been used for time series forecasting in the climate change domain. Kumari & Singh (2023) compared LSTM with statistical models (ARIMA and SARIMAX) and classical machine learning models (Linear Regression and Random Forest) for CO2 forecasting in India. The LSTM model outperformed the statistical models in CO2 forecasting (Kumari & Singh 2023) and outperformed Prophet, an additive regression model, in air temperature forecasting in Indonesia (Haris et al. 2022).
Climate change has impacted the agricultural sector, which is highly sensitive to weather and temperature oscillations. Alex & Sobin (2021) applied LSTM and ARIMA for temperature forecasts to help harvest planning and reduce losses in the agricultural sector. In response to global warming, efforts to convert the energy matrix to renewable energy, including wind and solar, are growing. Almalaq et al. (2021) used RNN, LSTM, and GRU (Cho et al. 2014) for solar and wind energy forecasts, helping supply-demand energy planning in Saudi Arabia.
Deep learning-based models often need a large amount of training data to learn the complex relationship between past and future time series data. To address this problem, we can apply meta-learning. Mo et al. (2023) used Model-Agnostic Meta-Learning (MAML) (Finn et al. 2017) with parameter initialization and Euclidean distance for similarity matching to forecast the remaining useful life of mechanical equipment. Thi Kieu Tran et al. (2020) used a meta-learning-based genetic algorithm for hyperparameter optimization of deep learning models to perform temperature forecasting. Reptile (Nichol et al. 2018), a first-order meta-learning algorithm, can be used to improve time series forecasting. Tian et al. (2021) leveraged Reptile with transfer learning, while Leelakittisin & Sun (2021) and Gupta & Raghav (2020) applied Reptile to CNN-based models.
Driven by the country's low-carbon agenda, we also found several works using machine learning models in the context of climate change in Brazil. Hydroelectricity represents around 60% of the electricity generated in Brazil, and extreme weather events impact the country's main source of electrical energy. De Toledo et al. (2023) applied classical machine learning models (Random Forest, Support Vector Regression, Kernel Ridge Regression) and a statistical model (SARIMAX) for stream-flow prediction based on climate indices. Galvão Filho et al. (2020) proposed an LSTM-based model for water flow forecasting used in a hydroelectric power plant in the state of Rondônia, Brazil.
The differences between our model and the aforementioned works are: (i) Meng & Noman (2022), Gupta et al. (2023), Teggi et al. (2020), and De Toledo et al. (2023) used statistical methods, while our model uses a meta-learning approach; (ii) Kumari & Singh (2023), Alex & Sobin (2021), Haris et al. (2022), Galvão Filho et al. (2020), and Almalaq et al. (2021) proposed deep learning-based models, while we use a deep learning model combined with meta-learning; (iii) Mo et al. (2023) used MAML and Thi Kieu Tran et al. (2020) used a genetic algorithm, while we use Reptile for meta-learning-based model optimization; (iv) Tian et al. (2021) leveraged Reptile with transfer learning, whereas we trained the model from scratch; and (v) Leelakittisin & Sun (2021) and Gupta & Raghav (2020) applied Reptile to CNN-based models, while we use a Bidirectional Long Short-Term Memory (BiLSTM)-based model and a Fusion Model for multivariate data processing.
METHODOLOGY
This section presents the methodology used for GHG emission forecasting in Brazil, called ML4GHG (Meta-Learning Applied to a Multivariate Single-Step Fusion Model for Greenhouse Gas Emission Forecasting). ML4GHG aims to analyze Brazil's CO2 and CH4 emissions over the past 60 years. To this end, ML4GHG is based on a multivariate single-step approach leveraging the combination of the Fusion Model for data alignment and optimization-based Meta-Learning tailored for time series. Moreover, ML4GHG adapts the Reptile Meta-Learning algorithm to improve model generalization. The following subsections describe ML4GHG in more depth.
ML4GHG overview
First, we used EDGARv8.0 as the multivariate time series input data for GHG emission prediction. Next, we performed three data preprocessing tasks: (i) data analysis to understand trend and seasonality; (ii) feature scaling for data normalization; and (iii) time frame splitting.
After the data preprocessing, the BiLSTM1 model performs the univariate feature extraction for CO2 and BiLSTM2 for CH4. Once the input features are extracted separately, the multivariate Fusion Model performs the CO2 and CH4 data alignment, and the Meta-Learning algorithm helps with model generalization. Finally, the multivariate parallel time series forecasting is performed by the Fusion Model's classifier, which predicts the next time step values for CO2 and CH4.
To facilitate the understanding, Table 1 summarizes the notations used in ML4GHG.
| Notation | Description |
|---|---|
| T | Multivariate time series data |
| n | Number of elements in T |
| t_i | Values measured at time step i |
| x_i^j | Value of the jth feature at time step i |
| m | Number of features |
| Fusion Model | |
| Z | Concatenation operation |
| x_i^1 | Represents CO2 |
| x_i^2 | Represents CH4 |
| f_φ | Function that returns the CO2 embedding vector |
| f_ϑ | Function that returns the CH4 embedding vector |
| Meta-Learning | |
| ϕ | Fusion Model weights |
| ɛ | Step-size of meta-learning |
| S_tr | Multivariate fusion training set |
| S_ts | Multivariate fusion test set |
| ϕ̃ | Fusion Model new weights |
| l | Fusion Model loss |
| t | Loss threshold |
Multivariate dataset
The multivariate time series data can be represented as T = {t_1, …, t_n}, in which n is the number of elements in the time series and t_i holds the values measured at time step i. Each t_i contains the tuple (x_i^1, …, x_i^m), where x_i^j represents the value of the jth feature at time step i and m is the number of features.
The dataset used in our work is EDGARv8.0. It is a publicly available GHG (CO2, CH4, N2O, and F-gases) emissions database for global atmospheric research reported by the European Member States and by Parties under the United Nations Framework Convention on Climate Change (UNFCCC). The dataset provides annual and monthly GHG emissions data for the time span of 1970–2022 by country (Crippa et al. 2021). We used the monthly data from Brazil, which contains CO2 and CH4 emissions for 636 consecutive months from 1970 to 2022.
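To make the notation concrete, the minimal sketch below loads the two monthly series into an (n, m) array. The file name and column names are hypothetical; the actual EDGARv8.0 release uses its own layout and would need reshaping first.

```python
import numpy as np
import pandas as pd

# Hypothetical flat file with one row per month and one column per substance.
df = pd.read_csv("edgar_v8_brazil_monthly.csv")  # columns: date, co2, ch4

# T is an (n, m) array: n = 636 monthly time steps, m = 2 features.
T = df[["co2", "ch4"]].to_numpy(dtype=np.float32)
assert T.shape == (636, 2)
```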
Data preprocessing
Data preprocessing plays an important role in time series forecasting to help the model capture the trend and seasonality of data, avoid gradient spikes, and handle missing data. The data preprocessing comprises the following steps: data analysis, feature scaling, and time frame splitting.
The auto-correlation graph of CH4 can be visualized in the second graph of Figure 2. Similarly to CO2, the graph shows that only the values of the previous few time steps have a high influence on the current CH4 value.
The last step in our multivariate single-step parallel time series forecasting data pre-processing is the time frame splitting. We used the sliding window considering the prior three time steps (lag = 3) to predict the next time step (single-step). We considered lag = 3 because the auto-correlation graph of both substances demonstrated that the current value is highly correlated only with the previous few time steps.
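A minimal sketch of this windowing step, assuming the (n, m) array T defined above; the helper name is ours, not the paper's.

```python
import numpy as np

def sliding_window(series: np.ndarray, lag: int = 3):
    """Build supervised pairs: the previous `lag` steps predict the next one."""
    X, y = [], []
    for i in range(lag, len(series)):
        X.append(series[i - lag:i])  # window of shape (lag, m)
        y.append(series[i])          # single-step target of shape (m,)
    return np.array(X), np.array(y)

X, y = sliding_window(T, lag=3)  # X: (n - 3, 3, 2), y: (n - 3, 2)
```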
Feature extraction
After the data pre-processing, the CO2 and CH4 normalized data are processed to produce the corresponding embedding vectors. Each extractor is an independent BiLSTM network with its own classifier, trained on univariate time series data, as illustrated in Figure 1. BiLSTM is effective in many application areas, such as Natural Language Processing (Enamoto et al. 2022; Costa et al. 2023; Gou & Li 2023) and time series forecasting. BiLSTM comprises a forward and a backward LSTM (Hochreiter & Schmidhuber 1997). LSTM, in turn, can detect an important feature in the input sequence at an early stage and carry the information over a long distance, thus capturing potential long-term dependencies (Zrira et al. 2024). In time series forecasting, BiLSTM helps capture the context of past and future time steps (Schuster & Paliwal 1997). In our model, the BiLSTM layer is followed by a Time-Distributed layer and three Dense layers, the last of which is a customized Dense layer that produces the embedding of the univariate time series to be used in the Fusion Model.
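The sketch below mirrors this extractor in Keras under stated assumptions: the layer widths and the embedding size EMB_DIM are illustrative guesses, since the paper only fixes the layer types, the dropout of 0.3, and the Adam learning rate of 0.0001 (Table 3).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

LAG, EMB_DIM = 3, 32  # EMB_DIM is an assumed embedding size

def build_extractor(prefix: str) -> models.Model:
    """Univariate BiLSTM extractor with its own single-step classifier."""
    inp = layers.Input(shape=(LAG, 1), name=f"{prefix}_input")
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True),
                             name=f"{prefix}_bilstm")(inp)
    x = layers.Dropout(0.3, name=f"{prefix}_dropout")(x)
    x = layers.TimeDistributed(layers.Dense(32, activation="relu"),
                               name=f"{prefix}_td")(x)
    x = layers.Flatten(name=f"{prefix}_flatten")(x)
    x = layers.Dense(64, activation="relu", name=f"{prefix}_dense")(x)
    emb = layers.Dense(EMB_DIM, activation="relu",
                       name=f"{prefix}_embedding")(x)
    out = layers.Dense(1, name=f"{prefix}_forecast")(emb)  # extractor's classifier
    model = models.Model(inp, out, name=f"{prefix}_extractor")
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
    return model

extractor_co2 = build_extractor("co2")
extractor_ch4 = build_extractor("ch4")
```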
Fusion Model
The Fusion Model performs the alignment or fusion of the heterogeneous CO2 and CH4 data. The goal of the Fusion Model is to create an abstraction of the unified representation of different features for each tuple ti in T = {t1,…,tn} and perform one or more tasks efficiently. In this process, the heterogeneous data need to be integrated to find the relationship between them, known as data fusion (Baltrušaitis et al. 2018).
In the literature (Baltrušaitis et al. 2018), we can find three types of data fusion methods: (i) late fusion or decision-level fusion in which each feature is individually extracted and trained based on a specific decision task and then integrated into the same feature space; (ii) early fusion or feature-level fusion which exploits the low-level features just after the extraction, creating a strong interaction between modalities (Wang et al. 2024); and (iii) hybrid fusion or intermediate-level fusion which learns a joint representation of different features by combining the decision-level and feature-level fusion.
We applied decision-level fusion in our model. As illustrated in Figure 1, each BiLSTM extractor and the Fusion Model have their own classifiers (Baltrušaitis et al. 2018). The advantage of this fusion type is that it enables each component to be calibrated separately according to its data quality.
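A minimal sketch of this decision-level fusion, continuing the extractor sketch above: the trained embedding layers are concatenated (operation Z) and fed to the Fusion Model's own classifier. The hidden width is an assumption.

```python
from tensorflow.keras import layers, models

def build_fusion(extractor_co2, extractor_ch4) -> models.Model:
    """Late fusion: concatenate the two univariate embeddings, then predict
    the next time step of both substances with a shared classifier."""
    emb_co2 = extractor_co2.get_layer("co2_embedding").output
    emb_ch4 = extractor_ch4.get_layer("ch4_embedding").output
    z = layers.Concatenate(name="fusion_concat")([emb_co2, emb_ch4])  # Z
    x = layers.Dense(32, activation="relu", name="fusion_dense")(z)   # assumed width
    out = layers.Dense(2, name="fusion_forecast")(x)  # single-step CO2 and CH4
    return models.Model([extractor_co2.input, extractor_ch4.input], out,
                        name="fusion_model")
```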
Multivariate Meta-Learning
Time series forecasting can be challenging in case of disruption of a trend or seasonality in the data. One possible option to address this issue is to use Meta-Learning to optimize the model's learning capabilities (Enamoto et al. 2023).
In our work, we adopted the flexibility of Reptile (Nichol et al. 2018), a gradient-based Meta-Learning approach. Meta-Learning helps the underlying model learn from past experience, adapt, and generalize to new tasks (Enamoto et al. 2023). The details of Reptile are described in Algorithm 1.
Algorithm 1 Reptile-based multivariate time series Meta-Learning

| Line | Operation |
|---|---|
| 1 | Initialize Fusion Model's weights ϕ |
| 2 | Initialize meta step-size ɛ |
| 3 | Construct multivariate fusion training set S_tr |
| 4 | Construct multivariate fusion test set S_ts |
| 5 | for each meta-iteration do |
| 6 | Compute ϕ̃ by applying SGD to ϕ on S_tr |
| 7 | Predict the next time step with S_ts |
| 8 | if loss l < threshold t then |
| 9 | exit for |
| 10 | else |
| 11 | Update ϕ ← ϕ + ɛ(ϕ̃ − ϕ) |
| 12 | Adjust ɛ |
| 13 | end if |
| 14 | end for |
First, the weights ϕ of the Fusion Model are randomly initialized, and the meta step-size ɛ is initialized with a fixed value (lines 1 and 2). The multivariate fusion training set S_tr and test set S_ts are generated (lines 3 and 4). In the meta-iteration loop, the new weights ϕ̃ are computed by Stochastic Gradient Descent (SGD) using the training set S_tr (line 6). The test set S_ts is used for prediction (line 7), and if the loss l is not below the threshold t, the weights ϕ are updated, moving ϕ closer to the optimal value (lines 11 and 12). In practice, each update adds a small perturbation to the weights ϕ, and after a few meta-iterations this weight update helps the model generalize.
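A minimal Python sketch of Algorithm 1, assuming the fusion model from the earlier sketches is compiled with SGD and MSE; the inner epoch count and the ɛ decay rule are our assumptions, since the paper only states that ɛ is adjusted.

```python
def reptile_train(fusion, X_tr, y_tr, X_ts, y_ts,
                  meta_iters=10, eps=0.45, loss_thres=0.36, inner_epochs=5):
    """Reptile outer loop over a single multivariate forecasting task."""
    for _ in range(meta_iters):
        phi = [w.copy() for w in fusion.get_weights()]   # current weights ϕ
        fusion.fit(X_tr, y_tr, epochs=inner_epochs,
                   batch_size=32, verbose=0)             # SGD on S_tr (line 6)
        phi_new = fusion.get_weights()                   # adapted weights ϕ̃
        loss = fusion.evaluate(X_ts, y_ts, verbose=0)    # predict on S_ts (line 7)
        if loss < loss_thres:                            # lines 8-9
            break
        # ϕ ← ϕ + ɛ(ϕ̃ − ϕ): nudge the stored weights toward the adapted ones.
        fusion.set_weights([w + eps * (wn - w) for w, wn in zip(phi, phi_new)])
        eps *= 0.9                                       # adjust ɛ (assumed decay)
    return fusion
```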
Model training
By applying the Reptile algorithm, the Fusion Model keeps its knowledge by updating the weights ϕ under the guidance of ɛ while acquiring new knowledge in turn. This mechanism helps the Fusion Model adapt to time series forecasting in which an unexpected disruption in the data trend or seasonality could otherwise lead to an undesired outcome.
Before training the model, we used grid search to obtain the best hyperparameter combination, as detailed in Table 2. The first column lists the hyperparameters and the second column the corresponding values used in the grid search. In the preprocessing phase, the best results were obtained with standard scaling, which was fitted on the training set and then applied to the test set to avoid data leakage (a minimal sketch of this step follows Table 2). We compared LSTM and BiLSTM for the univariate data extractor, and BiLSTM yielded the best performance. After the univariate feature extraction, two data alignment methods were tested, and concatenation best preserved the univariate sequence values.
| Parameter | Values |
|---|---|
| Data normalization | {Standard, MinMax} |
| Feature extraction | |
| Model | {BiLSTM, LSTM} |
| Batch size | {32, 64, 128} |
| Epochs | {100, 200, 300, 400, 500} |
| Learning rate | {0.0001, 0.0005, 0.001, 0.003} |
| Dropout | {0.3, 0.5, 0.7} |
| Optimizer | {Adam, RMSprop} |
| Fusion Model | |
| Data alignment | {concatenation, average} |
| Batch size | {32, 64, 128} |
| Epochs | {100, 200, 300} |
| Learning rate | {0.0001, 0.0005, 0.001, 0.003} |
| Dropout | {none, 0.3, 0.5, 0.7} |
| Optimizer | {Adam, RMSprop, Swish} |
| Meta-Learning | |
| Meta step-size | {0.15, 0.25, 0.35, 0.45} |
| Meta-iterations | {5, 10} |
| Loss threshold | {0.02, 0.03, 0.03, 0.04} |
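A minimal sketch of the leakage-free scaling step, assuming train_raw and test_raw are the (time steps × features) blocks of one split:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_raw)  # fit mean/std on training data only
test_scaled = scaler.transform(test_raw)        # reuse them on the held-out block
```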
After obtaining the best hyperparameter values, we trained the model using the configuration detailed in Table 3. The first column identifies the model, 'Loss' is the objective function, and 'Learn. rate' is the learning rate of each model. 'Parameters' is the number of trainable parameters, 'Batch size' is the number of samples in each training batch, and 'Meta-iteration' is the number of meta-learning repetitions using the same training/test data split. 'Loss thres.' is the loss value used as the threshold for stopping the iteration, and 'Meta step-size' is the initial value of ɛ used to update the Fusion Model's weights for better convergence.
| Model | Loss | Optimizer | Learn. rate | Dropout | Parameters | Batch size | Meta-iteration | Loss thres. | Meta step-size |
|---|---|---|---|---|---|---|---|---|---|
| BiLSTM1 | MSE | Adam | 0.0001 | 0.3 | 138 K | 32 | – | – | – |
| BiLSTM2 | MSE | Adam | 0.0001 | | 138 K | 32 | – | – | – |
| Fusion Model | MSE | Swish | 0.0005 | | 37 K | 32 | – | – | – |
| Meta-Learning | – | – | – | – | – | – | 10 | 0.36 | 0.45 |
MODEL EVALUATION
This section describes the evaluation results of ML4GHG using the EDGARv8.0 dataset. The cross-validation strategy is described in Subsection 5.1, and the results of the Meta-Learning Applied to Multivariate Single-Step Fusion Model for GHG emissions forecasting are described in Subsection 5.2.
Cross-validation strategy
In time series forecasting, the temporal sequence of data needs to be preserved so the model learns the relationship between the current and previous time steps. We adopt a time series splitting method for cross-validation, training on continuous time blocks of increasing duration. We divided the EDGARv8.0 dataset into ten blocks of different lengths for the training set, with a fixed length of 24 time steps (months) for the test set.
The details of the data split are described in Table 4. For example, in split 1, 396 continuous time steps (months) are used to train the model and the following 24 time steps for testing. In split 2, 420 continuous time steps are used for training and the following 24 for testing. This way, after executing the ten splits, the model is evaluated on different continuous, non-overlapping blocks of months. In split 10, 612 continuous months were used for training and the last 24 months for testing, covering all 636 months that compose the EDGARv8.0 dataset. A short sketch that reproduces these splits is given after Table 4.
| Data split | Training time steps | Test time steps |
|---|---|---|
| 1 | 396 | 24 |
| 2 | 420 | 24 |
| 3 | 444 | 24 |
| 4 | 468 | 24 |
| 5 | 492 | 24 |
| 6 | 516 | 24 |
| 7 | 540 | 24 |
| 8 | 564 | 24 |
| 9 | 588 | 24 |
| 10 | 612 | 24 |
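As referenced above, a short sketch that reproduces the ten expanding-window splits of Table 4 (the helper name is ours):

```python
def expanding_splits(n_total=636, first_train=396, test_len=24, step=24):
    """Yield (train indices, test indices) pairs matching Table 4."""
    train_end = first_train
    while train_end + test_len <= n_total:
        yield range(train_end), range(train_end, train_end + test_len)
        train_end += step

for i, (tr, ts) in enumerate(expanding_splits(), start=1):
    print(f"split {i:2d}: train 0..{tr[-1]}, test {ts[0]}..{ts[-1]}")
```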
Results
In this Subsection, the cross-validation results of ML4GHG with the EDGARv8.0 dataset are reported. Next, we compare the results of ML4GHG with two deep learning-based models and five recent time series forecasting models.
The details of the cross-validation results are described in Table 5. The second column, 'Evaluation period', represents the period of the test data, and the following columns are the regression error metrics: MSE (Mean Squared Error), MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error), and RMSE (Root Mean Squared Error); their formal definitions are given after Table 5. All metrics measure the error between the actual and estimated values of CO2 and CH4, so the smaller the error, the better the model.
| Data split | Evaluation period | MSE | MAE | MAPE (%) | RMSE |
|---|---|---|---|---|---|
| 1 | 2003-Jan → 2004-Dec | 0.022 | 0.115 | 5.691 | 0.141 |
| 2 | 2005-Jan → 2006-Dec | 0.016 | 0.104 | 5.179 | 0.126 |
| 3 | 2007-Jan → 2008-Dec | 0.022 | 0.113 | 5.507 | 0.139 |
| 4 | 2009-Jan → 2010-Dec | 0.021 | 0.107 | 5.598 | 0.132 |
| 5 | 2011-Jan → 2012-Dec | 0.022 | 0.112 | 4.928 | 0.134 |
| 6 | 2013-Jan → 2014-Dec | 0.030 | 0.111 | 4.578 | 0.151 |
| 7 | 2015-Jan → 2016-Dec | 0.022 | 0.098 | 5.265 | 0.132 |
| 8 | 2017-Jan → 2018-Dec | 0.017 | 0.104 | 6.300 | 0.126 |
| 9 | 2019-Jan → 2020-Dec | 0.020 | 0.121 | 8.578 | 0.143 |
| 10 | 2021-Jan → 2022-Dec | 0.013 | 0.093 | 6.364 | 0.10 |
Note: MSE, mean square error; MAE, mean absolute error; MAPE, mean absolute percentage error; RMSE, root mean squared error.
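For reference, over N test points with actual values y_i and predictions ŷ_i, the four metrics are defined as:

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i-\hat{y}_i)^2
\qquad
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert y_i-\hat{y}_i\rvert
\qquad
\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\left\lvert\frac{y_i-\hat{y}_i}{y_i}\right\rvert
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i-\hat{y}_i)^2}
```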
We can observe in Table 5 that MAPE achieved the highest value (8.578%) during the COVID-19 pandemic represented by data split 9. This outcome suggests that the model struggled to estimate CO2 and CH4 emissions due to the COVID-19 pandemic's impact on the global economy.
Table 6 details the comparison of our model with two baseline models and five recent time series forecasting models. Our model achieved the lowest MAPE of 5.799%, with a standard deviation of 1.06%; the corresponding 95% confidence interval is (5.030, 6.555). All models were trained in a multivariate parallel single-step fashion using the previous three time steps (lag = 3) to predict the next time step. In addition, for a fair comparison, we used the same data splits of the EDGARv8.0 dataset described in Table 4 to assess all the models.
| Model | Method | MSE | MAE | MAPE (%) | RMSE |
|---|---|---|---|---|---|
| BiLSTM | 1L BiLSTM | 0.097 ± 0.05 | 0.250 ± 0.06 | 12.575 ± 2.25 | 0.278 ± 0.06 |
| LSTM | 2L LSTM | 0.073 ± 0.04 | 0.213 ± 0.06 | 11.001 ± 2.46 | 0.247 ± 0.07 |
| N-Beats (2020) | Residual links | 0.034 ± 0.04 | 0.129 ± 0.08 | 13.659 ± 8.55 | 0.146 ± 0.09 |
| TFT (2021) | Transformer | 0.021 ± 0.01 | 0.110 ± 0.04 | 11.384 ± 4.56 | 0.129 ± 0.05 |
| NHiTS (2023) | Hierarchical interpolation | 0.075 ± 0.03 | 0.187 ± 0.07 | 23.898 ± 7.77 | 0.211 ± 0.08 |
| N-Linear (2023) | Linear model | 0.074 ± 0.16 | 0.147 ± 0.16 | 15.864 ± 19.15 | 0.172 ± 0.20 |
| TiDE (2023) | Encoder decoder | 0.038 ± 0.06 | 0.114 ± 0.08 | 12.252 ± 9.62 | 0.141 ± 0.12 |
| ML4GHG | Meta-learning | 0.021 ± 0.01 | 0.108 ± 0.01 | 5.799 ± 1.06 | 0.134 ± 0.01 |
EVALUATION ANALYSIS
This section analyzes our model's results to verify the contribution of individual components. We perform ML4GHG ablation analyses in Subsection 6.1, examine the effects of Meta-Learning in Subsection 6.2, and detail the consumption of computational resources by estimating the CO2 emissions related to the experiments in Subsection 6.3.
Ablation analysis
In this Subsection, we performed three ablation analyses to evaluate the impact of individual components in the model: (i) the impact of the Fusion Model and Meta-Learning (Ab1); (ii) the impact of Meta-Learning (Ab2); and (iii) the impact of feature extractor (Ab3). The models used in the ablation were evaluated using the same cross-validation strategy of the EDGARv8.0 dataset described in Subsection 5.1.
The first ablation analysis (Ab1) was conducted by eliminating the Fusion Model and the Meta-Learning and replacing the univariate embedding extractors (BiLSTM) with a multivariate BiLSTM. In this ablation, the model forecasts CO2 and CH4 at once using the multivariate BiLSTM. In the second ablation (Ab2), we eliminated the Meta-Learning algorithm and kept the Fusion Model. In the third ablation (Ab3), we replaced the BiLSTM feature extractor with LSTM.
The results of the ablation analysis using the EDGARv8.0 dataset are detailed in Table 7, reported with 95% confidence intervals. The ablation that most influenced the results was eliminating the Fusion Model and Meta-Learning (Ab1), with 12.57% MAPE, an error 116.8% higher than our model's. This severe degradation suggests that the Fusion Model helps the model learn a better multivariate embedding for time series forecasting and that the Meta-Learning algorithm helps the model generalize.
| Ablation type | Predicted features | MSE | MAE | MAPE (%) | RMSE |
|---|---|---|---|---|---|
| Ab1 – without Fusion Model and Meta-Learning | CO2, CH4 | 0.097 ± 0.05 | 0.250 ± 0.06 | 12.575 ± 2.25 | 0.278 ± 0.06 |
| Ab2 – without Meta-Learning | CO2, CH4 | 0.048 ± 0.04 | 0.168 ± 0.07 | 8.742 ± 3.17 | 0.193 ± 0.07 |
| Ab3 – LSTM feature extractor | CO2, CH4 | 0.044 ± 0.03 | 0.156 ± 0.06 | 8.282 ± 2.90 | 0.182 ± 0.06 |
| ML4GHG | CO2, CH4 | 0.021 ± 0.01 | 0.108 ± 0.01 | 5.799 ± 1.06 | 0.134 ± 0.01 |
The second ablation with high impact was the Fusion Model without Meta-Learning (Ab2), with 8.74% MAPE, an error 50.75% higher than our model's. This result suggests that the Meta-Learner helps to avoid a lack of generalization in case of a disruption of a trend in the time series data.
Replacing BiLSTM with an LSTM feature extractor (Ab3) yielded 8.28% MAPE, an error 42.81% higher than our model's. This result suggests that, for multivariate data, BiLSTM can better capture the relationship between future and past time series data than LSTM.
Effects of Meta-Learning
Meta-Learning helps the underlying model learn from previous experience, resulting in task and model generalization. We used Reptile (Nichol et al. 2018), a gradient-based Meta-Learning approach, to optimize the model's learning capabilities.
CO2 emissions related to the experiments
The number of machine learning models trained on cloud providers has increased, which may collectively contribute to CO2 emissions from data centers (Wu et al. 2022). Devising small machine learning models that use few computational resources may contribute, even on a small scale, to reducing GHG emissions.
Carbon emissions were estimated using the Machine Learning Impact calculator (Lacoste et al. 2019). CO2-equivalents (kgCO2eq) are a standardized measure expressing how much warming a given amount of gas will cause.
The ML4GHG has only 323 K parameters and was trained and evaluated in 30 minutes, considering all the cross-validation runs. The experiments were conducted using the Google Cloud Platform in the South America-east1 region, Intel(R) Xeon(R) CPU @ 2.20 GHz. The total emission is estimated to be 0.01 kgCO2eq for 0.5 hours of computation.
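As an aside, the same kind of estimate can be produced programmatically during training. The sketch below uses the codecarbon package, a different tool from the calculator used in this work:

```python
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()   # estimates energy use from the local hardware
tracker.start()
# ... train and evaluate ML4GHG here ...
emissions_kg = tracker.stop()  # returns the estimate in kg CO2eq
print(f"Estimated emissions: {emissions_kg:.4f} kg CO2eq")
```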
CONCLUSIONS AND FUTURE WORKS
Despite the economic difficulties and climate impacts suffered in recent years, Brazil is committed to the low-carbon emission agenda. Its geographic diversity and natural resources allowed Brazil to have an electricity mix that was 83% renewable in 2020.
From the Brazilian legislation perspective, Law 12187 establishes the National Policy on Climate Change, and bill PL 412/2022 promotes debate to create a legal instrument for regulating carbon credit trade. In addition to legal measures, it is essential to monitor GHG emissions to keep contributing to the global low-carbon agenda. GHG forecasting can help to improve the sustainable agenda and mitigate climate change impacts in the country.
This work proposes ML4GHG: Meta-Learning Applied to a Multivariate Single-Step Fusion Model for GHG emission forecasting, in which two variables were observed: CO2 and CH4. These substances were extracted from the EDGARv8.0 dataset using two BiLSTM models, one for each substance. Next, the multivariate Fusion Model performed the CO2 and CH4 data alignment, and the Reptile algorithm improved the model's generalization for GHG forecasting.
The model was evaluated against two baseline models and five recent time series forecasting models. ML4GHG reduces MAPE by 49.06% with 95% confidence compared to the Transformer-based TFT model, demonstrating superior performance with low estimated CO2 emissions of 0.01 kg CO2eq. The ablation analysis showed that removing the Fusion Model and Meta-Learning had the highest impact on the model, with a 116.8% increase in MAPE. Eliminating only the Meta-Learning resulted in a 50.75% increase in MAPE, while replacing the BiLSTM with LSTM resulted in a 42.81% increase in MAPE.
These results suggest that, for multivariate data, the BiLSTM can better capture the relationship between substances over time, the Fusion Model helps in aligning the substances' data, and the Meta-Learner helps to avoid a lack of generalization in case of a disruption in the time series data.
As future work, we are investigating the use of Large Language Models (LLMs) and multimodal data in time series forecasting.
DATA AVAILABILITY STATEMENT
The dataset utilized in the experiment can be accessed at the following public repository: https://edgar.jrc.ec.europa.eu/dataset_ghg80.
CONFLICT OF INTEREST
The authors declare there is no conflict.