ABSTRACT
This research evaluates the performance of deep learning (DL) models in predicting rainfall in George Town, Penang, using open-source NASA POWER meteorological data that include rainfall, dew point, solar radiation, wind speed, relative humidity, and temperature. The study introduces two new hybrid DL models that combine a bidirectional gated recurrent unit (BGRU) with either a 2D convolutional neural network (CNN2D) or a bidirectional recurrent neural network (BRNN). The proposed hybrids, CNN2D–BGRU and BRNN–BGRU, were compared against the standalone CNN2D, BRNN, and BGRU models. The results indicate that the BRNN–BGRU model is the most effective for 1-day-ahead prediction, with a test root mean square error (RMSE) of 2.59, a mean absolute error (MAE) of 1.97, a Pearson correlation coefficient (PCC) of 0.79, and a Willmott index (WI) of 0.88. For 3-day-ahead prediction, the BRNN–BGRU model also performed best overall, with a test WI of 0.83, a PCC of 0.69, an RMSE of 3.02, and an MAE of 2.34. The hybrid BRNN–BGRU model performs consistently well in multi-step rainfall prediction for this tropical site using the NASA POWER dataset. These findings can contribute to the development of advanced rainfall prediction systems for more effective management of water resources and flooding in urban areas.
HIGHLIGHTS
Hybrid deep learning models substantially enhance rainfall prediction accuracy in George Town.
NASA POWER data effectively mitigate the issue of limited observational data.
The bidirectional recurrent neural network–bidirectional gated recurrent unit (BRNN–BGRU) hybrid model demonstrated superior performance for both 1- and 3-day rainfall predictions.
INTRODUCTION
Rainfall is a vital component of the water cycle (Gu et al. 2022) and a key variable in predicting natural disasters (Jamei et al. 2023). In recent years, the spatiotemporal distribution of rainfall has changed substantially due to climate change. These regional shifts have had adverse consequences, such as increased variability in heavy rainfall events and prolonged drought periods (Faiz et al. 2018, 2021; Baig et al. 2024). As a result, flash floods, landslides, and crop damage occur more frequently. For instance, Ghaderpour et al. (2024) found that heavy rain, particularly after a prolonged drought, can trigger landslides. Intense rainfall can also damage crops, causing severe losses for farmers (Fu et al. 2023; Su & Kuo 2023b). Accurate rainfall prediction is therefore essential for sectors including agriculture, hydrology, environmental management, and disaster risk reduction (Latif et al. 2023).
In past research, various methodologies have been adopted for rainfall modeling, with statistical methods being widely used (Baig et al. 2024). However, the increasing frequency and intensity of extreme rainfall events have challenged the accuracy of these traditional approaches. Consequently, researchers have increasingly turned to artificial intelligence (AI)-based models, including machine learning (ML) and deep learning (DL) models, to improve predictions of extreme rainfall under these changing conditions. For example, Baig et al. (2024) used a range of models, including multilayer perceptron (MLP), random forest (RF), eXtreme Gradient Boosting (XGBoost), linear regression (LR), long short-term memory (LSTM), gradient boosting, and support vector machine (SVM), along with their ensembles, to improve the accuracy of monthly rainfall predictions. Singarasubramanian et al. (2024) applied k-nearest neighbors (KNN), Naive Bayes, SVM, LR, classification and regression trees (CART), RF, and linear discriminant analysis to classify rainfall at Sathanur Dam, India, and found CART to be the most accurate. Gu et al. (2022) used stacking to ensemble artificial neural networks (ANN), XGBoost, KNN, and SVM, improving monthly rainfall prediction based on data from the China Meteorological Data Service Centre, China Meteorological Administration.
The shift from traditional ML techniques to DL models has improved the ability to analyze rainfall, which has become increasingly complex under climate change. Previous studies have found that DL models outperform conventional ML techniques in rainfall forecasting. For example, Alqahtani (2024) applied an LSTM model to forecast monthly rainfall in Mecca, tuning its hyperparameters with a grid search. Similarly, Salaeh et al. (2022) used several ML and DL models, including MLP, M5, RF, LSTM, and SVM, to predict multi-step monthly rainfall in the Thale Sap Songkhla region and found that LSTM gave the most accurate predictions. Endalie et al. (2022) likewise used an LSTM network to predict rainfall in Oromia, Ethiopia, compared it with ML models such as decision tree (DT), MLP, KNN, and SVM, and concluded that LSTM provided superior accuracy. These studies suggest that DL models capture complex rainfall patterns more effectively than conventional ML models.
Several strategies can improve the performance of DL models for predicting precipitation, including hybrid modeling (Sheikh Khozani et al. 2022), feature engineering (AlDahoul et al. 2023), feature selection (Pei et al. 2020), and parameter optimization through metaheuristic algorithms (Khosravi et al. 2024). Khan & Maity (2020) developed a hybrid model combining a 1D convolutional layer (Conv1D) and an MLP, referred to as Conv1D–MLP, which outperformed the models it was compared with and improved the accuracy of multi-step predictions. Sheikhi et al. (2023) used the invasive weed optimization algorithm, the firefly optimization algorithm, and a genetic particle swarm optimization algorithm (GAPSO) to enhance the performance of ANN and the group method of data handling (GMDH). They also applied wavelet transformation, which can assess variations across multiple temporal scales, identify specific rainfall events, compress data, and detect changes and trends in rainfall patterns. The GAPSO–wavelet–ANN model achieved the highest accuracy, with the lowest root mean square error (RMSE) of all the hybrid models. Despite these advances, there is limited research on hybrid models that integrate two DL techniques, even though such hybrids can combine the strengths of the individual models. For instance, Chhetri et al. (2020) showed that a bidirectional long short-term memory (BLSTM)–gated recurrent unit (GRU) hybrid reduced RMSE by 41.1% relative to a standalone LSTM model for monthly rainfall prediction. Thottungal Harilal et al. (2024) used hybrid DL models, specifically convolutional neural network (CNN)–LSTM and recurrent neural network (RNN)–LSTM, to predict daily rainfall and found that these hybrids outperformed the other models tested. Jamei et al. (2024b) also showed that hybrid models are more accurate than standalone models for predicting river streamflow. Overall, the literature indicates that hybrid DL models offer higher prediction accuracy than standalone models. However, hybrid models for tropical rainfall prediction remain underexplored.
The application of the NASA POWER dataset in tropical countries has received little attention in the literature (Tan et al. 2023). However, recent studies have highlighted its relevance to hydrological and climate research, particularly in regions lacking ground-based weather data. Kadhim Tayyeh & Mohammed (2023) demonstrated the potential of NASA POWER to generate weather datasets for the Euphrates River Basin, Iraq, increasing the accessibility of climate data. Similarly, Rodrigues & Braga (2021) found that NASA POWER data effectively complemented ground observations of solar radiation, temperature, and humidity in the Alentejo Region, Portugal. Dharmayasa et al. (2022) validated NASA POWER rainfall estimates in Bali, Indonesia, finding a strong correlation with observed data despite some discrepancies in average rainfall values. Bandira et al. (2023) demonstrated the usefulness of NASA POWER for estimating climate variability in Northern Peninsular Malaysia, particularly for identifying optimal locations for solar farms. Khan et al. (2024) used classification models such as MLP, LR, RF, and decision trees to predict rainfall in Aligarh from NASA POWER data for 2013 to 2022, showing that classification models can provide useful insights into rainfall patterns. While these studies suggest that the NASA POWER dataset is reliable, its application to tropical rainfall prediction remains limited.
This study aims to develop hybrid and standalone models to predict tropical rainfall in George Town, Penang, Malaysia, using NASA POWER meteorological data. Three standalone models, namely CNN2D, a bidirectional recurrent neural network (BRNN), and a bidirectional gated recurrent unit (BGRU), were developed together with two new hybrid models, CNN2D–BGRU and BRNN–BGRU, and evaluated for predicting tropical rainfall 1 and 3 days ahead. Weather predictions are generally most accurate over short time frames, from a few hours to about a week (Sarker 2022), and multi-step-ahead rainfall prediction is important for real applications (Zhou et al. 2024). To improve accuracy, the Savitzky–Golay method was used to address outliers in the dataset (Pham et al. 2024), although its use in hybrid DL models for rainfall prediction is uncommon. The results can support disaster management by enabling more effective planning and mitigation for extreme rainfall events.
MATERIALS AND METHODS
Study area
Datasets
The study site is in the George Town area, which is prone to flooding. This research aims to provide preliminary information that can help local stakeholders improve disaster management strategies in flood-prone areas. The NASA POWER platform allows users to extract data for a single point, a region, or the globe, facilitating efficient data retrieval. For this study, the selected location is at latitude 5.4097° N and longitude 100.3151° E, and the data cover the period from 1 January 2017 to 31 December 2021. The complete dataset was split 80%/20% for training and testing (validation), respectively.
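As a minimal sketch of how daily records could be turned into supervised samples and split 80%/20% in time order, the Python functions below pair a window of past days with the rainfall a chosen number of days ahead. The window length (`n_lags`), the chronological (unshuffled) split, and the reading of "3-day-ahead" as the single value three days after the last input day are assumptions for illustration, not settings reported in this study.

```python
from typing import List, Sequence, Tuple


def make_supervised(features: Sequence[Sequence[float]],
                    rainfall: Sequence[float],
                    n_lags: int,
                    horizon: int) -> Tuple[List[List[Sequence[float]]], List[float]]:
    """Pair each window of `n_lags` past days with rainfall `horizon` days ahead."""
    X, y = [], []
    # The last usable window start leaves room for the target at t + horizon.
    for start in range(len(features) - n_lags - horizon + 1):
        end = start + n_lags                      # index of the first day after the window
        X.append(list(features[start:end]))
        y.append(rainfall[end + horizon - 1])     # horizon=1 -> the next day
    return X, y


def chronological_split(X, y, train_frac=0.8):
    """Keep time order: first 80% of samples for training, last 20% for testing."""
    cut = int(len(X) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]


# Toy example: 10 days, 2 hypothetical features per day, rainfall equal to the day index.
days = [[float(d), float(d) * 0.5] for d in range(10)]
rain = [float(d) for d in range(10)]
X, y = make_supervised(days, rain, n_lags=3, horizon=3)
X_train, y_train, X_test, y_test = chronological_split(X, y)
```

Splitting in time order, rather than shuffling, keeps later days out of the training set, which mirrors how the model would be used for forecasting.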
The NASA POWER dataset includes rainfall, minimum temperature (Tmin), solar radiation (SR), maximum temperature (Tmax), maximum windspeed (WSmax), average windspeed, minimum windspeed (WSmin), average temperature (AvT), dew point, and relative humidity, all freely accessible at https://power.larc.nasa.gov/data-access-viewer/. According to Stackhouse (2020), NASA POWER supplies high-resolution rainfall data from the Integrated Multi-satellite Retrievals for GPM (IMERG) product of NASA's Global Precipitation Measurement (GPM) mission, at a resolution of 0.1° latitude by 0.1° longitude. The other meteorological variables are obtained from the NASA Global Modeling and Assimilation Office (GMAO) Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) assimilation model and the Goddard Earth Observing System (GEOS) 5.12.4 FP-IT (Bosilovich et al. 2017). SR data are obtained from the NASA GEWEX SRB Release 4 Integrated Product (R4-IP) archive and the NASA CERES SYN1deg and FLASHFlux projects.
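The Data Access Viewer is interactive; the same daily point data can also be requested programmatically from the NASA POWER API. The sketch below only builds the request URL for the study location and period with the Python standard library. The parameter codes (for example `PRECTOTCORR` for corrected precipitation and `ALLSKY_SFC_SW_DWN` for surface solar radiation), the choice of 2 m wind variables, and the `AG` community are our assumptions about which POWER variables correspond to those listed above; they should be checked against the POWER parameter documentation before use.

```python
from urllib.parse import urlencode

POWER_DAILY_POINT = "https://power.larc.nasa.gov/api/temporal/daily/point"

# Assumed POWER parameter codes for the variables listed in the text.
PARAMETERS = [
    "PRECTOTCORR",                   # corrected precipitation (assumed mm/day)
    "T2M", "T2M_MIN", "T2M_MAX",     # average, minimum, maximum temperature at 2 m
    "T2MDEW",                        # dew/frost point at 2 m
    "RH2M",                          # relative humidity at 2 m
    "WS2M", "WS2M_MIN", "WS2M_MAX",  # average, minimum, maximum wind speed at 2 m
    "ALLSKY_SFC_SW_DWN",             # all-sky surface shortwave downward irradiance
]


def power_daily_url(lat: float, lon: float, start: str, end: str) -> str:
    """Build a daily point request URL; dates are YYYYMMDD strings."""
    query = {
        "parameters": ",".join(PARAMETERS),
        "community": "AG",
        "latitude": lat,
        "longitude": lon,
        "start": start,
        "end": end,
        "format": "JSON",
    }
    return f"{POWER_DAILY_POINT}?{urlencode(query)}"


url = power_daily_url(5.4097, 100.3151, "20170101", "20211231")
```

Fetching this URL, for instance with `urllib.request.urlopen`, should return the requested series as JSON.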
METHODS
To improve model quality and performance, the dataset must be preprocessed, which includes improving data quality, scaling inputs, and addressing outliers. Preprocessing is crucial for preparing datasets for model development. In this study, the MinMaxScaler method was used to normalize the input features to a common range, which prevents variables with large magnitudes from dominating training and improves the reliability of the predictive models. The Savitzky–Golay filter was also applied for smoothing and noise reduction, as supported by existing literature (Lian et al. 2023; Pham et al. 2024). The filter fits a low-order polynomial to each moving window of data by local least squares and replaces the central point with the fitted value; according to Pham et al. (2024), it can reduce noise without distorting the signal trend, producing high-quality input data for AI model prediction.
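A minimal NumPy sketch of the two preprocessing steps follows. Min-max scaling is fitted on the training portion only, so that no information from the test period leaks into training; this is an assumption about good practice, since the text does not say how the scaler was fitted. The Savitzky–Golay weights are obtained by least-squares fitting of a polynomial over a sliding window. The window length and polynomial order are hypothetical, and edge points are left unchanged for brevity, whereas `scipy.signal.savgol_filter` also fits the edges by default.

```python
import numpy as np


def fit_min_max(train: np.ndarray):
    """Per-feature minimum and range, computed on training rows only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0              # avoid division by zero for constant features
    return lo, span


def apply_min_max(x: np.ndarray, lo: np.ndarray, span: np.ndarray) -> np.ndarray:
    return (x - lo) / span             # training data map to [0, 1]


def savgol_coeffs(window: int, polyorder: int) -> np.ndarray:
    """Weights giving the least-squares polynomial value at the window centre."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    vander = np.vander(offsets, polyorder + 1, increasing=True)  # columns 1, k, k^2, ...
    # Row 0 of the pseudo-inverse yields the fitted constant term, i.e. the value at k = 0.
    return np.linalg.pinv(vander)[0]


def savgol_smooth(y: np.ndarray, window: int = 7, polyorder: int = 2) -> np.ndarray:
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    c = savgol_coeffs(window, polyorder)
    half = window // 2
    out = y.astype(float).copy()
    # Interior points only; edge points are kept as-is in this simplified sketch.
    out[half:len(y) - half] = np.correlate(y, c, mode="valid")
    return out
```

Because the filter reproduces any polynomial up to `polyorder` exactly, it removes short-lived fluctuations while preserving smooth trends.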
The next phase involves the modeling process, where three DL models, CNN2D, BRNN, and BGRU, were evaluated and utilized for nonlinear rainfall predictions. To improve the performance of the standalone models, the study introduces hybrid models that integrate CNN2D with BGRU and BRNN with BGRU for more sophisticated rainfall forecasting. The objective of developing these hybrid models is to better capture and interpret the complex rainfall patterns present in the dataset. The final phase focuses on evaluation and validation using statistical metrics such as RMSE, mean absolute error (MAE), Pearson correlation coefficient (PCC), and Willmott index (WI) to assess the performance and reliability of the models. This evaluation aims to identify the most effective model configuration for practical urban rainfall prediction.
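For reference, the four metrics can be computed from paired observed and predicted series as in the plain-Python sketch below. The WI implemented here is Willmott's (1981) index of agreement d; we assume this is the variant used, since refined versions of the index also exist.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error; squaring gives large errors more weight."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error; the sign of each error is ignored."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def pcc(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and predicted values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def willmott_d(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Willmott (1981) index of agreement: 1 is perfect agreement, 0 is none."""
    mo = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


obs = [1.0, 2.0, 3.0]
pred = [2.0, 2.0, 4.0]
```

For this toy pair, the functions give an RMSE of about 0.82, an MAE of about 0.67, a PCC of about 0.87, and a WI of 0.8.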
2D convolutional neural network
The basic CNN architecture consists of an input layer, a convolution layer, a max pooling layer, a fully connected layer, and an output layer. The input layer is an m × n matrix whose elements are feature values, forming a two-dimensional feature map. The convolution layer contains multiple convolution units whose parameters are optimized by backpropagation to extract various features from the input (Wang et al. 2019). Pooling follows the convolution layer to reduce the feature dimension, which reduces the number of parameters in subsequent layers, simplifies the network, and lowers the risk of overfitting (Mozo et al. 2018; Shu et al. 2021; Hakim et al. 2022). The pooled features are then passed to the fully connected layers (Lee et al. 2020; Hakim et al. 2022).
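To make the pooling step concrete, the NumPy sketch below treats a hypothetical window of 6 days × 4 features as the m × n input map and applies non-overlapping 2 × 2 max pooling, the pool size listed in Table 1. It illustrates only the dimensionality reduction, not the trained Conv2D layers.

```python
import numpy as np


def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """Non-overlapping 2 x 2 max pooling (stride 2); an odd trailing row/column is dropped."""
    m, n = feature_map.shape
    trimmed = feature_map[: m - m % 2, : n - n % 2]
    blocks = trimmed.reshape(m // 2, 2, n // 2, 2)  # group cells into 2 x 2 blocks
    return blocks.max(axis=(1, 3))                  # keep the strongest value per block


window = np.arange(24, dtype=float).reshape(6, 4)   # 6 days x 4 features (toy values)
pooled = max_pool_2x2(window)                       # shape (3, 2)
```

Each output cell keeps only the largest of four inputs, so the map shrinks by a factor of four while the strongest activations survive.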
CNNs, a specific form of feedforward ANN, have proven highly effective for processing and analyzing digital images because their layered architecture supports comprehensive feature extraction (Kuo 2016; Silva et al. 2023). The pooling layer is typically placed after the convolution layers (Su & Jiang 2023a), and max pooling is the most widely used pooling method (Silva et al. 2023). Max pooling discards weaker activations, reducing the number of parameters in the network and helping prevent overfitting (Silva et al. 2023; Su & Jiang 2023a). CNNs can also be applied to time series prediction by using dilated convolutions, in which the filter is applied to inputs spaced a fixed number of steps apart, widening its receptive field without adding parameters (Wibawa et al. 2022).
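The idea of a dilated convolution can be shown with a short plain-Python 1D example. As in most DL libraries, the operation is written as a cross-correlation (the kernel is not flipped), and only "valid" positions are computed; the kernel and dilation values are illustrative only.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int = 1) -> List[float]:
    """Valid 1D dilated cross-correlation: y[t] = sum_k kernel[k] * x[t + k * dilation]."""
    span = (len(kernel) - 1) * dilation          # distance covered by the kernel taps
    return [
        sum(w * x[t + k * dilation] for k, w in enumerate(kernel))
        for t in range(len(x) - span)
    ]


daily = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
dense = dilated_conv1d(daily, [1.0, 1.0], dilation=1)   # adds neighbouring days
wide = dilated_conv1d(daily, [1.0, 1.0], dilation=4)    # adds days four steps apart
```

With the same two weights, a dilation of 4 lets each output combine days that are four steps apart, so stacking layers with increasing dilation can cover long histories with few parameters.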
Bidirectional recurrent neural network
Bidirectional gated recurrent unit
Hybrid models
Model configuration
This work uses the CNN2D, BRNN, and BGRU models and develops two hybrid models, CNN2D–BGRU and BRNN–BGRU, to improve on the standalone models. In the CNN2D–BGRU model, the CNN2D branch extracts spatial features from the reshaped input data, while the BGRU branch captures temporal dependencies. The outputs of the two branches are combined and passed through additional dense layers that refine the features and produce the prediction.
The BRNN–BGRU model uses separate BRNN and BGRU branches to analyze temporal patterns from different perspectives. Their outputs are combined, passed through a dropout layer to reduce overfitting, and processed by a final dense layer to produce the prediction. The same model configurations were used for 3-day-ahead prediction to assess each model's multi-step forecasting capacity. Table 1 shows the parameter settings for all models, including the hybrid models.
Model | Layers and parameters | Filters/units | Kernel size/pool size | Dense units |
---|---|---|---|---|
CNN2D | Conv2D, MaxPooling2D, dense | 128, 256 filters | Kernel size: (3, 3), pool size: (2, 2) | 256 |
BRNN | Bidirectional LSTM (RNN), dropout, dense | 200, 100 units | – | 64 |
BGRU | Bidirectional GRU, dropout, dense | 128, 64 units | – | 128 |
BRNN–BGRU | Bidirectional LSTM (RNN), bidirectional GRU, dense | LSTM: 200, 100 units; GRU: 128, 64 units | – | 128 |
CNN2D–BGRU | Conv2D, MaxPooling2D, bidirectional GRU, dense | Conv2D: 128, 256 filters; GRU: 128, 64 units | Kernel size: (3, 3), pool size: (2, 2) | 128 |
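The branch-and-merge idea behind the hybrids can be illustrated with an untrained NumPy forward pass. In the sketch below, a bidirectional wrapper runs one recurrent cell over the input window and a second cell over the reversed window and concatenates their final states; two such branches are then concatenated and mapped to one rainfall value by a linear output layer. The GRU cell follows the standard update/reset-gate equations. Several simplifications are assumptions: a simple tanh RNN cell stands in for the BRNN branch (Table 1 lists bidirectional LSTM layers for it), concatenation is assumed as the merge operation, dropout and the hidden dense layers are omitted, and the layer sizes are far smaller than those in Table 1.

```python
import numpy as np

rng = np.random.default_rng(0)  # random, untrained weights for illustration only


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def init_gru(n_in, n_hidden):
    # One (W, U, b) triple per gate: update (z), reset (r), candidate (h).
    return {g: (rng.normal(0, 0.1, (n_hidden, n_in)),
                rng.normal(0, 0.1, (n_hidden, n_hidden)),
                np.zeros(n_hidden)) for g in "zrh"}


def gru_step(p, x_t, h):
    Wz, Uz, bz = p["z"]
    Wr, Ur, br = p["r"]
    Wh, Uh, bh = p["h"]
    z = sigmoid(Wz @ x_t + Uz @ h + bz)             # update gate
    r = sigmoid(Wr @ x_t + Ur @ h + br)             # reset gate
    h_cand = np.tanh(Wh @ x_t + Uh @ (r * h) + bh)  # candidate state
    return (1.0 - z) * h + z * h_cand               # blend old and candidate state


def init_rnn(n_in, n_hidden):
    return (rng.normal(0, 0.1, (n_hidden, n_in)),
            rng.normal(0, 0.1, (n_hidden, n_hidden)),
            np.zeros(n_hidden))


def rnn_step(p, x_t, h):
    W, U, b = p
    return np.tanh(W @ x_t + U @ h + b)             # simple (Elman) recurrence


def run(step, params, seq, n_hidden):
    h = np.zeros(n_hidden)
    for x_t in seq:                                 # iterate over time steps (rows)
        h = step(params, x_t, h)
    return h


def bidirectional(step, init, seq, n_hidden):
    # Separate weights for the forward and the backward (reversed-sequence) pass.
    h_fwd = run(step, init(seq.shape[1], n_hidden), seq, n_hidden)
    h_bwd = run(step, init(seq.shape[1], n_hidden), seq[::-1], n_hidden)
    return np.concatenate([h_fwd, h_bwd])


window = rng.normal(size=(7, 10))                   # 7 days x 10 scaled features (toy)
brnn_out = bidirectional(rnn_step, init_rnn, window, 8)
bgru_out = bidirectional(gru_step, init_gru, window, 6)
merged = np.concatenate([brnn_out, bgru_out])       # assumed merge: concatenation
w_out = rng.normal(0, 0.1, merged.shape[0])
prediction = float(w_out @ merged)                  # single-value rainfall output
```

In a trained model the weights would be learned by backpropagation; the sketch only shows how the two bidirectional branches produce one merged feature vector per input window.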
Model evaluation
Four evaluation metrics, namely RMSE, MAE, PCC, and WI, are used in this study; researchers have previously applied them to assess both standalone and hybrid models (Wang et al. 2022). The same metrics are used to assess the proposed models for multi-step prediction. Together, they measure the accuracy and reliability of the models and provide a thorough evaluation of their performance. Taylor diagrams are also used to visualize and compare model performance; these graphical tools assess the accuracy of models by comparing simulated data with observed data (Sammen et al. 2023). RMSE is commonly used to evaluate differences between observed and predicted values and assigns greater weight to large errors, whereas MAE is the mean magnitude of the errors regardless of their direction (Khosravi et al. 2023a). RMSE and MAE therefore provide a complementary analysis of model error. The WI is a descriptive index that enables cross-comparison and evaluation of different models (Piri et al. 2023), and the PCC is an established measure of the degree of linear correlation between variables (Hu et al. 2020). The metrics are computed with the following equations, as used in previous studies (Willmott 1981; Ghorbani et al. 2018; Kim et al. 2023):
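In their standard forms, with O_i and P_i denoting the observed and predicted rainfall for sample i, Ō and P̄ their means, and n the number of samples, the four metrics can be written as below. The WI shown is Willmott's (1981) index of agreement d, which we assume is the variant used here.

```latex
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2} \\
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|O_i - P_i\right| \\
\mathrm{PCC} &= \frac{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(P_i - \bar{P}\right)}
{\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2}} \\
\mathrm{WI} &= 1 - \frac{\sum_{i=1}^{n}\left(P_i - O_i\right)^2}
{\sum_{i=1}^{n}\left(\left|P_i - \bar{O}\right| + \left|O_i - \bar{O}\right|\right)^2}
\end{aligned}
```

Lower RMSE and MAE indicate smaller errors, while PCC and WI approach 1 as agreement between predictions and observations improves.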
RESULTS
One-day-ahead prediction
Table 2 shows the performance of the models for 1-day-ahead prediction. On the training dataset, the CNN2D–BGRU model achieved the lowest RMSE (2.84), outperforming the BRNN–BGRU, BRNN, BGRU, and CNN2D models, which had RMSE values of 3.04, 3.14, 3.15, and 3.34, respectively. The standalone CNN2D model recorded the highest RMSE, indicating lower accuracy. The CNN2D–BGRU model also achieved the lowest MAE (2.03), followed by BRNN–BGRU (2.17), BGRU (2.24), and BRNN (2.27), with CNN2D showing a relatively high MAE of 2.47. These results indicate that the CNN2D model is less effective than the others. Overall, the CNN2D–BGRU model fitted the training data most accurately and consistently, with the smallest errors.
Model | RMSE | MAE | PCC | WI |
---|---|---|---|---|
Training | | | | |
CNN2D | 3.34 | 2.47 | 0.73 | 0.79 |
BRNN | 3.14 | 2.27 | 0.76 | 0.84 |
BGRU | 3.15 | 2.24 | 0.76 | 0.83 |
CNN2D–BGRU | 2.84 | 2.03 | 0.82 | 0.87 |
BRNN–BGRU | 3.04 | 2.17 | 0.78 | 0.85 |
Testing | | | | |
CNN2D | 2.70 | 2.24 | 0.75 | 0.85 |
BRNN | 2.72 | 2.13 | 0.76 | 0.86 |
BGRU | 2.61 | 2.01 | 0.78 | 0.88 |
CNN2D–BGRU | 2.64 | 2.10 | 0.77 | 0.87 |
BRNN–BGRU | 2.59 | 1.97 | 0.79 | 0.88 |
Regarding the PCC results, the CNN2D–BGRU model achieved the highest value of 0.82, outperforming all other models on the training dataset. The BRNN–BGRU model followed with a PCC value of 0.78, while the BRNN, BGRU, and CNN2D models had lower PCC values of 0.76, 0.76, and 0.73, respectively. For the WI metric, the CNN2D–BGRU model also excelled, attaining the highest WI value of 0.87. It outperformed the BRNN–BGRU (0.85), BRNN (0.84), BGRU (0.83), and CNN2D (0.79) models. The CNN2D model performed the worst in terms of both PCC and WI metrics. Overall, the CNN2D–BGRU model demonstrated the best performance for 1-day-ahead prediction in the training dataset, whereas the CNN2D model showed less favorable results.
For the testing dataset, the BRNN–BGRU hybrid model achieved the lowest RMSE (2.59), outperforming BGRU (2.61), CNN2D–BGRU (2.64), CNN2D (2.70), and BRNN (2.72). The BGRU model ranked second, demonstrating strong predictive capability, whereas the standalone BRNN model had the highest RMSE. The BRNN–BGRU model also achieved the lowest MAE (1.97), ahead of BGRU (2.01), CNN2D–BGRU (2.10), BRNN (2.13), and CNN2D (2.24). CNN2D performed worst on this metric, while BGRU again ranked second.
The BRNN–BGRU model attained the highest PCC on the test dataset (0.79), closely followed by the standalone BGRU model (0.78). The other models recorded PCC values of 0.77 for CNN2D–BGRU, 0.76 for BRNN, and 0.75 for CNN2D. For the WI metric, the BRNN–BGRU and BGRU models tied for the highest value (0.88), followed by CNN2D–BGRU (0.87), BRNN (0.86), and CNN2D (0.85). CNN2D had the lowest WI and PCC on the test dataset, while the standalone BGRU model ranked second overall, which is consistent with its contribution to improving the CNN2D and BRNN models in the hybrids.
The hybrid models generally outperform their standalone counterparts, as shown by their improved performance metrics. During training, the CNN2D–BGRU hybrid model reduces RMSE by 14.88% and MAE by 17.96% relative to the CNN2D model, and increases PCC by 10.42% and WI by 8.98%. In testing, the BRNN–BGRU model reduces RMSE by 5.07% and MAE by 7.73% relative to the BRNN model, and increases PCC by 3.44% and WI by 2.27%. These results indicate that the hybrid models improve accuracy across all performance metrics.
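The relative changes above can be checked with ordinary percentage-change arithmetic, assuming the usual definitions: for error metrics, (baseline - model) / baseline × 100, and for skill metrics, (model - baseline) / baseline × 100. The sketch below applies these to the rounded two-decimal entries of Table 2; because those entries are rounded, the results (for example, a 14.97% RMSE reduction for CNN2D–BGRU over CNN2D in training) can differ from the percentages quoted in the text.

```python
def pct_reduction(baseline: float, model: float) -> float:
    """Percentage decrease of an error metric (RMSE, MAE) relative to the baseline."""
    return (baseline - model) / baseline * 100.0


def pct_increase(baseline: float, model: float) -> float:
    """Percentage increase of a skill metric (PCC, WI) relative to the baseline."""
    return (model - baseline) / baseline * 100.0


# Training, 1-day-ahead: CNN2D (baseline) vs CNN2D-BGRU, rounded Table 2 values.
train = {
    "RMSE": round(pct_reduction(3.34, 2.84), 2),
    "MAE": round(pct_reduction(2.47, 2.03), 2),
    "PCC": round(pct_increase(0.73, 0.82), 2),
    "WI": round(pct_increase(0.79, 0.87), 2),
}

# Testing, 1-day-ahead: BRNN (baseline) vs BRNN-BGRU, rounded Table 2 values.
test = {
    "RMSE": round(pct_reduction(2.72, 2.59), 2),
    "MAE": round(pct_reduction(2.13, 1.97), 2),
    "PCC": round(pct_increase(0.76, 0.79), 2),
    "WI": round(pct_increase(0.86, 0.88), 2),
}
```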
Three-day-ahead prediction
Extending the prediction horizon to 3 days decreases performance across all models, as shown in Table 3, reflecting the greater complexity and uncertainty of longer-range forecasts. On the training dataset, the CNN2D–BGRU model achieved the lowest RMSE (3.23), outperforming all other models, as it also did for 1-day-ahead training. The other models recorded RMSE values of 3.44 (BRNN–BGRU), 3.51 (BRNN), 3.54 (BGRU), and 3.56 (CNN2D). The CNN2D–BGRU model again outperformed the standalone CNN2D model, which had the highest RMSE. These results indicate that hybrid models can perform better than individual models.
Model | RMSE | MAE | PCC | WI |
---|---|---|---|---|
Training | | | | |
CNN2D | 3.56 | 2.65 | 0.69 | 0.75 |
BRNN | 3.51 | 2.53 | 0.69 | 0.78 |
BGRU | 3.54 | 2.53 | 0.68 | 0.78 |
CNN2D–BGRU | 3.23 | 2.34 | 0.75 | 0.82 |
BRNN–BGRU | 3.44 | 2.45 | 0.70 | 0.79 |
Testing | | | | |
CNN2D | 2.96 | 2.42 | 0.69 | 0.80 |
BRNN | 3.14 | 2.48 | 0.66 | 0.81 |
BGRU | 3.06 | 2.34 | 0.68 | 0.82 |
CNN2D–BGRU | 3.05 | 2.40 | 0.68 | 0.81 |
BRNN–BGRU | 3.02 | 2.34 | 0.69 | 0.83 |
On the training dataset, the CNN2D–BGRU model also achieved the lowest MAE (2.34), closely followed by the BRNN–BGRU hybrid (2.45). The remaining models had MAE values of 2.53 for both BRNN and BGRU and 2.65 for CNN2D. For PCC, the CNN2D–BGRU hybrid led with 0.75, followed by BRNN–BGRU (0.70), BRNN and CNN2D (0.69 each), and BGRU (0.68). For WI, CNN2D–BGRU again performed best (0.82), ahead of BRNN–BGRU (0.79), BRNN and BGRU (0.78 each), and CNN2D (0.75). CNN2D had the lowest WI for 3-day-ahead prediction, indicating that the CNN2D–BGRU hybrid substantially improves on the standalone CNN2D model.
On the testing dataset, the CNN2D model achieved the lowest RMSE (2.96), followed by BRNN–BGRU (3.02), CNN2D–BGRU (3.05), BGRU (3.06), and BRNN (3.14). For MAE, BGRU and BRNN–BGRU tied for the lowest value (2.34), followed by CNN2D–BGRU (2.40), CNN2D (2.42), and BRNN (2.48). For PCC, BRNN–BGRU and CNN2D tied for the highest value (0.69), followed by BGRU and CNN2D–BGRU (0.68 each) and BRNN (0.66). For WI, the BRNN–BGRU model performed best (0.83), ahead of BGRU (0.82), CNN2D–BGRU and BRNN (0.81 each), and CNN2D (0.80). Overall, BRNN–BGRU gave the most balanced 3-day-ahead test performance, while the standalone BRNN model performed worst, with the highest RMSE and MAE and the lowest PCC; CNN2D, despite the lowest RMSE, had the lowest WI.
Taylor diagram
DISCUSSION
This study found that the standalone DL models CNN2D and BRNN performed inconsistently and with limited accuracy in multi-step-ahead rainfall prediction. For example, CNN2D had the highest training RMSE and MAE for both 1- and 3-day predictions, while BRNN performed slightly better but still fell short of the hybrid models. These findings align with Salma & Ashwitha (2022), who found that a CNN–LSTM hybrid outperformed standalone DL and ML models in rainfall prediction accuracy. This study leveraged the GRU model's strength in capturing temporal dependencies, as noted by Han et al. (2024). The BGRU model, incorporating unidirectional and asymmetric GRU as discussed by Jamei et al. (2024b), improved the prediction performance of the CNN2D and BRNN models.
The BRNN–BGRU model outperforms the standalone models, providing enhanced multi-step prediction and consistent performance for both 1- and 3-day forecasts. Jamei et al. (2024b) also found that a CNN–BGRU hybrid, used together with multivariate variational mode decomposition, achieved better results than standalone models. The results are consistent with Chhetri et al. (2020), in which combining two DL models led to superior performance. Similarly, Khan & Maity (2020) found that their hybrid Conv1D–MLP model outperformed the standalone models they tested. RNN models are particularly suitable for time series prediction and sequential data analysis (Khosravi et al. 2023b), and the BRNN model used here provides additional feature extraction capabilities (Jamei et al. 2024b). By combining the strengths of both components, the BRNN–BGRU model proved the most effective for this time series prediction task.
NASA POWER has the potential to address the shortage of meteorological data for rainfall prediction in tropical regions. Oloyede et al. (2023) demonstrated that models using NASA POWER data as input can generate more accurate temperature forecasts. NASA POWER data are also widely regarded as valuable where ground-measured data are unavailable or incomplete. Khan et al. (2024) used the NASA POWER dataset to classify rainfall in Aligarh, India, but did not assess the quality of the dataset.
In this study, the use of the NASA POWER dataset for rainfall forecasting introduces some uncertainty, primarily because local rainfall station data are lacking. Tan et al. (2023) showed that although the NASA POWER dataset effectively captures rainfall variability during the 2014–2015 floods in Kelantan, Malaysia, it requires bias correction before application. Future research should prioritize calibrating models with ground-based datasets to increase prediction accuracy. Hyperparameters are currently configured manually; optimization techniques such as grid search, particle swarm optimization (PSO), or genetic algorithms could yield more efficient models by systematically exploring the hyperparameter space. Hyperparameter optimization, particularly PSO for DL, is important because DL models are often sensitive to poor initial weights (Cholissodin & Sutrisno 2020). The models were selected based on capabilities and advantages identified in previous studies (Su & Jiang 2023a; Vijay & Aparna 2023). Combining the best-performing standalone models with less promising ones not only reduces the impact of model uncertainty but also leverages multiple methodologies to improve overall results.
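As a rough illustration of how PSO searches a hyperparameter space, the NumPy sketch below minimizes a toy two-dimensional function that stands in for validation error. The objective, bounds, swarm size, and the inertia and acceleration coefficients (0.7, 1.5, 1.5) are hypothetical choices for illustration; in practice each evaluation would train and validate a DL model, which is far more expensive.

```python
import numpy as np


def pso_minimize(objective, lower, upper, n_particles=30, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=42):
    """Basic global-best particle swarm optimization within box bounds."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pos = rng.uniform(lower, upper, size=(n_particles, lower.size))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity: inertia + pull toward personal best + pull toward global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()


def toy_error(p):
    """Hypothetical stand-in for validation RMSE over two scaled hyperparameters."""
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


best, best_val = pso_minimize(toy_error, lower=[-5, -5], upper=[5, 5])
```

Each particle balances its own best position against the swarm's best, which lets the search explore broadly at first and then concentrate around promising settings.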
By 2030, climate change is projected to intensify due to rising carbon dioxide levels, resulting in drier summers, wetter winters, and more frequent severe storms (Sarker 2022). The frequency of typhoons and hurricanes is also expected to rise as ocean waters warm. Accurate rainfall prediction can serve as an early warning system for natural disasters such as flash floods, landslides, and droughts (Zhou et al. 2024). The results indicate that the developed models can predict rainfall with reasonable accuracy from NASA POWER data, which could support the anticipation of extreme rainfall under a changing climate. This information is especially beneficial for farmers, enabling more efficient use of water resources (Latif et al. 2023). Reliable multi-step prediction models can be applied in practice to provide early warnings to the public, assist in disaster management, and help authorities mitigate risks to property, crops, and infrastructure (Latif et al. 2023; Zhou et al. 2024).
Future research should consider incorporating optimizers to improve prediction accuracy; Adnan et al. (2023) found that metaheuristic optimizers enhance model performance in river flow prediction. Integrating multi-head attention could further refine the DL models. Because NASA POWER data often contain outliers and noise, which can reduce model accuracy and increase the risk of overfitting, techniques such as wavelet transform (Kumar & Kumar 2021), Savitzky–Golay filtering (Pham et al. 2024), and Z-score screening (Raziei & Miri 2023) are used to remove outliers. This study used the Savitzky–Golay method to mitigate outliers, with the aim of improving model performance and preventing overfitting. Future research should also emphasize the need for rainfall stations in the study area: changes in hydrologic data patterns resulting from climate change can affect the network of hydrometric gauges, indicating that the area may need a dedicated rainfall monitoring station (Singhal et al. 2024).
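A Z-score screen of the kind mentioned above can be sketched in a few lines of plain Python: values more than a chosen number of standard deviations from the mean are flagged. The threshold of 3 is a common convention rather than a value from this study, and for rainfall a flagged day may be a genuine extreme event, so flagged points would normally be inspected rather than removed automatically.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Indices whose absolute Z-score (population SD) exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []                      # constant series: nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


series = [10.0] * 20 + [100.0]         # toy daily values with one spike
flagged = zscore_outliers(series)
```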
This study has several limitations, including the reliance on a single time series dataset, which may limit the generalizability of the results to other contexts. Future research should explore different areas to assess the model's predictive capabilities in various geographical conditions. Additionally, bias correction was not applied prior to training the model, which could impact the accuracy of the predictions. Another limitation is the absence of a feature selection process, as the study depended solely on the variables provided by NASA POWER and prior research for variable selection. Addressing these limitations in future studies can improve the reliability of NASA POWER in predicting tropical rainfall.
CONCLUSIONS
This study advances the development of enhanced rainfall prediction models that address contemporary climate challenges, especially in tropical countries that lack observations. It presents two hybrid models, CNN2D–BGRU and BRNN–BGRU, which exhibit superior multi-day forecasting capabilities compared to standalone models. This capacity for extended forecasting makes the hybrid models well-suited for real-world applications. Moreover, utilizing NASA POWER datasets for predicting tropical rainfall can offer valuable insights to researchers and stakeholders in tropical regions.
This study successfully develops and evaluates the performance of both standalone and hybrid DL models for rainfall prediction in George Town, Penang. Utilizing the NASA POWER dataset, which includes variables such as rainfall, wind speed, temperature, dew point, relative humidity, and SR, the research involved a thorough preprocessing phase to enhance model performance. The results reveal that the hybrid models BRNN–BGRU and CNN2D–BGRU show improved performance, particularly for 1- and 3-day predictions. Notably, the CNN2D–BGRU model performed best during training, while the BRNN–BGRU model yielded more accurate results on the testing dataset.
Integrating BRNN with BGRU improved the accuracy and reliability of short-term rainfall predictions, especially for nonlinear scenarios. However, the study also shows a decline in accuracy at the longer 3-day horizon, which may be due to the increased complexity of accounting for additional variables over extended periods. Addressing these challenges will require further refinement of models and methodologies. Overall, the research framework based on the BRNN–BGRU hybrid model proves effective for both 1- and 3-day-ahead rainfall prediction, providing valuable insights for mitigating the impacts of natural disasters such as floods, droughts, and landslides.
ACKNOWLEDGEMENTS
We extend our gratitude to NASA for providing the dataset available on the NASA POWER website.
FUNDING
This research was funded by the Ministry of Higher Education Malaysia under the Trans-Disciplinary Research Grant Scheme (TRGS), grant number TRGS/1/2022/USM/02/3/1.
AUTHOR CONTRIBUTIONS
A.S. and M.L.T. conceptualized the study and were responsible for writing, reviewing and editing the article. Z.M.Y. and Z.F. reviewed and edited the article.
DATA AVAILABILITY STATEMENT
NASA POWER data are available at https://power.larc.nasa.gov/data-access-viewer/
CONFLICT OF INTEREST
The authors declare there is no conflict.