This study aims to identify an efficient and accurate model for long-term rainfall prediction by comparing tree-based machine learning approaches with the global prediction model of the European Centre for Medium-Range Weather Forecasts (ECMWF). Two tree-based machine learning algorithms, light gradient boosting (LGB) and regression tree (RT), are compared with the global model. Local meteorological parameters (relative humidity, dew point temperature, minimum temperature, maximum temperature, wind speed, convective available potential energy, and sunshine) and a large-scale climate variable (sea surface temperature) were used as inputs during model development. The database was first preprocessed and then partitioned into a training set and a testing set. The GridSearchCV technique was used to tune the model hyperparameters. LGB performs best at both the daily and monthly time scales, with the highest coefficient of determination (R2 = 0.991; 0.996), lowest root mean squared error (RMSE = 1.411 mm; 0.383 mm), lowest mean squared error (MSE = 1.992; 0.146), and lowest mean absolute error (MAE = 0.899 mm; 0.302 mm), respectively. At both time scales, the LGB model is considerably more accurate than both RT and ECMWF. Relative humidity is the most influential meteorological parameter for rainfall prediction, with a random forest (RF) feature importance score of 0.4129. The proposed models will be incorporated into an agricultural decision support system that is under development in Ethiopia.

  • The work is novel.

  • It compares tree-based machine learning models with the global model, which have not previously been compared for rainfall prediction.

  • The study uses input meteorological parameters that have not previously been used.

Climate change and global warming are worldwide concerns. One important aspect of weather change is rainfall, which is the most important meteorological parameter throughout Africa, because rain-fed agriculture makes up the majority of the continent's economy and rainfall strongly affects crop production in many places. Unusually high or low rainfall can cause floods or droughts, respectively, with catastrophic effects on the environment and human welfare (Diro et al. 2008; Hooshyaripor et al. 2020). In many areas of the continent, the current gauge network cannot give timely or sufficient information on rainfall patterns because of unequal distribution, missing data, and sparse observations. This leads to the use of global numerical models such as that of the European Centre for Medium-Range Weather Forecasts (ECMWF) (Dinku et al. 2007; Diro et al. 2009; Koutsouris et al. 2016; Olaniyan et al. 2018). However, such models have been found to perform less well in capturing observed long-term trends and to be unable to predict extreme rainfall events in many regions of the African sector (Koutsouris et al. 2016; Olaniyan et al. 2018; Lemma et al. 2019). Additionally, local biases are present in ECMWF rainfall estimates over Ethiopia (Diro et al. 2009; Gleixner et al. 2020). These studies highlight the shortcomings of this global model in the African region. Several neural network-based models have been developed in the African region to fill this gap (Endalie et al. 2022; Ojo & Ogunjo 2022; Abebe & Endalie 2023). These investigations showed that, in comparison to the global models, neural network (NN)-based models show great promise in capturing the overall dynamics of rainfall changes. This does not mean, however, that NN models always produce more accurate estimates than other machine learning methods in every scenario. Because NN algorithms require a lot of data to reach their full potential, they often overfit small datasets (Piotrowski & Napiorkowski 2013; Kim et al. 2024b).

Recent studies have highlighted the application of machine learning and advanced modeling techniques in environmental prediction and management. For instance, Pandey et al. (2021) applied ensemble machine learning models to predict scour depth and assess sediment dynamics around spur dikes, while Basha et al. (2024) used the InVEST model to evaluate the impact of land use changes on water yield in India. Kim et al. (2024b) modeled surface water temperature dynamics in Arctic lakes using machine learning, Mahdian et al. (2024) analyzed environmental changes in the Anzali Wetland in Iran, focusing on the effects of climate variability on ecosystems, and Endalie et al. (2022) modeled rainfall using machine learning models. These approaches, particularly in hydrology and ecology, can be leveraged for modeling rainfall and other climatic factors in regions like Ethiopia. In recent years, the estimation of weather variables such as rainfall has relied increasingly on tree-based machine learning techniques (Kumar et al. 2023), which are appropriate for problems that are too complex or large for global model approaches. Numerous technological and scientific sectors, including atmospheric weather research, have shown great interest in machine learning techniques (Geetha & Nasira 2014), with a focus on modeling nonlinear relations such as those governing rainfall. In response to technological advances, the introduction of light gradient boosting (Microsoft 2016; Ke et al. 2017) (LGB; hereafter) and the regression tree model (Breiman 2017) (RT; hereafter), and the dynamics of rainfall, the atmospheric science community has begun to predict rainfall with machine learning techniques beyond NN models and the global model, both worldwide and in the African sector. For example, several studies (Appiah-Badu et al. 2021; Misra et al. 2021; Ridwan et al. 2021; Zhou et al. 2021; Kim et al. 2022; Monego et al. 2022; Yirga 2023) modeled rainfall using machine learning approaches. Their findings demonstrate that machine learning techniques are efficient in modeling rainfall with both large and small amounts of data. Additionally, tree-based algorithms are inexpensive because they require fewer computational resources and less training time than NN algorithms (Ramsundram et al. 2016; Bentéjac et al. 2021; Kim et al. 2024a). As a result, tree-based machine learning algorithms are relatively simple to optimize compared to NN techniques.

Previous studies in the African region primarily used machine learning methodologies other than tree-based algorithms and focused on short-term data, using small-scale and local meteorological input parameters. Even the study by Endalie et al. (2022) in Ethiopia, which modeled rainfall with limited input parameters and ignored large-scale climate indices, stressed the importance of global climatic indices like sea surface temperature (SST) in its motivation. The focus on short-term rainfall estimation, the small number of local meteorological parameters, the scarcity of studies using large-scale climate indices, and the lower accuracy of existing models motivate us to look for an alternative method for modeling rainfall variability, using large-scale climate indices like SST and local meteorological indices including convective available potential energy (CAPE) as input parameters. A review of previous work makes clear that not enough research has been done on evaluating different machine learning models for long-term rainfall estimation. We also found no studies that have compared tree-based machine learning and global models for rainfall estimation. This gap has motivated the current research, which aims to develop a new feasible model for rainfall estimation using local meteorological parameters and large-scale climatic indices.

Consequently, this study tests the effectiveness of tree-based algorithms, namely the LGB and RT models, in rainfall estimation and compares them with the existing global numerical weather prediction model (ECMWF hereafter) to identify a reliable model for accurate rainfall prediction.

Study area and data

Our study area is Bahir Dar, which lies near Lake Tana, the largest lake in Ethiopia (Figure 1). Its geographical location is 11.57° N latitude and 37.36° E longitude. Bahir Dar City is one of the largest and most rapidly expanding cities in Ethiopia. It is the political, economic, and cultural center of Amhara National Regional State (ANRS), the second-most populous region in the country. In addition, the city is one of the major tourist centers in the country because of its cultural heritage (such as the Lake Tana Monasteries and religious festivals) and natural attractions (such as the Blue Nile Falls, birds, and hippos). The area is also characterized by rich biodiversity and is recognized as a Biosphere Reserve by UNESCO. According to data from the Central Statistics Agency (Ethiopia 2013), about 350,000 people lived in the city in 2017.
Figure 1

The study area.

Figure 2

Methodology used in this study.

In Bahir Dar, the agricultural sector is segmented, highly dependent on rainfall, and lacks permanent watercourses. The same dependence holds for horticulture, agro-industrial processing, urban agriculture, manufacturing, and a variety of service businesses. The main economic activity in Bahir Dar, however, is tourism, which attracts large numbers of people. Furthermore, the diverse range of meteorological conditions in Bahir Dar makes rainfall estimation a crucial matter. Together with its temporal variation, rainfall estimation is a crucial component of making informed decisions and mitigating the hazards associated with abrupt rainfall spikes and the wide extent of the affected region.

Daily local meteorological data for the years 1993 to 2022 were collected from the Bahir Dar meteorological institute, located in Bahir Dar (latitude: 11.61° N, longitude: 37.38° E, altitude 1,800 m). The collected meteorological parameters include maximum temperature (Tmax), minimum temperature (Tmin), relative humidity (RH), wind speed (Wspeed), sunshine (Sunshine), dew point temperature (Tdew), and rainfall (R). The remaining local meteorological parameter, CAPE, and the large-scale climate index SST were obtained as hourly datasets for 1993 to 2022 from the ECMWF fifth-generation reanalysis, ERA5. ERA5 provides hourly data at a 0.25° spatial resolution and is accessible via the Copernicus Climate Data Store (https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-singlelevels?tab=form). The hourly datasets were converted into a daily format using the pandas library in Python. The meteorological data and large-scale climate variables are summarized in Table 1. The methodology used for the study is shown in Figure 2.
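The hourly-to-daily conversion described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' actual script: the function name `hourly_to_daily`, the choice between summing and averaging, and the sample values are all assumptions made for the example. With pandas, a comparable result would come from calling `resample("D")` on a datetime-indexed series and then taking the sum or mean.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_to_daily(records, how="mean"):
    """Aggregate (timestamp, value) pairs into one value per calendar day.

    how="sum" suits accumulated quantities such as precipitation;
    how="mean" suits state variables such as CAPE or SST.
    """
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    buckets = defaultdict(list)
    for stamp, value in records:
        buckets[stamp.date()].append(value)
    daily = {}
    for day in sorted(buckets):
        values = buckets[day]
        total = sum(values)
        daily[day] = total if how == "sum" else total / len(values)
    return daily


# Hypothetical data: two days of hourly precipitation, 0.5 mm every hour.
start = datetime(2017, 1, 1, 0)
hourly_rain = [(start + timedelta(hours=h), 0.5) for h in range(48)]

daily_rain = hourly_to_daily(hourly_rain, how="sum")
daily_mean = hourly_to_daily(hourly_rain, how="mean")
for day, total in daily_rain.items():
    print(day, total)
```

The day boundaries here follow the timestamps as given; whether the study aggregated in UTC or local time is not stated in the text.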

Table 1

Description of data in the study

| Parameters | Symbol used in study | Units | Description |
|---|---|---|---|
| Relative humidity | RH | Percentage (%) | Ratio of the actual water vapor pressure to the saturation vapor pressure |
| Dew point temperature | Tdew | Degree Celsius | The temperature at which the air becomes saturated with the moisture present |
| Maximum temperature | Tmax | Degree Celsius | The greatest temperature measured in a certain time period |
| Minimum temperature | Tmin | Degree Celsius | The lowest temperature recorded during a specified period of time |
| Wind speed | Wspeed | m/s | The speed of wind |
| Convective available potential energy | CAPE | J/kg | Estimate of the occurrence of convective rainfall situations |
| Sunshine | Sunshine | Hour | Hours of sunshine in a day |
| Sea surface temperature | SST | Degree Celsius | The temperature of the water near the surface of the ocean |
| Rainfall | Rainfall | Millimeter | The amount of water that precipitates in liquid form over a given region and time period; often measured in millimeters or inches |

Models and technique

RT model

The decision tree algorithm is used for both classification and regression. RTs predict continuous values, and the sum of squared differences between predicted and actual values is used to measure how accurate the predictions are. The RT model repeatedly performs binary partitions on the input parameters of the dataset at each level until the prescribed maximum depth is reached (Rokach & Maimon 2005; Liang et al. 2022). Since the target variable in this study (rainfall) is continuous, the problem is a regression problem, and an RT is therefore used.
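A single binary partition of the kind an RT performs can be sketched as below. This is an illustrative stand-in under stated assumptions, not the implementation used in the study: the helper names `sse` and `best_binary_split` are hypothetical, only one feature is considered, and the humidity and rainfall values are invented toy data.

```python
def sse(values):
    """Sum of squared differences between each value and the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_binary_split(x, y):
    """Return (threshold, total_sse) of the best split 'x <= threshold'.

    Candidate thresholds are midpoints between consecutive distinct x values.
    """
    pairs = sorted(zip(x, y))
    best_threshold, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        left_x, right_x = pairs[i - 1][0], pairs[i][0]
        if left_x == right_x:
            continue  # no threshold can separate identical feature values
        threshold = (left_x + right_x) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost


# Hypothetical toy data: relative humidity (%) against daily rainfall (mm).
humidity = [40, 45, 50, 55, 80, 85, 90, 95]
rainfall = [0.0, 0.2, 0.1, 0.3, 9.8, 10.1, 10.4, 9.9]
threshold, cost = best_binary_split(humidity, rainfall)
print(threshold, round(cost, 3))
```

A full tree would apply the same search recursively to each side, over every input parameter, until the maximum depth is reached.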

LGB model

LGB improves the efficiency of gradient boosting, addressing its computational cost, through two techniques: exclusive feature bundling (EFB) and gradient-based one-side sampling (GOSS). More precisely, EFB simplifies the feature space by bundling mutually exclusive features, whereas GOSS down-samples the training instances by randomly discarding those with small gradients (Tang et al. 2021).
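The sampling idea behind GOSS can be sketched as follows. This is a simplified illustration, assuming the commonly described scheme in which large-gradient instances are always kept and a random subset of the remainder is up-weighted; the function name `goss_sample`, its parameter names, and the gradient values are hypothetical and do not reproduce the LightGBM source code.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) selected by gradient-based one-side sampling.

    The top_rate fraction of instances with the largest |gradient| is always
    kept. From the rest, other_rate * n instances are drawn at random and
    up-weighted by (1 - top_rate) / other_rate to compensate for those dropped.
    """
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top, rest = order[:n_top], order[n_top:]
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))
    amplify = (1.0 - top_rate) / other_rate
    indices = top + sampled
    weights = [1.0] * len(top) + [amplify] * len(sampled)
    return indices, weights


# Hypothetical gradients: instance i has gradient i / 10, for i = 0..19.
grads = [i / 10 for i in range(20)]
idx, w = goss_sample(grads, top_rate=0.2, other_rate=0.1)
print(sorted(idx[:4]), len(idx), w[-1])
```

Only the selected instances, with their weights, would then be used when evaluating candidate splits, which is where the saving in computation comes from.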

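The LGB algorithm is based on decision trees; therefore, the model is formulated as follows (Sibindi et al. 2023). Consider a training dataset $D = \{(x_i, y_i)\}_{i=1}^{n}$, where $n$ is the number of samples, each with $m$ features. The estimate $\hat{y}_i$ is obtained by combining the decision tree predictions:
\[ \hat{y}_i = \sum_{k=1}^{p} f_k(x_i) \tag{1} \]
where $p$ is the number of trees and $f_k$ are the trees. The goal is to minimize the objective function below to obtain the trees $f_k$:
\[ \mathrm{Obj} = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{k=1}^{p} \Omega(f_k) \tag{2} \]
where $L$ is the loss function and the regularization term is given by
\[ \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2} \tag{3} \]
where $\gamma$ and $\lambda$ are the penalty parameters for the number of leaves $T$ and the leaf weights $w$. Taking $L$ to be the squared error, then at iteration $p$
\[ L\left(y_i, \hat{y}_i^{(p-1)} + f_p(x_i)\right) = \left(r_i - f_p(x_i)\right)^{2}, \qquad r_i = y_i - \hat{y}_i^{(p-1)} \tag{4} \]
where $\hat{y}_i^{(p-1)} = \sum_{k=1}^{p-1} f_k(x_i)$ is the estimate from the first $p-1$ trees and the residual $r_i$ is fitted to obtain $f_p$. The function to be minimized at iteration $p$ is defined using a quadratic approximation as
\[ \mathrm{Obj}^{(p)} \approx \sum_{i=1}^{n} \left[ g_i f_p(x_i) + \frac{1}{2} h_i f_p(x_i)^{2} \right] + \Omega(f_p) \tag{5} \]
where $g_i$ and $h_i$ are the first and second derivatives of $L$ with respect to $\hat{y}_i^{(p-1)}$. Through minimizing the objective function, a new tree is obtained. The decision tree splits each node at the point with the largest information gain. The variance gain for a node that splits feature $j$ at point $s$ is given by
\[ V_{j \mid O}(s) = \frac{1}{n_O} \left( \frac{\left( \sum_{\{x_i \in O \,:\, x_{ij} \le s\}} g_i \right)^{2}}{n_{l \mid O}^{j}(s)} + \frac{\left( \sum_{\{x_i \in O \,:\, x_{ij} > s\}} g_i \right)^{2}}{n_{r \mid O}^{j}(s)} \right) \tag{6} \]
where $O$ denotes the samples on a fixed node of the decision tree, $n_O = \sum I[x_i \in O]$, $n_{l \mid O}^{j}(s) = \sum I[x_i \in O : x_{ij} \le s]$, and $n_{r \mid O}^{j}(s) = \sum I[x_i \in O : x_{ij} > s]$. The decision tree selects $s_j^{*} = \arg\max_{s} V_{j \mid O}(s)$ for each feature $j$ and computes the highest gain $V_{j \mid O}(s_j^{*})$. The data are then split into left and right nodes according to feature $j^{*}$ at point $s_{j^{*}}$. All samples are scanned to find the optimal splitting point in order to calculate the information gain.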
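The split search by variance gain can be illustrated with a short sketch. It assumes the standard definition of variance gain over the gradients of the samples in a node, scans a single feature, and uses invented gradient values; the function names `variance_gain` and `best_split_point` are hypothetical.

```python
def variance_gain(feature_values, gradients, s):
    """Variance gain of splitting a node on 'feature value <= s'."""
    left = [g for x, g in zip(feature_values, gradients) if x <= s]
    right = [g for x, g in zip(feature_values, gradients) if x > s]
    if not left or not right:
        return 0.0  # a split that leaves one side empty is not useful
    n_node = len(gradients)
    return (sum(left) ** 2 / len(left) + sum(right) ** 2 / len(right)) / n_node


def best_split_point(feature_values, gradients):
    """Scan every observed value as a candidate s and return (s*, gain)."""
    # The largest value is skipped because it would leave the right side empty.
    candidates = sorted(set(feature_values))[:-1]
    if not candidates:
        return None, 0.0
    gains = [(variance_gain(feature_values, gradients, s), s) for s in candidates]
    best_gain, best_s = max(gains)
    return best_s, best_gain


# Hypothetical node: gradients change sign between feature values 3 and 4.
feature = [1, 2, 3, 4, 5, 6]
grads = [-2.0, -2.0, -2.0, 2.0, 2.0, 2.0]
best_s, best_gain = best_split_point(feature, grads)
print(best_s, best_gain)
```

Repeating this scan for every feature and keeping the feature with the highest gain gives the split that is applied to the node.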

ECMWF model–global model

The ECMWF global atmospheric reanalysis product is created by fusing observational data from satellites and ground stations with a numerical weather prediction model. The data used in the present study come from the ERA5 ECMWF reanalysis product (Gleixner et al. 2020), which provides hourly meteorological conditions back to 1979. This version of the ECMWF reanalysis is based on the Integrated Forecasting System (IFS) and includes a four-dimensional variational analysis (4D-Var). Compared with earlier reanalyses, ERA5 data are available at higher spatial and temporal resolution, on a 0.25° grid with hourly intervals. The number of observational datasets that serve as input for the assimilation system was increased, and a major difference is the consideration of satellite estimates of rainfall in ERA5. For this study, an ERA5 reanalysis rainfall dataset (0.25° × 0.25°, hourly) spanning January 2017 to December 2022 was downloaded from https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-singlelevels?tab=form, and the hourly precipitation was then converted to a daily scale.

Development of the models

A 30-year (1993–2022) record of daily meteorological data was used to train and test the models. The data are typically divided into two datasets. No universal guideline exists for preparing training and testing datasets in either spatial or time series estimation, but it is commonly recommended to divide the complete dataset into two sub-datasets for training (80%) and testing (20%) (Khosravi et al. 2018).

In this study, the data from January 1993 to December 2016 (80%) were used as the training dataset for model building, while the data from January 2017 to December 2022 (20%) served as the testing dataset for model evaluation. The experiments consisted of developing models for rainfall estimation over Bahir Dar, Ethiopia, using two machine learning techniques, RT and LGB.
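The date-based partition described above can be sketched as follows. The row layout, the function name `chronological_split`, and the placeholder rows are assumptions for illustration; the sketch only demonstrates that cutting the 1993 to 2022 record at 1 January 2017 gives a split very close to 80/20.

```python
from datetime import date, timedelta


def chronological_split(rows, first_test_day):
    """Split (day, features, target) rows into train/test by date, without shuffling."""
    train = [row for row in rows if row[0] < first_test_day]
    test = [row for row in rows if row[0] >= first_test_day]
    return train, test


# Placeholder rows: one per day from 1993-01-01 to 2022-12-31 (features omitted).
first_day, last_day = date(1993, 1, 1), date(2022, 12, 31)
n_days = (last_day - first_day).days + 1
rows = [(first_day + timedelta(days=i), None, None) for i in range(n_days)]

train, test = chronological_split(rows, first_test_day=date(2017, 1, 1))
train_share = len(train) / len(rows)
print(len(train), len(test), round(train_share, 3))
```

Splitting by date rather than at random keeps every test day later than every training day, so the evaluation does not use information from the future.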

Model hyperparameter adjustment

Hyperparameters are parameters of a machine learning algorithm that must be set specifically for a given learning problem because they cannot be learned directly from the data. Their ideal values vary with the data and the problem, and they are usually found by testing different combinations and evaluating each model's performance. This parameter optimization selects the most suitable set of parameters (Wu et al. 2019) through a trial-and-error process that continues until the best estimation score is obtained. Model performance depends largely on the optimization of hyperparameters (Diez-Sierra & Del Jesus 2020). Ridwan et al. (2021) found that boosting and decision tree regression models performed poorly without tuning, whereas their accuracy increased noticeably once tuned. The optimization in this study shows that tuning drives the models toward lower errors and that the relationship between hyperparameters and model accuracy is not linear for all ML models. Hyperparameters therefore need to be tuned carefully: increasing their values may not improve accuracy, and both accuracy and computational cost need to be considered in model development. The tuned hyperparameters are the maximum tree depth (max_depth), the number of trees used in ensemble learning (n_estimators), the fraction of samples used to fit individual base learners (subsample), the learning rate that scales each gradient step (learning_rate), and the maximum number of leaves in each weak learner (num_leaves). The searched ranges and the selected optimum hyperparameters are given in Table 2.

Table 2

Optimal hyperparameters of the RT and LGB models

| Model | Selected hyperparameters | Range of search |
|---|---|---|
| Regression tree model | max_depth = 18 | [4, 5, 6, 8, 12, 14, 16, 18, 20] |
| | min_sample_split = 8 | [2, 4, 5, 6, 8, 10] |
| | min_sample_split_leaf = 4 | [2, 4, 6] |
| Light gradient boosting model | num_leaves = 160 | 100–180 |
| | learning_rate = 0.0988 | 0.03–0.1 |
| | max_depth = 48 | 18–54 |
| | n_estimators = 160 | 120–200 |

A GridSearchCV method was implemented to optimize the hyperparameters, identifying the combination that yielded the most accurate results. GridSearchCV is a machine learning technique that uses a systematic search to determine the best hyperparameters for a given algorithm. Because it employs cross-validation to evaluate model performance for every combination of hyperparameters in the grid, it is effective at locating the best-performing setting (Kartini et al. 2021). Cross-validation evaluates models on several subsets of the training data in order to limit overfitting. The parameters were tuned using GridSearchCV in Python, and the selected values were used to train the models and produce the predictions.

To enhance prediction accuracy and prevent overfitting, we employed GridSearchCV for hyperparameter tuning, with k-fold cross-validation as the evaluation method. This strategy helps the model remain robust and generalize to unseen data by systematically evaluating different combinations of hyperparameters across various subsets of the training data. GridSearchCV exhaustively searches a specified hyperparameter grid, trying all possible combinations to find the set that yields the best performance on a chosen evaluation metric. K-fold cross-validation is a widely used technique that mitigates the risk of overfitting by partitioning the training data into k subsets (or 'folds'). For each combination of hyperparameters in the grid, the model is trained on k − 1 folds and evaluated on the remaining fold, and this process is repeated until every fold has been used for validation. The performance is then averaged over all folds, which provides a more reliable estimate of how the model will perform on unseen data.
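The loop that an exhaustive grid search with k-fold cross-validation carries out can be written out by hand, as in the sketch below. It is a standard-library illustration of the procedure, not the study's code: the nearest-neighbour model is a toy stand-in for RT and LGB, and the function names, the grid, and the data are all hypothetical. In practice, scikit-learn's GridSearchCV automates this same loop.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


def knn_predict(train_x, train_y, query, n_neighbors, weighted):
    """Toy stand-in model: (weighted) mean target of the nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    chosen = nearest[:n_neighbors]
    if weighted:
        weights = [1.0 / (abs(train_x[i] - query) + 1e-9) for i in chosen]
    else:
        weights = [1.0] * len(chosen)
    return sum(w * train_y[i] for w, i in zip(weights, chosen)) / sum(weights)


def cross_val_mse(x, y, params, k):
    """Mean validation MSE over k folds for one hyperparameter combination."""
    fold_scores = []
    for train, valid in k_fold_indices(len(x), k):
        tx, ty = [x[i] for i in train], [y[i] for i in train]
        errors = [(knn_predict(tx, ty, x[i], **params) - y[i]) ** 2 for i in valid]
        fold_scores.append(sum(errors) / len(errors))
    return sum(fold_scores) / k


def grid_search(x, y, grid, k=5):
    """Try every combination in the grid; return (best_params, best_score, results)."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((cross_val_mse(x, y, params, k), params))
    best_score, best_params = min(results, key=lambda item: item[0])
    return best_params, best_score, results


# Hypothetical data: a noisy linear trend.
x = [float(i) for i in range(20)]
y = [2.0 * v + (1.0 if i % 2 else -1.0) for i, v in enumerate(x)]
grid = {"n_neighbors": [1, 2, 4], "weighted": [False, True]}
best_params, best_score, results = grid_search(x, y, grid, k=5)
print(best_params, round(best_score, 3))
```

The cost grows with the product of the grid sizes times k, which is why the searched ranges are kept bounded.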

For model evaluation, four statistical indicators—mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and coefficient of determination (R2)—were selected. They were also used to evaluate the robustness of the models. These model effectiveness parameters are utilized to determine the performance of models (Abebe & Endalie 2023).
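\[ \mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \lvert O_t - P_t \rvert \tag{7} \]
\[ \mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} \left( O_t - P_t \right)^{2} \tag{8} \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( O_t - P_t \right)^{2}} \tag{9} \]
\[ R^{2} = 1 - \frac{\sum_{t=1}^{N} \left( O_t - P_t \right)^{2}}{\sum_{t=1}^{N} \left( O_t - \bar{O} \right)^{2}} \tag{10} \]
where $O_t$ is the observed/measured value and $P_t$ is the predicted value at time step $t$, $\bar{O}$ is the average of the observed values, and $N$ is the number of time steps.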
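The four indicators can be computed as in the sketch below, assuming their usual definitions (in particular, R2 taken as one minus the ratio of the residual to the total sum of squares). The function name `regression_metrics` and the sample values are illustrative only.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, MSE, RMSE and R2 for paired observed and predicted values."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical rainfall values in mm.
observed = [0.0, 2.0, 4.0, 6.0]
predicted = [0.0, 3.0, 4.0, 5.0]
scores = regression_metrics(observed, predicted)
for name, value in scores.items():
    print(name, round(value, 4))
```

Note that MAE and RMSE carry the units of rainfall (mm), MSE carries squared units (mm²), and R2 is dimensionless.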

The correlation of rainfall and input parameters

This section starts with the correlation of the input data from 1993 to 2016 that are used for model development. The model is intended to be fed with as much data as possible to facilitate the identification and learning of meaningful patterns. Choosing the input data is one of the most essential phases in developing a predictive model and has a large impact on model performance (Ghorbani et al. 2016). The data obtained may contain many attributes or variables, which may or may not be related to the dependent variable. Therefore, for the most accurate analysis, only attributes related to the dependent variable should be selected as model inputs. This study selected meteorological data for rainfall estimation based on Pearson's correlation. The correlation matrix between the input and output data can be used to confirm these relationships, and a heat map makes it simple to find the features most associated with the target variable. To describe the meteorological variables that correlate with rainfall and are relevant for rainfall estimation, a rectangular correlation heat map was used (Salaeh et al. 2022), which shows the Pearson correlation between the variables presented.
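Pearson's correlation between each candidate input and the target can be computed as in the following sketch. The feature names and values are invented for illustration and are not the study's data; the helper names `pearson_r` and `correlation_with_target` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_with_target(features, target):
    """Map each feature name to its Pearson correlation with the target."""
    return {name: pearson_r(values, target) for name, values in features.items()}


# Hypothetical toy series standing in for rainfall and three candidate inputs.
rainfall = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "rising": [2.0, 4.0, 6.0, 8.0, 10.0],
    "falling": [5.0, 4.0, 3.0, 2.0, 1.0],
    "unrelated": [1.0, 3.0, 2.0, 3.0, 1.0],
}
correlations = correlation_with_target(features, rainfall)
for name, r in correlations.items():
    print(name, round(r, 3))
```

Applying `pearson_r` to every pair of variables, rather than only to each input against rainfall, gives the full matrix that a correlation heat map displays.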

The correlation values of all the features are shown in Figure 3. Rainfall has a strong positive correlation with Tdew (94%), Tmin (91%), Wspeed (91%), Tmax (85%), and RH (82%); a weak positive correlation with CAPE (1.3%); and a weak negative correlation with SST (2.3%) and sunshine (1.5%). Dew point is given great importance because it measures the state of the atmosphere, indicating how much water vapor is present in the air to condense into water droplets. The positive correlation between wind and rain arises because wind carries moisture, which can strongly affect the amount of precipitation in an area; stronger winds are thus associated with more rainfall and contribute significantly to daily rainfall variability. RH is the main factor in cloud formation, which results in rainfall, and is a key variable for cloud and rainfall (Ackerman et al. 2004). Rainfall has also been reported to increase strongly with RH in different regions of the world (Hardwick Jones et al. 2010; Rushley et al. 2018). In the context of Ethiopia, RH is a key driver of rainfall, as it is directly related to the moisture available in the atmosphere for precipitation, and it is an important feature in atmospheric moisture estimation and precipitation forecasting. High levels of RH indicate that the atmosphere is close to saturation, which is crucial for the formation of clouds and precipitation. In Ethiopia, especially during the rainy seasons, increased RH supports the formation of convective systems that generate rainfall. These convective systems are important drivers of rainfall variability in tropical regions, including parts of Ethiopia.
Figure 3

Correlation heat map of features with rainfall.

Several studies have confirmed the importance of RH in predicting rainfall. In tropical and subtropical regions, high humidity is closely linked to cloud formation and the likelihood of precipitation. For instance, Taye et al. (2021) demonstrated that RH is a strong predictor of rainfall in East Africa, particularly in regions like Ethiopia, where the main rainy season (kiremt) is characterized by high moisture content in the atmosphere. Similarly, Seleshi & Zanke (2004) highlighted the critical role of atmospheric moisture, often measured by RH, in determining the onset and intensity of the rainy seasons in Ethiopia.

A significant positive relationship exists between daily minimum temperature and rainfall, whereas a less significant relationship is observed between maximum temperature and rainfall. The correlation between rainfall and temperature is expected to vary over months. According to Cong & Brady (2012), temperature and rainfall were positively correlated during January and May but negatively correlated in July. Moreover, little is known about the correlation among these three variables (minimum temperature, maximum temperature, and rainfall) in Ethiopia, particularly in this study area; the present study shows that such a correlation exists. CAPE, a crucial indicator of the meteorological conditions needed for convective precipitation events to occur (Seeley & Romps 2015), correlates weakly and positively with rainfall. It is nevertheless identified as being among the important correlated variables, consistent with previous studies showing that CAPE is an important indicator of extreme precipitation in the eastern US (Lepore et al. 2015; Gizaw et al. 2021). For the large-scale climate index, the correlation between rainfall and SST was weakly negative (r = −0.015). Positive SST anomalies (El Niño) result in reduced rainfall or drought, whereas negative SST anomalies (La Niña) result in high rainfall (Kirtphaiboon et al. 2014). Another weakly correlated variable, sunshine duration, is an important indicator of the amount of solar radiation received in a region (r = −0.23 with rainfall).

Relative feature importance for rainfall estimation

Variable selection, also called feature or attribute selection, identifies and selects a subset of relevant features from a large set of variables and is a crucial step for improving a model's performance. Feature importance analysis is a useful procedure in the machine learning community, as it can guide model development by focusing on important variables (Zhou et al. 2021). In this work, the Random Forest (RF) technique was employed to determine the importance of each feature for rainfall estimation, as supported by recent studies on environmental data modeling that underscore RF's effectiveness in quantifying variable impacts (Kim et al. 2024a; Mahdian et al. 2024). The RF model was built and trained on daily data, and the significance of the variables was determined. RF is used here to assess how modifying a variable affects the model's estimation capacity and to quantify the relative importance of the variables in that model.

Figure 4 depicts the importance of the input variables as predictors of rainfall according to the RF model. Among all features, RH ranks first and was determined to be the most significant parameter, followed by Tdew, Tmax, CAPE, Tmin, Wspeed, sunshine, and SST, with mean importance scores of 0.4129, 0.2234, 0.0824, 0.0721, 0.0668, 0.0600, 0.0575, and 0.0245, respectively. This is consistent with Zhou et al. (2021), who found that RH ranks first, mean pressure and sunshine duration rank second and third, and minimum temperature ranks fourth.
Figure 4

Relative importance of feature variables for rainfall estimation.

Performance of models for temporal rainfall variation

This section examines the observed rainfall variability in the test data over the region and shows how reliably the developed models reproduce the temporal variability of rainfall. It also presents an intercomparison of the LGB, RT, and ECMWF models.

Estimating daily rainfall

Figure 5 shows the performance of the tree-based machine learning models and the global model in predicting daily rainfall variation. LGB, RT, and the global ECMWF model are used to predict daily rainfall and to select a feasible model for daily rainfall estimation over the region. The models are compared with each other based on their performance, feasibility, and accuracy in reproducing the observed rainfall, using four statistical indicators: R2, RMSE, MSE, and MAE.
Figure 5

Scatter plot of estimated versus measured daily rainfall values for the testing phase (marked black dot) with the linear fit (blue line) using the following: (a) LGB model; (b) RT model; and (c) ECMWF model.

The panels of Figure 5 show LGB, RT, and ECMWF from left to right. The R2 scores for the LGB, RT, and ECMWF models are 0.991, 0.874, and 0.778, respectively. The scatter plots show that the predictions of all three models agree with the observed values, but to different degrees: the LGB model agrees the most, and the RT model agrees better than the ECMWF model but less well than LGB. The LGB model is therefore a strong candidate for predicting rainfall at the daily time scale.

To investigate whether there are systematic deviations of a model's estimates from the observed daily rainfall, Figure 6 presents the distribution of daily errors for the LGB, RT, and ECMWF models from left to right. The RMSE values of the LGB, RT, and ECMWF models are 1.411, 7.31, and 7.892 mm, respectively. The LGB model had the lowest RMSE (1.411 mm), making it the most accurate in predicting daily rainfall variation, whereas the ECMWF model had the highest (7.892 mm). The LGB model also returned the smallest MAE (0.899 mm), indicating that the deviation between predicted and measured values was the smallest. The ECMWF model's MAE was the highest (4.306 mm), indicating that it had the greatest estimation error. The MSE values of the LGB, RT, and ECMWF models were 1.992, 53.44, and 62.28 mm², respectively; the MSE of LGB is thus much smaller than that of both RT and ECMWF, which implies that LGB performs better than the other models. According to the results of the daily estimations, the LGB model had strong predictive ability and the RT model relatively good predictive ability, whereas the ECMWF model performed poorly. For further verification of the performance of each model, the daily statistical evaluation metrics are given in Table 3.
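Because RMSE is by definition the square root of MSE, the daily values reported above can be cross-checked against each other. The short sketch below does this with the figures quoted in the text; the dictionary layout and the helper name `rmse_gap` are illustrative only.

```python
import math

# Daily test-phase values as reported in the text: (MSE in mm^2, RMSE in mm).
reported = {
    "LGB": (1.992, 1.411),
    "RT": (53.44, 7.31),
    "ECMWF": (62.28, 7.892),
}


def rmse_gap(mse, rmse):
    """Absolute difference between sqrt(MSE) and the reported RMSE."""
    return abs(math.sqrt(mse) - rmse)


gaps = {model: rmse_gap(mse, rmse) for model, (mse, rmse) in reported.items()}
for model, gap in gaps.items():
    print(f"{model}: |sqrt(MSE) - RMSE| = {gap:.4f} mm")
```

For all three models the gap is well below 0.005 mm, so the reported MSE and RMSE values agree to within rounding.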
Table 3

Daily statistical values of each model

Metric | LGB | RT | ECMWF
R2 | 0.991 | 0.874 | 0.778
MSE (mm²) | 1.992 | 53.44 | 62.28
RMSE (mm) | 1.411 | 7.31 | 7.892
MAE (mm) | 0.899 | 4.27 | 4.306

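
The four scores reported in Table 3 can be computed from paired observed and predicted values. The following is a minimal sketch in plain Python; the function name `regression_scores` and the sample numbers are illustrative assumptions, not the study's data or code.

```python
import math

def regression_scores(observed, predicted):
    """Return R2, MSE, RMSE and MAE for paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mse = ss_res / n                              # mean squared error (mm^2)
    rmse = math.sqrt(mse)                         # root mean squared error (mm)
    mae = sum(abs(e) for e in errors) / n         # mean absolute error (mm)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot                    # undefined if observations are constant
    return {"R2": r2, "MSE": mse, "RMSE": rmse, "MAE": mae}

# Hypothetical daily rainfall values in mm (not taken from the study).
obs = [0.0, 2.0, 10.0, 4.0]
pred = [1.0, 2.0, 8.0, 5.0]
scores = regression_scores(obs, pred)
print(scores)
```

Because RMSE is the square root of MSE, the two are reported in different units (mm and mm², respectively), which is a quick consistency check on any table of such scores.
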
Table 4

Statistical criteria of three models for the monthly rainfall scale

Metric | LGB | RT | ECMWF
R2 | 0.996 | 0.940 | 0.925
RMSE (mm) | 0.383 | 1.541 | 1.727
MSE (mm²) | 0.146 | 2.375 | 2.982
MAE (mm) | 0.302 | 0.838 | 1.166

Figure 6

Histogram of error distribution of daily rainfall for the testing phase using the following: (a) LGB model, (b) RT model, and (c) ECMWF model.

Comparison of model performance for the daily trend of rainfall variation

In this section, the daily trend of rainfall was examined and compared using the results obtained from the LGB, RT, and ECMWF models (Table 4). The trend analysis focuses on the differences between the observed and modeled results. In Figure 7, the observed and predicted daily rainfall values for the testing data are plotted to illustrate how well each model captures day-to-day variability in the time series. The modeled rainfall values closely follow the observed values, which indicates that the developed models are consistent with the observed rainfall pattern. A close inspection of the figure shows that the LGB model with optimal parameters predicts rainfall most similar to the observed values, followed by RT and then ECMWF, which has the weakest capability. The predicted rainfall also follows the typical daily variability of the study region. The machine learning models (green and purple lines in the time series) closely follow the observed rainfall (black line), consistent with their better RMSE, MAE, MSE, and R2 relative to ECMWF.
Figure 7

Time series plot of daily rainfall variation using the LGB, RT, and ECMWF models.

However, once larger rainfall accumulations are reached, the disagreement among a model's ensemble members could grow, making it less accurate in predicting extreme precipitation. This result is similar to that reported by other authors (Nguyen et al. 2018; Olaniyan et al. 2018). These results indicate that the LGB and RT models are generally better at predicting rainfall in less rainy seasons than in wet seasons, which is also supported by Zhou et al. (2021). The ECMWF model has relatively less skill in capturing the variation of rainfall because it covers a wider area and a longer time span and generally runs at a lower resolution, both spatially (fewer forecast points per given area) and temporally (fewer forecast time points). Overall, the LGB and RT models agree better with the observed rainfall than the ECMWF model.
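
One way to examine the statement that accuracy drops for larger rainfall amounts is to compute the error separately for light and heavy rainfall days. The sketch below assumes a hypothetical threshold of 10 mm/day and made-up values; neither comes from the study.

```python
def mae_by_intensity(observed, predicted, threshold_mm=10.0):
    """Mean absolute error for days below and at/above a rainfall threshold."""
    groups = {"light": [], "heavy": []}
    for o, p in zip(observed, predicted):
        key = "heavy" if o >= threshold_mm else "light"
        groups[key].append(abs(p - o))
    # None marks a class with no days, so it is not mistaken for a zero error.
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}

# Hypothetical observed and predicted daily rainfall in mm.
obs = [0.0, 1.0, 3.0, 12.0, 25.0]
pred = [0.5, 1.5, 2.0, 9.0, 18.0]
result = mae_by_intensity(obs, pred)
print(result)
```

A markedly larger error in the heavy class than in the light class would be in line with the behavior described above.
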

Comparison of model performance for monthly rainfall variation

This section examines the feasibility of using the tree-based LGB and RT models and the global ECMWF model for monthly rainfall variation. The monthly predicted and measured rainfall values for these models are shown as scatter plots in Figure 8. The LGB, RT, and ECMWF models have R2 values of 0.996, 0.940, and 0.925, respectively. The best result was therefore achieved by LGB, followed by RT, while the ECMWF model gave the least accurate predictions of monthly rainfall.
Figure 8

Scatter plot of estimated versus measured monthly rainfall values for the testing phase (marked black dot) with the linear fit (blue line) using the following: (a) LGB model, (b) RT model, and (c) ECMWF model.

Figure 9 shows the statistical model evaluation metrics: the LGB model has RMSE = 0.383 mm, MAE = 0.302 mm, and MSE = 0.146 mm²; the RT model has RMSE = 1.541 mm, MAE = 0.838 mm, and MSE = 2.375 mm²; and the ECMWF model has RMSE = 1.727 mm, MAE = 1.166 mm, and MSE = 2.982 mm² (Table 4). The RMSE, MAE, and MSE of the LGB model are smaller than those of both the RT and ECMWF models, which implies that the LGB model performs better than the other models. For all three metrics the ordering is ECMWF > RT > LGB, so LGB has the best estimating ability, RT is second, and ECMWF is the least accurate.
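
Because RMSE is the square root of MSE, the monthly values in Table 4 can be cross-checked against each other. A short worked check using the tabulated MSE values (in mm²) for LGB, RT, and ECMWF, in that order:

```latex
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}:\qquad
\sqrt{0.146} \approx 0.382,\qquad
\sqrt{2.375} \approx 1.541,\qquad
\sqrt{2.982} \approx 1.727
```

The results match the tabulated RMSE values (0.383, 1.541, and 1.727 mm); the LGB value differs only in the last digit, which is consistent with rounding.
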
Figure 9

Histogram of error distribution of monthly rainfall for the testing phase using the following: (a) LGB model, (b) RT model, and (c) ECMWF model.

At the monthly scale, the R2 values of all three models are higher than at the daily scale. The LGB estimates in particular agree more closely with the measured values than they do at the daily time scale, as shown by the good fit between predicted and observed rainfall. The rainfall variation estimated by the tree-based machine learning models captures the trend of the observed monthly rainfall. The estimates of the tree-based models almost coincide with the observed rainfall variation, and ECMWF also agrees well with the observations at this time scale.

The optimal model for rainfall modeling under all the performance criteria is LGB. It has the lowest RMSE, MAE, and MSE and the highest R2 of the three models, whereas ECMWF has relatively high RMSE, MAE, and MSE and a low R2. The LGB model was therefore found to generalize best and to be the most accurate in rainfall modeling.

Our study shows that modeling at the monthly time scale agrees better with the observed values than modeling at the daily time scale, as shown by the good fits between the modeled and observed plots. This result is consistent with the study of Nguyen et al. (2018).

Comparison of model performance for the monthly trend of rainfall variation

This section presents the average monthly rainfall pattern and the ability of our models to capture the monthly trends of rainfall over the Bahir Dar sector. Rainfall at Bahir Dar increases gradually from 4.5 mm in May and reaches its maximum of 18.5 mm in July. This is the general rainfall pattern at Bahir Dar Station, where the maximum occurs once a year, so the pattern can be referred to as a uni-modal rainfall distribution (Elvis et al. 2015). The model results and the observed data confirm that stations with a uni-modal rainfall distribution have a rainy period from May to September, with July and August recording the highest amounts of rainfall.
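
An average monthly pattern of this kind can be derived from a daily series by averaging all values that fall in the same calendar month and then locating the peak. A minimal sketch follows; the record layout `(date, rainfall_mm)` and the sample values are assumptions for illustration only and are not the station data.

```python
from collections import defaultdict
from datetime import date

def monthly_mean_rainfall(records):
    """Average daily rainfall (mm) per calendar month across all years."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, rain_mm in records:
        totals[day.month] += rain_mm
        counts[day.month] += 1
    return {m: totals[m] / counts[m] for m in sorted(totals)}

# Hypothetical daily records spanning two years.
records = [
    (date(2017, 5, 10), 3.0), (date(2018, 5, 12), 5.0),
    (date(2017, 7, 3), 16.0), (date(2018, 7, 9), 20.0),
    (date(2017, 9, 21), 8.0), (date(2018, 9, 2), 6.0),
]
climatology = monthly_mean_rainfall(records)
peak_month = max(climatology, key=climatology.get)  # month number with the highest mean
print(climatology, peak_month)
```

A single peak in the resulting monthly means corresponds to the uni-modal distribution described above.
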

As shown in Figure 10, the LGB model reproduces the observed rainfall pattern but grossly underestimates the amount of rainfall in all months except June, September, and October. Similarly, the RT model reproduces the observed pattern but grossly underestimates the amount of rainfall in all months except May, June, and October. As suggested by Afiesimama et al. (2006), the differences (overestimation and underestimation) between the model results and the observations may be due to the coarse horizontal grid of the model, which cannot accurately simulate mesoscale systems, and to the nature of the training data.
Figure 10

Trend of the average monthly rainfall of years 2017–2022 over Bahir Dar using various models.

LGB performed best in June, July, August, and September, whereas RT performed well in October, December, June, and May. Both the LGB and RT models are therefore good candidates for predicting rainfall in rainy months. The ECMWF model grossly underestimated the amount of rainfall in all months, which limits its usefulness here, and the ECMWF reanalysis model generally had the largest discrepancies when compared with the other models. One reason is that such rainfall estimates are derived indirectly, through several algorithms and models based on multiple wavelengths, and are therefore inevitably accompanied by a large degree of variability. For instance, many previous studies have indicated that reanalysis-based models generally have difficulty in representing rainfall in areas with complex topography, where rainfall is controlled by orography and characterized by high spatiotemporal variability (Sun et al. 2018).

Overall, the outcome of this study suggests that the LGB and RT models are suitable for studying rainfall variability and trends over Bahir Dar. Their ability to reproduce rainfall variability can play a vital role in water resources management in Bahir Dar, since the observed data are generally sparse and riddled with gaps.

This study builds two tree-based machine learning models (RT and LGB) for the estimation of rainfall, based on local meteorological input parameters (relative humidity (RH), dew point temperature, minimum temperature, maximum temperature, wind speed, convective available potential energy (CAPE), and sunshine) as well as a large-scale climate variable, sea surface temperature (SST). First, the data from NMI were preprocessed and split into a training set and a testing set. Then, the machine learning models were tuned with GridSearchCV to achieve high accuracy. For daily rainfall variation, the tree-based models achieved high accuracy on the testing set (R2 ≥ 0.874) and clearly outperformed the widely used global model, ECMWF (R2 = 0.778). Similarly, for monthly variation, LGB and RT achieved high accuracy on the testing set (R2 ≥ 0.940) and outperformed ECMWF (R2 = 0.925). LGB achieved the highest accuracy at both time scales: the R2 of LGB, RT, and ECMWF is 0.991, 0.874, and 0.778, respectively, for daily variation and 0.996, 0.940, and 0.925, respectively, for monthly variation. The heat map method shows a strong correlation between rainfall and dew point temperature (Tdew), minimum temperature (Tmin), wind speed (Wspeed), RH, and maximum temperature (Tmax), with r in the range 0.82–0.94. In contrast, CAPE is weakly positively correlated with rainfall, and SST and sunshine are weakly negatively correlated, with r in the range −0.015 to 0.013. The RF model was used for feature importance analysis, and it ranked RH and dew point temperature (Tdew) highest, with RF scores of 0.4129 and 0.2234, respectively. This feature importance analysis is consistent with common knowledge about the factors that influence rainfall, which supports the feasibility of the RF model for this purpose. For future work, using additional datasets and exploring other locations throughout Ethiopia would be a good option for building a more robust model.
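
The workflow summarized above (correlation screening, train/test split, grid-search tuning, and RF feature importance) can be sketched with scikit-learn as follows. This is an illustrative outline under stated assumptions, not the study's code: the data are synthetic, only two stand-in predictors are used, the parameter grid is invented, and `DecisionTreeRegressor` stands in for the RT model. LightGBM's `LGBMRegressor` follows the scikit-learn estimator interface and could be passed to `GridSearchCV` in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 300
# Synthetic stand-ins for two predictors; real inputs would be the station data.
rh = rng.uniform(20, 100, n)            # relative humidity (%)
sunshine = rng.uniform(0, 12, n)        # sunshine duration (h)
rain = 0.2 * rh + rng.normal(0, 1, n)   # toy target that depends on RH only
X = np.column_stack([rh, sunshine])
feature_names = ["RH", "sunshine"]

# Pearson correlation of each feature with rainfall (the basis of a heat map).
corr = {name: float(np.corrcoef(X[:, i], rain)[0, 1])
        for i, name in enumerate(feature_names)}

# Hold out a test set, then tune the regression tree by cross-validated grid search.
X_train, X_test, y_train, y_test = train_test_split(
    X, rain, test_size=0.2, random_state=0)
param_grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5, 10]}  # assumed grid
search = GridSearchCV(DecisionTreeRegressor(random_state=0),
                      param_grid, scoring="r2", cv=5)
search.fit(X_train, y_train)
test_r2 = search.best_estimator_.score(X_test, y_test)  # R2 on held-out data

# Random forest importance scores; they sum to 1 across features.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
importance = dict(zip(feature_names, rf.feature_importances_))

print(search.best_params_, round(test_r2, 3), corr, importance)
```

In this toy setup the importance score of RH dominates because the synthetic target was built from RH alone; with real data the ranking depends on the inputs.
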

The authors are thankful to the National Meteorology Institute (NMI) of Ethiopia for providing observational data for the Bahir Dar sector. The authors also acknowledge the Bahir Dar University institutional repository for archiving the thesis on which part of this work is based (http://ir.bdu.edu.et/handle/123456789/15484).

The authors declare that no funds, grants, or other support were received during the preparation of this article.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Ackerman A. S., Kirkpatrick M. P., Stevens D. E. & Toon O. B. (2004) The impact of humidity above stratiform clouds on indirect aerosol climate forcing, Nature, 432 (7020), 1014–1017.
Afiesimama E. A., Pal J. S., Abiodun B. J., Gutowski W. J. & Adedoyin A. (2006) Simulation of West African monsoon using the RegCM3. Part I: model validation and interannual variability, Theoretical and Applied Climatology, 86 (1–4), 23–37. https://doi.org/10.1007/s00704-005-0202-8.
Appiah-Badu N. K. A., Missah Y. M., Amekudzi L. K., Ussiph N., Frimpong T. & Ahene E. (2021) Rainfall prediction using machine learning algorithms for the various ecological zones of Ghana, IEEE Access, 10, 5069–5082.
Basha U., Pandey M., Nayak D., Shukla S. & Shukla A. K. (2024) Spatial–temporal assessment of annual water yield and impact of land use changes on Upper Ganga Basin, India, using InVEST model, Journal of Hazardous, Toxic, and Radioactive Waste, 28 (2), 04024003.
Bentéjac C., Csörgő A. & Martínez-Muñoz G. (2021) A comparative analysis of gradient boosting algorithms, Artificial Intelligence Review, 54, 1937–1967.
Breiman L. (2017) Classification and Regression Trees. New York, NY: Routledge.
Cong R.-G. & Brady M. (2012) The interdependence between rainfall and temperature: copula analyses, The Scientific World Journal, 2012, 405675.
Dinku T., Ceccato P., Grover-Kopec E., Lemma M., Connor S. & Ropelewski C. (2007) Validation of satellite rainfall products over East Africa's complex topography, International Journal of Remote Sensing, 28 (7), 1503–1526.
Diro G. T., Grimes D. I. F., Black E., O'Neill A. & Pardo-Iguzquiza E. (2008) Evaluation of reanalysis rainfall estimates over Ethiopia, International Journal of Climatology, 29 (1), 67–78. https://doi.org/10.1002/joc.1699.
Diro G., Grimes D., Black E., O'Neill A. & Pardo-Iguzquiza E. (2009) Evaluation of reanalysis rainfall estimates over Ethiopia, International Journal of Climatology: A Journal of the Royal Meteorological Society, 29 (1), 67–78.
Elvis A., Danso S., Eyra E., David A., Selasi D., Melody D. & Hakii N. (2015) Precipitation and rainfall types with their characteristic features, J. Nat. Sci. Res, 5, 89–92.
Endalie D., Haile G. & Taye W. (2022) Deep learning model for daily rainfall prediction: case study of Jimma, Ethiopia, Water Supply, 22 (3), 3448–3461.
Ethiopia C. (2013) Population Projection of Ethiopia for All Regions at Wereda Level From 2014–2017. Addis Ababa: Agency CS.
Geetha A. & Nasira G. (2014). 'Data mining for meteorological applications: decision trees for modeling rainfall prediction', 2014 IEEE International Conference on Computational Intelligence and Computing Research.
Ghorbani M. A., Zadeh H. A., Isazadeh M. & Terzi O. (2016) A comparative study of artificial neural network (MLP, RBF) and support vector machine models for river flow prediction, Environmental Earth Sciences, 75, 1–14.
Gizaw M. S., Gan T. Y., Yang Y. & Gan K. E. (2021) Changes to the 1979–2013 summer Convective Available Potential Energy (CAPE) and extreme precipitation over North America, Physics and Chemistry of the Earth, Parts A/B/C, 123, 103047.
Gleixner S., Demissie T. & Diro G. T. (2020) Did ERA5 improve temperature and precipitation reanalysis over East Africa?, Atmosphere, 11 (9), 996.
Hardwick Jones R., Westra S. & Sharma A. (2010) Observed relationships between extreme sub-daily precipitation, surface temperature, and relative humidity, Geophysical Research Letters, 37 (22), L22805.
Hooshyaripor F., Faraji-Ashkavar S., Koohyian F., Tang Q. & Noori R. (2020) Annual flood damage influenced by El Niño in the Kan River basin, Iran, Natural Hazards and Earth System Sciences, 20 (10), 2739–2751.
Kartini D., Nugrahadi D. T. & Farmadi A. (2021). 'Hyperparameter tuning using GridsearchCV on the comparison of the activation function of the ELM method to the classification of pneumonia in toddlers', 2021 4th International Conference of Computer and Informatics Engineering (IC2IE).
Ke G., Meng Q., Finley T., Wang T., Chen W., Ma W., Ye Q. & Liu T.-Y. (2017) Lightgbm: a highly efficient gradient boosting decision tree, Advances in Neural Information Processing Systems, 30, 3149–3157.
Khosravi K., Pham B. T., Chapi K., Shirzadi A., Shahabi H., Revhaug I., Prakash I. & Bui D. T. (2018) A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran, Science of the Total Environment, 627, 744–755.
Kim H. I., Kim D., Mahdian M., Salamattalab M. M., Bateni S. M. & Noori R. (2024a) Incorporation of water quality index models with machine learning-based techniques for real-time assessment of aquatic ecosystems, Environmental Pollution, 355, 124242.
Kim H. I., Kim D., Salamattalab M. M., Mahdian M., Bateni S. M. & Noori R. (2024b) Machine learning-based modeling of surface water temperature dynamics in Arctic lakes, Environmental Science and Pollution Research, 31 (49), 59642–59655.
Kirtphaiboon S., Wongwises P., Limsakul A., Sooktawee S. & Humphries U. (2014) Rainfall variability over Thailand related to the El Nino-Southern Oscillation (ENSO), Journal of Sustainable Energy & Environment, 5, 37–42.
Koutsouris A. J., Chen D. & Lyon S. W. (2016) Comparing global precipitation data sets in Eastern Africa: a case study of Kilombero Valley, Tanzania, International Journal of Climatology, 36 (4), 2000–2014.
Kumar V., Kedam N., Sharma K., Khedher K. & Alluqmani A. (2023) A comparison of machine learning models for predicting rainfall in urban metropolitan cities, Sustainability, 15 (18), 13724.
Lepore C., Veneziano D. & Molini A. (2015) Temperature and CAPE dependence of rainfall extremes in the eastern United States, Geophysical Research Letters, 42 (1), 74–83.
Liang M., Chang Z., Wan Z., Gan Y., Schlangen E. & Šavija B. (2022) Interpretable Ensemble-Machine-Learning models for predicting creep behavior of concrete, Cement and Concrete Composites, 125, 104295.
Mahdian M., Noori R., Salamattalab M. M., Heggy E., Bateni S. M., Nohegar A., Hosseinzadeh M., Siadatmousavi S. M., Fadaei M. R. & Abolfathi S. (2024) Anzali wetland crisis: unraveling the decline of Iran's ecological gem, Journal of Geophysical Research: Atmospheres, 129 (4), e2023JD039538.
Microsoft (2016) LightGBM, Light Gradient Boosting Machine. Redmond, WA: GitHub San Francisco.
Misra R. K., Panda P., Sahu A., Sahoo S. & Behera D. (2021) Rainfall prediction using machine learning approach: a case study for the state of odisha, Indian Journal of Natural Sciences, 10, 20000–20007.
Monego V. S., Anochi J. A. & de Campos Velho H. F. (2022) South America seasonal precipitation prediction by gradient-boosting machine-learning approach, Atmosphere, 13 (2), 243.
Ojo O. S. & Ogunjo S. T. (2022) Machine learning models for prediction of rainfall over Nigeria, Scientific African, 16, e01246.
Olaniyan E., Adefisan E. A., Oni F., Afiesimama E., Balogun A. A. & Lawal K. A. (2018) Evaluation of the ECMWF Sub-seasonal to seasonal precipitation forecasts during the peak of West Africa Monsoon in Nigeria, Frontiers in Environmental Science, 6, 4.
Pandey M., Jamei M., Karbasi M., Ahmadianfar I. & Chu X. (2021) Prediction of maximum scour depth near spur dikes in uniform bed sediment using stacked generalization ensemble tree-based frameworks, Journal of Irrigation and Drainage Engineering, 147 (11), 04021050.
Ramsundram N., Sathya S. & Karthikeyan S. (2016) Comparison of decision tree based rainfall prediction model with data driven model considering climatic variables, Irrigation and Drainage Systems Engineering, 5 (3), 1–5.
Ridwan W. M., Sapitang M., Aziz A., Kushiar K. F., Ahmed A. N. & El-Shafie A. (2021) Rainfall forecasting model using machine learning methods: case study Terengganu, Malaysia, Ain Shams Engineering Journal, 12 (2), 1651–1663.
Rokach L. & Maimon O. (2005) Decision trees. In: Rokach, L. & Maimon, O. (eds.) Data Mining and Knowledge Discovery Handbook, Cham: Springer, 165–192.
Rushley S. S., Kim D., Bretherton C. & Ahn M. S. (2018) Reexamining the nonlinear moisture-precipitation relationship over the tropical oceans, Geophysical Research Letters, 45 (2), 1133–1140.
Salaeh N., Ditthakit P., Pinthong S., Hasan M. A., Islam S., Mohammadi B. & Linh N. T. T. (2022) Long-Short term memory technique for monthly rainfall prediction in Thale Sap Songkhla River Basin, Thailand, Symmetry, 14 (8), 1599.
Seeley J. T. & Romps D. M. (2015) Why does tropical convective available potential energy (CAPE) increase with warming?, Geophysical Research Letters, 42 (23), 10,429–10,437.
Seleshi Y. & Zanke U. (2004) Recent changes in rainfall and rainy days in Ethiopia, International Journal of Climatology: A Journal of the Royal Meteorological Society, 24 (8), 973–983.
Sun Q., Miao C., Duan Q., Ashouri H., Sorooshian S. & Hsu K. L. (2018) A review of global precipitation data sets: data sources, estimation, and intercomparisons, Reviews of Geophysics, 56 (1), 79–107.
Wu J., Chen X.-Y., Zhang H., Xiong L.-D., Lei H. & Deng S.-H. (2019) Hyperparameter optimization for machine learning models based on Bayesian optimization, Journal of Electronic Science and Technology, 17 (1), 26–40.
Yirga G. (2023) Comparative Study of Rainfall Modeling Using Machine Learning and ECMWF Models Over Bahir Dar. Bahir Dar: Bahir Dar University.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).