Climate change has altered rainfall patterns, leading to urban flooding in Peshawar City. This study develops intensity–duration–frequency (IDF) curves to assess rainfall intensities for various return periods and durations. The methodology involves downscaling and bias correction of general circulation model (GCM) data, followed by feature selection using XGBoost and Extra Tree to rank nine GCMs. The top three models were used as input for four machine learning (ML) algorithms – random forest, regression tree, gradient boosting, and AdaBoost – for multi-model ensemble estimation. The models’ performance was evaluated using mean squared error, mean absolute error, root mean squared error, Nash–Sutcliffe efficiency (NSE), and Willmott's index (WI), with AdaBoost outperforming others. Bias-corrected and ensemble-modeled data were used to develop IDF curves employing normal, lognormal, and Gumbel distributions under shared socioeconomic pathways (SSPs) 245 and 585. Rainfall intensities were estimated for return periods of 2, 10, 25, 50, 75, and 100 years. This study enhances the IDF curve development by integrating advanced bias reduction and ML techniques, providing crucial insights into future rainfall patterns. The findings contribute to urban flood risk management and climate resilience planning for Peshawar City.

  • Multi-model ensemble was estimated via machine learning (ML) techniques.

  • The performance of the ML techniques was assessed via statistical performance indicators.

  • AdaBoost demonstrated outstanding results during the training and testing phases.

  • The rainfall intensity is in the order of SSP585 > SSP245 > observed rainfall.

Intensity–duration–frequency (IDF) curves depict the relationship among rainfall intensity, duration, and frequency, and play a crucial role in water resources engineering and management (Masum & Pal 2021). Applications include stormwater drainage system design, rainfall zone classification, rainfall pattern evaluation, and flood control structure operation. IDF curves are often created from in-situ rainfall records under the assumption of a stationary climate with little variation in rainfall over time. However, it is now widely acknowledged that global warming has caused significant changes in the climate. Rainfall patterns have varied notably worldwide due to shifts in atmospheric moisture within a warmer climate. Even small departures from the mean can cause significant changes in extremes, as seen in the reported rise in heavy rainfall in several areas. Consequently, IDF curves developed under the assumption of a stationary rainfall regime may no longer be suitable for designing hydraulic structures.

Numerous efforts have been undertaken to incorporate nonstationarity in the construction of IDF curves, aiming to support the development of climate-resilient hydraulic infrastructure (Silva et al. 2021a, b; Schlef et al. 2023). These studies have frequently included current rainfall trends as covariates in the rainfall distribution parameters (Alam et al. 2021). For example, to create nonstationary IDF curves, the historical trend in annual maximum rainfall intensity (AMRI) has been included in the parameters of the AMRI's probability distribution function (PDF). In a recent study, Yan et al. (2023) examined nonlinear trends in distribution parameters and proposed that bias in estimating IDF curves may be minimized by integrating linear trends. In a different approach, Agilan & Umamahesh (2016) produced nonstationary IDF curves by using the Southern Oscillation Index (SOI) as a covariate. They found that adding teleconnections and climate change information to IDF curves might improve their reliability under climate change.

However, the present nonstationary IDF curves are severely limited by their dependence on current AMRI trends or on correlations with large-scale climatic indicators. Future climatic changes related to global warming will not always follow historical patterns. The current correlations between rainfall and large-scale indicators, including the SOI, may also change. The course of climate change will further depend on socioeconomic factors and on internationally enacted policies aimed at lowering greenhouse gas emissions (Haleem et al. 2023a, b). Therefore, to support the construction of climate-resilient infrastructure, it is essential to build IDF curves based on projected climatic scenarios that span a range of climate change possibilities.

The use of general circulation models (GCMs) is essential for predicting variations in the properties of rainfall for a century or more. Recently, IDF curves have been reevaluated and revised using GCM forecasts, especially for adaptation strategies in water management infrastructures (Silva et al. 2021a, b; Kourtis & Tsihrintzis 2022). However, because there is still much to learn about several earth and atmospheric processes, the construction of GCMs necessitates several assumptions, which adds a significant degree of uncertainty to GCMs. Risks are introduced by this uncertainty when using IDF curves that are generated from rainfall forecasts by GCMs (Wang et al. 2025).

To reduce these risks and improve the resilience of hydraulic design, uncertainties in projected IDF curves are usually included (Mainali & Sharma 2023). A basic prerequisite for developing economically viable hydraulic structures is that the uncertainty in projected IDF curves stays within reasonable bounds. To provide more accurate climate simulations, it is usually recommended to carefully select a subset of GCMs based on their capacity to represent the current climatology properly.

In hydraulic engineering, predicting IDF curves at ungauged locations is a major difficulty (Haleem et al. 2023a, b). Although several studies have investigated the projection of IDF curves using observed data and GCMs, little work has so far been done on building multi-model ensembles (MMEs) with machine learning (ML) models after downscaling and bias correction (Anwar et al. 2024). However, such projections are essential to building a long-term, climate-resilient society in any particular area. To close this gap, this work proposes a methodology for creating IDF curves for Peshawar under various climate change scenarios, along with the related uncertainty.

This study addresses a notable research gap in the construction of IDF curves by employing modern methods to reduce bias. Specifically, it utilizes feature engineering and MME techniques on future data using ML models, which have not been widely explored in IDF curve development. This innovative approach enhances the accuracy of predicting extreme weather events, which is particularly crucial under evolving climatic conditions. Unlike previous generalized studies, this research is tailored to the specific region of Peshawar, where recent climate changes have significantly impacted precipitation patterns. This focused regional analysis offers fresh insights into the localized hydrological effects of climate change, providing essential data for targeted urban planning and infrastructure development in Peshawar. This study stands out as one of the few to investigate the temporal evolution of IDF curves, spanning historical and projected future periods. By analyzing both past data and anticipated future scenarios, we offer a distinct temporal perspective, highlighting trends and variations in precipitation intensity and duration. These insights are critical for advancing forecasting accuracy and informing adaptive planning measures for future climate conditions.

In summary, the literature review on IDF curves underscores a significant shift in hydrological modeling due to the impacts of climate change on rainfall patterns. This shift is steering away from traditional stationary assumptions toward the development of nonstationary IDF curves that integrate recent climatic trends and variability. Such curves are critical for accurately forecasting extreme rainfall events in a changing climate. Advances in methodology have been highlighted, including the use of climate indices, historical data trends, and forecasts from GCMs, which collectively enhance the ability to predict future climate scenarios. However, managing the uncertainties inherent in GCM projections and climate variability remains a challenge, emphasizing the need for sophisticated strategies in uncertainty quantification and management.

Moreover, the importance of nonstationary IDF curves extends to their practical applications in designing resilient hydraulic infrastructure, urban drainage systems, and flood control structures. The literature points to the need for ongoing research to address gaps in effectively integrating complex climate signals into hydrological models, enhancing model accuracy, and confronting new challenges as climate conditions evolve (Xiong et al. 2024). Continued advancements are crucial for improving infrastructure resilience and shaping effective adaptation strategies. The call for standardized methodologies, interdisciplinary approaches, and better data accessibility highlights the ongoing efforts to refine hydrological and urban planning strategies in response to dynamic environmental conditions, marking a clear paradigm shift in the field.

The main objective of this research is to utilize yearly maximum rainfall data that has been recorded to generate IDF curves. Subsequently, IDF curves are constructed using the GCM projected, bias-corrected, and MME GCM rainfall data, considering two shared socioeconomic pathways (SSPs). This innovative approach addresses challenges related to bias correction, MME incorporation using ML models, and the impacts of climate change on hydraulic engineering. The study contributes by offering IDF curves under diverse climate change scenarios.

Peshawar, Khyber Pakhtunkhwa, Pakistan

The study area is Peshawar, located in Pakistan's Khyber Pakhtunkhwa province (Figure 1). It was selected to investigate how a region with this type of climate is affected by global warming. Owing to its varied climate, Peshawar experiences hot summers and cold winters (Nisa 2012). In addition to various climate-related issues, the region has recently seen extreme weather events with far-reaching repercussions (Otto et al. 2023).
Figure 1

Map of Peshawar highlighting the study area.

Peshawar has experienced multiple floods and other climate-related catastrophes in the past decade, which have resulted in substantial damage and fatalities. For instance, in August 2017, intense floods caused by monsoon rains in Peshawar devastated many houses, businesses, and infrastructure (Otto et al. 2023). The floods forced communities to relocate and destroyed roads and valuable property. In July 2019, another flood disaster struck Peshawar, with reports of fatalities and injuries along with significant damage to crops and infrastructure (Bibi et al. 2018).

Additionally, in April 2021, a strong and unanticipated hailstorm devastated crops and property, seriously affecting the livelihoods of the local population and emphasizing Peshawar's susceptibility to climate-related catastrophes (Pakistan Meteorological Department, 2021). These occurrences highlight the significance of evaluating Peshawar's shifting climate patterns and creating plans to reduce any possible dangers.

In light of global climate change, developing effective adaptation and resilience strategies requires an awareness of the particular issues Peshawar faces (Ullah et al. 2017). The objective of this study is to provide knowledge about the regional impacts of climate change so that informed decisions about sustainable development can be made in Peshawar and in other locations with similar climates.

Data sources

In this study, IDF curves were developed for the Peshawar district based on observed and SSP scenarios using yearly maximum precipitation data.

Rainfall data

Daily rainfall data spanning 1981 to 2018 (Figure 2) were collected from the weather station located in Peshawar. When this research began, data after 2018 had not been fully processed, and the relevant meteorological department was unable to provide the most recent records. This limitation is one of the drawbacks of our study. Rigorous quality control and verification, which are crucial for ensuring the reliability of data used in studies, can often delay the availability of official weather data. The Pakistan Meteorological Department (PMD) provided the aforementioned meteorological data. Using PMD data supports the reliability and accuracy of the record, which forms the foundation for the subsequent analyses.
Figure 2

Representation of the observed data.

GCM simulations

The CMIP6 dataset includes the anticipated daily precipitation data for Peshawar obtained from GCMs. All nine GCMs specifically selected for this research are presented in Table 1. Ullah et al. (2023) also used these GCMs. The selection of these GCMs allows for a comprehensive analysis of potential future precipitation patterns in the area, reflecting a wide range of climate patterns (Tian et al. 2024). These selected GCM outputs are used to construct IDF curves after bias correction. This research provides crucial insights into the numerous climate change scenarios that could affect Peshawar by thoroughly examining potential variances and swings in rainfall characteristics.

Table 1

GCMs used in this research

Model               Horizontal resolution
GFDL-ESM4           ∼0.25° (∼25 km)
INM-CM4-8           ∼2.5° (∼250 km)
INM-CM5-0           ∼1.5° (∼150 km)
NESM3               ∼1.1° (∼110 km)
CNRM-CM6-1          ∼0.25° (∼25 km)
CNRM-ESM2-1         ∼1.4° (∼140 km)
EC-Earth3-Veg-LR    ∼0.75° (∼75 km)
MIROC6              ∼1.4° (∼140 km)
MRI-ESM2-0          ∼1.1° (∼110 km)

Downscaling and bias correction of GCM rainfall

Downscaling of GCM refers to the process of obtaining higher-resolution and more localized climate information from the coarse-scale output produced by GCM (Teegavarapu 2010). Although GCMs frequently lack the geographic resolution required for regional or local impact assessments, they still offer insightful information about large-scale climatic patterns. This gap is filled by downscaling techniques, which produce finer-grained climate data that is more pertinent to particular areas. The statistical downscaling method was used in this study to downscale the GCMs.

Bias correction, which follows downscaling, is an important statistical technique that is frequently used to improve the output of GCMs. GCMs are extremely powerful computer programs that use complex mathematical formulas to mimic the physical processes of the atmosphere, oceans, and land surfaces. However, these complex models have inherent limitations that can lead to discrepancies. Bias correction reduces these discrepancies by comparing the historical GCM output with observed data and then adjusting the future model output with statistical techniques so that it better matches the observations (Dutta & Bhattacharjya 2022). Reducing model biases and uncertainties is the primary goal of bias correction, which also improves the reliability and usefulness of GCM outputs for impact evaluations and decision-making (Maraun 2016).

Bias correction establishes statistical relationships between observed data and GCM outputs (Xu et al. 2024). Some bias correction methods, such as linear scaling, are simpler, faster, and computationally more efficient than others. Fang et al. (2015) demonstrated that linear scaling is widely employed in hydrological and climatic studies.

In this research, bias correction was carried out using the linear scaling technique with the CMhyd tool. Linear scaling was chosen for its ease of use and computational efficiency, and because it preserves the patterns and variability identified in the GCM output. CMhyd is a free, widely accessible tool for downscaling and bias-correcting GCM data. Figure 3 illustrates the entire bias correction process. After bias correction, an ensemble strategy that uses bias-corrected data from many climate models is strongly advised, as outlined by Teutschbein & Seibert (2012).
Figure 3

Bias correction technique applied in this study.
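
To make the linear scaling idea concrete, the following is a minimal sketch, not the CMhyd implementation. It assumes the multiplicative monthly form commonly used for precipitation, in which each GCM value is multiplied by the ratio of the observed to the simulated historical monthly mean. The function names and the sample values are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def monthly_means(records):
    """Return {month: mean rainfall} from (month, rainfall_mm) pairs."""
    totals, counts = defaultdict(float), defaultdict(int)
    for month, value in records:
        totals[month] += value
        counts[month] += 1
    return {m: totals[m] / counts[m] for m in totals}


def linear_scaling(obs_hist, gcm_hist, gcm_future):
    """Scale future GCM rainfall by monthly ratio obs_mean / gcm_hist_mean.

    Assumption: multiplicative scaling per calendar month; months with a
    zero or missing historical GCM mean are left unchanged (factor 1.0).
    """
    obs_mean = monthly_means(obs_hist)
    gcm_mean = monthly_means(gcm_hist)
    factors = {
        m: obs_mean[m] / gcm_mean[m]
        for m in gcm_mean
        if m in obs_mean and gcm_mean[m] > 0
    }
    return [(m, v * factors.get(m, 1.0)) for m, v in gcm_future]


# Hypothetical values: (month, daily rainfall in mm)
obs_hist = [(7, 8.0), (7, 12.0), (1, 1.0)]
gcm_hist = [(7, 4.0), (7, 6.0), (1, 2.0)]
gcm_future = [(7, 3.0), (1, 4.0)]

corrected = linear_scaling(obs_hist, gcm_hist, gcm_future)
```

In this toy case the July factor is 10/5 = 2 and the January factor is 1/2, so a wet-month underestimate is scaled up and a dry-month overestimate is scaled down.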

Multi-model ensemble

Multiple climate models are merged in the MME to generate accurate climate projections. This method lowers the uncertainties and biases of any particular GCM by combining the output of multiple GCMs. By doing this, MME provides a comprehensive and reliable evaluation of future climatic data, improving the accuracy of climate model forecasts (Ahmed et al. 2019).

By considering the variations and interactions between multiple GCMs and observed data, MME offers a more thorough understanding of climate projections. Because the method is particularly effective at capturing the inherent diversity in different GCMs, it offers a more accurate assessment of potential climate scenarios.

In this study, feature engineering was carried out with two ensemble learning models, XGBoost and Extra Trees, as shown in Figure 4. With these two algorithms, feature importance scores were calculated for all nine CMIP6 GCMs. The scores were then used to determine which GCMs best match the observed climate data (Tian et al. 2023). This approach improves the selection process by favoring models that increase the accuracy of future climate projections and advance the general understanding of climate dynamics.
Figure 4

Flowchart of the feature engineering process.

Ensemble learning models

Extra Trees and XGBoost are two widely used ensemble learning algorithms.

XGBoost

XGBoost is also known as eXtreme Gradient Boosting in ML. It is a potent boosting algorithm that is notable for its capacity to progressively construct a sequence of weak learners, which are frequently decision trees. By employing a boosting strategy, each successive tree corrects the errors caused by the preceding ones, resulting in the development of a robust and accurate prediction model. The iterative refining process of XGBoost is what gives the model its power as it gradually increases its predictive capacity (Ali et al. 2023).

Additionally, XGBoost uses regularization to lessen overfitting. L1 (LASSO) and L2 (Ridge) regularization terms are incorporated into its objective function, which discourages overly complex models that would become too closely tailored to the training data. Regularization thereby helps the model avoid overfitting and improves its capacity to generalize to new data (Chen & Guestrin 2016).
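
As a hedged illustration, the regularized objective can be written in the general form given by Chen & Guestrin (2016). The L1 term weighted by $\alpha$ is included here because the XGBoost software exposes it as an option; it is an assumption of this sketch rather than part of that paper's formula, and all symbols are introduced only for this illustration.

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert_2^2 + \alpha \lVert w \rVert_1
```

Here $l$ is the training loss between the observed value $y_i$ and the prediction $\hat{y}_i$, $f_k$ is the $k$-th tree, $T$ is its number of leaves, and $w$ is its vector of leaf weights. Larger values of $\gamma$, $\lambda$, or $\alpha$ penalize larger or more extreme trees, which is how the overfitting described above is discouraged.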

XGBoost employs gradient-based optimization approaches in addition to its boosting and regularization capabilities to improve overall performance and convergence speed. Gradient boosting in XGBoost refers to the model's training process optimization by the implementation of a gradient-based technique. Furthermore, XGBoost uses feature importance scores to provide significant knowledge regarding each feature's relevance, which helps users comprehend the distinct contributions of variables to the model's predictions. XGBoost is a robust and flexible program that supports missing data and parallel processing. It works especially effectively with large datasets and real-world scenarios where data completeness may change (Zhang et al. 2022).

Extra Trees

Extra Trees (extremely randomized trees) is a tree-based ensemble method closely related to random forests (Sanmorino et al. 2023). It differs from conventional random forests in how it randomizes tree construction: in addition to considering a random subset of features at each split, it draws the candidate split thresholds at random instead of searching for the best cut point. This added randomness promotes diversity, decorrelates the individual trees in the ensemble, and reduces the risk of overfitting, all of which frequently result in more accurate and consistent predictions.

Aggressive randomization is a key element of Extra Trees and contributes to its resilience to noisy input. This property sets Extra Trees apart from traditional random forests, and it works especially well on datasets with noisy features and inconsistent data quality. The randomization also accelerates training. Because the trees can be grown in parallel with a high degree of randomness, Extra Trees is more efficient than some other ensemble techniques. Like random forests, Extra Trees provides feature importance scores that indicate how strongly each feature influences the model's predictions.

Ensemble learning models development: XGBoost and Extra Trees

In this study, two powerful ensemble learning algorithms, Extra Trees and XGBoost, were used to calculate feature importance scores with a Python script created for this purpose. The dataset is imported from an Excel file using the Pandas library. Missing values in the features and the target variable are filled by mean imputation.

Next, the script uses the train_test_split function from the scikit-learn library to divide the data into training and testing sets. XGBoost and Extra Trees regression models are then built and trained. The feature importance scores obtained from both models are used to produce visual aids, including radar charts and radial bar charts, that illustrate the relative importance of each feature to the target variable. The code thus provides a systematic way of comparing and displaying the feature importance scores of the two algorithms.

Furthermore, the feature importance scores for XGBoost and Extra Trees are computed, offering a thorough analysis of each feature's role in the models. Moreover, the top three features for each method are displayed.
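
The ranking step described above could look roughly like the following sketch. It is not the study's script: the data are synthetic NumPy arrays rather than the Excel file, mean imputation and plotting are omitted, and the hyperparameters are assumptions. If the xgboost package is not installed, scikit-learn's GradientBoostingRegressor is substituted purely so that the sketch still runs.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBRegressor
except ImportError:
    # Stand-in only (assumption): keeps the sketch runnable without xgboost.
    from sklearn.ensemble import GradientBoostingRegressor as XGBRegressor

gcm_names = ["GFDL-ESM4", "INM-CM4-8", "INM-CM5-0", "NESM3", "CNRM-CM6-1",
             "CNRM-ESM2-1", "EC-Earth3-Veg-LR", "MIROC6", "MRI-ESM2-0"]

# Synthetic stand-in data: nine GCM rainfall series as features and an
# "observed" series built mostly from the first GCM (hypothetical weights).
rng = np.random.default_rng(42)
X = rng.gamma(shape=2.0, scale=5.0, size=(400, 9))
y = 0.7 * X[:, 0] + 0.2 * X[:, 3] + 0.1 * X[:, 7] + rng.normal(0.0, 0.5, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)


def top_three(model):
    """Fit the model and return the three GCMs with the highest importance."""
    model.fit(X_train, y_train)
    order = np.argsort(model.feature_importances_)[::-1][:3]
    return [gcm_names[i] for i in order]


top_xgb = top_three(XGBRegressor(n_estimators=100, random_state=42))
top_et = top_three(ExtraTreesRegressor(n_estimators=100, random_state=42))
print("XGBoost top three:", top_xgb)
print("Extra Trees top three:", top_et)
```

Because the two algorithms compute importance differently, their rankings need not agree, so comparing both lists before choosing the final three GCMs is a reasonable safeguard.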

ML models

After the top three GCMs were chosen, as indicated in Figure 5, several ML models were trained on them in order to ensemble future climate data.
Figure 5

Flowchart of ML models.

Random forest

Random forest is an ML technique that is robust and versatile, proven for its ability to tackle both regression and classification problems with exceptional accuracy (Ren et al. 2017). Random forest is a member of the wide family of ML models and excels in combining numerous weak predictive models – typically decision trees – into a single, potent predictor. The method prevents overfitting and increases generalization by introducing randomness into the model during training.

Several decision trees are built independently in the random forest paradigm, each trained on a different subset of the dataset. To make the ensemble more heterogeneous, and its overall predictions more reliable, randomization is extended to the features: only a random subset of features is considered at each split. For classification, each tree votes and the most popular class becomes the final prediction; for regression, as in this study, the predictions of the individual trees are averaged. This ensemble method not only increases prediction accuracy but also provides information on the relative importance of different attributes.

In this study, the random forest was implemented in Orange, a free, publicly available tool that includes various ML models (Ishak et al. 2020; Zhan et al. 2024). After the data were loaded into Orange, a data sampler split them into 70% for training and 30% for testing. The random forest model was configured with ten trees, no limit on the number of features considered at each split or on tree depth, replicable training disabled, and no splitting of subsets smaller than five instances.

Regression tree algorithm

The regression tree algorithm proves to be a versatile and effective tool for both classification and regression tasks (Sharma 2022). It functions by breaking down complex issues into more manageable sub-issues, making it easier to see the underlying patterns in the data. Regression trees have clear decision-making processes because of the hierarchical conditions that are applied from the root to the leaf nodes. This transparency helps to make the model easier to grasp because every choice can be traced back through the tree structure.

In this study, the regression tree was also built in Orange. A data sampler split the data into 70% for training and 30% for testing. The tree was induced as a binary tree with a minimum of two instances in leaves, no splitting of subsets smaller than five instances, and a maximum depth of 100; splitting stopped when the majority reached 95%.

AdaBoost algorithm

AdaBoost stands out as a potent ML technique renowned for its ability to enhance the performance of weak learners, commonly in the form of a decision tree (Chengsheng et al. 2017; Chen et al. 2024). AdaBoost's key component is its iterative methodology, which involves giving misclassified data points more weight with each iteration. This tactical change enables later weak learners to concentrate on difficult cases, progressively increasing the overall accuracy of the model. AdaBoost generates a strong and precise model through the combination of these learners' predictions, each weighted based on skill level. AdaBoost is a well-liked option in the ML toolkit due to its adaptability and ability to tackle a wide range of classification and regression problems; it is used for jobs like text classification and facial recognition, among others.

In this study, the AdaBoost model was built in Orange. A data sampler split the data into 70% for training and 30% for testing. The ensemble consisted of 50 estimators with a tree as the base estimator and a learning rate of 1.0. The algorithm option was set to SAMME.R (an option that applies to classification), and a linear loss function was used for regression.

Gradient boosting algorithm

Gradient boosting is an extremely powerful ML method that has attracted a lot of interest due to its remarkable predictive power and adaptability. Gradient boosting is a vital member of the ML family that is highly effective in solving problems related to both regression and classification. Its basic idea is to build a robust and accurate predictor by repeatedly combining several weak predictive models, frequently in the form of decision trees. By reducing the residual errors from earlier models, this amalgamation is accomplished, thereby honing the final model to focus on data points that were incorrectly classified or poorly forecasted in the past. Gradient boosting has therefore been widely used in a variety of fields, such as natural language processing, finance, and healthcare, making it a vital tool for researchers aiming for cutting-edge outcomes in their investigations.

In this study, the gradient boosting model was implemented in Orange, with careful attention to parameter settings. The gradient boosting (scikit-learn) method was used with 100 trees, a learning rate of 0.1, and replicable training enabled. The fraction of training instances was 1, the maximum tree depth was 3, and subsets smaller than two instances were not split.
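
For readers who prefer code to a graphical workflow, the four configurations described in this section can be approximated with scikit-learn as in the sketch below. This is an assumed mapping of the reported Orange settings onto scikit-learn parameters, not an export of the study's workflow. The input data are synthetic stand-ins for the three selected GCMs, and the Orange stopping rule based on a 95% majority has no direct counterpart here and is omitted.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in: three bias-corrected GCM series (features) and an
# observed series (target); the weights and noise level are hypothetical.
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=5.0, size=(200, 3))
y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(0.0, 1.0, size=200)

# 70/30 split, as reported for the data sampler
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    # 10 trees, unlimited depth and features, no split below 5 instances
    "random_forest": RandomForestRegressor(
        n_estimators=10, max_depth=None, max_features=None,
        min_samples_split=5),
    # min 2 instances in leaves, no split below 5 instances, depth <= 100
    "regression_tree": DecisionTreeRegressor(
        min_samples_leaf=2, min_samples_split=5, max_depth=100),
    # 50 tree-based estimators, learning rate 1.0, linear loss
    "adaboost": AdaBoostRegressor(
        n_estimators=50, learning_rate=1.0, loss="linear"),
    # 100 trees, learning rate 0.1, depth 3, all training instances
    "gradient_boosting": GradientBoostingRegressor(
        n_estimators=100, learning_rate=0.1, max_depth=3, subsample=1.0,
        min_samples_split=2, random_state=0),
}

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)  # ensemble estimate from the three GCMs
    rmse[name] = float(np.sqrt(np.mean((y_test - pred) ** 2)))

print(rmse)
```

Each fitted model maps the three GCM series to a single ensemble estimate, so comparing test-set errors across the four models is what identifies the best ensemble learner.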

ML model evaluation criteria

Five statistical metrics were employed in this study to assess the performance of the ML models: mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), Nash–Sutcliffe efficiency (NSE), and Willmott's index (WI). These metrics offer vital information on the precision and accuracy of the model's predictions (Ullah et al. 2023).

Mean squared error

MSE is a common statistical metric that calculates the average of the squared deviations between the actual and predicted values (Fürnkranz et al. 2010). MSE can be computed by Equation (1):
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2 \tag{1}$$

Root mean squared error

RMSE is a commonly used statistical metric to assess the average magnitude of differences between predicted and actual data (Chai & Draxler 2014). RMSE is computed by Equation (2):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2} \tag{2}$$

Mean absolute error

MAE is the average absolute difference between the observed values and the predicted values in the dataset (Schneider & Xhafa 2022). MAE can be computed by Equation (3):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|O_i - P_i\right| \tag{3}$$

Nash–Sutcliffe efficiency

NSE is a widely used metric for assessing the performance of hydrological or environmental models. It is calculated using Equation (4):
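$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2} \tag{4}$$
where $O_i$ is the observed values; $P_i$ is the predicted values; $\bar{O}$ is the mean of observed values; and $n$ is the number of data points.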

Willmott's index

WI assesses the relative error between the observed and predicted values, providing a measure of model performance. It is calculated using Equation (5):
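$$\mathrm{WI} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^2}{\sum_{i=1}^{n}\left(\left|P_i - \bar{O}\right| + \left|O_i - \bar{O}\right|\right)^2} \tag{5}$$
where $O_i$ is the observed values; $P_i$ is the predicted values; $\bar{O}$ is the mean of observed values; and $n$ is the number of data points.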

Together, these evaluation criteria offer a comprehensive analysis of the prediction performance of the ML models, covering both the magnitude of the errors (MSE, RMSE, MAE) and the agreement between observed and predicted values (NSE, WI). These statistical indicators are widely used to assess a model's accuracy and to decide whether it is suitable for a given task or dataset (Ullah et al. 2023; Yan et al. 2023).
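
The five indicators can be computed with a few lines of plain Python. The sketch below follows the standard definitions of MSE, RMSE, MAE, NSE, and Willmott's index; the function names and the sample values are hypothetical and are not taken from the study's data.

```python
import math


def mse(obs, pred):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(mse(obs, pred))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    o_mean = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - o_mean) ** 2 for o in obs)
    return 1.0 - num / den


def willmott_index(obs, pred):
    """Willmott's index of agreement: 1 indicates perfect agreement."""
    o_mean = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den


# Hypothetical observed and predicted rainfall (mm)
observed = [12.0, 30.5, 8.2, 45.1, 22.3]
predicted = [10.5, 28.0, 9.0, 47.2, 20.1]

for metric in (mse, rmse, mae, nse, willmott_index):
    print(metric.__name__, round(metric(observed, predicted), 4))
```

Lower MSE, RMSE, and MAE indicate smaller errors, whereas NSE and WI values closer to 1 indicate better agreement.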

Development of IDF curves

IDF curves are crucial for estimating rainfall intensities for various durations and return periods. They capture the relationship between rainfall intensity and storm frequency (Thanh & Xuan 2023). Hydrologic assessments of water systems depend on estimates of rainfall intensity; hence, IDF curves are critical to storm design. Although IDF statistics cannot be derived from a single storm, data from many storms can be combined to establish a robust relationship. IDF curves are constructed using a three-step process applied to a range of durations (5 to 120 min in 5-min steps, plus 720 and 1,440 min). This procedure involves fitting PDFs to the gathered data and determining the rainfall intensity for predetermined return periods, such as 2, 10, 25, 50, 75, and 100 years. In water resource engineering, IDF curves are an essential tool for infrastructure planning and storm design.

Empirical reduction equation for short-duration rainfall estimation

An empirical reduction equation is used by the Indian Meteorological Department (IMD) to estimate short-duration rainfall (Rashid et al. 2011; Shamkhi et al. 2022). The rainfall depth $P_t$ for a duration $t$, corresponding to durations such as 5, 10, 15, and 20 min, is obtained using Equation (6):
$$P_t = P_{24} \left( \frac{t}{24} \right)^{1/3} \tag{6}$$

The formula is applied to the annual maximum values, where $P_t$ is the required rainfall depth (in mm) for a duration $t$, expressed in hours in this form of the equation, and $P_{24}$ is the daily rainfall (in mm). By relating the rainfall depth for a given duration to the total daily rainfall, the formula makes it possible to estimate short-duration rainfall where only daily records are available, which is useful for water resources management and meteorological forecasting.
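
The reduction step can be sketched as follows. The sketch assumes the commonly cited one-third power form of the IMD formula, with the duration converted from minutes to hours before the ratio to 24 h is taken; the function name and the 100 mm daily value are hypothetical.

```python
def short_duration_rainfall(daily_rainfall_mm, duration_min):
    """Estimate rainfall depth (mm) for a short duration from the daily depth.

    Assumes the one-third power form P_t = P_24 * (t / 24) ** (1 / 3),
    with t in hours.
    """
    if daily_rainfall_mm < 0:
        raise ValueError("daily rainfall must be non-negative")
    if duration_min <= 0:
        raise ValueError("duration must be positive")
    duration_h = duration_min / 60.0
    return daily_rainfall_mm * (duration_h / 24.0) ** (1.0 / 3.0)


# Hypothetical annual maximum daily rainfall of 100 mm, reduced to four durations.
depths_mm = {t: short_duration_rainfall(100.0, t) for t in (5, 10, 15, 20)}
```

Under this form a 3 h storm receives half of the daily depth, and the full daily depth is recovered at 1,440 min.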

PDF fitting using the frequency distribution method

In various regions worldwide, several theoretical PDFs are widely applied, including the generalized extreme value (GEV) distribution, Gumbel (extreme value type 1, EV1) distribution, normal distribution, lognormal distribution, Pearson, and log-Pearson type III distributions. To create IDF curves from rainfall data for the study area, three prevalent PDFs (normal, lognormal, and Gumbel) were employed. This choice is based on the literature. Previous researchers (Millington et al. 2011; Samantaray & Sahoo 2020) reported that these three distributions fit observed rainfall data well, so where the climate data are consistent with the assumptions of the normal, lognormal, and Gumbel (EV1) distributions, these distributions may be the most appropriate choice.

Normal distribution
In the field of statistics, the normal (Gaussian) distribution is one of the most widely used distributions. Deriving rainfall intensities for specific return periods and storm durations requires several calculations (Shamkhi et al. 2022). The normal distribution is commonly used due to its simplicity and its applicability when data are symmetric and bell-shaped. It is often favored for modeling variables influenced by numerous small, independent factors (according to the Central Limit Theorem), which can make it suitable for certain types of meteorological data, including rainfall intensity. Equation (7) gives the rainfall depth $P_T$ (in mm) for a specified return period ($T$ in years) and a specific duration ($t$):
$$P_T = \bar{P} + K_t S \tag{7}$$
where $\bar{P}$ represents the arithmetic average of the rainfall records and $S$ denotes the standard deviation. $K_t$ is the frequency factor, which is equal to $Z$ for the normal and lognormal distribution functions and is calculated by Equation (8):
$$Z = w - \frac{2.515517 + 0.802853w + 0.010328w^2}{1 + 1.432788w + 0.189269w^2 + 0.001308w^3} \tag{8}$$
where $w$ is calculated by Equation (9):
$$w = \left[ \ln\left( \frac{1}{P^2} \right) \right]^{1/2}, \quad 0 < P \le 0.5 \tag{9}$$
where $P$ is the probability of occurrence and is calculated by Equation (10):
$$P = \frac{1}{T} \tag{10}$$

When P > 0.5, P is replaced by 1 − P in Equation (9), and the resulting value of Z is given a negative sign.
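
The frequency factor calculation can be sketched in code. This sketch assumes the widely used rational approximation for the standard normal variate from the hydrology literature; the coefficients are an assumption in that sense, and the function name is hypothetical.

```python
import math


def normal_frequency_factor(return_period_years):
    """Approximate standard normal variate Z for a given return period.

    Uses the exceedance probability P = 1 / T and an assumed rational
    approximation, valid for 0 < P <= 0.5 and mirrored for P > 0.5.
    """
    if return_period_years <= 1:
        raise ValueError("return period must be greater than 1 year")
    p = 1.0 / return_period_years
    mirrored = p > 0.5
    if mirrored:
        p = 1.0 - p  # substitute 1 - P and flip the sign of Z afterwards
    w = math.sqrt(math.log(1.0 / p ** 2))
    z = w - (2.515517 + 0.802853 * w + 0.010328 * w ** 2) / (
        1.0 + 1.432788 * w + 0.189269 * w ** 2 + 0.001308 * w ** 3
    )
    return -z if mirrored else z


# Frequency factors for the return periods used in the study.
factors = {T: normal_frequency_factor(T) for T in (2, 10, 25, 50, 75, 100)}
```

For a 2-year return period the factor is close to zero, so the design depth is close to the mean; for 100 years it is roughly 2.33 standard deviations above the mean.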

The rainfall intensity $I$ (in mm/h) for a duration $t$ is then obtained from the rainfall depth $P_T$ (in mm) for return period $T$ using Equation (11):
$$I = \frac{P_T}{t} \tag{11}$$

Here, t signifies the duration in hours. This procedure is systematically employed to determine the intensity for 14 durations and 6 return periods through meticulous calculations.
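
A short sketch of the depth-to-intensity step follows. It assumes the design depth is the mean plus the frequency factor times the standard deviation, and that intensity is depth divided by duration in hours; the function names and all numbers are hypothetical.

```python
def design_depth_mm(mean_mm, std_mm, frequency_factor):
    """Design rainfall depth: mean plus frequency factor times standard deviation."""
    return mean_mm + frequency_factor * std_mm


def intensity_mm_per_h(depth_mm, duration_min):
    """Rainfall intensity in mm/h for a depth accumulated over duration_min minutes."""
    if duration_min <= 0:
        raise ValueError("duration must be positive")
    return depth_mm / (duration_min / 60.0)


# Hypothetical 30 min series: mean 25 mm, standard deviation 8 mm, factor 2.0.
depth = design_depth_mm(mean_mm=25.0, std_mm=8.0, frequency_factor=2.0)
intensity = intensity_mm_per_h(depth, duration_min=30)
```

Repeating the two calls for every duration and return period gives the grid of intensities from which an IDF curve is plotted.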

Lognormal function

Employing the lognormal method involves utilizing logarithmic transformations on variables to calculate the frequency of precipitation, akin to the normal method. Rainfall data often exhibit positive skewness (long tail toward high rainfall values), making the lognormal distribution an appropriate choice. By transforming the data to a logarithmic scale, the lognormal distribution can better represent the underlying distribution of rainfall intensity, particularly for extreme events. This approach entails performing calculations for average precipitation and standard deviations through logarithmically transformed data.
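
The lognormal procedure can be sketched as below. The sketch assumes natural logarithms (the text does not state the base) and takes the standard normal variate from the Python standard library rather than from an approximation formula; the function name and sample data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def lognormal_design_depth(annual_maxima_mm, return_period_years):
    """Design rainfall depth (mm) from a lognormal fit to annual maxima.

    Mean and standard deviation are computed on log-transformed data,
    then the quantile is transformed back to millimeters.
    """
    if return_period_years <= 1:
        raise ValueError("return period must be greater than 1 year")
    if len(annual_maxima_mm) < 2 or min(annual_maxima_mm) <= 0:
        raise ValueError("need at least two strictly positive values")
    logs = [math.log(x) for x in annual_maxima_mm]
    # Non-exceedance probability 1 - 1/T gives the standard normal variate.
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period_years)
    return math.exp(fmean(logs) + z * stdev(logs))


# Hypothetical annual maximum depths (mm) for one duration.
sample = [18.0, 25.0, 31.0, 22.0, 40.0, 27.0]
depth_100yr = lognormal_design_depth(sample, 100)
```

Because the back-transformation is exponential, the upper tail grows faster than under a normal fit to the same data, which is the behavior that suits positively skewed rainfall.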

Gumbel function (EV1)

The Gumbel function, or type 1 distribution of maxima, is named after its originator Gumbel. The Gumbel distribution is specifically designed for modeling extreme values, making it well-suited for IDF curve analysis, which focuses on rare and high-intensity rainfall events. The Gumbel distribution is often preferred when the interest lies in accurately estimating the probability of extreme events beyond the range of typical observations.

It is widely used in IDF analysis because it is well suited to fitting annual maxima and handles extreme rainfall values well. As with the normal function, the rainfall depth for a given return period is obtained from Equation (7); however, the frequency factor $K_t$ differs from that of the normal function. According to Chow, the frequency factor for the Gumbel distribution is given by Equation (12):
$$K_t = -\frac{\sqrt{6}}{\pi} \left[ 0.5772 + \ln\left( \ln\left( \frac{T}{T-1} \right) \right) \right] \tag{12}$$

Then, the intensity can be computed using Equation (11).
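
The Gumbel frequency factor can be evaluated directly. This sketch assumes the closed-form expression commonly attributed to Chow, in which the factor depends only on the return period; the function name is hypothetical.

```python
import math


def gumbel_frequency_factor(return_period_years):
    """Gumbel (EV1) frequency factor for a return period in years.

    Assumes K = -(sqrt(6) / pi) * (0.5772 + ln(ln(T / (T - 1)))).
    """
    if return_period_years <= 1:
        raise ValueError("return period must be greater than 1 year")
    t = float(return_period_years)
    return -(math.sqrt(6.0) / math.pi) * (0.5772 + math.log(math.log(t / (t - 1.0))))


# Frequency factors for the return periods used in the study.
gumbel_factors = {T: gumbel_frequency_factor(T) for T in (2, 10, 25, 50, 75, 100)}
```

For a 100-year return period the factor is about 3.14, compared with about 2.33 for the normal distribution, which is consistent with the Gumbel method giving the largest intensities at long return periods.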

An organized method was used to increase the reliability of the GCM outputs for impact evaluations and well-informed decision-making. Initially, a two-step procedure was used, with bias correction coming after downscaling. It was suggested that lowering model biases and uncertainties would improve the consistency and usefulness of GCM outputs. In particular, the linear scaling method of bias correction was chosen to maintain trends and variability in the GCM output due to its simplicity of use and computational effectiveness.
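
Linear scaling for precipitation is usually applied as a multiplicative monthly factor. The sketch below assumes that form (ratio of the observed to the simulated monthly mean over a common baseline); the study does not give its exact implementation, so the function names, the data layout, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def monthly_scaling_factors(observed, simulated):
    """Return {month: mean(observed) / mean(simulated)} for a common baseline.

    Both inputs are lists of (month, precipitation_mm) pairs.
    """
    obs_by_month = defaultdict(list)
    sim_by_month = defaultdict(list)
    for month, value in observed:
        obs_by_month[month].append(value)
    for month, value in simulated:
        sim_by_month[month].append(value)

    factors = {}
    for month, obs_values in obs_by_month.items():
        sim_values = sim_by_month.get(month)
        if not sim_values:
            continue  # no simulated data for this month, so no factor
        sim_mean = fmean(sim_values)
        # A completely dry simulated month gives no usable ratio; keep 1.0.
        factors[month] = fmean(obs_values) / sim_mean if sim_mean > 0 else 1.0
    return factors


def apply_linear_scaling(series, factors):
    """Multiply each (month, value) pair by the factor of its month."""
    return [(month, value * factors.get(month, 1.0)) for month, value in series]


# Hypothetical baseline data for January (1) and July (7).
baseline_factors = monthly_scaling_factors(
    [(1, 10.0), (1, 20.0), (7, 90.0)], [(1, 5.0), (1, 10.0), (7, 60.0)]
)
corrected = apply_linear_scaling([(1, 4.0), (7, 10.0)], baseline_factors)
```

Because each month is rescaled by a constant, the sequence and relative variability of the GCM series within a month are retained, which is the property the text attributes to the method.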

Following bias correction, two ensemble learning techniques, XGBoost and Extra Trees, were used to rank the GCMs. The feature importance scores from both techniques were evaluated in order to select the top three models, as Table 2 shows. An MME framework using multiple ML models was then applied to forecast future precipitation. These ML models were gradient boosting, AdaBoost, regression tree, and random forest. The prediction models used the observed precipitation data as the target variable, and the input variables were the precipitation data from the top three GCMs chosen on the basis of feature importance scores.

Table 2

GCM selected through the feature selection process

Extra Trees MRI-ESM2-0 EC-Earth3-Veg-LR INM-CM5-0 
XGBoost MRI-ESM2-0 GFDL-ESM4 INM-CM5-0 

The dataset was divided into a training set comprising 70% of the data and a testing set comprising the remaining 30% to allow validation of the models. This separation allowed the models' performance to be assessed on data that were not used for training. MSE, RMSE, MAE, NSE, and WI were used as benchmarks to assess how accurately the models reproduced precipitation. Based on these metrics, AdaBoost performed better than the other models in precipitation forecasting, which supports its use as the precipitation prediction tool in this study.
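
The split and the error metrics can be sketched in plain Python. The sketch assumes a chronological split without shuffling (the text does not say how the 70% and 30% portions were drawn) and the standard definitions of MSE, RMSE, MAE, and NSE; the function names are hypothetical.

```python
import math
from statistics import fmean


def split_70_30(series):
    """Split a sequence into the first 70% (training) and last 30% (testing)."""
    cut = (len(series) * 7) // 10  # integer arithmetic avoids float rounding
    return series[:cut], series[cut:]


def regression_metrics(observed, predicted):
    """Return MSE, RMSE, MAE, and Nash-Sutcliffe efficiency as a dict."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    errors = [o - p for o, p in zip(observed, predicted)]
    mse = fmean(e * e for e in errors)
    o_bar = fmean(observed)
    variance_sum = sum((o - o_bar) ** 2 for o in observed)
    if variance_sum == 0:
        raise ValueError("NSE is undefined for constant observations")
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": fmean(abs(e) for e in errors),
        "NSE": 1.0 - sum(e * e for e in errors) / variance_sum,
    }


# Hypothetical series of ten values split into training and testing parts.
train, test = split_70_30(list(range(10)))
```

An NSE of 1 means a perfect fit, and an NSE of 0 means the model is no better than predicting the observed mean, which gives a scale for reading the NSE columns in the results.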

Evaluation of ensemble learning models

The GCMs were evaluated and ranked according to how well they relate to the observed data, using the feature importance scores from the ensemble learning models. The aim was to identify the best-performing GCMs, which then served as the basis for creating an MME to reduce uncertainty in the GCM projections. This analysis was conducted independently for the two SSPs, SSP245 and SSP585.

Evaluation of the result from the Extra Tree model

The results of the Extra Tree model are presented in two types of illustration, a radial bar chart and a radar graph, which are shown in Figure 6. These visuals highlight the most influential GCMs, as determined by feature importance scores. Following the analysis, the top three models were identified, as presented in Table 2. This examination was conducted separately for the two SSPs, SSP245 and SSP585.
Figure 6

Radar chart and polar bar chart illustrate the significance of every GCM using Extra Tree methods.


Evaluation of the result from the XGBoost model

Two visual representations of the XGBoost results, a polar bar chart and a radar graph, are shown in Figure 7. These visuals highlight the most influential GCMs based on feature importance scores. Following the analysis, the top three models were identified and are shown in Table 2. This analysis was carried out for the historical data.
Figure 7

Radar chart and polar bar chart illustrate the significance of every GCM using XGBoost methods.


Analyzing the effectiveness of the ML model

The MME in the present study was built using several ML models: gradient boosting, random forest, regression tree, and AdaBoost. The prediction models used the observed precipitation data as the target variable and the precipitation data from the top three GCMs selected by each feature selection technique as input variables. The dataset was divided into a training phase comprising 70% of the data and a testing phase comprising the remaining 30% in order to evaluate and validate the models. Table 3 displays the statistical metrics (MSE, RMSE, MAE, NSE, and WI) used to assess how well these ML models replicated precipitation. These metrics served as benchmarks for the accuracy and reliability of the ML models in reproducing precipitation patterns. The results showed that AdaBoost applied to the top three GCMs selected by XGBoost outperformed the other models during the training and testing stages, indicating its higher predictive power and establishing it as a useful tool for precipitation prediction in this study.

Table 3

Training and testing results of the top three ML models used in this study

| Feature selection method | Model | Training MSE | Training RMSE | Training MAE | Training NSE | Training WI | Testing MSE | Testing RMSE | Testing MAE | Testing NSE | Testing WI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Extra Trees | Random forest | 962.6 | 31.03 | 18.43 | 0.80 | 0.71 | 1,128.2 | 33.6 | 27.21 | 0.48 | 0.42 |
| Extra Trees | Gradient boosting | 2.04 | 1.43 | 1.23 | 0.76 | 0.91 | 7,066.5 | 84.06 | 56.82 | 0.28 | 0.31 |
| Extra Trees | Tree | 1,155.3 | 33.99 | 20.1 | 0.50 | 0.62 | 3,286.2 | 57.33 | 46.51 | 0.32 | 0.36 |
| Extra Trees | AdaBoost | 52.87 | 7.27 | 3.51 | 0.95 | 0.94 | 984.2 | 31.37 | 26.29 | 0.56 | 0.51 |
| XGBoost | Random forest | 960.2 | 30.98 | 18.7 | 0.79 | 0.71 | 388.8 | 19.71 | 16.94 | 0.65 | 0.62 |
| XGBoost | Gradient boosting | 2.54 | 1.59 | 1.28 | 0.89 | 0.75 | 3,843.6 | 61.9 | 42.6 | 0.32 | 0.28 |
| XGBoost | Tree | 1,059.8 | 32.6 | 18.9 | 0.54 | 0.67 | 2,431.2 | 49.3 | 36.1 | 0.43 | 0.48 |
| XGBoost | AdaBoost | 53.1 | 7.3 | 2.970 | 0.96 | 0.95 | 469.6 | 21.7 | 19.7 | 0.63 | 0.61 |
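
Since RMSE is the square root of MSE, the two columns of Table 3 can be cross-checked against each other. The sketch below copies the MSE and RMSE values from Table 3 and reports the largest disagreement; the variable names are hypothetical, and the 0.1 tolerance is an assumption that allows for rounding in the published figures.

```python
import math

# (feature selection method, model, phase, MSE, RMSE), copied from Table 3.
TABLE3 = [
    ("Extra Trees", "Random forest", "training", 962.6, 31.03),
    ("Extra Trees", "Random forest", "testing", 1128.2, 33.6),
    ("Extra Trees", "Gradient boosting", "training", 2.04, 1.43),
    ("Extra Trees", "Gradient boosting", "testing", 7066.5, 84.06),
    ("Extra Trees", "Tree", "training", 1155.3, 33.99),
    ("Extra Trees", "Tree", "testing", 3286.2, 57.33),
    ("Extra Trees", "AdaBoost", "training", 52.87, 7.27),
    ("Extra Trees", "AdaBoost", "testing", 984.2, 31.37),
    ("XGBoost", "Random forest", "training", 960.2, 30.98),
    ("XGBoost", "Random forest", "testing", 388.8, 19.71),
    ("XGBoost", "Gradient boosting", "training", 2.54, 1.59),
    ("XGBoost", "Gradient boosting", "testing", 3843.6, 61.9),
    ("XGBoost", "Tree", "training", 1059.8, 32.6),
    ("XGBoost", "Tree", "testing", 2431.2, 49.3),
    ("XGBoost", "AdaBoost", "training", 53.1, 7.3),
    ("XGBoost", "AdaBoost", "testing", 469.6, 21.7),
]

# Absolute difference between sqrt(MSE) and the reported RMSE for each row.
deviations = [abs(math.sqrt(mse) - rmse) for *_, mse, rmse in TABLE3]
max_deviation = max(deviations)
all_consistent = max_deviation < 0.1
```

The same check applies to values quoted in the running text; for example, the regression tree testing RMSE that corresponds to an MSE of 3,286.2 is about 57.33.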

Evaluation of random forest model performance

The random forest model was developed and tested to see how well it could predict precipitation patterns. Performance indicators, namely RMSE, MAE, MSE, NSE, and WI, were employed in the training and testing phases to assess its effectiveness. The findings shown in Table 3 reveal that, during the training phase, the model driven by the top three GCMs selected by Extra Trees achieved MSE, RMSE, MAE, NSE, and WI values of 962.6, 31.03, 18.43, 0.80, and 0.71, respectively. The testing phase yielded values of 1,128.22, 33.6, 27.21, 0.48, and 0.42.

Similarly, the XGBoost combination exhibited values of 960.2, 30.98, 18.7, 0.79, and 0.71 in the training phase and 388.8, 19.71, 16.94, 0.65, and 0.62 in the testing phase. Overall, the findings show that the random forest model performed inadequately with both feature selection methods during the training and testing stages. The model's capacity to predict precipitation did not reach the required levels of accuracy and reliability.

Evaluation of gradient boosting performance

The gradient boosting model was developed and assessed for its ability to forecast precipitation patterns. Performance indicators, namely RMSE, MAE, MSE, NSE, and WI, were used to evaluate its efficacy during the training and testing stages. The results presented in Table 3 show that, during the training phase, the model driven by the top three GCMs selected by Extra Trees achieved MSE, RMSE, MAE, NSE, and WI values of 2.04, 1.43, 1.23, 0.76, and 0.91, respectively. In contrast, the testing phase exhibited values of 7,066.5, 84.062, 56.828, 0.28, and 0.31.

Similarly, the XGBoost combination gave values of 2.54, 1.59, 1.284, 0.89, and 0.75 in the training phase, while the testing phase gave values of 3,843.6, 61.9, 42.6, 0.32, and 0.28. Overall, the results point to unsatisfactory performance of the gradient boosting model with both feature selection methods: the low training errors contrast with much larger testing errors, which suggests overfitting. The model failed to attain the required accuracy and reliability in precipitation prediction.

Evaluation of regression tree performance

The regression tree model was developed and its ability to forecast precipitation patterns was assessed using performance metrics such as RMSE, MAE, MSE, NSE, and WI across both the training and testing phases. The results shown in Table 3 highlight that, during the training phase, the model driven by the top three GCMs selected by Extra Trees achieved MSE, RMSE, MAE, NSE, and WI values of 1,155.3, 33.99, 20.12, 0.50, and 0.62, respectively. In the testing phase, the corresponding values were 3,286.28, 57.33, 46.514, 0.32, and 0.36.

Likewise, for the XGBoost combination, values of 1,059.8, 32.6, 18.9, 0.543, and 0.670 were recorded during the training phase, and values of 2,431.2, 49.3, 36.1, 0.43, and 0.48 during the testing phase. Overall, these results suggest moderate performance of the regression tree model in both the training and testing phases; however, it falls short of the desired accuracy and reliability in precipitation prediction.

Evaluation of AdaBoost performance

The AdaBoost model was developed and assessed for its ability to project precipitation patterns, using RMSE, MAE, MSE, NSE, and WI as performance metrics in both the training and testing phases. The findings shown in Table 3 reveal that, during the training phase, the model driven by the top three GCMs selected by Extra Trees achieved MSE, RMSE, MAE, NSE, and WI values of 52.87, 7.27, 3.51, 0.95, and 0.94, respectively. In the testing phase, the corresponding values were 984.22, 31.37, 26.291, 0.56, and 0.51.

Similarly, for the XGBoost combination, values of 53.1, 7.3, 2.970, 0.96, and 0.95 were recorded during the training phase, and values of 469.6, 21.7, 19.7, 0.63, and 0.61 during the testing phase. Overall, the compromise programming shows acceptable performance of AdaBoost on XGBoost's top three GCMs during the training and testing stages, achieving the desired accuracy and reliability in precipitation prediction.

Despite a decrease in performance compared with the training phase, the model exhibited reasonable accuracy and predictive capability when applied to unseen data. AdaBoost's precipitation prediction performance compares favorably with that of the random forest, gradient boosting, and regression tree models. Reduced prediction errors are indicated by lower AdaBoost RMSE, MAE, and MSE values and by high values of NSE and WI. These metrics demonstrate the AdaBoost model's superior performance in predicting precipitation, which supports extending the predictions to 2100, and they are consistent with the study by Ullah et al. (2023). In conclusion, the AdaBoost model on the XGBoost combination outperforms the gradient boosting, random forest, and regression tree models in terms of predictive accuracy and reliability, making it a reliable method for precipitation prediction.

PDF fitting using PDFs

Three widely used probability distributions, namely normal, lognormal, and Gumbel, were fitted to the observed and SSP data. The fits for the observed precipitation are shown in Figure 8, for SSP245 in Figure 9, and for SSP585 in Figure 10.
Figure 8

PDFs for observed precipitation: (a) lognormal, (b) Gumbel, and (c) normal.

Figure 9

PDFs for precipitation in the SSP245 scenario: (a) lognormal, (b) Gumbel, and (c) normal.

Figure 10

PDFs for precipitation in the SSP585 scenario: (a) lognormal, (b) Gumbel, and (c) normal.


IDF curve generation

The IDF curve, created through the chosen methodology, establishes the relationship between rainfall duration (in minutes), intensity (in mm/h), and return period (in years). Three distributions, namely normal, lognormal, and Gumbel, were used to calculate rainfall intensity for various return periods. The relationship between rainfall intensity and duration for each return period is illustrated in Figures 11–13, one figure per distribution. In each case, rainfall intensity increases with the return period, and the plots show an inverse relationship between rainfall intensity and storm duration. A marked change in intensity is observed as the duration increases from 1 to 4 h, after which the intensity decreases gradually. The intensity–duration curves show that the Gumbel method yields the highest rainfall intensity values among the three methods for higher return periods.
Figure 11

IDF curve using the normal distribution for (a) the observed data, (b) SSP245, and (c) SSP585.

Figure 12

IDF curve using the lognormal distribution for (a) the observed data, (b) SSP245, and (c) SSP585.

Figure 13

IDF curve using the Gumbel distribution for (a) the observed data, (b) SSP245, and (c) SSP585.


The rainfall intensity is higher under both SSP585 and SSP245 than in the observed historical rainfall, in the order SSP585 > SSP245 > observed rainfall. These findings align closely with the studies by Noor et al. (2018) and Galiatsatou & Iliadis (2022), which demonstrate that higher emission scenarios, notably SSP585, lead to pronounced increases in rainfall intensity compared with historical observations and lower emission scenarios such as SSP245. This consistency reinforces the understanding of climate change impacts on IDF curves and emphasizes the importance of considering emission pathways in climate resilience planning.

Furthermore, a nuanced exploration of the benefits and drawbacks of the developed models is essential to enhance transparency and applicability. The ensemble learning approach utilizing Extra Trees, XGBoost, and AdaBoost showcases superior performance in feature selection and model estimation, as validated through compromise programming. However, limitations such as data dependency (spanning from 1981 to 2018) and reliance on climate model projections introduce uncertainties that should be addressed in future studies. Incorporating more recent rainfall data and refining uncertainty quantification methods are crucial steps toward improving model accuracy and robustness. By delving into these aspects, the discussion enriches the scientific rigor of the research and provides valuable insights for decision-making processes related to climate-resilient infrastructure development and urban planning initiatives.

Scientific contributions and implications for resilient infrastructure design

The study presents significant new scientific contributions to the field of hydrological modeling and climate resilience. By integrating ensemble learning techniques such as Extra Trees and XGBoost for feature selection and model estimation, along with employing ML models like AdaBoost for MME estimation, this research introduces innovative approaches to developing IDF curves. Additionally, the study incorporates bias correction methods and explores various distribution methods (e.g., normal, lognormal, Gumbel) to enhance the accuracy and reliability of IDF curve predictions, addressing data variability and improving predictive capabilities under changing climate conditions. The temporal analysis of historical data alongside projected future climate scenarios (e.g., SSP585 and SSP245) offers valuable insights into temporal variations in rainfall intensity and duration, essential for understanding climate change impacts and informing adaptive planning measures. Focusing on the specific geographical area of Peshawar, the study provides localized insights into the hydrological effects of climate change, contributing to targeted urban planning and infrastructure development. Furthermore, the study emphasizes practical implications for climate resilience, advocating for the incorporation of SSP scenarios in drainage system designs to promote climate-resilient infrastructure development. Overall, these scientific contributions advance the field of hydrological modeling by leveraging ML techniques and providing actionable insights for climate adaptation strategies. These contributions will be explicitly linked to how they address the previously identified research gaps, thereby underlining the direct impact of our work on advancing the field.

The primary goal of this study is to develop IDF curves for calculating rainfall intensity based on specific storm durations and return periods. Following bias correction, ensemble learning techniques using Extra Trees and XGBoost were employed for feature selection, with four ML models (random forest, regression tree, gradient boosting, and AdaBoost) utilized for MME estimation. AdaBoost emerged as the best-performing model through compromise programming and was selected for the final MME. Subsequently, three distribution methods were applied to fit the data, resulting in the development of IDF curves. The findings indicate that projected rainfall intensity under future climate scenarios, particularly SSP585 and SSP245, surpasses historical observations. Specifically, rainfall intensity is highest under SSP585, followed by SSP245 and observed historical rainfall. However, this study has several limitations that warrant careful consideration when interpreting the results and applying the findings. Firstly, the use of rainfall data spanning from 1981 to 2018 may not fully capture recent climate changes, potentially affecting the accuracy of IDF curve predictions. Future research should prioritize incorporating more recent rainfall data, extending the analysis period up to 2024 for a more precise representation of current climate conditions and trends. Additionally, the study's reliance on GCMs introduces inherent uncertainties, which could influence the reliability of the generated IDF curves. Furthermore, the study's focus on the specific geographical area of Peshawar may limit its generalizability to other regions, as variations in rainfall patterns across different locations are not fully captured. This regional specificity underscores the importance of recalibration or adaptation when applying the developed model to other geographical contexts, considering local climatic conditions to ensure the model's relevance and accuracy. Addressing these limitations is essential for advancing the applicability and robustness of IDF curve modeling in the context of hydrological and climate impact assessments. Sanitation engineers are advised to incorporate SSP scenarios into drainage system designs to enhance climate resilience and mitigate the adverse impacts of intensified rainfall events. The study underscores the value of applying ML techniques in hydrological modeling, demonstrating their efficacy in handling complex rainfall data and improving predictive capabilities for climate impact assessments. Future research should address these limitations by refining ensemble learning approaches, exploring additional climate scenarios, and integrating uncertainty quantification methods to enhance the reliability and robustness of IDF curve predictions under evolving climate conditions.

The authors are thankful to the Deanship of Scientific Research at Najran University for funding this work under the Research Group funding program grant code (NU/RG/SERC/12/47).

The submitted work is original and has not been published elsewhere in any form or language.

All the authors agree with the participation of this article.

All the authors agree with the publication of this article.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Ahmed, K., Sachindra, D. A., Shahid, S., Demirel, M. & Chung, E.-S. (2019) Selection of multi-model ensemble of general circulation models for the simulation of precipitation and maximum and minimum temperature based on spatial assessment metrics, Hydrology and Earth System Sciences, 23, 4803–4824.
Alam, F., Salam, M., Khalil, N. A., Khan, O. & Khan, M. (2021) Rainfall trend analysis and weather forecast accuracy in selected parts of Khyber Pakhtunkhwa, Pakistan, SN Applied Sciences, 3 (5), 575.
Ali, Z., Abduljabbar, Z., Tahir, H., Sallow, A. & Almufti, S. (2023) Exploring the power of eXtreme gradient boosting algorithm in machine learning: a review, Academic Journal of Nawroz University, 12, 320–334.
Anwar, H., Khan, A. U., Ullah, B., Taha, A. T. B., Najeh, T., Badshah, M. U., Ghanim, A. A. J. & Irfan, M. (2024) Intercomparison of deep learning models in predicting streamflow patterns: insight from CMIP6, Scientific Reports, 14 (1), 17468.
Bibi, T., Nawaz, F., Rahman, A., Razak, K. & Latif, A. (2018) Flood risk assessment of River Kabul and Swat catchment area: district Charsadda, Pakistan, ISPRS – International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLII-4/W9, 105–113.
Chen, T. & Guestrin, C. (2016) 'XGBoost: A scalable tree boosting system', Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ‘16). ACM, pp. 785–794. https://doi.org/10.1145/2939672.2939785.
Chen, L., Wang, Q., Zhu, G., Lin, X., Qiu, D., Jiao, Y., Lu, S., Li, R., Meng, G. & Wang, Y. (2024) Dataset of stable isotopes of precipitation in the Eurasian continent, Earth System Science Data, 16 (3), 1543–1557.
Chengsheng, T., Huacheng, L. & Bing, X. (2017) AdaBoost typical algorithm and its application research, MATEC Web of Conferences, 139, 00222.
Fang, G., Yang, J., Yaning, C. & Zammit, C. (2015) Comparing bias correction methods in downscaling meteorological variables for hydrologic impact study in an arid area in China, Hydrology and Earth System Sciences, 19, 2547–2559.
Fürnkranz, J., Chan, P. K., Craw, S., Sammut, C., Uther, W., Ratnaparkhi, A., Jin, X., Han, J., Yang, Y., Morik, K., Dorigo, M., Birattari, M., Stützle, T., Brazdil, P., Vilalta, R., Giraud-Carrier, C., Soares, C., Rissanen, J., Baxter, R. A., Bruha, I., Baxter, R. A., Webb, G. I., Torgo, L., Banerjee, A., Shan, H., Ray, S., Tadepalli, P., Shoham, Y., Powers, R., Webb, G. I., Scott, S., Blockeel, H. & De Raedt, L. (2011) Mean absolute error. In: Encyclopedia of Machine Learning. Springer, pp. 652–653. https://doi.org/10.1007/978-0-387-30164-8_525.
Haleem, K., Khan, A., Khan, F., Zada, U., Khan, J. & Khan, M. (2023a) Futuristic hydroclimatic projections under CMIP6 GCMs: Implications for water resources management, Research Square, [Preprint]. https://doi.org/10.21203/rs.3.rs-3222779/v1.
Haleem, K., Khan, A. U., Khan, J., Ghanim, A. A. J. & Al-Areeq, A. M. (2023b) Evaluating future streamflow patterns under SSP245 scenarios: insights from CMIP6, Sustainability, 15 (22), 16117.
Ishak, A., Siregar, K., Aspriyati, Ginting, R. & Afif, M. (2020) Orange software usage in data mining classification method on the dataset lenses, IOP Conference Series: Materials Science and Engineering, 1003, 012113.
Kourtis, I. M. & Tsihrintzis, V. A. (2022) Update of intensity-duration-frequency (IDF) curves under climate change: a review, Water Supply, 22 (5), 4951–4974.
Maraun, D. (2016) Bias correcting climate change simulations – a critical review, Current Climate Change Reports, 2 (4), 211–220.
Masum, M. M. H. & Pal, S. (2021) 'Development of rainfall Intensity Duration Frequency (IDF) curves for Chattogram City', Proceedings of the International Conference on Sustainable Development in Technology for 4th Industrial Revolution 2021 (ICSDTIR-2021). Port City International University, pp. 170–176.
Millington, N., Das, S. & Simonovic, S. (2011) The Comparison of GEV, Log-Pearson Type 3 and Gumbel Distributions in the Upper Thames River Watershed under Global Climate Models: (print) XXX-X-XXXX-XXXX-X; (online) XXX-X-XXXX-XXXX-X.
Nisa, S. (2012) Trends and variability in climate parameters of Peshawar district, Science, Technology and Development, 31, 341–347.
Noor, M., Ismail, T., Chung, E.-S., Shahid, S. & Sung, J. H. (2018) Uncertainty in rainfall intensity duration frequency curves of Peninsular Malaysia under changing climate scenarios, Water, 10 (12), 1750.
Otto, F., Zachariah, M., Saeed, F., Siddiqi, A., Kamil, S., Mushtaq, H., Thanigachalam, A., Achutarao, K., Chaithra, S. T., Barnes, C., Philip, S., Kew, S., Vautard, R., Koren, G., Pinto, I., Wolski, P., Vahlberg, M., Singh, R., Arrighi, J. & Clarke, B. (2023) Climate change increased extreme monsoon rainfall, flooding highly vulnerable communities in Pakistan, Environmental Research: Climate, 2 (2), 025001.
Pakistan Meteorological Department (2021) Annual Report 2021, Islamabad, Pakistan.
Rashid, M. M., Faruque, S. B. & Alam, J. B. (2012) Modeling of short duration rainfall intensity-duration-frequency (SDR-IDF) equation for Sylhet City in Bangladesh, ARPN Journal of Science and Technology, 2 (2), 92–95.
Ren
Q.
,
Cheng
H.
&
Han
H.
(
2017
)
Research on machine learning framework based on random forest algorithm
,
AIP Conference Proceedings
,
1820
(
1
),
080020-1
080020-7
.
https://doi.org/10.1063/1.4977376
.
Samantaray
S.
&
Sahoo
A.
(
2020
)
Estimation of flood frequency using statistical method: Mahanadi River basin, India
,
H2Open Journal
,
3
(
1
),
189
207
.
https://doi.org/10.2166/h2oj.2020.004
.
Sanmorino A., Marnisah L. & Sunardi H. (2023) 'Feature selection using extra trees classifier for research productivity framework in Indonesia', Proceedings of the 3rd International Conference on Electronics, Biomedical Engineering, and Health Informatics. Surabaya, 5–6 October 2022, pp. 13–21. https://doi.org/10.1007/978-981-99-0248-4_2.
Schlef K. E., Kunkel K. E., Brown C., Demissie Y., Lettenmaier D. P., Wagner A., Wigmosta M. S., Karl T. R., Easterling D. R., Wang K. J., François B. & Yan E. (2023) Incorporating non-stationarity from climate change into rainfall frequency and intensity-duration-frequency (IDF) curves, Journal of Hydrology, 616, 128757.
Schneider P. & Xhafa F. (2022) Chapter 3 - Anomaly detection: concepts and methods. In: Schneider P. & Xhafa F. (eds.) Anomaly Detection and Complex Event Processing Over IoT Data Streams. Academic Press, pp. 49–66. https://doi.org/10.1016/B978-0-12-823818-9.00013-4.
Sharma S. (2021) Classification and regression trees: the use and significance of trees in analytics, Journal on Recent Innovation in Cloud Computing, Virtualization & Web Applications, 5 (1), 1–2021. ISSN: 2581-544X.
Silva D. F., Simonovic S. P., Schardong A. & Goldenfum J. A. (2021a) Assessment of non-stationary IDF curves under a changing climate: case study of different climatic zones in Canada, Journal of Hydrology: Regional Studies, 36, 100870. https://doi.org/10.1016/j.ejrh.2021.100870.
Teegavarapu R. S. V. (2010) Modeling climate change uncertainties in water resources management models, Environmental Modelling & Software, 25 (10), 1261–1265.
Thanh S. T. & Xuan A. H. (2023) Deriving of intensity-duration-frequency (IDF) curves for precipitation at Hanoi, Vietnam, E3S Web of Conferences, 403, 06002. https://doi.org/10.1051/e3sconf/202340306002.
Tian Y., Zhao Y., Son S. W., Luo J. J., Oh S.-G. & Wang Y. (2023) A deep-learning ensemble method to detect atmospheric rivers and its application to projected changes in precipitation regime, Journal of Geophysical Research: Atmospheres, 128, e2022JD037041. https://doi.org/10.1029/2022JD037041.
Tian Y., Zhao Y., Li J., Xu H., Zhang C., Lin D., Wang Y. & Peng M. (2024) Improving CMIP6 atmospheric river precipitation estimation by cycle-consistent generative adversarial networks, Journal of Geophysical Research: Atmospheres, 129 (14), e2023JD040698. https://doi.org/10.1029/2023JD040698.
Ullah W., Takaaki N., Mohammad N., Zaman R. & Ali M. (2017) Understanding climate change vulnerability, adaptation and risk perceptions at household level in Khyber Pakhtunkhwa, Pakistan, International Journal of Climate Change Strategies and Management, 10.
Ullah W., Nihei T., Nafees M., Zaman R. & Ali M. (2018) Understanding climate change vulnerability, adaptation and risk perceptions at the household level in Khyber Pakhtunkhwa, Pakistan, International Journal of Climate Change Strategies and Management, 10 (3), 359–378. https://doi.org/10.1108/IJCCSM-02-2017-0038.
Ullah B., Fawad M., Khan A. U., Mohamand S. K., Khan M., Iqbal M. J. & Khan J. (2023) Futuristic streamflow prediction based on CMIP6 scenarios using machine learning models, Water Resources Management, 37 (15), 6089–6106.
Wang Q., Liu Y., Zhu G., Lu S., Chen L., Jiao Y., Li W., Li W. & Wang Y. (2025) Regional differences in the effects of atmospheric moisture residence time on precipitation isotopes over Eurasia, Atmospheric Research, 314, 107813.
Xiong C., Tao H., Liu S., Wen Z., Shang Y., Wang Q., Fang C., Li S. & Song K. (2024) Using satellite imagery to estimate CO2 partial pressure and exchange with the atmosphere in the Songhua River, Journal of Hydrology, 634, 131074.
Xu H., Zhao Y., Zhao D., Duan Y. & Xu X. (2024) Improvement of disastrous extreme precipitation forecasting in North China by Pangu-weather AI-driven regional WRF model, Environmental Research Letters, 19. https://doi.org/10.1088/1748-9326/19/1/014003.
Yan F., Wang X., Huang C., Zhang J., Su F., Zhao Y. & Lyne V. (2023) Sea reclamation in Mainland China: process, pattern, and management, Land Use Policy, 127, 106555.
Zhang P., Jia Y. & Shang Y. (2022) Research and application of XGBoost in imbalanced data, International Journal of Distributed Sensor Networks, 18, 155013292211069.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).