Drought is quantified with one or a set of drought indices for monitoring and risk management. These indices have a limited ability to capture drought impacts. Drought impact prediction models have been developed to explore the interactions between the drought impact data and the physical drought indices. This study demonstrates the use of extreme gradient boosting (XGB), a well-known machine learning technique, to predict the likelihood of impact occurrence (LIO) of drought on public water supply as a function of drought indices, with high accuracy and low uncertainty. Using text-based drought impact data from multiple sources, the prediction accuracy of drought LIO on the public water supply of South Korea was evaluated using XGB and reference models (log-logistic, support vector machine, and random forest). We also analyzed receiver operating characteristics and quantified the uncertainty of each model with bootstrapping. This study shows that XGB and random forest have a high level of suitability. However, random forest presents a higher level of uncertainty than XGB for predicting drought LIO on the public water supply in South Korea. Although some limitations exist, the results suggest that text-based drought impact data collected from multiple sources can provide insightful information for drought risk management.

  • SPEI was used to model the likelihood of drought impact on public water supply.

  • The South Korean drought impact inventory was constructed using text-based data.

  • XGB showed the best predictive performance with high accuracy and low uncertainty.

Drought is a complex natural hazard that affects the environment, society, and economy (Wilhite et al. 2000). Since the 1990s, more than two billion people have been affected, in addition to more than 11 million casualties worldwide due to drought (UNISDR 2009; EM-Dat 2019). The Tana River Basin, which plays an essential role in hydropower generation in Kenya, experienced continuous droughts from 1999 to 2001. This has led to severe water scarcity and power shortages, causing hydropower generation and industrial production losses of approximately 2 billion USD (Mogaka et al. 2006). Drought was a direct cause of more than 500,000 fatalities in Africa during the 1980s (Kallis 2008). According to the Australian Bureau of Agricultural and Resource Economics and Sciences, winter cereal crop yields across Australia were reduced by 36% due to a 2006 drought, leading to fiscal crises for numerous farmers and a total cost of 3.5 billion AUD (Wong et al. 2010).

The Korean peninsula has experienced an extreme nationwide drought on a 4–6 year cycle, and the impacts are becoming more pronounced (Hong et al. 2016a). In 2013–2015, South Korea experienced severe droughts, with annual precipitation 35–50% lower than the average values (Kwon et al. 2016). This was one of the most prolonged droughts recently observed in South Korea. It led to water scarcity in agriculture, which decreased crop production by 17.3% and increased food prices, severely impacting the Korean economy (Hong et al. 2016b). Furthermore, drought characteristics in South Korea, such as frequency, period, and severity, are projected to change with the increasing impacts of climate change (Boo et al. 2006; Yoo et al. 2012; Nam et al. 2015; Waseem et al. 2016).

Droughts are a creeping phenomenon considering the prolonged lag time on reduced precipitation, which makes it challenging to determine the onset, extent, and end of the drought. Quantifying drought events in terms of their geographic extent, scale, intensity, and duration is therefore problematic (Wilhite & Svoboda 2000). Multiple drought indices have been suggested for evaluating droughts (e.g., McKee et al. 1993; Svoboda et al. 2002; Shulka & Wood 2008; Mu et al. 2013), including the Palmer Drought Severity Index (Palmer 1965), Standardized Precipitation Index (SPI; McKee et al. 1993), and Standardized Precipitation Evapotranspiration Index (SPEI; Vicente-Serrano et al. 2010). Zhao et al. (2014) analyzed meteorological and hydrological drought characteristics in the Jinghe Basin of China, using SPI and Standardized Runoff Index (SRI; Shulka & Wood 2008), respectively. Lee et al. (2022) analyzed drought characteristics (i.e., agricultural, hydrological, and meteorological) in South Korea, with calculated SPI, Standardized Soil Moisture Index, and Standardized Streamflow Index (Barella-Ortiz & Quintana-Seguí 2019), respectively. However, these drought indices have a limited ability to capture drought impacts such as wildfires, water shortages, and crop losses (Gudmundsson et al. 2014; Blauhut et al. 2015; Stagge et al. 2015).

Blauhut et al. (2015) recently suggested the use of a log-logistic (LL) regression function to relate a drought index (SPEI) to drought impacts in the public water supply, energy and industry, water quality, and agriculture and livestock farming sectors in European countries. This was undertaken using the European Drought Impact Inventory (EDII; Stahl et al. 2016), which encompasses drought impact data from 15 categories and 33 countries. Stagge et al. (2015) strengthened the link between drought impacts and the index by incorporating data on seasonality, interannual trends, and nonlinear effects of droughts. Blauhut et al. (2016) presented a method that examined the vulnerability factors and drought index (SPEI) used for monitoring to model the likelihood of impact occurrence (LIO). Recently, Bachmair et al. (2017) used an ensemble tree-based model (i.e., random forest; Breiman 2001) to quantify the link between the drought index and drought impact. Furthermore, Sutanto et al. (2019) assessed the forecasting of drought impacts in Germany using hydrometeorological drought indices (i.e., SRI) and drought impacts in EDII. These studies have shown that drought impacts can be forecasted using machine learning techniques at temporal scales of a few months, based on the duration of the drought and the number of reported drought impacts. Note that all these studies were based on the EDII.

The majority of previous drought studies undertaken in South Korea have been based on hydrometeorological data and drought indices (e.g., Kim et al. 2012, 2014; Um et al. 2017, 2018a, 2018b; Bae et al. 2019), except for a few efforts to develop new drought indices using drought impact data (e.g., Lee et al. 2016; Jung et al. 2020). Lee et al. (2016) explored the use of unstructured data in the study of droughts based on correlation analysis between SPI as the meteorological drought index, reservoir water storage rate data, and drought impact data on agriculture collected from news articles. Jung et al. (2020) calculated meteorological and hydrological big data drought indices by combining unstructured data from news articles with hydrometeorological observation data, including precipitation and dam inflows using the Clayton Copula function. At present, there have been no studies in South Korea that have attempted to predict drought impacts using the relationship between drought indices and impacts.

The present study demonstrates the use of machine learning techniques to predict the LIO of drought in the public water supply as a function of drought indices. Data on unstructured drought impacts on the public water supply were collected from multiple sources and then processed into LIO data. A model relating to the drought index, specifically SPEI, with drought impact, was then developed using extreme gradient boosting (XGB; Chen & Guestrin 2016) and other machine learning approaches, including LL regression (Cox 1958), support vector machine (SVM; Cortes & Vapnik 1995), and random forest (RF; Breiman 2001). These have been used in similar studies and have shown that machine learning approaches could predict drought impacts with high performance and low uncertainty (Blauhut et al. 2015; Bachmair et al. 2017; Sutanto et al. 2019, 2020). Model uncertainty and prediction accuracy were examined using bootstrapping and receiver operating characteristic (ROC) curves. Building upon the prior literature, this study highlights the applications of XGB in drought impact prediction while considering prediction skill and uncertainty. Furthermore, the possibility of applying a drought impact prediction model with unstructured local impact data, other than EDII, is also demonstrated.

Study design

Figure 1 presents a flowchart showcasing the data and methodology of this study. The drought impact inventory was constructed using multiple sources and used to derive binary drought impact data (Section 2.2). SPEI was calculated using observational meteorological data (Section 2.3). Using the likelihood concept, the drought impact function was calculated based on SPEI (Section 2.4). The drought impact was quantified as the drought impact occurrence probability for each SPEI value, using the drought impact function. The drought impact was then predicted using different machine learning techniques. The model prediction accuracies were evaluated using the ROC curve and coefficient of determination ($R^2$) values. The model prediction uncertainties were then quantified using the bootstrap method (Section 2.5).
Figure 1

Flowchart showcasing the data and methodology.


Drought impact inventory

In this study, a drought impact inventory was constructed by collecting text-based data on South Korea from 1990 to 2019 (Table 1). In South Korea, past drought impact data are predominantly available from government reports. The National Drought Information Analysis Center (http://www.drought.go.kr/) was established in 2015 and manages a database of drought impact information related to the public water supply. In addition, data from news articles were collected using a web crawling method (Manning et al. 2008). The portal site Naver (http://www.naver.com) was selected as it is the most popular search engine in South Korea, providing many news articles. All news articles related to the drought were collected based on keywords, including drought, drought impact, drought damage, impact, and damage, using the ‘including’ filters on the portal site. Morphological analysis was performed on the articles collected, and nouns were extracted by removing unnecessary text, such as symbols and special characters. Subsequently, the extracted nouns were then compared with specific keywords from drought impact categories, such as fire, agriculture, livestock, green algae, and outage. The articles were classified based on relevant sectors, and the drought impacts were classified into categories of public water supply, wildfire, water quality, and agriculture and livestock farming, in line with the EDII technical report (Stahl et al. 2012) for 17 provinces (Figure 2). Finally, duplicate drought impact data were removed based on the date of occurrence of the drought impact.
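
The keyword matching and duplicate removal described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes that nouns have already been extracted from each article. The keyword-to-category mapping, the record fields, and the choice to compare province and category in addition to the date when removing duplicates are all hypothetical.

```python
from datetime import date

# Hypothetical keyword-to-category mapping (an assumption for illustration only).
CATEGORY_KEYWORDS = {
    "public water supply": {"outage"},
    "wildfire": {"fire"},
    "water quality": {"green algae"},
    "agriculture and livestock farming": {"agriculture", "livestock"},
}


def classify(nouns):
    """Return the sorted categories whose keywords appear among the extracted nouns."""
    found = set(nouns)
    return sorted(cat for cat, kws in CATEGORY_KEYWORDS.items() if kws & found)


def deduplicate(records):
    """Keep the first record for each (province, category, date) combination."""
    seen, unique = set(), []
    for rec in records:
        key = (rec["province"], rec["category"], rec["date"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical records: the same impact reported by two articles.
records = [
    {"province": "Jeonnam", "category": "public water supply", "date": date(2012, 6, 1)},
    {"province": "Jeonnam", "category": "public water supply", "date": date(2012, 6, 1)},
    {"province": "Gangwon", "category": "public water supply", "date": date(2015, 6, 1)},
]
unique_records = deduplicate(records)
```

An article can match more than one category in this sketch, since a single report may describe several kinds of impact.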
Table 1

Sources used to construct the drought impact inventory

| Reference type | Sources |
|---|---|
| Database | National Drought Information Portal, www.drought.go.kr |
| | Korea Water Resources Corporation, Emergency Water Supply Statistics Database, 2019 |
| Newspaper article | Naver (largest search portal in South Korea), Web Crawling |
| Governmental report | Ministry of Environment, National Drought Records Survey Report, 1995/2001 |
| | Ministry of the Interior and Safety, National Drought Information Statistics Report, 2018 |
| | Korea Water Resources Corporation, Drought Information Annual Report, 2018 |
Figure 2

Study area and provinces within the study region. R1, Busan; R2, Chungbuk; R3, Chungnam; R4, Daegu; R5, Daejeon; R6, Gangwon; R7, Gwangju; R8, Gyeongbuk; R9, Gyeonggi; R10, Gyeongnam; R11, Incheon; R12, Jeju; R13, Jeonbuk; R14, Jeonnam; R15, Sejong; R16, Seoul; R17, Ulsan. The three regions that collected large drought impact data on public water supply are highlighted in gray.


Meteorological drought index

SPEI was used to quantify drought hazards. This index has the advantage of considering both temperature and precipitation. The estimation of SPEI uses the climate water balance concept, which takes the difference between precipitation and potential evapotranspiration and fits the resulting data to a probability distribution. The LL distribution, which provides a better fit for extremely negative values, is recommended for the SPEI (Hernandez & Uddameri 2014). The process of calculating the SPEI using climatic data is summarized below.

First, the potential evapotranspiration was calculated using the method proposed by Thornthwaite (1948) based on temperature, latitude, and month data. The climate water balance ($D_i$) was calculated from the difference between precipitation ($P_i$) and potential evapotranspiration ($PET_i$) for a given timescale, i, as follows:
$$D_i = P_i - PET_i \tag{1}$$
Second, the calculated climate water balance ($D_i$) values were aggregated at the chosen timescale. Third, the accumulated climate water balance was normalized to the LL distribution. The cumulative distribution function of a three-parameter LL distribution is expressed as:
$$F(x) = \left[1 + \left(\frac{\alpha}{x - \gamma}\right)^{\beta}\right]^{-1} \tag{2}$$
where $x = D$ (in Equation (1)), and $\alpha$, $\beta$, and $\gamma$ are the scale, shape, and origin parameters, respectively. Finally, the SPEI was estimated with $F(x)$. The SPEI value could then be obtained as the standardized value of $F(x)$ according to the classical approximation method by Abramowitz & Stegun (1965) as follows:
$$SPEI = W - \frac{C_0 + C_1 W + C_2 W^2}{1 + d_1 W + d_2 W^2 + d_3 W^3} \tag{3}$$
where $W = \sqrt{-2\ln(p)}$ for $p \le 0.5$. p is the probability of exceeding a determined $D$ value and is given as $p = 1 - F(x)$. If $p > 0.5$, then p was replaced by $1 - p$ and the sign of the resultant SPEI was reversed. The constants are $C_0 = 2.515517$, $C_1 = 0.802853$, and $C_2 = 0.010328$, $d_1 = 1.432788$, $d_2 = 0.189269$, and $d_3 = 0.001308$. SPEI has positive and negative ranges where values greater than 2 are considered extremely wet, 1.5–2 is very wet, 1–1.5 moderately wet, −1 to 1 normal, −1.5 to −1 moderately dry, −2 to −1.5 severely dry, and values below −2 extremely dry (Vicente-Serrano et al. 2010).
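
The last two steps of the procedure (Equations (2) and (3)) can be illustrated with a short sketch. It assumes that the three LL parameters have already been fitted to the accumulated water balance series. The fitting itself is not shown, and the function and argument names are illustrative rather than taken from any SPEI software. The constants are the standard values of the Abramowitz & Stegun approximation.

```python
import math

# Constants of the Abramowitz & Stegun rational approximation.
C0, C1, C2 = 2.515517, 0.802853, 0.010328
D1, D2, D3 = 1.432788, 0.189269, 0.001308


def log_logistic_cdf(x, alpha, beta, gamma):
    """Three-parameter log-logistic CDF (scale alpha, shape beta, origin gamma); needs x > gamma."""
    return 1.0 / (1.0 + (alpha / (x - gamma)) ** beta)


def spei_from_cdf(F):
    """Convert a cumulative probability F (0 < F < 1) to a standardized SPEI value."""
    p = 1.0 - F  # probability of exceeding the given water balance value
    sign = 1.0
    if p > 0.5:  # lower tail: use 1 - p and reverse the sign of the result
        p = 1.0 - p
        sign = -1.0
    w = math.sqrt(-2.0 * math.log(p))
    z = w - (C0 + C1 * w + C2 * w * w) / (1.0 + D1 * w + D2 * w * w + D3 * w ** 3)
    return sign * z
```

Because the approximation inverts the standard normal distribution, a cumulative probability of 0.5 maps to an SPEI close to 0, and probabilities of 0.025 and 0.975 map to values close to −1.96 and 1.96, respectively.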
The SPEI was estimated at a 12-month timescale using 30 years of climate data (i.e., temperature and precipitation) from 1990 to 2019. Daily data were collected from 67 Automated Synoptic Observing System (ASOS) stations of the Korea Meteorological Administration and then averaged to obtain monthly data. The SPEI was calculated at each ASOS station and then matched with each corresponding province. The SPEI time series were then averaged by province (Figure 3). Drought impacts on public water supply were observed in years under meteorological drought conditions, i.e., with SPEI below 0 (Figure 3).
Figure 3

Time series of SPEI values for each ASOS station within the province (thin line) and average SPEI values for all ASOS stations in the province (bold line) in (a) Gangwon, (b) Gyeonggi, (c) Jeonnam, and (d) nationwide. The shaded area indicates the years with one or more reported drought impacts from 1990 to 2019.


Modeling the likelihood of drought impact occurrence

This study related the occurrence of drought impacts to the drought index following the methods of Blauhut et al. (2016). The monthly SPEI values were considered independent variables, whereas the monthly drought impact binary data (0 for no impact and 1 for impact) were considered dependent variables. The values were obtained from the drought impact inventory, which was constructed using data from multiple unstructured data sources. The LIO ranged from 0 to 1 and was estimated using the XGB and three other reference models: LL, SVM, and RF. LL and RF have been used in previous drought impact studies (Blauhut et al. 2015, 2016; Stagge et al. 2015; Bachmair et al. 2017; Sutanto et al. 2019, 2020), and to our knowledge, SVM and XGB were employed for the first time in this study. While SVM is a well-known machine learning algorithm, XGB is a recently developed machine learning algorithm presenting outstanding performance across multiple subject areas, including economics (Carmora et al. 2019), human disease risk assessment (Zhang et al. 2019b), streamflow forecasting (Zhang et al. 2019a; Ni et al. 2020), and flash-flood risk assessment (Ma et al. 2021; Maduhuri et al. 2021).

Log-logistic model

The LL statistical regression model is used to model a binary-dependent variable using a logistic function (Cox 1958). It is the statistical fit of the logit function to a dataset that can calculate the probability of the occurrence of a specific event or a certain value based on the linear combination of independent variables. In the LL model, the log odds of drought LIO were modeled as a linear combination of independent variables, namely SPEI, following the methods of Gudmundsson et al. (2014) (Equation (4)).
$$\log\left(\frac{LIO}{1 - LIO}\right) = \alpha + \beta \cdot SPEI \tag{4}$$
where $\alpha$ and $\beta$ are model parameters, estimated using standard regression techniques within the framework of generalized linear models (Venables & Ripley 2002).
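
As a rough illustration of how the two parameters in Equation (4) can be estimated, the sketch below fits an intercept and a slope by Newton-Raphson maximization of the binomial log-likelihood, which is what a generalized linear model with a logit link does. It is not the authors' code. The variable names and the toy SPEI and impact values are hypothetical and serve only to make the example runnable.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit log(LIO / (1 - LIO)) = a + b * x by Newton-Raphson (maximum likelihood)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the gradient (g) and the information matrix (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:  # information matrix is numerically singular
            break
        # Newton step: solve the 2 x 2 linear system in closed form.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def lio(spei, a, b):
    """Likelihood of impact occurrence for a given SPEI value."""
    return 1.0 / (1.0 + math.exp(-(a + b * spei)))


# Hypothetical monthly SPEI values and binary impact records (1 = impact reported).
spei = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
impact = [1, 1, 0, 1, 0, 0, 1, 0]
a_hat, b_hat = fit_logistic(spei, impact)
```

With these toy data the fitted slope is negative, so the estimated LIO rises as SPEI falls. Perfectly separable data would make the maximum likelihood estimate diverge, which is why the toy series deliberately mixes impacts and non-impacts across the SPEI range.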

Support vector machine

The SVM is one of the most popular and representative supervised learning models for classification and regression tasks (Cortes & Vapnik 1995). Classification, which is a common assignment in machine learning, is the process of determining the class for a given set of data. SVM is based on the concept of finding a line or hyperplane (margin) that best separates the data into two classes. The classifier has a lower generalization error when the margins are larger. Therefore, the line or hyperplane with the greatest distance from the nearest training data for any class is the determining factor for robust classification.

In this study, the default radial basis function kernel was used as the classifier for SVM modeling, and three parameters – gamma, cost, and epsilon – were tuned. Gamma controls the degree of nonlinearity of the hyperplane, cost controls the size of the margin through the weight assigned to misclassification, and epsilon is the margin of tolerance within which no weight is given to the errors. If gamma has a high value, cost has a low value, and epsilon is close to 0, the hyperplane is curved with a larger margin, leading to overfitting because the model has low bias and high variance (Pardo & Sberveglieri 2005). The SVM is very sensitive to parameter selection, and even minor changes in the parameters can lead to very different classification results (Lin et al. 2008).

Therefore, 10-fold cross-validation was conducted, which is a commonly used process to tune the optimized parameters for each region and evaluate the effectiveness of SVM with the selected parameters. The data are divided randomly into ten parts, of which nine are used for training and one for the test. The tuning ranges for each parameter are as follows: gamma from 0.5 to 2, cost from 4 to 16, and epsilon from 0 to 1. The cost parameter was set to 4 and the epsilon parameter for the region was set to 0.35. Given that gamma can have different values for each region, it was set to 1 in Gangwon and 0.5 in other regions.
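
The random ten-part split used for tuning can be sketched as below. This is a generic illustration of k-fold cross-validation rather than the routine used in the study. The function name, the fixed seed, and the sample count of 360 (30 years of monthly data) are assumptions for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment of samples to folds
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 30 years of monthly records give 360 samples per region (assumed for illustration).
splits = list(k_fold_indices(360, k=10))
```

Each candidate parameter combination would be trained on the nine training folds and scored on the held-out fold, and the combination with the best average score over the ten rounds would be kept.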

Random forest

RF is a representative machine learning model that constructs numerous decision trees on bootstrapped subsamples for classification or regression and is suitable for prediction model development (Breiman 2001). RF is a special type of bagging, an ensemble meta-algorithm that aggregates base classifiers trained on slightly different training data obtained through bootstrapping. RF prevents overfitting to the training data configuration by generating numerous random independent trees and estimates the error cost-effectively because there is no iterative training cost of the model related to cross-validation. RF is, therefore, widely used in various studies because of its flexibility, high accuracy, and better performance compared with other machine learning models (e.g., Wang et al. 2015; Naghibi et al. 2016; Bachmair et al. 2017).

In this study, default values were used for all parameters except mtry and ntree, which have the greatest effect on the prediction performance of RF (Liew & Wiener 2002). The mtry parameter is the number of variables randomly sampled for partitioning at each node, and ntree is the number of trees grown. Lower mtry values improve the stability of the bagging because the trees in the ensemble differ more and are less correlated (Strobl et al. 2008; Probst et al. 2019). If excessive trees are generated owing to high ntree values, RF incurs a higher computational cost without significant performance gains (Oshiro et al. 2012; Probst & Boulesteix 2018). For small datasets, such as those used in the current study, a small number of trees has been suggested to be sufficient for good performance (Oshiro et al. 2012). Therefore, the out-of-bag error, which is commonly used for tuning RF parameters, was used to set the optimized parameters. The out-of-bag error is the average prediction error for each sample, calculated using only the trees whose bootstrap sample did not contain that sample. One-third of the data is used for model validation, while the remaining two-thirds are used to train RF. The tuning ranges for each parameter are as follows: mtry from 1 to 100 and ntree from 1 to 500. The mtry and ntree parameters were set to 2 and 50 for all regions, respectively.

Extreme gradient boosting

XGB builds on the gradient boosting concept and is an ensemble machine learning algorithm based on decision trees for solving regression and classification problems (Chen & Guestrin 2016). The two main strengths of XGB are its superior execution speed and model performance when compared with other gradient boosting implementations. Gradient boosting is fitted via the gradient descent optimization algorithm and can use any arbitrary differentiable loss function. The loss is minimized along its gradient when the model is fitted, which is similar to a neural network. Trees are added to the ensemble one at a time, and each new tree is fitted to correct the prediction error of the prior trees. Based on the ensemble results of the previous model, the sample weight is adjusted for the next model to continue constructing the ensemble. The XGB model makes efficient use of limited computational resources for boosted trees through an improved gradient boosting implementation. Although trees are still added sequentially, XGB parallelizes the construction of each tree, in particular the search for the best split, which improves the processing speed. This suggests that the model was designed to be more computationally efficient than other open-source implementations.

Tuning the XGB is complicated because changing any parameter can affect the optimal values of the others. All the parameters were set with default values, except for three parameters, max_depth, nround, and early stopping rounds. These have the most pronounced effect on the prediction performance of XGB (Carmora et al. 2019). Controlling these parameters is also important for XGB to avoid overfitting. Our study used 10-fold cross-validation to set the optimized max_depth parameter. The data are divided randomly into ten parts, of which nine are used for training and one for the test, as we have done for the SVM. The max_depth parameter, which is the maximum depth of an individual decision tree, was set to 2 in the range of 2–10. With a high value for the max_depth parameter, the model could improve its accuracy, but it would be more complex and more likely to overfit (Carmora et al. 2019). Moreover, XGB aggressively consumes memory when training a deep tree with a high max_depth value. The nround parameter, which is the maximum number of boosting iterations, was set to 200, and the early stopping rounds, which is the parameter for controlling the patience of how many iterations the user will wait for the next decrease in the loss value, was set to 50 to avoid overfitting. As early stopping rounds are set, XGB can prevent overfitting and get stable performance (Fan et al. 2018; Bikmukhametov & Jäschke 2019; Qiu et al. 2022).
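
The interaction between the maximum number of boosting iterations and the early stopping rounds can be illustrated with a generic stopping rule. The sketch below is not XGB's internal code. It takes a hypothetical sequence of validation losses, one per boosting round, and returns the round that would be kept when training halts after a given number of rounds without improvement.

```python
def best_iteration(validation_losses, patience=50):
    """Return the 0-based index of the best round under an early-stopping rule."""
    best_loss = float("inf")
    best_round = -1
    for i, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_round = loss, i
        elif i - best_round >= patience:
            break  # no improvement for `patience` consecutive rounds
    return best_round


# Hypothetical loss curve over 200 rounds: improves until round 30, then worsens.
losses = [abs(i - 30) + 1.0 for i in range(200)]
kept_round = best_iteration(losses, patience=50)
```

In this toy curve, the loop stops at round 80, 50 rounds after the last improvement, and the model from round 30 is retained, so the remaining rounds up to the limit of 200 are never trained.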

Evaluation of model performance

Model performance was evaluated using the ROC curve and area under the curve (AUC). This provides a comprehensive evaluation of regression and classification models (Wilks 2001; Mason & Graham 2002; Hernández-Orallo et al. 2013). The ROC curve is a tool for the visual assessment of each model, while the AUC value is a numeric representation of the model performance. For the ROC curve, models with curves closer to the 45° diagonal indicate lower performance, whereas those closer to the top-left corner indicate better performance. The ROC curve is expressed by a combination of metrics, which are calculated using a confusion matrix. A well-known metric combination for evaluating predictive models is the true positive rate (TPR) and the false positive rate (FPR). TPR is also known as sensitivity and defines the proportion of positive samples that are correctly predicted as positive, whereas FPR defines the proportion of negative samples that are incorrectly predicted as positive. The ROC curve is widely used to evaluate probabilistic forecasting systems such as the value of ensemble weather forecasts (Liguori et al. 2012), rainfall thresholds estimation for shallow landslide forecasting (Gariano et al. 2015), and drought impact prediction (Blauhut et al. 2015). The AUC is the area under the ROC curve and has a value between 0 and 1. If the AUC value is greater than 0.5, the predictions of the chosen model are better than those of random guesses, while values close to 1 indicate a nearly perfect model. In this study, test data with the same sample size were generated using simple random sampling to estimate the ROC curve and the AUC value.
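
A minimal sketch of how the ROC curve and the AUC follow from TPR and FPR is given below. It sweeps the decision threshold over the predicted probabilities and integrates the resulting curve with the trapezoidal rule. The function names and the toy labels and scores are illustrative only, and the sketch assumes that both classes are present in the data.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold downward."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical observed impacts (1 = impact) and predicted LIO values.
observed = [1, 0, 1, 0]
predicted = [0.9, 0.8, 0.7, 0.6]
curve = roc_points(observed, predicted)
area = auc(curve)
```

For these toy values, three of the four possible (impact, no-impact) pairs are ranked correctly, which corresponds to an AUC of 0.75.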

Quantification of model uncertainty

Obtaining estimates of machine learning model uncertainties for newly predicted data is essential for determining whether predictions can be trusted. A common approach for such uncertainty quantification is to estimate the error from an ensemble of models. These are often generated by the bootstrap method (e.g., Slaets et al. 2017; Bomer et al. 2019). The bootstrap method, a resampling technique that samples a dataset with replacement, is used to estimate statistics including bias, variance, and confidence intervals. The confidence interval (95%) for each model was constructed using the bootstrap method (Efron 1979) by randomly sampling the dataset 1,000 times with replacement.
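
The resampling step can be sketched with a percentile bootstrap. The example below is a generic illustration under stated assumptions: it resamples a list of values and recomputes a simple statistic (the mean), whereas in the study each resample would be used to refit a model and recompute its predictions. The function names, the fixed seed, and the toy data are hypothetical.

```python
import random


def bootstrap_ci(data, statistic, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `statistic` evaluated on `data`."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])  # sample with replacement
        for _ in range(n_boot)
    )
    lo_idx = int((1.0 - level) / 2.0 * n_boot)
    hi_idx = int((1.0 + level) / 2.0 * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]


def mean(values):
    return sum(values) / len(values)


# Hypothetical sample: the integers 0 to 99.
lower, upper = bootstrap_ci(list(range(100)), mean)
```

The width of the interval reflects how much the statistic changes across resamples, so a model whose predictions barely change between bootstrap samples yields a narrow interval.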

Drought impact database

More than 3,000 impact data points were collected for 17 provinces and 4 categories. In total, 2,600 impact datasets related to the public water supply were collected (Figure 4(d)), which accounted for over 80% of the data. Therefore, the drought impact data were limited to public water supply for the purposes of further analysis.
Figure 4

The number of impact occurrences in the public water supply from 1990 to 2020, based on the drought impact inventory. (a) The number of entries per year in the specified period and (b) the ratio of entries by province for each year. (c) The number of entries by region and (d) the number of entries by subcategories in each region. Each province in this study is denoted as follows: R1, Busan; R2, Chungbuk; R3, Chungnam; R4, Daegu; R5, Daejeon; R6, Gangwon; R7, Gwangju; R8, Gyeongbuk; R9, Gyeonggi; R10, Gyeongnam; R11, Incheon; R12, Jeju; R13, Jeonbuk; R14, Jeonnam; R15, Sejong; R16, Seoul; R17, Ulsan (refer to Figure 2).


Some temporal and regional deviations were noted in terms of data quantity and quality (Figure 4(a)). Drought impact data related to the public water supply dated predominantly from after 2009, accounting for 90% of the total impact data. Most of the drought impact data were collected from a specific region, such as Jeonnam, accounting for 30–80% of the total data from 2009 to 2013. The Gangwon, Gyeonggi, and Jeonnam regions (R6, R9, and R14 in Figure 2, respectively) presented the greatest amount of drought impact data on public water supply; therefore, these regions were selected for further analysis (Figure 4(b)). Nationwide data for all 17 provinces were also analyzed.

Model prediction accuracy

LIO was estimated by fitting all data using four models, and their performance was evaluated using the $R^2$ value (Figures 5 and 6(a)). The four models are probability approaches to derive a regression function (here, LIO function) with parameter calibration (Section 2.4). This probabilistic approach has been suggested in previous studies to predict wildfire (Gudmundsson et al. 2014), forest fire (Kim et al. 2019), flood damage modeling (Spekkers et al. 2014), and drought impact (Blauhut et al. 2015; Stagge et al. 2015). Figure 5 shows the LIO on public water supply and model uncertainty by each region. The fitted model results showed how well SPEI of the region explained drought impact on public water supply. Overall, the XGB exhibited the best performance, with an $R^2$ value of almost 1 for all regions. Following XGB, RF presented the second-best performance, with $R^2$ ranging from 0.67 to 0.74 for all four regions. RF showed a well-fitted pattern but also indicated that LIO increased when no drought impact was observed. All models, except XGB, mis-predicted a significant number of drought occurrences when there were no drought events. The drought impact data showed that most drought events occurred after specific years (2014 in Gangwon, 2010 in Gyeonggi, and 2008 in Jeonnam), as indicated by the drought indices in Figure 3. The $R^2$ of LL and SVM was lower than 0.5 for all regions. In particular, the prediction skill was limited in Jeonnam, where the $R^2$ of LL and SVM was approximately 0.2 (Figure 6(a)).
Figure 5

Predicted drought LIO on public water supply by models (XGB, RF, SVM, and LL; black lines) and the uncertainty (red area) in (a) Gangwon, (b) Gyeonggi, (c) Jeonnam, and (d) nationwide. The shaded area indicates years with one or more reported drought impacts from 1990 to 2019. Please refer to the online version of this paper to see this figure in colour: https://dx.doi.org/10.2166/hydro.2023.064.

Figure 6

(a) R² value between the model prediction and observation of drought LIO and (b) the standard error of the bootstrap samples of prediction in Gangwon, Gyeonggi, Jeonnam, and nationwide from 1990 to 2019.

The prediction accuracy was further evaluated using ROC analysis because capturing true positives, that is, predicting a drought impact when one actually occurs, is critical in drought impact prediction (Table 2 and Figure 7). The decision-tree-based models, XGB and RF, performed better in all four cases. The AUCs of XGB were the highest, with an average of 0.99 over the four cases, while RF showed the second-best performance, with an average AUC of 0.96. In Gangwon, LL and SVM also performed reasonably well, with AUC values of 0.91 and 0.85, respectively (Table 2). In the other regions, LL and SVM performed clearly worse than XGB and RF.
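
As a reminder of what the AUC measures, the sketch below computes it as the probability that a randomly chosen impact case receives a higher predicted LIO than a randomly chosen non-impact case, which equals the area under the ROC curve. The function name `roc_auc` and the small data set are hypothetical and only illustrate the metric; they do not reproduce the values in Table 2.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative
    (ties count one half); equal to the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical observations (1 = impact reported) and predicted LIO values
observed = [1, 1, 1, 0, 1, 0, 0, 0]
predicted = [0.9, 0.8, 0.6, 0.7, 0.4, 0.3, 0.2, 0.1]

auc = roc_auc(observed, predicted)
print(auc)  # 14 of the 16 positive/negative pairs are ranked correctly -> 0.875
```

The averages quoted above are the column means of Table 2: (0.99 + 0.99 + 0.99 + 0.98) / 4 ≈ 0.99 for XGB and (0.99 + 0.99 + 0.98 + 0.87) / 4 ≈ 0.96 for RF.
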
Table 2

AUC for drought impact prediction in Gangwon, Gyeonggi, Jeonnam, and nationwide according to LL, SVM, RF, and XGB from 1990 to 2019

| Region | LL | SVM | RF | XGB |
|---|---|---|---|---|
| Gangwon | 0.91 | 0.85 | 0.99 | 0.99 |
| Gyeonggi | 0.74 | 0.79 | 0.99 | 0.99 |
| Jeonnam | 0.67 | 0.67 | 0.98 | 0.99 |
| Nationwide | 0.73 | 0.70 | 0.87 | 0.98 |

Figure 7

ROC curve for drought impact prediction in (a) Gangwon, (b) Gyeonggi, (c) Jeonnam, and (d) nationwide from 1990 to 2019.


Model uncertainty

The uncertainty of the model predictions was evaluated using the bootstrap method to quantify the 95% confidence intervals (Figure 5) and to estimate the standard error (SE) (Figure 6(b)). The uncertainty of XGB was the lowest: its confidence intervals were narrow (Figure 5) and its SE was almost 0 (Figure 6(b)). Although RF showed similar predictive performance (Figure 5 and Table 2), its uncertainty was much greater than that of XGB (Figures 5 and 6(b)).
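
The bootstrap procedure can be sketched as follows: resample the data with replacement, recompute the prediction on each resample, and summarize the spread of the replicates as a standard error and a 95% percentile interval. The toy predictor `impact_rate`, the data, the number of replicates, and the seed are all hypothetical; the study applies the same idea to the fitted LL, SVM, RF, and XGB models, whose exact settings are not reproduced here.

```python
import random
import statistics

def impact_rate(sample, threshold=-1.0):
    """Toy predictor: share of dry months (SPEI < threshold) with a reported impact."""
    dry = [y for x, y in sample if x < threshold]
    return sum(dry) / len(dry) if dry else float("nan")

def bootstrap(sample, stat, n_boot=2000, seed=42):
    """Return the bootstrap standard error and 95% percentile interval of stat."""
    rng = random.Random(seed)
    n = len(sample)
    reps = []
    for _ in range(n_boot):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        value = stat(resample)
        if value == value:  # skip NaN replicates (no dry months drawn)
            reps.append(value)
    reps.sort()
    se = statistics.stdev(reps)
    lo = reps[int(0.025 * (len(reps) - 1))]
    hi = reps[int(0.975 * (len(reps) - 1))]
    return se, (lo, hi)

# Hypothetical (SPEI, impact) pairs
data = [(-2.1, 1), (-1.8, 1), (-1.5, 1), (-1.2, 0), (-1.1, 1), (-0.9, 1),
        (-0.6, 0), (-0.3, 0), (0.0, 0), (0.4, 0), (0.8, 0), (1.2, 0)]

estimate = impact_rate(data)
se, (ci_low, ci_high) = bootstrap(data, impact_rate)
print(f"estimate={estimate:.2f}, SE={se:.3f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

A model whose replicates barely move between resamples yields a narrow interval and a small SE, which is the pattern reported for XGB.
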

LL had relatively low predictive performance in terms of R² and ROC but was estimated to have lower uncertainty than RF and SVM. This suggests that the LL regression is more stable than these two machine learning techniques. Furthermore, the SVM results for Jeonnam showed much greater uncertainty than those for the other three cases (Figure 6(b)). This is consistent with SVM having its lowest AUC and R² values in Jeonnam (Table 2 and Figure 6(a), respectively) and indicates that SVM is not suitable for Jeonnam, where the independent variable (SPEI) does not explain the dependent variable well, as suggested by the accuracy measures.

Drought impact prediction with XGB

The drought impact inventory constructed for this study is a new and valuable data source. The inventory is somewhat biased in time and space, with more impacts reported for recent events. These biases are expected to decrease as more events are collected. Despite these limitations and uncertainties, the XGB model produced meaningful predictions of the likelihood of drought impact occurrence on public water supply as a function of SPEI in South Korea. Such drought impact prediction models thus allow a quantitative assessment of regional differences in drought risk across South Korea.

The present study, via a case study of South Korea, demonstrates that XGB can predict drought LIO with high accuracy and low uncertainty. This may be because XGB builds one tree at a time and updates the weights of the misclassified data after each tree, before fitting the next one. Conversely, RF is a collection of trees that are built independently, each from a random sample of the data, and each of which provides its own prediction. RF then aggregates the results from all trees, taking the mean, median, or mode as the prediction. However, individual trees are likely to make predictions that partly reflect random chance, as each tree is subject to its own circumstances, which may include sample duplication, overfitting, and inappropriate node splitting. This may explain why the RF results show greater uncertainty.
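
The contrast between sequential reweighting and independent trees can be made concrete with a small sketch using one-split trees (stumps). Note the assumptions: the boosting part follows the AdaBoost-style reweighting described in the paragraph above, whereas XGBoost itself fits each tree to the gradients of a loss function; the bagging part is a simplified stand-in for RF. All names, data, and settings are hypothetical and unrelated to the study's code.

```python
import math
import random

def stump_predict(thr, pol, x):
    # One-split tree: predict `pol` if x < thr, else -pol
    return pol if x < thr else -pol

def best_stump(xs, ys, ws):
    """Return (weighted error, threshold, polarity) of the best stump."""
    pts = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    best = None
    for thr in thresholds:
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, ws)
                      if stump_predict(thr, pol, x) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def boost(xs, ys, n_rounds=3):
    """AdaBoost-style sequential fitting: misclassified points gain weight."""
    n = len(xs)
    ws = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        err, thr, pol = best_stump(xs, ys, ws)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, thr, pol))
        ws = [w * math.exp(-alpha * y * stump_predict(thr, pol, x))
              for x, y, w in zip(xs, ys, ws)]
        total = sum(ws)
        ws = [w / total for w in ws]
    return model

def boost_predict(model, x):
    score = sum(a * stump_predict(t, p, x) for a, t, p in model)
    return 1 if score >= 0 else -1

def bag(xs, ys, n_trees=25, seed=0):
    """Bagging: independent stumps on bootstrap resamples, equal votes."""
    rng = random.Random(seed)
    n = len(xs)
    model = []
    while len(model) < n_trees:
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # need two distinct SPEI values to place a threshold
        _, thr, pol = best_stump(bx, by, [1.0 / n] * n)
        model.append((thr, pol))
    return model

def bag_predict(model, x):
    votes = sum(stump_predict(t, p, x) for t, p in model)
    return 1 if votes >= 0 else -1

def accuracy(predict, model, xs, ys):
    return sum(predict(model, x) == y for x, y in zip(xs, ys)) / len(xs)

# Hypothetical SPEI values and labels (+1 = impact reported, -1 = none)
spei = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
label = [1, 1, -1, 1, -1, -1, -1, -1]

acc_boost = accuracy(boost_predict, boost(spei, label), spei, label)
acc_bag = accuracy(bag_predict, bag(spei, label), spei, label)
print(f"boosted stumps: {acc_boost:.3f}, bagged stumps: {acc_bag:.3f}")
```

On this toy data no single stump separates the classes, but three reweighted stumps fit every point because each round concentrates on the point the previous rounds got wrong. The bagged result depends on the random resamples, which mirrors the variability attributed to RF above.
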

Drought hazard characteristics with SPEI

Our study used the binary drought impact on public water supply and SPEI, a meteorological drought index derived from climatic observation data, to derive a drought impact function. Gangwon showed reasonable performance with all models. This indicates that SPEI (the independent variable) is closely related to LIO (the dependent variable) in this region, which suggests that public water supply there is more likely to suffer from impacts caused by meteorological drought. In contrast, the Jeonnam results suggest that the drought impact on public water supply in that region might be better captured by other drought conditions, such as agricultural or hydrological drought. In summary, our results suggest that the drought hazard characteristics of each area should be analyzed for better drought impact prediction.

Impact prediction for severe droughts

To further understand the model performance, we performed additional model predictions for severe drought events with SPEI below −1 (Figure 8 and Table 3). Compared with Figure 5 and Table 2, the model performance is lower: except for XGB, the AUCs are well below 1. This may be because the sample of drought impacts on public water supply observed with SPEI below −1 is not large enough for stable prediction (refer to Figure 3). Despite the small sample size, these results demonstrate the advantage of XGB, with an AUC of 0.97 for the nationwide case. Note that decisions about spatial sampling may affect the model performance. SPEI is a standardized variable that identifies drought conditions with the same occurrence frequency everywhere (Blauhut et al. 2016). In addition, the use of the mean SPEI and the grouping of samples at regional scales limit the precision with which periods with and without impacts can be matched to the corresponding drought indices. We expect that, as data availability increases, the analyses can be repeated for smaller spatial units.
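
The effect of restricting the analysis to severe droughts on the available sample can be illustrated with a short sketch. The records, the helper names `subset_below` and `class_counts`, and the resulting counts are hypothetical; they only show how a threshold of SPEI < −1 shrinks the sample and changes the balance between impact and non-impact cases.

```python
# Hypothetical monthly records: (SPEI, impact flag), 1 = impact reported
records = [(-2.3, 1), (-1.7, 1), (-1.4, 0), (-1.1, 1), (-0.8, 0), (-0.5, 1),
           (-0.2, 0), (0.1, 0), (0.5, 0), (0.9, 0), (1.3, 0), (1.8, 0)]

def subset_below(records, threshold=-1.0):
    """Keep only records whose SPEI is strictly below the threshold."""
    return [(s, y) for s, y in records if s < threshold]

def class_counts(records):
    """Return (number with impact, number without impact)."""
    n_pos = sum(y for _, y in records)
    return n_pos, len(records) - n_pos

severe = subset_below(records)
print("all months:    ", len(records), class_counts(records))
print("SPEI < -1 only:", len(severe), class_counts(severe))
```

In this toy case only 4 of 12 records remain, with a single non-impact case, so any accuracy measure computed on the subset rests on very few comparisons.
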
Table 3

AUC for drought impact prediction for SPEI below −1 in Gangwon, Gyeonggi, Jeonnam, and nationwide according to LL, SVM, RF, and XGB from 1990 to 2019

| Region | LL | SVM | RF | XGB |
|---|---|---|---|---|
| Gangwon | 0.76 | 0.74 | 0.77 | 0.99 |
| Gyeonggi | 0.67 | 0.61 | 0.68 | 0.99 |
| Jeonnam | 0.52 | 0.52 | 0.61 | 0.99 |
| Nationwide | 0.54 | 0.55 | 0.56 | 0.97 |

Figure 8

Predicted drought LIO for SPEI below −1 on public water supply by models (XGB, RF, SVM, and LL; black lines) and the uncertainty (red area) in (a) Gangwon, (b) Gyeonggi, (c) Jeonnam, and (d) nationwide. The shaded area indicates years with one or more reported drought impacts from 1990 to 2019. Please refer to the online version of this paper to see this figure in colour: https://dx.doi.org/10.2166/hydro.2023.064.


In this study, the LIO for public water supply in South Korea was modeled and evaluated as a function of SPEI using XGB and three reference models: LL, SVM, and RF. More than 3,000 drought impact data points were collected from various sources, such as databases, newspaper articles, and governmental reports. The text-based drought impact inventory constructed for this study is meaningful as the first such attempt in South Korea. The collected drought impact data were somewhat time-biased, with an increasing number of reported impacts for recent drought events (Figure 4). The drought impact function was fitted for the entire period using the binary drought impact and the drought index to predict the likelihood of drought impact occurrence.

The model prediction results showed that XGB exhibited the best performance for all regions. RF performed similarly to XGB but with substantially greater uncertainty. This suggests that the advantage of XGB stems from boosting, which gives more weight to misclassified data and thereby improves model performance. The models performed best in Gangwon and worst in Jeonnam. This implies that the impact of drought on the public water supply in Gangwon is strongly associated with meteorological drought. However, other drought indices, such as the hydrological drought index (SRI), might improve drought impact prediction in Jeonnam, where the current SPEI-based model was less effective.

The results of this study suggest that XGB is suitable for drought impact prediction in terms of both prediction accuracy and uncertainty, and indicate that drought impact prediction models can be built with local data in South Korea rather than EDII, despite the limited availability of drought impact data. However, only the likelihood of drought impact occurrence was assessed with the machine learning models; predicting actual drought impacts remains a task for future work. As droughts have social and economic impacts in multiple areas beyond public water supply, it is also necessary to predict the impact of droughts on other sectors. Future work on drought impact evaluation in several sectors, together with reduced bias in the impact data, can be expected to improve the prediction skill of LIO modeling. It is therefore necessary to systematically archive drought impact data to build on recent advancements in deep learning techniques. Finally, the study highlights the potential of text-based impact data for characterizing the risk of complex natural hazards other than droughts using an appropriate machine learning technique, such as XGB.

This study was supported by the Basic Science Research Program through the National Research Foundation of Korea, which was funded by the Ministry of Science, ICT & Future Planning (No. 2020R1A2C2007670), and the Technology Advancement Research Program through the Korea Agency for Infrastructure Technology Advancement (KAIA) grant funded by the Ministry of Land, Infrastructure and Transport (Grant 22CTAP-C163540-02).

All relevant data are available from an online repository or repositories. The model usage and codes of the models (LL, SVM, RF, and XGB) are available in a GitHub repository (https://github.com/krsmsuh/JHI_DI). The drought impact inventory is included in this paper.

The authors declare there is no conflict.

Abramowitz, M. & Stegun, I. A. 1965 Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover Publications, Inc., New York, p. 1046.
Bachmair, S., Svensson, C., Prosdocimi, I., Hannaford, J. & Stahl, K. 2017 Developing drought impact functions for drought risk management. Nat. Hazards Earth Syst. Sci. 17, 1947–1960.
Barella-Ortiz, A. & Quintana-Seguí, P. 2019 Evaluation of drought representation and propagation in regional climate model simulations across Spain. Hydrol. Earth Syst. Sci. 23 (12), 5111–5131.
Bikmukhametov, T. & Jäschke, J. 2019 Oil production monitoring using gradient boosting machine learning algorithm. IFAC-PapersOnLine 52 (1), 514–519.
Blauhut, V., Stahl, K., Stagge, J. H., Tallaksen, L. M., De Stefano, L. & Vogt, J. 2016 Estimating drought risk across Europe from reported drought impacts, drought indices, and vulnerability factors. Hydrol. Earth Syst. Sci. 20, 2779–2800.
Bomers, A., Schielen, R. M. J. & Hulscher, S. J. M. H. 2019 Decreasing uncertainty in flood frequency analyses by including historic flood events in an efficient bootstrap approach. Nat. Hazards Earth Syst. Sci. 19, 1895–1908.
Breiman, L. 2001 Random forests. Mach. Learn. 45, 5–32.
Carmona, P., Climent, F. & Momparler, A. 2019 Predicting failure in the U.S. banking sector: an extreme gradient boosting approach. Int. Rev. Econ. Finance 61, 304–323.
Chen, T. & Guestrin, C. 2016 XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794.
Cortes, C. & Vapnik, V. 1995 Support-vector networks. Mach. Learn. 20, 273–297.
Cox, D. R. 1958 The regression analysis of binary sequences. J. R. Stat. Soc. 20 (2), 215–242.
EM-DAT 2019 The International Disaster Database. Global ‘number killed’ and ‘number affected’ by drought between 1900–2009. Brussels, Belgium: Université Catholique de Louvain. Available from: www.emdat.be/.
Gariano, S. L., Brunetti, M. T., Iovine, G., Melillo, M., Peruccacci, S., Terranova, O., Vennari, C. & Guzzetti, F. 2015 Calibration and validation of rainfall thresholds for shallow landslide forecasting in Sicily, southern Italy. Geomorphology 227, 653–665.
Gudmundsson, L., Rego, F. C., Rocha, M. & Seneviratne, S. I. 2014 Predicting above normal wildfire activity in Southern Europe as a function of meteorological drought. Environ. Res. Lett. 9 (8), 084008.
Hernández-Orallo, J., Flach, P. & Ferri, C. 2013 ROC curves in cost space. Mach. Learn. 93, 71–91.
Hong, E. M., Nam, W. H., Choi, J. Y. & Pachepsky, Y. A. 2016b Projected irrigation requirements for upland crops using soil moisture model under climate change in South Korea. Agric. Water Manage. 165, 163–180.
Jung, J. H., Park, D. H. & Ahn, J. H. 2020 Drought evaluation using unstructured data: a case study for Boryeong area. J. Korea Water Resour. Assoc. 53 (12), 1203–1210.
Kallis, G. 2008 Droughts. Annual Review of Environment and Resources 33 (1), 85–118.
Kim, B. S., Sung, J. H., Kang, H. S. & Cho, C. H. 2012 Assessment of drought severity over South Korea using SPEI. J. Korea Water Resour. Assoc. 45 (9), 887–900.
Kim, B. S., Park, I. H. & Ha, S. R. 2014 Future projection of droughts over South Korea using representative concentration pathways (RCPs). Terr. Atmos. Oceanic Sci. 25 (5), 673–688.
Kim, S. J., Lim, C. H., Kim, G. S., Lee, J. Y., Geiger, T., Rahmati, O., Son, Y. W. & Lee, W. K. 2019 Multi-temporal analysis of forest fire probability using socio-economic and environmental variables. Remote Sens. 11 (1), 86.
Lee, J. W., Jang, S. S., Ahn, S. R., Park, K. W. & Kim, S. J. 2016 Evaluation of the relationship between meteorological, agricultural and in-situ big data droughts. J. Korean Assoc. Geogr. Inf. Stud. 19 (1), 64–79.
Liaw, A. & Wiener, M. 2002 Classification and regression by randomForest. R News 2 (3), 18–22.
Liguori, S., Rico-Ramirez, M. A., Schellart, A. N. A. & Saul, A. J. 2012 Using probabilistic radar rainfall nowcasts and NWP forecasts for flow prediction in urban catchments. Atmos. Res. 103, 80–95.
Lin, S. W., Lee, Z. J., Chen, S. C. & Tseng, T. Y. 2008 Parameter determination of support vector machine and feature selection using simulated annealing approach. Appl. Soft Comput. 8 (4), 1505–1512.
Ma, M., Zhao, G., He, B., Li, Q., Dong, H., Wang, S. & Wang, Z. 2021 XGBoost-based method for flash flood risk assessment. J. Hydrol. 598, 126382.
Madhuri, R., Sistla, S. & Raju, K. S. 2021 Application of machine learning algorithms for flood susceptibility assessment and risk management. J. Water Clim. Change 12 (6), 2608–2623.
Manning, C., Raghavan, P. & Schütze, H. 2008 Introduction to Information Retrieval. Cambridge University Press, Cambridge, USA. http://doi.org/10.1017/CBO9780511809071.
McKee, T. B., Doesken, N. J. & Kleist, J. 1993 The relationship of drought frequency and duration to time scales. In Eighth Conference on Applied Climatology, January 17–22, Anaheim, CA, pp. 179–184.
Mogaka, H., Gichere, S., Davis, R. & Hirji, R. 2006 Climate variability and water resources degradation in Kenya: Improving water resources development and management. World Bank Working Paper Series, 69, 34854, World Bank, Washington DC.
Mu, Q., Zhao, M., Kimball, J. S., McDowell, N. G. & Running, S. W. 2013 A remotely sensed global terrestrial Drought Severity Index. Bull. Am. Meteorol. Soc. 94 (1), 83–98.
Naghibi, S. A., Pourghasemi, H. R. & Dixon, B. 2016 GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran. Environ. Monit. Assess. 188 (44), 1–27.
Nam, W. H., Hayes, M. J., Svoboda, M. D., Tadesse, T. & Wilhite, D. A. 2015 Drought hazard assessment in the context of climate change for South Korea. Agric. Water Manage. 160, 106–117.
Ni, L., Wang, D., Wu, J., Wang, Y., Tao, J., Zhang, J. & Liu, J. 2020 Streamflow forecasting using extreme gradient boosting model coupled with Gaussian mixture model. J. Hydrol. 583, 124296.
Oshiro, T. M., Perez, P. S. & Baranauskas, J. A. 2012 How many trees in a random forest? In International Workshop on Machine Learning and Data Mining in Pattern Recognition, Vol. 7376. Springer, pp. 154–168.
Palmer, W. C. 1965 Meteorological droughts. In: Weather Bureau Research Paper, Vol. 45. U.S. Department of Commerce, p. 58.
Pardo, M. & Sberveglieri, G. 2005 Classification of electronic nose data with support vector machines. Sens. Actuators B Chem. 107 (2), 730–737.
Probst, P. & Boulesteix, A. L. 2018 To tune or not to tune the number of trees in random forest. J. Mach. Learn. Res. 18, 1–18.
Probst, P., Wright, M. N. & Boulesteix, A. L. 2019 Hyperparameters and tuning strategies for random forest. WIREs Data Min. Knowl. Discovery 9, e1301.
Qiu, R., Liu, C., Cui, N., Gao, Y., Li, L., Wu, Z., Jiang, S. & Hu, M. 2022 Generalized Extreme Gradient Boosting model for predicting daily global solar radiation for locations without historical data. Energy Convers. Manage. 258, 115488.
Shukla, S. & Wood, A. W. 2008 Use of a standardized runoff index for characterizing hydrologic drought. Geophys. Res. Lett. 35 (2), L02405.
Slaets, J. I. F., Piepho, H. P., Schmitter, P., Hilger, T. & Cadisch, G. 2017 Quantifying uncertainty on sediment loads using bootstrap confidence intervals. Hydrol. Earth Syst. Sci. 21, 571–588.
Spekkers, M. H., Kok, M., Clemens, F. H. L. R. & ten Veldhuis, J. A. E. 2014 Decision-tree analysis of factors influencing rainfall-related building structure and content damage. Nat. Hazards Earth Syst. Sci. 14, 2531–2547.
Stagge, J. H., Kohn, I., Tallaksen, L. M. & Stahl, K. 2015 Modeling drought impact occurrence based on meteorological drought indices in Europe. J. Hydrol. 530, 37–50.
Stahl, K., Blauhut, V., Kohn, I., Acácio, V., Assimacopoulos, D., Bifulco, C., De Stefano, L., Dias, S., Eilertz, D., Frielingsdorf, B., Hegdahl, T. J., Kampragou, E., Kourentzis, V., Melsen, L., van Lanen, H. A. J., Van Loon, A. F., Massarutto, A., Musolino, D., de Paoli, L., Senn, L., Stagge, J. H., Tallaksen, L. M. & Urquijo, J. 2012 A European Drought Impact Report Inventory (EDII): Design and Test for Selected Recent Droughts in Europe, DROUGHT-R&SPI Technical Report No. 3, 23.
Stahl, K., Kohn, I., Blauhut, V., Urquijo, J., De Stefano, L., Acácio, V., Dias, S., Stagge, J. H., Tallaksen, L. M., Kampragou, E., Van Loon, A. F., Barker, L. J., Melsen, L. A., Bifulco, C., Musolino, D., de Carli, A., Massarutto, A., Assimacopoulos, D. & Van Lanen, H. A. J. 2016 Impacts of European drought events: insights from an international database of text-based reports. Nat. Hazards Earth Syst. Sci. 16, 801–819.
Strobl, C., Boulesteix, A. L., Kneib, T., Augustin, T. & Zeileis, A. 2008 Conditional variable importance for random forests. BMC Bioinf. 9 (307). https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-307.
Sutanto, S. J., van der Weert, M., Wanders, N., Blauhut, V. & Van Lanen, H. A. J. 2019 Moving from drought hazard to impact forecasts. Nat. Commun. 10, 4945.
Sutanto, S. J., van der Weert, M., Blauhut, V. & Van Lanen, H. A. J. 2020 Skill of large-scale seasonal drought impact forecasts. Nat. Hazards Earth Syst. Sci. 20, 1595–1608.
Svoboda, M., LeComte, D., Hayes, M., Heim, R., Gleason, K., Angel, J., Rippey, B., Tinker, R., Palecki, M., Stooksbury, D., Miskus, D. & Stephens, S. 2002 The drought monitor. Bull. Am. Meteorol. Soc. 83 (8), 1181–1190.
Um, M. J., Kim, Y. J., Park, D. R. & Kim, J. B. 2017 Effects of different reference periods on drought index estimations for 1901–2014. Hydrol. Earth Syst. Sci. 21 (10), 4989–5007.
Um, M. J., Kim, M. M., Kim, Y. J. & Park, D. R. 2018a Drought assessment with the community land model for 1951–2010 in East Asia. Sustainability 10 (6), 2100.
Um, M. J., Kim, Y. J. & Park, D. R. 2018b Evaluation and modification of the Drought Severity Index (DSI) in East Asia. Remote Sens. Environ. 209, 66–76.
United Nations International Strategy for Disaster Reduction Secretariat 2009 Global Assessment Report on Disaster Risk Reduction: Risk and Poverty in a Changing Climate. Invest Today for a Safer Tomorrow. United Nations International Strategy for Disaster Reduction, Geneva.
Venables, W. N. & Ripley, B. 2002 Modern Applied Statistics with S. Springer, Berlin. https://doi.org/10.1007/978-0-387-21706-2.
Vicente-Serrano, S. M., Beguería, S. & López-Moreno, J. I. 2010 A multiscalar drought index sensitive to global warming: the standardized precipitation evapotranspiration index. J. Clim. 23 (7), 1696–1718.
Wang, Z., Lai, C., Chen, X., Yang, B., Zhao, S. & Bai, X. 2015 Flood hazard risk assessment model based on random forest. J. Hydrol. 527, 1130–1141.
Wilhite, D. A. & Svoboda, M. D. 2000 Drought early warning systems in the context of drought preparedness and mitigation. Early warning systems for drought preparedness and drought management. In Proceedings of an Expert Group Meeting. World Meteorological Organization, Geneva, Switzerland.
Wilhite, D. A., Hayes, M. J., Knutson, C. & Smith, K. H. 2000 Planning for drought: moving from crisis to risk management. J. Am. Water Resour. Assoc. 36, 697–710.
Wong, G., Lambert, M. F., Leonard, M. & Metcalfe, A. V. 2010 Drought analysis using trivariate copulas conditional on climatic states. J. Hydrol. Eng. 15, 129–141.
Yoo, J. Y., Kwon, H. H., Kim, T. W. & Ahn, J. H. 2012 Drought frequency analysis using cluster analysis and bivariate probability distribution. J. Hydrol. 420–421, 102–111.
Zhang, H., Yang, Q., Shao, J. & Wang, G. 2019a Dynamic streamflow simulation via online gradient-boosted regression. J. Hydrol. Eng. 24 (10), 04019041.
Zhang, X., Li, T., Wang, J., Li, J., Chen, L. & Liu, C. 2019b Identification of cancer-related long non-coding RNAs using XGBoost with high accuracy. Front. Genet. 10, 735.
Zhao, L., Lyu, A., Wu, J., Hayes, M., Tang, Z., He, B., Liu, J. & Liu, M. 2014 Impact of meteorological drought on streamflow drought in Jinghe River Basin of China. Chin. Geogr. Sci. 24 (6), 694–705.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).