RF and XGB were used to generate 1-week forecasts of HABs in eight water supply reservoirs. In the Jinyang, Paldang, Sayeon, and Unmun reservoirs, the bloom occurrence classification accuracy of both RF and XGB improved after applying SMOTE; in the Angye, Daecheong, Gwanggyo, and Yeongcheon reservoirs, however, classification performance did not improve significantly (Table 4). Without SMOTE, both RF (accuracy: 0.71–0.93) and XGB (accuracy: 0.71–0.95) exhibited high accuracy. However, the metrics that reflect performance on the minority class, i.e., bloom occurrence, were lower for the ML models (AUC: 0.63, recall: 0.34, and F-measure: 0.39). After applying SMOTE, RF and XGB performed better for most of the target reservoirs. For example, once SMOTE had mitigated the data imbalance, the Unmun reservoir showed the greatest performance improvement (average increase in AUC: 0.27, recall: 0.50, and F-measure: 0.67). For the Angye reservoir, however, little performance improvement was observed for either RF or XGB (average difference in recall: 0.00, F-measure: 0.00, and AUC: −0.03) (Table 4 and Supplementary Material, Table S1). Note that the sample size for the Angye reservoir was smaller than that of the other reservoirs, and the test set contained only two bloom occurrence samples. Thus, misclassifying even a single bloom occurrence as non-occurrence could cause a significant reduction in performance (López et al. 2013).
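
The gap between accuracy and the minority-class metrics can be illustrated with a small sketch. It rests on two assumptions that are not stated in the text: a hypothetical test set of 20 samples containing two bloom occurrences, and an AUC computed from hard class labels, which reduces to the mean of recall and specificity. The function name `imbalance_metrics` and all label vectors are illustrative only and are not the study's data or code.

```python
def imbalance_metrics(y_true, y_pred):
    """Accuracy, AUC, recall, and F-measure for binary labels (1 = bloom)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    # Assumption: AUC from a single operating point (hard labels),
    # i.e. the mean of recall and specificity.
    auc = (recall + specificity) / 2
    return {
        "accuracy": round(accuracy, 2),
        "auc": round(auc, 2),
        "recall": round(recall, 2),
        "f_measure": round(f_measure, 2),
    }


# Hypothetical test set: 18 non-occurrence samples (0) and 2 bloom samples (1).
y_true = [0] * 18 + [1] * 2

# Case 1: the classifier never predicts a bloom.
all_negative = [0] * 20
# Case 2: no bloom is detected and one non-occurrence is flagged as a bloom.
one_false_alarm = [1] + [0] * 19
# Case 3: exactly one of the two blooms is detected.
one_hit = [0] * 18 + [1, 0]

print(imbalance_metrics(y_true, all_negative))
# {'accuracy': 0.9, 'auc': 0.5, 'recall': 0.0, 'f_measure': 0.0}
print(imbalance_metrics(y_true, one_false_alarm))
# {'accuracy': 0.85, 'auc': 0.47, 'recall': 0.0, 'f_measure': 0.0}
print(imbalance_metrics(y_true, one_hit))
# {'accuracy': 0.95, 'auc': 0.75, 'recall': 0.5, 'f_measure': 0.67}
```

Under these assumptions, the first two cases give the same values as the Angye rows of Table 4 before and after SMOTE, and the third case shows that a single bloom sample moves recall by 0.50 when only two are available.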

Table 4

Performance evaluation of forecasting cyanobacteria bloom occurrence for each study site using RF and XGB with and without SMOTE

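| Reservoir | Model | Accuracy (before SMOTE) | AUC (before SMOTE) | Recall (before SMOTE) | F-measure (before SMOTE) | Accuracy (after SMOTE) | AUC (after SMOTE) | Recall (after SMOTE) | F-measure (after SMOTE) |
|---|---|---|---|---|---|---|---|---|---|
| Angye | RF | 0.90 | 0.50 | 0.00 | 0.00 | 0.85 | 0.47 | 0.00 | 0.00 |
| | XGB | 0.90 | 0.50 | 0.00 | 0.00 | 0.85 | 0.47 | 0.00 | 0.00 |
| Daecheong | RF | 0.75 | 0.75 | 0.73 | 0.78 | 0.71 | 0.73 | 0.68 | 0.75 |
| | XGB | 0.71 | 0.68 | 0.80 | 0.78 | 0.70 | 0.72 | 0.65 | 0.73 |
| Gwanggyo | RF | 0.90 | 0.75 | 0.50 | 0.67 | 0.95 | 0.88 | 0.75 | 0.86 |
| | XGB | 0.95 | 0.88 | 0.75 | 0.86 | 0.90 | 0.84 | 0.75 | 0.75 |
| Jinyang | RF | 0.80 | 0.78 | 0.65 | 0.73 | 0.78 | 0.76 | 0.65 | 0.71 |
| | XGB | 0.78 | 0.76 | 0.65 | 0.71 | 0.75 | 0.74 | 0.71 | 0.71 |
| Paldang | RF | 0.93 | 0.50 | 0.00 | 0.00 | 0.93 | 0.50 | 0.00 | 0.00 |
| | XGB | 0.93 | 0.50 | 0.00 | 0.00 | 0.93 | 0.62 | 0.25 | 0.33 |
| Sayeon | RF | 0.92 | 0.63 | 0.25 | 0.40 | 0.92 | 0.73 | 0.50 | 0.57 |
| | XGB | 0.89 | 0.61 | 0.25 | 0.33 | 0.92 | 0.84 | 0.75 | 0.67 |
| Unmun | RF | 0.90 | 0.50 | 0.00 | 0.00 | 0.95 | 0.75 | 0.50 | 0.67 |
| | XGB | 0.90 | 0.50 | 0.00 | 0.00 | 0.95 | 0.75 | 0.50 | 0.67 |
| Yeongcheon | RF | 0.71 | 0.60 | 0.33 | 0.40 | 0.81 | 0.60 | 0.67 | 0.67 |
| | XGB | 0.76 | 0.68 | 0.50 | 0.55 | 0.67 | 0.62 | 0.50 | 0.46 |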
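
For readers unfamiliar with the oversampling step compared in Table 4, the following is a minimal sketch of the interpolation idea behind SMOTE: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. It is not the implementation used in the study. The function `smote_like`, the neighbour count `k=2`, and the two-feature sample values are all hypothetical and chosen only for illustration.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to the neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical bloom-occurrence samples with two arbitrary features.
bloom_samples = [(24.0, 10.0), (26.0, 14.0), (25.0, 12.0)]
new_samples = smote_like(bloom_samples, n_new=5)
print(len(new_samples))  # 5
```

Because every synthetic point lies between two existing bloom samples, the procedure cannot add information beyond what those samples span. This is consistent with the limited benefit reported for reservoirs with very few bloom occurrences.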
