Flood forecasting using quantitative precipitation forecasts and hydrological modeling in the Sebeya catchment, Rwanda

The absence of a viable flood early warning system for the Sebeya River catchment continues to impede government efforts toward improving community preparedness, the reduction of flood impacts and relief. This paper reports on a recent study that used satellite data, quantitative precipitation forecasts and a rainfall-runoff model for short-term flood forecasting in the Sebeya catchment. The Global Precipitation Measurement (GPM) product was used as the satellite rainfall product for model calibration and validation, and forecasted rainfall products from the European Centre for Medium-Range Weather Forecasts (ECMWF) were evaluated for flood forecasting. Model performance was evaluated by visual comparison of simulated and observed hydrographs and by a number of performance indicators. The real-time flow forecast assessment was conducted with respect to three different flood warning threshold levels for lead times of 3-24 h. The result for a 3-h lead time showed 72% hits, 7.5% false alarms and 9.5% missed forecasts. The number of hits decreased as the lead time increased. This study did not consider the uncertainties in observed data, which can influence the model performance. This work provides a base for future studies to establish a viable flood early warning system in the study area and beyond.


INTRODUCTION
Flood forecasting is essential for appropriate flood management in terms of early warning and adequate preparedness (Aerts et al. 2018; McKinnon 2019). It is one of the reliable options for the development of flood mitigation measures because it provides advance warning of potential floods to inform early evacuation, temporary construction and other precautionary measures that reduce flood effects. According to Sanders (2007) and Li et al. (2020), floods are among the most frequent and devastating hazards, usually leading to death, displacement, evacuation, homelessness and injury. Urban areas are often the most affected because industries and cities are usually situated in flood-prone areas (Löschner et al. 2017). Accordingly, the risks from severe flood events increase with population growth and economic development (Padi et al. 2011; Aja et al. 2019). There have been recurring episodes of hazardous flooding around the world, and it is pertinent to reflect on the impact on countries with the least resilience to such natural hazards. Many African countries have been struck by flooding of different intensities, with catastrophic impacts including the loss of infrastructure and homes and disease outbreaks (Lavers et al. 2019). Numerical weather prediction for flood early warning systems has advanced as a result of the increase in computational power and improved understanding of earth system processes (Bauer et al. 2015).
Rwanda is highly susceptible to seasonal flooding because of its numerous mountains and steep slopes, dense river network and vast wetlands (Huang 2018). Riverine floods and flash floods are severe threats to many parts of the country due to the nature of the topography. The series of anomalously abundant rainfall events in 1997, 2006, 2007, 2008 and 2009 caused severe floods in many parts of Africa, including Rwanda, which affected many people, destroyed structures and prime farmlands, and hampered economic development (Downing et al. 2009). The Nyundo and Mahoko sub-catchments of the Sebeya catchment, which is located in western Rwanda, are also affected by flooding owing to the ongoing expansion, high rainfall intensity, topography of the area, unsustainable land-use practices and urbanization, particularly in the northern part of the catchment (REMA 2009; Water for Growth 2017).
Floods cannot be avoided completely, but their impacts can be mitigated by putting in place adequate preventive measures. Engineering structures such as dykes may be considered effective in mitigating the impacts of floods (Haile et al. 2016). At the household level, preventive measures may include the building of flood-proof houses with elevated foundations (Haile et al. 2013). These measures have been found to be effective at the household level by 25-55% (Kreibich et al. 2005; Kreibich & Thieken 2009; Bubeck et al. 2012). When physical measures are inadequate, flood effects can be further reduced via community awareness and preparedness (Thielen et al. 2009). This involves flood forecasting and turning forecasts into useful early warning information. An operational flood forecasting system is therefore among the most effective flood risk management measures (UNISDR 2004). In spite of its importance, establishing a locally relevant and functional flood early warning system has not received the attention it deserves in many countries, including Rwanda.
Considering the obvious impact of flooding, it is important to use available and emerging science to develop flood forecasting tools (Flack et al. 2019). Effective forecasts at all scales are vital for appropriate intervention to save lives and livelihoods in the event of a flood. Hydrological models are considered an important tool for operational flood forecasting. However, the accuracy of forecasts is likely to remain low at lead times beyond a few hours (Speight et al. 2018). Hydrological models have advanced over the years, with distributed hydrological models providing more reliable results (Hand et al. 2004; Cuo et al. 2011; Pilling et al. 2016; Flack et al. 2019). Many models can be used to generate forecasts for a catchment (Speight et al. 2021). Significant investment in computational resources is required to account for forecast uncertainty or to provide forecasts with lead times greater than 6 h (Flack et al. 2019).
Numerous initiatives are currently working to extend flood forecasting and warning lead times through continental and global-scale early warning systems. For example, the European Flood Awareness System (EFAS) and the Global Flood Awareness System (GloFAS) are advanced prototypes of continental and global flood alert systems that use several deterministic models to produce probabilistic flood alerts with lead times of up to 15 days (Thiemig et al. 2011). Flood forecasting in Africa has not received the needed attention in the literature, in spite of the recognized importance of the topic. A number of authors (Haile et al. 2016; Siddique & Mejia 2017; Huang 2018; Arnal et al. 2020; Wenyan et al. 2020) have demonstrated the benefit of ensemble forecasts over deterministic forecasts. However, most of the studies that used ensemble forecasting have focused on lead times of up to 15 days rather than on short-term forecasting (Cloke & Pappenberger 2009; Pagano et al. 2014; Emerton et al. 2016; Wu et al. 2020). Deterministic flood forecasting, on the other hand, is frequently applied for shorter lead times, without using a full ensemble forecast (Speight et al. 2021).
Based on a report from the Rwanda Water and Forestry Authority, the Sebeya River has been causing recurrent flooding events every rainy season since April 2011, and these floods have had negative impacts including loss of valuable lives, destruction of houses and other infrastructures, loss of prime farmlands, damage of water treatment plant as well as newly constructed hydropower plant in the downstream part of the river worth millions of dollars. An understanding of the characteristics of floods and catchment runoff behavior in the study area is needed to develop a viable flood forecasting system for this area. The overall objective of this study is to evaluate the potential of quantitative precipitation forecast (QPF) and hydrological model outputs for short-term flood forecasting in the Sebeya catchment.

Description of study area
The Sebeya catchment lies within the Congo-Kivu catchment in the northern part of the Congo basin (Figure 1). It is among the largest catchments draining the western slopes of the Nile-Congo watershed in the western part of Rwanda, with an approximate area of 336.4 km², representing 1.38% of the total surface area of Rwanda (Water for Growth 2017). The study area is the upstream part of the Sebeya catchment, which covers about 200.9 km² down to the town of Mahoko and includes the Bihongoro and Karambo Rivers.
The climatic condition of the area is similar to that of the entire country. Peak rainfall occurs in April and November with an average monthly rainfall amount of 1,200 mm (Musoni 2009), and the driest month is July with an average monthly rainfall amount of 27.52 mm. In the dry season, the potential evapotranspiration (PET) increases to as much as 89.43 mm day⁻¹, while in the wet season, it decreases to as little as 53.13 mm day⁻¹ (Munyaneza 2014). The streamflow at the Nyundo station ranges between 42.34 m³/s in April and 5.67 m³/s in July, and the pattern of the streamflow hydrograph is similar to that of the annual rainfall cycle (Water for Growth 2017). The period from June to August is characterized by low streamflow.

Datasets and sources
Datasets and sources are summarized in Table 1.

Conceptual framework
For this study, a coupled rainfall-runoff model and QPFs were applied to evaluate flood forecasting in the Sebeya catchment. Figure 2 illustrates the main steps which were followed in this study. The HBV (Hydrologiska Byråns Vattenbalansavdelning) model was calibrated using in situ measurements and the Integrated Multi-satellite Retrievals for global precipitation measurements (GPM IMERG) early run product. Flood forecasting was done through the calibrated HBV model which is forced by QPFs. Forecasting results were compared against observed floods in the study area for an independent period.

Data quality assessment
The data quality assessment was performed on streamflow, rainfall, maximum and minimum temperatures, relative humidity, wind speed and sunshine hours. A double mass curve was applied to evaluate data consistency and homogeneity (Girma et al. 2020). The double mass curve is plotted using accumulated rainfall values of the station under investigation against accumulated values of other stations, or against accumulated values of the average of other stations, over the same period of time. The missing values for the input data were filled using the normal ratio method, which is expressed by the following relationship:

$$P_x = \frac{1}{N}\left(\frac{N_x}{N_1}P_1 + \frac{N_x}{N_2}P_2 + \dots + \frac{N_x}{N_n}P_n\right)$$

where $P_x$ is the missing value of precipitation to be computed; $N_x$ is the average value of rainfall for the station in question for the recording period; $N_1, N_2, \dots, N_n$ are the average values of rainfall for the neighboring stations; $P_1, P_2, \dots, P_n$ are the rainfall amounts at the neighboring stations during the missing period; and $N$ is the number of stations used in the computation.
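The normal ratio method described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: the function name `normal_ratio_fill` and the example numbers are hypothetical.

```python
def normal_ratio_fill(n_x, neighbor_means, neighbor_values):
    """Estimate a missing rainfall value P_x with the normal ratio method.

    n_x             -- long-term mean rainfall at the station with the gap
    neighbor_means  -- long-term mean rainfall N_i at each neighboring station
    neighbor_values -- rainfall P_i recorded at each neighbor during the gap
    """
    if len(neighbor_means) != len(neighbor_values) or not neighbor_means:
        raise ValueError("need one mean and one value per neighboring station")
    # Scale each neighbor's reading by the ratio of long-term means, then average.
    weighted = [n_x / n_i * p_i for n_i, p_i in zip(neighbor_means, neighbor_values)]
    return sum(weighted) / len(weighted)


# Hypothetical numbers for illustration only (not data from the study).
estimate = normal_ratio_fill(100.0, [80.0, 125.0], [8.0, 10.0])
print(round(estimate, 2))
```

A neighbor that is normally drier than the target station has its reading scaled up, and a wetter neighbor has its reading scaled down, before the values are averaged.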

Potential evapotranspiration
The Penman-Monteith method was selected for estimating PET for this study. This method is commonly used and is the best method when a full complement of weather data is available (Allen et al. 1998). The Penman-Monteith method can give good results under a variety of climate scenarios (Droogers & Allen 2002).

Bias correction of IMERG early run rainfall product
Satellite data is likely to generate large errors in estimating precipitation, including random errors and systematic errors (Habib et al. 2014; Haile et al. 2016). Bias correction is therefore needed prior to the application of satellite rainfall products for rainfall-runoff modeling (Haile et al. 2013; Habib et al. 2014). Hourly in situ station rainfall data from Meteo Rwanda was converted into catchment-averaged values using the Thiessen polygon method and used to correct errors in the satellite rainfall data. The bias correction for the IMERG early run rainfall product is done using data from rain gauges to ensure adequate rainfall representation. The basic procedure of bias correction is to multiply satellite rainfall estimates (SREs) by the bias factor to get a new and more accurate set of SREs for use in modeling. In this study, the bias correction was estimated using the power transformation method. The rain gauge data at a station that falls within an IMERG pixel was compared with the IMERG data at the grid scale for the period 2014-2018, and the correction factors a and b of the power transformation method were obtained on a monthly basis. Finally, the monthly bias correction factors a and b were applied to all pixels of the hourly IMERG data downloaded for the entire grid for 2014-2018. The equation of the power transformation method reads as follows:

$$P_T = a\,G^{b}$$

where $P_T$ is the corrected precipitation and $G$ is the uncorrected hourly rainfall amount from the IMERG early run. The constants a and b are calculated in two stages using Brent's method: a is calculated such that the mean of the transformed IMERG data equals the mean of the gauge observations, and b is calculated iteratively such that the coefficient of variation of the transformed IMERG data equals the coefficient of variation of the gauge-based observations (Thomas et al. 2013).
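The two-stage fit of the power transformation can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it uses plain bisection in place of Brent's method to stay within the standard library, it solves for b first (the coefficient of variation does not depend on a) and then sets a from the means, and the function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def coeff_var(values):
    """Coefficient of variation: population standard deviation over mean."""
    return pstdev(values) / mean(values)


def fit_power_transform(satellite, gauge, b_lo=0.01, b_hi=5.0, tol=1e-9):
    """Fit a and b in P_T = a * G**b for one month of paired rainfall data."""
    target_cv = coeff_var(gauge)

    def mismatch(b):
        # CV of the transformed satellite data minus CV of the gauge data.
        return coeff_var([s ** b for s in satellite]) - target_cv

    lo, hi = b_lo, b_hi
    f_lo = mismatch(lo)
    if f_lo * mismatch(hi) > 0:
        raise ValueError("no root for b inside the search bracket")
    # Bisection stands in for Brent's method here (assumption of this sketch).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = mismatch(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    b = 0.5 * (lo + hi)
    # With b fixed, a matches the mean of the transformed data to the gauge mean.
    a = mean(gauge) / mean([s ** b for s in satellite])
    return a, b


# Synthetic check: gauge values generated with a = 1.5 and b = 1.2.
satellite = [0.0, 1.0, 2.0, 4.0, 8.0]
gauge = [1.5 * s ** 1.2 for s in satellite]
a, b = fit_power_transform(satellite, gauge)
corrected = [a * s ** b for s in satellite]
print(round(a, 3), round(b, 3))
```

In a monthly scheme such as the one described, the fit would be repeated for each calendar month and the resulting pair (a, b) applied to every IMERG pixel for that month.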

HBV model
Flood forecasting in this study was implemented using the HBV model, which was developed by the Swedish Meteorological and Hydrological Institute (SMHI) in 1972 to assist hydropower operations by providing hydrological forecasts (Bergström 1992). HBV is a conceptual hydrological model mostly used for the simulation of continuous runoff. The model has fewer parameters to calibrate than other hydrological models, and they are simple to link to physical attributes (Das et al. 2008). It makes use of four subroutines: precipitation and snow accumulation, soil moisture, response routine and transformation function. The input data for the model are precipitation, air temperature and PET.

Model setup
To set up the model, a text file containing hourly time series of precipitation (mm), temperature (°C) and the observed streamflow (m³/s) at the outlet of the catchment was needed for calibration and validation, together with the PET time series, which was computed using the Penman-Monteith method with a single daily value for each month.
The chart above describes how the hydrological modeling part of this research was carried out. The elevation zones (Figure 3) and the area of the modeled catchment, which is approximately 200.9 km², are also needed to set up the model. The catchment was divided into three sub-basins (Figure 1), and the areas of the sub-basins are shown in Table 2. This allows the HBV model to take into consideration the elevation as well as the distribution of the meteorological variables of the area under study. The elevation of the study area ranges from 1,800 to about 2,988 m, with the lowest elevation and the catchment outlet in the northern part, as indicated in Figure 3.

Uncorrected Proof
Land-cover data is required as an input to set up the HBV model. The six predominant land-cover types in the Sebeya catchment are shown in Figure 4 and were obtained from ESRI Rwanda/SRMAP. The land-cover types are rainfed agriculture (∼124.2 km²), natural forest and forest plantation (∼22.1 km²), built-up (∼2.01 km²),

Model calibration
Input parameter fine-tuning was applied to the model to reach a certain acceptable performance for a selected time period. To do this, the input parameter values were optimized within their respective predefined ranges. There are three parameters involved in the soil moisture routine which are the maximum soil moisture storage (FC), the soil moisture value (Lp) and Beta which is the parameter that determines the relative contribution of precipitation to runoff. These parameters have a direct effect on the base flow and the simulated volume, and they control the water balance of the model. The response function routine has three parameters which are the Alfa, Khq and Hq ( Table 4). All of these parameters have an effect on the shape of the hydrograph and the peak streamflow.
The response function routine also includes the parameters Perc and K4, which affect the base flow.
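To make the role of FC, Lp and Beta concrete, the sketch below implements one time step of a soil moisture routine in the commonly cited textbook HBV form. It is an assumption that the software used in the study follows exactly this formulation; the function name, the treatment of Lp as a fraction of FC and the numbers are illustrative only.

```python
def soil_moisture_step(sm, precip, pet, fc, lp, beta):
    """Advance the soil moisture store by one time step (all depths in mm).

    sm     -- soil moisture at the start of the step
    precip -- rainfall (plus snowmelt) entering the soil routine
    pet    -- potential evapotranspiration for the step
    fc     -- maximum soil moisture storage (FC)
    lp     -- fraction of FC above which actual ET equals PET (assumed meaning of Lp)
    beta   -- shape parameter controlling how much input becomes recharge
    """
    # Share of the input passed on to the response routine.
    recharge = precip * (sm / fc) ** beta
    sm = sm + precip - recharge
    # Anything above FC cannot be stored and is passed on as well.
    overflow = max(sm - fc, 0.0)
    recharge += overflow
    sm -= overflow
    # Actual ET rises linearly with soil moisture up to lp * fc.
    aet = min(pet * min(sm / (lp * fc), 1.0), sm)
    sm -= aet
    return sm, recharge, aet


# Hypothetical values, chosen only to show the effect of Beta.
sm_new, recharge, aet = soil_moisture_step(100.0, 10.0, 2.0, fc=200.0, lp=0.9, beta=2.0)
_, recharge_high_beta, _ = soil_moisture_step(100.0, 10.0, 2.0, fc=200.0, lp=0.9, beta=4.0)
print(round(recharge, 3), round(recharge_high_beta, 3))
```

In this form, a larger Beta sends less of the input to runoff generation when the soil is not saturated, which is one way to read the parameter's influence on simulated volume.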
There are four main approaches to calibrating the HBV model: the manual 'trial and error' technique, the genetic calibration algorithm, calibration by Monte-Carlo simulation and batch calibration. The manual method was used in this study. Model parameter adjustment was based on the visual examination of the computed and observed hydrographs and on the analysis of a goodness of fit function, the model efficiency (NSE), also known as the Nash-Sutcliffe coefficient. The equation of this goodness of fit function is as shown below:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(Q_{sim,i} - Q_{obs,i}\right)^{2}}{\sum_{i=1}^{n}\left(Q_{obs,i} - \overline{Q}_{obs}\right)^{2}}$$

where $Q_{sim}$ is the simulated streamflow, $Q_{obs}$ is the observed streamflow and $\overline{Q}_{obs}$ is the mean of the observed streamflow.
The model efficiency (NSE) ranges from minus infinity (−∞) to 1, with 1 representing the perfect performance of the model. An NSE in the range of 0.6-0.8 shows a reasonable model performance, while values from 0.8 to 0.9 indicate an excellent model performance. The NSE is mostly used to assess the predictive power of the model, as it describes the accuracy of model outputs (Nash & Sutcliffe 1970).
For the water mass balance error, the model was calibrated and evaluated through another commonly applied goodness of fit measure called relative volume error (RVE) (Haile et al. 2016). The RVE shows a perfect performance when its value is 0, and values ranging from −10% to +10% indicate a reasonable performance. The equation for RVE is as shown below:

$$\mathrm{RVE} = \left[\frac{\sum_{i=1}^{n}\left(Q_{sim,i} - Q_{obs,i}\right)}{\sum_{i=1}^{n} Q_{obs,i}}\right] \times 100\%$$

where $Q_{sim}$ is the simulated flow, $Q_{obs}$ is the observed flow and $n$ is the number of days. The observed water flow measurements from the gauging station were used in the model. As a useful rule of thumb, at least 10 years of daily time-series data should be used to calibrate and validate the HBV model (Lindström et al. 1997). It is important to note that 2014-2018 data was used as the development period for the HBV model in this work because that was the only in situ data available for the study area at the time of this study.
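Both goodness of fit measures are simple to compute from paired simulated and observed series. The sketch below uses the standard definitions of NSE and RVE; the function names and the sample series are hypothetical.

```python
def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((s - o) ** 2 for s, o in zip(sim, obs))
    observed_spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / observed_spread


def rve(sim, obs):
    """Relative volume error in percent; negative means volume is underestimated."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)


# Hypothetical series: the simulation is 10% too low at every step.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [0.9 * q for q in obs]
print(round(nse(sim, obs), 3), round(rve(sim, obs), 2))
```

The example shows why the two measures are reported together: a simulation can follow the shape of the hydrograph closely (high NSE) while still carrying a systematic volume error.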
Before calibration, warming of the model was done to condition the model. Data used for model warming was from 12 March 2014 at 01:00 to 31 December 2014 at 23:00. Calibration followed using the data from 1 January 2015 at 00:00 to 31 December 2017 at 23:00. The validation data was from 1 January 2018 at 00:00 to 31 December 2018 at 23:00.

Sensitivity analysis
In this study, a local sensitivity analysis was adopted for evaluating the continuous model (Hamby 1994). It was applied by changing the value of one parameter at a time. Each parameter was changed individually while keeping all other parameters constant. Some model parameters were very sensitive so that a small change in their values resulted in huge differences between the observed and simulated volumes. The final set of the parameters of the calibrated model was deemed as the baseline parameter set.
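The one-at-a-time procedure can be sketched as a small loop. The objective function below is a toy stand-in so that the example runs on its own; in practice it would be a full HBV run scored with NSE or RVE. All names, the perturbation sizes and the toy response are assumptions.

```python
def one_at_a_time(objective, baseline, changes=(-0.2, -0.1, 0.1, 0.2)):
    """Perturb each parameter in turn and record the change in the objective."""
    base_score = objective(baseline)
    results = {}
    for name, value in baseline.items():
        rows = []
        for frac in changes:
            trial = dict(baseline)          # all other parameters stay at baseline
            trial[name] = value * (1.0 + frac)
            rows.append((frac, objective(trial) - base_score))
        results[name] = rows
    return results


def toy_objective(params):
    # Hypothetical response surface; it does not represent HBV behavior.
    return params["Beta"] ** 2 + 0.01 * params["FC"]


baseline = {"Beta": 2.0, "FC": 200.0}
results = one_at_a_time(toy_objective, baseline)
for name, rows in results.items():
    print(name, [(frac, round(delta, 3)) for frac, delta in rows])
```

Plotting the recorded changes against the relative perturbation gives one curve per parameter, and the steepness of each curve indicates how sensitive the objective is to that parameter.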

Evaluation of forecast skill
To analyze flood forecast, there is the need to incorporate a decision-making element, to know if the flow will or will not exceed a critical threshold. This can be made possible through the specification of streamflow threshold which signals the occurrence of flood. For this study, three thresholds were defined that relate to the severity of the expected flood magnitude which corresponds to (i) medium (water flow levels are high but no significant flooding is expected), (ii) severe (high possibility of significant flooding is expected) and (iii) extreme severe (very high probability of significant flooding, potentially severe flooding is expected). The thresholds were fixed based on the observed streamflow values from 2014 to 2018. A flood event of a certain magnitude is said to occur when the forecasted streamflow exceeds the specified threshold. Here, the thresholds were fixed by calculating the streamflow corresponding to flow exceedance probability at 5% for medium, 1% for severe and 0.5% for extreme severe. Different colors are used to easily identify the exceedance of the three thresholds by the local community as well as governmental body. The Weibull plotting formula (Rientjes et al. 2011) was applied for calculating flow exceedance (Table 5).
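Deriving thresholds from exceedance probabilities can be sketched with the Weibull plotting position, p = m / (n + 1), where m is the rank of a flow in descending order and n is the number of observations. The linear interpolation between plotting positions and the synthetic flow series are assumptions of this sketch, not details taken from the study.

```python
def exceedance_curve(flows):
    """Return (exceedance probability, flow) pairs using p = m / (n + 1)."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    return [(rank / (n + 1), q) for rank, q in enumerate(ranked, start=1)]


def flow_at_exceedance(flows, prob):
    """Flow exceeded with the given probability, by linear interpolation."""
    curve = exceedance_curve(flows)
    if prob <= curve[0][0]:
        return curve[0][1]
    if prob >= curve[-1][0]:
        return curve[-1][1]
    for (p0, q0), (p1, q1) in zip(curve, curve[1:]):
        if p0 <= prob <= p1:
            weight = (prob - p0) / (p1 - p0)
            return q0 + weight * (q1 - q0)


# Synthetic flows 1..199 (placeholder values), so that n + 1 = 200.
flows = [float(q) for q in range(1, 200)]
thresholds = {
    "medium": flow_at_exceedance(flows, 0.05),
    "severe": flow_at_exceedance(flows, 0.01),
    "extreme severe": flow_at_exceedance(flows, 0.005),
}
print(thresholds)
```

Because a lower exceedance probability corresponds to a rarer, larger flow, the three thresholds are ordered from medium to extreme severe.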
In this study, QPF from the ECMWF was used as input to the hydrological rainfall-runoff (HBV) model for flood forecasting in the Sebeya catchment. For real-time flow forecasting, it is necessary to know how accurately the forecast model captures whether the streamflow is above or below the specified flow threshold. Such forecast skill can be evaluated in terms of the number of hits, missed events and false alarms. The forecast flow was evaluated using a contingency table (Haile et al. 2010), where the forecasted flood events were compared with the observed flood events (Table 6). The possible outcomes in a contingency table are Hit (HIT), Miss (MISS), False alarm (FA) and Non-event (NE). If both the observed and forecasted streamflow of a particular day exceed the specified threshold, then the forecast is considered a HIT. If the forecasted flow is lower than the threshold while the observed flow exceeds it, then the forecast is a missed event (MISS). If the forecasted flow exceeds the threshold while the observed flow is below it, then the forecast is considered a false alarm (FA).
The forecast skill for 199 h was assessed in terms of categorical statistics. The statistics are presented using a contingency table and were obtained for 3, 6, 9, 12, 15, 18, 21 and 24-h lead times for the three flood thresholds, which are medium, severe and extreme severe. The categorical verification statistics include the probability of detection (POD), the frequency of hit (FOH) and the frequency of miss (FOM). These statistics were estimated based on a contingency table and they are represented mathematically as follows:

$$\mathrm{POD} = \frac{\mathrm{HIT}}{\mathrm{HIT} + \mathrm{MISS}}, \qquad \mathrm{FOH} = \frac{\mathrm{HIT}}{\mathrm{HIT} + \mathrm{FA}}, \qquad \mathrm{FOM} = \frac{\mathrm{MISS}}{\mathrm{HIT} + \mathrm{MISS}}$$

The POD is a measure of how accurately the model detects actual flood events. The FOH is the ratio of the number of correctly forecasted flood events to the total number of forecasted flood events. It measures the accuracy of the forecasted flood events. The FOM measures the observed flood events that are missed by the forecasts. All three categorical statistics range between 0 and 1. POD and FOH values of 1.0 and FOM = 0 indicate perfect forecast skill (Haile et al. 2016).
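The contingency classification and the three scores can be sketched as follows. The score formulas follow the verbal definitions given above (hits over observed events, hits over forecasted events, misses over observed events); the function names, the threshold and the short series are hypothetical.

```python
def contingency(observed, forecast, threshold):
    """Count hits, misses, false alarms and non-events against one threshold."""
    counts = {"HIT": 0, "MISS": 0, "FA": 0, "NE": 0}
    for obs, fc in zip(observed, forecast):
        obs_flood = obs > threshold
        fc_flood = fc > threshold
        if obs_flood and fc_flood:
            counts["HIT"] += 1
        elif obs_flood:
            counts["MISS"] += 1
        elif fc_flood:
            counts["FA"] += 1
        else:
            counts["NE"] += 1
    return counts


def skill_scores(counts):
    """POD, FOH and FOM; a score is None when its denominator is zero."""
    hit, miss, fa = counts["HIT"], counts["MISS"], counts["FA"]
    observed_events = hit + miss
    forecast_events = hit + fa
    return {
        "POD": hit / observed_events if observed_events else None,
        "FOH": hit / forecast_events if forecast_events else None,
        "FOM": miss / observed_events if observed_events else None,
    }


# Hypothetical flows in m3/s and a hypothetical threshold of 10 m3/s.
observed = [12.0, 15.0, 8.0, 9.0, 14.0, 7.0]
forecast = [13.0, 9.0, 11.0, 6.0, 16.0, 5.0]
counts = contingency(observed, forecast, threshold=10.0)
scores = skill_scores(counts)
print(counts, scores)
```

Running the same two functions once per lead time and per threshold would produce the entries of a contingency table and a skill score table of the kind described here.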

Rainfall data set comparison and bias correction
In situ measurements and GPM satellite-based rainfall estimates for a period of 5 years (2014-2018) were used in this study. Both data sets were compared to determine the accuracy of GPM satellite-based rainfall products in relation to in situ measurements. The comparison is based on statistical and graphical analysis which includes the calculation of statistical estimator such as the bias, the mean absolute error (MAE), the root-mean-square error (RMSE) and the coefficient of correlation (r) as shown in Table 7.
Normally, the ideal MAE value should be 0 or close to 0, and the bias should be 1 or close to 1. Given the results shown in Table 7, the MAE of 0.22 and the bias of 0.77 indicate a good performance of the bias correction. The accumulated and random errors were estimated using the RMSE and the correlation coefficient (r) to evaluate the agreement between the satellite-based and gauge station rainfall data. The RMSE was found to be 1.11, and the correlation coefficient (r) was 0.66. The RMSE ranges from 0 to +∞, and higher values of RMSE signify less agreement with the reference data set. The correlation coefficient, on the other hand, ranges from −1 to +1, with −1 indicating a perfect inverse relationship and +1 indicating a perfect direct relationship.
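The four comparison statistics can be computed as in the sketch below. The bias is taken here as the ratio of total satellite rainfall to total gauge rainfall, which is consistent with an ideal value of 1 but is an assumption, since the exact formula is not spelled out in the text. The function name and sample values are hypothetical.

```python
import math


def rainfall_agreement(satellite, gauge):
    """Bias ratio, MAE, RMSE and Pearson r between satellite and gauge rainfall."""
    n = len(gauge)
    errors = [s - g for s, g in zip(satellite, gauge)]
    bias = sum(satellite) / sum(gauge)          # assumed ratio form, ideal value 1
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_s = sum(satellite) / n
    mean_g = sum(gauge) / n
    cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(satellite, gauge))
    var_s = sum((s - mean_s) ** 2 for s in satellite)
    var_g = sum((g - mean_g) ** 2 for g in gauge)
    r = cov / math.sqrt(var_s * var_g)
    return {"bias": bias, "mae": mae, "rmse": rmse, "r": r}


# Hypothetical hourly rainfall in mm, for illustration only.
gauge = [0.0, 1.0, 2.0, 3.0, 4.0]
satellite = [0.0, 0.5, 2.5, 2.5, 4.5]
stats = rainfall_agreement(satellite, gauge)
print({name: round(value, 3) for name, value in stats.items()})
```

In this example the totals match (bias of 1) even though individual hours differ, which is why the bias is read alongside MAE, RMSE and r rather than on its own.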
In this study, the correlation coefficient between satellite and station rainfall data was found to be 0.66, which indicates a moderate direct relationship. The bias correction applied to the GPM satellite-based rainfall product improved the ability of the product to represent the observed rainfall amount. Figure 5 shows three rainfall data sets: gauge, satellite only and bias-corrected satellite data. The gauge data has a point scale, whereas the satellite data has a grid resolution of 0.1° × 0.1°.

HBV model calibration and validation
The bias-corrected IMERG data was used in the model. This data was divided into three groups. The data from 12 March 2014 at 01:00 to 31 December 2014 at 23:00 was used for warming up the model, the data from 1 January 2015 at 00:00 to 31 December 2017 at 23:00 was used for calibration and the data from 1 January 2018 at 00:00 to 31 December 2018 at 23:00 was used for validating the calibrated model. The calibration target is to have NSE values close to 1 and RVE close to 0. The NSE was 0.52 for calibration and 0.69 for validation, and the RVE was −8.67 for calibration and −5.89 for validation. Both are within acceptable limits though on the lower side of the limit. All of the values are within the allowable range of parameter values. The results of the simulated and observed hourly streamflow hydrographs after calibration and validation are presented in Figures 6 and 7.

Parameter sensitivity analysis
The sensitivity of each model parameter was analyzed based on NSE and RVE objective functions using graphical plots for visualization (Figures 8 and 9). The parameters with steeper slopes are therefore considered to be more sensitive compared to others with a gentle slope.

Graphical evaluation of forecast skill
The HBV model was used to issue 199-h flood forecasts from 2 April 2018 at 18:00 to 19 April 2018 at 06:00 and from 19 October 2018 at 00:00 to 23 October 2018 at 12:00. The forecasts were made using the ECMWF precipitation forecast data. Figures 10-16 show the graphical evaluation of observed and forecasted streamflow using ECMWF rainfall data for the attention, warning and alarm thresholds for the 2018 flood at the Sebeya River/Nyundo station.

Evaluation of forecast skill using categorical statistic
The forecast skill for 199 h for a 3-h lead time (Table 9) (from 2 April 2018 at 18:00 to 19 April 2018 at 06:00 and from 19 October 2018 at 0:00 to 23 October 2018 at 12:00) for medium threshold was assessed in terms of categorical statistics as presented in the contingency table (Table 6). For 3-h lead time, 144 h out of 199 h were hit hours, as both the observed and ECMWF-based forecasted streamflow were above the specified thresholds. There were 15 h of false alarms which means that flood was wrongly forecasted. There were 19 h misses, meaning that the flood occurred but the model did not forecast them. Table 10 shows the result of the forecast skill score for 199 h (from 2 April 2018 at 18:00 to 19 April 2018 at 06:00 and from 19 October 2018 at 0:00 to 23 October 2018 at 12:00) for medium threshold which was analyzed using appropriate skill score measurement.
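As a quick arithmetic check on the 3-h lead-time counts quoted above, the shares of the 199 forecast hours can be worked out directly. The score values in the second line are an assumption: they apply the POD, FOH and FOM definitions as described in the text (hits over observed events, hits over forecasted events, misses over observed events) and are not copied from Table 10.

$$
\begin{aligned}
\text{hits} &= \frac{144}{199} \approx 72.4\%, \quad \text{false alarms} = \frac{15}{199} \approx 7.5\%, \quad \text{misses} = \frac{19}{199} \approx 9.5\%, \quad \text{non-events} = 199 - 144 - 15 - 19 = 21 \\
\mathrm{POD} &= \frac{144}{144 + 19} \approx 0.88, \quad \mathrm{FOH} = \frac{144}{144 + 15} \approx 0.91, \quad \mathrm{FOM} = \frac{19}{144 + 19} \approx 0.12
\end{aligned}
$$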
The results of the categorical statistics of the flood forecasts using ECMWF rainfall are presented in Figure 17. The POD, the FOH and the FOM range between 0 and 1. A POD or FOH value equal or close to 1 indicates the best forecast skill, whereas an FOM value equal or close to 0 indicates the best forecast skill.

Rainfall data comparison, bias correction, model calibration and sensitivity analysis
The satellite data underestimated the observed rainfall in the catchment. The underestimation is noticeably large, suggesting that the satellite product cannot be used without bias correction. This agrees with the observation of Maurycy et al. (2019). However, this systematic error was successfully removed after bias correction (Laiolo et al. 2015; Zhan et al. 2015). Figure 6 shows the hourly streamflow hydrograph after calibration, which captures the base flow, the timing of peak flows and the pattern of the hydrograph. There are some disagreements between the observed and the simulated hydrographs at some peaks. However, visual inspection shows satisfactory performance of the model. This result agrees with the observation of Birundu & Mutua (2017) in their study of the Mara River Basin, which cuts across Kenya and Tanzania. Figure 7 shows that the overall pattern of the hydrograph was captured, although the variation in the observed hydrograph is not well reflected in the model result. The base flow is overestimated over the larger part of the hydrograph during the validation period, and only a small portion is underestimated. The parameters Beta and Khq are the most sensitive HBV model parameters. Khq controls the recession of the upper response box. Higher values of Khq result in a good simulation of the hydrograph peak, whereas with a low value the simulated hydrograph became linear. The Beta parameter influences the total volume: if Beta increases, the total volume decreases. Many studies have found that Beta, Khq and FC are the most sensitive parameters in the HBV model (Abebe et al. 2010; Zelelew & Alfredsen 2013; Parra et al. 2018; Ouatiki et al. 2020).

Outcome of the graphical evaluation of forecast skill
Figure 10 shows the streamflow forecasts issued at 18:00 on 2 April 2018 for lead times ranging from 3 to 24 h. The observed streamflow exceeded the attention threshold for almost all lead times, and so did the forecasted streamflow. This suggests that ECMWF-based flood forecasting was correct to issue a flood attention for this flood event. The observed streamflow was above the warning threshold for all lead times. However, the forecasted streamflow exceeded the warning threshold only from 6 h after the forecast was issued, which indicates that the forecast failed to trigger a warning at the 3-h lead time. The forecasted streamflow exceeded the alarm threshold only at the 21 and 24-h lead times, although the observed streamflow exceeded the alarm level for almost all lead times. This shows that the forecast was unable to raise the alarm for an extremely severe flood occurrence. According to the observed streamflow, the severe extreme flood event occurred for almost all lead times, as the observed streamflow was above the attention, warning and alarm thresholds; however, the ECMWF-based forecast started to indicate an extreme flood at the 6-h lead time and indicated the severe extreme flood only at the 21 and 24-h lead times, indicating that the model forced with the ECMWF forecast did not capture the peak flow very well. The model forced with the ECMWF forecast underestimated the streamflow magnitude relative to the observed streamflow for all lead times. Figure 11 shows streamflow forecasts which were issued at 18:00 on 3 April 2018; this date refers to the 3-h lead time, while the subsequent date refers to the 3-24-h lead time forecast. Both the observed streamflow and the ECMWF forecasted streamflow were above the attention threshold, warning threshold and alarm threshold for all lead times.
This indicates that the model was able to forecast the extreme flood and it performed very well in issuing attention, warning and alarm for all lead times. This result agrees with the study reported by Zanchetta & Coulibaly (2020) where a critical success index was used to detect the occurrence of real flood events. Figure 12 shows streamflow forecasts which were issued at 3:00 on 11 April 2018 for a range of lead times. The observed streamflow exceeded the attention threshold but was below the warning and alarm thresholds for all lead times, while the ECMWF forecasted streamflow was below the attention, warning and alarm thresholds for all lead times. In terms of streamflow magnitude, the ECMWF forecasted streamflow again underestimated the flood events compared to observed streamflow for all lead times. Figure 13 shows streamflow forecasts which were issued at 21:00 on 19 April 2018. The observed streamflow exceeded the attention threshold only for 9 and 21-h lead times, while the model using the ECMWF exceeded the attention threshold for all lead times. This suggests that there is some mismatch between the observed and forecasted floods. In terms of streamflow magnitude, the streamflow forecasted by the ECMWF overestimated the observed streamflow for 3-18-h lead times but underestimated for 21 and 24-h lead times. Figure 14 shows the streamflow forecasts issued at 6:00 on 19 October 2018 for all the lead times. The observed streamflow exceeded the attention and warning thresholds for all lead times. The forecasted streamflow also exceeded the attention and warning thresholds for all lead times. This shows that ECMWF-based flood forecasting was correct to issue a flood attention and warning for this event. In terms of streamflow magnitude, the observed streamflow and the streamflow forecasted by the model using the ECMWF were almost close for all lead times. However, the forecasted streamflow was underestimated compared to the observed streamflow. 
Figure 15 shows the streamflow forecasts issued at 9:00 on 21 October for a range of lead times. Both the observed streamflow and the ECMWF forecasted streamflow were above the attention threshold but below the warning and alarm thresholds for all lead times. This suggests that the ECMWF was able to forecast moderate flood events that occurred. For streamflow magnitude, both the observed streamflow and the ECMWF forecasted streamflow at 3-6-h lead times were closer, but for the remaining lead times, the forecasted streamflow was overestimated compared to the observed streamflow. Figure 16 shows the streamflow forecasts issued at 12:00 on 22 October 2018 for all lead times. The observed streamflow exceeded the attention threshold for all lead times but the forecasted streamflow was above the attention threshold from 3 to 15-h lead times and it was below the attention threshold for lead times of 18-24 h. This indicates that the model using the ECMWF was able to issue moderate flood events for 3-15 h lead times and that it failed to issue moderate flood events for 18-24-h lead times. In terms of streamflow magnitude, the observed streamflow and the forecast streamflow were close from 3 to 15-h lead times, then from 18 to 24-h lead times, the forecast streamflow was underestimated compared to the observed streamflow.
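The threshold comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify`, the streamflow values and the attention threshold are hypothetical and are not taken from the study. They are chosen to mimic the pattern of Figure 16, where the forecast exceeds the attention threshold at the 3-15-h lead times but not at 18-24 h.

```python
from collections import Counter


def classify(observed, forecast, threshold):
    """Return the contingency category for one observed/forecast pair."""
    observed_exceeds = observed > threshold
    forecast_exceeds = forecast > threshold
    if observed_exceeds and forecast_exceeds:
        return "hit"
    if observed_exceeds:
        return "miss"          # flood observed but not forecasted
    if forecast_exceeds:
        return "false_alarm"   # flood forecasted but not observed
    return "no_event"


# Hypothetical values in m3/s, for illustration only.
lead_times_h = [3, 6, 9, 12, 15, 18, 21, 24]
observed_m3s = [14.0, 15.5, 16.0, 15.0, 14.5, 13.5, 13.0, 12.5]
forecast_m3s = [13.5, 14.8, 15.2, 14.0, 12.6, 11.0, 10.5, 10.0]
attention_threshold_m3s = 12.0  # assumed threshold, not the study's value

categories = [
    classify(obs, fc, attention_threshold_m3s)
    for obs, fc in zip(observed_m3s, forecast_m3s)
]
counts = Counter(categories)

for lead_time, category in zip(lead_times_h, categories):
    print(f"{lead_time:>2} h: {category}")
```

Repeating the same classification for the warning and alarm thresholds, and for every forecast issue time, would yield the counts of hits, misses, false alarms and no-events that feed the categorical statistics.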

Forecast skill using categorical statistic
The number of hits decreased as the lead time increased, showing that the skill of the ECMWF-based flood forecasts declines with lead time (Table 9). There were 21 no-events, in which both the observed and the ECMWF-based forecasted streamflow were below the specified thresholds. The POD values decreased as the lead time increased (Table 10), whereas the FOH values increased slightly and the FOM values increased significantly with lead time. POD values are highly sensitive to the numbers of hits and misses, while FOH values are sensitive to wrongly detected flood events (false alarms). Figure 17 shows the same pattern: from the 3-h to the 24-h lead time, POD decreases, while FOH and FOM increase slightly and significantly, respectively. This result agrees with the findings of Martina et al. (2006).

CONCLUSION
Setting up a viable flood early warning system for communities at risk requires the combination of meteorological and hydrological data as well as forecast tools. Predictions must be reasonably accurate to promote confidence, so that communities and users take effective action in response to a warning. The present study examined the potential of the HBV model for flood forecasting in the Sebeya catchment using ECMWF rainfall data. The semi-distributed HBV model was able to simulate the hydrograph at the Sebeya River/Mahoko station, with NSE values of 0.52 for the calibration and 0.69 for the validation data set. The timing of peaks was simulated, although some peaks were underestimated and others overestimated. During the validation period, the HBV model performance was very good for some months and poor for others. Applying bias correction was important for improving the performance of the model. This suggests that additional rainfall data from other rain gauges within the basin would potentially improve the model performance.
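For reference, the NSE quoted above is conventionally computed as one minus the ratio of the squared simulation errors to the variance of the observations about their mean. The sketch below uses this standard definition; the function name and the sample values are hypothetical and do not reproduce the study's data or its own implementation.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    observed_variance = sum((o - mean_observed) ** 2 for o in observed)
    if observed_variance == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - squared_error / observed_variance


# Hypothetical streamflow values (m3/s), for illustration only.
observed_flow = [2.0, 4.0, 6.0, 8.0]
simulated_flow = [3.0, 4.0, 5.0, 8.0]
nse = nash_sutcliffe(observed_flow, simulated_flow)
print(f"NSE = {nse:.2f}")
```

A simulation that always returns the observed mean scores 0 under this definition, so the reported values of 0.52 and 0.69 indicate that the model explains a substantial share of the observed variability, but not all of it.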
Three flood threshold values (medium, severe and extremely severe) were fixed based on the exceedance probability of the observed and modeled streamflow time series at the Nyundo gauging station. The graphical evaluation revealed that the flood forecasts based on the ECMWF rainfall were slightly more accurate.
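One way to derive thresholds from exceedance probabilities is to read them off the empirical distribution of the streamflow series. The sketch below is an assumption-laden illustration: the exceedance percentages (10, 5 and 1%), the function name and the synthetic series are hypothetical, and the study's actual probabilities and procedure may differ.

```python
import statistics


def exceedance_threshold(flows, exceedance_percent):
    """Flow that is exceeded roughly `exceedance_percent` % of the time."""
    if not 1 <= exceedance_percent <= 99:
        raise ValueError("exceedance_percent must be an integer from 1 to 99")
    # statistics.quantiles returns the 99 percentile cut points for n=100;
    # the flow exceeded p % of the time is the (100 - p)th percentile.
    percentiles = statistics.quantiles(flows, n=100, method="exclusive")
    return percentiles[(100 - exceedance_percent) - 1]


# Synthetic series (m3/s) standing in for a streamflow record.
synthetic_flows = list(range(1, 100))

# Hypothetical exceedance percentages for the three flood levels.
thresholds = {
    "attention": exceedance_threshold(synthetic_flows, 10),
    "warning": exceedance_threshold(synthetic_flows, 5),
    "alarm": exceedance_threshold(synthetic_flows, 1),
}
print(thresholds)
```

Lower exceedance probabilities give higher thresholds, so the attention, warning and alarm levels are ordered by construction.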

Categorical statistics indicated that the POD was high for a short lead time, showing that a short lead time forecast gives a higher skill score than forecasts for a longer lead time.
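The categorical scores can be computed directly from the contingency counts. The sketch below assumes the commonly used definitions POD = hits / (hits + misses), FOH = hits / (hits + false alarms), FOM = misses / (hits + misses) and CSI = hits / (hits + misses + false alarms); these are assumed here and should be checked against the definitions given in the methods. The counts are hypothetical and are not those of Table 9.

```python
import math


def ratio(numerator, denominator):
    """Safe division that returns NaN when the score is undefined."""
    return numerator / denominator if denominator else math.nan


def categorical_scores(hits, misses, false_alarms):
    """Return POD, FOH, FOM and CSI under the assumed definitions."""
    return {
        "POD": ratio(hits, hits + misses),
        "FOH": ratio(hits, hits + false_alarms),
        "FOM": ratio(misses, hits + misses),
        "CSI": ratio(hits, hits + misses + false_alarms),
    }


# Hypothetical counts for a single lead time, for illustration only.
scores = categorical_scores(hits=30, misses=10, false_alarms=10)
for name, value in scores.items():
    print(f"{name} = {value:.2f}")
```

Under these definitions FOM equals 1 - POD, so a falling POD at longer lead times implies a rising FOM.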
Finally, our study demonstrated that the forecasted rainfall product from the ECMWF performed well in forecasting flood occurrence. The forecast performance is high for moderate floods but declines as flood severity increases. This product can be used in an early warning system to support community awareness and subsequent preparedness for flood events. A major limitation is that the study did not consider the uncertainties in the observed data, which can influence model performance and will need to be incorporated into future studies. Future research should test the performance of the ECMWF product in other flood-prone catchments and use more than one QPF from different sources to evaluate their performance.

DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.