## Abstract

Flooding is the most frequent natural disaster, yet advanced forecasting systems are lacking in developing countries. The majority of urban areas are located close to river flood plains, so accurate flood forecasting is necessary for reservoir planning and flood management. An atmospheric-hydrologic ensemble flood forecasting model has been developed for the Sabarmati River using TIGGE data. Precipitation can be reliably predicted by TIGGE's global ensemble numerical weather prediction (NWP) systems, and using NWP data allows flood forecast lead times to be extended from hours to days. Ensemble weather forecasts from the European Centre for Medium-Range Weather Forecasts and the National Centers for Environmental Prediction, with 5-day lead times, were obtained from TIGGE. The flood events of 2015, 2017, and 2020 were used for the calibration and validation of the ensemble flood forecasting model. Bias was corrected using Bayesian model averaging (BMA), heterogeneous extended linear regression, censored non-homogeneous linear regression (cNLR), and other statistical downscaling techniques. Forecasted and downscaled precipitation data were verified using the Brier score and the rank probability score. cNLR performed well on the Brier score. According to receiver operating characteristic and area under the curve diagrams, the specificity vs. sensitivity performance of the cNLR and BMA approaches is 91.87 and 91.82%, respectively. Models with the hybrid hydrologic coupling approach accurately predict floods. Using the peak time and flood warning probability distributions, users can reliably estimate the peak time and the likelihood of a peak discharge hazard.

## HIGHLIGHTS

A novel ensemble approach was proposed to predict reservoir inflow using the complex and dynamic rainfall–runoff model.

Spatial, hydrological, and meteorological data were used as input for the semi-distributed hydrological model.

Ensemble data were post-processed and the runoff results of the semi-distributed models were combined to boost the overall efficiency.

Flood forecasting and warning were improved based on the ensemble model.

## INTRODUCTION

India's economy and agricultural productivity are heavily influenced by rainfall, with the country experiencing varying patterns, from arid desert regions to areas with heavy rainfall (Topno *et al.* 2015; Al-zahrani *et al.* 2017). Rainfall–runoff occurs whenever rain intensity exceeds the infiltration capacity of the soil, provided there are no physical obstructions to surface flow (Asadi & Boustani 2013). It is generated by rainstorms, and its occurrence and quantity depend on the characteristics of the rainfall event, i.e., intensity, duration, and distribution. Runoff can come from both natural processes (e.g., snowmelt) and human activity. It plays an important role in the hydrological cycle by returning excess precipitation to the oceans and controlling how much water flows into stream systems.

Rainfall–runoff modelling is an essential technique used in hydrology to simulate the runoff generated by rainfall in a watershed (Bond *et al.* 2018). The modelling process involves the application of mathematical models to quantify the relationship between rainfall and the resulting runoff in a watershed. GIS (Geographic Information System) and HEC-GeoHMS (Hydrologic Engineering Centre Geospatial Hydrologic Modelling System) are two tools that are commonly used in rainfall–runoff modelling (Burn *et al.* 1991). The hydrological cycle is a critical natural process that helps regulate the Earth's climate, supports the growth of plants and animals, and provides freshwater for human use.

According to Brath *et al.* (1998), quantitative rainfall forecasting is crucial for improving the timeliness of flood control measures by increasing the lead time of flow forecasts. Therefore, authorities in charge of reservoir management can better serve the aforementioned ends with the aid of reliable reservoir inflow projections made with sufficient lead time. Hourly reservoir inflow estimates were generated using a rainfall–runoff model, and ensemble precipitation forecasts were employed to increase lead time. The findings provide a useful benchmark for reservoir managers to optimise flood defences while storing water for the dry season (Chatterjee *et al.* 2001; Champathong *et al.* 2013). In order to increase the lead time for rain forecasts, numerical weather predictions (NWPs) were implemented. Instead of relying on a single deterministic model, several NWPs now employ an ensemble approach to rain forecasting (Cloke & Pappenberger 2009; Ningaraju *et al.* 2016). Since 2004, the Hydrologic Ensemble Prediction Experiment (HEPEX; www.hepex.org) initiative has provided a collaborative context for progress by bringing together researchers, forecasters, and forecast user communities from around the world to work on improving ensemble streamflow predictions and demonstrating their utility in water management decision-making (Du *et al.* 2012; Ehsani *et al.* 2017). The *Handbook of Hydrometeorological Ensemble Forecasting* was recently published by HEPEX participants (Duan *et al.* 2019). This reference work describes the fundamental theories and specific steps for developing ensemble forecasting components, as well as operational ensemble forecast applications and case studies, in the field of hydrometeorology.

This study aims to develop an atmospheric-hydrologic flood forecasting model using TIGGE data for advanced flood prediction and computer simulation of rainfall–runoff processes. Forecasts from the TIGGE database, produced by the European Centre for Medium-Range Weather Forecasts (ECMWF) and National Centers for Environmental Prediction (NCEP) ensemble numerical models, are evaluated across the Sabarmati basin. The capabilities of the models were evaluated for the years 2010–2020 using deterministic and probabilistic evaluation strategies. A rainfall–runoff model was employed to predict hourly reservoir inflows, and ensemble precipitation forecasts were applied to increase lead time. The findings provide a useful guide for reservoir managers to balance flood protection and water storage during dry seasons. To extend the lead time of precipitation forecasts, meteorologists have turned to NWP models. Instead of using a single deterministic model, many NWPs employ the ensemble technique to capture rain forecasting uncertainties. The probabilistic forecasting concept and the ensemble statistical method are used to examine the rainfall pattern. Reservoir inflows were generated in this study using a rainfall–runoff model with an initial condition based on TIGGE ensemble rainfall forecasts and real-time rainfall gauge data. This combination of ensemble rainfall projections and real-time rain gauge data makes the approach useful for forecasting reservoir inflow, which depends strongly on the amount of rain that has fallen. Reservoir inflow simulations were carried out using the hydrologic model, a semi-distributed runoff model based on the Soil Conservation Service curve number (SCS-CN) method that takes into account a wide range of geomorphologic and hydrological watershed characteristics. The purpose of this research is to make reliable predictions of reservoir inflows during the monsoon season by combining an ensemble precipitation forecast with a hydrological rainfall–runoff model.

This research study's main goal is to improve flood forecasting's precision and efficacy for the Dharoi Dam area, which is situated in an area subject to seasonal floods. To capture a wider range of meteorological and hydrological elements that affect flood events, this calls for the integration of ensemble models, which harness the collective intelligence of several forecasting models. By making use of the advantages of the ensemble approach, we aim to reduce the shortcomings of individual models and boost the accuracy of flood predictions.

In order to develop an ensemble of models that perfectly matches the parameters of the Dharoi Dam catchment area, we first strive to choose an appropriate ensemble modelling approach, taking into account elements like model diversity and competence. Second, we intend to create a hydrological model that can accurately anticipate how the watershed will react to precipitation and other pertinent meteorological inputs. The hybrid forecasting system will be extremely dependent on this hydrological model.

Our goals also include calibrating and validating the hybrid forecasting system to guarantee its dependability and accuracy by rigorously evaluating it against previous flood disasters. In order to achieve lead periods that give local governments and impacted communities enough time to prepare and respond, we want to evaluate the system's performance in real-time flood forecasting and warning generation. We will also investigate the system's scalability and adaptability, which make it a valuable tool for enhancing flood resilience not only in the vicinity of the Dharoi Dam but also in other vulnerable places.

In conclusion, the main goal of this study is to create a cutting-edge flood forecasting and warning system using a hybrid ensemble and hydrological modelling approach, with the aim of improving accuracy, reliability, and lead time in flood predictions for the area around the Dharoi Dam. By attaining these goals, we hope to reduce the dangers associated with flooding and protect people and property in flood-prone areas.

## STUDY AREA

In this research, the Sabarmati River basin is the study area. The Sabarmati basin is a river basin located in the western part of India, primarily in the state of Gujarat. The basin covers an area of about 21,674 km² and includes the Sabarmati River and its tributaries. The Sabarmati River originates in the Aravalli Range near Udaipur in Rajasthan and flows westward through Gujarat before emptying into the Arabian Sea near the city of Ahmedabad. The total length of the river is about 371 km, of which 216 km are in Gujarat (Patel 2020). The basin includes several major tributaries of the Sabarmati River, including the Wakal River, Meshwo River, and Shedhi River. The basin also includes several important reservoirs, including the Dharoi Dam and the Sabarmati Riverfront Development Project.


The region has a rugged topography and is characterised by the Aravalli Range, which is a major source of water for the river. The basin covers an area of approximately 4,208 km², and the river has a total length of about 371 km (Patel & Yadav 2022). The Upper Sabarmati River Basin is an important agricultural region, with a large number of villages and towns located along the river. The main crops grown in the region include cotton, wheat, and groundnut, and there are several large irrigation projects, such as the Hathmati and Dharoi dams, that provide water for farming and other uses. The region is also home to several major cities, including Udaipur, Himmatnagar, Gandhinagar, and Ahmedabad.

## METHODOLOGY

Hydrological modelling plays a crucial role in water resource management. It enables water managers to predict future water availability, plan water allocation and distribution, and design water infrastructure. To develop hydrological models, a range of approaches can be used, including empirical, conceptual, and physically based methods (Khaddor & Hafidi 2014; Myo Lin *et al.* 2018). Each approach has its strengths and weaknesses, and selecting the appropriate method depends on the study objectives and data availability. In recent years, hybrid ensemble methods have gained popularity due to their ability to combine different modelling approaches and improve model performance (Heinke & Gerten 2011; Jain *et al.* 2022). This research work focuses on the development of a hybrid ensemble hydrological model using RStudio and the Hydrologic Engineering Centre's Hydrologic Modelling System (HEC-HMS). RStudio is a widely used environment for statistical data analysis and modelling in R, while HEC-HMS is a widely used hydrological modelling software. The hybrid ensemble approach involves the integration of multiple models, such as conceptual and data-driven models, to leverage their respective strengths and produce more accurate predictions (Nanditha & Mishra 2021; Niu *et al.* 2022).

Hydrological modelling involves the simulation of water cycle processes such as precipitation, evaporation, infiltration, and runoff. These processes are represented using mathematical equations and parameters that describe the physical properties of the catchment (Karbowski *et al.* 2005; Sudheer *et al.* 2019). The accuracy of hydrological models depends on the quality of the input data and the appropriateness of the model structure and parameters. In practice, it is often challenging to obtain reliable input data and determine appropriate model parameters, which can result in significant uncertainty in model predictions.

*et al.* 2006; Dwivedi & Tripathi 2020). They are useful for catchments with limited data and for predicting long-term water availability. Data-driven models, on the other hand, use statistical techniques to relate input and output variables and do not require *a priori* knowledge of the catchment's physical properties (Khaddor *et al.* 2015; Natarajan & Radhakrishnan 2021). They are useful for catchments with abundant data and for predicting short-term water availability. To develop a hybrid ensemble hydrological model, the first step is to select the modelling approaches to be included in the ensemble. In this case, the use of HEC-HMS software to develop a semi-distributed model and the ensemble model to develop a data-driven model is shown in Figure 2. Integration of the two models is required to develop a hybrid ensemble model.

In order to attain precise flood forecasting, an atmospheric-hydrologic ensemble flood forecasting model was initially developed for the Sabarmati River basin. The TIGGE (THORPEX Interactive Grand Global Ensemble) dataset was utilised for this purpose. This dataset provides global ensemble outputs from NWP systems. By leveraging the combined capabilities of multiple NWP models, the TIGGE data enabled precipitation, a crucial factor in flood prediction, to be forecast accurately. Integrating NWP data into the methodology extended the forecast timeframe from a limited number of hours to multiple days, which is a significant benefit for flood management and reservoir planning.

The ensemble flood forecasting model was calibrated and validated using historical flood data from significant flood events that occurred in 2015, 2017, and 2020. In order to enhance the accuracy of the model, several bias correction techniques were employed, such as Bayesian model averaging (BMA), heterogeneous extended linear regression (HXLR), and censored non-homogeneous linear regression (cNLR), among other methods. The application of statistical downscaling methods has facilitated the improvement of our forecasted and downscaled precipitation data.

In order to assess the effectiveness of these techniques, we utilised metrics including the Brier score and rank probability score. The cNLR approach demonstrated outstanding performance in terms of Brier's score. In addition, we conducted an evaluation of the specificity and sensitivity performance of the cNLR and BMA approaches using receiver operating characteristic (ROC) and area under the curve (AUC) analyses. The results of these analyses indicate that both approaches exhibit a high level of reliability in predicting floods.
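As a minimal illustration of the ROC-based evaluation described above, the following Python sketch computes sensitivity and specificity at a single probability threshold, and the AUC as the fraction of event/non-event pairs that the forecast ranks correctly. The function names, the sample probabilities, and the 0.5 threshold are assumptions made for illustration only; the study performed its calculations in RStudio, and this is not the authors' code.

```python
def sensitivity_specificity(probs, outcomes, threshold):
    """Sensitivity (hit rate) and specificity at one probability threshold."""
    pairs = list(zip(probs, outcomes))
    tp = sum(1 for p, o in pairs if o == 1 and p >= threshold)
    fn = sum(1 for p, o in pairs if o == 1 and p < threshold)
    tn = sum(1 for p, o in pairs if o == 0 and p < threshold)
    fp = sum(1 for p, o in pairs if o == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(probs, outcomes):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [p for p, o in zip(probs, outcomes) if o == 1]
    neg = [p for p, o in zip(probs, outcomes) if o == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical forecast probabilities and observed flood occurrences (1 = event)
example_probs = [0.9, 0.8, 0.3, 0.2]
example_outcomes = [1, 0, 1, 0]
print(sensitivity_specificity(example_probs, example_outcomes, 0.5))
print(auc(example_probs, example_outcomes))
```

Both functions assume that the sample contains at least one event and one non-event; otherwise the ratios are undefined.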

The hybrid hydrologic coupling approach employed in our study is a significant innovation. It effectively integrates both atmospheric and hydrological models, enhancing the overall methodology. The coupling mechanism enables precise prediction of not only flood events but also critical variables such as peak time and likelihood of peak discharge hazard. Our system enhances decision-making in flood management and emergency response by providing users with peak time and flood warning probability distributions. This enables users to make informed decisions based on the available data.

In summary, our research effectively integrated ensemble weather forecasting models, hydrological modelling techniques, and advanced statistical downscaling to create a sophisticated flood forecasting and warning system specifically designed for the Dharoi Dam region. The methodology presented not only tackles the urgent requirement for enhanced flood management in susceptible urban regions but also provides a dependable tool for reservoir planning and disaster preparedness in light of the rising occurrence of flooding events.

## HYDROLOGICAL MODEL

A hydrological model is a mathematical representation of the water cycle and associated processes in a given watershed or catchment area. It is used to simulate the movement and distribution of water in the hydrological system, including precipitation, evaporation, infiltration, runoff, and groundwater flow. Hydrological models are used to estimate water availability, predict floods and droughts, assess the impact of land use and climate change on water resources, and support water resource management and planning (Patel *et al.* 2023a, 2023b). They can be used for both small and large watersheds and can be tailored to specific hydrological conditions and objectives.

There are different types of hydrological models, including conceptual, empirical, and physically based models. Conceptual models are based on simplified equations that describe the hydrological processes, while empirical models are based on statistical relationships between input and output variables (Sahu *et al.* 2020a, 2020b). Physically based models are based on the fundamental principles of physics and mechanics, and they simulate hydrological processes using detailed equations and parameters (Gajbhiye & Mishra 2012).

Hydrological models require data on climate, topography, land use, soil properties, and other relevant factors in the watershed. The accuracy and reliability of the models depend on the quality and quantity of the data used and the appropriate choice of model type and parameters.

### SCS-CN method

The SCS-CN method is a widely used hydrological model for estimating runoff from rainfall events. It is a simple and effective method for predicting the volume of runoff from a given rainfall event and is often used for water management and conservation purposes (Shaikh *et al.* 2018).

The SCS-CN method is based on the concept of the curve number, which is a measure of the potential for runoff from a given area. The curve number is calculated based on several factors, including land use, soil type, and hydrological conditions. The SCS-CN method uses this curve number to estimate the volume of runoff from a rainfall event.

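The method combines the water balance equation with a proportionality hypothesis and a linear relation between the initial abstraction and the maximum retention:

$$P = I_a + F + Q$$

$$\frac{Q}{P - I_a} = \frac{F}{S}$$

$$I_a = \lambda S$$

where $P$ is daily precipitation, $I_a$ is initial abstraction, $F$ is actual retention, $Q$ is direct surface runoff, $S$ is maximum retention potential, and $\lambda$ is the initial abstraction coefficient. Combining the first two equations, the popular SCS-CN equation is given as follows:

$$Q = \frac{(P - I_a)^2}{P - I_a + S}, \quad P > I_a$$

with $Q = 0$ otherwise, where $S$ is a function of CN and can be computed using the following equation:

$$S = \frac{25400}{\mathrm{CN}} - 254$$

where $S$ is measured in millimetres and CN is a dimensionless runoff coefficient that is affected by the type of soil, the use of the land, and the antecedent moisture conditions.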
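The SCS-CN calculation can be sketched in a few lines of Python. This is a minimal illustration, not the HEC-HMS implementation: the function name is hypothetical, and the default initial abstraction coefficient of 0.2 is the conventional value, assumed here because the text does not state which value was used.

```python
def scs_cn_runoff(p_mm: float, cn: float, lam: float = 0.2) -> float:
    """Direct surface runoff Q (mm) for rainfall P (mm) and curve number CN.

    lam is the initial abstraction coefficient (0.2 is an assumed default).
    """
    if not 0 < cn <= 100:
        raise ValueError("CN must lie in (0, 100]")
    s = 25400.0 / cn - 254.0   # maximum potential retention S, in mm
    ia = lam * s               # initial abstraction Ia = lambda * S
    if p_mm <= ia:
        return 0.0             # all rainfall is abstracted, so no runoff
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


# Hypothetical event: 50 mm of rain on a catchment with CN = 80
print(round(scs_cn_runoff(50.0, 80.0), 2))
```

A fully impervious surface (CN = 100) gives S = 0, so all rainfall becomes runoff, which is a quick sanity check on the formula.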

HEC-HMS is a powerful software package developed by the US Army Corps of Engineers for simulating hydrological processes in watersheds (Wang *et al.* 2016). It is a widely used and highly regarded hydrological model that can simulate various hydrological processes, such as precipitation, infiltration, evapotranspiration, snowmelt, and routing. One of the key features of HEC-HMS is its ability to model complex watersheds and river basins. The software allows users to create detailed models that accurately represent the physical characteristics of the watershed such as topography, land use, and soil type. It can also simulate the effects of man-made structures, such as dams and reservoirs, on the hydrological processes within the basin.

*et al.* 2018). Figure 3 shows the hydrological model flowchart. In this model, rainfall is the variable, while other parameters are fixed. This model is a semi-distributed hydrological model.

These methods allow users to simulate the complex processes that occur in a watershed system and to account for the effects of topography, land use, and soil type on runoff. HEC-HMS has been used to model the hydrological processes within the Sabarmati River basin and to assess the impacts of different water management strategies on the basin's water resources.

## ENSEMBLE PREDICTION MODEL

An ensemble prediction system (EPS) is a type of NWP system that produces multiple forecasts by running the same model with slightly different initial conditions or model parameters. Each member of the ensemble represents a possible scenario or outcome, and the ensemble forecast is a probabilistic representation of the possible range of outcomes (Shadeed & Almasri 2010; Samantaray & Sahoo 2020). The EPS works by using a set of initial conditions or model parameters that are perturbed or vary slightly from the best estimate of the current state of the atmosphere.

These perturbations are designed to represent the uncertainty in the initial conditions or model parameters and generate a range of possible forecast outcomes (Savvidou *et al.* 2018). The model is then run for each set of perturbed conditions, producing a set of ensemble members that represent the possible range of forecast outcomes. To generate a probabilistic forecast, the ensemble members are combined using statistical techniques, such as averaging or weighting, to produce an ensemble mean or other measures of the central tendency and spread of the forecast distribution. The resulting ensemble forecast provides a probabilistic representation of the possible range of forecast outcomes, which can be used to assess the uncertainty in the forecast and inform decision-making.
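The combination step described above can be illustrated with a short Python sketch that reduces one ensemble forecast to its mean, its spread, and the probability of exceeding a threshold. The member values and the threshold are hypothetical, and treating every member as equally likely is an assumption of this sketch rather than something stated in the source.

```python
from statistics import mean, pstdev


def ensemble_summary(members, threshold):
    """Return (ensemble mean, ensemble spread, exceedance probability).

    The exceedance probability is the fraction of members above the
    threshold, which assumes all members are equally likely.
    """
    ens_mean = mean(members)
    ens_spread = pstdev(members)  # standard deviation across members
    prob = sum(1 for x in members if x > threshold) / len(members)
    return ens_mean, ens_spread, prob


# Hypothetical 5-member rainfall forecast (mm) and a 3 mm threshold
print(ensemble_summary([0.0, 2.0, 4.0, 6.0, 8.0], 3.0))
```

A wide spread relative to the mean signals low confidence in the forecast, while the exceedance probability is the raw (uncalibrated) form of a probability-of-precipitation forecast.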

In this research work, nonlinear regression (NLR) has been used as it can handle missing data by imputing missing values using the posterior distribution of the missing values. This is important because missing data can be a common problem in real-world applications. In Bayesian statistics, the posterior distribution is the probability distribution that represents our updated beliefs about the unknown parameters of a statistical model, given the observed data and any prior information we may have. By providing a principled way to combine the predictions of different models while accounting for model uncertainty, NLR allows for more accurate and robust predictions, which can lead to better decision-making and improved outcomes (Patel *et al.* 2023a, 2023b). The calculation portion has been performed using RStudio.

In this research on reservoir inflow prediction using predicted rainfall, we utilised data from two major weather forecasting organisations: the ECMWF and the NCEP. The ECMWF is a renowned international organisation that provides global NWPs at various timescales, ranging from a few days to several months ahead. It uses sophisticated computer models to predict weather patterns and provides data on temperature, humidity, wind, and precipitation. The forecast data can be downloaded from https://apps.ecmwf.int/datasets/data/tigge/levtype=sfc/type=cf/.

The NCEP, on the other hand, is an agency of the National Oceanic and Atmospheric Administration that provides weather forecasts and warnings for the United States. It also provides global weather data and produces various NWP models.

To predict reservoir inflow, we needed accurate rainfall data, which we obtained by using both ECMWF and NCEP data. We analysed the rainfall data from both sources and compared them to determine the accuracy and reliability of the predictions. By using data from both ECMWF and NCEP, we were able to generate a more comprehensive and accurate prediction of the inflow into the reservoir. This enabled us to better plan and manage the water resources in the reservoir, which is critical for ensuring a stable and reliable water supply.

Post-processing of precipitation data refers to the methods used to improve the accuracy, completeness, and reliability of precipitation measurements obtained from different sources, such as rain gauges, radars, and satellites. Precipitation data are essential for many applications, such as weather forecasting, hydrological modelling, climate research, and water resource management. However, they can be affected by errors, biases, and missing values, which reduce the accuracy and reliability of the measurements. Post-processing therefore involves applying techniques to correct errors, fill gaps, merge data, detect non-climatic changes, and analyse extreme precipitation events. These techniques are based on statistical and mathematical models that account for factors affecting precipitation measurements, such as gauge undercatch, wind, and instrument errors. Because the quality of precipitation data directly affects decision-making in many sectors, post-processing requires specialised knowledge in statistics, hydrology, and atmospheric science.

### Non-homogeneous regression

Non-homogeneous regression is a type of statistical modelling technique used in the post-processing of precipitation data. It involves modelling the relationship between precipitation and one or more predictor variables that vary across space and time, such as topography, latitude, or seasonality. Unlike homogeneous regression, which assumes that the relationship between the predictor and response variables is constant across the entire domain of interest, non-homogeneous regression allows for the relationship to vary spatially and/or temporally. This can lead to more accurate predictions of precipitation at a particular location and time.

### Bayesian model averaging

The conversion of data into ‘ensembleData’ objects is a prerequisite for utilising the fitBMA() function, which is available in R through the ensembleBMA package. In order to carry out BMA, different packages like ensembleBMA, gBMA, and caret have been installed.

### Logistic regression

Logistic regression is a statistical method used for modelling the relationship between a binary dependent variable (i.e., a variable that can take on only two possible values) and one or more independent variables. In the context of post-processing of precipitation data in RStudio, logistic regression can be used to model the probability of precipitation exceeding a certain threshold based on predictor variables such as temperature, humidity, and wind speed.
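A minimal Python sketch of the logistic link is given below: it converts a linear combination of predictors into a probability that precipitation exceeds a threshold. The coefficient values and predictor names are hypothetical placeholders, not fitted values from the study, and in practice the coefficients would be estimated from training data (for example, in R).

```python
import math


def exceedance_probability(predictors, coefficients, intercept):
    """Logistic-regression probability that precipitation exceeds a threshold.

    predictors and coefficients are equal-length sequences; the linear
    predictor is passed through the logistic (sigmoid) function.
    """
    z = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for (temperature in C, relative humidity in %)
p = exceedance_probability([28.0, 85.0], [-0.05, 0.04], -1.5)
print(round(p, 3))
```

The output always lies between 0 and 1, which is what makes the logistic form suitable for binary events such as exceedance of a rainfall threshold.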

### Ensemble model fitting and prediction

During the ensemble model fitting phase, RStudio was used to analyse the ensemble forecasting approach. First, scatter plots were used to visualise the correlation between the observed and predicted flood values. These plots gave a visual check of the agreement between the ensemble model and the observations and helped to identify patterns, trends, and outliers. To evaluate the ensemble model quantitatively, the spread skill was calculated. This metric assesses the model's capacity to capture the variability in flood forecasts; a higher spread skill suggests that the model represents the full spectrum of potential flood outcomes, which is critical for reliable predictions. Histograms were also used to compare the distributions of observed and predicted flood values. Visual inspection of the overlap and skewness of these distributions gave insight into the accuracy and bias of the ensemble model's predictions.

To check the robustness of the ensemble model, verification rank histograms were also incorporated. These diagrams show how often the observed value falls at each rank among the ensemble forecast members. A flat, well-calibrated rank histogram suggests that the model generates dependable probabilistic forecasts, which are crucial for the successful operation of flood warning systems.
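The verification rank histogram can be sketched as follows in Python. For each forecast case, the rank of the observation is the number of ensemble members below it, so an ensemble of m members yields m + 1 possible ranks. The function name and the sample data are illustrative assumptions, and ties between members and the observation (common for zero precipitation) are not randomised here, which a production implementation would normally do.

```python
def rank_histogram(ensembles, observations):
    """Count how often the observation falls at each rank within the ensemble.

    ensembles is a list of equal-length member lists, one per forecast case.
    Returns a list of len(members) + 1 counts.
    """
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(ensembles, observations):
        rank = sum(1 for x in members if x < obs)  # members below the observation
        counts[rank] += 1
    return counts


# Hypothetical 3-member forecasts for three cases
cases = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
print(rank_histogram(cases, [0.5, 2.5, 4.0]))
```

Roughly equal counts across ranks indicate a well-calibrated ensemble; a U shape suggests too little spread, and a dome shape suggests too much.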

In a probability of precipitation (PoP) plot, the *x*-axis represents time or geographical locations and the *y*-axis represents the probability of precipitation. Each point on the plot or map represents the PoP value at a specific time or location. By analysing the PoP plot, periods or regions with higher or lower chances of rainfall can be identified.

Figure 6 shows the model prediction plots for the probability of precipitation for the 3-day lead time. The dashed line shows the median prediction by the post-processing method applied to the ensemble data. The grey colour shading in the graph shows the predictive interquartile range, while the solid line shows the observations of precipitation. Subsequently, another graph shows the probability of precipitation, which is a solid line, with corresponding observation circles at 1 for occurrence and at 0 for non-occurrence. To interpret the plots, observe the peak or mode of the distribution, which represents the most likely precipitation value. The spread, or width, of the distribution provides information about the uncertainty or variability associated with the prediction. A wider distribution indicates higher uncertainty, while a narrower distribution suggests more confidence in the predicted values.

In the model prediction phase, the analysis was expanded to assess the reliability and precision of the ensemble forecasting approach. The initial step involved the computation of the probability of precipitation, which yielded an estimate of the probability of distinct rainfall occurrences that could potentially result in floods. This probabilistic measure communicates the level of uncertainty linked to the predictions, thereby improving the decision-making process for flood warning and preparedness.

Furthermore, predictive density plots were utilised to effectively visualise the distribution of flood predictions. The utilisation of these plots facilitated effective communication of the entire spectrum of potential flood scenarios, thereby assisting emergency responders and local authorities in making well-informed decisions. By incorporating an analysis of both the measures of central tendency and dispersion in our predictions, we have presented a comprehensive assessment of the flood risk.

### Ensemble model verification

Verification is the process of evaluating the accuracy and reliability of precipitation data generated by a forecasting model or a measurement system. The verification process is essential for ensuring the quality of precipitation data used in post-processing activities such as forecasting, climatology, and hydrological modelling. In this case, two methods of verification have been adopted.

#### Brier score

The Brier score is a measure of the accuracy of probabilistic forecasts, including precipitation forecasts. It compares the predicted probability of precipitation at a particular location with the actual observation (0 for no precipitation and 1 for precipitation). A perfect forecast has a Brier score of 0, while an uninformative forecast that always issues a probability of 0.5 has a Brier score of 0.25. The lower the Brier score, the better the forecast. Table 1 lists the verification measures, including the Brier score and the Brier skill score.

| Sr. No. | Verification measure | Formula |
|---|---|---|
| 1 | Pearson's correlation coefficient | |
| 2 | Root-mean-square error | |
| 3 | Relative root-mean-square error | |
| 4 | Probability of detection (hit rate) | |
| 5 | False alarm ratio | |
| 6 | Frequency bias | |
| 7 | Brier score | |
| 8 | Brier skill score | |


*Note*: *F*, *O*, *P*_{F}, and *P*_{O} denote the forecast, corresponding observation, probability of precipitation, and observed frequency, respectively; *N* is the number of forecast and observation pairs; similarly, *F̄* and *Ō* denote the forecast average and observation average, respectively; and BS_{ref} is the Brier score of the reference probability forecast, typically the probability of event occurrence from the climatology.
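For orientation, the measures listed in Table 1 are commonly written as below. This is a sketch of the standard textbook forms in the notation of the note, not necessarily the exact expressions used in this study. The contingency counts $H$ (hits), $M$ (misses), and $A$ (false alarms) are assumed symbols that do not appear in the note, the per-pair index $i$ is added for clarity, and normalising the relative RMSE by the observation average is one common convention among several.

```latex
\begin{align*}
r &= \frac{\sum_{i=1}^{N} (F_i - \bar{F})(O_i - \bar{O})}
          {\sqrt{\sum_{i=1}^{N} (F_i - \bar{F})^2}\,\sqrt{\sum_{i=1}^{N} (O_i - \bar{O})^2}} \\
\mathrm{RMSE} &= \sqrt{\frac{1}{N} \sum_{i=1}^{N} (F_i - O_i)^2} \\
\mathrm{RRMSE} &= \frac{\mathrm{RMSE}}{\bar{O}} \\
\mathrm{POD} &= \frac{H}{H + M} \\
\mathrm{FAR} &= \frac{A}{H + A} \\
\mathrm{FB} &= \frac{H + A}{H + M} \\
\mathrm{BS} &= \frac{1}{N} \sum_{i=1}^{N} \left(P_{F,i} - P_{O,i}\right)^2 \\
\mathrm{BSS} &= 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{ref}}
\end{align*}
```

Here $P_{O,i}$ is taken as 1 when precipitation occurred for pair $i$ and 0 otherwise, which matches the description of the Brier score in the text.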

The Brier score is a scoring rule that evaluates the accuracy of probabilistic predictions for binary events, here the occurrence or non-occurrence of precipitation. It is the average squared difference between the predicted probabilities and the actual outcomes: the squared differences are summed over all instances and divided by the number of instances. A lower Brier score indicates that the predicted probabilities are more closely aligned with the actual outcomes, and a score of 0 indicates perfect prediction.

The Brier score calculation algorithm involves a step-by-step process. It begins by iterating through each instance and performing a specific computation. This computation involves finding the squared difference between the predicted probability and the actual outcome for each instance. These squared differences are then summed together. Finally, the sum is divided by the total number of instances to obtain the Brier score.
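The step-by-step procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study (which was carried out in RStudio); the function names and the sample values are hypothetical, and the skill score assumes the climatological event frequency of the sample as the reference forecast.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, o in zip(probabilities, outcomes):
        total += (p - o) ** 2          # squared difference for one instance
    return total / len(probabilities)  # average over all instances


def brier_skill_score(probabilities, outcomes):
    """Skill relative to a constant forecast of the sample event frequency."""
    base_rate = sum(outcomes) / len(outcomes)  # assumed climatological reference
    bs_ref = brier_score([base_rate] * len(outcomes), outcomes)
    if bs_ref == 0.0:
        raise ValueError("reference score is zero: all outcomes are identical")
    return 1.0 - brier_score(probabilities, outcomes) / bs_ref


# Hypothetical probabilities of precipitation and observed occurrences.
forecast = [0.9, 0.1, 0.8, 0.3]
observed = [1, 0, 1, 0]
print(round(brier_score(forecast, observed), 4))        # 0.0375
print(round(brier_skill_score(forecast, observed), 4))  # 0.85
```

A positive skill score means the forecast improves on the reference; a value of 0 means no improvement.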

#### Ranked probability score

The Brier score plot shows the thresholds on the *x*-axis, ranging from 0 to 1, and the corresponding Brier scores on the *y*-axis, as shown in Figure 8. Analysing this plot identifies the thresholds at which the methods exhibit the lowest Brier scores, indicating the most skilful forecasts.

The ranked probability score (RPS) is a metric used to assess the accuracy of probabilistic forecasts issued over ordered categories. It measures the extent to which the predicted probabilities are correctly ordered in relation to the actual outcomes.

The RPS is calculated by summing the squared differences between the cumulative distribution function (CDF) of the predicted probabilities and the CDF of the actual outcome. Lower values of RPS indicate that the ensemble model's predicted probabilities are better calibrated and that their ranking more closely matches the observed outcomes. The RPS is particularly valuable when the probabilities generated by the model must be not only accurately calibrated but also properly ordered. The computation involves the following steps: (1) calculate the CDFs of both the predicted probabilities and the actual outcomes; (2) calculate the squared differences between the CDFs obtained in step 1; and (3) sum the squared differences.
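The RPS computation described above can be sketched as follows for a single forecast. The three precipitation categories (dry, light, heavy) and the probabilities are hypothetical, and the score is left unnormalised; some formulations divide by the number of categories minus one.

```python
from itertools import accumulate


def ranked_probability_score(forecast_probs, observed_category):
    """RPS for one forecast over ordered categories.

    forecast_probs: probabilities per category, in order, summing to 1.
    observed_category: index of the category that actually occurred.
    """
    n = len(forecast_probs)
    if not 0 <= observed_category < n:
        raise ValueError("observed_category is out of range")
    if abs(sum(forecast_probs) - 1.0) > 1e-9:
        raise ValueError("forecast probabilities must sum to 1")
    # Step 1: cumulative distributions of the forecast and of the observation.
    observed = [1.0 if k == observed_category else 0.0 for k in range(n)]
    forecast_cdf = list(accumulate(forecast_probs))
    observed_cdf = list(accumulate(observed))
    # Steps 2 and 3: squared differences between the CDFs, then their sum.
    return sum((f - o) ** 2 for f, o in zip(forecast_cdf, observed_cdf))


# Hypothetical categories: 0 = dry, 1 = light rain, 2 = heavy rain.
print(round(ranked_probability_score([0.2, 0.5, 0.3], 1), 4))  # 0.13
```

Averaging this value over all forecast and observation pairs gives the score for a verification period.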

Reliability diagrams are graphical tools commonly employed in probabilistic forecasting to visualise and assess the calibration of forecasts, that is, the alignment between the predicted probabilities and the observed frequencies of the actual outcomes. This assessment aids in identifying any calibration issues. A well-calibrated model exhibits a reliability diagram in which the predicted probabilities align closely with the observed frequencies, so that the points fall along the 45-degree diagonal. Deviations from the diagonal suggest a potential miscalibration of the model.

The algorithm consists of three steps. First, the predicted probabilities are divided into bins. Then, the observed frequency of positive outcomes and the mean predicted probability are calculated for each bin. Finally, the observed frequency is plotted against the mean predicted probability for each bin.
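The binning steps can be sketched as below. The function returns the points of the diagram rather than drawing it, so that any plotting tool can be used; the function name, the equal-width bins, and the sample data are assumptions made for illustration.

```python
def reliability_points(probabilities, outcomes, n_bins=5):
    """Return (mean predicted probability, observed frequency, count) per bin."""
    if len(probabilities) != len(outcomes):
        raise ValueError("inputs must be of equal length")
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probabilities, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        # Equal-width bins; a probability of exactly 1 goes into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, o))
    points = []
    for members in bins:
        if not members:
            continue  # empty bins contribute no point to the diagram
        mean_probability = sum(p for p, _ in members) / len(members)
        observed_frequency = sum(o for _, o in members) / len(members)
        points.append((mean_probability, observed_frequency, len(members)))
    return points


# Hypothetical forecasts: points close to the diagonal indicate good calibration.
for mean_p, freq, count in reliability_points(
        [0.1, 0.15, 0.9, 0.95], [0, 0, 1, 1], n_bins=2):
    print(f"mean p = {mean_p:.3f}, observed frequency = {freq:.3f}, n = {count}")
```

Plotting the observed frequency against the mean predicted probability, together with the 45-degree line, gives the reliability diagram.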

The reliability diagram compares the predicted probability of precipitation (*x*-axis) with the observed frequency of precipitation (*y*-axis). The diagram is divided into bins, or intervals, of predicted probabilities, and the average observed frequency is calculated for each bin. The statistical analysis compares the performance of the different post-processing methods on the basis of their Brier scores, continuous ranked probability score (CRPS) values, and reliability diagrams. Methods with lower Brier scores and CRPS values are more accurate, and the reliability diagram is then examined to choose the methods that provide well-calibrated probability estimates.

Receiver operating characteristic (ROC) curves and the area under the curve (AUC) are commonly used in statistical analysis and machine learning to evaluate the discrimination performance of a binary classifier. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) at different threshold values, and the AUC quantifies the area beneath the ROC curve. A higher AUC signifies improved discriminative power of the ensemble model.

The AUC can be interpreted as the probability that a randomly selected positive instance is ranked higher than a randomly selected negative instance. Generating a ROC curve entails computing the TPR and FPR for different threshold values, and the AUC is commonly determined using integration methods such as the trapezoidal rule. Together, these verification methods offer a thorough assessment of the performance of ensemble models, covering calibration, discrimination, and the ranking of predicted probabilities. Combining them makes it possible to select an ensemble model that delivers accurate results and offers dependable, appropriately calibrated probabilistic predictions.
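The construction of the ROC curve and the trapezoidal integration mentioned above can be sketched as follows. The function names and the sample scores are hypothetical; every distinct forecast probability is used as a threshold, and an event is predicted when the probability is at or above the threshold.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, starting at (0, 0) and ending at (1, 1)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical probabilities of precipitation and observed occurrences.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc_trapezoid(roc_points(scores, labels)))  # 0.75
```

In this example, three of the four positive and negative pairs are ranked correctly, which agrees with the AUC of 0.75.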

### Comparison of observed, raw and post-processed rainfall

Figure 11 shows the statistical downscaling of raw precipitation data with respect to observed precipitation data. It also shows that the raw ensemble precipitation data and the observed precipitation data differ considerably at the peak rainfall, whereas the post-processed rainfall correlates well with the observed data.

## RESULTS AND DISCUSSIONS

The analysis of watershed features in relation to floods using modern GIS methods reveals the optimal way of extracting drainage networks and deriving morphometric parameters from the Cartosat DEM, which lowers digitising efforts. The analysis of the Cartosat-derived DEM shows that the hydrological and river morphological parameters changed over a 10-year period owing to heavy flooding in the study area. The findings reveal that, during the rainy season, there is a chance of increased runoff in the higher streams, resulting in flooding in the basin's lower reaches. The Dharoi dam was built along the channel of the main river to manage floods; however, it will not prevent flooding in the lower areas of the basin after heavy rains because of the rapid release of massive amounts of water from the dam. As a result, mitigation measures in the upper streams, such as early flood warning systems and the adoption of advanced flood forecasting techniques for the Upper Sabarmati River Basin, should be taken to prevent floods in the downstream floodplain and to protect agricultural crops and settlements.

### Prediction model in HEC-HMS

Table 2 compares the peak discharge simulated by the hydrological model for the ensemble member with the observed peak discharge for the different flood events. For 2015, 2017, and 2020, Table 2 shows that the date and time of the simulated peak discharge closely match the observed values.

| Measure | 2015 simulated member | 2015 observed | 2017 simulated member | 2017 observed | 2020 simulated member | 2020 observed |
|---|---|---|---|---|---|---|
| Peak discharge (cumec) | 9,067.3 | 9,123.7 | 7,930.9 | 7,857.4 | 11,943.4 | 12,379.2 |
| Volume (million m^{3}) | 431.62 | 410.20 | 322.93 | 332.72 | 536.45 | 540.21 |
| Date of peak | 29 July 2015 | 29 July 2015 | 24 July 2017 | 24 July 2017 | 21 August 2020 | 21 August 2020 |
| Time of peak | 02:00 | 02:00 | 08:00 | 09:00 | 08:00 | 07:00 |


The reservoir inflow prediction model was developed using predicted precipitation data from ECMWF and NCEP, both of which contribute to the TIGGE archive. To obtain bias-corrected data, statistical downscaling methods such as BMA, HXLR, and cNLR were applied to the ensemble data for 2015, 2017, and 2020. Using the machine learning algorithm in RStudio, the data were split into training and testing sets in a ratio of 70:30 for the gauging stations of Dharoi, Jotasan, and Kheroj. The predicted and downscaled precipitation data were verified using the Brier score and the ranked probability score, and the Brier score showed very good results for cNLR. The statistically downscaled precipitation data for 2015, 2017, and 2020 were then used as input to the HEC-HMS model to obtain the predicted peak discharge for all 3 years.

Table 3 shows the statistical performance evaluation of the model for 2015, 2017, and 2020. The Nash–Sutcliffe efficiency (NSE) is highest for 2020, at 0.844, and the *R*^{2} value is highest for 2017, at 0.938.

| Year | NSE | RMSE | *R*^{2} |
|---|---|---|---|
| 2015 | 0.829 | 0.45 | 0.915 |
| 2017 | 0.796 | 0.54 | 0.938 |
| 2020 | 0.844 | 0.35 | 0.921 |

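For readers who wish to compute the measures in Table 3 for their own hydrographs, a minimal sketch follows. It uses the standard definitions of NSE, RMSE, and *R*^{2} (the squared Pearson correlation); the text does not state whether the tabulated RMSE is normalised, so the plain RMSE is shown here as an assumption, and the discharge values are hypothetical.

```python
import math


def nash_sutcliffe_efficiency(observed, simulated):
    """1 minus the ratio of residual variance to the variance of the observations."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0.0:
        raise ValueError("observations are constant; NSE is undefined")
    return 1.0 - residual / variance


def root_mean_square_error(observed, simulated):
    """Plain (unnormalised) RMSE in the units of the input series."""
    squared = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(squared / len(observed))


def coefficient_of_determination(observed, simulated):
    """Squared Pearson correlation between observed and simulated series."""
    mean_obs = sum(observed) / len(observed)
    mean_sim = sum(simulated) / len(simulated)
    covariance = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return covariance ** 2 / (var_obs * var_sim)


# Hypothetical hourly discharges (cumec), not data from this study.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
print(round(nash_sutcliffe_efficiency(observed, simulated), 3))  # 0.98
print(round(root_mean_square_error(observed, simulated), 3))     # 0.158
```

An NSE of 1 indicates a perfect match, while values at or below 0 indicate that the simulation is no better than the mean of the observations.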

#### Exceedance plot

Exceedance plots of the predicted discharge have been prepared for all 5 days of lead time for 2015, 2017, and 2020. They help in understanding the sequence of flood events over a specific range of dates, which could further facilitate dam operation and related activities. Flood warnings can be issued in advance on the basis of the ensemble reservoir inflow. Following the analysis and discussion with the dam engineer, flood warning thresholds for the reservoir inflow were set for the case in which the dam is 90% full, and the inflow is categorised into five levels on the basis of these thresholds.

The warning is given in five levels, from very low to very high:

- (i) *Dark green colour:* very low inflow (0–250 cumec) into the reservoir; no warning is required.
- (ii) *Light green colour:* low inflow (250–500 cumec) into the reservoir; no warning is required.
- (iii) *Yellow colour:* medium inflow (500–2,500 cumec) into the reservoir; a low warning is given at this stage.
- (iv) *Orange colour:* high inflow (2,500–8,000 cumec) into the reservoir; a medium warning is given at this stage.
- (v) *Red colour:* very high inflow (>8,000 cumec) into the reservoir; a high warning is given at this stage.
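The five warning levels can be expressed as a simple lookup, sketched below. The thresholds are those listed above; treating each upper bound as inclusive (so that exactly 8,000 cumec is still orange) is an assumption, as are the function names and the exceedance helper, which simply reports the fraction of ensemble members above a threshold.

```python
# Upper bound of each level in cumec (assumed inclusive), colour, and warning.
WARNING_LEVELS = [
    (250.0, "dark green", "no warning"),
    (500.0, "light green", "no warning"),
    (2500.0, "yellow", "low warning"),
    (8000.0, "orange", "medium warning"),
]


def classify_inflow(inflow_cumec):
    """Map a reservoir inflow to its colour code and warning level."""
    if inflow_cumec < 0:
        raise ValueError("inflow cannot be negative")
    for upper_bound, colour, warning in WARNING_LEVELS:
        if inflow_cumec <= upper_bound:
            return colour, warning
    return "red", "high warning"  # inflow above 8,000 cumec


def exceedance_probability(member_inflows, threshold_cumec):
    """Fraction of ensemble members whose inflow exceeds the threshold."""
    if not member_inflows:
        raise ValueError("at least one ensemble member is required")
    exceeding = sum(1 for q in member_inflows if q > threshold_cumec)
    return exceeding / len(member_inflows)


print(classify_inflow(300.0))   # ('light green', 'no warning')
print(classify_inflow(9067.3))  # ('red', 'high warning')
# Hypothetical ensemble of forecast inflows for one lead time.
print(exceedance_probability([100.0, 600.0, 9000.0, 8500.0], 8000.0))  # 0.5
```

Applying the exceedance helper to each lead time and each threshold gives the probabilities that an exceedance plot would display.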

## CONCLUSIONS

This study concluded that reservoir inflow prediction using hydrological modelling and simulation is of great importance for the Sabarmati River basin. The available hydrological, meteorological, and satellite data facilitate intensive data analysis with the help of thematic maps. The curve number was generated in Google Earth Engine using Land Use Land Cover (LULC) and hydrological soil group maps, which supports the loss and transform models in HEC-HMS. The hydrological model, prepared using loss, transform, baseflow, and Muskingum routing methods, resulted in peak simulated discharges of 8,067.3 and 6,530.9 cumec for 2015 (calibration) and 2017 (validation), respectively. The NSE, root-mean-square error (RMSE), and *R*^{2} values are 0.829, 0.45, and 0.915 for the calibrated model and 0.796, 0.54, and 0.938 for the validated model, which are very satisfactory results. After validation, a test case for 2020 was checked, and it shows similar results, with NSE, RMSE, and *R*^{2} of 0.844, 0.35, and 0.921, respectively.

The ensemble post-processing analysis shows that model fitting for a 3-day lead time is good in terms of the scatterplot, verification rank histogram, and spread skill ratio. Similarly, the model prediction shows that the 2- and 3-day lead times give a good probability of precipitation, as all the events are precipitation occurrences. Model verification is the key step in checking the model's performance, and the Brier score, ranked probability score, reliability diagram, and ROC and AUC plots are its key measures. The Brier score for a 3-day lead time at the Dharoi points is 0.10 for the logreg method, which is the best among the methods compared. The ranked probability score is also 0.10 for the logreg method at the Jotasan points. In the reliability diagrams, the cNLR and BMA methods at a 3-day lead time are more consistent with the diagonal line than the other methods. The ROC curves (sensitivity vs. specificity) for the cNLR and BMA methods give AUC values of 91.87 and 91.82%, respectively. On the basis of the post-processing analysis of the ensemble precipitation data, the cNLR and BMA methods were therefore adopted for the hydrological model for reservoir inflow prediction. The correlation between observed and predicted peak discharges showed values of 0.71 and 0.65, which indicates a satisfactory relation between the two discharges.

To establish an advance flood warning system at Dharoi dam, various graphical inspections were made using the area-elevation curve, storage capacity curve, and rule curve. The exceedance plots for predicted peak discharge were established from 5 days of lead time discharge data for Dharoi reservoir. When the dam is 90% full, the exceedance plot indicates the flood warning level.

Based on the hydrological model, the maximum simulated inflow is 10,469 cumec for the flood event on 30 July 2015, 7,168 cumec for the flood event on 24 July 2017, and 4,732 cumec for the flood event on 25 August 2020. These values will be useful as flood warnings for the dam authority, which can plan the release of water from the dam. The ensemble model could be a valuable tool for improving the reliability and accuracy of precipitation estimates and can be used in regions where observations are sparse or unavailable. However, it also has limitations that need to be carefully evaluated, and appropriate statistical methods need to be applied to account for model biases and improve the accuracy of the estimates.

## ACKNOWLEDGEMENT

The authors are thankful to the Civil Engineering Department, Sardar Vallabhbhai National Institute of Technology, Surat, for providing an opportunity to do research work. The authors are also thankful to CWC, Gandhinagar; the State Water Data Centre, Gandhinagar; and the Executive Engineer, Dharoi Dam, for their valuable support in data provision as well as guidance in this project. We would also like to extend our deepest gratitude to Mr Gaurav Ninama, Assistant Engineer at Dharoi Dam, for his extensive support in data provision and technical guidance in this research work, and to all those who have directly and indirectly supported us in this research work. The authors are also thankful to the Civil Engineering Department, Institute of Technology, Nirma University, for providing support for this research work.

## FUNDING STATEMENT

The authors declare that they received no funding for this research and that there is no potential conflict of interest in this paper.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.
