ABSTRACT
Rainfall has a dominant role in rainfall-runoff models, with the rendering of these models depending on the data accuracy and on the way that rainfall is spatially allocated. The research proposes a methodological framework where a genetic algorithm (GA)-based method responsible for the spatial distribution of gauge observations at the basin scale is coupled with the HEC-HMS hydrological model to produce simulated discharges of high accuracy. The custom-developed GA is used to divide a 2D space, adhering to specific criteria, into polygonal geometries that represent gauge zones of influences, similar to the Thiessen polygon method concept. A collection of vectorial polygonal areas, equivalent in number to the employed monitoring stations, is produced with the areal weights to be used for distributing the rainfall across the case study basin and subsequently to force the hydrological simulations. The generated gauge weights are validated for a different temporal precipitation event. The final outputs expressed through a series of statistical measures, clearly demonstrate the effectiveness of the specific methodology (e.g. R2 and Nash–Sutcliffe are larger than 0.83 and 0.73). The methodology can foster accurate hydrological simulations, especially in cases where there is a limited number of rainfall stations and corresponding observations.
HIGHLIGHTS
Geometric-based rain distribution introduces uncertainties in hydrological models.
Genetic algorithms (GAs) can produce alternate rain distribution shapes.
Automated coupling of GA with the hydrological model at a well-defined case basin.
Validation of the GA gauge weight method performance at different temporal events.
New perspectives in watershed's hydrological modeling with limited gauges.
INTRODUCTION
Emerging alternate products, such as precipitation ones, are the response to sidestepping the lack of densely monitoring networks and derived observations (Kidd et al. 2017) that are consistently needed by the scientific community and component authorities, such as UNESCO's Intergovernmental Hydrological Programme (IHP). The non-traditionally captured rainfall data are usually derivatives of (i) remote sensing and/or terrestrial observation-based gridded datasets, e.g. the Integrated Multi-satellite Retrievals (IMERG) for Global Precipitation Measurement (GPM) mission (Pradhan et al. 2022), (ii) reanalysis datasets, e.g. the National Center for Environmental Prediction–National Center for Atmospheric Research reanalysis (NCEP/NCAR) (Kanamitsu et al. 2002), or the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 data (Hersbach et al. 2020), and of (iii) reprocessed datasets, e.g. the European gridded dataset (E-OBS) (Hofstra et al. 2008), with the latter dataset being developed with the use of spatial interpolation techniques (Mavromatis & Voulanas 2021). The use of these datasets is broadly considered in hydrological modeling (e.g. Skoulikaris et al. 2019; Probst & Mauser 2022).
Regardless of the source, rainfall is the most significant input variable in rainfall-runoff models; it can thus be established that it decisively controls the accuracy of hydrological simulations (Renard et al. 2010). The impacts of rainfall's data errors and derived uncertainties on hydrological modeling are thoroughly investigated in the literature (e.g. Bárdossy & Das 2008). Synoptically, in gauge-based rainfall monitoring cases, these errors are related to the quality of the data itself (La Barbera et al. 2002), and/or to the mean areal precipitation usually used in non-event based simulations (Moulin et al. 2009), and/or to the spatial and temporal discretization of the gauge stations, i.e. the gauge network density (Skoulikaris et al. 2019; Wang et al. 2023), and/or to rainfall's variability accurate representation (Xu & Singh 1998). Moreover, the coverage scarcity in terms of gauges at global scale (Mishra & Coulibaly 2009) and the consequent use of interpolation techniques applied to spatially distribute point source information at watershed scales impose additional input-based uncertainties in the modeling process (Moulin et al. 2009).
The most widely applied precipitation interpolation based schemes are the Arithmetic Mean, Thiessen polygons, Inverse Distance Weighting (IDW), different types of Kriging, and the Spline method (Hohmann et al. 2021; Skoulikaris et al. 2022), with the literature depicting numerous comparative research on precipitation interpolation methods. For example, Liu et al. (2020a) concluded that the mix of methods may perform better than single applied methods, while the IDW method performed well but not in all the case study sites. Yang et al. (2015) also concluded that daily rainfall in an Australian case study was better represented by the IDW method in comparison to other methods. Nikolopoulos et al. (2015) showed that for the assessment of the debris flow, which relies on rainfall derivatives, the Thiessen polygon method works as good as more complex methods. Guo et al. (2022) proposed the use of the spline's method variation when coupled with hydrological models in Chaohe River basin, China. Recent advancements propose supplementing these well-established methods with a copula-based probability model that represents a multivariate uniform distribution, which examines the dependence between multiple variables. In other words, as supported by Li & Babovic (2019), copulas facilitate the isolation of joint or marginal probabilities for a pair of variables that are enmeshed in a more complex multivariate system, such as the rainfall-runoff hydrosystem. However, little evidence exists on the optimal interpolation method since it depends on the examined variable, the case study area physiography and extend, the available data quality and quantity and the number and spatial distribution of the available data points (Ohmer et al. 2017). Little consensus is also shown when the temporal and not only the spatial component of the rainfall data is considered. For example, Skoulikaris et al. (2022) in their analysis on whether bias correction or spatiotemporal interpolation should go first in hydrologic simulations, suggested the use of the spatiotemporal kriging method, while Hussain et al. (2010) highlight the use of the transformed hierarchical Bayesian method during monsoon periods in Pakistan.
Focusing on the Thiessen polygon method, it belongs to the so-called gauge weights precipitation methods and makes use of geometry rules to allocate point information, i.e. stations’ rainfall, to an area, i.e. to a basin's extent. Particularly, in this method, as thoroughly described by Croley & Hartmann (1985), all points are connected two by two with straight-line segments and vertical bisectors are drawing to these segments. Then, the vertical bisectors intersect and create as many polygons as the number of stations, covering the watershed of interest. The bisector-based geometric approach determines that any location within a Thiessen polygon is closer to its associated point than to any other point (that's for it is also known as the Nearest Neighbor (NN) method), thus observations near one another in space are more likely to be similar than those which have that are far apart. Τhe latter is also known as the spatial autocorrelation approach broadly used in geostatistics. The Thiessen polygon method is widely used in hydrologic modeling applications (e.g. Skoulikaris & Ganoulis 2012; Duraisekaran et al. 2021). Nonetheless, it is imperative to argue that distance alone cannot serve as the sole surrogate measure of spatial correlation between data (Teegavarapu 2012). For instance, the variability of precipitation may be subjected to topographic characteristics including local relief peculiarities (Formetta et al. 2022). Furthermore, deterministic geometric rules of gauge weights precipitation methods lead to a sharp change in rainfall at the defined geometric boundaries, as well as its uniform distribution within these boundaries, i.e. an arbitrary non-physically based consideration (Zhao et al. 2022). To that end, and due to the complexity and non-linearity inherent in rainfall patterns, emerging techniques such as genetic algorithms (GAs) or artificial intelligence (AI) are considered modern approaches in the spatial allocation, as well as the forecasting, of rainfall (Liu et al. 2020b; Fooladi et al. 2023).
GAs are a meta-heuristic method inspired by evolution, with a variety of applications in water resources topics (Maier et al. 2014). The literature demonstrates that GAs can be directly integrated into the modeling procedure, such as being used for the calibration of surface and groundwater hydrological models (e.g. Shafii & De Smedt 2009; Del Giudice & Padulano 2016; Kirlas & Nagkoulis 2023). GAs are also enabled as optimizers, e.g. for aquifer recharge locations’ selection (Kourakos et al. 2023) or for selecting the number and location of pressure sensors (Soroush & Abedini 2019). Scholars have investigated the usefulness of GAs in precipitation forecasting (Salih et al. 2020), while GAs have been partially employed for the optimization of techniques or numerical models dedicated to rainfall distribution. For instance, Bărbulescu et al. (2021) used GAs to optimize the parameter β (usually set to 2) corresponding to the power affecting the weight in IDW, while Chang et al. (2005) implemented fuzzy theory alongside with IDW to interpolate the precipitation, with GAs used to determine the fuzzy-related parameters.
The objective of the research is to introduce a novel rainfall distribution method designed to enhance hydrological modeling performance. To accomplish this goal, GAs are engaged in the areal segmentation of a particular river basin in terms of precipitation coverage, with the outputs driving a widely recognized hydrological model, specifically the Hydrologic Engineering Center–Hydrologic Modeling System (HEC-HMS) model, within a well-established pilot basin. The optimal simulations, achieved by comparing observed discharges with their corresponding simulated values during a designated calibration period, define the rainfall distribution geometry responsible for this optimal state. Subsequently, the suitability of the selected geometries is validated across different time periods providing very promising results. The proposed methodology represents a modern, innovative, and fully automated approach that enhances hydrological simulation outputs by optimizing the input data spatial allocation. It can be readily applied to any case study basin with recorded discharge measurements.
MATERIALS AND METHODS
Case study area and hydrological model
In the research, the investigation focuses on the behavior of an alternate rainfall distribution method rather than well-established gauge weight methods, such as the Thiessen polygon method, when integrated with hydrologic modeling processes at a basin scale. The research was conducted in the demonstration watershed featured in the online tutorial and guide of the HEC-HMS1 for the following reasons: (i) minimize biases and uncertainties arising from hydrological model parameterization and focus solely on the influence of rainfall distribution on discharge, (ii) ensure that the input data, e.g. precipitation and observed runoff, are reliable and lacking of monitoring errors, and (iii) compare the derived outputs with outputs of high accuracy. The selection was made to provide an optimal environment for our study; hence by selecting a well-established case study area and hydrological model and solely modifying the rainfall distribution scheme in each set of simulations based on the GA algorithm outputs (please refer to the next section for detailed information), it is considered that the criteria set were effectively met.
The HEC-HMS hydrological model has been developed by the US Army Corps of Engineers (Feldman 2000) and is a well-established and routinely applied conceptual and lumped model to various spatiotemporal scales. It is used to analyze, among others, urban flooding, assess the flood frequency, perform hydrologic simulations and reservoirs simulations (Frysali et al. 2023; Gelete et al. 2023), while it can be also combined with artificial neural networks (ANNs) for streamflow simulations (Gunathilake et al. 2021). Epigrammatically and regarding its operation, the simulation of a watershed's runoff is conducted with specific predefined methods attributing the various components and processes of the hydrological cycle at the sub-basin scale, with precipitation being the main input variable. In particular, the so-called loss methods are responsible for computing the infiltration and the resulting runoff volume, the transform methods are used to represent the direct runoff, including overland flow and interflow, the baseflow methods for assessing the subsurface flow, and the routing methods for calculating open channel flow (HEC-HMS User Manual 2016). The methods provided in HEC-HMS are either designed for simulating events, or others support continuous simulation.
Processes . | Method . | Parameters . | Parameters’ values . |
---|---|---|---|
Loss | Initial and Constant | Initial deficit (mm) | 12.7 |
Maximum deficit (mm) | 203.2 | ||
Constant rate (mm/h) | 0.2794 | ||
Impervious (%) | 0 | ||
Transform | Clark Unit Hydrograph | Time of concentration (h) | 11 |
Storage coefficient (h) | 31 | ||
Time–area method | Default | ||
Baseflow | Linear Reservoir | Layers | 2 |
Initial (m3/s/km2) | 0 | ||
Fraction | 0.5 | ||
Coefficient (h) | 330 and 1,100 for each layer |
Processes . | Method . | Parameters . | Parameters’ values . |
---|---|---|---|
Loss | Initial and Constant | Initial deficit (mm) | 12.7 |
Maximum deficit (mm) | 203.2 | ||
Constant rate (mm/h) | 0.2794 | ||
Impervious (%) | 0 | ||
Transform | Clark Unit Hydrograph | Time of concentration (h) | 11 |
Storage coefficient (h) | 31 | ||
Time–area method | Default | ||
Baseflow | Linear Reservoir | Layers | 2 |
Initial (m3/s/km2) | 0 | ||
Fraction | 0.5 | ||
Coefficient (h) | 330 and 1,100 for each layer |
In the current study, the HEC-HMS parameters and coefficients representing the various hydrological processes remain constant across all conducted simulations, with their values matching those used in the HEC-HMS tutorial (Table 1). While the tutorial employs the U.S. Customary metric system for data units, e.g. rainfall and discharges are presented in inches and cubic feet per second, the outputs in this paper are presented using the international metric system. The case study data repository also includes precipitation and discharge data for the period of April 10 to April 14, 1994. Although these data are not utilized in the tutorial, they are employed to validate the proposed methodology, as described in the following section. Finally, the validation of the proposed methodology is conducted through the well-established goodness-of-fit measures, i.e. the RMSE, NSE, R2 (Krause et al. 2005; Chadalawada & Babovic 2019), plus the percent bias (%) index.
Generic algorithm and objective function
The development of the various geometries attributing the best possible rainfall distribution, namely the optimization problem at the basin scale, was accomplished with the use of R programming language and GAs (Scrucca 2013). The sequential implementation approach begins with initially creating a set of random binary vectors (first generation of chromosomes) that represent possible solutions to the optimization problem. The created vectors are validated using an objective function (presented in the following paragraph) that verifies the accuracy of the solution. Thereafter, new vectors are created (next generation of chromosomes) and are tested again for their accuracy. The implemented loop process continues until a threshold is reached, with the threshold either being an accuracy limit, or a number of created generations. In this study, we performed 700 generations containing 50 chromosomes each. The crossover probability has been chosen to be 0.85 and the mutation probability equals 0.2. Elitism is used by keeping the five best chromosomes of each generation to the next generation without any transformation.
Penalty A: If at least one of the generated points falls outside the designated area, a high penalty is applied. The penalty increases further if multiple points are situated outside the designated area. This particular penalty encourages the GA to ensure that all points remain within the examined area, facilitating the implementation of K-NNGs and the subsequent creation of polygons.
Penalty B: If the number of created polygons does not match the number of stations, a moderate penalty is applied. A greater disparity between the polygon number and the station number leads to an increased penalty.
Penalty C: If a polygon intersects with more than one station or less than one station, a penalty is applied. Each polygon should encompass exactly one station, similar to the Thiessen polygons concept. In the case that this criterion is not met a low penalty is given. The penalty decreases as the number of polygons containing precisely one station increases.
In all three cases, the fitness value guides GA in identifying distinct solutions that satisfy the problem's criteria. Once these criteria are met, the fitness value becomes equivalent to the RMSE, with the aim of minimizing it.
RESULTS
Rainfall datasets
Examining the data for the calibration period (Figure 3(a)), it is observed that the DUJP station recorded the maximum rainfall of 10.67 mm on 30 April 1996. On the same day, MFFP and PNXP also recorded substantial rainfall, with both stations measuring 9.1 mm of rainfall height. Regarding the MFFP station, its maximum rainfall height of 9.4 mm recorded one day earlier. The total recorded rainfall during the calibration period is 146.1 mm, with the DUJP, MFFP, and PNXP stations recording 44.7, 52.3, and 49.1 mm, respectively. On the other hand, and during the validation period of April 1994, the PNXP station recorded the highest rainfall of 14.98 mm on April 12th. On the same day, the DUJP and MFFP stations recorded rainfall amounts that were 52.6 and 75.9% smaller than the maximum, respectively. Over the 5-day storm event in this validation period, the DUJP, MFFP, and PNXP stations recorded rainfall totals of 66.0, 28.5, and 84.6 mm, respectively, summing up to 179.1 mm in total. Comparatively, the cumulative rainfall during the 5-day storm event in the validation period is 18.5% greater than the one of the 6-day event in the calibration period.
From a spatial perspective, during the calibration period, the DUJP station that is located farther from the catchment's boundaries, recorded the highest precipitation, with the DUJP and PNXP stations registering approximately 14.5 and 6.1% less rainfall respectively than MFFP. In contrast, during the validation period, the PNXP station, which is the only one found within the catchment, recorded 21.9% more rainfall than DUJP and 66.3% more than MFFP. This highlights the randomness of the rainfall pattern in the area of interest and the non-stationarity of the utilized datasets.
Compliance to the penalties
Once the point sets that fall within the orthogonal boundary of the case study area are defined, the algorithm proceeds to generate K-NNGs for delineating the orthogonal area into polygons. These generated polygons are then evaluated for compliance with the constraints outlined by the other penalties. Starting with Penalty B, where achieving a one-to-one correspondence between stations and polygons is an ideal scenario, the algorithm needs to explore numerous combinations to meet these requirements. Figure 4(c), for example, illustrates a case where the number of automatically drawn polygons is twice that than of the stations, i.e. six polygons vs. three stations. The polygons that successfully satisfy the one-to-one rule are examined to ensure their conformity with Penalty C, which involves validating whether each station precisely intersects with a unique polygon. In alignment with Penalty C, Figure 4(d) shows a generated scenario where a low penalty is assigned, since despite having three stations and exactly three polygons, one polygon does not encompass any station.
Finally, all the sets of polygons produced by 8, 10, or 12 points and successfully met the custom criteria are further evaluated for their effectiveness in spatially distributing rainfall compared to the Thiessen polygon method, as detailed in the following sections.
GA spatial outputs
The numerical representation of the spatial entities (polygons) is the gauge weight method. The default weights used in the tutorial (namely Thiessen polygons) as well as the best ones determined after implementing the GA that was formulated starting with 8, 10, and 12 vector points, respectively (hereinafter these methods are named as GA – 8 points, GA – 10 points, and GA – 12 points, respectively), are depicted in Table 2. As observed, in the case of the GA – 12 points method, the distribution of rainfall follows the spatial pattern of the Thiessen polygon method. This means, that in both methods, the station PNXP has the greatest influence within the catchment, covering 71.6 and 57.0% of the catchment area, respectively. In the case of the GA – 8 points method, the PNXP station has less influence, while each of the other two stations, DUJP and MFFP stations, cover approximately 36.8% of the watershed. Notably, in the case of the GA – 10 points method, the PNXP station dominates over the other two stations, with a spatial coverage of 99.5%; in other words, the rainfall from the other two stations has a negligible impact on the river hydrology and corresponding hydrological simulations.
Gauge name . | Gauge weights per method . | |||
---|---|---|---|---|
Thiessen polygons . | GA – 8 points . | GA – 10 points . | GA – 12 points . | |
DUJP | 0.12 | 0.369 | 0.004 | 0.125 |
MFFP | 0.31 | 0.368 | 0.001 | 0.159 |
PNXP | 0.57 | 0.263 | 0.995 | 0.716 |
Gauge name . | Gauge weights per method . | |||
---|---|---|---|---|
Thiessen polygons . | GA – 8 points . | GA – 10 points . | GA – 12 points . | |
DUJP | 0.12 | 0.369 | 0.004 | 0.125 |
MFFP | 0.31 | 0.368 | 0.001 | 0.159 |
PNXP | 0.57 | 0.263 | 0.995 | 0.716 |
Calibration and validation of methodology, and sensitivity analysis of the number of utilized vector points
Gauge weight method . | Statistical metrics . | |||
---|---|---|---|---|
RMSE (m3/s) . | Nash–Sutcliffe . | Percent bias (%) . | R2 . | |
Thiessen polygons | 0.00262 | 0.991 | 2.021 | 0.992 |
GA – 8 points | 0.00321 | 0.987 | 0.583 | 0.987 |
GA – 10 points | 0.00253 | 0.992 | 0.850 | 0.993 |
GA – 12 points | 0.00253 | 0.992 | 0.846 | 0.993 |
Gauge weight method . | Statistical metrics . | |||
---|---|---|---|---|
RMSE (m3/s) . | Nash–Sutcliffe . | Percent bias (%) . | R2 . | |
Thiessen polygons | 0.00262 | 0.991 | 2.021 | 0.992 |
GA – 8 points | 0.00321 | 0.987 | 0.583 | 0.987 |
GA – 10 points | 0.00253 | 0.992 | 0.850 | 0.993 |
GA – 12 points | 0.00253 | 0.992 | 0.846 | 0.993 |
The high accuracy of all the implemented methods and the success of the model calibration are also clearly attributed to the goodness-of-fit measures presented in Table 3. When considering the Thiessen polygon method as the reference simulation, it is evident that the corresponding simulated discharges closely match the observed ones, as indicated by the statistical metrics (e.g., R² = 0.992 or Nash–Sutcliffe = 0.991). Similarly, it is observed that the proposed GA methods meet the high standards set by the reference simulation. The hydrological simulations derived from the GA – 10 points and the GA – 12 points methods, especially, are nearly identical, with only a statistically negligible difference in the percent bias coefficient to be apparent. For these two methods, the RMSE is equal to or slightly smaller than that of the reference method, while the Nash–Sutcliffe and R² coefficients are slightly better than those of the Thiessen polygon method (e.g. 0.992 vs. 0.991 and 0.993 vs. 0.992, respectively). Finally, the GA – 8 points method also provides excellent results, with all corresponding statistical measures reaching their best performance. For instance, R2 and Nash–Sutcliffe are almost equal to unity, percent bias is less than one and the RMSE is very close to zero.
The inability of the GA – 8 points method to accurately simulate the observed runoff is also evident in the utilized statistical measures, as shown in Table 4. While the coefficient of determination (R2) is relatively high, at 0.710, the other measures are quite small. For instance, the NSE is only 0.190, indicating a rather poor performance of this particular point vector combination. On the other hand, the GA – 10 points and GA – 12 points methods not only yield relatively high R2 coefficients (0.839 and 0.832, respectively) but also demonstrate very good NSE coefficients (0.799 and 0.722, respectively). These measures surpass those of the Thiessen polygon method, which serves as the reference method. For example, the NSE corresponding to the Thiessen polygon method is only 0.565.
Gauge weight method . | Statistical metrics . | |||
---|---|---|---|---|
RMSE (m3/s) . | Nash–Sutcliffe . | Percent bias (%) . | R2 . | |
Thiessen polygons | 0.018 | 0.565 | −30.152 | 0.822 |
GA – 8 points | 0.026 | 0.190 | −42.143 | 0.710 |
GA – 10 points | 0.013 | 0.799 | −6.113 | 0.839 |
GA – 12 points | 0.015 | 0.722 | −19.807 | 0.832 |
Gauge weight method . | Statistical metrics . | |||
---|---|---|---|---|
RMSE (m3/s) . | Nash–Sutcliffe . | Percent bias (%) . | R2 . | |
Thiessen polygons | 0.018 | 0.565 | −30.152 | 0.822 |
GA – 8 points | 0.026 | 0.190 | −42.143 | 0.710 |
GA – 10 points | 0.013 | 0.799 | −6.113 | 0.839 |
GA – 12 points | 0.015 | 0.722 | −19.807 | 0.832 |
In terms of computational resources, all simulations were conducted on a workstation with an i7 processor clocked at 2.8 GHz and 16 GB of memory. The required time for both the GA to create distinct polygons adhering to the generation rules and overcoming penalties, as well as for executing the hydrologic simulations and generating results matrices with corresponding statistical measures, varied across methods (between 12 and 14 h per method).
Sensitivity analysis on vector points and model's performance
In regard to the calibration period, it is observed that all the GA – XX points methods, with XX ranging from 5 to 13, achieve outputs of high accuracy, as both the R2 and NSE consistently exceeding 0.986 and 0.984, respectively (Figure 8(a)). It can be noted that after the GA – 9 points method, the outputs are stabilized around the same high values, with the fluctuations observed from the GA – 5 points method to the GA – 8 points method being negligible, as they pertain to the third decimal place of the figures. Similar results of high accuracy are also obtained for the validation period, Figure 8(b), where the R2 consistently exceeds 0.7, except for the GA – 6 points method, and is getting stable to values greater than 0.8 when the number of utilized vector points increases. NSE values also demonstrate the relatively good correlation (>0.6) of observed and simulated discharges when 9 or more vector points are used to force the simulation. Finally, both during the calibration period, Figure 8(a), and the validation period, Figure 8(b), the RMSE, which served as the objective function of the GA, was approximately 0.0027 and 0.019 m3/s, respectively, demonstrating the successful minimization of the objective function. The weights generated for each utilized vector points method, i.e. GA – XX point method, along with the resulting simulation outputs are given in the Supplementary Appendix.
In terms of computational resources, all simulations were conducted on a workstation with an i7 processor clocked at 2.8 GHz and 16 GB of memory. The required time for both the GA to successfully create distinct polygons adhering to the generation rules and overcoming penalties, as well as for executing the hydrologic simulations and generating results matrices with corresponding statistical measures, varied across methods (between 12 and 14 h per method).
DISCUSSION
Geometric deterministic approaches for analyzing the spatial distribution of rainfall data, which are used to trigger rainfall-runoff models, are commonly employed in hydrological simulation processes. These methods often involve gauge weight techniques like Thiessen polygons, which are routinely applied by various scholars, as exemplified in the introduction. The literature also demonstrates that GAs can be utilized to optimize techniques and numerical models dedicated to rainfall distribution (e.g. Bărbulescu et al. 2021), although their applicability remains somewhat limited. In the current research, GAs are used to determine rainfall spatial distribution geometries that optimize the performance of the HEC-HMS hydrological model by comparing the simulated runoffs with the observed ones. The optimization is based on runoff information rather than the a-prior rainfall's distribution optimization, which is typically conducted by selecting a method that responds better to independent rainfall gauges (Cheng et al. 2017; Lazoglou et al. 2019). The proposed methodology, along with the produced outputs, is considered the innovation of the research. The coupling of GAs with interpolation techniques for rainfall allocation and hydrological optimization is a contemporary theme that has been limitedly explored. Additionally, the produced outputs highlight that there are multiple distinct areal combinations that yield results the same as or even better than those obtained from the Thiessen polygon method.
The selection of the specific pair ‘case study area’ (namely the Mahoning Creek watershed) and ‘hydrological model’ (namely the HEC-HMS) is an intentionally strategic choice for the research. Firstly, HEC-HMS is a highly reputed model in the field of hydrological simulations, supported by a robust community of scholars and with numerous easy-to-access applications at various scales and places as detailed in the Materials and Methods section (e.g. Frysali et al. 2023). What's important to note is that HEC-HMS is a freeware model, available for download and use by anyone. This ensures the applicability and transferability of the proposed methodology to any case study since free access to the specific software is guaranteed. Moreover, while HEC-HMS is not an open-source model, meaning users cannot access the core simulation processes, all the input parameters are stored in .ascii files in specific repositories, which can be easily edited with simple notepad editors to store the custom user's demands. This allows the uninterrupted implementation of the proposed methodological automated sequence, which begins with the gauge weight generation through GAs, followed by the adjustment of the batch files responsible for storing the gauge parameters for each simulation run and scenario, and finalized with the automated export and analysis of the statistical measures of each simulation, all of which are stored in separate .ascii files. Secondly, the case basin corresponds to the pilot basin used in the online-accessible HEC-HMS tutorial. This choice ensures that all figures, such as precipitation records, and/or infiltration coefficients, and/or baseflow coefficients, have been rigorously examined for their consistency and accuracy. The possibility, thus, of inserting uncertainties that could bias the proposed methodology is minimal.
Regarding the obtained results, it is observed that during the calibration period, the objective function of generating new polygonal shapes, when used with the HEC-HMS model, produces smaller hydrological RMSE values than those obtained using Thiessen polygons. It is worth mentioning that in the presentation of the results (see Table 3), five decimal places are used to represent the achieved RMSE values. This precision is necessary because the RMSE value produced by the Thiessen method is exceptionally small, reflecting the excellent outputs of the reference simulation. Particularly, in the GA – 10 points and the GA – 12 points methods, the RMSE equals 0.00253 m3/s, a figure that is slightly larger than the Thiessen polygons one and which is the aim of the objective function. Contrarily, in the case of the GA – 8 points method, the RMSE is negligibly smaller than the Thiessen polygons one (0.00321 vs. 0.00262 m3/s, respectively). During the validation period, the outputs have similar behavior to those of the calibration period, i.e. the GA – 10 points and GA – 12 points methods outperform the Thiessen polygon method in terms of RMSE. Additionally, while all three methods produce comparable R2 coefficients (see Table 3), the GA – 10 points and GA – 12 points methods demonstrate much significantly higher NSE coefficients (0.799 and 0.722, respectively), compared to the Thiessen polygon coefficient of 0.565. Supplementary Appendix Table A1 presents the optimum weights that were produced by each GA – XX method, where XX ranges from 5 to 13, both for the calibration and validation period. Additionally, it includes the produced hydrological outputs expressed in statistical metrics.
What is particularly important to note is that high accuracy outputs can be obtained even when using only one of the case study rainfall stations, instead of all three stations. This is the case with the GA – 10 points method, where the simulation is practically forced by the rainfall data of the PNXP station, as shown in Figure 5. This implies that the arbitrary spatial boundaries set by the Thiessen polygon method, where no other physical characteristics in rainfall distribution apart from the vectorial proximity to a gauge are considered, can be effectively reproduced using different geometric shapes when not all gauge stations are utilized. Similar high accuracy outputs are also obtained when other disproportional gauge weights are used, e.g. the GA – 7 points or the GA – 13 points methods cases, as illustrated in Supplementary Appendix Table A1. Therefore, solutions like those initiated in this research and relies on the use of GAs offer an enhanced alternative to the conventional interpolation techniques used to fill data gaps. This is particularly valuable as very few stations on a global scale maintain continuous measurements (Estévez et al. 2022). Moreover, the proposed methodology mitigates biases introduced by interpolation methods used to estimate missing precipitation, as these methods tend to understate high values and overstate low values (Teegavarapu 2014).
In the research, and regarding the optimization process, various combinations of vector points, which delineate the rainfall-influenced area, give optimal outputs; however, the manuscript emphasizes the use of 8, 10, and 12 points. The sensitivity analysis demonstrated that increasing the number of points gives the algorithm greater flexibility to create complex shapes that can better divide the overall area into sub-areas (polygons), thereby enhancing the algorithm's performance. However, the use of more vector points results in the need for more computational resources and longer simulation times. Particularly, when attempting to start with more than 14 points, it was found that the algorithm required significantly more time to find acceptable solutions (polygons that intersect with exactly one station). This was attributed to the complexity of the shapes created, necessitating an increase in the number of generations to provide the algorithm with more ‘time’ to search for possible solutions. Similarly, radically decreasing the vector points, i.e. less than 5 points, make it also impossible to find an optimal solution. For example, using 4 points and K-NNGs may satisfy the penalties, but it is practically impossible to optimize such a complicated space into the best three areas with the use of only 4 points. The results also validate that the partially less good performance is achieved when fewer points are used, i.e. with the GA – 8 points method or fewer as indicated in the sensitivity analysis section. At the same time, by increasing the spatial discretization (e.g. at 10 × 10 m), the outputs demonstrated a negligible increase of the accuracy of the solution, the computation cost, nevertheless, was significantly more impacted.
To sum up, the designation of weights, apart from the classical Thiessen and IDW gauge weight methods, is a topic that has attracted the interest of scholars. For example, the Geographically Weighted Regression (GWR) model (Brunsdon et al. 1996) deals with data having ‘spatial non-stationarity’, i.e. having alterations in relationships between variables from one point to another, such as precipitation, with the specific model to have been adopted by numerous scholars for assessing the spatial variability of precipitation and investigate its relationship with other factors (e.g. Brunsdon et al. 2002; Xu et al. 2015; Kara et al. 2016). Based on the aforementioned model, Chao et al. (2018) proposed the use of topographic variables (elevation, slope, aspect, surface roughness, and distance to the coastline) and a meteorological variable (wind speed) to address the issue of artificial spatial autocorrelation in traditional interpolation methods in order to merge satellite with gauge precipitation data. In the research, the use of GAs has yielded successful out-of-the-box rainfall distributions, aiming to optimize the accuracy between simulated and observed river discharges regardless of how the rainfall is spatially allocated within the basin. It is, nevertheless, important to note that not considering the basin's physical parameters, such as elevation, represents a limitation that requires further investigation. The forthcoming research advancements are planned to address the physiography of the case basin and assess the methodology's applicability to catchments of larger scales and different climatic conditions. Concluding, the research aims to demonstrate that GAs, together with other modern approaches such as AI, can effectively determine accurate rainfall patterns for hydrological simulations, relying solely on the coordinates of the rainfall network, with the proposed methodology offering an engineering solution to address accuracy issues in cases with limited data. AI models such as neural networks are data dependent and should be preferred in cases where large datasets are available. Conversely, GAs utilized in research necessitate the presence of an objective function to evaluate potential solutions. Utilizing HEC-HMS facilitates the construction of an objective function and enables the numerical evaluation of solutions until optimization is achieved.
CONCLUSION
The research introduces a reverse-engineering approach for determining the spatial distribution of rainfall before being applied in rainfall-runoff models. Practically, instead of calibrating and validating a hydrological model solely based on observed streamflow measurements, the implemented approach utilizes GAs to reconfigure the rainfall spatial pattern and to increase the model performance. The effectiveness of these new configurations (geometries) is validated for a different simulation period than the one used for the model's calibration. The validation is conducted using performance indicators, comparing HEC-HMS simulated discharges with observed ones. Additionally, the simulations are compared with those obtained using the Thiessen polygon method which is embedded in the HEC-HMS model.
The results demonstrate that numerous geometries can be generated depending on the number of point vectors that are used to initiate the algorithm. Many of these geometries meet the customized criteria, set within the methodology, and are further employed to drive hydrologic simulations. Furthermore, the study yields highly accurate outputs (with R2 ranging from 0.83 to 0.99 and Nash–Sutcliffe ranging from 0.73 to 0.99, where the high and low limits correspond to the calibration and validation periods, respectively) for both the GA – 10 points and GA – 12 points methods in the case study. In the validation period, the results outperform those obtained using the reference method. It is important to highlight that the generated geometries from the two methods are entirely distinct. This suggests that the simplification of abrupt changes in rainfall allocation at vectorial boundaries, as introduced by the Thiessen polygons, can be better addressed through more sophisticated approaches such as GAs. Concluding, the proposed gauge weight method, which essentially mirrors the principles of the Thiessen polygon method, along with the implemented methodology can be applied in basins that are negligibly monitored (limited number of gauges) and in cases where observation datasets are subject to uncertainties and errors.
AUTHOR CONTRIBUTIONS
C.S.: Supervision, Conceptualization, Investigation, Methodology, Software, Data curation, Validation, Writing – Original draft preparation, Writing – Reviewing and Editing. N.N.: Conceptualization, Investigation, Methodology, Software, Data curation, Writing – Original draft preparation.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.
Gauge weights precipitation method: https://www.hec.usace.army.mil/confluence/hmsdocs/hmsguides/meteorologic-models-for-historical-precipitation/gage-weights-precipitation-method.