## Abstract

In this work, the capability of STORAGE (STOchastic RAinfall GEnerator) model for generating long and continuous rainfall series for the upper Vistula basin (southern Poland) is tested. Specifically, in the selected area, only parameters of depth–duration–frequency curves are known for sub-daily rainfall heights (which are usually estimated in an indirect way by using Lambor's equations from daily data), while continuous daily series with a sufficient sample size are available. Attention is focused on modelling the sample frequency distributions of daily annual maximum rainfall. The obtained results are promising for further elaborations, concerning the use of STORAGE synthetic continuous rainfall data as input for a continuous rainfall-runoff approach, to be preferred with respect to classical event-based modelling.

## HIGHLIGHTS

STORAGE model as an effective tool in rainfall modelling.

Possibility of modelling rainfall with different characteristics.

Promising results in the upper Vistula basin, paving the way for a continuous approach.

### Graphical Abstract

## INTRODUCTION

Hydro-meteorological data availability plays a crucial role in planning and managing the optimum use of water resources. In recent years, high maintenance costs and the need of trained staff have led to a consistent decrease in the number of observations in hydro-meteorological networks (Turnipseed & Sauer 2010). Hence, the modelling of hydrological or meteorological factors assumes an increasingly important role, for instance in the calculation of design discharges and flood prone areas in scarcely gauged catchments, for which the knowledge of design rainfall is mandatory (Sabzevari 2017). Rainfall temporal distribution, frequency, and duration are the basic information necessary to correctly estimate the runoff amount in such catchments (Wałęga *et al.* 2016; Petroselli *et al.* 2020). This is particularly important when using hydrological and hydrodynamic models, which are sensitive to changes in the input parameter values (Mannina *et al.* 2006; Sabzevari *et al.* 2009).

When sufficiently long observed rainfall information is available, statistical methods based on theoretical distribution functions are commonly used to assess the design rainfall. In particular, concerning annual maximum time series, commonly adopted functions are generalized extreme value (GEV), Log-Normal (LN), Weibull, Gumbel, generalized exponential distribution, generalized logistic distribution (Fitzgerald 2005; Boni *et al.* 2006; Villarini *et al.* 2011; Kaźmierczak & Kotowski 2012, 2015; Młyński *et al.* 2019a, 2019b).

Another solution is taking advantage of radar observations that also consider the spatial variability of the rainfall field, in order to reconstruct the sequences of rainfall characteristics, and then using them to determine the depth–duration–frequency (DDF) curves (Wright *et al.* 2013). Since there is a lack of radar meteorological data availability in some areas of the world, there is often the need to use the more widely available satellite-based products (Siddique-E-Akbor *et al.* 2014). Generally, the satellite and radar products underestimate extreme rainfall in comparison to gauge-based observations (Bharti *et al.* 2016; Chen *et al.* 2020). This is mainly due to the different spatial resolutions, i.e. satellite or radar outputs are average estimates on areal pixels, while raingauge data refer to measures in specific points. Spatial resolution refers to the size of one pixel on the land. Landsat data, for example, has a 30 m resolution which is a medium-resolution image. It could cover bigger area alone, but this resolution level is not enough to distinguish individual objects like houses or cars. Currently, the following resolution levels are provided by commercial satellites: low (over 60 m per pixel), medium (from 30 to 10 m per pixel) and high to very high (from 5 m to 30 cm per pixel). For instance, Derin *et al.* (2019) investigated the ability of six satellite-based products to estimate extreme rainfall in nine mountainous locations with a dense raingauge network, determining a constant underestimation of extreme rainfall values. Consequently, satellite or radar estimates should require downscaling or bias correction methods (Willems *et al.* 2012; Maraun 2016) for hydrological analyses. Other methods using satellite products to indirectly estimate rainfall from soil moisture can be used like SM2RAIN (Brocca *et al.* 2018; Dari *et al.* 2020).

However, the evaluation of the design rainfall as input for the calculation of the design discharge (also named as event-based approach) can be affected by several relevant assumptions: the hyetograph shape, the critical rainfall duration, the pre-event soil moisture condition and the implicit hypothesis, assuming that the design storm and the design hydrograph have the same return period. So, in order to address the major drawbacks of event-based approaches, continuous rainfall-runoff models have been recently developed (Grimaldi *et al.* 2012, 2021; Biondi & De Luca 2015), but they require long and continuous rainfall series at high resolutions (which are not so long or available in many areas of the world) as input.

In this context, synthetic rainfall generators (SRGs) are able to produce long and continuous rainfall series at the desired time resolution. Their calibration is carried out using the usually scarce information derived from data at sub-daily resolution (which should correspond to long high-resolution continuous data, but, in many cases, it only refers to the knowledge of annual maxima and DDF curves) and the more available and longer continuous daily time series.

For instance, Iwański & Kuchar (2003) described a SRG using daily series of meteorological data for a single geographical site with a known climatic profile. Pierzga & Jarosińska (2014) presented a SRG that used annual maximum daily rainfall data that later were disaggregated into hourly values, achieving to generate many synthetic hyetographs. Agilan & Umamahesh (2019) proposed a modified version of a K-nearest neighbour weather generator that incorporates nonstationarity in the extreme rainfall series. Zhao *et al.* (2019) described a daily, spatially distributed, SRG based on a first-order Markov chain model to simulate three types of rainfall, i.e. convective, frontal and tropical storms.

Among the SRGs, the Neyman–Scott Rectangular Pulse (NSRP) (Rodriguez-Iturbe *et al.* 1987; Cowpertwait 1991; Burton *et al.* 2010) is a commonly used model, which is suitable for generating continuous rainfall series at several time scales, from fine (e.g. 1 or 5 min) to coarse ones (e.g. average annual), and capable of preserving the statistical properties of the observed rainfall data (Fowler *et al.* 2000). In a stationary context, De Luca & Galasso (2019) calibrated a NSRP model by using information from extreme value distributions. Their results were promising, because they found acceptable relationships among extreme value distributions and rainfall statistical properties of continuous series, and this constitutes an interesting alternative to calibration when only annual maximum data are available. In a climate change context, De Luca *et al.* (2020a, 2020b) investigated the modelling of climate changes with a NSRP model, while Wasko & Sharma (2017) proposed a novel method for simulating continuous rainfall sequences for future warming climate using one-dimensional NSRP model.

Recently, De Luca & Petroselli (2021) developed a modified NSRP model named as STORAGE (STOchastic RAinfall GEnerator). Its main advantages with respect to a classic NSRP model are:

- 1.
The model calibration is carried out using summary statistics from annual maxima series (from sub-hourly/hourly to daily scales), annual and seasonal cumulative rainfall heights, and annual number of wet days, which are usually longer than continuous observed high-resolution series (mainly adopted for SRGs calibration but typically very short or absent in many locations) and

- 2.
The seasonality is modelled by using series of goniometric functions. This approach makes STORAGE more parsimonious with respect to the use of monthly or seasonal sets for parameters.

In the present work, we investigated the performances of the STORAGE model for the raingauge network of the upper Vistula basin, which is located in Poland. In the selected study area, only parameters of DDF curves are known for sub-daily rainfall heights, which are usually estimated in an indirect way by using Lambor's equations from daily data (Młyński *et al.* 2020), while continuous daily series with a sufficient sample size are available. In particular, the goal of the present work is to assess the capability of STORAGE in reconstructing the annual maxima of daily rainfall, which can constitute a starting point for further elaborations, related to the use of STORAGE synthetic high-resolution rainfall data as input for a continuous approach aiming in evaluating the design discharge in scarcely gauged basins.

## STUDY AREA

The study was conducted for 36 rainfall stations that are located in the upper Vistula basin, Poland (Figure 1). This area covers about 25% of the entire Vistula basin and about 15% of the whole Poland. The region is divided into three main physiographic units: Carpathian's, uplands and flats. The water resources of the upper Vistula basin are the highest in the entire Poland. The water cycle in the mountain part is very dynamic, due to steep relief and low permeability of the ground, what causes that floods events develop rapidly here (Pociask-Karteczka 2016). The elevation of rainfall stations ranges from 171 to 1,991 m a.s.l. The used data were continuous daily rainfall series, which were recorded in the period 1989–2018, and the parameters of the DDF curves, which were estimated in an indirect way by using Lambor's equations from daily data (Młyński *et al.* 2020). The data were obtained from the public database of the Institute of Meteorology and Water Management, National Research Institute in Warsaw, Poland. Table A1 in Supplementary Material, Appendix 1 presents names, coordinates (ETRS89/Poland CS92, authority ID: EPSG:2180), elevation of analysed stations, and the mean annual precipitation (MAP), the mean annual number of wet days (MANWD) and the mean values for the seasons DJF (December–January–February), MAM (March–April–May), JJA (June–July–August) and SON (September–October–November). In this work, a wet day is defined when the daily rainfall height is at least equal to 1 mm.

## METHODOLOGY

For each investigated raingauge, the study involved the following steps: (1) extracting the correspondent observed series of daily annual maximum rainfall (AMR, also indicated with *P _{max}* in the following) and determining its statistical properties; (2) analysing the trends in the observed

*P*time series, in order to justify the use of STORAGE model in a stationary context and (3) generating the

_{max}*P*using the STORAGE model and analysing the uncertainty due to the sample size.

_{max}Then, focusing on some specific raingauges as examples, STORAGE performances (in terms of comparison among observed and generated data for annual maxima, cumulative annual and seasonal rainfall data, plus annual number of wet days) are shown and discussed.

### Statistical analysis

The statistical analysis of *P _{max}* observed time series regarded the estimation of minimum (

*min*), mean, maximum (

*max*), standard deviation (

*s*) and coefficient of variation (

*C*) of the sample, together with the modelling of these series with the LN distribution (Kottegoda & Rosso 2008), which is commonly adopted for annual maxima data in Poland. However, Extreme Value Type 1 (EV1) distribution (Kottegoda & Rosso 2008) is also suitable for

_{v}*P*time series in Poland, as shown in Section 4 for some raingauge data series.

_{max}### Trend analysis

*P*observed time series was conducted using the modified Mann–Kendall (MK) test. The test considers the relative magnitudes of sample data rather than the data values themselves, and it does not require the data to follow any specific probability distribution. The null hypothesis H

_{max}_{0}of test indicates a lack of monotonic trend for the time series. The alternative hypothesis H

_{1}claims its existence. The

*S*statistic of the modified Mann–Kendall test is expressed as (Wałęga

*et al.*2016; Ali

*et al.*2019; Agbo

*et al.*2021) follows:where

*n*is the number of elements of the time series, is the observation at time

*j*and is the observation at time

*k*.

Positive values of *Z* statistic indicate increasing trends, while negative values of *Z* show decreasing trends. Moreover, once chosen a specific significance level and calculated the theoretical quantile , the null hypothesis of no trend is rejected when . For the frequently adopted value of *α* = 0.05, the null hypothesis of no trend is rejected when .

*S*, the following adjustment for variance corrections was included only for data where the significant autocorrelation was detected (Sridhar & Raviraj 2017; Młyński

*et al.*2019a, 2019b):where is a correction factor, which is calculated as follows:where

*n*is the number of elements of the time series, is the effective number of observations to account for the autocorrelation in the data and is the autocorrelation function for the ranks of the observations.

It should be remarked that a more comprehensive trend investigation should regard fine and coarse time scales and not only the extremes, but if high-resolution data are scarce or absent, then trend/no trend hypothesis for specific fine time scales can only be assumed on the basis of the obtained MK test results for daily data. Moreover, since the present study aims at evaluating the design discharge, the extreme analysis is sufficient for investigation purposes.

### Generation of daily AMR using the NSRP model and analysis of uncertainty due to the sample size

Authors adopted a modified version of the NSRP (Rodriguez-Iturbe *et al.* 1987; Cowpertwait 1991; Burton *et al.* 2010) model, named STORAGE – STOchastic RAinfall GEnerator (De Luca *et al.* 2020a, 2020b; De Luca & Petroselli 2021). The NSRP model usually presents a set of parameters, which represent mean values for (1) inter-arrival time between the origins of two consecutive storms; (2) the number of rain cells, also indicated as bursts or pulses, inside a specific storm; (3) waiting time between a specific burst origin and the origin of the associated storm; (4) intensity and (5) duration of the bursts. Such parameters are estimated by minimizing an objective function (OF), which is defined as the sum of residuals between the considered summary statistics of the observed data at chosen time resolutions and their theoretical expressions. The summary statistics are typically referred to high-resolution continuous time series, which unfortunately are very short or absent in many locations. Moreover, the main limitation of the basic version of NSRP is that the model fitting to a particular rainfall time series involves highly nonlinear equations, regarding analytical and sample moments, and then the parameters estimation needs nontrivial solutions. Even when the nonlinear sophisticated methods are used, the fitting process is a trial-and-error process consisting in trying different initial parameter values. The process is finished when physically reasonable parameter final values are found that provide a good fit (Morrissey 2009). For this reason, STORAGE was developed, being characterized by two main innovations as respect to the basic versions of the NSRP model. First, the parametric calibration is carried out using data series (e.g. annual maxima rainfall, annual and monthly cumulative rainfall and annual number of wet days), which are usually longer than observed high-resolution series. Second, the seasonality is modelled using series of goniometric functions. This approach makes STORAGE strongly parsimonious as respect to basic NSRP models.

In the present work, STORAGE was applied, for all the considered raingauge stations, by employing as input data, the continuous rainfall daily data, recorded in the period 1989–2018, and the parameters of the DDF curves. MAP, MANWD and mean values of seasonal rainfall (DJF, MAM, JJA and SON) were calculated from daily data for each raingauge (see Table A1 in Supplementary Material, Appendix): they constitute (together with parameters of DDF curves) the input information for STORAGE calibration (De Luca & Petroselli 2021).

Then, a 500-year continuous rainfall series with a resolution of 60 min was generated for each raingauge. It must be highlighted that both stationary and non-stationary time series can be generated with STORAGE, depending on the observed/predicted trend. In this work, the stationary version was adopted based on results from the modified Mann-Kendall test (MMK) test. For further details related to the STORAGE application, see De Luca & Petroselli (2021).

*PBIAS*(Kim & Kim 2016; Młyński 2020):where

*P*is the daily design rainfall, for the assumed return period, which is calculated from observed time series, and is the daily design rainfall, for the assumed return period, which is calculated from the STORAGE model.

_{T}Both *P _{T}* and are evaluated by using LN distribution. The mean of

*PBIAS*(indicated as m

*PBIAS*in the following) was estimated by considering the obtained values from specific return periods (2, 5, 10, 20, 50 and 100 years).

The performance rating for mean values m*PBIAS* can be expressed as in the following: |m*PBIAS*| < 10 very good; 10 ≤ |m*PBIAS*| < 15 good; 15 ≤ |m*PBIAS*| < 25 satisfactory; |m*PBIAS*| ≥ 25 unsatisfactory (Ajmal *et al.* 2020).

## RESULTS

### Initial analysis of the observed daily AMR

The preliminary analysis included the calculation of summary statistics of the observed time series: minimum (*min*), mean, maximum (*max*), standard deviation (*s*) and coefficient of variation (*C _{v}*). The results are presented in Table A2 in Supplementary Material, Appendix. Analysing the coefficient of variation, it was concluded that 78% of all stations were characterized by medium dynamics of changes in AMR, which was described by

*C*values in range from 0.2 to 0.4. Only for the eight stations: Bielsko-Biała, Brenna, Kielce Krościenko, Maków Podhalański, Międzybrodzie Bielskie, Skroniów, Wisła and Zawoja, the values of

_{v}*C*are greater than 0.4, which means high dynamics of changes in their AMR.

_{v}### Analysis of trend in the observed daily AMR

As previously mentioned, trend investigation was carried out for the observed daily AMR series using the modified Mann–Kendall test.

From the obtained results (Figure 2), it should be emphasized that there was no case where the |*Z*| statistic is higher than 1.96, and then the hypothesis of stationary processes cannot be rejected. Similar results were obtained by Młyński *et al.* (2018), where significant trends in the annual daily maximum rainfall for the upper Vistula basin have not been found. The study conducted by Kubiak-Wójcicka (2020) has shown a slight increase of rainfall for the Vistula basin, mostly in particular seasons. Seasonal trends were analysed also by Pińskwar *et al.* (2019). The analysis of rainfall extremes showed that maximum daily rainfall mostly increased for the summer half-year and that the increases during summer half-year are more numerous than in winter half-year. However, many changes were weak and statistically insignificant. Finally, Niedźwiedź *et al.* (2014) published similar results for southern Poland, which indicated growing but not significant trends in daily rainfall.

Concerning *P _{max}* modelling, under the hypothesis of stationarity, both EV1 and LN distributions (Kottegoda & Rosso 2008) usually provide comparable performances. As examples, the EV1 and LN fittings are illustrated in Figure 3 for Kraków-Balice and Solina raingauges using EV1 probabilistic plots. In detail, EV1 distribution is estimated using the maximum-likelihood (ML) method, while the method of moments (MM) is adopted for LN function. The observed time series for Kraków-Balice and Solina raingauges are in-depth analysed in Section 4.4, where examples of the STORAGE application are discussed.

### STORAGE uncertainty analysis due to the sample size

As previously mentioned in Section 3.3, the uncertainty analysis of the STORAGE model consisted in selecting randomly 30-element series from the 500-year synthetic AMR series, for each raingauge station, and comparing with the observed annual maxima. Next, the values of *PBIAS* were determined. The operation was repeated 100 times. The results are presented in Figures 4 and 5 and in Table A3 in Supplementary Material, Appendix.

Based on the obtained results, it was concluded that the STORAGE model generated the daily AMR series usually with good or very good quality. The best results were obtained for Bieruń, Borusowa, Brenna, Kraków-Balice and Międzybrodzie Bialskie rainfall stations, where the PBIAS quality was very good in more than 90 out of 100 simulations. It should be emphasized that in 75% of all analysed rainfall stations, the unsatisfactory quality of model was not observed. Only for the Przemyśl station, there was 50 out of 100 unsatisfactory cases. Generally, analysing the model's quality for the entire upper Vistula basin (Figure 4), it was concluded that only 2% of all simulations were unsatisfactory, 20% was satisfactory, 23% was good and 55% was very good, i.e. 98 and 78% of simulations were at least satisfactory and good, respectively. Figure 6 presents the summary of quality for the station with best (Kraków-Balice) and worst results (Przemyśl). Figure 7 presents the values of PBIAS for the station with best and worst results. Figure 6 shows that that the very good quality of the model was not obtained for the Przemyśl station. The most common PBIAS values were in range from 25 to 30%. Moreover, for the Przemyśl station, the values of PBIAS were only positive. For the Kraków-Balice station, the unsatisfactory quality of model was not observed and the most common PBIAS values were in the range from 0 to 5%. In this case, negative values of PBIAS were also observed.

### Examples of STORAGE generation for two observed time series

As examples of in-depth analyses of STORAGE application, we focused on continuous daily series for Kraków-Balice and Solina raingauges, for which the summary statistics concerning annual and seasonal rainfall are reported in Table 1. These selected time series present good performances in terms of mPBIAS and are located in different areas of the investigated Vistula basin (Figure 1).

Raingauge . | MAP (mm) . | MANWD (–) . | DJF (mm) . | MAM (mm) . | JJA (mm) . | SON (mm) . |
---|---|---|---|---|---|---|

Kraków-Balice | 673.2 | 108.0 | 100.3 | 163.9 | 251.9 | 157.1 |

Solina | 940.3 | 129.6 | 140.1 | 235.4 | 343.9 | 220.9 |

Raingauge . | MAP (mm) . | MANWD (–) . | DJF (mm) . | MAM (mm) . | JJA (mm) . | SON (mm) . |
---|---|---|---|---|---|---|

Kraków-Balice | 673.2 | 108.0 | 100.3 | 163.9 | 251.9 | 157.1 |

Solina | 940.3 | 129.6 | 140.1 | 235.4 | 343.9 | 220.9 |

In detail, from the generated 500-year synthetic series with a resolution of 60 min, we carried out model validation by analysing the reproduction of sample frequency distributions of daily AMR, annual and seasonal rainfall, and annual number of wet days (Figures 8–11).

Concerning annual and seasonal frequency distributions, STORAGE provided for both raingauges (on Gaussian probabilistic plots), a very good fitting for annual rainfall and a slight underestimation for annual number of wet days, while a good/acceptable fitting is obtained for seasonal rainfall heights. Sample daily AMR distributions are fitted in good/acceptable way as well (on EV1 probabilistic plots).

These specific results for Kraków-Balice and Solina raingauges clearly constitute a first promising step for further investigations of rainfall fields in Poland employing the STORAGE model, which will specifically focus on the high resolutions and on the use of STORAGE synthetic rainfall data for a continuous approach aiming at evaluating the design discharge in ungauged or scarcely gauged basins.

## CONCLUSION

This work aimed at modelling, for 36 raingauge stations located in the upper Vistula basin (Poland), the daily annual maximum rainfall (AMR) time series, extracted from synthetic continuous rainfall data which were generated using the STORAGE model. The present analysis constitutes a preliminary step of a more comprehensive study, focused on the use of the synthetic high-resolution rainfall series as input for a continuous rainfall-runoff modelling, aiming at evaluating the design discharge in ungauged or scarcely gauged basins.

For all the investigated raingauge stations, a lack of significant trends for the observed data series was assessed; hence, a stationary version of STORAGE was adopted for the successive steps. Moreover, the uncertainty analysis, due to sample size, for STORAGE output highlighted that the generated values of daily AMR are usually characterized by good or very good quality (in terms of mPBIAS), with respect to the observed data. Good or very good qualities have been confirmed from the in-depth analysis of STORAGE performances for two raingauges, chosen as examples: Kraków-Balice and Solina stations. The obtained reconstruction of sample frequency distributions, in terms of daily AMR, annual and seasonal rainfall heights and annual number of wet days, allows to consider STORAGE as a promising tool for modelling Polish rainfall fields. Further steps of the more comprehensive study of interest (i.e. design discharge evaluation for ungauged or scarcely gauged basins in Poland) are encouraged.

## AUTHORS CONTRIBUTIONS

A.P., D.L.D.L., D.M. and A.W. conceptualized and investigated the study; A.P. and D.L.D.L. did the data curation, formulated the study methodology and provided software and validation; D.M. provided resources; D.L.D.L. and A.W. provided supervision; D.M. did the visualization; writing of the original draft was done by D.M. and A.W., and writing of review & editing was done by A.P. and D.L.D.L.

## CONFLICTS OF INTEREST

The authors declare no conflicts of interest.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

Applied Statistics for Civil and Environmental Engineers. Wiley-Blackwell, pp. 736