In the present paper we propose a new model of monthly river flow rate as a simple nonlinear function of air temperature and rainfall. Response surface methodology is used to analyze the observed monthly flow rates from 1950 to 1990 for Great Morava River, as the largest domestic river in Serbia. Obtained results indicate significant linear and quadratic effect of both individual factors, while two-factor interactions show significantly smaller influence, indicating occurrence of maximum flow rate for low temperature and high rainfall regime. Statistical reliability of the proposed model is verified by internal and external validation, the latter of which included comparison of predicted and observed values from 1991 to 2012. It is shown that predicted flow rates exhibit a similar statistical pattern as observed ones, with a satisfying value of Nash–Sutcliffe coefficient (NSE = 0.73), although the derived model cannot capture well the highest flow rates. Obtained results further indicate the sequence of residuals represents random time series, which is confirmed by appropriate test statistics and surrogate data testing. The advantage of using the derived model for hydrological simulations in river basins instead of existing ones lies in its explicit mathematical form, making it suitable for quick and reliable estimation and prediction of monthly flow rates.

## INTRODUCTION

Modeling of river flow rates lies in the focus of hydrological research, primarily since these rates are used as a basis for sizing water systems, including river reservoirs and retention ponds, and for optimizing the water management process. In particular, time series of flow rates are used to identify ‘critical’ time periods that jeopardize functional integrity of water supply or irrigation segments. On the other hand, assessment of available hydropower potential is generally based on monthly flow rates. Concerning the aforementioned reasons, formulation of an appropriate model of flow rate would be of great significance for engineering practice.

There are three distinct approaches towards designing an appropriate model of flow rate. Empirical black-box models, based on artificial neural networks (ANN) (Shamseldin 2010; Karunasingha & Jayawardena 2011; Zeng *et al.* 2012; Chen *et al.* 2013), genetic programming (Rodriguez-Vazquez *et al.* 2012), and phase-space reconstruction technique (Sivakumar *et al.* 2002; Ng *et al.* 2007; Khatibi *et al.* 2012) are by far the most popular, since they enable fast computation, without explicitly solving the inter-relations among the input parameters. In other words, they are designed to identify the connection between the input factors and the response (output data), without going into the analysis of the internal structure of the physical process. The main drawback of using these models is that they often do not lead to a better understanding of the river flow process. Nevertheless, these methods usually provide results of satisfying accuracy in the absence of accurate information about the physical mechanisms underlying the river flow at a particular location.

The second approach towards the modeling of river flow rate uses physically based models, which mimic the hydrological processes either by finite difference approximation of the partial differential equation representing the mass, momentum, and energy balance or by empirical equations (Abbott *et al.* 1986; Young & Beven 1994; Spruill *et al.* 2000). Such models include primary components of the hydrologic cycle, such as interception, snowmelt, evapotranspiration, sub-surface runoff, groundwater flow, surface runoff, and channel routing. Moreover, these models are usually defined as distributed, since they take into account spatial variability of vegetation, soil, topography, etc. According to the way that a river basin is discretized, there are semi-distributed models, including TOPMODEL (Beven *et al.* 1995), SWAT (Arnold *et al.* 1998), and TOPNET (Bandaragoda *et al.* 2004) and distributed models, like SHE (Abbott *et al.* 1986), ISBA (Nolihan & Mahfouf 1996), tRIBS (Ivanov *et al.* 2004), and PAWS (Shen & Phanikumar 2010). Nevertheless, although previous research largely focused on these models, they typically require excessive input data and parameters and, therefore, are difficult to apply practically.

The third approach, formally known as conceptual modeling, provides representation of different hydrological processes in a form of simple mathematical expressions. They are typically defined as lumped models, since spatial variability of watershed characteristics are ignored. The main disadvantage of this class of models, which includes STANFORD IV (Crawford & Linsley 1966), SSARR (US Army Corps of Engineers 1991), TANK (Sugawara *et al.* 1984), and ARNO (Todini 1996), is that their parameters are not directly measurable and must be inferred from the observed data.

Within the present paper, we aim to develop a model that could give a clear insight into the mechanism in the background of flow rate, but is based on simple mathematical relations among the influential parameters. In addition, such a model should be both easy to use and provide high prediction accuracy. For this purpose, we apply response surface methodology (RSM), which has already produced favorable results in hydrologic forecasting (Yu *et al.* 2014), as well as in the area of geotechnical analysis (Kostić *et al.* 2015), concrete production (Fang & Perera 2011; Bektas & Bektas 2014), and analytical chemistry (Hibbert 2012; Cela *et al.* 2013). Our approach relies on previously observed values, but at the same time, it includes the effects of some external factors, which are estimated to predetermine the flow rate. Some of the former investigations suggest that amount of rainfall and air temperature could be solely used to determine river flow rate (Zealand *et al.* 1999; Akhtar *et al.* 2009; Brocca *et al.* 2010; Lauri *et al.* 2014).

Concerning the fact that models of river flow rate have been extensively investigated by previous authors using both black-box and purely physical approaches, one may wonder about the merit of the present research. There might be three main reasons for this. First, in order to formulate such a model, we use a RSM (Fisher 1935; Deming & Morgan 1996), which enables satisfying predictive power using a relatively small number of input data. In that way, regions that lack systematic hydrologic observations could develop a solid model for making reliable hydrologic forecasting. In particular, the model that is developed belongs to the class of conceptual models, which are based on relatively simple mathematical relations and, therefore, can be easily applied in engineering practice. In the general case, response surface method is based on the analysis of a small number of observations in which statistically adequate combinations of input factors are suggested. After the most important input factors are selected, a mathematical model is built using nonlinear regression technique. Such a model could be further used for estimation and prediction of flow rate, without the need for the actual measurements *in situ*.

As a second reason, the proposed methodology which is based on flow forecasting solely using rainfall and air temperature could be useful when other parameters describing the watershed properties are missing, due to weak network of monitoring stations, or ambiguous observation data. Moreover, sensitivity analysis performed in the present paper enables relatively easy estimation of the influence of controlling parameters on river flow rate, and it could also be used for assessment of controlling factors other than rainfall and temperature, such as evapotranspiration and different watershed properties (vegetation, soil type, topography, etc.). A similar approach has already been used in Zhan *et al.* (2013). Finally, a model which is formulated using only recordings of temperature and rainfall could be used for river flow forecasting on the basis of different climatic scenarios in the future.

The paper is organized as follows. Applied methods are briefly described in the section below, followed by a section describing the study area and the input data. Derivation of model and testing of its statistical significance is provided in the section after, followed by the effect of input data analyzed. A brief review of the obtained results is given in the final section, together with suggestions for further research.

## APPLIED METHODS

In order to derive a model of river flow rate, we apply RSM. The core idea of RSM lies in performing a small number of well-defined experiments, which enable definition of the quadratic relationship among investigated factors and observed responses. In the case of two-factor influence, the aforementioned relationship can be graphically represented as a response surface, enabling clear interpretation of individual factors’ influence, as well as prediction of response values for different combinations of input factors. Nevertheless, in the present case, input data are recorded *in situ* and, therefore, RSM is applied using ‘historical’ design. In particular, we use observations of the chosen influential parameters, namely, air temperature and rainfall, and the corresponding response, i.e., monthly flow rate, in order to determine the appropriate model, which provides reliable relation among the examined variables. In general, RSM is applied in the following stages:

(a) Choice of the most important input factors that predetermine the flow rate. In the present case, this step is constrained by the available observation data, which are rainfall and air temperature.

(b) Definition of factor intervals. This step is again constrained by the real observed values of the air temperature and rainfall. When the response surface method is applied for experimental design, this step is crucial, since the limit values of input factors must be chosen according to commonly observed values in engineering practice.

(c) Selection of statistical design. This step depends on the assumed complexity of the relationship among the input parameters and the response. In the present case, statistical design is determined by the nature of the values of input factors, which represent real observed parameters. Therefore, the so-called historical design is applied in the present paper.

- (d) Building of a mathematical model using multiple linear regression and least squares method of approximation in the following general form: where
*Y*is the response variable (i.e. monthly flow rate), which could be defined as a linear or nonlinear function of the examined input factors*X*and_{i}*X*(i.e., air temperature and rainfall). Coefficients β_{j}_{0}, β_{i}, and β_{j}denote coefficients for intercept and linear term, respectively, while β_{ij}, β_{ii}, and β_{jj}correspond to coefficients of quadratic terms. Coefficients of the cubic terms are β_{iii}, β_{iij}, β_{ijj}, and β_{jjj}, where (i,j) = (1,2). From the strict mathematical viewpoint, applied method belongs to the class of multiple linear regression approaches, where dependent (flow rate) and independent variables (rainfall and temperature) are a priori known, while coefficient values are chosen in a way that the created model provides the most accurate matching with real observed data. (e) Once the model is created, statistical significance of the obtained interactions is further evaluated. In this, so-called ‘calibration’ stage, statistically insignificant terms are singled out and excluded from further analysis. Standard statistical approach includes the application of analysis of variance (ANOVA) test, assessment of

*R*^{2}, adjusted*R*^{2}and predicted*R*^{2}values, as well as calculation of standard error. In the next phase of analysis, the impact of each input factor is further evaluated by comparing absolute values of coefficients in terms of coded values and their statistical significance estimated according to*p*-value and their standardized effects (Pareto chart). In the final stage, construction of a three-dimensional response surface enables clear interpretation of the effect of input factors on flow rate, including identification of extreme values of the developed function. Such a response surface could serve as a quick estimation of flow rate for specific values of input factors.(f) Validation stage. By obtaining satisfying results after applying all the previous steps of the analysis, we are able only to confirm that the proposed model sufficiently well describes the interaction among the chosen input factors and the response function for a given time period. Nevertheless, in order to confirm its predictive power, the suggested model should be applied in a time period which was not used for its derivation. Verification of the created model is conducted by analyzing statistical characteristics of the observed and predicted hydrological time series. Apparently, the derived model provides sufficiently accurate predictions if predicted flows have similar statistical patterns as observed flows. Statistical characteristics of time series are expressed by cumulative distribution function (CDF), probability density function (PDF), autocorrelation function (ACF), and partial autocorrelation function (PACF). Random properties of the sequence of residuals are confirmed by applying a surrogate data test for the surrogates generated using the Matlab toolkit, MATS (Kugiumtzis & Tsimpiris 2010), while zeroth-order prediction error is calculated according to the solution proposed by Kantz & Schreiber (2004). Independence of residuals is additionally verified by conducting the appropriate test statistic, including Anderson–Darling, Ryan–Joiner and Kolmogorov–Smirnov tests.

## CASE STUDY

^{2}. We chose this basin as a case study because it is the largest river basin in Serbia, and, consequently, with the greatest and the most reliable observation database. Great Morava River basin, at the mouth of the Danube River, yields approximately 6 × 10

^{9}m

^{3}of water per year (Prohaska 2009). It is constituted of two major tributaries, namely, the Western and Southern Morava with specific water yield of 7.15 m

^{3}/s/km

^{2}and 6.08 m

^{3}/s/km

^{2}, respectively, while specific water yield for the whole river basin at h.s. Ljubičevski Most is 6.27 m

^{3}/s/km

^{2}. Hence, distribution of water yield at the Great Morava River basin is non-homogeneous with decreasing gradient from west to south. This heterogeneity is additionally confirmed by different values of runoff coefficients for the Southern and Western Morava (0.26 and 0.28, respectively).

It should be noted that the Great Morava drainage basin has heterogeneous morphological and geological properties, including the various land uses. Approximately 50% of this basin is hilly, with altitudes between 150 and 600 m above sea level, and with 10–30% inclination of slopes. Plains (with altitudes up to 150 m) and mountains (with altitudes higher than 600 m) constitute 30% and 20% of the drainage basin, respectively. Slopes in mountainous areas have inclinations greater than 30%.

From the geological viewpoint, this drainage basin is primarily built of Paleogene and Neogene sediments, including sands, clays, gravel, conglomerates, and marls (approximately 30% of the drainage basin consists of these sediments). Besides these Tertiary sedimentary rocks, the drainage basin of Great Morava also comprises schists of lower crystallinity (20%), flysch sediments and clastic rocks (12%), limestones (11%), and diabase-chert formation (10%). Regarding the land use, approximately 48% of the investigated area is covered by forests, 28% pastures and valleys, while 24% is used for mixed farming. One should note that these data are provided on the basis of the results of small-scale analysis of the investigated drainage area.

In order to develop a proper model, we assume that rainfall and air temperature predetermine the flow rate. One should note that these climatic data were the only reliable numeric information about the investigated watershed, which is the reason why the prediction model is developed solely as a function of rainfall and air temperature. Input data for these two main input parameters are provided by the observations made at 16 meteorological stations (Figure 1(a)). Averaged monthly time series of rainfall and temperature are obtained by Thiessen polygons which define individual areas of influence around each set of meteorological stations. In the present case, a model is formulated for the observations made in the period 1950–1990, while its predictive power is verified for the period 1991–2012. Input data used for model formulation are shown in Figure 1(b). The necessary hydrologic time series were obtained from the Hydrometeorological Service of Serbia (Table 1) (Hydrometeorological Yearbooks 2014).

Station . | Mean annual sums of precipitation (mm) . | σ_{P} (mm)
. | Mean annual values of temperature (°C) . | σ_{T} (°C)
. |
---|---|---|---|---|

m.s. Smederevska P. | 637.2 | 113.5 | 11.5 | 0.740 |

m.s. Zaječar | 607.8 | 120.4 | 10.8 | 0.723 |

m.s. Crni vrh | 789.1 | 142.9 | 8.6 | 2.058 |

m.s. Ćuprija | 656.8 | 120.7 | 11.1 | 0.717 |

m.s. Kragujevac | 628.1 | 114.0 | 11.4 | 0.732 |

m.s. Valjevo | 782.7 | 131.7 | 11.2 | 0.730 |

m.s. Kruševac | 647.0 | 131.7 | 11.2 | 0.752 |

m.s. Kraljevo | 750.6 | 132.2 | 11.3 | 0.676 |

m.s. Zaltibor | 967.2 | 144.9 | 7.5 | 0.745 |

m.s. Niš | 588.0 | 99.1 | 11.8 | 0.743 |

m.s. Novi Pazar | 625.7 | 115.1 | 9.4 | 0.888 |

m.s. Sjenica | 726.1 | 129.2 | 6.5 | 0.707 |

m.s. Dimitrovgrad | 632.8 | 114.3 | 10.0 | 0.729 |

m.s. Leskovac | 619.0 | 111.7 | 11.1 | 0.711 |

m.s. Vranje | 604.8 | 118.3 | 11.1 | 0.651 |

m.s. Kuršumlija | 647.1 | 123.8 | 10.3 | 0.703 |

Station . | Mean annual sums of precipitation (mm) . | σ_{P} (mm)
. | Mean annual values of temperature (°C) . | σ_{T} (°C)
. |
---|---|---|---|---|

m.s. Smederevska P. | 637.2 | 113.5 | 11.5 | 0.740 |

m.s. Zaječar | 607.8 | 120.4 | 10.8 | 0.723 |

m.s. Crni vrh | 789.1 | 142.9 | 8.6 | 2.058 |

m.s. Ćuprija | 656.8 | 120.7 | 11.1 | 0.717 |

m.s. Kragujevac | 628.1 | 114.0 | 11.4 | 0.732 |

m.s. Valjevo | 782.7 | 131.7 | 11.2 | 0.730 |

m.s. Kruševac | 647.0 | 131.7 | 11.2 | 0.752 |

m.s. Kraljevo | 750.6 | 132.2 | 11.3 | 0.676 |

m.s. Zaltibor | 967.2 | 144.9 | 7.5 | 0.745 |

m.s. Niš | 588.0 | 99.1 | 11.8 | 0.743 |

m.s. Novi Pazar | 625.7 | 115.1 | 9.4 | 0.888 |

m.s. Sjenica | 726.1 | 129.2 | 6.5 | 0.707 |

m.s. Dimitrovgrad | 632.8 | 114.3 | 10.0 | 0.729 |

m.s. Leskovac | 619.0 | 111.7 | 11.1 | 0.711 |

m.s. Vranje | 604.8 | 118.3 | 11.1 | 0.651 |

m.s. Kuršumlija | 647.1 | 123.8 | 10.3 | 0.703 |

The mean annual flow and its standard deviation at considered hydrological station (h.s.) Ljubičevski Most are 229.3 m^{3}/s and 73.2 m^{3}/s, respectively.

Preliminary small-scale analysis showed that rainfall regime is considerably heterogeneous at mountainous parts and plains within Great Morava River basin. Apparently, there is a general tendency of decreasing of annual rainfall from western to eastern parts of the investigated basin. As for the air temperatures, mean annual temperatures recorded in this basin linearly decrease with the increase of elevation. In particular, mean annual air temperatures recorded at 300, 1,000, and 1,700 m above sea level vary in the range of 11.4 to 3.3 °C, where the latter temperature is recorded at the highest elevation (Prohaska 2009).

## MODEL DEVELOPMENT

In order to create a reliable model of monthly flow rate, the value of each input factor, cumulative monthly precipitation P and the average monthly temperature T, is coded on three levels (Table 2): the lower level of the interval, as a minimum observed value (denoted as −1 in coded factor values); the higher level of the interval, as a maximum observed value (denoted as +1 in coded factor values); and the central point of the interval, as a middle between maximum and minimum observed data (denoted as 0 level in coded factor values). Coding of real factor values (i.e., their normalization) is a necessary step for comparison of the parameters’ effect on flow rate independently of their units of measurement.

Input parameters . | Lower interval level (coded value) . | Central interval level (coded value) . | Higher interval level (coded value) . |
---|---|---|---|

P (mm) | 1.80 ( − 1) | 79.85 (0) | 157.90 ( + 1) |

T (°C) | −7.00 ( − 1) | 8.9 (0) | 24.80 ( + 1) |

Input parameters . | Lower interval level (coded value) . | Central interval level (coded value) . | Higher interval level (coded value) . |
---|---|---|---|

P (mm) | 1.80 ( − 1) | 79.85 (0) | 157.90 ( + 1) |

T (°C) | −7.00 ( − 1) | 8.9 (0) | 24.80 ( + 1) |

Historical design typically does not contain a satisfying distribution of input factors in examined cases since they represent only recorded values, which are not obtained in controlled laboratory conditions. Therefore, careful detection and elimination of outliers needs to be performed in order to obtain a reliable and meaningful model. In the present study, leverage of each point is calculated in order to estimate whether its position in the design space significantly influences the fit of the model, after which, extreme points are removed. Moreover, outlier t-test for calculating externally standardized residuals is performed to examine whether each point result is consistent with the others, assuming the created model holds and identified outliers are excluded. Finally, influence on fitted value and Cook's distance are estimated for final detection of outliers. According to the described statistical procedure, the model is developed using 394 out of 492 observations.

The model term . | Coefficient for coded factor values . | Coefficient for real factor values . | Sum of squares . | Degrees of freedom . | Variance . | F value . | p-values
. | Standard error . |
---|---|---|---|---|---|---|---|---|

Intercept | 5.84 | 4.856 | 176.89 | 7 | 25.27 | 210.43 | 0.041 | |

P | 0.75 | 0.035 | 10.16 | 1 | 10.16 | 84.58 | <0.0001 | 0.082 |

T | −1.08 | −0.0769 | 43.95 | 1 | 43.95 | 365.96 | <0.0001 | 0.057 |

P × T | 0.97 | 1.137 × 10^{−6} | 3.88 | 1 | 3.88 | 32.30 | <0.0001 | 0.17 |

P^{2} | −0.77 | −2.296 × 10^{−4} | 6.91 | 1 | 6.91 | 57.55 | <0.0001 | 0.10 |

T^{2} | −0.92 | 1.098 × 10^{−3} | 5.64 | 1 | 5.64 | 46.95 | <0.0001 | 0.13 |

P^{2} × T | 1.12 | 1.152 × 10^{−5} | 3.06 | 1 | 3.06 | 25.47 | <0.0001 | 0.22 |

P × T^{2} | −1.17 | −5.931 × 10^{−5} | 2.51 | 1 | 2.51 | 20.94 | <0.0001 | 0.26 |

R^{2} | 0.7924 | |||||||

Adjusted R^{2} | 0.7886 | |||||||

Predicted R^{2} | 0.7848 |

The model term . | Coefficient for coded factor values . | Coefficient for real factor values . | Sum of squares . | Degrees of freedom . | Variance . | F value . | p-values
. | Standard error . |
---|---|---|---|---|---|---|---|---|

Intercept | 5.84 | 4.856 | 176.89 | 7 | 25.27 | 210.43 | 0.041 | |

P | 0.75 | 0.035 | 10.16 | 1 | 10.16 | 84.58 | <0.0001 | 0.082 |

T | −1.08 | −0.0769 | 43.95 | 1 | 43.95 | 365.96 | <0.0001 | 0.057 |

P × T | 0.97 | 1.137 × 10^{−6} | 3.88 | 1 | 3.88 | 32.30 | <0.0001 | 0.17 |

P^{2} | −0.77 | −2.296 × 10^{−4} | 6.91 | 1 | 6.91 | 57.55 | <0.0001 | 0.10 |

T^{2} | −0.92 | 1.098 × 10^{−3} | 5.64 | 1 | 5.64 | 46.95 | <0.0001 | 0.13 |

P^{2} × T | 1.12 | 1.152 × 10^{−5} | 3.06 | 1 | 3.06 | 25.47 | <0.0001 | 0.22 |

P × T^{2} | −1.17 | −5.931 × 10^{−5} | 2.51 | 1 | 2.51 | 20.94 | <0.0001 | 0.26 |

R^{2} | 0.7924 | |||||||

Adjusted R^{2} | 0.7886 | |||||||

Predicted R^{2} | 0.7848 |

*R*

^{2}≈ 0.79 (Figure 2), and adjusted coefficient of determination (

*R*

^{2}≈ 0.79) given in Table 3 indicate that the model describes the analyzed system with satisfying statistical reliability. Moreover, the high value of predicted

*R*

^{2}(0.78), which is a marker of internal validation, suggests that simulation power of the proposed model is sufficiently accurate.

*R*

^{2}indicate sufficient statistical reliability of the created model, it seems that derived Equation (2) fails to mimic high values of monthly flow rate. Indeed, if we take a closer look at Figure 2, relatively large residuals occur for the observed values of flow rate higher than 500 m

^{3}/s. Performance of the derived model in predicting higher magnitude flows could be estimated using the peak percent threshold statistics (PPTS) suggested by Lohani

*et al.*(2014): in which

*k*

_{l=}(l*×*

*N)/100*and

*k*

_{u=}(u*×*

*N)/100*, where

*l*and

*u*represent lower and upper limits in percentage,

*N*is the number of data and

*ξ*is a relative error between observed flows,

*Q*, and estimated flows,

^{o}*Q*, in %:

^{e}In the present case, we assume that the upper limit is *u* = 100%, while the estimation power of the derived model is examined for different values of lower limit in the range 0–95%. One should note that symbol PPTS(l) denotes the peak percent threshold statistics of the top l% data. As is shown in Table 4, PPTS statistics show higher values for higher magnitude flows, indicating higher estimation error.

PPTS (top l% data) . | Estimation performance of derived model (2) . |
---|---|

PPTS(5) | 44.03 |

PPTS(10) | 41.19 |

PPTS(20) | 37.16 |

PPTS(30) | 35.21 |

PPTS(50) | 34.06 |

PPTS(80) | 31.03 |

PPTS(100) | 29.59 |

PPTS (top l% data) . | Estimation performance of derived model (2) . |
---|---|

PPTS(5) | 44.03 |

PPTS(10) | 41.19 |

PPTS(20) | 37.16 |

PPTS(30) | 35.21 |

PPTS(50) | 34.06 |

PPTS(80) | 31.03 |

PPTS(100) | 29.59 |

Apparent inability of the derived model (2) to capture the highest flows is consistent with results of previous research (Perrin *et al.* 2001; Holländer *et al.* 2009), who claimed that simple models, as in derived model (2), cannot capture runoff accurately enough in calibration phase. Such residuals are expected as a consequence of a strong influence of factors other than rainfall and temperature, such as soil saturation, vegetation, etc., whose impact on flow rate may become pronounced, depending on flow rates, rainfall, and temperature in preceding months. For example, in 1953, obviously different values of flow rate were recorded in May and October (195.71 and 50.99, respectively) for approximately the same values of rainfall and air temperature (41.7 and 11.6, in May and 45.4 and 11.8, in October, respectively). This further means that such changes in flow rate cannot always be captured only by considering the influence of rainfall and temperature. Modeling of higher flow rates over the winter and spring months could be significantly improved by utilizing measurements of snow cover due to higher correlation between current monthly flow rate and previous monthly state of the water equivalent of snow cover. However, records of snow cover at the Great Morava drainage basin are limited only to measurements of snow depth without the observations of snow density. In this manner, water equivalent of snow cover cannot be objectively assessed.

*et al.*2001; Holländer

*et al.*2009). In the present case, predictive power of the suggested model (2) is tested for the observations made in the period 1991–2012. Results of the analysis indicate a satisfying level of Nash–Sutcliffe coefficient, NSE ≈ 0.73 (Figure 3). One of the probable reasons for this lies in the fact that validation of the derived model is performed for pronounced dry period, i.e., drought in South-Eastern Europe, which is well recorded in the last decade of the 20th century. This causes lower monthly flow rates in the validation phase enabling higher prediction accuracy of the derived model. As for the influence of other factors on higher prediction accuracy in the validation stage, there are not reliable and systematic data about the appreciable land cover change in the calibration period, or any known water abstractions.

One should note that derived model (2) is still not able to capture high flow rates, which is confirmed by the low correlation between the highest three observed values of flow rate (Figure 3). Explanation for this could be found in the snowmelt effect, as it was already suggested as a reasonable explanation for the results of model calibration.

PPTS statistics for prediction performance of the model in verification stage are given Table 5. It is clear that derived model (2) provides higher prediction performance (lower values of PPTS) for top 5% data in comparison to the performance for other lower limits.

PPTS (top l% data) . | Prediction performance of derived model (2) . |
---|---|

PPTS(5) | 44.98 |

PPTS(10) | 65.15 |

PPTS(20) | 68.92 |

PPTS(30) | 66.64 |

PPTS(50) | 59.41 |

PPTS(80) | 53.94 |

PPTS(100) | 54.53 |

PPTS (top l% data) . | Prediction performance of derived model (2) . |
---|---|

PPTS(5) | 44.98 |

PPTS(10) | 65.15 |

PPTS(20) | 68.92 |

PPTS(30) | 66.64 |

PPTS(50) | 59.41 |

PPTS(80) | 53.94 |

PPTS(100) | 54.53 |

Results of the performed analysis (Figure 4) indicate that predicted monthly time series have similar statistical patterns as the observed monthly flow rates. In particular, derived hydrological model mimics low, medium, and high monthly flows in a sufficiently accurate manner (Figure 4(a) and 4(b)). ACF and PACF suggest similar correlation pattern at different time lag with greater values of ACF ρ and PACF α_{kk} for predicted time series (Figure 4(c) and 4(d)).

*p*-value larger than 0.05 for Anderson–Darling test (Figure 5(a)) and Kolmogorov–Smirnov test (Figure 5(b)). Although the Ryan–Joiner test resulted in

*p*-value smaller than 0.05 (Figure 5(c)), which could arise due to the effect of sample size, visual inspection of quantile-quantile plot implies that deviations from straight line are minimal (

*R*

^{2}≈ 0.97), showing only slight skew for high quantiles (Cs = 0.191), which indicates that medium and lower quantiles closely follow normal distribution (Figure 5(d)).

Random properties of the sequence of residuals are additionally confirmed by surrogate data testing of the starting hypothesis that the observed data are independent random numbers drawn from some fixed but unknown distribution (Perc *et al.* 2008). In order to achieve a significance level of *α**=* 0.95 when confirming or rejecting the null hypothesis, we generate 19 surrogates, after which original data and generated surrogates are compared by calculating the zeroth-order prediction error *γ,* which is defined as a mean squared error based on comparing the average predicted value from the neighboring points and the obtained values of residual in reconstructed phase space of the observed scalar series (Perc *et al.* 2008). If the zeroth-order prediction error for the original recordings (*γ _{0}*) is smaller in comparison to the calculated error for surrogate data (

*γ*), then a null hypothesis can be rejected. Surrogate data analysis is performed for time series embedded into one-dimensional phase space with embedding delay τ = 1 determined by symplectic geometry (Lei

*et al.*2002) and mutual information method (Fraser & Swinney 1986), respectively. Neighbors for prediction were sought within 2% of the examined residuals which are nearest to the reference point (i.e., point for which the prediction is made).

*γ*is within

_{0}*γ*for all the examined cases, preventing us from rejecting the null hypothesis with significance level α = 1 (Figure 6).

## IMPACT OF INPUT DATA

The obtained Pareto chart (Figure 7) indicates linear dependence of flow rate on each input parameter, while the significance of squared terms for P and T suggests also a quadratic relation among Q and these two factors, but with a much smaller impact in comparison to the linear effect (−4.83 and −4.48, respectively). Regarding the impact of individual factors, it is clear that air temperature has stronger influence on flow rate (−19.39), while the effect of rainfall shows almost half of the impact of T (10.57). This is further confirmed by the results of linear correlation analysis, which showed that air temperature has the strongest influence on flow rate with correlation coefficient r_{Q,T=} − 0.532. On the other hand, correlation coefficient r_{Q,P} = +0.214 confirmed lower positive correlation of flow rate and rainfall. One should note that the effect of the two-factor interactions is almost insignificant (1.14).

From a hydrological viewpoint, observed decline of monthly flows in the domain of higher rainfall (Figure 8(a)) could be explained by the fact that such high rainfall values in Great Morava River basin occur in conjunction with negative air temperatures, which transform rainfall into snow form (Figure 8(a)). This is further confirmed by the slight decline of flow rate for low air temperatures (Figure 8(b)). Maximum flow rate is obtained for air temperature slightly above 0 °C, when the snow melt effect starts to significantly affect the amount of flow rate.

## CONCLUSION

In the present paper, we propose a model of Great Morava flow rate, developed and verified for the observation time 1950–2012. The model is developed for the period 1950–1990 using a response surface method, which is the first application of this technique in hydrological analyses. River flow rate in a suggested model is provided as a simple nonlinear function of rainfall and air temperature. Statistical analysis of the created model indicates high statistical reliability for the period 1950–1990, i.e., both *R*^{2} and adjusted *R*^{2} have sufficiently high values (≈0.79 in both cases). One should note that although it seems that the created model fails to explain large values of monthly flow rate, it corresponds well with the real observed values for the same pairs of rainfall and air temperature recorded at different periods. Such inconsistency could arise due to possible occasional strong influence of some other natural factor, like the impact of snowmelt effect on surface runoff generation, or due to the impact of soil saturation, vegetation, etc. Nevertheless, statistical reliability of the created model was additionally verified by testing its predictive power against the unknown data (i.e., data which were not used for model development). In particular, regarding the observations made for the period 1991–2012, the derived model showed high predictive power, with Nash–Sutcliffe coefficient, NSE ≈ 0.73. Moreover, calculated cumulative distribution function, PDF, autocorrelation, and PACF for both the observed and predicted time series indicate their similar statistical characteristics. As well, appropriate normality tests, including Anderson–Darling, Kolmogorov–Smirnov, and Ryan–Joiner (with q-q plot), indicated that the sequence of residuals follows normal distribution, i.e., residuals represent independent random numbers. This is further confirmed by the results of surrogate data testing, which proved that time series of residuals both represent random numbers.

Results of the performed analysis indicate significant linear, quadratic, and cubic impact of input factors on flow rate. Analysis of their individual influence indicates that flow rate rises with the amount of rainfall only up to a certain level, reaching its maximum for relatively high values of rainfall. The opposite behavior of flow rate is observed for the separate impact of air temperature, i.e., flow rate reaches its peak for relatively low air temperature, after which it starts to decrease. Regarding the two-factor interactions, whose impact could statistically be neglected in the present case, obtained results indicate that flow rate increases for high values of rainfall and low values of air temperature, which corresponds well with the observed individual effect of the examined input parameters.

In essence, the proposed model represents an example of nonlinear regression model. We are aware of the fact that modeling of flow time series has been the subject of extensive research in the last few decades (Chow & Kulandaiswamy 1971; Kundzewicz & Napiórkowski 1986). However, in the present paper, we focus on deriving a model for flow prediction based only on data about rainfall and temperature. We consider this to be extremely convenient for engineering practice, since rainfall and air temperature are climate parameters which are commonly measured in each drainage area. Still, we are aware of the fact that the proposed model is limited in the way that it is formulated only for a single drainage basin. Nevertheless, in that case, this paper could be considered as a methodological paper, in the way that it describes the methodology for creating a convenient model for flow prediction based only on rainfall and temperature.

Validation of the created hydrological model suggests that predicted monthly flow rates could be used for hydrological simulations in Great Morava River basin. Moreover, the model can be used for prediction of monthly flow rates using the forecasted meteorological time series. This fact makes it an essential element for decision-making processes in the water industry.

One should note that the main advantage of the developed model over the existing ones lies in its simple and explicit mathematical expression, which enables easy and quick estimation of a flow rate using only rainfall and air temperature as input factors. Although one could criticize such a pragmatic approach, where flow rate is modeled as a function of only two parameters, results of previous research indicated that model performance does not necessarily improve with complexity (Orth *et al.* 2015). This is confirmed by conducting additional analysis using ANN. In particular, ANN multilayer perceptron with Levenberg–Marquardt backpropagation learning algorithm provided lower prediction power in comparison to proposed model (2). However, implementation of some recent methods, like evolutionary polynomial regression (Giustolisi & Savić 2006) could provide models with better prediction performance. Therefore, such methods should be applied in further research in order to improve prediction power of a flow rate model based only on temperature and rainfall data, especially in a high flow domain.

On the other hand, although the derived model provided satisfying prediction accuracy, one should note that its application for river flow forecasting in a watershed with remarkably different characteristics could provide ambiguous results, due to dependence of flow rate solely on climate data. Therefore, the proposed model should be applied only for rivers with small and medium river basins in South-Eastern Europe, which have similar watershed properties, including heterogeneous morphological and geological properties and various land uses. For rivers with different watershed characteristics, one should use the same methodology as in the present paper to develop a model whose results would correspond well with the observations made in a specific area of investigation.

Regarding future research, derived model (2) can be used for long-term prediction of hydrological time series in the 21st century. In such a case, the hydrological model may utilize meteorological time series provided by different scenarios of greenhouse gas emission from regional climate models instead of the actual observed time series. When formulated in such a way, prediction results of the proposed model could be compared to the previously developed models, e.g., models based on stochastic decomposition of time series. Only then would the suggested model have a chance to find its real practical value in forecasting river flow rate.

## ACKNOWLEDGEMENTS

This research was partly supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia (Contract No. 37005).