Abstract

Data-driven and conceptual models have both been used for rainfall–runoff modelling. This study compares the performance of an artificial neural network (ANN) model, a wavelet-based artificial neural network (WANN) model and the GR4J lumped daily conceptual model for rainfall–runoff modelling of two rivers in the USA. The data-driven models (ANN, WANN) performed better than the GR4J model, particularly for the simulation of low and high flows, when the streamflow of the preceding day (Qt-1) and of the day before that (Qt-2) were included as inputs to the ANN and WANN models. On the other hand, when only precipitation and potential evapotranspiration data were used as input variables, the GR4J model performed better than the data-driven models.

INTRODUCTION

Characterizing the rainfall–runoff relationship is essential for water resources planning. For decades, many studies have sought to describe this relationship using data-driven or conceptual models (Anctil et al. 2004; Sedki et al. 2009; Demirel et al. 2013). De Vos & Rientjes (2007) carried out multi-objective comparisons between an artificial neural network (ANN) model and the HBV conceptual rainfall–runoff model. They reported that the ANN model performs better than the HBV model for one-hour-ahead forecasting, whereas the HBV conceptual model outperforms the ANN as the time interval lengthens. Nayak et al. (2013) used ANN, Nedbør-Afstrømnings-Model (NAM) and wavelet neural network (WNN) models for rainfall–runoff modelling and found that the WNN model is more successful than the conventional ANN and NAM models. They pointed out that the WNN model could be more useful for capturing the nonlinear rainfall–runoff relationship, and that the decomposition carried out by the wavelet transform could account for its better performance. Demirel et al. (2015) analysed the performance of ANN-Ensemble (ANN-E), HBV and GR4J models for low-flow prediction using ensemble precipitation and evapotranspiration data as inputs, and stated that ANN-E and HBV were the two models found useful for simulating low streamflow. Daliakopoulos & Tsanis (2016) compared ANN models with conventional conceptual models for high-flow forecasting and found the ANN models more useful. Humphrey et al. (2016) compared GR4J, a Bayesian artificial neural network (BANN) model and a hybrid approach in which the output of the GR4J model is supplied as an input alongside conventional inputs such as rainfall and evapotranspiration. They maintained that the hybrid model was more successful than the ANN and GR4J models. Mehr & Demirel (2016) studied low-flow simulation with genetic programming, ANN, HBV and GR4J models in the Moselle River basin and pointed out that genetic programming performs well compared with the ANN, HBV and GR4J models for low-flow prediction. Makwana & Tiwari (2017) compared the Soil and Water Assessment Tool (SWAT) and ANN models for predicting daily streamflow and found the ANN model more useful than the SWAT model in both the calibration and validation periods.

This study examines how the wavelet-based artificial neural network (WANN), ANN and GR4J models perform for rainfall–runoff modelling of two rivers in the USA. Its original contribution lies particularly in comparing the GR4J hydrological conceptual model with a WANN data-driven model. Model performance was compared using the Nash–Sutcliffe efficiency coefficient (NSE), root mean square error (RMSE) and Kling–Gupta efficiency (KGE).

DATA AND METHODOLOGY

Data

In this study, the ANN, WANN and GR4J models were applied to the Saline River near Rye, Arkansas and the Embarras River at Ste. Marie, Illinois in the USA. The data cover 01/01/1981–30/09/2001 for both rivers. The training period runs from 01/01/1981 to 24/07/1996, and the test period from 25/07/1996 to 30/09/2001. These data are part of the MOPEX dataset (NOAA 2018). Summary statistics for the precipitation and streamflow data (minimum, mean, maximum and standard deviation) are given in Table 1, and the hydrographs of the Embarras River and the Saline River are shown in Figure 1(a) and 1(b), respectively.
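The size of the two periods follows directly from the dates above. The short sketch below counts the daily records in each period, assuming a complete daily series with no gaps (an assumption; the text does not say whether any days are missing). The function name `n_days` is illustrative only.

```python
from datetime import date

# Period boundaries as stated in the text (day/month/year in the source).
TRAIN_START = date(1981, 1, 1)
TRAIN_END = date(1996, 7, 24)
TEST_START = date(1996, 7, 25)
TEST_END = date(2001, 9, 30)


def n_days(start: date, end: date) -> int:
    """Number of daily records between two dates, both ends included."""
    return (end - start).days + 1


# Assumes one record per calendar day with no missing days.
n_train = n_days(TRAIN_START, TRAIN_END)
n_test = n_days(TEST_START, TEST_END)
train_share = n_train / (n_train + n_test)

print(n_train, n_test, round(train_share, 2))  # 5684 1894 0.75
```

Under that assumption the split is roughly three quarters of the record for training and one quarter for testing.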

Table 1

The daily data statistics for precipitation and streamflow in the rivers

Rivers | Period | P (mm): Minimum | Mean | Maximum | Std | Q (mm/d): Minimum | Mean | Maximum | Std
Embarras River | 01.01.1981–30.09.2001 | | 2.81 | 69.96 | 6.27 | 0.006 | 0.87 | 18.2 | 1.48
Saline River | 01.01.1981–30.09.2001 | | 3.63 | 101.5 | 8.59 | 0.002 | 1.17 | 30.3 | 1.92

Figure 1

Flow regime of (a) Embarras River and (b) Saline River.

Methods

Artificial neural network

A feed-forward back-propagation ANN trained with the Levenberg–Marquardt (LM) algorithm was used in this study. The benefits of the Levenberg–Marquardt algorithm have been reported in several studies (Aqil et al. 2007a, 2007b; Badrzadeh et al. 2013; Tongal & Booij 2018). Aqil et al. (2007b) indicated that the Levenberg–Marquardt algorithm performs better than Bayesian regularization and gradient descent with momentum and adaptive learning rate back-propagation algorithms in various ANN models. The Levenberg–Marquardt algorithm is thus an efficient training algorithm in terms of robustness and fast convergence (Demuth & Beale 1998; Aqil et al. 2007b). Furthermore, previous studies have shown that the tangent sigmoid activation function outperforms other activation functions such as the logistic sigmoid and linear transfer functions (Maier et al. 1998; Zadeh et al. 2010), so the tangent sigmoid function was chosen as the activation function. Different input combinations were applied for both the ANN and WANN models, as shown in Table 2. Accordingly, precipitation on the current day (Pt), precipitation of the preceding day (Pt-1), precipitation of two days earlier (Pt-2), potential evapotranspiration on the current day (PEt), runoff of the preceding day (Qt-1) and runoff of two days earlier (Qt-2) were used for streamflow forecasting. The number of neurons in the hidden layer was set to one more than the number of inputs; for example, with two inputs, three hidden neurons were used. Changing the number of neurons did not change the results appreciably for the different combinations.
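How the lagged inputs line up with the target can be sketched as follows. This is a minimal illustration, not the authors' code: it assembles the input matrix for the combination Pt, Pt-1, Qt-1, Qt-2, PEt and applies the "inputs plus one" rule for the hidden layer. The function names and the toy numbers are hypothetical.

```python
import numpy as np


def build_inputs(p, pe, q):
    """Return (X, y) with columns Pt, Pt-1, Qt-1, Qt-2, PEt and target Qt."""
    p, pe, q = (np.asarray(a, dtype=float) for a in (p, pe, q))
    if not (len(p) == len(pe) == len(q)):
        raise ValueError("P, PE and Q must have the same length")
    # The first two days have no Qt-2 value, so the usable days start at index 2.
    t = np.arange(2, len(q))
    X = np.column_stack([p[t], p[t - 1], q[t - 1], q[t - 2], pe[t]])
    y = q[t]
    return X, y


def hidden_neurons(n_inputs):
    """Rule stated in the text: one more hidden neuron than inputs."""
    return n_inputs + 1


# Toy series (made-up values, for illustration only).
X, y = build_inputs(
    p=[0, 5, 2, 0, 1], pe=[3, 3, 4, 4, 3], q=[1.0, 1.2, 2.0, 1.5, 1.1]
)
print(X.shape, hidden_neurons(X.shape[1]))  # (3, 5) 6
```

Each row of `X` holds only values known before the target day's flow is observed, apart from the same-day precipitation and potential evapotranspiration that the input combination itself includes.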

Table 2

Input combinations for ANN and WANN models

Models | Input combination no. | Input combination
ANN, WANN | 1 | Pt, PEt
 | 2 | Pt-1, PEt
 | 3 | Pt, Pt-1, PEt
 | 4 | Pt, Pt-1, Qt-1, PEt
 | 5 | Pt, Pt-1, Qt-1, Qt-2, PEt

Wavelet transformation

Wavelet transformation is a way to perform time–frequency analysis of a time series (Partal 2017). When the wavelets have finite extent in the time interval, the transformation is called the discrete wavelet transformation (Sahay & Srivastava 2014); for continuous time over (−∞, +∞), it is called the continuous wavelet transformation. For a continuous time domain, the wavelet function ψ(τ, s) can be obtained as indicated in Equation (1):
$$\psi(\tau, s) = s^{-1/2}\, \psi\!\left(\frac{t - \tau}{s}\right)$$
(1)
In Equation (1), t represents the time, τ stands for the time step in which the window function is iterated and s for the wavelet scale (Meyer 1993). The continuous wavelet transform of x(t) is as illustrated in Equation (2):
$$W(\tau, s) = s^{-1/2} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt$$
(2)
In Equation (2), (*) refers to the complex conjugate, and W(τ, s) represents the two-dimensional depiction of wavelet power. The discrete wavelet transformation is given in Equation (3):
$$\psi_{m,n}\!\left(\frac{t - \tau}{s}\right) = s_0^{-m/2}\, \psi\!\left(\frac{t - n \tau_0 s_0^{m}}{s_0^{m}}\right)$$
(3)
In Equation (3), m and n are integers which control the wavelet scale and time, respectively, $s_0$ stands for a specific fixed expansion step greater than 1, and $\tau_0$ for the location parameter, which must be greater than zero. In addition, $n \tau_0 s_0^{m}$ is the translation step and it is based on the expansion $s_0^{m}$. The most common choice is $s_0 = 2$ and $\tau_0 = 1$. The discrete wavelet transformation with power-of-two logarithmic scaling of the translations is a very effective method for practical purposes (Mallat 1989). The discrete wavelet transformation for a discrete time series $x_i$, where $x_i$ occurs at discrete time i (i.e., integer time steps are used here) and N is the length of the series, is illustrated in Equation (4):
$$W_{m,n} = 2^{-m/2} \sum_{i=0}^{N-1} x_i\, \psi\!\left(2^{-m} i - n\right)$$
(4)

In Equation (4), $W_{m,n}$ is the wavelet coefficient for the discrete wavelet of scale $s = 2^{m}$ and location $\tau = 2^{m} n$. The Mallat pyramid algorithm was used for multiresolution analysis; see Mallat (1989) for further details about the algorithm. The Daubechies (db2) wavelet was selected as the mother wavelet in this study. The Daubechies wavelets are a family of orthogonal wavelets defining a discrete wavelet transform and characterized by a maximal number of vanishing moments for some given support. Many former studies have used the db2 and db4 wavelets (Nourani et al. 2014). The frequent choice of db2 in past studies may be due to the fact that the significant information in the data is successfully expressed by the relatively simpler wavelet function db2, which is a polynomial with two coefficients (Shoaib et al. 2014).

In this study, wavelet analysis was combined with the ANN model. The precipitation and streamflow data (Pt, Pt-1, Qt-1, Qt-2) were decomposed using the discrete wavelet transformation (Figure 2). For each variable separately, the wavelet components that correlated highly with the streamflow data were summed, as illustrated in Figure 2. The resulting precipitation or streamflow series were then used together with PEt in the input combinations listed in Table 2 for the WANN model.
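The selection-and-summation step can be sketched as below. This is a hedged illustration only: it assumes the wavelet components have already been computed and reconstructed to the length of the series, and it uses a hypothetical correlation cut-off `min_r`, since the text does not state how "high correlation" was decided. The function name and the toy data are likewise assumptions.

```python
import numpy as np


def sum_correlated_components(components, target, min_r=0.1):
    """Sum the components whose correlation with the target is at least min_r.

    components: dict mapping a label (e.g. "D1", "S10") to a 1-D array
    target:     observed streamflow Qt, same length as every component
    min_r:      hypothetical cut-off; not taken from the paper
    """
    target = np.asarray(target, dtype=float)
    kept = []
    total = np.zeros_like(target)
    for label, comp in components.items():
        comp = np.asarray(comp, dtype=float)
        r = np.corrcoef(comp, target)[0, 1]  # Pearson correlation
        if r >= min_r:
            kept.append(label)
            total += comp
    return kept, total


# Toy example (made-up values): one noisy detail and one smooth component.
qt = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
parts = {"D1": [1, -1, 1, -1, 1], "S10": [1, 2, 3, 4, 5]}
kept, wann_input = sum_correlated_components(parts, qt)
print(kept)
```

The summed series then replaces the raw variable as one WANN input, so the network sees a version of the signal with the weakly correlated components left out.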

Figure 2

Structure of WANN model for input combination 5.

GR4J model

GR4J is a conceptual hydrological model that uses precipitation and evapotranspiration as input data for daily rainfall–runoff modelling. Perrin et al. (2003) showed that the GR4J daily lumped conceptual model can outperform even more complex models with more parameters (e.g., the HBV and Xinanjiang models). The GR4J model was therefore chosen for this study, both for its superior performance relative to some conceptual models (Perrin et al. 2003) and for its widespread use in the literature. The model structure is depicted in Figure 3. GR4J has four free parameters: X1 (maximum capacity of the production store), X2 (groundwater exchange coefficient), X3 (one-day-ahead maximum capacity of the routing store) and X4 (time base of the unit hydrograph), as indicated in Figure 3. See Perrin et al. (2003) for details of the GR4J model structure. GR4J daily rainfall–runoff modelling was carried out with the airGR package (Coron et al. 2017, 2018) for the R software (R Development Core Team 2015).

Figure 3

Representation of GR4J rainfall–runoff model adapted from Perrin et al. (2003).

Evaluation of the model performances

The performances of the ANN, WANN and GR4J models were evaluated using the NSE, RMSE and KGE (Gupta et al. 2009), as given in Equations (5)–(7), respectively:
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} \left(Q_{obs,i} - Q_{sim,i}\right)^{2}}{\sum_{i=1}^{N} \left(Q_{obs,i} - \bar{Q}_{obs}\right)^{2}}$$
(5)

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(Q_{obs,i} - Q_{sim,i}\right)^{2}}$$
(6)

$$\mathrm{KGE} = 1 - \sqrt{(r - 1)^{2} + (\alpha - 1)^{2} + (\beta - 1)^{2}}$$
(7)

In Equations (5) and (6), $Q_{obs,i}$ and $Q_{sim,i}$ stand for the observed and simulated flow at the ith time step, and N for the data length. $\bar{Q}_{obs}$ represents the mean of the observed values in Equation (5). In Equation (7), r stands for the correlation coefficient, α for the ratio of the simulated mean flow to the observed mean flow and β for the ratio of the standard deviation of the simulated flow to the standard deviation of the observed flow.
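The three criteria can be computed with a few lines of NumPy. The sketch below uses the standard definitions of NSE, RMSE and KGE (Gupta et al. 2009) with the symbol meanings given in the text; the function names are illustrative, and population standard deviations are assumed for the KGE ratio.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def rmse(obs, sim):
    """Root mean square error, in the units of the flow series."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, mean ratio and std ratio."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.mean() / obs.mean()  # ratio of means, as defined in the text
    beta = sim.std() / obs.std()     # ratio of standard deviations (population)
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Toy check (made-up values): a simulation biased by +1 everywhere.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]
print(round(nse(obs, sim), 3), rmse(obs, sim))  # 0.2 1.0
```

Because the three deviation terms enter the KGE symmetrically, swapping the labels α and β does not change its value.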

RESULTS AND DISCUSSION

First, the precipitation and streamflow data were decomposed into ten detail components and one approximation component using the Daubechies wavelet. The correlation coefficients between Qt and the decomposed components of Pt and of the streamflow of the preceding day (Qt-1) are shown in Figure 4 for the Embarras River basin as an example. The wavelet components of the precipitation and streamflow data that gave the highest correlation were summed in order to improve the performance of the WANN model; for instance, the highest correlation between Qt-1 and Qt was obtained with the sum of the D2, D3, D4, D5, D6, D7, D8, D9, D10 and S10 components of Qt-1. The same procedure was employed for the Saline River. Precipitation and potential evapotranspiration were used as input data in all models. For the ANN and WANN models, Qt-1 and Qt-2 were additionally included in the input combinations (Table 2) to see whether performance improved. When Qt-1 and Qt-2 are added to the input combination, the WANN and ANN models perform better than the GR4J model in both the Embarras River and the Saline River. Figures 5, 6 and 7 indicate that the ANN and WANN models are more successful than the GR4J model in predicting streamflow in the Embarras River. The WANN and ANN models are also notably more useful than the GR4J model for the prediction of extreme flows (i.e., low or high flows). Similarly, the WANN and ANN models perform better than the GR4J model for input combination 5 in the Saline River (Figure 8 and Table 3). Although the WANN model performs well compared with the ANN and GR4J models, the performances of the WANN and ANN models are very close to each other for the Saline River. This could be related to the high autocorrelation between Qt and the streamflow of the preceding days in the Saline River (the correlation coefficients for Qt-1–Qt and Qt-2–Qt are 0.97 and 0.91, respectively). Similar results for the low-flow simulation performance of the GR4J model have been reported in previous studies (Le Moine 2008; Pushpalatha et al. 2011; Demirel et al. 2013). Demirel et al. (2013) used the HBV and GR4J conceptual models for low-flow forecasting in the Moselle River and stated that the performance of GR4J is relatively low in comparison with the HBV model, mainly because of parameter uncertainty. Le Moine (2008) and Pushpalatha et al. (2011) implemented improved GR4J model structures with additional parameters (the Génie Rural à 5 paramètres Journalier (GR5J) and Génie Rural à 6 paramètres Journalier (GR6J) models, respectively) in order to enhance flow simulation performance, and indicated the superiority of the GR5J and GR6J models over the GR4J model.

Table 3

Performances of the WANN, ANN and GR4J models for Saline River

Models | Correlation (R) between simulated and observed flows | NSE | RMSE (mm/d) | KGE
WANN | 0.9997 | 0.9995 | 0.035658 | 0.9910
ANN | 0.9973 | 0.9946 | 0.119182 | 0.9900
GR4J | 0.9123 | 0.8103 | 0.707 | 0.7447

Figure 4

Correlations between the wavelet components of (a) Pt and Qt and (b) Qt-1 and Qt for Embarras River.

Figure 5

Relationship between simulated flow for ANN (for input combination 5), WANN (for input combination 5), GR4J models and observed flow in Embarras River.

Figure 6

Scatter diagrams for (a) ANN (for input combination 5), (b) WANN (for input combination 5) and (c) GR4J models in Embarras River.

Figure 7

Performance of the models in terms of (a) NSE, (b) RMSE and (c) KGE in Embarras River.

Figure 8

Relationship between the simulated flow for ANN (for input combination 5), WANN (for input combination 5), GR4J models and observed flow in Saline River.

To evaluate the high-flow simulation performance of the WANN, ANN and GR4J models, RMSE values (Table 4) were calculated for the simulated and observed flows above a threshold, set at 50% of the maximum flow in the test period for each river (Figures 5 and 8). The WANN model yields more accurate results than the ANN and GR4J models. Tian et al. (2013) compared the GR4J, HBV and Xinanjiang models for daily high-flow simulation in the Jinhua River, China, and found that the GR4J model outperformed the HBV and Xinanjiang models for high-flow simulation. They suggested that this could be related to the less complex structure of the GR4J model compared with the HBV and Xinanjiang models, but also emphasized that study area characteristics affect how well hydrological models perform. In contrast, we found that the GR4J model does not perform as well as the data-driven models (i.e., ANN and WANN) for high-flow simulation (Figures 5 and 8, and Table 4).
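The thresholded RMSE described above can be sketched as follows. One point is an assumption here: the threshold is applied to the observed flows only, and the corresponding simulated values are taken on the same days; the text does not spell out how days are selected. The function name and toy values are hypothetical.

```python
import numpy as np


def high_flow_rmse(obs, sim, fraction=0.5):
    """RMSE over the days whose observed flow exceeds fraction * max(obs)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    threshold = fraction * obs.max()   # 50% of the maximum flow by default
    mask = obs > threshold             # assumption: select days by observed flow
    return float(np.sqrt(np.mean((obs[mask] - sim[mask]) ** 2)))


# Toy test-period series (made-up values): only the last two days are high flows.
obs = [1.0, 2.0, 10.0, 8.0]
sim = [1.0, 2.0, 9.0, 8.0]
print(round(high_flow_rmse(obs, sim), 4))  # 0.7071
```

Restricting the error to these days keeps the many low-flow days from masking how well each model reproduces the peaks.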

Table 4

RMSE values for the high flows predicted by WANN, ANN and GR4J models

RMSE (mm/d) values for high flows estimated by each model
Rivers | WANN | ANN | GR4J
Embarras River | 0.93 | 1.33 | 3.74
Saline River | 0.18 | 0.56 | 2.78

When only P and PE are used as input data, the GR4J model outperforms the ANN and WANN models in both rivers (Table 5). This could be related to the low correlations between PEt and Qt (−0.12 for Embarras River; −0.26 for Saline River) and between Pt and Qt (0.08 for Embarras River; 0.02 for Saline River). The choice of input combination therefore affects the performance of the WANN and ANN models significantly. Zadeh et al. (2010) pointed out that the selection of input variables is very important for accurate simulation. In this context, input variable selection (IVS) algorithms have been proposed to increase the efficiency of data-driven models (Galelli et al. 2014). This topic will be the focus of future studies on improving data-driven model performance.

Table 5

Performances of the WANN, ANN and GR4J models when only P and PE (i.e., input combination no. 1 for the ANN and WANN models) are used as input data

Rivers | Models | Correlation (R) | NSE | RMSE (mm/d) | KGE
Embarras River | WANN | 0.5237 | 0.25 | 1.15 | 0.20
 | ANN | 0.0377 | −0.03 | 1.35 | −0.29
 | GR4J | 0.8481 | 0.69 | 0.74 | 0.63
Saline River | WANN | 0.5294 | 0.27 | 1.38 | 0.32
 | ANN | 0.3333 | 0.10 | 1.54 | 0.03
 | GR4J | 0.9123 | 0.81 | 0.71 | 0.74

CONCLUSION

Applying different hydrological models (e.g., data-driven or conceptual models) is important for assessing their performance in hydrological modelling. This study compared data-driven models (WANN and ANN) and a lumped conceptual model (GR4J) for forecasting daily streamflow in the Embarras and Saline Rivers in the USA. The WANN and ANN models performed better than the GR4J conceptual model in forecasting daily streamflow when the streamflow of the preceding days (Qt-1, Qt-2) was included in the input combination. Furthermore, the GR4J model performed worse than the WANN and ANN models for low- and high-flow simulation. When only precipitation and evapotranspiration data were used as input variables, however, the GR4J model performed better than the WANN and ANN models. This shows that the selection of input data is important for obtaining more accurate forecasts from data-driven models. In addition, the performances of WANN and ANN are very close to each other, particularly for the Saline River, which probably arises from the high autocorrelation between the streamflow of the preceding days (Qt-1, Qt-2) and the current streamflow (Qt). Further studies are needed to understand the behaviour of hydrological models in different catchments. In future studies, the authors will focus on improving the hydrological models in order to obtain better rainfall–runoff simulation performance.

REFERENCES

Anctil F., Michel C., Perrin C. & Andréassian V. 2004 A soil moisture index as an auxiliary ANN input for stream flow forecasting. Journal of Hydrology 286 (1–4), 155–167.
Aqil M., Kita I., Yano A. & Nishiyama S. 2007b Neural networks for real time catchment flow modeling and prediction. Water Resources Management 21 (10), 1781–1796.
Coron L., Thirel G., Delaigue O., Perrin C. & Andréassian V. 2017 The suite of lumped GR hydrological models in an R package. Environmental Modelling & Software 94, 166–171.
Coron L., Perrin C., Delaigue O., Thirel G. & Michel C. 2018 airGR: Suite of GR Hydrological Models for Precipitation-Runoff Modelling. R package version 1.0.15.2. https://webgr.irstea.fr/en/airGR/.
Demirel M. C., Booij M. J. & Hoekstra A. Y. 2015 The skill of seasonal ensemble low-flow forecasts in the Moselle River for three different hydrological models. Hydrology and Earth System Sciences 19, 275–291.
Demuth H. & Beale M. 1998 Neural Network Toolbox for Use with MATLAB, User's Guide, Version 3. The MathWorks, Inc., MA, USA.
Galelli S., Humphrey G. B., Maier H. R., Castelletti A., Dandy G. C. & Gibbs M. S. 2014 An evaluation framework for input variable selection algorithms for environmental data-driven models. Environmental Modelling & Software 62, 33–51.
Gupta H. V., Kling H., Yilmaz K. K. & Martinez G. F. 2009 Decomposition of the mean squared error and NSE performance criteria: implications for improving hydrological modelling. Journal of Hydrology 377 (1–2), 80–91.
Le Moine N. 2008 Le bassin versant de surface vu par le souterrain: une voie d'amélioration des performance et du réalisme des modéles pluie-débit? PhD thesis, Université Pierre et Marie Curie (Paris), Cemagref (Antony), France.
Mallat S. G. 1989 A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (7), 674–693.
Mehr A. D. & Demirel M. C. 2016 On the calibration of multigene genetic programming to simulate low flows in the Moselle River. Uludağ University Journal of The Faculty of Engineering 21 (2), 365–376.
Meyer Y. 1993 Wavelets: Algorithms & Applications. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
National Weather Service, NOAA 2018.
Nayak P. C., Venkatesh B., Krishna B. & Jain S. K. 2013 Rainfall-runoff modeling using conceptual, data driven, and wavelet based computing approach. Journal of Hydrology 493, 57–67.
Nourani V., Baghanam A. H., Adamowski J. & Kisi O. 2014 Applications of hybrid wavelet–artificial intelligence models in hydrology: a review. Journal of Hydrology 514, 358–377.
Perrin C., Michel C. & Andréassian V. 2003 Improvement of a parsimonious model for streamflow simulation. Journal of Hydrology 279 (1–4), 275–289.
Pushpalatha R., Perrin C., Le Moine N., Mathevet T. & Andréassian V. 2011 A downward structural sensitivity analysis of hydrological models to improve low-flow simulation. Journal of Hydrology 411, 66–76.
R Development Core Team 2015 R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Sedki A., Ouazar D. & El Mazoudi E. 2009 Evolving neural network using real coded genetic algorithm for daily rainfall–runoff forecasting. Expert Systems with Applications 36 (3), 4523–4527.
Shoaib M., Shamseldin A. Y. & Melville B. W. 2014 Comparative study of different wavelet based neural network models for rainfall–runoff modeling. Journal of Hydrology 515, 47–58.
Zadeh M. R., Amin S., Khalili D. & Singh V. P. 2010 Daily outflow prediction by multilayer perceptron with logistic sigmoid and tangent sigmoid activation functions. Water Resources Management 24 (11), 2673–2688.