Transit data analysis and artificial neural networks (ANNs) have proven to be useful tools for characterizing and modelling non-linear hydrological processes. In this paper, these methods have been used to characterize and to predict the discharge of Lor River (North Western Spain) 1, 2 and 3 days ahead. Transit data analyses show a coefficient of correlation of 0.53 for a lag between precipitation and discharge of 1 day. On the other hand, temperature and discharge have a negative coefficient of correlation (−0.43) for a delay of 19 days. The ANNs developed provide good results for the validation period, with R² between 0.92 and 0.80. Furthermore, these prediction models have been tested with discharge data from a period 16 years later. Results of this testing period also show a good correlation, with R² between 0.91 and 0.64. Overall, results indicate that ANNs are a good tool to predict river discharge with a small number of input variables.

INTRODUCTION

Nowadays, river systems are subject to high anthropic stresses which are leading to increases in the frequency and severity of droughts and floods. In addition, climate change is likely to modify hydrological cycles, increasing extreme events (IPCC 2012). Due to high human occupation of flood plains, predicting floods sufficiently in advance is of vital importance to minimize the effects on the population. For these reasons, hydrological prediction is an essential tool for the adequate management of water resources from the social, environmental and economic points of view (Thornton et al. 2007; Marques et al. 2015).

The behaviour of hydrologic systems is complex and basically nonlinear (Sivakumar & Singh 2012). It is directly influenced by different kinds of variables, such as edaphic, geological, geographical and climatic (Post & Jakeman 1996; Soulsby et al. 2006; López-Moreno et al. 2013). Due to the nonlinearity of the behaviour, and the variety of the variables involved, artificial neural networks (ANNs) can be considered as an excellent prediction method.

Transit data analysis studies the behaviour of different variables over time and how they can be correlated even if the output variable responds later to a change in the input variable. It allows determination of the lag time between the two variables, and of how statistically significant the correlation is (Sahu et al. 2009). In this way, it is possible to know how long it takes for the discharge to respond to precipitation and evapotranspiration. This lag time is important because it sets the time window that a model must cover to predict the response of discharge to the other variables.

ANNs have been used in many types of applications because they are a very useful tool for characterization (Willis et al. 1991; Mariey et al. 2001; Coppola et al. 2005; El Ouahed et al. 2005; Papadopoulos et al. 2005; Corma et al. 2006), modelling (Thompson & Kramer 1994; Lek & Guégan 1999; Araújo et al. 2005; Smith et al. 2011), or time series forecasting (Zhang et al. 1998, 2000; Bunn 2000; Zhang 2003; Antanasijevic et al. 2013). The traditional approaches, such as Box–Jenkins or autoregressive integrated moving average, assume that the time series are generated from a linear process; however, real world systems are often nonlinear (Granger & Teräsvirta 1993; Zhang et al. 1998). ANNs are a set of computational methods inspired by the human brain (Sutariya et al. 2013) and the way it works using the fundamental cell of neural networks, the neuron (Zhang et al. 1998). An ANN has a high number of interconnected neurons; it is this fact which gives the neural network its capability of generalization, that is, the ANN can learn from the data presented and then infer correct information even when the input data contain noise (Zhang et al. 1998).

The aims of this study were: (a) to characterize the behaviour of water discharge of an undammed river in relation to temporal distribution of rainfall and temperature, employing time series analysis; and (b) to model discharge 1, 2 and 3 days ahead through the use of ANNs.

MATERIALS AND METHODS

Description of the study river

Lor River is a mountain river with very good water quality, and it has been proposed as a Natural Fluvial Reserve in the last Hydrological Plan (Real Decreto 285/2013) of Confederación Hidrográfica Miño-Sil (the institution responsible for water management in this area). The Lor basin (Figure 1) is situated in an area included in the Natura 2000 Network, which is the centrepiece of European Union nature and biodiversity policy. The Natura 2000 Network is an ecological network of biodiversity conservation areas that consists of Special Areas of Conservation established under the Habitats Directive and Special Protection Areas for Birds established under the European Birds Directive (Directive 2009/147/CE). The study area is denominated Ancares-Caurel (Directive 92/43/CEE). From the geological point of view, the basin lies on metamorphic rocks, mainly quartzite, schists and slates (Vera 2004).
Figure 1

Location map of the Lor River basin.

The area of the Lor River basin is 372 km² and the difference in elevation between the highest point (1,203 masl) and the outlet (223 masl) is 980 m; the length of the river is 53 km. The area has a mountainous climate with a Mediterranean tendency at the lowest part of the basin. The mean annual precipitation is 1,560 mm and the annual mean evapotranspiration is 687 mm (Martínez Cortizas & Pérez-Alberti 1999).

The daily Lor River discharge data from October 1984 to September 1994 were divided into two subsets: a first data set, from October 1984 to December 1992, used as the training period of the ANNs, and a second data set, from January 1993 to September 1994, used as the validation period. Additionally, data from the period 2008 to 2011 were used as the testing period.

The mean discharge over the training period was 11.4 m³ s⁻¹, in the validation period it was 13.6 m³ s⁻¹ and in the testing period it was 10.9 m³ s⁻¹. For the period 1959–2007, the mean discharge was 13.3 m³ s⁻¹ (with a coefficient of variation of 42%). The maximum annual discharge was in 2000 with 25.4 m³ s⁻¹ and the minimum discharge was in 2001 with 3.1 m³ s⁻¹.

Data base

The procedure for developing the different ANN models is described in Figure 2 (left). When all models have been developed (data mining), they are validated with the validation data previously reserved; we calculate different fit parameters (adjustments) and select the model with the highest capability to predict the Lor River discharge in the future (model selection). Once the best neural network models have been developed, validated and selected, we proceed to predict the discharge for the years 2008–2011 (testing period; Figure 2, right).
Figure 2

Scheme to develop prediction models (left) and validate the prediction models with modern data using the best models developed in the previous phases (right).

Meteorological and hydrological data were collected from different sources. The discharge data were obtained from Ministerio de Agricultura, Alimentación y Medio Ambiente (Lor Station in Parada, http://sig.magrama.es/geoportal/). Precipitation data were obtained from Conselleria de Medio Ambiente Territorio e Infraestructura (http://www.meteogalicia.es) and temperature data were collected from the network of AEMET (Agencia Estatal de Meteorología) meteorological stations (www.datosclima.es). Maximum temperature was considered because it is well correlated with evapotranspiration and it is available in many regions of the world (Enku & Melesse 2014).

Data from 1984 to 1994 were divided into two subsets, one for training (1984–1992), to develop the best model to predict the discharge, and a second subset (1993–1994) to validate the model. The best model selected in the validation phase was then used to predict the discharge of Lor River for the period 2008–2011 (testing subset).

Time series analysis

The time series analysis seeks to find the correlation between different variables (cross-correlation) as well as within the variable itself (autocorrelation). Trends or seasonal variations should be accounted for when implementing the prediction model (Sahu et al. 2009). In this study univariate (autocorrelation) and bivariate (cross-correlation) methods are applied to study the Lor River hydrology using the following time series data: Julian day, precipitation, maximum temperature and discharge. The autocorrelation coefficient varies in the interval [−1, 1], and plotting this parameter versus lag visually indicates the periodicity of the event throughout time. If the perturbation of the variable (e.g. rainfall) has a long effect in time, the slope is gentle, while if the event has a short range in the time series, the slope will be more pronounced (Lee & Lee 2000; Sahu et al. 2009). The cross-correlation represents the relationship between the input series and the output series (Lee & Lee 2000), in this case precipitation and temperature vs. discharge. In the cross-correlation function, the delay is the time lag between lag 0 and the maximum cross-correlation coefficient value; this lag determines the transfer velocity of the system, for example how long it takes for the river discharge to respond to precipitation (Lee & Lee 2000; Sahu et al. 2009). The computer program PAST was used to obtain the auto- and cross-correlograms (Hammer et al. 2001).
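
As an illustration of how the delay can be read from a cross-correlation function, the following minimal sketch computes a Pearson correlation between an input series and a lagged output series and returns the lag with the largest absolute coefficient. It is not the PAST implementation, whose normalisation may differ; the function names and the toy series are hypothetical, and the absolute value is used here so that a negative peak (as for temperature) can also be detected.

```python
import math

def cross_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] for a lag >= 0."""
    if lag < 0 or lag >= len(x):
        raise ValueError("lag must be in [0, len(x))")
    xs = x[:len(x) - lag]          # input series, truncated at the end
    ys = y[lag:]                   # output series, shifted by `lag` days
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

def best_lag(x, y, max_lag):
    """Lag in [0, max_lag] with the largest absolute cross-correlation."""
    return max(range(max_lag + 1),
               key=lambda k: abs(cross_correlation(x, y, k)))

# Toy series (invented, not Lor River data): flow echoes rain one day later.
rain = [0, 5, 0, 0, 12, 3, 0, 0, 7, 0, 0, 1, 9, 0, 0, 4]
flow = [2.0] + [2.0 + 0.5 * r for r in rain[:-1]]
print(best_lag(rain, flow, max_lag=5))
```

With real daily series, the same scan over lags would be applied to precipitation vs. discharge and to maximum temperature vs. discharge.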

ANNs

The performance of a neural network is based on the summation of the operations in each of the neurons in the system. The information enters the neural network as an input vector (Equation (1)), and it is propagated to the first intermediate layer.

$$\mathbf{x} = (x_1, x_2, \ldots, x_N) \tag{1}$$

All information is processed by the propagation function. The task of the propagation function is to add all the excitatory signals that reach the neuron and to generate a single response to all inputs (Equation (2)). In this equation, $S_j$ is the propagation function for the intermediate neuron $j$, $N$ is the number of neurons in the first layer (input layer), $x_i$ is the value supplied by the input neuron $i$, $w_{ij}$ is the value of the importance (weight) between the input neuron $i$ and the intermediate neuron $j$, and finally $b_j$ is the bias associated with the neuron $j$ of the intermediate layer.

$$S_j = \sum_{i=1}^{N} w_{ij}\, x_i + b_j \tag{2}$$

The value obtained by the propagation function is used by the activation function to provide an output value for each input entered into the ANN system (Astray et al. 2013). These values are propagated to all the neurons in the following layers, and finally to the last neuron in the network, the output neuron. The value provided by the output neuron ($y_o$) is compared with the experimental value ($d_o$), and from this comparison the error ($E$) in the prediction is calculated (Equation (3)).

[Equation (3): prediction error $E$ as a function of $y_o$ and $d_o$]

Neural networks can use various activation functions; these can even differ between the intermediate layer and the output layer. The most commonly used activation functions are the sigmoidal function (Cybenko 1989), the linear function (Baldi & Hornik 1995) and the hyperbolic tangent (Sorsa et al. 1991). In this work the sigmoidal function (Equation (4)) was chosen.

$$f(S_j) = \frac{1}{1 + e^{-S_j}} \tag{4}$$
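
The propagation and activation steps described above can be sketched as a forward pass through a network with one intermediate layer. This is only an illustrative sketch: the weighted-sum form of the propagation function follows the description of Equation (2), the logistic form of the sigmoidal function is assumed for Equation (4), and the sigmoid is assumed for both the intermediate and the output layer. The function names, the 4-2-1 weights and the scaled inputs are invented for the example.

```python
import math

def sigmoid(s):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-s))

def layer_output(inputs, weights, biases):
    """weights[i][j] links neuron i of the previous layer to neuron j here."""
    outputs = []
    for j, b_j in enumerate(biases):
        # Propagation function: weighted sum of all incoming signals plus bias.
        s_j = sum(w_i[j] * x_i for w_i, x_i in zip(weights, inputs)) + b_j
        outputs.append(sigmoid(s_j))
    return outputs

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Propagate an input vector through one hidden layer to a single output."""
    hidden = layer_output(x, w_hidden, b_hidden)
    return layer_output(hidden, w_out, b_out)[0]

# Hypothetical 4-2-1 network; inputs are assumed to be scaled to [0, 1].
x = [0.30, 0.10, 0.55, 0.20]
w_hidden = [[0.4, -0.2], [0.9, 0.1], [-0.5, 0.3], [1.2, 0.8]]
b_hidden = [0.0, -0.1]
w_out = [[1.5], [-0.7]]
b_out = [0.2]
y = forward(x, w_hidden, b_hidden, w_out, b_out)
print(y)
```

Because the output neuron is also sigmoidal in this sketch, the predicted value lies in (0, 1) and would have to be rescaled to discharge units.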

Statistical evaluation of ANNs

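Once the different neural networks have been developed, it is necessary to determine the error of each neural network for the training and validation phases, in order to compare them and to choose the best model implemented. To do this, the root mean square error (RMSE) was calculated with predicted and observed data, according to Equation (5); additionally, the average percentage deviation (APD) (Equation (6)) was estimated. In these equations $n$ is the number of cases, and $y_{\mathrm{pred},k}$ and $y_{\mathrm{obs},k}$ are the predicted and observed values for case $k$.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(y_{\mathrm{pred},k} - y_{\mathrm{obs},k}\right)^2} \tag{5}$$

$$\mathrm{APD} = \frac{100}{n}\sum_{k=1}^{n}\left|\frac{y_{\mathrm{pred},k} - y_{\mathrm{obs},k}}{y_{\mathrm{obs},k}}\right| \tag{6}$$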
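
A minimal sketch of these fit statistics is given below. The RMSE is the standard definition; the APD is assumed here to be the mean absolute percentage deviation relative to the observed value, and R² is taken as the squared Pearson correlation between predicted and observed values. The function names and the four sample values are hypothetical.

```python
import math

def rmse(pred, obs):
    """Root mean square error between predicted and observed values."""
    n = len(obs)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)

def apd(pred, obs):
    """Average percentage deviation (assumed form: mean of |pred - obs| / obs, in %)."""
    n = len(obs)
    return 100.0 / n * sum(abs((p - o) / o) for p, o in zip(pred, obs))

def r_squared(pred, obs):
    """Squared Pearson correlation coefficient between the two series."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov * cov / (var_p * var_o)

# Invented discharge values (m3/s), for illustration only.
obs = [10.0, 12.0, 20.0, 8.0]
pred = [11.0, 12.0, 18.0, 9.0]
print(rmse(pred, obs), apd(pred, obs), r_squared(pred, obs))
```

Note that the APD weights errors at low discharge more heavily than the RMSE does, which is why the two statistics can rank models differently.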

RESULTS AND DISCUSSION

Time series analysis of precipitation, temperature and discharge

A period of 10 hydrological years with a time step of 1 day was used to explain the relationship between discharge, precipitation and temperature. Figure 3 shows the autocorrelation and cross-correlation functions of these three variables. In Figure 3(a), the discharge, precipitation and maximum temperature auto-correlograms show a clear annual oscillation that corresponds with the seasonal oscillations. For maximum temperature, the auto-correlogram shows high regularity. On the other hand, for precipitation and, to a lesser degree, for discharge, the auto-correlograms show small oscillations owing to the frequency of measurement and the variability of precipitation. These oscillations are not specific to a particular time of the year. Figure 3(b) shows a closer look at the first 200 lags to facilitate the assessment of the autocorrelations. It shows the random character of the small oscillations of precipitation, which can be seen in the rapid decrease of the autocorrelation coefficient, reaching a value of 0.21 in 5 days. This decrease does not occur so quickly with the other two variables, maximum temperature and discharge, because these variables do not behave as a random function (Angelini 1997). This is demonstrated by the gentle slope of both variables, compared with the steep slope of precipitation. In fact, the temperature autocorrelation reaches a value of 0.40 in 55 days and a null value in 88 days, showing the seasonal effect. Similar values have been found by other researchers in similar systems (Angelini 1997). The discharge autocorrelation decreases more smoothly, reaching a value of 0.20 in 54 days and a null value in 105 days.
Figure 3

(a) Autocorrelation functions for discharge (continuous line), precipitation (dotted line) and temperature (dashed line); (b) magnified portion of the autocorrelation functions for discharge (continuous line), precipitation (dotted line) and temperature (dashed line); (c) cross-correlation functions for discharge and temperature (dashed line) and for discharge and precipitation (continuous line); and (d) magnified portion of the cross-correlation functions for discharge and temperature (dashed line) and for discharge and precipitation (continuous line).

To identify the relationship between discharge and the variables maximum temperature and precipitation, cross-correlation functions were calculated. The results show the cross-correlation coefficient vs. lag time for two pairs of variables (Figure 3(c) and 3(d)): precipitation–discharge and maximum temperature–discharge. In these analyses, precipitation and maximum temperature were used as input series, and discharge was used as the output series. The cross-correlograms show that the coefficient of correlation for precipitation–discharge is 0.53 for a lag time of 1 day. On the other hand, the cross-correlation between maximum temperature and discharge has a minimum value of −0.43 at a lag time of 19 days, indicating that the effect of temperature on water discharge takes longer than that of precipitation.

Development and training of ANNs

The development of ANNs requires implementing a large number of networks, by trial and error, to obtain the best neural network. In this study, over 1,500 neural networks were implemented, with different input variables (Table 1), topologies and training cycles, to identify the neural networks with the best fit for previously unseen cases when predicting the discharge 1 (ANN1), 2 (ANN2) and 3 days (ANN3) ahead. All neural network topologies have an input layer whose number of neurons depends on the type of neural network implemented (Table 1). Intermediate layers with different numbers of neurons were also constructed. Finally, the output layer had a single neuron.
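
To make the role of the lagged input variables concrete, the sketch below builds (inputs, target) pairs from daily series for a chosen prediction horizon. It is a hypothetical helper, not the authors' data pipeline: the function name, the dictionary layout and the sample values are assumptions, and the particular selection of lags is only an example of an input type.

```python
def build_samples(series, lags, horizon):
    """Build (inputs, target) pairs for a given prediction horizon.

    series:  dict name -> list of daily values (all of equal length).
    lags:    dict name -> list of lags (0 = same day, 1 = one day before).
    horizon: days ahead; the target is the discharge 'Q' at day t + horizon.
    """
    n = len(series["Q"])
    max_lag = max(lag for lag_list in lags.values() for lag in lag_list)
    samples = []
    for t in range(max_lag, n - horizon):
        row = [series[name][t - lag]
               for name, lag_list in lags.items() for lag in lag_list]
        samples.append((row, series["Q"][t + horizon]))
    return samples

# Invented six-day record, for illustration only.
series = {
    "Jd":   [1, 2, 3, 4, 5, 6],
    "P":    [0.0, 4.2, 1.0, 0.0, 7.5, 0.3],
    "Tmax": [12.0, 11.5, 13.0, 14.2, 12.8, 10.9],
    "Q":    [9.8, 10.1, 12.4, 11.0, 10.2, 14.9],
}
# Example input selection: Jd, P, P-1, Tmax, Q, Q-1 (six input neurons).
lags = {"Jd": [0], "P": [0, 1], "Tmax": [0], "Q": [0, 1]}
samples = build_samples(series, lags, horizon=1)
print(len(samples), samples[0])
```

The number of values in each input row fixes the size of the input layer, so changing the lag selection changes the first number of the topology.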

Table 1

Variables used in each of the best types of neural networks implemented. The grey squares represent the variables used in the input layer of each neural network type. The black squares represent extra variables added to predict the discharge to 3 days ahead

Variables (columns: Type (T) 1 to 10; the cell shading is not preserved)
Julian day (Jd)
Precipitation (P)
Precipitation 1 day before (P−1)
Precipitation 2 days before (P−2)
Precipitation 3 days before (P−3)
Maximum temperature (Tmax)
Maximum temperature 1 day before (Tmax−1)
Minimum temperature (Tmin)
Discharge (Q)
Discharge 1 day before (Q−1)
Discharge 2 days before (Q−2)
Discharge 3 days before (Q−3)

Table 2 shows the best implemented neural networks for each of the selected types of ANNs for the training phase (October 1984 to December 1992). This table also shows the linear fit coefficients (R²) and RMSE for the best neural networks for each type of input variable selection. As we can see, the same prediction topologies were studied for 1, 2 and 3 days ahead. As expected, the fits of the predictive models are better when the time window is smaller. The types of neural networks that offer the best results for the training phase use different selections of input variables for the different prediction days. The best ANN1 is type 5 (T5, topology 6-5-1), with R² = 0.94 and RMSE = 3.57 m³ s⁻¹. For ANN2 the best type is T3 (topology 8-5-1) and for ANN3 the best type is T1 (topology 9-7-1). These findings are consistent with the transit data analysis, which indicates that the lag time between precipitation and discharge is 1 day. Also, the time of concentration (the time needed for water to flow from the farthest point in a watershed to the outlet) was calculated by the Temez expression (Temez 1991) and the result was 14 hours (less than our time step of 1 day).

Table 2

Fits for the training and validation phases of the best neural network topologies implemented. Type (T) is the neural network model implemented according to the selected input variables, topology (Top.) corresponds to the internal structure of the neural network, cycles is the number of training cycles for each neural network, R² is the squared correlation coefficient, and RMSE is the root mean square error (m³ s⁻¹)

Training phase
    ANN1 (One day ahead)            ANN2 (Two days ahead)           ANN3 (Three days ahead)
T   Top.   Cycles    R²     RMSE    Top.   Cycles    R²     RMSE    Top.   Cycles    R²     RMSE
1   7-2-1  8·10⁴     0.922  4.18    7-3-1  8·10⁴     0.809  6.52    9-7-1  1·10⁵     0.761  7.30
2   4-2-1  3·10⁵     0.905  4.59    4-3-1  3.5·10⁵   0.790  6.84    4-3-1  2·10⁵     0.716  7.96
3   8-2-1  2·10⁵     0.923  4.14    8-5-1  3·10⁵     0.819  6.35    8-7-1  4·10⁵     0.733  7.72
4   6-2-1  4·10⁵     0.919  4.25    6-3-1  4·10⁵     0.803  6.62    6-3-1  8·10⁵     0.723  7.86
5   6-5-1  8·10⁵     0.943  3.57    6-2-1  8·10⁵     0.789  6.86    6-3-1  2·10⁵     0.722  7.87
6   5-2-1  8·10⁵     0.909  4.50    5-3-1  8·10⁵     0.797  6.73    5-3-1  2·10⁵     0.716  7.95
7   3-2-1  4·10⁵     0.904  4.63    3-2-1  2·10⁶     0.778  7.04    3-2-1  2·10⁵     0.697  8.22
8   5-2-1  8·10⁵     0.918  4.28    5-3-1  4·10⁵     0.803  6.63    5-4-1  2·10⁵     0.721  7.88
9   6-2-1  8·10⁵     0.919  4.24    6-2-1  2·10⁵     0.793  6.79    6-4-1  4·10⁵     0.72   7.91
10  7-2-1  4·10⁵     0.920  4.22    7-4-1  1·10⁵     0.807  6.56    7-4-1  3·10⁵     0.733  7.72

Validation phase
    ANN1 (One day ahead)            ANN2 (Two days ahead)           ANN3 (Three days ahead)
T   Top.   Cycles    R²     RMSE    Top.   Cycles    R²     RMSE    Top.   Cycles    R²     RMSE
1   7-2-1  8·10⁴     0.894  5.71    7-3-1  8·10⁴     0.804  7.88    9-7-1  1·10⁵     0.769  8.60
2   4-2-1  3·10⁵     0.917  5.15    4-3-1  3.5·10⁵   0.835  7.41    4-3-1  2·10⁵     0.803  8.02
3   8-2-1  2·10⁵     0.891  5.78    8-5-1  3·10⁵     0.805  7.82    8-7-1  4·10⁵     0.778  8.28
4   6-2-1  4·10⁵     0.892  5.78    6-3-1  4·10⁵     0.804  7.92    6-3-1  8·10⁵     0.762  8.72
5   6-5-1  8·10⁵     0.902  5.66    6-2-1  8·10⁵     0.805  7.90    6-3-1  2·10⁵     0.756  8.78
6   5-2-1  8·10⁵     0.912  5.22    5-3-1  8·10⁵     0.828  7.49    5-3-1  2·10⁵     0.770  8.56
7   3-2-1  4·10⁵     0.875  6.30    3-2-1  2·10⁶     0.804  8.04    3-2-1  2·10⁵     0.745  9.18
8   5-2-1  8·10⁵     0.890  5.86    5-3-1  4·10⁵     0.804  7.87    5-4-1  2·10⁵     0.678  10.09
9   6-2-1  8·10⁵     0.888  5.89    6-2-1  2·10⁵     0.798  8.01    6-4-1  4·10⁵     0.707  9.58
10  7-2-1  4·10⁵     0.892  5.78    7-4-1  1·10⁵     0.804  7.86    7-4-1  3·10⁵     0.744  9.05

Validation of ANNs

Once the fit of the different ANNs had been calculated for the training phase for each type of model implemented, the neural networks were validated using the period 1993–1994. As we can see in Table 2, the models with the best fit for the validation phase are the models of type 2 (Julian day, precipitation, maximum temperature and discharge). Again, the fit of the neural network that predicts the discharge 1 day ahead (R² = 0.917, RMSE = 5.15 m³ s⁻¹) is better than those of the others, ANN2 (R² = 0.835, RMSE = 7.41 m³ s⁻¹) and ANN3 (R² = 0.803, RMSE = 8.02 m³ s⁻¹) (Table 2). These latter two ANNs have lower prediction power than ANN1 but still show high R² coefficients, always over 0.80.

The choice of the best neural network should not be based on the fit in the training phase, but rather on the fit in the validation phase, because the validation cases are truly unknown to the neural network and therefore give a better idea of how the network will fit future data. In this sense, and as mentioned above, the best-fitting networks are those of type 2 (T2).

Figure 4 shows the discharge of Lor River predicted by the neural network models versus the observed discharge. For ANN1, the best implemented network has a topology of 4-2-1, with an R² coefficient of 0.917 and the lowest RMSE (5.15 m³ s⁻¹), which represents an APD of 18.9% (Table 2 and Figure 4(a) and 4(b)). This ANN was trained for 3·10⁵ cycles with a starting learning rate of 0.6, to control the variation of the weights (Yu et al. 1995), and a momentum of 0.8, to speed up convergence and maintain generalization power (Istook & Martinez 2002); both were decreased during the training phase of the neural network. The ANN2 for predicting the discharge 2 days ahead has a topology of 4-3-1 (Table 2 and Figure 4(c) and 4(d)). This ANN2 presents a good fit (R² = 0.835 and RMSE = 7.41 m³ s⁻¹) that corresponds with an APD of 15.9%. Finally, the ANN3 to predict the discharge 3 days ahead presents a good predictive power for previously unseen cases; its fits are, as expected, worse than those of ANN1 and ANN2, with R² of 0.803 and RMSE of 8.02 m³ s⁻¹, which corresponds with an APD of 19.3% (Table 2 and Figure 4(e) and 4(f)). ANN2 and ANN3 present the same topology, with a starting learning rate of 0.6 and a momentum of 0.8; both parameters were decreased during the training phase, but the number of training cycles was different for each ANN, 3.5·10⁵ and 2·10⁵, respectively.
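
The roles of the learning rate and the momentum can be illustrated with a single-weight gradient-descent update. This is a sketch under stated assumptions, not the training routine used in this work: the starting values 0.6 and 0.8 are those quoted above, but the per-cycle decay factor, the number of cycles and the toy error function are hypothetical.

```python
def momentum_step(w, grad, velocity, learning_rate, momentum):
    """One gradient-descent update with momentum for a single weight."""
    # The momentum term reuses part of the previous update; the learning
    # rate scales the contribution of the current error gradient.
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

# Toy problem (not the river model): minimise E(w) = 0.5 * (w - 3)^2,
# whose gradient is dE/dw = w - 3.
w, velocity = 0.0, 0.0
learning_rate, momentum = 0.6, 0.8   # starting values quoted in the text
decay = 0.99                         # hypothetical per-cycle decay factor
for cycle in range(200):
    grad = w - 3.0
    w, velocity = momentum_step(w, grad, velocity, learning_rate, momentum)
    learning_rate *= decay           # both parameters shrink as training proceeds
    momentum *= decay
print(w)
```

Large initial values let the weight move quickly, at the cost of overshooting; shrinking both parameters damps the oscillation around the minimum.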
Figure 4

Correlation and temporal distribution between observed and predicted data for 1 day ahead (a)–(b), 2 days ahead (c)–(d) and 3 days ahead (e)–(f) for years 1993 and 1994 (validation period). The line of slope 1 is also shown.

The predictions 2 and 3 days ahead are 44.0 and 55.9% worse than the prediction 1 day ahead. This large difference between the fits of ANN2 and ANN3 and that of ANN1 is directly related to the lag given by the cross-correlation function for precipitation and discharge (1 day); thus, trying to predict the discharge of Lor River more than 3 days ahead is, as was thought at first, an unnecessary task.

The temporal distribution for the three ANNs is shown in Figure 4(b), 4(d) and 4(f), providing a good fit between observed discharge (grey shading) and predicted (black line) for low discharge. For periods of high flows the models tend to underestimate the discharge and this fact is more pronounced for ANN2 and ANN3.

The importance of each input variable depends on the weights connecting its input neuron with the neurons in the intermediate layer; the sum of the absolute values of these weights determines the importance of the input variable for predicting the discharge (Table 3).

Table 3

Importance of variables (%) for each ANN developed in this study

  Julian day Precipitation Tmax Discharge 
ANN1 2.5 18.0 4.5 74.9 
ANN2 7.2 16.0 15.7 61.2 
ANN3 17.3 9.1 4.6 69.1 
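
The weight-based importance described above can be sketched as follows: for each input variable, the absolute values of the weights linking it to the intermediate layer are summed and expressed as a percentage of the total. The exact normalisation used for Table 3 is not stated, so the percentage share is an assumption, and the function name and the weights below are invented for illustration.

```python
def input_importance(w_hidden, names):
    """Percentage importance of each input from input-to-hidden weights.

    w_hidden[i][j] is the weight from input neuron i to hidden neuron j.
    """
    totals = [sum(abs(w) for w in row) for row in w_hidden]
    grand_total = sum(totals)
    return {name: 100.0 * t / grand_total for name, t in zip(names, totals)}

names = ["Julian day", "Precipitation", "Tmax", "Discharge"]
# Hypothetical weights of a 4-2-1 network (not the fitted values).
w_hidden = [[0.1, -0.1], [0.5, 0.3], [-0.2, 0.2], [1.5, -1.1]]
importance = input_importance(w_hidden, names)
for name, share in importance.items():
    print(f"{name}: {share:.1f}%")
```

Because only absolute values are used, this measure indicates how strongly an input is connected, not whether its effect on discharge is positive or negative.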

The analysis of weights shows that, as expected, the dominant variable for all ANNs is the discharge. The next variable that controls the discharge of the Lor River is the precipitation; however, for the network predicting 2 days ahead (ANN2), the temperature becomes more important than in the other two models implemented. This effect can be related to the importance of temperature in evapotranspiration. To confirm that, the daily potential evapotranspiration (PET, in mm day⁻¹) was calculated using Hamon's method (Dingman 2008) (Equation (7)), where $D$ is the day length in hours and $e^*_a(T_a)$ is the saturation vapour pressure (in kPa) at the mean daily temperature $T_a$ (in °C), calculated by Equation (8) (Dingman 2008).

$$\mathrm{PET} = 29.8\, D\, \frac{e^*_a(T_a)}{T_a + 273.2} \tag{7}$$

$$e^*_a(T_a) = 0.611 \exp\!\left(\frac{17.3\, T_a}{T_a + 237.3}\right) \tag{8}$$

Evapotranspiration markedly influences water availability for stream discharge, and temperature is one of the most important factors that determine the magnitude of water loss to atmosphere by evaporation. As we can see in Figure 5, cross-correlations between discharge and temperature and between discharge and evapotranspiration are practically identical.
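
A minimal sketch of the PET calculation is given below. The coefficients are assumed from the commonly quoted form of Hamon's method (PET in mm per day, day length in hours, temperature in °C, vapour pressure in kPa) and should be checked against the cited source; the function names are hypothetical.

```python
import math

def saturation_vapour_pressure(t_mean_c):
    """Saturation vapour pressure (kPa) at air temperature t_mean_c (deg C)."""
    return 0.611 * math.exp(17.3 * t_mean_c / (t_mean_c + 237.3))

def hamon_pet(day_length_h, t_mean_c):
    """Daily potential evapotranspiration (mm/day), Hamon-type formula."""
    e_sat = saturation_vapour_pressure(t_mean_c)
    return 29.8 * day_length_h * e_sat / (t_mean_c + 273.2)

# Illustrative inputs: a 12-hour day at mean temperatures of 10 and 20 deg C.
for t in (10.0, 20.0):
    print(t, round(hamon_pet(12.0, t), 2))
```

Since PET in this formulation depends only on day length and temperature, a close match between the temperature and evapotranspiration cross-correlograms is to be expected.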
Figure 5

Cross-correlation functions for discharge and temperature (continuous line), for discharge and evapotranspiration (dashed line) and for discharge and precipitation (dotted line).

Testing of ANNs

Once the top three neural networks had been developed, validated and selected (model 4-2-1 for prediction 1 day ahead, and model 4-3-1 for prediction 2 and 3 days ahead), the river discharge for the years 2008 to 2011 (testing period) was predicted.

Figure 6 and Table 4 show that for the testing period, the fit between observed and predicted discharge 1 day ahead (R² = 0.908) is similar to that obtained for the validation period (R² = 0.917). For the discharge predictions 2 and 3 days ahead the fits are also comparable, but in these cases the decrease in R² is larger, around 12.4% and 20.9%, respectively. Nevertheless, the values of RMSE are better for all prediction days: 5.15 m³ s⁻¹ vs. 3.74 m³ s⁻¹ (27.3% less) for 1 day ahead, 7.41 m³ s⁻¹ vs. 6.42 m³ s⁻¹ (13.3%) for 2 days ahead, and 8.02 m³ s⁻¹ vs. 7.58 m³ s⁻¹ (5.5%) for 3 days ahead. This is probably related to the fact that in the validation period a large error was observed in a short period of high flows, resulting in a significant increase of the RMSE value. As happened in the validation period, the ANNs underestimate discharge in high flow periods, and this fact is more pronounced in ANN2 and ANN3.
Table 4

Fits for validation and testing period of best neural networks

            ANN1 (1 day ahead)    ANN2 (2 days ahead)   ANN3 (3 days ahead)
Period      R²      RMSE          R²      RMSE          R²      RMSE
1993–1994   0.917   5.15          0.835   7.41          0.803   8.02
2008–2011   0.908   3.74          0.731   6.42          0.635   7.58
Figure 6

Correlation and temporal distribution between real data and predicted data for 1 day ahead (a)–(b), 2 days ahead (c)–(d) and 3 days ahead (e)–(f) for the testing period (years 2008–2011). The line of slope 1 is also shown.

CONCLUSIONS

In this study, the combination of time series analysis and ANNs has provided useful information to analyse the hydrologic behaviour of an undammed river, such as the lag time between precipitation and discharge and the prediction of discharge 1, 2 and 3 days ahead.

The key findings are summarized below.

  • A lag time of 1 day is found between precipitation and discharge of Lor River. Knowledge of the lag of the river discharge is important to determine the optimal time window threshold to predict the discharge.

  • The ANNs implemented in this study have shown a great ability to predict discharge 1, 2 and 3 days ahead. In the validation period, all models feature a high linear correlation, with R² always greater than 0.80.

  • The prediction in the testing period shows a good correlation for prediction 1 day ahead (R² = 0.91), but for 2 and 3 days ahead the correlation worsened considerably. Because the time lag between precipitation and discharge of this river is 1 day, there is no clear relation between precipitation and discharge beyond this 1 day period.

  • Taking into account the different models implemented and the results obtained we can say that ANNs have proved to be a valid tool to predict the Lor River discharge with lower RMSE for 1 day ahead, but in the case of 2 and 3 days ahead the models implemented present an important deviation, especially in periods of high flows in which stream discharge is underestimated.

  • Long-term predictions using a small number of variables must be treated with caution. The lead time achievable when forecasting discharge with ANNs is probably limited by the lag time between rainfall and river discharge.
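
The lag time and RMSE referred to in these findings can be illustrated with a minimal sketch. It is not the software used in this study: the function names (pearson, lagged_correlation, rmse) are hypothetical, and the rainfall and discharge series are synthetic, built so that discharge responds to the previous day's rainfall. The sketch only shows how a lagged Pearson correlation identifies the delay at which the two series are most strongly related.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def lagged_correlation(driver, response, max_lag):
    """Correlation between driver[t] and response[t + lag], for lag = 0..max_lag."""
    n = len(driver)
    return [pearson(driver[: n - lag], response[lag:]) for lag in range(max_lag + 1)]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic illustration only (assumed data, not Lor River records):
# discharge on day t equals rainfall on day t - 1 plus Gaussian noise.
random.seed(0)
rain = [random.expovariate(0.2) for _ in range(500)]
flow = [random.gauss(0.0, 1.0)] + [r + random.gauss(0.0, 1.0) for r in rain[:-1]]

corrs = lagged_correlation(rain, flow, max_lag=5)
best_lag = max(range(len(corrs)), key=lambda k: corrs[k])
print("lag with highest correlation (days):", best_lag)
```

With real records, the same scan over lags would be run on the daily precipitation and discharge series, and rmse would be applied to observed and predicted discharge for each forecast horizon.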

ACKNOWLEDGEMENTS

G. Astray thanks Xunta de Galicia, Consellería de Cultura, Educación e Ordenación Universitaria, for the Postdoctoral grant (Plan I2C) and CIA Project for financial support to develop this communication. M. A. Iglesias thanks the University of Vigo for his predoctoral fellowship, which supported this research.
