## Abstract

This research focuses on the statistical analysis of hydrometeorological time series in the basin of Lake Tana, the largest freshwater lake in Ethiopia. We used autocorrelation, cross-correlation, Mann–Kendall, and Tukey multiple mean comparison tests to understand the spatiotemporal variation of the hydrometeorological data in the period from 1960 to 2015. Our results show that mean annual streamflow and the lake water level vary significantly from decade to decade, whereas the variation of mean annual rainfall is not significant. The decadal means of the lake outflow and the lake water level decreased between the 1990s and 2000s by 11.34 m^{3}/s and 0.35 m, respectively. The autocorrelation coefficients of both rainfall and streamflow were significantly different from zero, indicating that the sample data are non-random. Changes in streamflow and lake water level are linked to land use changes. Improvements in agricultural water management could help to mitigate the decreasing trends.

## INTRODUCTION

Management and analysis of time series data are integral parts of hydrological and climate studies. Good-quality data are required for climate change detection as well as for hydrological studies and can often be taken from observation networks (Stahl *et al.* 2010). Data can be accessed in different formats from different organizations and should be managed properly. A number of tools, for example SPSS, R and Matlab, are capable of accessing data from many different sources, whereas a smaller number of systems are capable of handling data management, analysis and interpretation together (Horsburgh & Reeder 2014). Different kinds of data analysis methods can be chosen for different research objectives. Time series analysis of rainfall and streamflow is crucial as it is a prerequisite for further use of the data, for example in hydrological modelling studies. A time series is defined as a sequence formed by the values of a variable at increasing points in time that may be composed of a random element and a non-random element (Matalas 1967). It is said to be random if the values of the time series are independent of each other; otherwise it is non-random. A non-randomly distributed sequence repeats some of the information contained in previous values. The nature of hydrometeorological data can be investigated by testing their randomness, trend and association with other variables such as biophysical and socio-economic variables. For instance, the interaction of hydrological variables with land use changes has been studied by Wagner & Waske (2016) and Wagner *et al.* (2016). The persistence of trends and their magnitude in hydrological time series data have been studied by different scholars (e.g. Thomas & Pool 2006; Stahl *et al.* 2010; Hawtree *et al.* 2015; Wagner *et al.* 2018).

A number of hydrological studies were carried out in Ethiopia in general and in the Lake Tana Basin in particular (Setegn *et al.* 2008; Alemayehu *et al.* 2009, 2010; Dargahi & Setegn 2011; Gebremicael *et al.* 2013; Koch & Cherie 2013; Mehari *et al.* 2014; Dessie *et al.* 2015; Woldesenbet *et al.* 2017). For instance, Woldesenbet *et al.* (2017) studied the impact of land use and land cover change on streamflow of the Tana and Beles sub-basins in Ethiopia. Their research revealed that the average annual water yield, the average annual baseflow and the average annual basin percolation decreased gradually, whereas the average annual surface runoff increased. These changes are associated with the expansion of cultivated land and the shrinkage of woody shrub from 1986 to 2010. Koch & Cherie (2013) studied the impact of future climate change on hydrology and water resources management of the whole Upper Blue Nile Basin. Streamflow records of the Abay River at the Eldiem gauging station close to the Ethio-Sudan border over the period 1970–2000 were analysed using the Mann–Kendall (MK) and seasonal MK tests, and the result showed a significantly increasing trend (Koch & Cherie 2013).

Although many research studies were carried out on various hydrological and environmental issues in the Blue Nile basin, very few of them focused on long-term trends of hydrometeorological variables at the catchment level (Tekleab *et al.* 2013). Furthermore, there were conflicting results on the trends of hydrometeorological time series in the Blue Nile basin. For instance, Tesemma *et al.* (2010) reported that the mean annual streamflow at the Lake Tana outlet (Abay) was significantly increasing during 1964–2003. On the other hand, Tekleab *et al.* (2013) found a decrease of the mean annual streamflows of the Gilgelabay and Ribb sub-catchments (inflows) of the Lake Tana Basin during 1973–2005 and 1973–2003, respectively. This indicates that analysing one long-term time series alone might not be sufficient to understand hydrometeorological variability. Moreover, most of the hydrometeorological variability and trend analysis studies were carried out using MK and Pettitt tests as the only methods of investigation. In addition, hydrometeorological trend analysis studies conducted so far in Ethiopia are not conclusive and some were conducted at the macro scale, underlining the need for further research (Asfaw *et al.* 2017). As Ethiopia is strongly dependent on agriculture and has a highly variable hydrology, its agricultural yield is frequently affected by droughts and famines. The United Nations Children's Fund (UNICEF) reported that the 2015–2017 drought caused by the El Niño effect was one of the worst droughts in decades (UNICEF 2016). Consequently, an improved understanding of the patterns of historical observed hydrometeorological time series on the local scale using different time spans is crucial for water use and management. Since the study area is highly dynamic with respect to hydrology and most of the previous studies were carried out on the macro scale, attention should be given to hydrometeorological changes on the local scale.

Therefore, this study aims at conducting a thorough analysis of the temporal and spatial variation of long-term rainfall, streamflow and lake water level in the Lake Tana Basin over the period 1960–2015 on decadal, annual, seasonal, and daily time scales. To this end, multiple statistical methods such as Tukey multiple mean comparison tests, autocorrelation, cross-correlation (cc) and the MK test are used to characterize the decadal and seasonal changes, the dependency of events on the adjacent ones with respect to time, the response of streamflow to rainfall events, and the existence of trends. Accordingly, the research questions were as follows:

Are the rainfall/streamflow events related to their preceding ones?

Are the time series data random or non-random?

Are rainfall and streamflow events cross-correlated?

Do decadal, annual and seasonal rainfall, streamflow and lake water levels show significant changes over time?

## MATERIALS AND METHODS

### General overview of the study area

Lake Tana is the largest freshwater lake in Ethiopia and the third largest in the Nile Basin. The catchment area of the lake at its outlet is 15,321 km^{2}. About 20% of the catchment area is covered by Lake Tana (Kebede *et al.* 2006; Alemayehu *et al.* 2010). The catchment is approximately 84 km long and 66 km wide and is located in the country's north-west highlands. Its topography is very diverse, with altitudes ranging from 1,322 to 4,111 m above sea level (m.a.s.l.). The lake has a surface area of 3,156 km^{2} and extends from 10.95 °N to 12.78 °N latitude and from 36.89 °E to 38.25 °E longitude at an average altitude of 1,786 m.a.s.l. (Tegegne *et al.* 2013). The lake is shallow with a maximum depth of 15 m and is characterized by a steep slope at the borders and a flat bottom (Kebede *et al.* 2006). Lake Tana is the source of the Blue Nile River (McCartney *et al.* 2010). It contains about 50% of the country's fresh water. More than 40 rivers and streams flow into Lake Tana, but 93% of the water comes from just four major rivers: Gilgelabay, Gumara, Ribb and Megech (Setegn *et al.* 2008; Alemayehu *et al.* 2010). The mean annual inflow is estimated to be 158 m^{3}/s (Alemayehu *et al.* 2010). The only surface outflow from the lake is the Blue Nile (Abay) River with an annual flow volume of four billion m^{3} measured at the Bahir Dar gauging station (lake outlet in Figure 1).

Rainfall records in the basin show strong spatial and temporal variability as the basin is influenced by the Inter-Tropical Convergence Zone (ITCZ) and a heterogeneous topography. The position of the ITCZ is the dominant factor that controls the amount of summer rainfall in the basin. In the Lake Tana basin, rainfall has high seasonal variability. July, August and September are wet months with the highest amounts of rainfall as the ITCZ is positioned in the northern hemisphere. June and October are transition months between the wet and dry seasons. November, December, January, February, and March belong to the dry season. April and May are months with little rainfall. A similar classification applies to the West Sahel region (Lucio *et al.* 2012). There is also high spatial variability of annual, seasonal and monthly rainfall amounts in the study area because of small changes in the location of the ITCZ (Gleixner *et al.* 2017; Woldesenbet *et al.* 2017).

### Data analysis

Several years of daily hydrometeorological data records from 1960 to 2015 were collected. The streamflow and lake water level records were gathered from the Department of Hydrology, Ministry of Water, Irrigation and Electricity of the Ethiopian Government (MoWIE 2016) and the meteorological data were collected from the National Meteorological Service Agency (NMA 2016). There are many weather stations in the study area, but only 18 stations that have relatively good continuity were considered for the analysis (Figure 1).

### Statistical methods

For this study, basic statistical analysis techniques including Tukey's multiple mean comparison test, the nonparametric Kendall tau and seasonal MK tests, and auto- and cross-correlation analyses were used. These methods were chosen to understand the variability of the hydrometeorological data over time as well as to characterize and detect the relations between the hydrometeorological variables.

#### Tukey's ‘honestly significant difference’ (HSD) test

Tukey's multiple comparison test is a useful statistical method that can be used to determine which means amongst a set of means differ from the rest (Bates 2010). For the first time, Tukey's HSD test was applied in the Lake Tana Basin to understand the variability of mean values of river discharge and lake water level on a decadal basis. Streamflows of Gilgelabay, Gumara, Ribb and Megech, the outflow from Lake Tana, and the lake water level were considered. The decadal analyses of streamflows at the aforementioned stations and of the lake water level were made by partitioning the recorded data into different decades to understand the change of annual mean values over time. The time series data were split into the following decadal groups: 1960–1979, 1980–1989, 1990–1999 and 2000–2014. In this case, the null hypothesis was that the annual mean values of streamflows and lake water level were time invariant (the same for different decades) and the alternative hypothesis was that annual mean values differed with time. The 5% level of significance was considered in all of these analyses. The aov and TukeyHSD functions available in the stats package of the base R distribution were used to calculate the statistical values (R Core Team 2017).
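The pairwise comparison behind Tukey's test can be sketched as follows. This is a minimal illustration in Python (standard library only), not the R workflow used in the study. The function name `tukey_q_statistics` and the flow values are hypothetical. The sketch computes only the studentized range statistic *q* for each pair of decades in its Tukey–Kramer form. Converting *q* to a *p*-value requires the studentized range distribution, which R's TukeyHSD provides and which is not reproduced here.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def tukey_q_statistics(groups):
    """Return ({(a, b): q}, error degrees of freedom) for all group pairs."""
    means = {name: mean(vals) for name, vals in groups.items()}
    n_total = sum(len(vals) for vals in groups.values())
    df_error = n_total - len(groups)
    # Pooled within-group mean square, i.e. the one-way ANOVA error term.
    ss_error = sum((x - means[name]) ** 2
                   for name, vals in groups.items() for x in vals)
    ms_error = ss_error / df_error
    q = {}
    for a, b in combinations(groups, 2):
        # Tukey-Kramer standard error, valid for unequal group sizes.
        se = sqrt(ms_error / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        q[(a, b)] = abs(means[a] - means[b]) / se
    return q, df_error


# Hypothetical annual mean flows (m3/s) grouped by decade, for illustration only.
flows = {
    "1980s": [90.0, 95.0, 100.0, 105.0],
    "1990s": [120.0, 125.0, 130.0, 135.0],
    "2000s": [100.0, 110.0, 115.0, 120.0],
}
q_values, df_error = tukey_q_statistics(flows)
for pair, q in q_values.items():
    print(pair, round(q, 2), "df =", df_error)
```

A pair of decades is declared different when its *q* exceeds the critical value of the studentized range distribution for the chosen significance level, the number of groups and the error degrees of freedom.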

#### Autocorrelation

Autocorrelation is commonly used to determine whether a data series is random or non-random (e.g. Matalas 1967; Modarres & da Silva 2007; Gautam *et al.* 2010; Duvert *et al.* 2015). Autocorrelation coefficients of the rainfall and streamflow events were calculated using Equation (1) (Duvert *et al.* 2015). Furthermore, following Matalas (1967), who refers to Anderson (1942), the significance of the autocorrelation coefficient (acf) at a given probability level was tested based on Equation (2):

$$r_k = \frac{\sum_{t=1}^{N-k}\left(x_t-\bar{x}\right)\left(x_{t+k}-\bar{x}\right)}{\sum_{t=1}^{N}\left(x_t-\bar{x}\right)^{2}} \qquad (1)$$

$$r_k(t_\alpha) = \frac{-1 \pm t_\alpha\sqrt{N-k-1}}{N-k} \qquad (2)$$

where *r*_{k} is the acf at lag *k*, $\bar{x}$ is the arithmetic mean of the observations, *t*_{α} is the standard normal variate corresponding to a probability level *α*, *r*_{k}(*t*_{α}) gives the upper and lower bounds and *N* is the series length. The *r*_{k} value calculated by Equation (1) is compared with the corresponding bounds calculated using Equation (2) for the significance test. If the value from Equation (1) lies outside the bounds from Equation (2), *r*_{k} is considered significantly different from zero and the sample observations are dependent on their preceding events at the given time lag *k* (Matalas 1967). Therefore, the null (Ho) and alternative (Ha) hypotheses of this study were as follows.

Ho: events of the daily rainfall, streamflow and lake water level time series were not dependent on their preceding events at time lag *k*. In other words, the autocorrelation coefficient at lag *k* lies within the upper and lower bounds and the data were random.

Ha: events of the daily rainfall and streamflow time series were dependent on their preceding events at time lag *k*. In other words, the autocorrelation coefficient at lag *k* lies outside the upper and lower bounds and the data were non-random.
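The randomness test described above can be sketched in a few lines of Python (standard library only). The function names and the sample series are hypothetical, and the sketch assumes the usual sample autocorrelation estimator and the two-sided Anderson limits (−1 ± *t*_{α}√(*N*−*k*−1))/(*N*−*k*) as given by Matalas (1967). It is an illustration, not the authors' code.

```python
from math import sqrt
from statistics import NormalDist, mean


def autocorr(x, k):
    """Lag-k sample autocorrelation coefficient r_k."""
    n = len(x)
    xbar = mean(x)
    num = sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(n - k))
    den = sum((v - xbar) ** 2 for v in x)
    return num / den


def anderson_bounds(n, k, alpha=0.05):
    """Lower and upper limits of r_k under the null hypothesis of randomness."""
    # Assumption: two-sided test, so the normal variate is taken at 1 - alpha/2.
    t_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = t_alpha * sqrt(n - k - 1)
    return (-1 - half_width) / (n - k), (-1 + half_width) / (n - k)


# Hypothetical series, for illustration only.
series = [3.0, 3.2, 3.1, 3.6, 4.0, 4.3, 4.1, 4.8, 5.0, 5.4, 5.2, 5.9]
r1 = autocorr(series, 1)
low, high = anderson_bounds(len(series), 1)
print(r1, (low, high), "random" if low <= r1 <= high else "non-random")
```

Repeating the two calls for increasing *k* yields the correlogram together with its confidence band.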

#### Cross-correlation

Cross-correlation is the correlation between two time series shifted relatively in time. The method has been widely applied in diverse fields (Chenhua 2015). Lagged correlation is important in studying the relationship between time series for two reasons. First, one series may have a delayed response to the other series, or perhaps a delayed response to a common stimulus that affects both series. Second, the response of one series to the other series or an outside stimulus may be ‘smeared’ in time, such that a stimulus restricted to one observation causes a response at multiple observations. Detailed mathematical equations are explained in Duvert *et al.* (2015). Here we used cross-correlation of rainfall versus streamflow.

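The autocorrelation function is symmetrical about lag zero (the value at lag *k* equals the value at lag −*k*). In contrast, the cross-correlations are asymmetrical functions. The cross-correlation function is described in terms of ‘lead’ and ‘lag’ relationships. Equation (3) applies to *y*_{t} shifted forward relative to *x*_{t}. With this direction of shift, *x*_{t} is said to ‘lead’ *y*_{t}. This is equivalent to saying that *y*_{t} ‘lags’ *x*_{t}. A negative value for *k* in Equation (3) is a correlation between the *x*-variable at a time before *t* and the *y*-variable at time *t*. For instance, if *k* = −1, the cross-correlation (cc) value would give the correlation between *x*_{t−1} and *y*_{t} (Chatfield 2004):

$$r_{xy}(k) = \frac{\sum_{t}\left(x_{t+k}-\bar{x}\right)\left(y_t-\bar{y}\right)}{\sqrt{\sum_{t=1}^{N}\left(x_t-\bar{x}\right)^{2}\,\sum_{t=1}^{N}\left(y_t-\bar{y}\right)^{2}}} \qquad (3)$$

where *N* is the series length, $\bar{x}$ and $\bar{y}$ are the sample means, and *k* is the lag. The sum in the numerator runs over all *t* for which both *x*_{t+k} and *y*_{t} are observed.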

Pairwise cross-correlations of streamflow with the corresponding regional rainfall of each sub-basin were carried out. The cross-correlation tests for rainfall versus streamflow were based on the following null and alternative hypotheses:

Ho: the daily streamflow and catchment rainfall time series are not significantly correlated, i.e. the correlation coefficient at time lag *k* between daily streamflow and rainfall is not significantly different from zero.

Ha: the daily streamflow and catchment rainfall time series are significantly correlated, i.e. the correlation coefficient at time lag *k* between daily streamflow and rainfall is significantly different from zero.
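A lagged cross-correlation of this kind can be sketched as follows in Python (standard library only). The function name and both series are hypothetical. The sketch assumes the convention in which lag *k* pairs *x*_{t+k} with *y*_{t}, so that *k* = −1 correlates *x*_{t−1} with *y*_{t} as in the example above. It illustrates the idea and is not the authors' implementation.

```python
from math import sqrt
from statistics import mean


def cross_corr(x, y, k):
    """Correlation between x[t + k] and y[t]; negative k means x precedes y."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    # Keep only the time steps where both x[t + k] and y[t] exist.
    steps = range(max(0, -k), min(n, n - k))
    num = sum((x[t + k] - xbar) * (y[t] - ybar) for t in steps)
    den = sqrt(sum((v - xbar) ** 2 for v in x) * sum((v - ybar) ** 2 for v in y))
    return num / den


# Hypothetical daily rainfall pulse and a streamflow peak two days later.
rain = [0, 0, 10, 0, 0, 0, 0, 0, 0, 0]
flow = [1, 1, 1, 1, 6, 1, 1, 1, 1, 1]
best_lag = max(range(-5, 6), key=lambda k: cross_corr(rain, flow, k))
print(best_lag, cross_corr(rain, flow, best_lag))
```

Under this sign convention, a streamflow response that follows rainfall shows up at negative *k*; the absolute value of the best lag is the response delay in days.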

#### Kendall tau and seasonal Mann–Kendall tests

The Kendall tau and seasonal MK tests are nonparametric statistical tests used for detecting trends in time series data (Thomas & Pool 2006). The tests were applied to the precipitation, lake water level and streamflow time series under the following null (Ho) and alternative (Ha) hypotheses:

Ho: the streamflow, rainfall and lake water level time series show neither an upward nor a downward trend.

Ha: the streamflow, rainfall and lake water level time series show either an upward or a downward trend with significant change.
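The trend test can be illustrated with a minimal Mann–Kendall sketch in Python (standard library only). The function name and the sample values are hypothetical. The sketch assumes the large-sample normal approximation with continuity correction and applies no correction for tied values, so it is a simplified stand-in for the R routines used in the study.

```python
from math import sqrt
from statistics import NormalDist


def mann_kendall(x):
    """Return (S, tau, two-sided p-value) of the Mann-Kendall trend test."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    tau = s / (n * (n - 1) / 2)
    # Variance of S without tie correction (assumption: few or no ties).
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, tau, p


# Hypothetical annual rainfall totals (mm), for illustration only.
annual = [1210, 1185, 1250, 1302, 1275, 1330, 1290, 1365, 1340, 1400]
s, tau, p = mann_kendall(annual)
print(s, round(tau, 2), "reject Ho" if p < 0.05 else "fail to reject Ho")
```

The sign of τ gives the trend direction and the *p*-value decides between Ho and Ha at the chosen significance level.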

## RESULTS AND DISCUSSION

### Rainfall analysis

Record lengths of the rainfall data vary between stations. The longest record is available at Enjibara station, which started recording in 1954. About 5.4% of the rainfall data are missing. The rainfall in the study area has a unimodal pattern with a peak in July or August (Figure 2). Moreover, the rainfall data show high spatial and temporal variability. Most of the precipitation frequencies lie below the median values. Data values outside the range [Q_{1} − 1.5*(Q_{3} − Q_{1}), Q_{3} + 1.5*(Q_{3} − Q_{1})] (Q_{1} = first quartile and Q_{3} = third quartile) are suspected to be outliers (Thomas & Pool 2006; Li *et al.* 2016). About 15 of the stations have mean rainfall values greater than the third quartile, which indicates that the mean values are biased towards the maximum values.
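The outlier screening rule above can be sketched as follows in Python (standard library only). The function names and rainfall values are hypothetical. Quartile definitions differ between software packages; the sketch assumes the ‘inclusive’ method of Python's `statistics.quantiles`, which may not match the quartiles used in the study.

```python
from statistics import quantiles


def iqr_fences(values):
    """Return the lower and upper fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def suspected_outliers(values):
    """Values lying outside the fences."""
    low, high = iqr_fences(values)
    return [v for v in values if v < low or v > high]


# Hypothetical daily rainfall depths (mm), for illustration only.
rain_mm = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_fences(rain_mm), suspected_outliers(rain_mm))
```

For strongly skewed daily rainfall, many legitimate heavy-rain days fall above the upper fence, so flagged values are candidates for checking rather than automatic removal.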

#### Rainfall lag time correlation

The overall pattern of the correlogram shows a gradual decrease as the lag time increases (Figure 3). The values range from 0.31 to 0.7. Wereta station has the maximum lag-1 day autocorrelation coefficient (approximately 0.7). Even though these coefficients decrease with increasing lag time, all values are beyond the upper bound of the 95% confidence interval. The lower and upper bounds were computed using Anderson's formula (Equation (2)). According to Matalas (1967), a time series exhibiting nonsignificant values of the autocorrelation coefficient is not necessarily random, since an autocorrelation coefficient of order greater than one, if significant, would indicate a lack of randomness. However, a data series exhibiting significant values of the autocorrelation coefficient indicates non-randomness. Therefore, as the autocorrelation coefficient values are statistically significantly different from zero, they indicate that the rainfall series is characterized by a non-random distribution and linear dependency of successive values over a given period (Matalas 1967; Modarres & da Silva 2007; Gautam *et al.* 2010). The non-random distribution of rainfall may be explained by prevailing weather conditions and large-scale transport of water vapour in the atmosphere.

#### Spatial and temporal variation of rainfall

Time series rainfall plots were also analysed to see the long-term pattern and variability. There is variability of rainfall on daily, monthly, seasonal and annual scales at the stations considered in this study. On the one hand, the rainfall patterns are relatively consistent during the dry months (January, February, November and December) when compared to March, April and May. On the other hand, the wet months (June, July, August, and September) are highly variable. The rainfall records of each station showed either a downward or an upward trend. However, the *p*-values of the MK and seasonal MK tests were greater than 0.05 for most rainfall stations considered in this study, indicating that the changes were statistically not significant. While most of the stations showed a similar trend direction in both seasons, the direction of the trends is opposite in the rainy and the dry season at some stations (Table 1). The mean areal basin rainfall of recent years shows a negative deviation from the long-term average (Figure 4(a)), and a few stations such as Bahir Dar (Figure 4(c)), Amed Ber, Adet, Addis Zemen, Enfiranz, Maksegnit and Wete Abay show a downward trend in daily, seasonal and annual rainfall even though the change is statistically not significant. In contrast, Dangila, Debre Tabor, Merawi, Tis Abay and Wereta show upward trends with *p*-values greater than 0.05, so that the null hypothesis of no trend could not be rejected. The geographic locations of the stations are shown in Figure 5. The annual rainfall change is mostly insignificant except for the Enjibara, Gondar and Zege stations, which show a significant upward trend, and Addis Zemen, which shows a downward trend. Nevertheless, there are a few more stations that showed a significant change in seasonal rainfall (Table 1). Gebremicael *et al.* (2013) also reported an increasing trend for Gondar station but with insignificant change.

The significant increase (decrease) of seasonal rainfall is related to the timing (late onset and early cessation) of the rains and to rainfall events of very short duration (Teshome 2016). Moreover, variations of the north–south movement of the ITCZ and the El Niño teleconnection that affects the sea surface temperatures (SSTs) of the Indian and Atlantic oceans are the most probable factors that cause seasonal variability of the rainfall in the study area (Gleixner *et al.* 2017). Around 50% of the Ethiopian summer rainfall variance is influenced by equatorial Pacific SST variability (Gleixner *et al.* 2017).

Station | Daily τ | Daily p-value | Rainy season τ/p-value | Dry season τ/p-value | Annual τ | Annual p-value
---|---|---|---|---|---|---
Adet | + | 0.60 | −/0.06 | +/0.56 | − | 0.05
Addis Zemen | − | *** | −/*** | +/0.56 | − | *
Amed Ber | − | 0.36 | −/*** | −/*** | − | 0.05
Bahir Dar | − | *** | −/0.60 | +/0.12 | − | 0.46
Dangila | + | 0.34 | +/0.31 | +/0.78 | + | 0.13
Debre Tabor | + | *** | −/0.34 | +/0.75 | + | 0.93
Delgi | − | 0.10 | −/0.24 | −/0.11 | − | 1.00
Dera Hamusit | + | 0.20 | +/0.85 | −/0.90 | − | 0.15
Enfiranz | − | 0.16 | −/0.27 | +/0.86 | − | 0.14
Enjibara | + | *** | +/*** | +/0.85 | + | *
Gondar | + | *** | +/0.84 | −/0.16 | + | **
Maksegnit | + | * | +/0.21 | +/0.51 | + | 0.88
Mekaneyesus | + | ** | +/** | +/*** | + | 0.62
Merawi | + | 0.87 | +/*** | −/* | + | 0.85
Tis Abay | + | 0.23 | +/** | +/0.79 | + | 0.20
Wereta | + | 0.30 | +/*** | +/0.75 | + | 0.61
Wete Abay | − | 0.33 | +/0.58 | −/0.53 | − | 0.37
Zege | + | 0.75 | +/*** | −/* | + | *

*Spatial variation of rainfall*. There is significant variation of the annual rainfall distribution in the basin. An annual rainfall map was produced from the long-term mean annual rainfall records by applying inverse distance weighted (IDW) interpolation in a GIS environment (Figure 5). The map shows a general N–S gradient of rainfall. The south-western part of the basin has the highest rainfall, whereas the northern and north-western parts receive less rainfall. Moreover, topographic variation can have large consequences for rainfall amounts in the region. The amount of annual rainfall is directly related to elevation above mean sea level: high rainfall corresponds to the highlands, whereas low rainfall is measured in the lowlands (Figure 5). The minimum, maximum and mean annual rainfall values are 815, 1,599, and 1,238 mm, respectively. The standard deviation is 160 mm.
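The idea behind IDW interpolation can be sketched as follows in Python (standard library only). The function name, the station coordinates and the rainfall values are hypothetical. The distance exponent of 2 is an assumption, since the power used in the GIS environment is not stated. Distances are treated as planar, which presumes projected coordinates.

```python
from math import hypot


def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    stations: iterable of (x, y, value) tuples in projected coordinates.
    """
    num = den = 0.0
    for sx, sy, value in stations:
        d = hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # the target coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical stations: (easting km, northing km, mean annual rainfall mm).
gauges = [(0.0, 0.0, 800.0), (2.0, 0.0, 1600.0), (0.0, 3.0, 1200.0)]
print(idw(gauges, 1.0, 1.0))
```

Evaluating `idw` on a regular grid of points yields the raster from which a rainfall map can be drawn. Because the estimate is a weighted average, it always stays between the smallest and largest station values.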

### Streamflow analysis

About 93% of the water of Lake Tana originates from only four major rivers: Gilgelabay, Ribb, Gumara and Megech. Among the four contributors, Gilgelabay is the largest with a long-term mean daily discharge of 55 m^{3}/s, and Gumara is the second largest with an average flow of 34 m^{3}/s. Ribb contributes a mean flow of 14.4 m^{3}/s, and Megech is the smallest contributor (9 m^{3}/s) among the four tributaries considered in this study (Figure 6). The analysis shows that the outflow from the lake is decreasing, particularly in the periods 2002–2006 and 2008–2011 (Figure 4(d)). The abrupt change of the outflow is possibly linked to anthropogenic activities. Intensive development interventions are taking place on the major tributary rivers. Damming and abstraction of water have been taking place in recent years (Alemayehu *et al.* 2010; Minale & Rao 2011; Abate *et al.* 2015). These activities are influencing the water level of Lake Tana. In addition to the human-induced factors, natural changes in the rainfall amount and intensity in the catchment are considered to be one of the main reasons for the decreasing outflow.

The streamflow changes were detected on a daily and annual basis as well as in the wet and dry seasons (Figure 7). All stations show a significant increase on the daily time scale (Table 2). However, Gumara and Megech were the only stations that showed a significant upward change at annual and seasonal time scales as well (Table 2). The lake water level showed a significant increase during the wet season and a significant decrease during the dry season, resulting in no significant annual trend. Gilgelabay showed a decreasing trend in the wet season, but no significant trend was detected for the dry season and on the annual time scale. The outflow from the lake and Ribb station have opposite changes in the wet and dry season, but these changes are not significant (Table 2).

Streamflow trends on daily, seasonal and annual time scales:

Station | Daily τ | Daily p-value | Wet season (τ/p-value) | Dry season (τ/p-value) | Annual τ | Annual p-value
---|---|---|---|---|---|---
Abay/outflow | + | *** | −/0.41 | +/0.52 | + | 0.95
Gilgelabay | + | *** | −/* | +/1.00 | − | 0.74
Gumara | + | *** | +/** | +/*** | + | ***
Megech | + | *** | +/*** | +/*** | + | ***
Ribb | + | *** | +/0.81 | −/0.31 | + | 0.57
Lake water level | + | *** | +/*** | −/*** | + | 0.57

Based on Tukey's multiple mean comparison test, the decadal mean of the outflow increased significantly between the 1980s and the 1990s (*p*-value = 0.03801), whereas a significant decrease was observed between the 1990s and the 2010s (*p*-value = 0.02316). Thus, there was sufficient evidence (at *α* = 0.05) to conclude that the means of the 1980s and 1990s, and of the 1990s and 2010s, were significantly different. On the other hand, the test indicates that there were no significant variations of the decadal means between the 1970s and the 1980s, the 1970s and the 1990s, the 1970s and the 2000s, the 1980s and the 2000s, and the 1990s and the 2000s. The inter-annual variability of the outflow increased from 34% in the 1990s to 40% in the 2000s. These test results are in agreement with a previous study by Conway & Hulme (1993) but are contrary to the findings of Gebremicael *et al.* (2013) and Tesemma *et al.* (2010). Gebremicael *et al.* (2013) reported that the annual streamflow of the Upper Blue Nile Basin showed a significant increase from 1971 to 2009. Tesemma *et al.* (2010) also found a significant increase of the outflow of Lake Tana over the period from 1959 to 2003. Our results disagree with these two studies because our analyses were carried out on the basis of a 10-year moving average, while the other studies focused on one long time period. Additionally, the mean seasonal outflow decreased in the wet season and increased in the dry season, but these changes were not significant (*p*-values = 0.41/0.95). Similar tests were carried out for the inflow discharges into the lake. The Megech and Gumara flows showed statistically significant variation of the mean values at *α* = 0.05.

The annual mean flows of Gumara in the 2000s and the 2010s are significantly higher than in the 1960s (*p*-value = 0.014 and 0.001), the 1970s (*p*-value = 0.0013 and 0.004) and the 1980s (*p*-value = 0.003 and 0.008) (Figure 7). The mean flows of the other decades do not show a significant change. This result agrees with similar studies conducted for the whole Abay/Blue Nile basin (Gebremicael *et al.* 2013; Tekleab *et al.* 2013). The decadal mean annual flow of Megech in the 2000s showed a highly significant (*p*-values <0.001) increase compared with the 1980s and 1990s. On the other hand, the Gilgelabay and Ribb flows did not show a significant variation of their decadal means. The Gilgelabay flow showed an increase in the mean from the 1980s to the 1990s and a decrease from the 1990s to the 2000s, but these changes are not statistically significant (*p*-value >0.05). The intra-annual variability of the Gilgelabay streamflow increased by 9, 17 and 25% during the 1980s, 1990s and 2000s, respectively (Figure 8).

The high variability of streamflow and the change of runoff magnitude in the study area are mainly caused by the combined effect of land use dynamics and changes in rainfall intensity and duration. For instance, the cultivated land of the Eastern Lake Tana Basin increased by 72.7%, while the forest cover decreased by 71.3% and the degraded land increased by 31.34% between 1985 and 2011 (Gashaw & Fentahun 2014). Woldesenbet *et al.* (2017) reported that continuous expansion of cultivated land and decline in woody shrubs and natural forest were the major changes in the Lake Tana Basin in the period from 1973 to 2010. In addition to land use changes, the number of rainy days per year has recently been decreasing while total rainfall remains more or less constant, which implies higher rainfall intensity (Teshome 2016). These are important factors causing high runoff magnitudes by decreasing the rate of infiltration.

#### Discharge autocorrelation analysis at different time lags

Based on our analyses, the daily streamflow is a non-random process that shows high seasonal variability. The autocorrelation coefficients of each river have their maximum value at a lag of 1 day and decrease steeply as the lag time increases (Figure 9). Arora *et al.* (2014) found similar results for a glacier catchment in the Himalayas. Abay discharge, which is the outflow from the lake, has the highest lag 1 autocorrelation coefficient (0.99), and its autocorrelogram is the highest of all rivers. This shows that the lake outflow strongly depends on the previous day's outflow, as it is a function of the lake water level. Megech discharge has the lowest lag 1 autocorrelation (0.56) due to a large variation during a few months of high flows. All of the rivers have similar autocorrelation characteristics, with maximum autocorrelation with the previous day's discharge (Q_{i−1}), indicating storage effects of the previous day's streamflow. Consequently, it is easier to forecast the streamflow of the next day if the discharge of the previous day is known. All plots except Megech's start with a high autocorrelation at lag 1 and generally decrease linearly with little noise. Such a pattern is assumed to be a signature of ‘strong autocorrelation’, which in turn provides high predictability. Megech's discharge autocorrelation plot has a different shape due to the high variability of the data series from decade to decade. It starts with a moderately high autocorrelation at lag 1 and decreases gradually for longer lag times. The decrease is generally linear with some noise, which indicates that the series is moderately autocorrelated.
In summary, the autocorrelation coefficients are significantly different from zero, so that our null hypothesis (independence and randomness of the streamflow data) was rejected and the alternative assumption (events of the daily streamflow time series depend on their preceding events at time lag k) was accepted (Matalas 1967; Modarres & da Silva 2007; Gautam *et al.* 2010; Duvert *et al.* 2015).
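As an illustration of the lag-k autocorrelation discussed above, the following minimal sketch computes the sample autocorrelation coefficient and compares it with an approximate 95% bound for a purely random series. The estimator, the ±1.96/√n bound, the function names and the synthetic AR(1) series are assumptions made for illustration; they are not taken from the study and are not the observed discharge data.

```python
import numpy as np


def autocorrelation(x, k):
    """Sample autocorrelation coefficient of series x at lag k (k >= 0)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(x)")
    d = x - x.mean()
    # Lagged products of deviations, normalised by the total sum of squares.
    return float(np.sum(d[: n - k] * d[k:]) / np.sum(d * d))


def random_series_bound(n, z=1.96):
    """Approximate 95% bound on |r_k| for an independent (random) series."""
    return z / np.sqrt(n)


# Synthetic AR(1) series standing in for daily discharge (hypothetical data).
rng = np.random.default_rng(0)
phi = 0.9  # assumed day-to-day persistence
q = np.zeros(2000)
for i in range(1, q.size):
    q[i] = phi * q[i - 1] + rng.normal()

r1 = autocorrelation(q, 1)
bound = random_series_bound(q.size)
print(f"lag-1 autocorrelation: {r1:.2f}, random-series bound: {bound:.3f}")
print("reject randomness" if abs(r1) > bound else "cannot reject randomness")
```

A coefficient outside the bound is the kind of evidence that leads to rejecting the null hypothesis of independence; calling `autocorrelation` for a range of lags gives the autocorrelogram.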

#### Discharge cross-correlation analysis at different time lags

Most of the time, surface runoff is not generated during the rainfall events themselves, as there may be an initial soil moisture deficit that must be satisfied at the beginning of the rainfall events (Wagner *et al.* 2016). Moreover, the soil, land use and land cover conditions of catchments affect runoff generation. There are clear differences among the cross-correlation (cc) coefficients of rainfall and streamflow in the Lake Tana Basin (Figure 10). The cc values show the strongest correlation after a time lag of about one month. Gumara has the highest cc value (0.68) after a time lag of 31 days, while Gilgelabay, Ribb and Megech have their maximum cc values of 0.65, 0.62 and 0.42 after lags of 35, 20 and 32 days, respectively. These positive correlation coefficients between rainfall and lagged streamflow are a signature of an autoregressive model (Osman *et al.* 2017). In contrast, the cc behaviour of the outflow discharge differs from that of the other four rivers due to the large storage capacity of the lake, and its maximum cc value is reached after more than 60 days. The cc coefficients between daily rainfall and streamflow at other time lags are comparatively small, even though all values are statistically significant at the 95% confidence level. The outflow discharge (Abay) from the lake has a different correlation behaviour with rainfall, indicating that its response depends not only on rainfall but also on other inputs such as inflow discharges into the lake. For example, the lake storage has a retarding effect on the peak flow of Abay (Setegn *et al.* 2008). In conclusion, the daily streamflow and catchment rainfall time series are correlated, and the correlation coefficients at time lag k between daily streamflow and rainfall are significantly different from zero. This result agrees with our alternative assumption.
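The lagged cross-correlation described above can be sketched as follows. This is a minimal example that assumes the cc at lag k is the Pearson correlation between rainfall on day t and streamflow on day t + k, computed over the overlapping part of the two series; the exact estimator used in the study may differ. The function names and the synthetic series with a known 5-day delay are hypothetical.

```python
import numpy as np


def cross_correlation(rain, flow, k):
    """Pearson correlation between rain[t] and flow[t + k] for lag k >= 0."""
    rain = np.asarray(rain, dtype=float)
    flow = np.asarray(flow, dtype=float)
    if rain.size != flow.size:
        raise ValueError("series must have the same length")
    n = rain.size
    if not 0 <= k < n - 1:
        raise ValueError("lag k must satisfy 0 <= k < len(series) - 1")
    # Pair each rainfall value with the streamflow observed k days later.
    return float(np.corrcoef(rain[: n - k], flow[k:])[0, 1])


def best_lag(rain, flow, max_lag):
    """Return (lag, cc) for the lag in 0..max_lag with the largest cc."""
    cc = [cross_correlation(rain, flow, k) for k in range(max_lag + 1)]
    k = int(np.argmax(cc))
    return k, cc[k]


# Synthetic example (hypothetical data): streamflow is rainfall delayed
# by 5 days plus a small amount of noise.
rng = np.random.default_rng(1)
rain = rng.gamma(shape=2.0, scale=3.0, size=1000)
true_delay = 5
flow = np.empty_like(rain)
flow[:true_delay] = rain.mean()
flow[true_delay:] = rain[:-true_delay]
flow += rng.normal(scale=0.5, size=rain.size)

lag, cc_max = best_lag(rain, flow, max_lag=30)
print(f"maximum cc = {cc_max:.2f} at a lag of {lag} days")
```

Applied to observed series, the lag returned by `best_lag` corresponds to the delay at which a river's cc value peaks in a plot such as Figure 10.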

### Lake water level analysis

The lake level varies seasonally. Its maximum level is recorded in September, which is one month after the peak rainfall (Figure 11). There is inter-annual variability in the water level (Figure 4(b)). The mean lake water level for the 1980s was 2.47 m; it increased to 2.77 m in the 1990s and declined again to 2.42 m in the 2000s. Moreover, the dry season water level showed a significant decrease over time (*p*-value <0.001, Table 2). In contrast, the wet season water level showed a significant increase (*p*-value <0.01, Table 2). The lake water level showed an abrupt decline during 2002 (Figure 4(b)). This sharp drop resulted from an attempt to maximize electricity production by regulating the lake outflow after the construction of the Chara Chara weir at Tis Abay from the end of 2001 (Setegn *et al.* 2008; Alemayehu *et al.* 2009; Rientjes *et al.* 2011). The extension of crop production by farmers since 2003 onto about 562 ha of the Lake Tana bed, following the lower lake levels, is a good example of the annual impact that kept the lake level from recovering to its level before the 2002 decline (Alemayehu *et al.* 2009; Minale & Rao 2011). This indicates that lower water levels during the dry season will almost certainly result in people moving both cultivation and grazing onto the dried lake bed. Moreover, this would exacerbate adverse impacts on near-shore vegetation and could greatly increase sedimentation in the lake (Alemayehu *et al.* 2009).
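The seasonal trend significances cited above (Table 2) come from Mann–Kendall (MK) tests. The following minimal sketch shows the basic form of the test under simplifying assumptions: no correction for tied values or for serial correlation, and a normal approximation for the two-sided *p*-value. The function name and the example series are hypothetical, and the implementation used in the study may differ.

```python
import math


def mann_kendall(x):
    """Basic Mann-Kendall trend test (no tie or autocorrelation correction).

    Returns (S, Z, p), where p is the two-sided p-value from the normal
    approximation.
    """
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are required")
    # S counts later-minus-earlier sign differences over all pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected standard normal test statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p


# Hypothetical, steadily declining dry season water levels (m).
levels = [2.9, 2.8, 2.75, 2.7, 2.6, 2.55, 2.5, 2.4, 2.3, 2.2]
s, z, p = mann_kendall(levels)
print(f"S = {s}, Z = {z:.2f}, p = {p:.5f}")
```

A negative Z with a small *p*-value indicates a significant downward trend, and a positive Z a significant upward trend.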

Similar to the outflow discharge, the decadal mean of the lake water level changed significantly from the 1980s to the 1990s and from the 1990s to the 2000s. The lake water level rose by about 0.30 m between the 1980s and the 1990s, whereas it dropped by 0.35 m between the 1990s and the 2000s. These changes were consistent with the outflow discharge at the Bahir Dar gauging station. The coefficient of variation increased in all of the three decades, underlining the variability of the Lake Tana level in the stated periods.
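The decadal differences quoted above follow directly from the decadal mean levels reported in this section (2.47 m, 2.77 m and 2.42 m). The short sketch below reproduces that arithmetic and shows how a coefficient of variation could be computed for one decade. The variable names and the annual values passed to `coefficient_of_variation` are hypothetical placeholders, not observed data.

```python
import statistics

# Decadal mean lake water levels (m) as reported in the text.
decadal_mean_level_m = {"1980s": 2.47, "1990s": 2.77, "2000s": 2.42}

# Signed change between consecutive decades (positive = rise).
decades = list(decadal_mean_level_m)
changes = {
    f"{a} to {b}": round(decadal_mean_level_m[b] - decadal_mean_level_m[a], 2)
    for a, b in zip(decades, decades[1:])
}
for period, change in changes.items():
    print(f"{period}: {change:+.2f} m")


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical annual mean levels (m) for one decade, for illustration only.
example_decade = [2.3, 2.5, 2.6, 2.4, 2.7, 2.2, 2.5, 2.6, 2.4, 2.3]
print(f"CV = {coefficient_of_variation(example_decade):.3f}")
```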

## CONCLUSIONS

In this study, rainfall, lake level and streamflow data of several years were analysed. The rainfall pattern in the basin is monomodal with a peak in July or August. The direction of the MK trends varies among the annual, dry season, wet season and daily rainfall and streamflow series, implying that the time period of investigation matters. The change in annual rainfall over time is mostly not statistically significant. About 55% of the rainfall stations showed a positive trend and only three of them (Enjibara and Zege at *α* = 0.05, Gondar at *α* = 0.01) showed significant changes. The remaining 45% of the stations showed insignificant downward trends, except Addis Zemen. However, the seasonal changes were significant for eight stations (Table 1). The summer rainfall changes could be related to SST variation. The maximum lake water level is recorded in September, following the maximum amount of rainfall in the previous months. The annual mean lake water level has shown a decreasing trend from decade to decade and since the year 2002. The long-term annual mean lake water level is about 2.6 m. The autocorrelation coefficients of daily rainfall and streamflow decrease linearly as the time lag increases, but they are significantly different from zero, indicating that the data come from an underlying autoregressive process with moderate to strong positive autocorrelation (Matalas 1967; Modarres & da Silva 2007; Gautam *et al.* 2010). The streamflow values have their maximum cross-correlation coefficient with rainfall after 20–60 days due to the basin lag time, which is governed by the shape of the catchment, the size of the drainage basin, and the soil and vegetation cover.

In general, the Lake Tana Basin is a hydrologically highly dynamic area that shows high variability in daily, monthly and yearly streamflow and in the lake water level. The changes in streamflow and the lake water level are mainly linked to intensive land use changes such as the expansion of intensive agriculture, urbanization and deforestation, as well as to a change in the number of rainfall days and in rainfall intensity. Improvements in agricultural water management could help to increase water use efficiency, which may consequently contribute to mitigating the decreasing trends.

## ACKNOWLEDGEMENTS

We would like to acknowledge funding for a doctoral study grant from the Federal State of Schleswig-Holstein, Germany, through the Landesgraduiertenstipendium of Kiel University. We are also grateful to the Ministry of Water, Irrigation and Electricity and the National Meteorological Service Agency of the Government of Ethiopia for their support by providing hydrological and climate data.