Multivariate infilling of missing daily discharge data on the Niger basin

The Niger basin has experienced historical drought episodes and, in recent times, floods. Reliable hydrological modelling has been hampered by missing values in daily river discharge data. We assessed the potential of Multivariate Imputation by Chained Equations (MICE) to estimate both continuous and discontinuous missing daily data across different spatial scales in the Niger basin. The study was conducted on 22 discharge stations with missing data ranging from 2% to 70%. Four efficiency metrics were used to determine the effectiveness of MICE. The Flow Duration Curves (FDC) of observed and filled data were compared to determine how well MICE captured the discharge patterns. The Mann-Kendall, Modified Mann-Kendall, Pettitt and Sen's slope tests were used to assess discharge trends in the gap-filled data. Results show that MICE filled the missing discharge data near perfectly, with a Nash-Sutcliffe Efficiency (NSE) range of 0.94-0.99 for the calibration period (1992-1994). Good fits were obtained between the FDCs of observed and gap-filled data at all stations considered. All the catchments showed a significantly increasing discharge trend since the 1990s after gap filling. Consequently, the use of MICE is proposed for handling missing data across spatial scales in the Niger basin.


INTRODUCTION
West Africa has been associated with adverse climate change impacts. In past decades, the region experienced a decline in food security due to warming, changing precipitation patterns and a greater frequency of some extreme events (IPCC 2019). Climate change has driven decreased discharge and increased drought in the Sahel since 1970, with observations showing that 1984 was the driest year on record (Biasutti 2019). High-intensity rainfall and flood magnitudes are projected to increase in the coming decades (Sylla et al. 2015; Aich et al. 2016). Increasing trends in annual rainfall-runoff erosivity and soil loss are expected in the 21st century (Amanambu et al. 2019). Water resources are fundamental for several sectors such as hydropower, crop production and fisheries (Roudier et al. 2014). Sylla et al. (2018) reported that an increase in temperature will reduce the potential to sustain large dams and irrigated agriculture in West Africa.
Historical hydrological data in the Niger basin are poorly documented. The number of reliable rainfall stations has decreased since 1980 (Oyerinde et al. 2015), as has the number of discharge stations since 2000 (Schröder et al. 2019). This has been attributed to underfunding of data collection agencies, lack of technical capacity and commitment, inaccessibility of remote gauge stations due to logistical and security challenges, and equipment malfunction (Ekeu-Wei et al. 2018). Radar altimetry measurements were found to help improve observed discharge data, but generating reliable altimetry-discharge rating curves remains unresolved (Schröder et al. 2019). The temporal resolution of radar altimetry measurements, which is set by satellite passing times, also creates challenges when hydrological records contain consecutive gaps (Ekeu-Wei et al. 2018). This has led to insufficient research for development in water resources in West Africa (Vahid et al. 2015), and contrasting future trends have sometimes been reported (Dosio et al. 2020), which makes policy making difficult.
Multiple imputation is now accepted as the best general method for dealing with incomplete hydrological data (Little 1992; van der Heijden et al. 2006; van Buuren & Groothuis-Oudshoorn 2011; Ekeu-Wei et al. 2018; Sidibe et al. 2018). It has been described as the method of choice for complex incomplete data problems (van Buuren & Groothuis-Oudshoorn 2011). Multiple imputation performed better in filling missing data than other methods such as complete case analysis, available case methods, least squares on imputed data, maximum likelihood and Bayesian methods (Little 1992; van der Heijden et al. 2006). Ekeu-Wei et al. (2018) compared the performance of radar altimetry and a multiple imputation technique (MICE) at 5 hydrological stations in the lower part of the Niger basin (Nigeria). They concluded that MICE has potential for ameliorating missing data challenges in the 5 catchments evaluated. The studies highlighted above did not assess the potential of multiple imputation across different scales in all parts of the Niger basin. The river basin covers 10 countries and has different flow regimes in its upper, middle and lower parts (Oyerinde et al. 2017c). Therefore, this study assesses the efficacy of multiple imputation (the MICE algorithm) in filling missing discharge data across different spatial scales over the whole Niger basin. The infilled data were subsequently used to assess complete discharge trends at 22 hydrological stations spread widely across the upper, middle and lower parts of the basin. The objectives of the study are to: (1) evaluate the efficacy of MICE in improving observed discharge data across different spatial scales on the Niger basin; (2) estimate missing discharge data at 22 hydrological stations from 1980 to 2013; and (3) assess the effects of missing discharge data on hydrological trend analysis.

Study area
The Niger River Basin is the largest river basin in West Africa and it sustains livelihoods in most countries of the Sahel. It covers 2.27 million km², with the active drainage area comprising less than 50% of the total basin (Ogilvie et al. 2010). The river has a length of 4,200 km and is the third longest river in Africa. The basin has a population of over 100 million people in ten countries (Algeria, Benin, Burkina Faso, Cameroon, Chad, Cote d'Ivoire, Guinea, Mali, Niger and Nigeria). The source of the Niger river is in the mountains of Guinea, in an area with very high rainfall. Annual precipitation averaged over the whole Niger basin is 690 mm/year, but it varies across countries and regions. In the Sahelian parts, annual precipitation is about 280 mm, while in the Guinean parts it reaches 1,635 mm. The average temperature ranges from 22°C in the south to 27°C in the north. The basin topography ranges from elevations of up to 2,202 meters above sea level (MASL) in the Guinea mountains to 4 MASL at the basin exit into the ocean (Figure 1). Vegetation ranges from evergreen forests in the south to deserts in the north (Aich et al. 2016).
The Niger river flows northeast from its source through the Upper Niger basin and enters the Inner Delta in Mali. During the rainy season, the delta forms a large flood plain of 20,000-30,000 km², facilitating the cultivation of rice, cotton and wheat as well as cattle herding and fishing (KfW 2010). The size of the flooded area is subject to strong annual variations, depending on the discharge of the Upper basin. A large part of the water is lost in the delta to evaporation and seepage. The main tributary of the Niger, the Benue River, flows from the highlands of Cameroon and joins the Niger at Lokoja, Nigeria, before the river reaches the Atlantic Ocean at the Gulf of Guinea (Oguntunde & Abiodun 2013). The Niger has an annual average flow of 1,005.83 m³/s (average 1980-2013) at Koulikoro (Mali) and up to 5,000 m³/s (average 1980-2013) close to the basin exit at Lokoja, Nigeria (Figure 1). The World Bank estimates that 30,000 gigawatt hours could be generated in the Niger River and its tributaries, but only 6,000 gigawatt hours have been developed so far. The Kainji Hydroelectric PLC (Figure 1) generates 22% of total hydroelectricity (KfW 2010). There is potential for increasing hydropower generation in the Niger basin.

Data
Daily river discharge data used in the study were obtained from the Niger River Basin Authority (NBA). We obtained data for 22 stations spread widely across the Niger River Basin, covering 1920 to 2013 (Figure 1). As shown in Table 1 and Figure 2, the 22 hydrological stations were established at different times. Two stations (Koulikoro and Dire) have data records since the 1920s, 8 stations started in the 1950s, 3 stations were established in the 1960s, and 9 stations have records since the 1980s. Visual assessment of intra- and inter-annual data quality shows that most hydrological stations have good data from the 1980s onwards (Figure 2). We therefore chose the period 1980-2013 for this study.

MICE Algorithm
The gap-filled discharge data Y is a partially observed random sample from the p-variate multivariate distribution $P(Y \mid \theta)$. We assume that the multivariate distribution of Y is completely specified by θ, a vector of unknown parameters. The MICE algorithm obtains the posterior distribution of θ by sampling iteratively from conditional distributions of the form

$$
P(Y_1 \mid Y_{-1}, \theta_1), \; \ldots, \; P(Y_p \mid Y_{-p}, \theta_p).
$$

The parameters $\theta_1, \ldots, \theta_p$ are specific to the respective conditional densities and are not necessarily the product of a factorization of the 'true' joint distribution $P(Y \mid \theta)$. Starting from a simple draw from observed marginal distributions, the t-th iteration of chained equations is a Gibbs sampler that successively draws

$$
\begin{aligned}
\theta_1^{*(t)} &\sim P(\theta_1 \mid Y_1^{obs}, Y_2^{(t-1)}, \ldots, Y_p^{(t-1)}) \\
Y_1^{*(t)} &\sim P(Y_1 \mid Y_1^{obs}, Y_2^{(t-1)}, \ldots, Y_p^{(t-1)}, \theta_1^{*(t)}) \\
&\;\;\vdots \\
\theta_p^{*(t)} &\sim P(\theta_p \mid Y_p^{obs}, Y_1^{(t)}, \ldots, Y_{p-1}^{(t)}) \\
Y_p^{*(t)} &\sim P(Y_p \mid Y_p^{obs}, Y_1^{(t)}, \ldots, Y_{p-1}^{(t)}, \theta_p^{*(t)})
\end{aligned}
$$

where $Y_j^{(t)} = (Y_j^{obs}, Y_j^{*(t)})$ is the j-th imputed variable at iteration t. Previous imputations $Y_j^{*(t-1)}$ enter $Y_j^{*(t)}$ only through its relation with other variables, and not directly. Convergence can therefore be quite fast, unlike many other MCMC methods. The name chained equations refers to the fact that the MICE algorithm can be easily implemented as a concatenation of univariate procedures to fill out the missing data (van Buuren & Groothuis-Oudshoorn 2011).

The MICE gap-filling method was calibrated and validated at 3 discharge stations (Koulikoro, Dire and KeMacina). These stations were selected because they have less than 5% missing data for 1980-2013 (Figure 3). The average proportion of missing data across the 22 discharge stations was 27%. The same proportion of missing data was artificially introduced into the 3 discharge stations (Figure 4): 20% as discontinuous, randomly placed gaps throughout the time series, and the remaining 7% as a continuous gap from 1992 to 1994. Calibration was done by comparing observed and imputed discharge during the continuous gap (1992-1994). Observed and gap-filled discharge for the whole time series from 1980 to 2013 were used for validation.
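The chained-equations cycle described above can be illustrated with a minimal sketch. It is not the implementation used in the study: it assumes only two stations, uses a simple linear regression with Gaussian noise as the univariate conditional model, and treats the regression parameters as point estimates, whereas full MICE also draws them from their posterior. The function names `fit_line` and `chained_imputation` are hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = (sum(r * r for r in resid) / max(len(x) - 2, 1)) ** 0.5
    return a, b, sd


def chained_imputation(y1, y2, n_iter=10, seed=0):
    """Impute None entries of two series by cycling univariate regressions.

    Simplification (assumption): parameters are point estimates; full MICE
    would also draw them from their posterior at every iteration.
    """
    rng = random.Random(seed)
    data = [list(y1), list(y2)]
    missing = [[v is None for v in col] for col in data]
    # Step 0: fill each gap with a random draw from that station's observed values.
    for col, miss in zip(data, missing):
        observed = [v for v in col if v is not None]
        for i, m in enumerate(miss):
            if m:
                col[i] = rng.choice(observed)
    # Iterations: re-impute each station conditional on the current other station.
    for _ in range(n_iter):
        for j in (0, 1):
            target, predictor = data[j], data[1 - j]
            obs_idx = [i for i, m in enumerate(missing[j]) if not m]
            a, b, sd = fit_line([predictor[i] for i in obs_idx],
                                [target[i] for i in obs_idx])
            for i, m in enumerate(missing[j]):
                if m:
                    # Stochastic draw: prediction plus residual noise.
                    target[i] = a + b * predictor[i] + rng.gauss(0.0, sd)
    return data
```

Observed values are never overwritten; only positions that were missing are redrawn in each cycle. Repeating the procedure with different seeds would give several completed datasets, which is the "multiple" part of multiple imputation.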
Three efficiency metrics, described below, were used as indicators of agreement between observed and imputed discharge. The 3 metrics have previously been used in the Niger basin (Oyerinde et al. 2016, 2017b; Oyerinde & Diekkrüger 2017; Poméon et al. 2018).
a) Nash-Sutcliffe Efficiency (NSE; −∞ < NSE ≤ 1) (Nash & Sutcliffe 1970) is commonly used to assess the predictive power of river discharge models. It is defined as:

$$
NSE = 1 - \frac{\sum_{i=1}^{n} (O_i - S_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2}
$$

where $O_i$ is the observed value and $S_i$ is the predicted value on day i, and $\bar{O}$ is the mean of the observed values. An efficiency of 1 corresponds to a perfect match between predictions and observations.
b) Kling-Gupta Efficiency (KGE) was developed by Gupta et al. (2009) to provide a diagnostically interesting decomposition of the NSE, which facilitates the analysis of the relative importance of its different components (correlation, bias and variability) in the context of hydrological modelling (Kling et al. 2012):

$$
KGE = 1 - \sqrt{(r - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2}, \qquad \beta = \frac{\mu_s}{\mu_o}, \qquad \gamma = \frac{CV_s}{CV_o}
$$

where r is the correlation coefficient between predicted and observed discharge (dimensionless), β is the bias ratio (dimensionless), γ is the variability ratio (dimensionless), μ is the mean discharge in m³/s, CV is the coefficient of variation (dimensionless), and the subscripts s and o denote predicted and observed values. The KGE has its optimum at unity (Kling et al. 2012).
c) Volumetric Efficiency (VE) was proposed to circumvent some problems associated with the NSE, which is not sensitive to differences in absolute discharge values. It represents the fraction of water delivered at the proper time; its complement represents the fractional volumetric mismatch (Criss & Winston 2008):

$$
VE = 1 - \frac{\sum_{i=1}^{n} |S_i - O_i|}{\sum_{i=1}^{n} O_i}
$$
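As an illustration, the three metrics can be computed as follows. This is a minimal sketch under stated assumptions: the KGE uses the variability ratio based on coefficients of variation (following the description attributed to Kling et al. 2012), population standard deviations are used, and the function names are hypothetical. Inputs are equal-length sequences of observed and imputed discharge with no missing values.

```python
import math
import statistics


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 minus squared error over observed variance."""
    mean_obs = statistics.fmean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    """Kling-Gupta Efficiency from correlation, bias ratio and variability ratio."""
    mu_o, mu_s = statistics.fmean(obs), statistics.fmean(sim)
    sd_o, sd_s = statistics.pstdev(obs), statistics.pstdev(sim)
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (sd_o * sd_s)                 # correlation coefficient
    beta = mu_s / mu_o                      # bias ratio
    gamma = (sd_s / mu_s) / (sd_o / mu_o)   # ratio of coefficients of variation
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


def ve(obs, sim):
    """Volumetric Efficiency: 1 minus absolute volume mismatch over observed volume."""
    return 1.0 - sum(abs(s - o) for o, s in zip(obs, sim)) / sum(obs)
```

A series that always predicts the observed mean gives NSE = 0, which is why NSE values close to 1 indicate that the imputation adds information beyond the long-term mean.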
Flow duration curves (FDC)
To evaluate the performance of the gap-filling method, the FDCs of observed and filled data were compared at a daily time step. FDCs have been widely applied in hydrological studies (Kling et al. 2012; Onyutha & Willems 2013; Abadzadesahraei & Sui 2016; Burgan & Aksoy 2020). An FDC represents the relationship between the magnitude and frequency of streamflow (daily in this article) for a particular river basin. It provides an estimate of the percentage of time a given streamflow was equaled or exceeded over a historical period. The FDC was used to provide a simple, yet comprehensive, graphical view of the overall historical variability of streamflow in a river basin. It is the complement of the cumulative distribution function (cdf) of daily streamflow. Each value of discharge Q has a corresponding exceedance probability p, and an FDC is simply a plot of the p-th quantile or percentile of daily streamflow versus exceedance probability p, where p is defined by:

$$
p = 1 - P(Q \le q) = 1 - F_Q(q)
$$

Trend analysis

Mann-Kendall test (MK)
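To test for increasing or decreasing trends in river discharge, the Mann-Kendall test was employed. The Mann-Kendall test evaluates the relative magnitudes of the data and is widely used in hydrology (Verstraeten et al. 2006; Meusburger et al. 2012; N'Tcha M'Po et al. 2017; Oyerinde et al. 2017a). The advantage of this test is that the data need not comply with any particular distribution. The Mann-Kendall statistic (S) was computed as:

$$
S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \operatorname{sgn}(x_j - x_k), \qquad
\operatorname{sgn}(x) =
\begin{cases}
+1 & x > 0 \\
0 & x = 0 \\
-1 & x < 0
\end{cases}
$$

where $x_j$ and $x_k$ are the annual values in years j and k, respectively (j > k), and n is the number of years.
For an independent data sample without tied values, the mean and variance of S are given by:

$$
E[S] = 0, \qquad Var(S) = \frac{n(n-1)(2n+5)}{18}
$$

If tied values are present in the sample, Var(S) is computed by:

$$
Var(S) = \frac{n(n-1)(2n+5) - \sum_{i=1}^{m} t_i (t_i - 1)(2 t_i + 5)}{18}
$$

where m is the number of tied groups and $t_i$ is the number of values in the i-th tied group. Then, the MK test statistic Z for all cases where n is larger than 10 is given by:

$$
Z =
\begin{cases}
\dfrac{S - 1}{\sqrt{Var(S)}} & S > 0 \\
0 & S = 0 \\
\dfrac{S + 1}{\sqrt{Var(S)}} & S < 0
\end{cases}
$$

The result was classified as 'no trend' when the change was not significant, and as 'an increasing trend' or 'a decreasing trend' when S was positive or negative, respectively. Similarly, Z > 0 indicates an increasing trend and Z < 0 a decreasing trend.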
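The computation of S, Var(S) and Z can be sketched as below. This is a minimal illustration of the standard Mann-Kendall formulas with the tie correction and the continuity-corrected Z, not the code used in the study. The function name `mann_kendall` is hypothetical.

```python
import math
from collections import Counter


def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mann_kendall(x):
    """Return (S, Var(S), Z) for a sequence of annual values x."""
    n = len(x)
    s = sum(sign(x[j] - x[k]) for k in range(n - 1) for j in range(k + 1, n))
    # Tie correction: groups of size 1 contribute zero to the sum.
    tie_sizes = Counter(x).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, var_s, z
```

Without ties, Var(S) depends only on the sample size. For n = 34 annual values (1980-2013) the formula gives 34 × 33 × 73 / 18 ≈ 4550.33, regardless of the data.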

Modified Mann-Kendall test by variance correction (MMK)
Yue & Wang (2004) demonstrated that serial correlation in a time series alters the variance of the MK statistic, and they used a modification of the variance to limit the effect of serial correlation. In addition, Chen et al. (2016) recommended the combined use of the MK and MMK tests when there are autocorrelations above lag 1. For these reasons, we assessed and compared the autocorrelation of observed and gap-filled discharge data. The MK statistics of the gap-filled data before and after variance correction by MMK were also compared. The modified variance V*(S) for computing the MK statistic Z is calculated as:

$$
V^{*}(S) = Var(S) \cdot \frac{n}{n^{*}}
$$

where n is the actual sample size, n* is the effective sample size, and n/n* is termed the correction factor.
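The text does not print the expression for the correction factor. The sketch below assumes one commonly used form, n/n* = 1 + 2 Σ (1 − k/n) ρ_k, where ρ_k is the lag-k sample autocorrelation. The number of lags included differs between implementations, so `max_lag` is an explicit, hypothetical parameter here, and detrending the series before estimating ρ_k (often done in practice) is omitted for brevity.

```python
import statistics


def autocorr(x, lag):
    """Lag-k sample autocorrelation of the sequence x."""
    n = len(x)
    m = statistics.fmean(x)
    den = sum((v - m) ** 2 for v in x)
    num = sum((x[i] - m) * (x[i + lag] - m) for i in range(n - lag))
    return num / den


def correction_factor(x, max_lag=1):
    """Assumed form of n/n*: 1 + 2 * sum_k (1 - k/n) * rho_k for k = 1..max_lag."""
    n = len(x)
    return 1.0 + 2.0 * sum((1 - k / n) * autocorr(x, k)
                           for k in range(1, max_lag + 1))


def modified_variance(var_s, x, max_lag=1):
    """Scale the ordinary MK variance by the correction factor n/n*."""
    return var_s * correction_factor(x, max_lag)
```

Under this form, positive serial correlation gives a factor above 1 and inflates the variance, which lowers |Z|, while negative serial correlation does the opposite.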

Sen's slope
The magnitude of the trend in each time series was evaluated with a simple non-parametric procedure developed by Sen (Sen 1968; Ali et al. 2019). The trend is calculated by:

$$
\beta = \operatorname{median}\left( \frac{x_j - x_k}{j - k} \right), \qquad j > k
$$

where β is Sen's slope estimate. β > 0 indicates an upward trend in a time series; otherwise the data series presents a downward trend during the time period (Ali et al. 2019).
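A minimal sketch of the estimator, assuming equally spaced annual values so that the time difference is simply j − k (the function name `sens_slope` is hypothetical):

```python
import statistics


def sens_slope(x):
    """Median of all pairwise slopes (x[j] - x[k]) / (j - k) with j > k."""
    n = len(x)
    slopes = [(x[j] - x[k]) / (j - k)
              for k in range(n - 1) for j in range(k + 1, n)]
    return statistics.median(slopes)
```

Because the estimate is a median, a single extreme year has little influence on it. For example, the series 1, 2, 3, 100, 5 still yields a slope of 1 per time step.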

Pettitt test
To evaluate the difference between the cumulative distribution functions before and after a time instant (K), the Pettitt test was applied. The Pettitt test detects any significant change in the mean of a time series (Kliment et al. 2011). The non-parametric Pettitt rank test was reported to handle outliers well (Pettitt 1979; Verstraeten et al. 2006) and has previously been used in hydrological studies in the region (Nka et al. 2015; Oyerinde et al. 2017a). The significance of the analyzed trends was tested at a probability level of p = 0.05, corresponding to 95% confidence. The test evaluates the null hypothesis $H_0$ that the T variables follow one or more distributions with the same location parameter (no change), against the alternative that a change point exists. The non-parametric statistic is defined as:

$$
K_T = \max_{1 \le t < T} |U_{t,T}|
$$

where:

$$
U_{t,T} = \sum_{i=1}^{t} \sum_{j=t+1}^{T} \operatorname{sgn}(X_i - X_j)
$$

The change point of the series is located at $K_T$, provided that the statistic is significant.
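The statistic can be sketched as follows. The significance is approximated here with the commonly used expression p ≈ 2 exp(−6 K_T² / (T³ + T²)), which is an addition for illustration and is not printed in the text above. The function name `pettitt` is hypothetical.

```python
import math


def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pettitt(x):
    """Return (t, K_T, p): split position, test statistic and approximate p-value.

    t is the number of values before the change point, so x[:t] and x[t:]
    are the two sub-series.
    """
    T = len(x)
    best_t, k_T = 0, 0
    for t in range(1, T):
        u = sum(sign(x[i] - x[j]) for i in range(t) for j in range(t, T))
        if abs(u) > k_T:
            best_t, k_T = t, abs(u)
    # Approximate two-sided significance, capped at 1.
    p = min(1.0, 2.0 * math.exp(-6.0 * k_T ** 2 / (T ** 3 + T ** 2)))
    return best_t, k_T, p
```

For a series of ten values of 1 followed by ten values of 5, the function places the change after the tenth value with K_T = 100 and a p-value well below 0.05.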

Calibration and validation of MICE gap filling method
The MICE gap-filling method was evaluated at three discharge stations (Koulikoro, KeMacina and Dire) which have less than 5% missing data (Figure 3). Efficiency metrics between imputed and observed data were high both for the continuous missing years (1992-1994) and for the randomly created gaps across the whole time series from 1980 to 2013 (Table 2). Visual assessment of Figure 5 shows that the MICE gap-filling method reproduced the observed seasonality of flow well at the 3 discharge stations. The observed low and high flows were well reproduced in the gap-filled discharge data.

Flow duration curves (FDC)
FDCs were plotted for all 22 hydrological stations to show the fit between the gap-filled and observed discharge data (Figures 6-8). The FDCs of imputed and observed data indicated a nearly perfect fit at all stations. Figure 6 presents FDCs that show a good fit between observed and imputed data at selected stations on the main Niger river: in the upper part at Banakoro (11% missing data), in the Inland Delta at Mopti (7% missing data), in the middle part at Malanville (34% missing data) and in the lower part at Lokoja (19% missing data). In Figure 7, there is an excellent fit between the FDCs of imputed and observed discharge data at selected headwater catchments: Couberi (30% missing data), Makurdi (35% missing data), Douna (30% missing data) and Kakassi (45% missing data). We assessed the FDCs of the four hydrological stations with the highest percentages of missing data, as shown in Figure 8 (Kompongou (72% missing data), Garbey Kourou (50% missing data), Baro (46% missing data) and Alkongui (63% missing data)). The observed flow pattern was adequately reproduced by the imputed data, although there was slight overestimation at Baro and slight underestimation at Kompongou, Garbey Kourou and Alkongui.
Figure 9 shows the autocorrelation plots for filled and observed discharge data of the 22 catchments using lags 1 to 10. Observed discharge data have very high autocorrelation values, ranging from 0.67 to 1.00 across the 22 discharge stations. The gap-filled data, however, have lower and more diverse autocorrelation values (−0.2 to 0.77). We compared the MK test statistics before and after filling the missing data (Table 3). Missing data decreased the magnitude of the S and Z statistics because of the reduced sample size, and this problem was corrected in the MK test on gap-filled data.
Sen's slope p-values indicate that six discharge stations with missing data ranging from 11% to 70% (Alkongui (63%), Banakoro (11%), Couberi (30%), Douna (30%), Kompongou (70%) and Makurdi (35%)) showed no significant trend before gap filling. However, all six catchments showed significant increases after gap filling. The significance levels and magnitudes of Sen's slope improved across all 22 catchments after gap filling.

Discharge trends
We compared results obtained from the MK and MMK tests on gap-filled data of the 22 catchments (Table 4). MMK substantially improved the Z statistic and the variance. The MK test has a problem with variance computation due to autocorrelation after gap filling (Figure 9). MK computed a constant variance of 4550.33 across the 22 stations, which is the value that n(n − 1)(2n + 5)/18 takes for n = 34 annual values (1980-2013), whereas MMK was able to correct the problem.
A rapidly increasing trend line is evident in Figure 10. The Pettitt tests (Table 5) show that all 22 catchments experienced significantly increasing river discharge, with break points in 1993 at 14 catchments, in 1994 at 4 catchments, in 1997 at 3 catchments and in 1987 at 1 catchment. Figure 11 presents a spatial map of Sen's slope at the 22 stations on the Niger basin after gap filling. Most (15) of the discharge stations showed significantly increasing discharge at the 0.001 significance level. Five discharge stations have been increasing since the 1980s at the 0.01 level, while 2 stations increased at the 0.05 level.

DISCUSSION
The study shows that missing data is a major challenge in the Niger basin. Twenty-one discharge stations have missing values and incomplete records. This is due to the previously discussed decline in the number of rainfall and discharge stations. Studies such as Badou et al. (2016) have had to exclude some important locations where inadequate data are recorded. This leads to the loss of important information that would be useful for sustainable river basin management. The MICE gap-filling method shows promising results at 22 river discharge stations spread widely across the Niger basin. Gap-filled discharge data have high efficiency metrics when compared with observed data for both continuous and discontinuous gaps in the discharge time series. These findings agree with previous studies in which MICE was evaluated for filling missing river discharge data (Little 1992; Ekeu-Wei et al. 2018; Sidibe et al. 2018). Little (1992) reported that regression-based approaches to filling missing data do not account for errors in the imputations, which leads to standard errors that are too small. MICE overcomes this challenge by drawing values from the predictive distribution and then repeating complete-data analyses. Ekeu-Wei et al. (2018) found that MICE outperformed radar altimetry when filling continuous missing data and that MICE-filled data closely resemble natural floods. Sidibe et al. (2018) reported that MICE estimates missing data better than random forest at discharge stations with high amounts of missing data in West and Central Africa. The use of hydrological models to simulate missing discharge data has been promising, but the poor observed meteorological data available for model calibration and validation have made setting up such models difficult (Poméon et al. 2017).
We evaluated the performance of MICE at all 22 catchments by comparing the FDCs of imputed data with the FDCs of observed data. There were near-perfect fits at all 22 stations evaluated on the river basin, and MICE captured both low and high flows. This will enable MICE-enhanced hydrological records to be used in extreme hydrological applications (Ekeu-Wei et al. 2018). The FDCs of stations with high amounts of missing data captured the observed river discharge patterns with slight under- and overestimation. This shows that MICE should be applied with caution at stations where more than 70% of daily data are missing. Barnes et al. (2006) found that a small sample size constrains the generalization potential of MICE, resulting in uncertain missing data estimates.
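The FDC comparison referred to here can be sketched as follows. The sketch assumes the Weibull plotting position i/(n + 1) for the exceedance probability, which is one common convention and not necessarily the one used in the study. The function names are hypothetical, and missing values are represented by `None`.

```python
def flow_duration_curve(q):
    """Return (exceedance_probability, discharge) pairs, largest flow first.

    Missing values (None) are skipped. Probabilities use the Weibull
    plotting position rank / (n + 1), an assumption of this sketch.
    """
    ranked = sorted((v for v in q if v is not None), reverse=True)
    n = len(ranked)
    return [(rank / (n + 1), v) for rank, v in enumerate(ranked, start=1)]


def discharge_at(fdc, p):
    """Discharge at the plotted point whose exceedance probability is nearest p."""
    return min(fdc, key=lambda pair: abs(pair[0] - p))[1]
```

Comparing `discharge_at` for the observed and gap-filled curves at low exceedance probabilities (high flows) and high exceedance probabilities (low flows) gives a simple numerical counterpart to the visual comparison of the two curves.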
Gap filling of missing discharge data with MICE decreased autocorrelation and significantly improved the MK statistics and Sen's slope. The MK test has been reported to perform worse in the presence of autocorrelation (Chen et al. 2016). Yue & Wang (2004) demonstrated that negative serial correlation decreases the possibility of rejecting the null hypothesis of no trend. We compared the performance of the MK and MMK statistics and found that MK had difficulty with variance computation on the Niger basin. This happens because negative serial correlation reduces the variance of the MK statistic, and hence fewer samples fall in the critical regions (Yue & Wang 2004).
All 22 discharge stations evaluated on the Niger basin show a significantly increasing trend since the 1980s, with a break point in the 1990s for most of the stations. This result corroborates the findings of Amogu et al. (2010), who attributed the increases in discharge in Sahel catchments to land use changes. Descroix et al. (2018) also attributed the increases in discharge from the 1990s in the Sahel to recovery from the great drought of West Africa. The authors also explained that increasing discharge is due to a lower soil infiltration rate compared to the pre-drought era (Bichet & Diedhiou 2018; Descroix et al. 2018). Some authors found a direct relationship among the trends in discharge, floods and some extreme rainfall indices (Nka et al. 2015; Adeyeri et al. 2019). They identified rainfall as a major factor aggravating the increase in runoff coefficients in the Sahelian region (Nka et al. 2015).

CONCLUSIONS
The challenge of missing discharge data has hindered reliable flow prediction and forecasting in West Africa, and thereby sustainable river basin management. This study assessed the benefits of the MICE gap-filling method in ameliorating the challenges posed by high amounts of missing discharge data on the Niger basin. We evaluated the percentage of missing data at 22 discharge stations. The MICE gap-filling method was assessed and used to fill data gaps at the discharge stations. The basin has a high percentage of missing data across different stations, which has significant impacts on trend analysis. All the discharge stations show high negative autocorrelations before gap filling. Comparison of the autocorrelations of the observed and gap-filled data reveals a gradual reduction in the degree of autocorrelation after gap filling. The performance of the MK and MMK statistics was compared on the gap-filled data, and the results showed very poor performance of the MK compared to the MMK due to autocorrelation. Significantly increasing discharge trends were observed at all 22 discharge stations.