Forecast of stream ﬂ ows to the Arctic Ocean by a Bayesian neural network model with snowcover and climate inputs

Increasing water ﬂ owing into the Arctic Ocean affects oceanic freshwater balance, which may lead to the thermohaline circulation collapse and unpredictable climatic conditions if freshwater inputs continue to increase. Despite the crucial role of ocean in ﬂ ow in the climate system, less is known about its predictability, variability, and connectivity to cryospheric and climatic patterns on different time scales. In this study, multi-scale variation modes were decomposed from observed daily and monthly snowcover and river ﬂ ows to improve the predictability of Arctic Ocean in ﬂ ows from the Mackenzie River Basin in Canada. Two multi-linear regression and Bayesian neural network models were used with different combinations of remotely sensed snowcover, in-situ in ﬂ ow observations, and climatic teleconnection patterns as predictors. The results showed that daily and monthly ocean in ﬂ ows are associated positively with decadal snowcover ﬂ uctuations and negatively with interannual snowcover ﬂ uctuations. Interannual snowcover and antecedent ﬂ ow oscillations have a more important role in describing the variability of ocean in ﬂ ows than seasonal snowmelt and large-scale climatic teleconnection. Both models forecasted in ﬂ ows seven months in advance with a Nash – Sutcliffe ef ﬁ ciency score of ≈ 0.8. The proposed methodology can be used to assess the variability of the freshwater input to northern oceans, affecting thermohaline and atmospheric circulations. land, are used to estimate the fraction of SCE for the entire basin. The SCE fraction can be greater than 90% in October and less than 10% in July in the study area. A threshold of 50% SCE fraction is used to separate snow-covered and non-snow-covered grids. Those grids that have SCE fractions greater than 50% are used to estimate the total fraction of the basin covered by snow.


INTRODUCTION
The thermohaline circulation is a key element of the global The freshwater input and its transport from the Arctic Ocean to the North Atlantic have high interannual variability (Serreze et al. ), which has a large effect on regional and global climate variability (Proshutinsky et al. ).
Freshwater discharge from large snow-dominated rivers varies with weather patterns, seasonality of snowcover extent (SCE), and large-scale climatic teleconnections such as the Arctic Oscillation (AO) and Antarctic Oscillation (AAO) (Wang et al. ; Rasouli ). Large-scale atmospheric circulations like El-Niño and La-Niña (Cohen & Entekhabi ) affect SCE. Studies show that there is a strong correlation between SCE and winter Pacific-North American (PNA) and North Atlantic Oscillation (NAO) teleconnections in North America and Canada (Gutzler & Rosen ; Brown & Goodison ). A strong link between NAO and SCE was also found in Europe and central Asia (Bojariu & Gimeno ). For instance, the unusually cold conditions over the last decade of the 20th century near Greenland and the eastern Mediterranean region and the anomalous warmness over northern Europe, Russia, and much of Central Asia are related to NAO (Hurrell ). The link between SCE and large-scale atmospheric circulations has been intensified in the last decades as winter and early spring temperatures have increased in western North America (Cohen & Entekhabi ; Folland et al. ) and the Arctic (Eythorsson et al. ).
Changes in SCE affect local-scale ground heat fluxes, snow redistribution by blowing wind (Pomeroy & Gray ; Liston ), soil moisture, and soil thermal regime (Park et al. ). Changes in the soil thermal regime alter thawing and freezing mechanisms, which affect water storage in soils and runoff patterns in cold regions and, hence, the rate of discharge to the Arctic Ocean (Mackay & MacKay ; Bao et al. ). An increase in ground temperatures, especially in northern regions, has increased the global temperature in response to anomalies in the atmospheric circulations (Coulibaly & Burn ). Snow recedes quickly, and snowmelt is intensified under increased ground temperatures, which can affect infiltration and streamflow, especially when rain-on-snow events occur. Burn et al. () showed an increasing trend in the annual minimum flow and a decreasing trend in the date of occurrence of the spring freshet in two main subbasins of MRB attributed to changes in snow processes under climate changes. In a relatively mild climate in the southern parts of MRB, the annual streamflow of the rivers feeding Lake Athabasca showed decreasing trends over the last half-century (Rasouli et al. , a). As circumpolar drainage rivers are the main contribution to the freshwater in the northern latitudes, it is crucial to study the predictability and variability of discharges from these rivers under spatial and temporal variations of climate and snowcover extent. To our knowledge, there is no previous research on forecasting the Arctic Ocean inflows from the land-rivers incorporating snowcover and climate variations. The contribution of this study is to fill this knowledge gap by providing short-term and long-term forecasts and identifying the role of snow, climate, and streamflow memory of the northern rivers on decadal to sub-seasonal time scale in the variability of the freshwater input to the Arctic Ocean. The objectives of this research are to (1) identify the linkages of Arctic Ocean inflow with SCE, antecedent inflows and climate teleconnection patterns, and (2) improve the predictability of ocean inflows from the Mackenzie River at multiple temporal scales. More specifically, we differentiate the roles of antecedent river flow conditions, climatic teleconnection patterns, and multi-decadal to monthly variations of snowcover in forecasting inflows of the Arctic Ocean one day to seven months in advance. Linking regional-scale SCE and large-scale climate variations to ocean inflows from the large river basins has important implications in predicting the freshwater input to the Arctic Ocean, estimating rates of freshwater transport to the North Atlantic, and better understanding climate variability and predictability in the near future in regional and global scales.

STUDY AREA AND DATA SOURCES
The case study in this paper is MRB in western Canada, which is fed by Great Slave Lake, Great Bear Lake, and Liard and Peace rivers, and it releases water at a rate of 9,910 m 3 s À1 into the Beaufort Sea in the Arctic Ocean. The basin drainage area is 1,679,100 km 2 . The Mackenzie River yields the largest North American source of freshwater discharge into the Arctic Ocean. Figure 1 shows MRB, its subbasins, and drainage systems. There is a delta near the Beaufort Sea at the mouth of the basin. In this paper, combinations of data for  land, are used to estimate the fraction of SCE for the entire basin. The SCE fraction can be greater than 90% in October and less than 10% in July in the study area. A threshold of 50% SCE fraction is used to separate snow-covered and non-snow-covered grids. Those grids that have SCE fractions greater than 50% are used to estimate the total fraction of the basin covered by snow.

METHODOLOGY
The role of climatic teleconnection patterns and fluctuations of snowcover and streamflows on decadal to sub-seasonal time scale in the predictability of ocean inflows was assessed. The performance of two linear and nonlinear statistical models in representing the interrelationship between snowcover variation modes and inflows from a large-scale watershed was evaluated. For this purpose, daily SCE products with 5 km spatial resolutions obtained from optical remote sensing satellites by Japan Aerospace Exploration Agency were processed and applied to predict Arctic Ocean inflows from the MRB in Canada. To extract spacetime patterns of SCE and inflow, singular spectrum analysis (SSA) was conducted. Multi-linear regression and nonlinear Bayesian neural networks were applied to forecast the Arctic Ocean inflows. Daily and monthly snowcover and inflow observations, obtained respectively from remotely sensing products and a gauge at the outlet of the MRB, were used to study spectral information and extract low frequency (decadal) and high frequency (monthly) modes of oscillation in snowcover and the ocean inflow. Four temporal variation modes of SCE and four temporal variation modes of inflows, covering a wide range of variability from multi-decades to months, were obtained and regressed against inflows from the outlet of the basin to forecast inflows one day to seven months in advance.

Linear and nonlinear forecasting methods
To identify the relevant hydro-climatic predictors of the Arctic Ocean inflows from the MRB, two forecasting methods were used at lead times of one to seven days and one to seven months. These models are the multi-linear regression (MLR) and the Bayesian neural network ((BNN) MacKay ). The reasons for choosing these two linear and nonlinear models in this study are that we can compare the performance of both linear and nonlinear methods in large-scale streamflow forecasts, and BNN has shown a better performance than other machine learning methods in streamflow forecasting (e.g., Rasouli et al. ). Other nonlinear regression models can also be applied, including generalized additive models with smoothing link functions. A multiple model intercomparison, however, was beyond the scope of this study. The information extracted from SCE, climatic teleconnection patterns, and antecedent flows were used to forecast inflows of the Arctic Ocean from MRB. Let x ¼ (x 1 , . . . , x m ) be the vector of m predictor variables in time t and y is the forecast variable of interest in time t þ p, with p ∈ {1, . . . , 7}. Given a dataset of (y tþp , x t1 , Á Á Á , x tm ) n t¼1 with n sample size and m predictors, the relationship between y tþp and x t is established by the MLR model assuming a linear relationship and through an error variable, ε t . The error variable follows a normal distribution with mean 0 and variance σ 2 . Hence, the MLR model can be written as: where β j and σ 2 are the parameters that are estimated. After estimating these parameters, a stepwise method was used to select the best model with the highest parsimony. For the nonlinear model, we used the Bayesian neural network which is described as a nonlinear parameterized mapping from an input, x to an output, y through a nonlinear function g(x t ; w, b, β). In fact, the relationship between the variable of interest and the predictors are given by: where ε t ∼ N(0, σ 2 e ) and σ 2 e is the variance of the error terms. s is the number of neurons, w k is the weight of the k-th neuron, b k is a bias for the k-th neuron, β k j is the weight of the j-th input, and g k is the activation function. In this study, a singular spectrum analysis on SCE and ocean inflows was conducted to remove or minimize the noises and extract short-term and long-term variability modes of SCE and ocean inflows from MRB. We then used multilinear regression and Bayesian neural network models to determine the major contributors to the variability of the Arctic Ocean freshwater using combined remotely sensed and in-situ observations.

The predictors of Arctic ocean inflow
In this study, the forecasted variable of interest is ocean inflow with lead times of one day to seven months, and the predic- (), however, the SSA method showed the highest performance. Therefore, the SSA method was used in this study. The basic mechanism of SSA is to decompose time series into a sum of interpretable components such as trend, periodic components, and noise. It is based on the principal component analysis of the auto-covariance matrix for lags ∈ {1, . . . , L}.
According to seasonality shown in Figure 2, for the analysis of daily data, we chose L ¼ 365 and for monthly data we chose L ¼ 12. The SSA algorithm can be divided into two steps: decomposition and reconstruction. In the first step, the one-dimensional time series were decomposed into multidimensional series by adding lagged variables and then creating a so-called trajectory matrix, which is used to calculate the principal components. In the second step, similar components were regrouped, and disjoint subsets were created to form the final signal decomposition. For more information about SSA, refer to Hsieh (). Main SSA components of SCE t and Q t instead of the original series were used as inputs to the MLR model as in Equation (1) and the BNN model as in Equation (2).

The model training and validating strategy
The parameters of the MLR and weights and biases of the Bayesian network were adjusted so as to minimize the cost/loss/error function. The method of least squares was used to train the coefficients of the MLR model (Equation (1) In this paper, we applied the hyperbolic tangent function that is used in the brnn R package, developed by Rodriguez & Gianola (), and it is more suitable for streamflow forecasts. This function is given by was used to estimate the unknown parameters of the BNN model (Equation (2)) using a CRAN package developed by Rodriguez & Gianola (). The leave-one-out cross-validation method (Vehtari et al. ) was applied to the entire study period to train and evaluate the MLR and BNN models to take advantage of all the data and increase the model evaluation period, especially for forecasts with monthly time steps. The leave-one-out cross-validation method includes these steps: first, the entire data period was divided into multiple chunks. Then, one chunk of the entire data was left out at a time and used to validate the model, while the rest was used for training. The validation chunk was selected each time so that all of the divided chunks were once used for validating the model. Four different metrics were used to assess the forecasting performance of the daily and the monthly inflows using the linear and nonlinear models: the bias (BIAS), the root mean squared error (RMSE), the normalized root mean squared error (NRMSE), and the Nash-Sutcliffe efficiency (NSE).

RESULTS
The linkage between snowcover extent and the Arctic

Ocean inflows
The remotely sensed daily SCE products were processed and long-term time series of SCE fractions were obtained for MRB with a large drainage area. Hydrometric data were used from the records at the Arctic Red River station, the outlet of MRB, to estimate the basin discharge. The SCE fractions range between 0 and 100, where 0 denotes no snow, and 100 shows complete snow coverage of the basin. Figure 3 is very small. SCE gradually increases starting in August and reaches its annual peak in late October. It gradually decreases after February and reaches its minimum in July (Figure 3(a)). The Arctic Ocean inflow from the basin is very low and almost constant in winter (October through January). It increases gradually after February and reaches its peak in July (Figure 3(b)). The interannual variations of both SCE and inflows are very high. Unlike low interannual variability of inflows in winter, flows are highly variable in spring and summer (Figure 3(b)).
The high variability of SCE in spring is partly due to rapid changes in weather systems and rain-on-snow events.
Inflows also respond to convective storms in summer months, which can cause a rise in summer flows. Four modes of inflow obtained from SSA were also chosen as predictors for inflow predictions at short-term (one-to seven-days) and long-term (one-to seven-months) lead times. Figure 5(b)-5(e) illustrate the modes of the inflow variation, ranging from decadal to monthly. Figure 5 (Table 1). Despite a low cross-correlation between climate teleconnection patterns (e.g., AO and NAO) as predictors and inflows on January 1 as predictands (Figure 6), all of the teleconnections were selected as predictors of inflows at lead times of one to seven days and one month when they were used in the pool of linear and nonlinear models inputs ( Table 1). The climatic teleconnection patterns, however, showed less linkage to monthly inflows at forecasting lead times of two to seven months. All of the decadal, interannual, seasonal, and monthly variation modes of SCE and inflow (Table 1) were selected as predictors of daily inflows, while interannual, seasonal, and monthly (and not decadal) variation modes of SCE were selected as predictors of monthly inflows (Table 1). Strong correlations between the predictors (e.g., decadal variation modes of SCE and inflow in Figure 6) can cause multicollinearity in regression analysis (Farrar & Glauber ). A forward and backward stepwise regression method was used to select the relevant predictors for both MLR and BNN models based on the F-test. For daily forecasts of the Arctic Ocean inflow, the correlation coefficients between the predictors (Figure 6(a)) were not statistically significant, and all of the SCE and streamflow variation modes were selected based on the stepwise regression method and the F-test (Table 1). For monthly forecasts, however, the correlation coefficients were significant for some variables, and therefore, the correlated predictors were discarded (Table 1). Biases between forecasted and observed daily inflows are below 9%, and NSE is above 0.45 for both linear and non-linear models used in this study (Figure 8). With only SCE variation modes as daily inflow predictors, all ranges of ocean inflow were predicted reasonably well, except for high flows (Figure 7). Over-estimation of high inflows forecasted by the BNN model degraded the performance scores ( Figure 8) when the SCE variation modes were used as predictors for lead times of 2-5 days (Figure 7).
The prediction skill scores, however, improved when inflow variation modes were also used as predictors. For instance, NSE for the one-day lead time forecasts increased from 0.45, when only SCE variation modes were used as predictor, to 0.95, when both SCE and inflow variation modes were selected as predictors (Figure 8). The NRMSE score also improved from 75% to 65%. The prediction skill scores substantially improved when climate teleconnections were used as predictors. For instance, NSE for the one-day lead time forecasts increased from 0.45, when only SCE variation modes were used as predictors, to 0.61 when both SCE variation modes and climate teleconnections were used as predictors (Figure 8). The NRMSE score also improved from 75% to 23%. Results showed that the linear model was not capable of improving forecast skill scores when climatic teleconnections were selected as predictors. This is shown in Figure 8 with similar values of the model performance scores for the cases with only SCE variation modes as predictors and both SCE variation modes and climatic teleconnections as predictors. The linear and nonlinear models performed well in forecasting monthly inflows with a different combination of SCE and inflow variation modes, and climatic teleconnections as predictors. In monthly forecasting, the BNN model outperformed the MLR model for all of the four combinations of predictors ( Figure 9). Similar to daily forecasts of inflow, monthly forecasts improved when variation modes of inflow were used as predictors. The forecast skill score of NSE increased gradually from the lead time of one month to five months and degraded after a five-month lead time. It is encouraging that monthly inflows from MRB can be forecasted with high confidence (NSE > 0:80, NRMSE < 30%) seven months in advance ( Figure 10). This is partly because of a seven-month lag time between peak SCE and peak inflow (Figure 3(c)), which was included in the predictor set using monthly variation modes of SEC and inflow. The BNN model performed well when the climatic teleconnections were used as predictors in addition to the SCE variation modes to forecast one month in advance ( Figure 10).
Including the climatic teleconnections in the predictor pool did not improve the forecasts for lead times beyond one month. Differentiating the role of SCE and inflow variation modes, and climatic teleconnections in forecasting both daily and monthly inflows revealed that the multi-decadal and interannual variations in SCE and inflow had a primary role and the climatic teleconnection patterns had a secondary role in the predictability of Arctic Ocean inflows from MRB.

DISCUSSION
The Arctic Ocean inflow from MRB is highly correlated to SCE, and there is a seven-month delay in flow response to snowmelt and SCE recession (Figures 3(c) and 6), suggesting that SCE can be a good proxy to forecast seasonal inflows. In general, there is a negative correlation between SCE and inflow (Zhou et al. ). In this study, however, we found that the decadal variation of SCE fluctuations ( Figure 4(b)), obtained from the singular spectrum analysis, surprisingly, is not negatively correlated to inflows ( Figure 6).