Time series data such as monthly stream flows can be modelled using time series methods and then used to simulate or forecast flows for short term planning. Two methods of time series modelling were reviewed and compared: the well-known auto regressive moving average (ARMA) method and the state-space time-series (SSTS) method. ARMA has been used in hydrology to model and simulate flows with good results and is widely accepted for this purpose. SSTS modelling is a more recently developed method that is relatively unused for hydrologic modelling. This paper focuses on modelling the stream flows from basins of different sizes using these two time series modelling methods and comparing the results. Three rivers in Labrador and South-East Quebec were modelled: the Romaine, Ugjoktok and Alexis Rivers. Both models were compared for accuracy of prediction, ease of software use and simplicity of model to determine the preferred time series methodology approach for modelling these rivers. The SSTS was considered very easy to use but model diagnostics were found to require a high level of statistical understanding. Ultimately, the ARMA method was determined to be the better method for the typical engineer to use, considering the diagnostics were simple and the monthly flows could be easily simulated to verify results.

INTRODUCTION

Developing or managing river systems is important in the field of water resources. Stream flow analysis is used to determine if flows are sufficient and reliable for a particular purpose. For developing a run-of-river hydroelectric project, for example, the engineer uses stream flow analysis to determine whether a stream can meet the energy demand throughout the year. Part of this design process includes developing a stream flow model based on historical flow records, simulating flows from the model to determine whether the model provides a good representation of the historical flows and then, in some cases, using the model for short-term forecasting.

There are a few industry accepted methods for forecasting flows. These include exponential smoothing, periodic autoregressive modelling and autoregressive moving average. A comparison of simulated flows using the above noted methods and the state-space time-series (SSTS) method was previously completed for Island Rivers in Newfoundland (Sidhu & Lye 1995); however, the same comparison has not been done for Labrador Rivers. This paper compares two methodologies for modelling stream flows in Labrador and South-East Quebec: one of the standard methods, auto regressive moving average (ARMA), and SSTS.

The ARMA or Box-Jenkins methodology (Box et al. 2008) is well-known and has been used in modelling of hydrometric time series since the early 1970s. The SSTS methodology (Harvey 1989; Commandeur & Koopman 2007) on the other hand is relatively new, and has rarely been used for hydrological modelling. It has been used primarily for economic time series model development and forecasting. The SSTS methodology is also referred to as a Structural Time Series approach. One of the principal purposes of the work described here is to assess the SSTS as an additional tool for hydrologic time series modelling.

Both modelling methodologies are used here to model and forecast monthly flows in selected Labrador and South East Quebec Rivers. The models are then fitted to each time series, excluding the last 5 years of flow data, with the last 5 years of data being used to compare the forecasting capability of the models.

PRELIMINARY ANALYSIS OF THE DATA

Labrador and South-East Quebec, that is, the portion of Quebec south of Labrador and draining into the Gulf of St. Lawrence, were selected as the area of study, primarily due to the number of hydrometric gauges in the area, the lack of previous study in this area as well as the pristine nature of many of the gauged basins. Figure 1 depicts the locations of 79 hydrometric stations in Labrador and South-East Quebec. The length of record at these stations ranges between eight and 60 years, some stations being on regulated rivers and some not. For the purposes of this study, the initial screening criteria were set such that rivers of interest should have more than 15 years of data, be on non-regulated rivers and have as much continuous data as possible (or as little missing data as possible). Three rivers meeting these initial screening criteria, and having different basin sizes (upstream of the gauges), basin aspects and geographic locations, were selected for analysis, as described in this section below. All data were taken from Environment Canada (2013) and the province of Quebec (2013) hydrometric records.
Figure 1

Hydrometric stations in study area (Environment Canada 2010).

Figure 1

Hydrometric stations in study area (Environment Canada 2010).

The study area was divided into two subareas: North and South of Goose Bay. Geologic and surface conditions in general appear to be different between areas north and south of Goose Bay; the ground is rocky with less storage capacity north of Goose Bay compared to south of Goose Bay. As runoff can be affected by vegetation type and density as well as ground infiltration and storage, Goose Bay was chosen as the divide. As a result of varying ground conditions, at least one river was selected from each of the north and south regions. In addition to geographic location, drainage basin size as well as drainage aspect were taken into account.

Having records from 79 gauge locations seems promising; however, most gauges did not meet the initial screening criteria. Table 1 includes those gauge records having more than 15 years of data. Of the remaining 33 gauged records, the rivers highlighted in bold, as shown in Table 1, did not meet all of the initial screening criteria and were removed from the study. Of the 79 gauged rivers, only 12 remained on the list. These were then reviewed based on geographic location, drainage aspect and drainage size. It was desirable to include at least one river from each of the northern and southern regions, as well as at least one river draining to each of the Gulf of St. Lawrence and the Labrador Sea. After the initial screening, only three basins in northern Labrador remained viable: Ugjoktok, Naskaupi and Kanariktok. Ugjoktok was selected since it has the longer record of the three rivers: 32 years versus 18 and 16 years respectively. Ugjoktok River drains to the Labrador Sea and with a basin size of 7,570 km2 is a moderately sized basin for this region.

Table 1

Hydrometric stations with more than 15 years of record in Labrador and South-East Quebec

Gauge nameDrainage area km2YearsRegulatedContinuousMissing dataStateNorth or south
Aganus Riviere 5,590 21 No Yes None Discontinued South 
Alexis River near Port Hope Simpson 2,310 34 No Yes None Active South 
Ashuanipi River at Menihek 19,000 59 Yes   Discontinued North 
Atikonak Lake at Gabbro lake 21,400 38 Yes   Discontinued  
Atikonak river above Panchia Lake 15,100 25 No No 1983–98 Active South 
Big Pond Brook below Big Pond 71.4 17 No No 6 months Active North 
Churchill River above Upper Muskrat 92,500 60 Yes   Active South 
Churchill River at Flour Lake 33,900 17 Yes   Discontinued North 
Churchill River at foot of Lower Muskrat  16 Yes   Discontinued South 
Churchill River at powerhouse 69,200 39 Yes   Discontinued South 
Churchill River between Up and Low Muskrat  16 Yes   Discontinued South 
Eagle River above falls 10,900 46 No Yes 4 months Active South 
Etamamiou Riviere 2,950 20 No Yes None Discontinued South 
Joir Riviere near boundary 2,060 17 No   Active South 
Kanairiktok River below Snegamook Lake 8,930 18 No Yes None Discontinued North 
Little Mecatina 19,100 16    Discontinued South 
Little Mecatina below Breton Lake 12,100 16    Discontinued South 
Little Mecatina River above Lac Fourmont 4,540 34 No No 7 months Active South 
Magpie Riviere 7,230 24 No Yes None Active South 
Minipi River below Minipi Lake 2,330 32 No No 29 months Active South 
Moisie Riviere 19,000 37 No No 18 months Active South 
Nabisipi Riviere 2,060 27 No Yes None Discontinued South 
Naskaupi River at Fremont Lake 8,990 16    Discontinued North 
Naskaupi River below Naskaupi Lake 4,480 32 No No 28 months Active North 
Natashquan Riviere 15,600 22 No Yes None Active South 
Natashquan Riviere below East Natash River 11,600 18 No Yes None Discontinued South 
Romaine Riviere 13,000 46 No Yes None Active South 
Saint Augustin Riviere 5,750 16    Active South 
Saint Marguerite Riviere 6,140 51 Yes   Active South 
Saint Paul Riviere 6,630 35 No No 12 months Active South 
Tonnerre Riviere 684 40 No  >24 months Discontinued South 
Ugjoktok River below Harp Lake 7,570 32 No Yes None Active North 
Unknown river at Lake 51 19,900 16 Yes   Discontinued South 
Gauge nameDrainage area km2YearsRegulatedContinuousMissing dataStateNorth or south
Aganus Riviere 5,590 21 No Yes None Discontinued South 
Alexis River near Port Hope Simpson 2,310 34 No Yes None Active South 
Ashuanipi River at Menihek 19,000 59 Yes   Discontinued North 
Atikonak Lake at Gabbro lake 21,400 38 Yes   Discontinued  
Atikonak river above Panchia Lake 15,100 25 No No 1983–98 Active South 
Big Pond Brook below Big Pond 71.4 17 No No 6 months Active North 
Churchill River above Upper Muskrat 92,500 60 Yes   Active South 
Churchill River at Flour Lake 33,900 17 Yes   Discontinued North 
Churchill River at foot of Lower Muskrat  16 Yes   Discontinued South 
Churchill River at powerhouse 69,200 39 Yes   Discontinued South 
Churchill River between Up and Low Muskrat  16 Yes   Discontinued South 
Eagle River above falls 10,900 46 No Yes 4 months Active South 
Etamamiou Riviere 2,950 20 No Yes None Discontinued South 
Joir Riviere near boundary 2,060 17 No   Active South 
Kanairiktok River below Snegamook Lake 8,930 18 No Yes None Discontinued North 
Little Mecatina 19,100 16    Discontinued South 
Little Mecatina below Breton Lake 12,100 16    Discontinued South 
Little Mecatina River above Lac Fourmont 4,540 34 No No 7 months Active South 
Magpie Riviere 7,230 24 No Yes None Active South 
Minipi River below Minipi Lake 2,330 32 No No 29 months Active South 
Moisie Riviere 19,000 37 No No 18 months Active South 
Nabisipi Riviere 2,060 27 No Yes None Discontinued South 
Naskaupi River at Fremont Lake 8,990 16    Discontinued North 
Naskaupi River below Naskaupi Lake 4,480 32 No No 28 months Active North 
Natashquan Riviere 15,600 22 No Yes None Active South 
Natashquan Riviere below East Natash River 11,600 18 No Yes None Discontinued South 
Romaine Riviere 13,000 46 No Yes None Active South 
Saint Augustin Riviere 5,750 16    Active South 
Saint Marguerite Riviere 6,140 51 Yes   Active South 
Saint Paul Riviere 6,630 35 No No 12 months Active South 
Tonnerre Riviere 684 40 No  >24 months Discontinued South 
Ugjoktok River below Harp Lake 7,570 32 No Yes None Active North 
Unknown river at Lake 51 19,900 16 Yes   Discontinued South 

More gauged records were available to choose from south of Goose Bay, and it was desirable to select one river that drained toward the Gulf of St. Lawrence and one that drained toward the Labrador Sea. Of the rivers draining towards the Gulf of St. Lawrence, Magpie, Romaine, Nabasipi, Aganus, Natashquan and Etamamiou remained after the initial screening. Most of these rivers had 20–27 years of data; however, the Romaine had 46 years of data. As a result, the Romaine River was selected for further study. This river has a large basin size at 13,000 km2. Of the remaining rivers draining toward the Labrador Sea, both the Eagle and Alexis are located south of Goose Bay. They both have more than 30 years of data, are not regulated and the data are mostly continuous. Since one of the criteria was to have a range of drainage sizes, the Alexis River was selected in preference to Eagle River because its drainage area of 2,310 km2 is small compared with that of Ugjoktok and Romaine. The Eagle River on the other hand has a large drainage basin; 10,900 km2, similar in size to the Romaine River. The Alexis, Romaine and Ugjoktok Rivers were therefore selected for detailed study. Table 2 summarizes the information for these selected rivers and their locations are shown in Figure 1.

Table 2

Selected rivers for time series modelling

RiverDrainage area km2Years of dataLocationDrainage basin aspectOutlet location
Ugjoktok 7,570 (medium) 32 North West/East Labrador Sea 
Alexis 2,310 (small) 34 South West/East Labrador Sea 
Romaine 13,000 (large) 56 South North/South Gulf of St. Lawrence 
RiverDrainage area km2Years of dataLocationDrainage basin aspectOutlet location
Ugjoktok 7,570 (medium) 32 North West/East Labrador Sea 
Alexis 2,310 (small) 34 South West/East Labrador Sea 
Romaine 13,000 (large) 56 South North/South Gulf of St. Lawrence 

In summary, one relatively small, one medium and one large river were selected for the study. Each river has more than 32 years of almost continuous data. One river is located north of Goose Bay and two are located south, with the two southern rivers having differing basin sizes, aspects and drainage outlet locations. This diversity of location, basin orientation and outlet location was selected to cover a wide variety of features that might provide a more varied list of rivers to model.

METHODOLOGY

Two methodologies for modelling monthly stream flows were used: ARMA as per the Box-Jenkins methodology and SSTS. The software Structural Time Series Modelling Program (STAMP Version 8.3) developed by Koopman et al. (2009) and sold by Timberlake Consultants Ltd was used to carry out the SSTS modelling. ARMA modelling was completed using the Minitab 16 software (Minitab Inc. 2013). The goal was to compare the two modelling methodologies to determine accuracy of the models for simulation and forecasting as well as ease of use of the software for the practising hydrotechnical engineer.

Prior to modelling, the monthly flow data for each of the three rivers were collected and reviewed for normality. A review of all three time series indicates that the data sets are not normally distributed. Each data set was transformed and normality tests indicate that the log transformed data are normally distributed. The results for Alexis River have been used for illustration purposes with some commentary included relating to the specific results of Ugjoktok and Romaine Rivers.

Box plots of the original and log transformed data for the Alexis River are shown in Figures 2 and 3. As shown in the box plots, the original data set has outliers and a large skew in the data set, which is typical of stream flow data. Once the data set was natural log transformed, there were no outliers, the skew was minor and the logged data were normally distributed. Similar results were found for Ugjoktok and Romaine River and therefore, the transformed data for each river were used for modelling.
Figure 2

Box plot of original Alexis River data.

Figure 2

Box plot of original Alexis River data.

Figure 3

Box plot of log transformed Alexis River data.

Figure 3

Box plot of log transformed Alexis River data.

The data were plotted as a time series for all rivers. As shown in Figure 4, there appears to be no trend in the Alexis River time series. The time series plots were similar for Ugjoktok and Romaine rivers with no apparent trend. To verify the lack of trend for the Alexis River, the data set was statistically tested for trend using simple regression of the logged flows with time. As shown in the fitted line plot in Figure 5, there appears to be a small upward trend; however, a review of the regression analysis indicates that the trend is not statistically significant at the 5% level. All three rivers were tested and none were found to have significant trend, which indicates the data sets are stationary.
Figure 4

Time series plot of the Alexis River.

Figure 4

Time series plot of the Alexis River.

Figure 5

Fitted line plot for Alexis River data.

Figure 5

Fitted line plot for Alexis River data.

As with many streams and rivers in Labrador, with a climate of long cold winters and short warm summers, seasonality is expected. To verify seasonality, autocorrelation function (ACF) plots for each river were prepared. As shown in Figure 6, for the Alexis River, autocorrelation is significant and seasonality is confirmed based on the sine wave appearance of the plot, which repeats every 12 months. The ACF plots for Ugjoktok and Romaine also showed significant autocorrelation and seasonality.
Figure 6

ACF of Alexis River monthly flows.

Figure 6

ACF of Alexis River monthly flows.

Upon completion of the general review of each data set, each of the three rivers were modelled using both the ARMA and SSTS methodologies. The log transformed data were used in both methodologies for all rivers.

ARMA with harmonic analysis

When modelling monthly stream flows with seasonality, ARMA models provide better results when the data set is deseasonalized (Hipel & McLeod 1994). There are two ways to deseasonalize a time series: remove the seasonal mean and possibly the seasonal standard deviation from each data point in the time series or use Fourier Series (harmonic analysis) to deseasonalize (Hipel & McLeod 1994). Subtracting the mean and standard deviation from each observation point results in a large number of model parameters. Fourier series is a way to remove seasonality while using fewer model parameters (Hipel & McLeod 1994). For this reason, the seasonal component for each of the three rivers was modelled using harmonic analysis and the residuals (deseasonalized data) were then modelled using the ARMA methodology. The two models could then be used together to generate or simulate new data sets for comparison to the actual data set. The summary statistics for the actual and generated data sets could then be compared to confirm the model's ability to simulate flows that are within specified confidence intervals. In theory, the verified model can be used to forecast stream flows up to several months in advance. In order to facilitate verification of the model, each actual flow data set was divided into two differing types of sets: a fitting set and a verification set. The fitting set had the last 5 years of data removed from each actual flow data set, and this set was used to fit the model. This set was also used to verify the model simulation results. The verification set included the full actual data set, and the model was then used to forecast the last 5 years: the years that were removed from the fitting set. The forecasted data points were compared to the actual observations for the same time period, understanding that only short term forecasting is relevant, as forecasting beyond 12 months ahead loses significance (Sidhu & Lye 1995).

In the case of the Alexis River, the seasonality was modelled using harmonic analysis by fitting a regression equation using sine and cosine pairs (Box et al. 2008). Since the largest peak relates to a period of 12 months, as is shown in Figure 7, a maximum of six sine and cosine pairs can be used in the equation and thus models using one to six pairs were checked to determine which regression equation best fit the seasonal component. In Figure 7, sdf is spectral density function, UC is upper 95% confidence limit, LC is lower 95% confidence limit and WN is white noise spectral density function.
Figure 7

Spectral density plot for Alexis River.

Figure 7

Spectral density plot for Alexis River.

Since the best fitting model is selected based on the statistical significance of the regression, as well as a low residual error and a high adjusted R2, it was determined that the equation with six sine and cosine pairs was the best fit for the Alexis data. As with the regression analysis, the assumptions of analysis of variance (ANOVA) were checked and found to be acceptable, primarily that the residuals are normally distributed, have constant variance and are independent. The general harmonic analysis equation is as shown in Equation (1): 
formula
1
where Zt is the time series, α is the regression constant, β is the coefficient for each sine or cosine term, t is month (1 to 12), 2πf is a constant with f = 1/12 for monthly values, and yt are the residuals.
Once a seasonal model was selected, the ACF and partial autocorrelation function (PACF) of the residuals were plotted (see Figures 8 and 9). These plots provide guidance in determining which ARMA model may best fit the deseasonalized data (Box et al. 2008). For the Alexis river, there is a significant lag 1 on the ACF plot shown in Figure 8 as well as a significant lag 1 on the PACF plot shown in Figure 9. This would suggest that an ARMA(1,0) model or possibly an ARMA(1,1) may be the best fitting models, so both were reviewed (Box et al. 2008). Since running an ARMA model is quick to do in Minitab, other ARMA models were tested as well. Only ARMA(1,0) and ARMA(1,1) are presented in this paper as review of the ACF and PACF plots suggest these are the most likely best fitting models. As indicated, interpretation of the ACF and PACF plots is an important tool to help guide the determination of the best ARMA model.
Figure 8

ACF plot of deseasonalized residuals.

Figure 8

ACF plot of deseasonalized residuals.

Figure 9

PACF plot of deseasonalized residuals.

Figure 9

PACF plot of deseasonalized residuals.

This first model reviewed was an ARMA(1,0) model since, as per the principle of parsimony (Box et al. 2008), this model produces the simplest model. The residuals from the harmonic analysis (deseaonalized data) were modeled for Alexis using an ARMA(1,0) also referred to as an AR1 model. The model results indicate that the AR1 parameter is significant and the chi-squared statistics are not significant, thus confirming independence. Once the AR1 model was developed, the autocorrelation and partial autocorrelation were plotted for the AR1 model residuals, as shown in Figures 10 and 11. There are no significant lags shown in the plots, which indicate that the model has addressed the significant correlations; thus the AR1 model is suitable for modelling the residuals of the deseasonalized data.
Figure 10

ACF plot of AR1 residuals for Alexis River.

Figure 10

ACF plot of AR1 residuals for Alexis River.

Figure 11

PACF plot of AR1 residuals for Alexis River.

Figure 11

PACF plot of AR1 residuals for Alexis River.

For comparison, the residuals from the harmonic analysis (deseasonalized data) were modeled for Alexis using an ARMA(1,1) model. The autocorrelation and partial autocorrelation were plotted for the ARMA(1,1) model residuals, as show in Figures 12 and 13. There are no significant lags shown in the plots which indicate that the model has addressed the significant correlations. A closer review of the results indicate that the AR1 parameter is significant and the chi-squared statistics are not significant, thus confirming independence. The MA1 term, however, was not shown to be significant at the 5% level. This lack of significance in the MA term would confirm that an AR1, and not ARMA(1,1) is the best fitting model.
Figure 12

ACF plot of ARMA(1,1) residuals for Alexis River.

Figure 12

ACF plot of ARMA(1,1) residuals for Alexis River.

Figure 13

PACF plot of ARMA(1,1) residuals for Alexis River.

Figure 13

PACF plot of ARMA(1,1) residuals for Alexis River.

The general AR1 model equation is given by Equation (2): 
formula
2
where yt is the residual from Equation (1), a time dependent series, and ɛt is the independent series, μ is the mean of yt, and ρ is the AR1 coefficient.
The model was used to simulate the data set for comparison with the actual values. As shown in Figure 14, graphically, the model does a reasonable job of simulating flows for the Alexis River. Only 8 years of simulated data are shown in Figure 14 for clarity; however, the full simulated data set was used to verify the model. To verify the model for simulation, it must be demonstrated that the model can indeed generate data sets of the same sample size that on average have similar historical statistics. The mean, standard deviation and lag 1 correlation (r1) for the generated data should be within a few percent of the actual statistics. For the Alexis River, these are within 2% of the historical values. The minimum and maximum flows marginally exceeded the 90% confidence intervals, and the skew is not as close to that of the historical data. In this case, the minimum and maximum flows as well as the skew were not explicitly modelled and therefore would be difficult to preserve in the model. Based on the model's ability to adequately reproduce the mean, standard deviation and the r1, in conjunction with the ACF plot of the AR1 model residuals, the AR1 or ARMA(1,0) model in conjunction with the harmonic seasonal model is the preferred model for the Alexis River.
Figure 14

Actual, simulated and forecasted flows for Alexis River.

Figure 14

Actual, simulated and forecasted flows for Alexis River.

To validate the prediction ability of the model, the predicted values for the last 5 years of data were graphically compared to the actual last 5 years of data; from 2007 to 2011. As shown in Figure 14, the model appears to do a good job of predicting the first 3 years (2007 and 2009) but does not fit the actual data well for the 4th and 5th years (2010–2011). A closer look at the actual data suggests that the peak patterns that exist in the data pre-2007 do not appear to be replicated between 2010 and 2011, even though the pattern is replicated between 2007 and 2009. Two methods were used to validate the predictions: Nash–Sutcliffe efficiency and median absolute percentage error (MAPE). Since the actual data for 2010–2011 did not appear to have a pattern consistent with previous years' data, the data set was segregated to specifically look at the data set with and without the 2010 and 2011 years. Table 3 shows the error calculation results.

Table 3

Validation results for ARMA with harmonic analysis of Alexis River

Years in data setNash–Sutcliffe efficiency coefficientMAPE
Full set (1978–2011) 0.755 9.86% 
2007–2011 0.614 9.49% 
2007–2009 0.823 8.00% 
2010–2011 0.309 17.78% 
1978–2009 0.777 9.67% 
Years in data setNash–Sutcliffe efficiency coefficientMAPE
Full set (1978–2011) 0.755 9.86% 
2007–2011 0.614 9.49% 
2007–2009 0.823 8.00% 
2010–2011 0.309 17.78% 
1978–2009 0.777 9.67% 

Using the Nash–Sutcliffe efficiency coefficient, the best fitting model will have values close to 1. As shown in the table, the predicted flows between 2007 and 2011 show a value of 0.614, which is less efficient than the full data set at 0.755. As noted above, a graphical review of Figure 14 indicates that actual flows between the winter of 2010 and winter 2011 did not appear to display the same pattern as historical flows had shown; primarily there were higher actual flows during the winter period than observed historically. As a result, the Nash–Sutcliffe efficiency coefficient was reviewed separately for the 2007–2009 flows and the 2010–2011 flows. As noted in Table 3, the 2007–2009 predicted flows match well to the actual flow data with a coefficient of 0.823, which is close to 1; however, the 2010–2011 predicted flows do not match the actual flow data very well, with a coefficient of 0.309. These findings are re-iterated when the MAPE was reviewed for the same sets of flows. The table shows that there is a larger percentage error (17.78%) between the predicted and actual flows for 2010–2011 and a small percentage error (8.00%) between predicted and actual flows for 2007–2009. Based on these results, the selected model, generated using ARMA with the harmonic analysis method, has been used to predict short term with reasonable accuracy. The larger discrepancy between measured and predicted flows in 2010 and 2011 appears to result from a different distribution of seasonal flows, where the flows are higher in the winter for these 2 years. This might suggest a warmer winter and/or earlier snow melt.

The same process was used for the Ugjoktok and Romaine Rivers. The characteristics of these rivers were similar to those of the Alexis River, in that the data were not normally distributed and required log transformation, displayed seasonality with 12- and 6-month cycles and were best fit using an AR1 model. The models for these two rivers were simulated and were found to generate data sets of the same sample size that on average have similar historical statistics. Similar results were found when validating the prediction values using Nash–Sutcliffe efficiency and MAPE for the Ugjoktok and Romaine Rivers.

SSTS ANALYSIS

The STAMP software applies the SSTS modelling theory described in Harvey (1989) to explain each of the time series components. The components include level (roughly equivalent to intercept in a regression equation), slope (together with level these form a trend), seasonality, cycle and irregular or white noise. Each component can be specified as fixed or stochastic as required: fixed means that the component baseline does not change over the time period and stochastic means that the baseline varies over the time period. For example, the state equation for a model with level, seasonal and irregular components is given by Equations (3)–(15) below: 
formula
3
 
formula
4
 
formula
5
 
formula
6
 
formula
7
 
formula
8
 
formula
9
 
formula
10
 
formula
11
 
formula
12
 
formula
13
 
formula
14
 
formula
15
where μt is the level component and γt is the seasonal component with (s–1) state equations where s is the periodicity of the seasonal component.
The calculations are complicated, if not impossible, to complete by hand but the software program makes it easy for a user to enter the data and quickly develop a model. As with the ARMA/harmonic model, the ‘fitting’ data set was used to develop the model using SSTS. In order to select the best fitting model, the model results must be reviewed and the diagnostic test values must fit within critical values. STAMP does not provide the critical values for the diagnostic tests, so a spreadsheet was developed to calculate these values and to compare a number of models to find the best fit. For the three rivers, not all the diagnostic tests were within the critical values, so the model meeting the most diagnostic criteria for each river was selected as the preferred model. To diagnose the suitability of each model, the software provides a results page displaying test statistic results. The results for different model runs can then be compared. Figure 15 is a sample of the results for one of the Alexis River models.
Figure 15

Sample output from STAMP software.

Figure 15

Sample output from STAMP software.

In the sample results page in Figure 15, note that very strong convergence is identified and a steady state was found. These are two key components of an acceptable model using this software. If a steady state and convergence are not found, then the model generated is not a good representation of the actual data set. The diagnostics listed under summary statistics in the sample output (Figure 15) include a test for Normality, Durbin Watson test, r(1) and r(q) correlation values and AIC value. These must all be compared to critical values to determine if the base assumptions, on which the model was developed, have been met.

Plots are easy to generate, and as Figure 16 (one of the model runs for the Alexis River) shows, the time series can be broken down into separate plots illustrating level, seasonality and irregular components. The level component is captured in the upper plot of Figure 16. Here it is shown that the mid line value changes over time within the plot, indicating that the level component is stochastic. The plot in the middle of the figure captures the seasonal component. Here the seasonal plot appears to have a regularly repeating pattern. This type of pattern confirms that there is a seasonal component to the data set. The irregular component plot in the lower section of the figure appears random and shows no pattern, thus confirming that the seasonal component has been captured in the seasonal equation and thus there is no residual seasonality still remaining in the irregular or white noise component that was not captured in the seasonal plot.
Figure 16

Plots of level, seasonal, and irregular components for Alexis River.

Figure 16

Plots of level, seasonal, and irregular components for Alexis River.

Developing multiple models is easy when a software package is used, but the difficulty lies in diagnosing each model. Unlike the diagnostic tests for the illustrated ARMA modeling, which are well prescribed, the SSTS software diagnostic test results do not provide the acceptable critical value or the P-value for each of the test statistics. As a result, determining the suitability of a model was difficult and time consuming. A model could also be mistakenly accepted as a good one if the diagnostics were not properly understood and verified. It was also challenging to determine from the results how to modify the model to achieve the best fit. Reviewing an ACF plot helps direct the user to the appropriate model by quickly showing in graphical form if there are significant correlations remaining in the model residuals. Unlike the ARMA modelling process, where an ACF plot is easily generated from the model residuals, the SSTS software requires the user to store the residual values from the model and then separately plot the ACF plot. Reviews of the plots that are automatically generated for each model do not provide enough help on determining how to adjust the model, and as such the modeller is left to try various combinations of components and then review the diagnostic test results for each. As shown in Table 4 below, the X marked fields indicate the model run combinations that were completed to determine the best model for Alexis River.

Table 4

Combinations of structural models attempted for the Alexis River

Model runLevelSlopeSeasonalCycleIrregular
DeterministicStochasticDeterministicStochasticDeterministicStochasticShortMedium
       
       
      
      
      
      
    
    
     
Model runLevelSlopeSeasonalCycleIrregular
DeterministicStochasticDeterministicStochasticDeterministicStochasticShortMedium
       
       
      
      
      
      
    
    
     

There were nine models developed, each with diagnostic results that were separately reviewed in order to select the best model. The critical values were calculated in a separate spreadsheet and Table 5 gives a summary of the diagnostics and the critical value calculations that were completed to help select the best model.

Table 5

Summary of diagnostics and critical values for models compared for the Alexis River

 r(1) and r(q)Q(q, q–p)HnAICConvergenceSteady state
ModelCritical valueResultChi square critical valueResultF critical valueResultChi square critical valueResult
0.099015 Not met 35.172 Not met 1.3286 Met 5.991 Not met 84.02  Found 
0.099015 r1 met, r24 not 35.172 Not met 1.3286 Met 5.991 Not met 39.73 Very strong  
0.099015 Not met 33.924 Not met 1.3286 Met 5.991 Not met 99.71  Found w/o full 
0.099015 r1 met, r24 not 33.924 Not met 1.3286 Met 5.991 Not met 46.64 Very strong Found w/o full 
0.099015 r1 not met, r24 met 33.924 Not met 1.33 Met 5.991 Not met –409.26 Pattern in irregular  
0.099015 r1 not met, r24 met 33.924 Not met 1.33 Met 5.991 Not met –431.22 Very strong Found 
0.099015 Both met 28.869 Met 1.33 Met 5.991 Not met –477.98 Weak Found w/o full 
0.099015 Both met 28.869 Met 1.33 Met 5.991 Not met –477.98 Strong Found w/o full 
9 0.099015 Both met 30.144 Met 1.33 Met 5.991 Not met 491.81 Very strong Found 
 r(1) and r(q)Q(q, q–p)HnAICConvergenceSteady state
ModelCritical valueResultChi square critical valueResultF critical valueResultChi square critical valueResult
0.099015 Not met 35.172 Not met 1.3286 Met 5.991 Not met 84.02  Found 
0.099015 r1 met, r24 not 35.172 Not met 1.3286 Met 5.991 Not met 39.73 Very strong  
0.099015 Not met 33.924 Not met 1.3286 Met 5.991 Not met 99.71  Found w/o full 
0.099015 r1 met, r24 not 33.924 Not met 1.3286 Met 5.991 Not met 46.64 Very strong Found w/o full 
0.099015 r1 not met, r24 met 33.924 Not met 1.33 Met 5.991 Not met –409.26 Pattern in irregular  
0.099015 r1 not met, r24 met 33.924 Not met 1.33 Met 5.991 Not met –431.22 Very strong Found 
0.099015 Both met 28.869 Met 1.33 Met 5.991 Not met –477.98 Weak Found w/o full 
0.099015 Both met 28.869 Met 1.33 Met 5.991 Not met –477.98 Strong Found w/o full 
9 0.099015 Both met 30.144 Met 1.33 Met 5.991 Not met 491.81 Very strong Found 

Where r(1) is the serial correlation and r(q) is serial correlation lag at step q, Q(q, q–p) is the Box-Ljung Q-statistic, H is the test for heteroscedasticity, N is a test for normality (Koopman et al. 2009), and AIC is the Akaike Information Criterion (Akaike 1974).

Table 5 shows that the model highlighted in bold (model 9), which is comprised of a stochastic level term, a stochastic seasonal term, a short term cycle term and an irregular term as seen in Table 4, meets most of the criteria and has a very strong convergence. The assumption of normality is not met, but this is the best model of all model runs and as such, was ultimately selected as the preferred model for Alexis River. A similar process was completed for Ugjoktok and Romaine. For all of the three rivers, the model generation was very quick and easy, but the diagnostics associated with verifying and selecting the most appropriate model was time consuming, with none of the models meeting all of the diagnostic criteria.

The software does not allow for simulation of the model to verify summary statistics. A separate module called ssf pack is required to write a program to simulate the specified model. Writing a program to simulate and verify the model developed using SSTS is very time consuming and beyond the expertise of many hydrotechnical engineers. The model developed using SSTS could, however, be used to predict a period within the actual data time series and compare it to the actual data. In order to compare a predicted set of flows to the actual flows, the verification data set, including the last 5 actual years of data, was used. This procedure was the same as that used to verify the ARMA/harmonic analysis model forecast values. Figure 17 compares the actual flow data to the predicted flows for the last 5 years of the sample set time period. As can be seen graphically, the predicted flow values are reasonably close to the actual flow values.
Figure 17

Actual and forecasted flows from the SSTS Model 9 for Alexis River.

Figure 17

Actual and forecasted flows from the SSTS Model 9 for Alexis River.

The same validation method was used to validate the forecasted flows using SSTS for Alexis River. Both the Nash–Sutcliffe efficiency coefficient and the MAPE were used. As shown in Table 6, the predicted flows between 2007 and 2011 show a value of 0.622 for Nash which is similar to the Nash coefficient using ARMA and much less than the target value of 1. Like the ARMA analysis, the Nash–Sutcliffe efficiency coefficient was reviewed separately for the 2007–2009 flows and the 2010–2011 flows. As noted in Table 6, the 2007–2009 predicted flows match well to the actual flow data with a coefficient of 0.825, which is close to 1; however, the 2010–2011 predicted flows do not match the actual flow data very well with a coefficient of 0.334. These findings were reiterated when the MAPE was reviewed for the same sets of flows. Table 6 shows that there is a larger percentage error (17.73%) between the predicted and actual flows for 2010–2011 and a small percentage error (8.42%) between the predicted and actual flows for 2007–2009. This model, generated using SSTS methodology, can be used for short term forecasting with reasonable accuracy.

Table 6

Validation results for forecasted flows using SSTS for Alexis River

Years in data setNash–Sutcliffe efficiency coefficientMAPE
2007–2011 0.622 9.49% 
2007–2009 0.825 8.42% 
2010–2011 0.334 17.73% 
Years in data setNash–Sutcliffe efficiency coefficientMAPE
2007–2011 0.622 9.49% 
2007–2009 0.825 8.42% 
2010–2011 0.334 17.73% 

DISCUSSION

Both methods produce models that reasonably simulate and forecast the actual monthly time series data for the three rivers selected in this study: Alexis, Romaine and Ugjoktok Rivers. The location, basin size and flow directions did not seem to greatly affect the choice of model. Both methods produce models that estimate the actual values fairly closely. In addition, the validation results for the forecasted values derived from SSTS are very similar to the results calculated using ARMA with harmonic analysis. When considering results, both models perform approximately equally well. There are, however, pros and cons relating to the software capabilities for each of these methodologies, as listed in Table 7.

Table 7

Pros and Cons of ARMA methodology using Minitab and SSTS methodology using STAMP

ARMA with harmonic analysis model using MinitabSSTS model using STAMP
PROS 
Results indicate whether diagnostic tests are significant so user can quickly diagnose the model Seasonality can be included in a single model 
ACF plots for the model residuals are produced with the model results so user can easily decide how to improve the model Modelling is very fast and easy to do 
Simulation of the data can be completed to verify the model accuracy  
CONS 
Monthly data must be deseasonalized prior to ARMA modelling Diagnosis of the model is very time consuming and requires knowledge of diagnostic tests, some of which are specific to this methodology 
Simulation is time consuming and requires macros to perform the simulation, but can be done Multiple runs are required to verify diagnostics before the best model can be selected 
Harmonic analysis is not built into the software and a number of macros are required to complete the seasonal modelling component No easy way to simulate the model to verify that the summary statistics for the model are close to the actual summary statistics. Simulation can be programmed using a separate software module 
ARMA with harmonic analysis model using MinitabSSTS model using STAMP
PROS 
Results indicate whether diagnostic tests are significant so user can quickly diagnose the model Seasonality can be included in a single model 
ACF plots for the model residuals are produced with the model results so user can easily decide how to improve the model Modelling is very fast and easy to do 
Simulation of the data can be completed to verify the model accuracy  
CONS 
Monthly data must be deseasonalized prior to ARMA modelling Diagnosis of the model is very time consuming and requires knowledge of diagnostic tests, some of which are specific to this methodology 
Simulation is time consuming and requires macros to perform the simulation, but can be done Multiple runs are required to verify diagnostics before the best model can be selected 
Harmonic analysis is not built into the software and a number of macros are required to complete the seasonal modelling component No easy way to simulate the model to verify that the summary statistics for the model are close to the actual summary statistics. Simulation can be programmed using a separate software module 

CONCLUSIONS

Both methods are effective ways to develop models of stream flows, however for a practising engineer, the tools make the difference. Although it is more time consuming to develop an ARMA model in Minitab than to develop an SSTS model in STAMP, it is relatively easy to simulate the flows using the ARMA model. In addition, the ARMA or Box-Jenkins methodology is well-known, widely used in hydrology, and the statistical principles are easier to follow. Furthermore, there are many software packages available for ARMA modelling. The main difficulty with developing an SSTS model is in the diagnostics, and the theory behind the state-space approach is considerably more complex. If features were added to STAMP such as automatic graphing of residual plots, provision of critical values for diagnostic tests, provision of guidance for the interpretation of diagnostics in the model results, and adding the capability of simulation, the program could be a very good tool for the practising hydrotechnical engineer wanting to develop a model to simulate and forecast stream flows.

REFERENCES

REFERENCES
Akaike
H.
1974
A new look at the statistical model identification
.
IEEE Trans. Auto. Control
19
(
6
),
716
723
.
Box
G. E. P.
Jenkins
G. M.
Reinsel
G. C.
2008
Time Series Analysis
, 4th edn.
Wiley and Sons
,
New York
,
USA
.
Commandeur
J. J. F.
Koopman
S. J.
2007
An Introduction to State Space Time Series Analysis.
Oxford University Press Inc.
,
New York
,
USA
.
Environment Canada
2010
Water Survey of Canada, EC Data Explorer V1.2.30 [Online]. www.ec.gc.ca/rhc-wsc/ (accessed 13 May 2013)
.
Environment Canada
2013
Water Survey of Canada, ‘HYDAT Database 2013’ [Online]. www.ec.gc.ca/rhc-wsc/ (accessed 20 May 2013)
.
Harvey
A. C.
1989
Forecasting, Structural Time Series Models and the Kalman Filter
.
Cambridge University Press
,
Cambridge
,
UK
.
Hipel
K. W.
McLeod
A. I.
1994
Time Series Modelling of Water Resources and Environmental Systems
.
Elsevier
,
Amsterdam
.
Koopman
S. J.
Harvey
A. C.
Doornik
J. A.
Shephard
N.
2009
Structural Time Series Analyser, Modeller and Predictor – STAMP 8.2.
Timberlake Consultants Ltd
,
London
,
UK
.
Minitab Inc.
2013
Minitab 16 Software.
Minitab Inc.
,
State College, PA
.
Sidhu
A.
Lye
L. M.
1995
Seasonal flow forecasting of Newfoundland rivers
. In:
Proceedings of the Annual Conference of the CSCE
,
Ottawa
, pp.
585
592
.
Quebec provincial government, Centre d'expertise hydrique, Hydrometric stations, Region hydrographique du Saint-Laurent nord-est [Online]
. .