In this paper, we address the critical task of 24-h streamflow forecasting using advanced deep-learning models, with a primary focus on the transformer architecture, which has seen limited application in this specific task. We compare the performance of five models, namely persistence, long short-term memory (LSTM), gated recurrent unit (GRU), Seq2Seq, and transformer, across four distinct regions. The evaluation is based on three performance metrics: Nash–Sutcliffe Efficiency (NSE), Pearson's r, and normalized root mean square error (NRMSE). Additionally, we investigate the impact of two data extension methods, zero-padding and persistence, on the models' predictive capabilities. Our findings highlight the transformer's superiority in capturing complex temporal dependencies and patterns in the streamflow data, outperforming all other models in terms of both accuracy and reliability. Specifically, the transformer model improved NSE scores by up to 20% compared to the other models. The study's insights emphasize the significance of leveraging advanced deep learning techniques, such as the transformer, in hydrological modeling and streamflow forecasting for effective water resource management and flood prediction.

  • Transformer model surpasses persistence, LSTM, GRU, Seq2Seq in 24-h streamflow prediction.

  • Employed different data extension methods, zero-padding and persistence, to enhance predictive capabilities.

  • Evaluated model performance using Nash–Sutcliffe Efficiency, Pearson's r, and normalized root mean square error metrics.

  • Conducted region-specific analysis, demonstrating the transformer model's adaptability across varied hydrological environments.

Globally, the incidence and catastrophic effects of natural disasters have increased dramatically. The World Meteorological Organization's analysis (2021) shows that, on average, each day for the past half-century, a weather, climate, or water-related disaster has led to a loss of $202 million and claimed 115 lives. Further, Munich Re's (2022) report indicated that natural catastrophes, encompassing hurricanes, floods, and other disaster types, have inflicted more than $280 billion in projected damage worldwide. Out of this total, disasters caused $145 billion in damages in the United States alone, along with thousands of fatalities and substantial damage to properties and infrastructure. Current research suggests that ongoing climate change is projected to cause an upsurge in extreme and intense natural disasters globally, leading to an increase in the number of victims and losses (Banholzer et al. 2014; WMO 2021).

Floods are the most commonly occurring natural disaster, leading to billions in financial losses and innumerable fatalities over time (WMO 2021). In the year 2020, over 60% of all reported natural disasters were flood-related, accounting for 41% of the overall death toll due to such events (NDRCC 2021). Multiple studies suggest that climate change is causing an escalation in the frequency and severity of floods in specific areas (Tabari 2020; Davenport et al. 2021; NOAA 2022). This rise in flooding events can be attributed to factors like an increase in sea level (Strauss et al. 2016), the heightened occurrence of extreme rainfall (Diffenbaugh et al. 2017), or amplified rainfall during hurricanes (Trenberth et al. 2018). Hence, accurately forecasting streamflow and, as a result, potential flooding is essential for effectively mitigating the destructive consequences in terms of property damage and fatalities (Alabbad & Demir 2022).

In addition, streamflow forecasting plays a vital role in numerous aspects of hydrology and water management, including watershed management (Demir & Beck 2009), agricultural planning (Yildirim & Demir 2022), flood mapping systems (Li & Demir 2022), and other mitigation activities (Yaseen et al. 2018; Ahmed et al. 2021). Yet, achieving accurate and reliable predictions poses a challenge due to the inherent complexity of hydrological systems, which include nonlinearity, and unpredictability in the datasets (Yaseen et al. 2017; Honorato et al. 2018; Sit et al. 2023a).

Over time, a plethora of physical and data-driven methods have been introduced, each exhibiting diverse characteristics such as employing different types of data, focusing on specific geographical areas, or offering varying levels of generalization (Salas et al. 2000; Yaseen et al. 2015). Physics-driven prediction models (Beven & Kirkby 1979; Ren-Jun 1992; Arnold 1994; Lee & Georgakakos 1996; Devia et al. 2015) can simulate the complex interactions among different physical processes, including atmospheric circulation and the long-term evolution of global weather patterns (Yaseen et al. 2019; Sharma & Machiwal 2021). However, these models, while valuable, come with notable limitations: they demand extensive and precise hydrological and geomorphological data, increasing operational costs, and their accuracy tends to wane in long-term forecasting scenarios.

Furthermore, due to their computational intensity and high parameter counts, traditional physically based hydrological models require substantial computing resources, leading to significant computational costs (Mosavi et al. 2018; Sharma & Machiwal 2021; Liu et al. 2022; Castangia et al. 2023). As a result, recent research (Yaseen et al. 2015) has explored alternative approaches to streamflow forecasting, indicating that machine learning, especially deep learning models, can serve as viable alternatives and often outperform physically based models in terms of accuracy. These deep learning models have shown promising results in enhancing the accuracy and reliability of streamflow predictions, presenting an opportunity to revolutionize hydrological modeling (Demiray et al. 2023; Sit et al. 2023b).

Many classical machine-learning approaches have been used in streamflow forecasting and environmental studies (Bayar et al. 2009; Li & Demir 2023), including support vector machines (SVMs) and linear regression (LR) (Granata et al. 2016; Yan et al. 2018; Sharma & Machiwal 2021). However, advancements in artificial intelligence (AI) coupled with the increasing capabilities of graphics processing units (GPUs) have opened up new possibilities and accelerated the progress of deep learning techniques, which has led to their widespread usage in streamflow forecasting as well (Sit et al. 2022a). Out of the various neural network architectures explored for streamflow forecasting (Sit et al. 2021a; Xiang & Demir 2022b; Chen et al. 2023), recurrent neural networks (RNNs), especially the long short-term memory (LSTM) network and gated recurrent units (GRUs), have emerged as the most extensively studied models in this domain.

Kratzert et al. (2018) applied an LSTM model to predict daily runoff, incorporating meteorological observations, and demonstrated that the LSTM model outperformed a well-established physical model in their study area. Xiang et al. (2021) demonstrated that an LSTM-seq2seq model surpasses linear models, such as LR, lasso regression, and ridge regression, in predictive accuracy. Guo et al. (2021) compared LSTMs, GRUs, and SVMs over 25 different locations in China and found that while LSTMs and GRUs demonstrated comparable performance, GRUs exhibited faster training times. Since the research in the field is extensive, more detailed information about deep learning studies on streamflow prediction can be found in Yaseen et al. (2015) and Ibrahim et al. (2022).

In 2017, a group of researchers from Google introduced a new way to model longer sequences for language translation (Vaswani et al. 2017), and this new model, namely the transformer, has since been applied to various tasks, including time series prediction (Wu et al. 2021; Zhou et al. 2021, 2022; Lin et al. 2022). Despite attention from other fields, only a limited number of studies focus on the performance and usage of transformers in streamflow forecasting. Liu et al. (2022) introduced a transformer neural network model for monthly streamflow prediction of the Yangtze River in China. Their approach utilized historical water levels and incorporated the El Niño-Southern Oscillation (ENSO) as additional input features. This allowed the model to capture the influence of ENSO on streamflow patterns and improve the accuracy of monthly streamflow predictions for the Yangtze River. More recently, Castangia et al. (2023) used a transformer-based model to predict the water level of a river 1 day in advance, leveraging the historical water levels of its upstream branches as predictors. They conducted experiments using data from the severe flood that occurred in Southeast Europe in May 2014.

In this work, we investigate the performance of a transformer model in streamflow forecasting for four different locations in Iowa, USA. More specifically, we predict the upcoming 24-h water levels using the previous 72-h precipitation, evapotranspiration, and discharge values, then compare the results of the transformer-based model with those of three deep learning models as well as the persistence method. According to the experiment results, the transformer-based model outperforms all tested methods.

The structure of the remaining sections of this paper is as follows: in the next section, the dataset that has been used in this research and study area will be introduced. Section 3 outlines the methods employed in this study. Following that, Section 4 presents the results of our experiments and provides a detailed discussion of the findings. Finally, in Section 5, we summarize the key findings of this study and discuss future prospects.

WaterBench, developed by Demir et al. (2022), is a benchmark dataset explicitly created for flood forecasting research, adhering to FAIR (findability, accessibility, interoperability, and reuse) data principles. Its structure is designed for easy application in data-driven and machine-learning studies, and it also provides benchmark performance metrics for advanced deep-learning architectures, enabling comparative analysis. This dataset has been compiled by gathering streamflow, precipitation (Sit et al. 2021b), watershed area, slope, soil types, and evapotranspiration data from various federal and state entities, including NASA, NOAA, USGS, and the Iowa Flood Center. This consolidated resource is specifically geared towards studies of hourly streamflow forecasts.

The dataset's time-series spans from October 2011 to September 2018. In this work, four different U.S. Geological Survey (USGS) stations, each one of them located in a different watershed, are selected from WaterBench. More specifically, USGS 05387440 Upper Iowa River at Bluffton, USGS 05418400 North Fork Maquoketa River near Fulton, USGS 05454000 Rapid Creek near Iowa City, and USGS 06817000 Nodaway River at Clarinda are selected. Figure 1 illustrates the locations of the designated sites and their corresponding watersheds within the State of Iowa.
Figure 1

Selected locations and corresponding watersheds in the State of Iowa.


The data from October 2011 to September 2017 are selected for the training set. Fifteen percent of the remaining data are used for validation and the rest is allocated for testing. This data split was consistent across all stations and models studied in our research. The use of a uniform data division strategy ensures comparability and fairness in the evaluation of each model's performance in different regions. As a preprocessing step, we followed the same methods as the original dataset paper (Demir et al. 2022), since we compared our results with the models provided in the WaterBench paper. The data and benchmark models can be accessed from https://github.com/uihilab/WaterBench. A statistical summary of streamflow values in the test data is provided in Table 1.

Table 1

Statistical summary of streamflow values in test data (m3/s)

        Bluffton   Fulton     Iowa City  Clarinda
Max     13,050.00  10,075.00  2,242.50   11,575.00
Min     41.09      121.99     0.16       70.87
Mean    436.98     425.59     12.76      443.70
Median  246.00     308.00     4.28       256.00

In this study, we evaluated the transformer-based model on streamflow prediction tasks and compared the results with the four models (persistence, GRU, LSTM, and Seq2Seq) provided in the WaterBench dataset. In this section, we provide the details of these methods as well as the transformer-based approach.

Persistence approach

Persistence (Equation (1)), also known as the nearest frame approach, is based on the principle that ‘tomorrow will be the same as today.’ In other words, persistence forecasts rely solely on the most recent available data and assume that future conditions will remain unchanged from the present. It is accepted as one of the baselines for hydrological studies including streamflow forecasting and several hydrological studies have indicated that the fundamental persistence model is challenging to surpass in terms of short-range predictions, especially when the forecasting lead time (n) is less than 12 h (Krajewski et al. 2021; Demir et al. 2022).
$\hat{Q}_{t+n} = Q_t$ (1)
where $\hat{Q}_{t+n}$ is the forecasted streamflow for lead time $n$, and $Q_t$ is the most recent observed streamflow at time $t$.
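As a concrete illustration, the persistence baseline can be sketched in a few lines of NumPy; the function name and array layout below are our own, not taken from the WaterBench implementation:

```python
import numpy as np

def persistence_forecast(history, lead_time=24):
    """Repeat the most recent observation for every step of the horizon."""
    return np.full(lead_time, history[-1], dtype=float)

# Hypothetical hourly discharge record; forecast simply holds the last value.
obs = np.array([10.0, 11.5, 12.0])
forecast = persistence_forecast(obs, lead_time=4)   # [12., 12., 12., 12.]
```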

LSTM and GRU

In the context of time-series forecasting, RNNs have proven effective in capturing temporal dependencies. However, they suffer from the vanishing gradient problem, where the gradient diminishes exponentially over time, hindering the model's ability to retain long-term dependencies. This limitation impacts the accuracy of time-series predictions, particularly for tasks that require memory of events far back in the past. To address these shortcomings, LSTM networks were introduced by Hochreiter & Schmidhuber (1997). LSTMs are designed to extend the lifespan of short-term memory and effectively capture long-term dependencies in the data. This makes them well-suited for time-series problems, and consequently for hydrological forecasting tasks that involve longer memory requirements, such as flood and rainfall forecasting (Kratzert et al. 2018; Feng et al. 2020; Frame et al. 2022; Sit et al. 2022b).

The core of an LSTM unit comprises three gates: the input gate, forget gate, and output gate, each playing a pivotal role in the information flow within the network. The input gate controls the addition of new information to the cell state, balancing between the current input and the previous state. The forget gate, on the other hand, determines which parts of the existing memory to retain or discard, allowing the model to forget irrelevant data over time. The output gate decides the next hidden state based on the current cell state, effectively controlling the output information of the LSTM unit. These gates, governed by sigmoid and tanh activation functions, operate through a series of equations that manage data storage, retention, and output. This sophisticated gating mechanism is fundamental to the LSTM's ability to manage information over extended periods, making it a powerful tool in forecasting where past events significantly influence future outcomes.
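The gate computations described above can be condensed into a single NumPy time step. This is our own didactic simplification (stacked parameters, hand-rolled sigmoid), not the PyTorch implementation used in the study:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b stack the parameters for the input (i),
    forget (f), and output (o) gates and the candidate cell update (g)."""
    H = h_prev.size
    z = W @ x + U @ h_prev + b          # all gate pre-activations at once
    i = sigmoid(z[0:H])                 # input gate: admit new information
    f = sigmoid(z[H:2 * H])             # forget gate: retain or discard memory
    o = sigmoid(z[2 * H:3 * H])         # output gate: expose the cell state
    g = np.tanh(z[3 * H:4 * H])         # candidate cell update
    c = f * c_prev + i * g              # new cell state
    h = o * np.tanh(c)                  # new hidden state
    return h, c

rng = np.random.default_rng(0)
H, D = 4, 3                             # hidden size, input feature size
W = rng.standard_normal((4 * H, D))
U = rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.standard_normal(D), np.zeros(H), np.zeros(H), W, U, b)
```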

While LSTM networks have been instrumental in addressing the vanishing gradient problem and achieving remarkable progress in natural language processing and time-series prediction, their time complexity can be a concern, especially for large-scale applications. To mitigate this issue, GRU networks were introduced by Cho et al. (2014) as an efficient alternative that retains the effectiveness of LSTM while reducing computational burden. The GRU merges the functionalities of the input and forget gates into a single update gate, reducing complexity. The GRU's architecture comprises two main components: the update gate and the reset gate. The update gate determines the extent to which the previous state influences the current state, thus controlling the flow of information from the past. The reset gate, on the other hand, decides how much past information to forget, allowing the model to drop irrelevant data from previous time steps. These gates use sigmoid and tanh functions to manage the model's memory effectively. By employing these gating mechanisms and streamlined computations, the GRU model strikes a balance between computational efficiency and predictive performance.
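A GRU time step can be sketched the same way; the parameter stacking and the interpolation convention (the one used by common deep-learning libraries) are our own choices for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, W, U, b):
    """One GRU time step; W, U, b stack the parameters for the update (z)
    and reset (r) gates and the candidate state (n)."""
    H = h_prev.size
    z = sigmoid(W[0:H] @ x + U[0:H] @ h_prev + b[0:H])            # update gate
    r = sigmoid(W[H:2 * H] @ x + U[H:2 * H] @ h_prev + b[H:2 * H])  # reset gate
    n = np.tanh(W[2 * H:3 * H] @ x + U[2 * H:3 * H] @ (r * h_prev)
                + b[2 * H:3 * H])                                 # candidate state
    return (1 - z) * n + z * h_prev     # interpolate candidate and past state

rng = np.random.default_rng(0)
H, D = 4, 3
W = rng.standard_normal((3 * H, D))
U = rng.standard_normal((3 * H, H))
b = np.zeros(3 * H)
h = gru_step(rng.standard_normal(D), np.zeros(H), W, U, b)
```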

Both LSTM and GRU models have been applied successfully in various domains, particularly in hydrological forecasting (Yaseen et al. 2015; Ibrahim et al. 2022). Their ability to model the non-linear relationships inherent in hydrological processes has led to them becoming popular choices for tasks such as predicting streamflow and rainfall. Studies such as those by Kratzert et al. (2018) and Guo et al. (2021) have demonstrated the efficacy of these models in hydrological forecasting, highlighting their strengths in capturing complex temporal patterns and relationships in hydrological data.

Seq2Seq model

In addition to LSTM and GRU, a variant of the Seq2Seq model (Xiang & Demir 2022a) is also employed as a baseline method in this study. The Seq2Seq model follows an encoder–decoder architecture and utilizes multiple TimeDistributed layers with a final dense layer. The encoder–decoder structure consists of two main components: an encoder and a decoder. The encoder processes the input time series data and encodes it into a fixed-size context vector, effectively capturing relevant temporal patterns and features. In this implementation, multiple GRUs are used as both the encoder and decoder, as GRUs have proven effective in modeling sequential data and handling long-range dependencies.

During the encoding process, the input time series data, including historical rainfall, streamflow, and evapotranspiration for the past 72 h, along with 24-h forecast data of rainfall and evapotranspiration, is passed through the multiple GRUs. The encoder generates a context vector that summarizes the important information from the input sequence. Next, the decoder takes the context vector produced by the encoder and predicts the future 24-h streamflow. The decoder GRUs process the context vector along with the predicted streamflow values from the previous timestep, iteratively generating the streamflow predictions for the next 24 h. To capture intricate patterns and temporal dynamics in the predictions, multiple TimeDistributed layers are employed, applying the same dense layer to each timestep of the output sequence. Finally, the Seq2Seq model concludes with a final dense layer that projects the output sequence to the desired format for 24-h streamflow predictions. For comprehensive implementation details, we recommend referring to the works by Xiang & Demir (2022a) and Demir et al. (2022).

Transformer model

The transformer model represents a revolutionary neural network architecture that emerged as a seminal work by Vaswani et al. (2017) to tackle challenges in machine translation tasks. Its groundbreaking design subsequently found applications in various domains that deal with long input sequences, including time series forecasting (Wu et al. 2021; Zhou et al. 2021; Lin et al. 2022; Wen et al. 2022; Zhou et al. 2022). The transformer's key innovation lies in the self-attention mechanism, which completely replaces traditional recurrent layers, enabling more efficient and effective analysis of extended input sequences.

The self-attention computation in the transformer can be broken down into several stages to reveal its inner workings. Initially, each element in the input sequence is projected into three distinct representations: query (Q), key (K), and value (V) vectors of dimension $d_{model}$. The self-attention scores are then obtained by performing a dot-product operation between the query and key matrices, followed by scaling and applying a softmax function to capture the importance of each element in relation to others. Consequently, the weighted sum of the value vectors produces a new representation of the input sequence. By employing self-attention, the transformer can dynamically adjust the representation of each element, factoring in the influence of all other elements in the sequence. This enables distant elements to contribute meaningfully to each other, fostering the capture of long-range dependencies that may be crucial for accurate time series forecasting. For a visual understanding of this process, Figure 2 illustrates the self-attention mechanism in action, highlighting how these computations interact within the transformer's architecture. The self-attention mechanism can be mathematically represented as given in the following equation:
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V$ (2)
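Equation (2) can be written out directly in NumPy; this is a minimal single-head sketch for illustration, not the study's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # stabilized exponentials
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # pairwise query-key similarities
    return softmax(scores, axis=-1) @ V  # attention-weighted sum of values

# Self-attention over a hypothetical length-5 sequence of 8-dim embeddings.
x = np.random.default_rng(2).standard_normal((5, 8))
out = attention(x, x, x)
```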
Figure 2

Scaled dot-product attention mechanism.

To enhance the model's ability to capture diverse patterns, the transformer model employs multi-head attention. The query, key, and value vectors are divided into multiple chunks of dimension $d_{model}/h$, where $h$ is the number of attention heads. Each head independently computes the self-attention process for its chunk, and the resulting representations are concatenated and subjected to a final linear transformation. The introduction of multi-head attention increases the potential combinations between elements, thereby enhancing the model's ability to capture intricate relationships within the input sequence. For further clarity on how this process functions within the transformer's framework, Figure 3 depicts the multi-head attention mechanism, illustrating the parallel computation of attention heads and their integration.
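The head-splitting, per-head attention, concatenation, and final projection described above can be sketched as follows; the projection matrices and naming are our own illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    """Split d_model into h heads, attend per head, concatenate, project."""
    seq, d_model = X.shape
    d_head = d_model // h                       # chunk size d_model / h
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    heads = []
    for i in range(h):                          # each head attends independently
        s = slice(i * d_head, (i + 1) * d_head)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_head)
        heads.append(softmax(scores) @ V[:, s])
    return np.concatenate(heads, axis=-1) @ W_o  # final linear transformation

rng = np.random.default_rng(1)
X = rng.standard_normal((96, 64))               # 96 time steps, d_model = 64
W = [rng.standard_normal((64, 64)) * 0.1 for _ in range(4)]
out = multi_head_attention(X, *W, h=8)
```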
Figure 3

Multi-head attention mechanism.

As self-attention inherently lacks information about the order of elements in the input sequence, static positional encoding is introduced to provide positional awareness. The positional encoding is added to the initial input embedding, ensuring that the model distinguishes the positions of different elements. Each element's position is encoded using a specific formula, involving positional index and sine and cosine functions. The positional encoding can be mathematically represented as given in the following equation:
$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \quad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$ (3)
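The sinusoidal encoding of Equation (3) can be generated in a few vectorized lines; shapes here match the 96-step, 64-feature setting of this study, but the function itself is a generic sketch:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: sin on even indices, cos on odd."""
    pos = np.arange(seq_len)[:, None]            # positions 0..seq_len-1
    two_i = np.arange(0, d_model, 2)[None, :]    # even feature indices (= 2i)
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angles)                 # PE(pos, 2i+1)
    return pe

pe = positional_encoding(96, 64)                 # added to the input embedding
```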
The transformer model employed in this study deviates slightly from the original implementation by Vaswani et al. (2017). Notably, all decoder layers are removed, as the task specifically focuses on time series forecasting without the need for decoding. Additionally, a different input embedding technique is utilized, where the input sequence is passed through a linear layer before positional encoding. Dropout layers are incorporated for regularization after adding positional encoding. A final linear layer is employed to reduce the feature size to 1 at the end of the last encoder, given the model has no decoders and a single output value is required for forecasting. The model is depicted in Figure 4.
Figure 4

Transformer model architecture.


The persistence, GRU, LSTM, and transformer models are developed with PyTorch, whereas the Seq2Seq model is developed with Keras. Please refer to Demir et al. (2022) for further implementation details and model architectures of the GRU, LSTM, and Seq2Seq models utilized in this study. In the transformer model, we employed a linear embedding layer to expand the feature size of the input from 3 to 64, preparing the input data for efficient processing by the transformer architecture. The model comprises a single encoder layer equipped with eight attention heads, enhancing its ability to focus on different facets of the input sequence simultaneously. The model size for the transformer is set to 64, balancing the model's complexity and computational efficiency. Additionally, the encoder's internal feedforward network has a dimension of 256, which provides sufficient capacity for internal feature transformations and processing. The GELU activation function is used between the two linear layers inside the feed-forward component. To provide a clearer understanding of the computational complexity of the models employed in our study, Table 2 provides a detailed account of the trainable parameter counts for each model.

Table 2

Trainable parameter counts of tested models

Model name   Number of trainable parameters
LSTM         51,009
Seq2Seq      77,505
GRU          38,273
Transformer  56,449
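Assuming PyTorch (which the paper states was used for the transformer), the described configuration can be sketched roughly as follows. The class name is our own, and the positional encoding and the exact dropout placement are omitted for brevity, so this is an approximation of the architecture, not the authors' code:

```python
import torch
import torch.nn as nn

class StreamflowTransformer(nn.Module):
    """Encoder-only transformer matching the stated configuration: a 3 -> 64
    linear embedding, one encoder layer with 8 heads, a feed-forward size of
    256 with GELU, and a final linear layer reducing the feature size to 1."""
    def __init__(self, d_model=64, n_heads=8, d_ff=256, dropout=0.1):
        super().__init__()
        self.embed = nn.Linear(3, d_model)       # expand features from 3 to 64
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=d_ff,
            dropout=dropout, activation="gelu", batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, 1)        # single output value per step

    def forward(self, x):                        # x: [batch, 96, 3]
        return self.head(self.encoder(self.embed(x)))

model = StreamflowTransformer()
y = model(torch.zeros(2, 96, 3))                 # y: [batch, 96, 1]
```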

During training, we used mean squared error (MSE) as the loss function and Adam as the optimizer. We set the batch size to 512 and the learning rate to 0.00001. The learning rate is halved if no improvement is observed for 10 epochs, and training is stopped early if there is no improvement for 20 epochs.
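The learning-rate schedule and early-stopping rule just described amount to simple bookkeeping, sketched below as a small class; this is our illustrative reading of the stated policy, not the actual training code:

```python
class PlateauPolicy:
    """Halve the learning rate after 10 epochs without validation improvement
    and stop training after 20, per the training setup described above."""
    def __init__(self, lr=1e-5, patience=10, stop_patience=20):
        self.lr, self.best, self.bad_epochs = lr, float("inf"), 0
        self.patience, self.stop_patience = patience, stop_patience

    def step(self, val_loss):
        """Record one epoch's validation loss; returns False when training
        should stop."""
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs % self.patience == 0:
                self.lr /= 2            # halve LR every `patience` stale epochs
        return self.bad_epochs < self.stop_patience
```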

In this section, we present and discuss the findings from our research into the 24-h prediction of streamflow using different models, focusing primarily on the performance of the transformer model we used. To assess its effectiveness, we compare it against four other models, three of which are deep learning models – LSTM, GRU, and Seq2Seq – and one is a classical approach known as persistence.

Streamflow prediction holds immense significance in various domains such as water resource management, environmental monitoring, and decision-making processes. Deep learning models have demonstrated remarkable capabilities in time-series forecasting tasks, making them a natural choice for tackling streamflow prediction challenges. However, the application of transformer models in this specific context is relatively new and deserving of detailed investigation. The transformer's self-attention mechanism has shown great promise in sequence modeling tasks, making it an intriguing candidate for capturing temporal dependencies in streamflow data.

Our comparative analysis employs three metrics, namely, Nash–Sutcliffe efficiency (NSE), Pearson's r, and normalized root mean square error (NRMSE). Each of these metrics facilitates a thorough and multidimensional understanding of each model's predictive capacities and the effectiveness of the transformer model. The subsequent sections provide a detailed exposition of the three evaluation metrics and their relevance in streamflow prediction assessment. Following that, we present and analyze the results obtained from each model, highlighting their respective strengths and limitations. Through this examination, we aim to uncover the effectiveness of the transformer model in 24-h streamflow prediction and its potential implications for future research and real-world applications.

Performance metrics

In the evaluation of streamflow prediction models, several performance metrics are commonly employed to assess the accuracy and reliability of the forecasts. In this study, we utilized three widely accepted metrics: NSE, Pearson's r, and NRMSE. These metrics have been extensively applied in hydrological modeling and streamflow forecasting research due to their interpretability and ability to capture different aspects of model performance (Kratzert et al. 2018; Xiang & Demir 2021; Liu et al. 2022). To ensure clarity and conciseness in our presentation of performance metrics, the parameters used in the following equations are detailed once and applied uniformly across all formulas, with specific deviations noted accordingly.
First, NSE (Equation (4)) is a widely used metric to quantify the predictive performance of hydrological models (Krause et al. 2005; Arnold et al. 2012). It provides a measure of how well the model predictions match the observed streamflow data, relative to the mean of the observed data. The NSE ranges from negative infinity to 1. A value of 1 indicates a perfect match, where the model predictions precisely align with the observations. Values greater than 0 denote better performance than using the mean of the observed data as a predictor. On the other hand, negative NSE values signify that the mean of the observed data outperforms the model, indicating poor predictive ability. Values greater than 0.5 are considered acceptable in hydrological modeling (Arnold et al. 2012).
$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(Q_{obs,i} - Q_{pred,i}\right)^{2}}{\sum_{i=1}^{n}\left(Q_{obs,i} - \overline{Q}_{obs}\right)^{2}}$ (4)
Second, Pearson's correlation coefficient (r), also known as Pearson's r or simply r, is a statistical measure used to assess the linear relationship between the model's predicted streamflow values and the observed streamflow data. Pearson's r directly quantifies the strength and direction of the linear association between the predicted and observed streamflow values. Ranging from −1 to 1, with 1 indicating a perfect positive linear relationship, a higher positive Pearson's r value signifies a more reliable and accurate model, capable of accurately capturing the patterns in the observed data and making precise predictions. Utilizing Pearson's r allows us to measure the accuracy and effectiveness of the forecasting models in capturing the variability of the observed streamflow for the 24-h prediction horizon.
$r = \frac{\sum_{i=1}^{n}\left(Q_{obs,i} - \overline{Q}_{obs}\right)\left(Q_{pred,i} - \overline{Q}_{pred}\right)}{\sqrt{\sum_{i=1}^{n}\left(Q_{obs,i} - \overline{Q}_{obs}\right)^{2}}\sqrt{\sum_{i=1}^{n}\left(Q_{pred,i} - \overline{Q}_{pred}\right)^{2}}}$ (5)
Last, NRMSE measures the average error between the predicted and observed streamflow values, normalized by the mean of the observed data. It provides a relative measure of the model's predictive accuracy, enabling comparison across different datasets. Since this study covers locations with widely differing flow magnitudes, NRMSE is a reasonable choice. Lower NRMSE values indicate better predictive performance, and higher values indicate larger errors relative to the mean of the observed streamflow.
$\mathrm{NRMSE} = \frac{1}{\overline{Q}_{obs}}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{pred,i} - Q_{obs,i}\right)^{2}}$ (6)
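For reference, the three metrics can be computed in a few lines of NumPy; the function names are our own:

```python
import numpy as np

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus model error over variance of obs."""
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pearson_r(obs, pred):
    """Pearson's linear correlation between observations and predictions."""
    return np.corrcoef(obs, pred)[0, 1]

def nrmse(obs, pred):
    """Root mean square error normalized by the mean of the observations."""
    return np.sqrt(np.mean((obs - pred) ** 2)) / obs.mean()

# Hypothetical observed and predicted discharge series.
obs = np.array([10.0, 12.0, 9.0, 14.0])
pred = np.array([11.0, 12.5, 8.0, 13.0])
scores = (nse(obs, pred), pearson_r(obs, pred), nrmse(obs, pred))
```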

These metrics serve as essential tools in quantifying the predictive performance of our streamflow prediction models, enabling us to assess their effectiveness in capturing the underlying patterns and dynamics of streamflow behavior.

Experiment results and discussion

In this section, we present the experiment results that address the core objective of our study: 24-h streamflow prediction. For this investigation, we utilized a comprehensive dataset comprising historical data on precipitation, evapotranspiration, discharge values from the preceding 72 h, as well as forecast data of 24-h precipitation and evapotranspiration. Our investigation focused on evaluating the performance of the transformer-based model, comparing it against three deep learning models (LSTM, GRU, and Seq2Seq), and a classical method (persistence). To assess the predictive capabilities of these models, we employed three commonly used metrics in hydrological modeling and streamflow forecasting: NSE, Pearson's r, and NRMSE. These metrics provide valuable insights into the accuracy and effectiveness of the models in capturing streamflow patterns.

In the experiments, a crucial aspect involved adjusting the dimensions of the input data and incorporating additional values to accommodate the implementation specifications of the GRU, LSTM, and transformer models. More specifically, the input data for these networks combine previous values and forecast values. The previous values are 72 h of precipitation, evapotranspiration, and discharge, while the forecast values are 24 h of precipitation and evapotranspiration. Thus, one group has a shape of [batch size, 72, 3] and the other [batch size, 24, 2]. To merge and align these two input groups, an extra feature dimension for the forecast values needed to be introduced, and according to the experiment results, what is used to fill this additional dimension affects the results dramatically. Two approaches are considered to handle this dimension discrepancy. One is zero-padding, wherein the forecast values are extended with zeros in the additional dimension. Alternatively, the persistence method can be adopted, wherein the historical values are extended into the forecast period by repeating the last available data, ensuring consistency in the input across time steps. Both techniques are employed to ensure compatibility between the input data and the specific model requirements. Once the additional dimension is added, past and forecast values are merged, yielding an input with a dimension of [batch size, 96, 3] for the transformer, GRU, and LSTM models.
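The two extension strategies can be illustrated in NumPy for a single sample; the assumption that discharge is the last input column, along with the function name, is ours:

```python
import numpy as np

def extend_forecast(past, forecast, mode="persistence"):
    """Merge 72 h of past inputs [72, 3] with 24 h of forecast inputs [24, 2]
    into a single [96, 3] array by filling the missing (discharge) column."""
    if mode == "zero":
        # Zero-padding: extend the forecast rows with zeros.
        fill = np.zeros((forecast.shape[0], 1))
    else:
        # Persistence: repeat the last available discharge value.
        fill = np.full((forecast.shape[0], 1), past[-1, -1])
    return np.concatenate([past, np.hstack([forecast, fill])], axis=0)

past = np.arange(216, dtype=float).reshape(72, 3)   # hypothetical past inputs
fc = np.ones((24, 2))                               # hypothetical forecasts
merged = extend_forecast(past, fc)                  # shape (96, 3)
```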

The results in Table 3 demonstrate the performance comparison of the transformer model using zero-padding and persistence approaches for 24-h streamflow forecasting in four different regions. The NSE scores reveal valuable insights into the model's predictive capabilities under each data extension method. Upon analysis, it becomes evident that the persistence method for data extension consistently outperforms zero-padding in capturing underlying streamflow patterns and dynamics for the transformer model in three of the four analyzed regions. These findings emphasize the critical role of data extension techniques in improving the transformer model's performance for streamflow forecasting tasks.

Table 3

Performance comparison of transformer model for 24-h streamflow forecasting using zero-padding and persistence approaches in four different regions (NSE scores)

                           Bluffton        Fulton          Iowa City       Clarinda
                           Mean   Median   Mean   Median   Mean   Median   Mean   Median
Transformer-zero-padding   0.73   0.70     0.62   0.64     0.24   0.25     0.72   0.72
Transformer-persistence    0.82   0.81     0.70   0.71     0.42   0.42     0.65   0.68

Similar to Table 3, Table 4 displays the NSE scores obtained from the predictions made by the GRU and LSTM models under the zero-padding and persistence data extension methods for each region. Upon analysis, we observe variations in the models' performance across the four regions. Interestingly, for the LSTM model, the zero-padding approach yields higher mean and median NSE scores compared to the persistence method. Conversely, for the GRU model, the persistence method consistently outperforms the zero-padding approach, resulting in higher mean and median NSE scores.

Table 4

Performance comparison of GRU and LSTM models for 24-h streamflow forecasting using zero-padding and persistence approaches in four different regions (NSE scores)

                      Bluffton        Fulton          Iowa City       Clarinda
                      Mean   Median   Mean   Median   Mean   Median   Mean   Median
GRU-zero-padding      0.56   0.54     0.42   0.45     0.12   0.12     −0.40  −0.45
GRU-persistence       0.72   0.72     0.62   0.64     0.19   0.19     0.57   0.59
LSTM-zero-padding     0.77   0.76     0.45   0.44     −0.45  −0.52    0.51   0.53
LSTM-persistence      0.50   0.50     0.40   0.41     −1.50  −1.60    0.09   0.09

In summary, the different performance trends for the three models under the zero-padding and persistence approaches highlight the significance of selecting appropriate data extension techniques in streamflow forecasting tasks, as the effectiveness can vary depending on the model architecture.

Table 5 presents the 24-h streamflow prediction results for four different regions using five different models: persistence, LSTM, Seq2Seq, GRU, and transformer. The results are evaluated using three performance metrics: the mean over the 24 hourly NSE scores, Pearson's r, and NRMSE. In this study, the persistence model serves as the baseline for comparison. While it exhibits moderate performance in some regions, it falls short of capturing the underlying dynamics of streamflow, leading to higher NRMSE values. As expected, it shows limited predictive capabilities compared to the advanced deep learning models. The LSTM and Seq2Seq models demonstrate mixed results across regions. While they achieve reasonably high NSE scores in certain regions, they struggle to consistently outperform the persistence model, especially in the Iowa City and Clarinda regions.
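The persistence baseline is simple enough to state in a few lines; a minimal sketch (hypothetical function and variable names), where the last observed discharge is carried forward over the full forecast horizon:

```python
import numpy as np

def persistence_forecast(discharge_history, horizon=24):
    """Naive baseline: repeat the last observed discharge at every lead time."""
    return np.full(horizon, discharge_history[-1], dtype=float)

history = np.array([10.0, 12.0, 13.0, 15.0])  # toy hourly discharge record
print(persistence_forecast(history, horizon=4))  # [15. 15. 15. 15.]
```

Because the forecast never changes within the horizon, persistence scores well only when streamflow is slowly varying, which is why it breaks down during dynamic events.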

Table 5

24-h Streamflow prediction results (NSE scores)

Region      Metric   Persistence   LSTM    Seq2Seq   GRU     Transformer
Bluffton    NSE      0.58          0.77    0.66      0.72    0.82
            r        0.77          0.88    0.85      0.86    0.91
            NRMSE    1.26          0.92    1.13      1.01    0.82
Fulton      NSE      0.46          0.45    0.58      0.62    0.70
            r        0.73          0.67    0.76      0.78    0.83
            NRMSE    0.98          0.99    0.87      0.83    0.73
Iowa City   NSE      −0.30         −0.45   0.01      0.19    0.42
            r        0.34          0.29    0.16      0.44    0.66
            NRMSE    6.36          6.65    5.53      4.95    4.22
Clarinda    NSE      0.48          0.51    0.29      0.57    0.65
            r        0.74          0.84    0.56      0.90    0.90
            NRMSE    1.23          1.20    1.43      1.08    1.00

This indicates that their recurrent architecture might face challenges in capturing the complex temporal dependencies in streamflow data. The GRU model showcases competitive performance across all regions. With consistent NSE scores and relatively lower NRMSE values compared to LSTM and Seq2Seq models, it proves its capability to effectively model the temporal dynamics in streamflow data. However, it still falls behind the transformer model's overall performance. The transformer model emerges as the top-performing model in 24-h streamflow prediction across all regions. With the highest NSE scores and the lowest NRMSE values among all models, the transformer demonstrates its efficacy in capturing and learning the long-range dependencies and patterns in the time series data. The self-attention mechanism, along with positional encoding, enables the transformer to effectively process and utilize the sequential information, leading to its superior predictive capabilities.
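The sinusoidal positional encoding mentioned above can be sketched in its standard form (following Vaswani et al. 2017; our model's exact implementation details may differ), where even dimensions use sine and odd dimensions cosine, with geometrically spaced frequencies:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Standard sinusoidal positional encoding (Vaswani et al. 2017).
    Assumes d_model is even; returns an array of shape [seq_len, d_model]."""
    pos = np.arange(seq_len)[:, None]                   # [seq_len, 1]
    i = np.arange(d_model // 2)[None, :]                # [1, d_model/2]
    angles = pos / np.power(10000.0, 2 * i / d_model)   # [seq_len, d_model/2]
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe

# One encoding vector per hourly step of the merged 96-h input window;
# the model dimension of 64 here is illustrative, not our tuned value.
pe = positional_encoding(96, 64)
print(pe.shape)  # (96, 64)
```

Adding these vectors to the input embeddings gives the attention layers an explicit notion of each time step's position, which recurrent models obtain implicitly through their sequential processing.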

In Figure 5, we present a graphical comparison of the NSE scores over time for the transformer model and other forecasting models (persistence, LSTM, Seq2Seq, GRU) across four regions. The graph highlights the transformer's outstanding performance in Bluffton, where it maintains an NSE score consistently above 0.9, indicating a strong predictive capability. In Fulton, the transformer model shows a notable improvement over other models, especially after the initial forecasting hours, maintaining scores above 0.6, while the others fluctuate below this threshold. In Iowa City, the transformer demonstrates resilience in a challenging hydrological context, with a steady increase in NSE scores over time, surpassing other models. Clarinda's results are particularly illustrative of the transformer's efficiency, where it consistently outperforms other models with NSE scores remaining above 0.6 for the majority of the forecast period. This visual representation underscores the transformer model's robustness and adaptability in diverse hydrological environments. To complement our statistical analysis and provide a visual representation of our models' performance, Figures A1–A4 illustrate the comparison of forecasted and observed streamflow across four different sites. The figures enable a clear comparison of the temporal patterns captured by each model, underscoring the strengths and limitations of our forecasting approach in diverse watershed conditions.
Figure 5

24-h NSE comparison across regions for streamflow models.


In conclusion, the experimental results highlight the transformer model's significant advantage over other models in 24-h streamflow forecasting. Its powerful self-attention mechanism allows it to efficiently capture and utilize the temporal dependencies in the input time series, resulting in more accurate and reliable predictions compared to traditional LSTM and Seq2Seq models, as well as the GRU model. These findings underscore the importance of leveraging advanced deep learning architectures like the transformer in hydrological modeling and streamflow forecasting tasks, offering valuable insights for the research community and practical applications in water resource management and flood forecasting.

However, it is crucial to recognize certain limitations that are inherent to streamflow forecasting models in general. One notable observation is the varying performance across different watersheds, as seen in the Iowa City basin. This basin, being the smallest among those studied, presented challenges not only for the transformer model but also for other models. The reasons for these variations are complex and might be related to factors such as watershed size, land use patterns, or specific hydrological characteristics. This indicates a need for further investigation into how different environmental and regional factors influence model performance. Additionally, across all models, we observed a trend of diminishing NSE scores over the 24-h forecasting period. This pattern suggests that while these models are effective in short-term forecasting, their accuracy tends to decrease over longer periods. This is a critical area for future research, which could focus on enhancing long-term forecasting capabilities and examining the causes of this accuracy decline.
Furthermore, while the extended inference time of the transformer model (0.95 s, compared to 0.51 s for the LSTM and 0.49 s for the GRU, for a batch of 32 examples) may not significantly impact our 24-h prediction window, it is a crucial consideration for other environmental and hydrological tasks that demand immediate action, such as flash flood early warning systems or real-time water quality monitoring. This computational overhead, a known characteristic of the transformer architecture (Tay et al. 2022), underscores the broader importance of selecting models not only for their accuracy but also for their operational efficiency in various hydrological contexts. Acknowledging these broader challenges and limitations is important for advancing the field of hydrological modeling. Our findings, while highlighting the strengths of the transformer model, also point to the ongoing need for research and development to address these complex issues in streamflow prediction.

In this study, we conducted an in-depth investigation of 24-h streamflow forecasting using various deep learning models, with a particular focus on the transformer architecture. Through extensive experimentation and analysis, we compared the performance of five different models across four distinct regions. The results demonstrate that the transformer model consistently outperforms other models, including persistence, LSTM, Seq2Seq, and GRU, in terms of accuracy and predictive capabilities. The transformer's powerful self-attention mechanism, along with positional encoding, enables it to effectively capture long-range dependencies and underlying patterns in the input time series data. Consequently, the transformer model excels in providing accurate and reliable streamflow predictions.

Furthermore, we explored the influence of two data extension methods, zero-padding and persistence, on the model's performance. The findings indicate that the persistence method, which incorporates historical streamflow data, generally yields superior results compared to zero-padding, although the LSTM model was a notable exception. This underscores the importance of carefully considering data extension techniques to improve the model's forecasting accuracy.

Overall, our research contributes valuable insights into the field of hydrological modeling and streamflow forecasting, with the transformer model exhibiting superior accuracy. This model's success in streamflow prediction opens new opportunities in water resource management, where precise forecasts can inform reservoir level adjustments and water distribution planning, crucial for drought mitigation and flood prevention. In flood prediction, the model's reliable forecasts provide essential data for developing proactive flood management strategies, enhancing emergency response, and safeguarding communities and infrastructure.

The applications of the transformer model also extend to sectors reliant on streamflow predictions. In agriculture, accurate forecasts from the transformer model can guide irrigation scheduling, contributing to water conservation and crop yield optimization. In urban planning, insights from streamflow predictions can be pivotal in designing robust drainage systems and managing sewage overflow events, especially in cities prone to sudden water level changes. As future work, we suggest exploring the applicability of the transformer in handling larger datasets and further investigating the impact of different hyperparameters on the model's performance. The knowledge gained from this study can significantly benefit water management practices, supporting sustainable decision-making and mitigation strategies in the face of increasingly unpredictable weather patterns and climate change.

All relevant data are included in the paper. In addition, the dataset and benchmark models are accessible at https://github.com/uihilab/WaterBench.

The authors declare there is no conflict of interest.

Alabbad Y. & Demir I. 2022 Comprehensive flood vulnerability analysis in urban communities: Iowa case study. International Journal of Disaster Risk Reduction 74, 102955.

Arnold J. 1994 SWAT-Soil and Water Assessment Tool.

Arnold J. G., Moriasi D. N., Gassman P. W., Abbaspour K. C., White M. J., Srinivasan R., Santhi C., Harmel R. D., Van Griensven A., Van Liew M. W. & Kannan N. 2012 SWAT: Model use, calibration, and validation. Transactions of the ASABE 55 (4), 1491–1508.

Banholzer S., Kossin J. & Donner S. 2014 The impact of climate change on natural disasters. In: Singh, A. & Zommers, Z. (eds.) Reducing Disaster: Early Warning Systems for Climate Change. Springer, Dordrecht, The Netherlands, pp. 21–49.

Bayar S., Demir I. & Engin G. O. 2009 Modeling leaching behavior of solidified wastes using back-propagation neural networks. Ecotoxicology and Environmental Safety 72 (3), 843–850.

Castangia M., Grajales L. M. M., Aliberti A., Rossi C., Macii A., Macii E. & Patti E. 2023 Transformer neural networks for interpretable flood forecasting. Environmental Modelling & Software 160, 105581.

Chen Z., Lin H. & Shen G. 2023 TreeLSTM: A spatiotemporal machine learning model for rainfall-runoff estimation. Journal of Hydrology: Regional Studies 48, 101474.

Cho K., Van Merriënboer B., Bahdanau D. & Bengio Y. 2014 On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. arXiv preprint arXiv:1409.1259.

Davenport F. V., Burke M. & Diffenbaugh N. S. 2021 Contribution of historical precipitation change to US flood damages. Proceedings of the National Academy of Sciences 118 (4).

Demir I. & Beck M. B. 2009 GWIS: A prototype information system for Georgia watersheds. In Georgia Water Resources Conference: Regional Water Management Opportunities, Athens, GA, USA.

Demir I., Xiang Z., Demiray B. & Sit M. 2022 WaterBench-Iowa: A large-scale benchmark dataset for data-driven streamflow forecasting. Earth System Science Data 14 (12), 5605–5616.

Demiray B. Z., Sit M. & Demir I. 2023 EfficientTempNet: Temporal Super-Resolution of Radar Rainfall. arXiv preprint arXiv:2303.05552.

Devia G. K., Ganasri B. P. & Dwarakish G. S. 2015 A review on hydrological models. Aquatic Procedia 4, 1001–1007.

Diffenbaugh N. S., Singh D., Mankin J. S., Horton D. E., Swain D. L., Touma D., Charland A., Liu Y., Haugen M., Tsiang M. & Rajaratnam B. 2017 Quantifying the influence of global warming on unprecedented extreme climate events. Proceedings of the National Academy of Sciences 114 (19), 4881–4886.

Frame J. M., Kratzert F., Klotz D., Gauch M., Shalev G., Gilon O., Qualls L. M., Gupta H. V. & Nearing G. S. 2022 Deep learning rainfall–runoff predictions of extreme events. Hydrology and Earth System Sciences 26 (13), 3377–3392.

Hochreiter S. & Schmidhuber J. 1997 Long short-term memory. Neural Computation 9 (8), 1735–1780.

Honorato A. G. D. S. M., Silva G. B. L. D. & Guimaraes Santos C. A. 2018 Monthly streamflow forecasting using neuro-wavelet techniques and input analysis. Hydrological Sciences Journal 63 (15–16), 2060–2075.

Ibrahim K. S. M. H., Huang Y. F., Ahmed A. N., Koo C. H. & El-Shafie A. 2022 A review of the hybrid artificial intelligence and optimization modelling of hydrological streamflow forecasting. Alexandria Engineering Journal 61 (1), 279–303.

Krajewski W. F., Ghimire G. R., Demir I. & Mantilla R. 2021 Real-time streamflow forecasting: AI vs. Hydrologic insights. Journal of Hydrology X 13, 100110.

Kratzert F., Klotz D., Brenner C., Schulz K. & Herrnegger M. 2018 Rainfall–runoff modelling using long short-term memory (LSTM) networks. Hydrology and Earth System Sciences 22 (11), 6005–6022.

Krause P., Boyle D. P. & Bäse F. 2005 Comparison of different efficiency criteria for hydrological model assessment. Advances in Geosciences 5, 89–97.

Lee T. H. & Georgakakos K. P. 1996 Operational rainfall prediction on Meso-γ scales for hydrologic applications. Water Resources Research 32 (4), 987–1003.

Lin T., Wang Y., Liu X. & Qiu X. 2022 A survey of transformers. AI Open 3, 111–132.

Munich Re 2022 Hurricanes, Cold Waves, Tornadoes: Weather Disasters in USA Dominate Natural Disaster Losses in 2021.

NDRCC 2021 2020 Global Natural Disaster Assessment Report.

NOAA National Centers for Environmental Information (NCEI) 2022 US Billion-Dollar Weather and Climate Disasters. Available from: https://www.ncei.noaa.gov/access/monitoring/billions/. doi:10.25921/stkw-7w73.

Ren-Jun Z. 1992 The Xinanjiang model applied in China. Journal of Hydrology 135 (1–4), 371–381.

Salas J. D., Markus M. & Tokar A. S. 2000 Streamflow forecasting based on artificial neural networks. In: Govindaraju, R. S. & Rao, A. R. (eds.) Artificial Neural Networks in Hydrology. Springer, Dordrecht, The Netherlands, 36, pp. 23–51.

Sharma P. & Machiwal D. 2021 Advances in Streamflow Forecasting: From Traditional to Modern Approaches. Elsevier, Amsterdam, The Netherlands.

Sit M., Demiray B. & Demir I. 2021a Short-term Hourly Streamflow Prediction with Graph Convolutional GRU Networks. arXiv preprint arXiv:2107.07039.

Sit M., Seo B. C. & Demir I. 2021b Iowarain: A Statewide Rain Event Dataset Based on Weather Radars and Quantitative Precipitation Estimation. arXiv preprint arXiv:2107.03432.

Sit M., Demiray B. Z. & Demir I. 2022a A Systematic Review of Deep Learning Applications in Streamflow Data Augmentation and Forecasting. EarthArxiv 3617. Available from: https://doi.org/10.31223/X5HM08.

Sit M., Demiray B. Z. & Demir I. 2022b A Systematic Review of Deep Learning Applications in Interpolation and Extrapolation of Precipitation Data. EarthArxiv 4715. Available from: https://doi.org/10.31223/X57H2H.

Sit M., Seo B. C., Demiray B. Z. & Demir I. 2023a Efficientrainnet: Smaller Neural Networks Based on Efficientnetv2 for Rainfall Nowcasting. EarthArxiv 5232. Available from: https://doi.org/10.31223/X5VQ1S.

Sit M., Demiray B. Z. & Demir I. 2023b Spatial downscaling of streamflow data with attention based spatio-temporal graph convolutional networks. EarthArxiv 5227. Available from: https://doi.org/10.31223/X5666M.

Strauss B. H., Kopp R. E., Sweet W. V. & Bittermann K. 2016 Unnatural Coastal Floods: Sea Level Rise and the Human Fingerprint on US Floods Since 1950. Climate Central.

Tay Y., Dehghani M., Bahri D. & Metzler D. 2022 Efficient transformers: A survey. ACM Computing Surveys 55 (6), 1–28.

Trenberth K. E., Cheng L., Jacobs P., Zhang Y. & Fasullo J. 2018 Hurricane Harvey links to ocean heat content and climate change adaptation. Earth's Future 6 (5), 730–744.

Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A. N., Kaiser Ł. & Polosukhin I. 2017 Attention is all you need. Advances in Neural Information Processing Systems 30, 5998–6008.

Wen Q., Zhou T., Zhang C., Chen W., Ma Z., Yan J. & Sun L. 2022 Transformers in Time Series: A Survey. arXiv preprint arXiv:2202.07125.

World Meteorological Organization (WMO) 2021 The Atlas of Mortality and Economic Losses From Weather, Climate and Water Extremes (1970–2019).

Wu H., Xu J., Wang J. & Long M. 2021 Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in Neural Information Processing Systems 34, 22419–22430.

Xiang Z. & Demir I. 2021 High-Resolution Rainfall-Runoff Modeling Using Graph Neural Network. arXiv preprint arXiv:2110.10833.

Xiang Z. & Demir I. 2022a Real-Time Streamflow Forecasting Framework, Implementation and Post-analysis Using Deep Learning. EarthArxiv 3162. Available from: https://doi.org/10.31223/X5BW6R.

Xiang Z. & Demir I. 2022b Fully Distributed Rainfall-Runoff Modeling Using Spatial-Temporal Graph Neural Network. EarthArxiv 3018. Available from: https://doi.org/10.31223/X57P74.

Xiang Z., Demir I., Mantilla R. & Krajewski W. F. 2021 A Regional Semi-Distributed Streamflow Model Using Deep Learning. EarthArxiv 2152. Available from: https://doi.org/10.31223/X5GW3V.

Yan J., Jin J., Chen F., Yu G., Yin H. & Wang W. 2018 Urban flash flood forecast using support vector machine and numerical simulation. Journal of Hydroinformatics 20 (1), 221–231.

Yaseen Z. M., El-Shafie A., Jaafar O., Afan H. A. & Sayl K. N. 2015 Artificial intelligence based models for stream-flow forecasting: 2000–2015. Journal of Hydrology 530, 829–844.

Yaseen Z. M., Ebtehaj I., Bonakdari H., Deo R. C., Mehr A. D., Mohtar W. H. M. W., Diop L., El-Shafie A. & Singh V. P. 2017 Novel approach for streamflow forecasting using a hybrid ANFIS-FFA model. Journal of Hydrology 554, 263–276.

Yaseen Z. M., Awadh S. M., Sharafati A. & Shahid S. 2018 Complementary data-intelligence model for river flow simulation. Journal of Hydrology 567, 180–190.

Yildirim E. & Demir I. 2022 Agricultural flood vulnerability assessment and risk quantification in Iowa. Science of The Total Environment 826, 154165.

Zhou H., Zhang S., Peng J., Zhang S., Li J., Xiong H. & Zhang W. 2021 Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, No. 12, pp. 11106–11115.

Zhou T., Ma Z., Wen Q., Wang X., Sun L. & Jin R. 2022 Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International Conference on Machine Learning, PMLR, pp. 27268–27286.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
