Abstract
Skillful streamflow forecasts can inform decisions in various areas of water policy and management. We integrate numerical weather prediction ensembles, a distributed hydrological model, and machine learning to generate ensemble streamflow forecasts at medium-range lead times (1–7 days). We demonstrate the application of machine learning as a postprocessor for improving the quality of ensemble streamflow forecasts. Our results show that the machine learning postprocessor can improve streamflow forecasts relative to low-complexity forecasts (e.g., climatological and temporal persistence) as well as standalone hydrometeorological modeling and a standalone neural network. The relative gain in forecast skill from the postprocessor is generally higher at medium-range timescales than at shorter lead times, for high flows than for low–moderate flows, and in the warm season than in the cool season. Overall, our results highlight the benefits of machine learning for improving both the skill and reliability of streamflow forecasts.
HIGHLIGHTS
We integrate numerical weather prediction ensembles and distributed hydrological models to generate ensemble streamflow forecasts.
We compare the performance of a machine learning-based postprocessor with the standalone configurations of hydrometeorological modeling and machine learning.
The machine learning postprocessor improves both the skill and reliability of ensemble streamflow forecasts.
INTRODUCTION
Reliable and skillful streamflow forecasts are crucial for informing decisions related to water resources management, water supply planning, and preparedness against extreme events. Ensemble prediction systems are becoming increasingly popular for streamflow forecasting (Cloke & Pappenberger 2009; Demargne et al. 2014; Troin et al. 2021; Hapuarachchi et al. 2022; Liu et al. 2022), as they have demonstrated substantial improvements over single-valued deterministic forecasts (Siddique & Mejia 2017). An ensemble prediction system can provide multiple realizations of possible streamflow conditions, giving decision-makers a better idea of the likelihood of a specific future event (e.g., the probability of exceeding a flood threshold). More specifically, ensembles can provide an estimate of predictive uncertainty that can help decision-makers determine the level of confidence they can place in the forecast.
Within an ensemble prediction system, hydrological models are generally forced with ensemble meteorological forecasts from weather prediction models (Demargne et al. 2014; Pagano et al. 2016; Alfieri et al. 2017; Siddique & Mejia 2017; Zhang et al. 2020; Hapuarachchi et al. 2022; National Weather Service 2022) as opposed to streamflow simulations, which are often generated by hydrological models forced with meteorological observations (Kratzert et al. 2018; Feng et al. 2020; Konapala et al. 2020). In the United States, the NOAA's National Weather Service River Forecast Centers are implementing the Hydrological Ensemble Forecast Service (HEFS) to incorporate meteorological ensembles into their flood forecasting operations (Brown et al. 2014; Kim et al. 2018; National Weather Service 2022). Other examples of systems that have adopted the ensemble paradigm include the European Flood Awareness System from the European Commission (Alfieri et al. 2017) and the Flood Forecasting and Warning Service from the Australian Bureau of Meteorology (Pagano et al. 2016). However, shortcomings in the hydrologic model structure and parameters, inadequate representation of physical processes, and biased meteorological forcing can introduce biases in ensemble streamflow forecasts (Brown et al. 2014). Uncertainties in the ensemble prediction system are of both meteorological and hydrological origins (Demargne et al. 2014); hence, the bias structure can be different in an ensemble prediction system than in hydrologic simulations (Siddique & Mejia 2017). Correcting these residual errors and biases can improve the skill and reliability of streamflow forecasts (Regonda et al. 2013).
Hydrologic postprocessors are used to quantify total predictive uncertainty and correct forecast biases (Seo et al. 2006; Regonda et al. 2013; López López et al. 2014; Alizadeh et al. 2020). The ensemble postprocessor (EnsPost) is an integral part of the NOAA's HEFS system (Seo et al. 2006). EnsPost is a typical Hydrologic Model Output Statistics (HMOS) approach that relies on the combination of probability matching and autoregressive modeling to correct streamflow forecast biases. Several other HMOS postprocessors have been proposed, including Logistic regression (Duan et al. 2007), Quantile regression (Koenker 2005), Autoregressive exogenous model (Regonda et al. 2013), and General linear model (Zhao et al. 2011). However, HMOS postprocessors (i) show limited performance across longer lead times (beyond ∼day 3), particularly for random errors (Sharma et al. 2019); (ii) are often irrelevant for hydrologic conditions (e.g., extreme events) outside the training period (Siddique & Mejia 2017); and (iii) fail to capture the nonlinear dynamics in hydrometeorological predictions (Regonda et al. 2013).
Another avenue to address shortcomings in postprocessing is machine learning. Machine learning algorithms identify the nonlinear patterns in a historical dataset during training and use those patterns to correct for systematic ensemble biases. Postprocessing the highly nonlinear Numerical Weather Prediction (NWP) outputs is one of many applications of machine learning techniques (Rasp & Lerch 2018; Loken et al. 2019, 2020; Grönquist et al. 2021; Kirkwood et al. 2021). Machine learning applications in hydrology broadly span rainfall–runoff modeling (Van et al. 2020), groundwater modeling (Wunsch et al. 2021), hydrologic predictions (Panahi et al. 2022), and climate change impact assessment (Bai et al. 2021), among others. For instance, Kratzert et al. (2019) demonstrated the ability of the Long Short-Term Memory (LSTM) neural network in simulating streamflow at ungauged basins based on static catchment characteristics. Konapala et al. (2020) built a hybrid framework by coupling the neural network with the hydrological model output to postprocess streamflow simulations in diverse catchments across the conterminous United States. Recently, Frame et al. (2021) demonstrated the application of LSTM networks in postprocessing National Water Model output. Although recent studies (Kratzert et al. 2018; Tyralis et al. 2019; Feng et al. 2020; Konapala et al. 2020; Xiang & Demir 2020; Frame et al. 2021; Sikorska-Senoner & Quilty 2021; Cho & Kim 2022) have shown important applications of machine learning to improve various aspects of hydrologic modeling and simulations, their ability to improve the skill and reliability of streamflow forecasts obtained from the ensemble prediction system has not been examined rigorously (Alizadeh et al. 2021; Lee & Ahn 2021).
Machine learning configurations in forecasting mode range from standalone (Cheng et al. 2020) to hybrid (Hunt et al. 2022) to postprocessing (Liu et al. 2022). Cheng et al. (2020) used an artificial neural network and LSTM to forecast streamflow at daily and monthly scales. Hunt et al. (2022) developed a hybrid LSTM configuration trained with catchment-mean meteorological and hydrological variables to produce skillful medium-range streamflow forecasts across various climate regions over the western United States. Liu et al. (2022) integrated meteorological forecasts, hydrological modeling, and machine learning to improve flood forecasting over a cascade reservoir catchment. Overall, studies indicate that the relative effects of machine learning depend strongly on the forecasting system (e.g., forcing, hydrological model), forecasting conditions (e.g., lead time, study area, flow threshold, season), and machine learning configurations (e.g., standalone, hybrid, postprocessor) underscoring the research need for continuous rigorous verification of new forecasting systems that incorporate NWP, hydrological modeling and machine learning.
We demonstrate the application of a machine learning-based postprocessing approach for medium-range ensemble streamflow forecasts generated using a hydrologic ensemble prediction system. We use the National Centers for Environmental Prediction Global Ensemble Forecast System Reforecast version 2 (GEFSRv2; Hamill et al. 2013) to force a spatially distributed hydrological model and generate raw ensemble streamflow forecasts at medium-range lead times (1–7 days). The raw ensemble streamflow forecast is used to configure the neural network postprocessor, which we describe in detail in Section 2. Then, we assess the quality of machine learning postprocessed forecasts relative to the climatological, persistence, deterministic, and raw ensemble forecasts. We also compare the performance of machine learning-based postprocessor with the standalone configurations of hydrometeorological modeling and machine learning. Here, we address two main questions: (i) How skillful are the machine learning postprocessed ensemble streamflow forecasts at medium-range forecast lead times? and (ii) What forecast conditions (e.g., lead time, season, and flow threshold) benefit the most from machine learning?
MATERIALS AND METHODS
Hydrologic modeling
Map of the study area showing (a) topography, stream network, location of the selected gauge station (green dot), and (b) land cover types. The inset map shows the approximate location of the study area in the United States. Please refer to the online version of this paper to see this figure in color: http://dx.doi.org/10.2166/hydro.2022.114.
We employ a Regional Hydrological Ensemble Prediction System (RHEPS; Siddique & Mejia 2017; Sharma et al. 2019) to generate raw ensemble streamflow forecasts at the selected USGS gauge station (i.e., USGS 01510000). The RHEPS is an ensemble-based river forecasting system aimed at enhancing hydrologic forecasting at a regional spatial scale by integrating new system components within a verifiable scientific and experimental setting. RHEPS uses multisensor precipitation estimates and gridded near-surface air temperature observation datasets (Siddique & Mejia 2017) to run the hydrological model in simulation mode and to train the standalone LSTM neural network. RHEPS uses precipitation and near-surface temperature ensemble forecasts from the National Centers for Environmental Prediction GEFSRv2 (Hamill et al. 2013) as forcing to the NOAA's Hydrology Laboratory Research Distributed Hydrologic Model (Koren et al. 2004) and generates ensemble streamflow forecasts. The GEFSRv2 is an 11-member ensemble forecast generated by perturbing the NWP model initial conditions using the ensemble transform technique with rescaling. The hydrologic model runs were initiated once a day at 00:00 Coordinated Universal Time. Each forecast cycle consists of 6-hourly streamflow forecasts that extend from day 1 to day 7. We run the hydrologic model in a fully distributed manner at a spatial resolution of 2 km × 2 km. Each streamflow forecast consists of 11 ensemble members, one of which is an unperturbed member, while the rest are generated with GEFSRv2 perturbed initial conditions. Deterministic forecasts are obtained by forcing the hydrologic model with the unperturbed GEFSRv2 member. Streamflow forecasts from the RHEPS are produced for medium-range time scales and the period of 2004–2012. Daily streamflow observations for the selected location were obtained from the USGS. The streamflow observations are used to verify the raw and LSTM-postprocessed ensemble streamflow forecasts.
LSTM network
LSTM-enabled ensemble streamflow forecasting framework. The framework consists of numerical weather prediction output to force the distributed hydrologic model and produce raw ensemble streamflow forecasts. Raw ensemble streamflow forecast is used as an input to the LSTM model. We also show here the internals of the LSTM cell, where f stands for the forget gate, i for the input gate, o for the output gate, ct denotes the cell state at time step t, and ht denotes the hidden state.
![](https://iwa.silverchair-cdn.com/iwa/content_public/journal/jh/25/1/10.2166_hydro.2022.114/3/m_hydro-d-22-00114if01.gif?Expires=1732031603&Signature=gjOCLwfhL~aH5z0nqA2hN1xToRIpYhd6bnRH0gcWSeNf7KSriPgQMdHpXGnzzImjuZw~rTYJLRt8hcVNilq9NQgPcE3LmzoRvCEhbBwH1rST0xgsFreFTvF~GRgYswJMRA4ql1tKT3p0opuTvb~yYeFBvjCJqhqOUbhj3abpiUFMvHVbf1vdgZM70QLzRwLDr024t43XH-SRoP7qIr7~Rl74PI6-IfipfhP2MUxxG4JBuOHIalQ5hgFHWCoruvz6nQE~MQSkQs1Z8b-HGTUt957XKHTK~117dMFiSa242d2m3koHO-XA9bKz~PEOcwqWKmF5sz3HV4PfYcBYdnfj6A__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
![](https://iwa.silverchair-cdn.com/iwa/content_public/journal/jh/25/1/10.2166_hydro.2022.114/3/m_hydro-d-22-00114if02.gif?Expires=1732031603&Signature=cZih719UzxViHl~jPcKFafVqOtd43MMZij6FPhb1x2UMZ8SWA7xN7UqamvWK9NsC73f-Gp0c6KY3NrPcMKG03lTqtWq01mFdf2virABUFVJRlyzaCTEfDXDOZOAX7MbuSpUf35Plt9anA4SVQCZDjMDjNDRVB8auP3ZQFF9bR4yhk7p-p1s2frOsLXK2unKk2kRHSQeRJwwLkhVHjVfmdGIJRIjQ8aduV2Y5L0pm~-LRnKI2I6rUiOsq0oC8A~2SDqJ05bQgs9AR10ePgSBG48gE8EWOK6xO-IIVs2NA8wRk56qw-3yUEgsqxuTiqAYZ8WIuuBS7My5Z1L8pL7iTKQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
![](https://iwa.silverchair-cdn.com/iwa/content_public/journal/jh/25/1/10.2166_hydro.2022.114/3/m_hydro-d-22-00114if03.gif?Expires=1732031603&Signature=jZrEGTecBBzdWORPH3D9lqBgWi2Fu8XtTMDT5SIiZHK8jU9Dnh2yAiNfL38J7vm525BauiBp1g7i~u-X-uDPoy1z27wybIz6kgMENPBQEGv~ZkIUZmdkrDQEEk6c~mQ34MycBJz8vJWuzpTPXu8GqMAHOczCmsayaBrXRc6okxQ8jrYEm47PhgHAmbgCw3IcDDdOJmj~DOE~193kPHDuITxwhgY4nlPxOEutH3LO4qdZEX24M~PO9BTvheB3euugb6HNsQHg3wnue2iOObBWgRWgi~BDA8hH51yPQ2Ets0Kl5rh1oKVHWqol4Xg8DIw8FSLSCw7CN0BODiHXNGxUlQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
![](https://iwa.silverchair-cdn.com/iwa/content_public/journal/jh/25/1/10.2166_hydro.2022.114/3/m_hydro-d-22-00114if04.gif?Expires=1732031603&Signature=ko64hIjj2Ff62mQw2lRaxy8HDhHzNLzMUOniIFY7NhjqnJR2Ym-22xEKxFypr7CXGUCHIanZIN6I3GntwJVP3vWkDsSMJ3r~CmO76HCLg0Y4jr0pUuOeMYDd7GKf6aOpOalp9UTjKeWPpxY0Agje-JQloFloyUCnkel55E1PCuqFRZ9WiFLuFVadutBk-aVmuGgTnVs4EoC8n0llWiS4SBRcCEERdTHCG4fgkHt9zVPBXb1B3zXC9aqgoSNRf~afAL8pFzKzOqCjYSchevQxhla35OEsqgJF6YAIeGgOrul69~goOFNXEVk~nuq1ic1dqgx2EuwOn9NtX9mhV7PVsw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
![](https://iwa.silverchair-cdn.com/iwa/content_public/journal/jh/25/1/10.2166_hydro.2022.114/3/m_hydro-d-22-00114if05.gif?Expires=1732031603&Signature=ibPOkpK4ica5Xo5s5ZUZkqg1CgFZFQpEkxxtyZwB789LMOeyVDJgWVSemFJ4l0YVGhtgBYqEfbfTpHlIBT3t3op2uAE9sorGgIX-hNXw60dsuATeOsqfAuTxBKuak9enN30uVR8D-RqY5k-GXpUAAvo0Nkes7A6XP2S5ODsHTxIjMRJ1AFMUed464uj1fqrXwaqBffWvxP5oyHTIN4X5oVg7TaxWG-3l2fHYAvL8ER0tU2ym1qEm40BXS8UW7JzRa5KG~sHaw3-xc4ExlrlMljzjWST7JDC7qeoYRBXAjEIcgSX3fMiPAhKlDBuCpULo-8s~Wk-~EOL8LJrJ08Ar-g__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
![](https://iwa.silverchair-cdn.com/iwa/content_public/journal/jh/25/1/10.2166_hydro.2022.114/3/m_hydro-d-22-00114if06.gif?Expires=1732031603&Signature=cqunhizd4HP0vOeir42Surp-ltqshFrL7j~QI6cO-6Y3LQ3LO0xVbDvazypbkqmuewIcV7mSwkTZLDKSEmnbZYbggJJI~XH2S4dfaC-KuFYeNo-0G3VA1KhzPdzumgUkux9lOy7vYbuBAuRPjAU9Js-vw~QW7ZCvtcnSu0Y5rYMttF-nBAm~7W2fwhARwRtRZ0H~sYZg4h14HWW48-a~QNLyRPNyiHOh-wRanQtlqWf9-J7U~Wq-01VGkm78Jq-hbiL3-9jzqYzEu7ks15lrGfWrJomUsEY4WBQBMQD1pXACheRm36-7ymNiiTxVHYB7EXu41oEEK3Lmvjn7-BjpHA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
![](https://iwa.silverchair-cdn.com/iwa/content_public/journal/jh/25/1/10.2166_hydro.2022.114/3/m_hydro-d-22-00114if07.gif?Expires=1732031603&Signature=pViNG-bLv6ZYU3Bj4sad0PO5rLLuht93Vz4CczzpqCzXzupTPa-u3JAGArZr8FB6lbrBHZBf0RKN-JOIDMdlN0EE~3Csdf9k7f1GGD-72r48uThTW9Yh02z0eLGBGLaNSTvu-yp-bU6aWrmdbzs2nhws2BVCj2miDxfDk2wBh~g0anpy0DQR2ZJQbFpqIjlvuL9g5DGyIs7D6q2b2wRNdd4vDCY0elko5sipCnBOCi13ZmF1-S3JZgWzIyeIHJuu~bQVhoEA7UiUrucjUaetw5YKPG-eKTXQDxmcJ-XmqUHByHNovsABzkRkfC2t~vVZbxJ8~07ap9zTsypkVF4cIw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
![](https://iwa.silverchair-cdn.com/iwa/content_public/journal/jh/25/1/10.2166_hydro.2022.114/3/m_hydro-d-22-00114if08.gif?Expires=1732031603&Signature=QglnB89l0ShKoikCxwoarpQM21AThAX4M1H3MahXMt69~QC75CRZ7guG1ngBSIA6En89P2wb0WU1alDp7ZlwVNQld5LDmMK8wTV7Kqmquk0XG~aXBCkf6izL~A6Ow4wYKHfTVN~4kxvNLDbT1E3DCDlSfZXrs7ZmaNyhvthK5bi2m~JxJx41ppn~fj5YfbaNXcA2DCt5V~urQcW9bnSSTvAeVBYMhrkUyfmE2HdkAcrh952sydyZNPBijuWPSeg-wztZYT6Ait7OsMufUPh6EfREhaxcKTciQHLCBhXqVR4Ou~MxxEFYZXRaOeeUUgFsMrcAREcAhYgLndgwAGLp5A__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
![](https://iwa.silverchair-cdn.com/iwa/content_public/journal/jh/25/1/10.2166_hydro.2022.114/3/m_hydro-d-22-00114if09.gif?Expires=1732031603&Signature=1EsQJ~yJY03F6gLvG4nv9XCv3x85tc2sOASFxk0v-QcgKB2gxljGUncYN9MjjhujGPFVCecq7dU~5uDnFgrk9UBBzF4o3stDdFQC3m0n32Wx9l42dA34rZ1wsxfIEk858ks40dTHwjHMl~2VnUAu~xjZirIdnpfMzPLyEpkm6fqkT8hXsjoUnOlQZEWBYcf2bZ9oQZ-4CIbfo8-EzPjzNk1WQsv70pf~SZYmYqLj4CGqjRqFqSMApcoRGT9N0wMxugm1pDIGll3Hef1rmliwhyX3kSjygqh42xIVrGXhrxMoMWN8jb7F51Xf-5Zp03DNLCUk64lDiHxpDhqur3Z2gA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
The intuition behind this network is that the cell states behave as the memory unit to remember useful information through the different operations of each gate.
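To make the role of the gates concrete, the following is a minimal NumPy sketch of a single time step of a standard LSTM cell, using the gate names given in the figure caption (f, i, o, cell state c, hidden state h). The function name `lstm_step`, the stacked weight layout, and the random toy weights are assumptions of this sketch and are not taken from the study's implementation.

```python
import numpy as np


def sigmoid(x):
    """Logistic function used by the forget, input, and output gates."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """Advance a standard LSTM cell by one time step.

    W (4n x n_in), U (4n x n), and b (4n) stack the parameters of the
    forget gate, input gate, candidate update, and output gate, in that
    order. This stacking order is a convention of this sketch.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:n])            # forget gate: how much old memory to keep
    i = sigmoid(z[n:2 * n])        # input gate: how much new information to admit
    g = np.tanh(z[2 * n:3 * n])    # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])    # output gate: how much memory to expose
    c_t = f * c_prev + i * g       # updated cell state (the memory unit)
    h_t = o * np.tanh(c_t)         # updated hidden state
    return h_t, c_t


# Toy run: one input feature, 20 hidden states, three time steps.
rng = np.random.default_rng(0)
n_in, n_hidden = 1, 20
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for x in [0.2, 0.5, 0.9]:
    h, c = lstm_step(np.array([x]), h, c, W, U, b)
```

The cell state is only modified by element-wise scaling (forget gate) and addition (input gate times candidate), which is what lets it carry information across many time steps.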
We trained and tested two different configurations of the LSTM neural network to generate daily ensemble streamflow forecasts. First, a standalone LSTM neural network is trained with meteorological observations, including multisensor precipitation estimates and gridded surface temperature. The trained standalone LSTM is forced by the precipitation and near-surface temperature ensemble forecasts from the GEFSRv2 to produce ensemble streamflow forecasts. Second, we trained and tested an LSTM model that simulates streamflow forecast residuals based on the time series of raw ensemble streamflow forecasts generated by forcing a distributed hydrologic model with GEFSRv2. The raw ensemble streamflow forecast residual is the difference between the raw ensemble member forecasts and the observations. Here, we implemented the LSTM separately for each lead time (days 1–7) and each ensemble member. The estimated residual is then added to the corresponding raw ensemble member from the hydrologic model to generate the postprocessed ensemble streamflow forecasts. We implemented the LSTM for the period of 2004–2012, with 6 years (2004–2009) for training the LSTM network and the remaining 3 years (2010–2012) for validation. For this, we generated 6-hourly streamflow forecasts, since this is a temporal resolution often used in medium-range operational forecasting in the United States. Note that we computed the mean daily flow forecast from the 6-hourly flow forecasts. The LSTM is then configured for the mean daily forecast residuals at forecast lead times from days 1 to 7.
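The data handling around the residual postprocessor can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the array layout (forecast cycles × members × 6-hourly steps), the function names, and the toy values are hypothetical, and the residual is taken here as observation minus raw forecast so that adding it to the raw forecast moves the forecast toward the observation. In practice the residual would be predicted by the trained LSTM for each lead time and member; here the true residual stands in for that prediction.

```python
import numpy as np

STEPS_PER_DAY = 4  # four 6-hourly values per day


def daily_mean(six_hourly):
    """Average 6-hourly flows along the last axis into mean daily flows."""
    n_days = six_hourly.shape[-1] // STEPS_PER_DAY
    trimmed = six_hourly[..., : n_days * STEPS_PER_DAY]
    new_shape = six_hourly.shape[:-1] + (n_days, STEPS_PER_DAY)
    return trimmed.reshape(new_shape).mean(axis=-1)


def forecast_residual(obs_daily, raw_daily):
    """Residual per cycle, member, and lead day.

    Sign convention assumed in this sketch: observation minus raw forecast.
    obs_daily has shape (cycles, days); raw_daily (cycles, members, days).
    """
    return obs_daily[:, None, :] - raw_daily


def apply_correction(raw_daily, predicted_residual):
    """Add the (predicted) residual to each raw ensemble member."""
    return raw_daily + predicted_residual


# Toy data: 2 forecast cycles, 11 members, 28 six-hourly steps (7 days).
n_cycles, n_members, n_steps = 2, 11, 28
raw_6h = np.tile(np.arange(n_steps, dtype=float), (n_cycles, n_members, 1))
raw_daily = daily_mean(raw_6h)                # shape (2, 11, 7)
obs_daily = np.full((n_cycles, 7), 10.0)      # hypothetical observations
residual = forecast_residual(obs_daily, raw_daily)
postprocessed = apply_correction(raw_daily, residual)
```

Because a separate model is fitted per lead time and member, the training target for lead day d and member m is simply `residual[:, m, d - 1]` restricted to the training years.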
The LSTM model performance is influenced by the choice of hyperparameters, including the optimization algorithm, the number of hidden layers, the number of samples propagated through the network for each gradient update (batch size), and the number of training epochs. For efficient learning and faster convergence, all input and output features are first normalized to the range [0, 1] without altering the shape of the original distribution. We considered a single-layer network with 20 hidden states. We selected a batch size of 32 (Bengio 2012) and the mean-squared error as the loss function. The Adam algorithm (Kingma & Ba 2014) is applied to optimize and update the network weights. We trained the network for 30 epochs. These values were determined through several sensitivity test runs for different lead times and ensemble members. The selections were made to minimize the loss function, prevent overfitting, and allow the network to learn robustly.
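The normalization step and the reported settings can be summarized in a short sketch. The dictionary keys, the class name `MinMaxScaler01`, and the choice to fit the scaling bounds on the training series only are assumptions made for illustration; the text does not state which samples were used to compute the bounds.

```python
import numpy as np

# Settings reported in the text, collected in one place (key names are hypothetical).
HYPERPARAMS = {
    "hidden_layers": 1,
    "hidden_states": 20,
    "batch_size": 32,
    "epochs": 30,
    "loss": "mean_squared_error",
    "optimizer": "adam",
}


class MinMaxScaler01:
    """Linear rescaling to [0, 1]; the shape of the distribution is unchanged."""

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        self.lo, self.hi = x.min(), x.max()
        if self.hi == self.lo:
            raise ValueError("a constant series cannot be rescaled")
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.lo) / (self.hi - self.lo)

    def inverse_transform(self, z):
        """Map network outputs back to physical units."""
        return np.asarray(z, dtype=float) * (self.hi - self.lo) + self.lo


# Toy training series (hypothetical values).
train = np.array([2.0, 4.0, 6.0, 10.0])
scaler = MinMaxScaler01().fit(train)
scaled = scaler.transform(train)
restored = scaler.inverse_transform(scaled)
```

If the bounds are fitted on the training period, validation values outside the training range map slightly outside [0, 1], which is worth keeping in mind for extreme flows.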
Forecast verification
We verified streamflow forecasts against observed streamflow at the basin outlet (USGS 01510000). Verification was performed conditional on lead time, flow threshold, and season. We employed the Nash–Sutcliffe efficiency (NSE), root mean square error (RMSE), and percent bias (Pbias) to assess the quality of ensemble mean forecasts. The NSE (Nash & Sutcliffe 1970) is one minus the ratio of the residual variance to the initial (observed) variance. The NSE ranges from negative infinity to 1, with 1 representing the optimal value; values should be larger than 0 to indicate minimally acceptable performance. RMSE is the square root of the mean of the squared errors. Pbias measures the average tendency of the model predictions to be larger or smaller than their observed counterparts. We benchmarked the performance of raw and LSTM-postprocessed ensemble mean forecasts relative to low-complexity forecasting approaches such as climatology and persistence-based forecasts (Ghimire & Krajewski 2020; Krajewski et al. 2020). The persistence-based forecast implies that the streamflow behavior does not change over the forecast lead time. The approach is tied to the concept of 'memory' of the system. Here we explore two approaches: simple persistence and anomaly persistence in streamflow forecasting. Simple persistence assumes that the streamflow forecast at one step ahead is dependent on the streamflow at the current time step. The anomaly persistence forecast scheme assumes that streamflow anomalies persist over the lead time. Note that we computed the anomaly with reference to the climatological average.
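The deterministic metrics and the two persistence benchmarks described above can be sketched as follows. Several details are assumptions of this sketch: Pbias is written with the sign convention in which positive values indicate overestimation (conventions differ in the literature), simple persistence is read as carrying the latest observed flow forward unchanged, and the function names and toy series are hypothetical.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - residual variance / observed variance."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def rmse(obs, sim):
    """Root mean square error, in the units of the flow."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.sqrt(np.mean((sim - obs) ** 2))


def pbias(obs, sim):
    """Percent bias; positive means overestimation (assumed convention)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * np.sum(sim - obs) / np.sum(obs)


def simple_persistence(obs, lead):
    """Forecast for t + lead equals the observation at t.

    Returns the forecasts and the observations that verify them.
    """
    obs = np.asarray(obs, float)
    return obs[:-lead], obs[lead:]


def anomaly_persistence(obs, clim, lead):
    """Forecast for t + lead is climatology at t + lead plus the anomaly at t."""
    obs, clim = np.asarray(obs, float), np.asarray(clim, float)
    forecast = clim[lead:] + (obs[:-lead] - clim[:-lead])
    return forecast, obs[lead:]


# Toy daily series (hypothetical values).
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
clim = obs - 0.5                      # climatology offset by a constant anomaly
fc_simple, ver_simple = simple_persistence(obs, lead=1)
fc_anom, ver_anom = anomaly_persistence(obs, clim, lead=1)
```

In this toy case the anomaly is constant, so anomaly persistence reproduces the observations exactly, while simple persistence lags the rising series by one unit.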
![](https://iwa.silverchair-cdn.com/iwa/content_public/journal/jh/25/1/10.2166_hydro.2022.114/3/m_hydro-d-22-00114if19.gif?Expires=1732031604&Signature=EH5GiZJ5vC9MJ8yGnl-MJOvZpG2M--kmcl61K2GKb5mkbywJLZ05J1Ipkq0OoiduwMTBp2G0AZLqG8wCGCq-x2J32FxkbYWtXpJ-NDiy~xQUIRHfYh1rhl5Zl6vsvB3NHdqYQbK0f08vR2yNXQj2lDpateLSQZdjdAS5RYK7p9SAUi~ytxLACl8Wde5JRVtwyMISvXVSXwWoSXbs9Z9L8qaXwFSXz18tEwDJHd06gvA4uhB-5ce1oVpV2Wiz1tiaJu-4HoerSFTMMg8sq1rWDW~01rahPyZCzvOvs2uwzRrhwpENtGKZWu02LJV8iErbn5c4oqXBlw9ZqUnr8mtNvA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
![](https://iwa.silverchair-cdn.com/iwa/content_public/journal/jh/25/1/10.2166_hydro.2022.114/3/m_hydro-d-22-00114if21.gif?Expires=1732031604&Signature=SGzeVyfj1CUAolFnDyhmkNwW4-Fi2c1Nsd~5mWU-SAnULS-nr02kss9Xx29-nkCsJ79xRbEDOGZj6EdGnH2Me4Jnr20JHKaMtJzQvBPJCl-vV32eBE-OfMaa~LWr8g8Dno6P5ffvbL4LeOOjvdECuYCT9W4H1R1XvHYzzNWkc56s1xwK7K5G0o0~KW-ySLUvI6SG-A-dKPIIvCvo4rT-c9wiqXjDggZ0erKPb73ipJ3khb7hsteBKRlz7343LK919I~hyuJWRob291DNJgWq9-SuwQBoVz0YcJrPUjXm9hzLW-b7BWOFdUiMGM-~Dx4xRZ94ycnkmJJ8-gj2KaPVUg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
![](https://iwa.silverchair-cdn.com/iwa/content_public/journal/jh/25/1/10.2166_hydro.2022.114/3/m_hydro-d-22-00114if22.gif?Expires=1732031604&Signature=HRW8tNtfpZPuG95QzGn86lOX4ZbyyYKLZAfZNgHShmgjZWjaqS8PqF3MqF9714BDq5cdNxR~CbjbmB2CzlnUkon7alPPxkchSqBxILjXcpeAfs9ylFjJW6n4jZM708120U0Z42elZxKLBJwx3EMft5CHUUfOlNC00t1Tmnt0-IPzcYuruG0VpXpFPT67wvVKPHw0mnOMfgEresTSTdHAHCvJA5th8mH13J86JnvPTQOqo-hHVr2hxdGkM00Uzbea5vPk4KqfkNCmcM-KrGKm0WrfempLqoNd2GkGrWV-xhOlT-1fGZrTbCFFeFNAccPW-VkglASaiNkvhcytecs4ow__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
![](https://iwa.silverchair-cdn.com/iwa/content_public/journal/jh/25/1/10.2166_hydro.2022.114/3/m_hydro-d-22-00114if23.gif?Expires=1732031604&Signature=vLFy-r00Hb3cHi6wStRdVLdn6XzY8qfA0xsicbYV-QyIluVvB-DWr2kiNwxX6lQH3X9uM9NLrdOIWFjMQFyJV2ThRCC139GLV1OsammKCFBfluCO0e2DrxPl6vMkINM-S~1tlYO5OFgKR3rUjCJ1SfwKH0SuLI-7~w-hVbvugUA~Ih1fHQLHoMuluasxacoKtCTaiY2RlWi2k77eJKPSBXLRQ-M~urux4jNqPFxQ~Ste6Gjd55utabBSnBYx0rzvERtDXZB6QXFQybl28itSvj7UnLG5VNaKou0YdUjnAF4L3POXtd1SoOUW22fG2oETbD9RXeHPleG-dkg~ZOPtlQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
The reliability diagram plots the observed relative frequency against the forecast probability for the total number of forecasts in each bin.
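As an illustration of the probabilistic verification, the sketch below computes a Brier Skill Score and the points of a reliability diagram from an ensemble. The event definition (flow exceeding a threshold), the use of a fixed reference probability for the skill score, the number of bins, and all function names are assumptions of this sketch; the study's exact formulation may differ.

```python
import numpy as np


def event_probability(ens, threshold):
    """Forecast probability as the fraction of members above the threshold.

    ens has shape (n_forecasts, n_members).
    """
    return np.mean(np.asarray(ens, float) > threshold, axis=1)


def brier_score(prob, occurred):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return np.mean((np.asarray(prob, float) - np.asarray(occurred, float)) ** 2)


def brier_skill_score(prob, occurred, prob_ref):
    """Skill relative to a reference forecast; 1 is perfect, 0 is no skill."""
    return 1.0 - brier_score(prob, occurred) / brier_score(prob_ref, occurred)


def reliability_curve(prob, occurred, n_bins=5):
    """Per-bin mean forecast probability, observed frequency, and sample count."""
    prob = np.asarray(prob, float)
    occurred = np.asarray(occurred, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(prob, edges[1:-1])      # bin index in 0 .. n_bins - 1
    mean_prob = np.full(n_bins, np.nan)
    obs_freq = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for k in range(n_bins):
        in_bin = idx == k
        counts[k] = in_bin.sum()
        if counts[k] > 0:                     # empty bins stay NaN
            mean_prob[k] = prob[in_bin].mean()
            obs_freq[k] = occurred[in_bin].mean()
    return mean_prob, obs_freq, counts


# Toy example: two 4-member forecasts and a threshold of 4.5 (hypothetical values).
ens = np.array([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]])
occurred = np.array([0.0, 1.0])
prob = event_probability(ens, threshold=4.5)
bss = brier_skill_score(prob, occurred, prob_ref=np.full(2, 0.5))
mean_prob, obs_freq, counts = reliability_curve(
    np.array([0.1, 0.1, 0.9, 0.9]), np.array([0.0, 0.0, 1.0, 1.0])
)
```

Plotting `obs_freq` against `mean_prob` gives the reliability curve, and `counts` indicates how many forecasts support each point.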
RESULTS AND DISCUSSION
Comparison of simulated and observed streamflow hydrographs. Daily streamflow observations are from the US Geological Survey (USGS) at USGS 01510000 located in the Susquehanna River, Pennsylvania, USA. We also show the model performance metrics: correlation coefficient (R), Nash–Sutcliffe efficiency coefficient (NSE), and Percent bias (Pbias).
(a) Nash–Sutcliffe efficiency (NSE) and (b) root mean square error (RMSE) between forecast and corresponding observed streamflow.
The most salient feature in Figure 4 is that LSTM-postprocessed ensemble mean forecast outperforms all other forecasting approaches across all the lead times. The improvement is small at the initial lead time but gradually increases with longer forecast lead times. This is expected because LSTM tends to reduce overall uncertainty which shows an increasing trend with increased lead times. At longer lead times, thus, LSTM-postprocessed forecast has shown the ability to remove greater amounts of biases when compared to other techniques (Figure 4(b)). As compared to the raw ensembles generated by hydrometeorological modeling, the relative gain in NSE from the LSTM postprocessor varies from ∼0.03 on day 1 to ∼0.43 on day 7. The general tendency is for hydrometeorological modeling and standalone LSTM to perform similarly. The standalone LSTM, however, tends to show a slight NSE gain at later forecast lead times. As compared to the standalone LSTM, the gain in NSE from the LSTM postprocessor is as high as 0.38 at the lead time of day 7. A slight jump in the deterministic metrics (NSE and RMSE) for both raw and LSTM-postprocessed ensemble mean forecast at day 4 could be due to the inability of the ensemble mean to capture some important features of forecast uncertainty.
Furthermore, we compare the performance of the LSTM with the quantile regression postprocessor (Koenker 2005). We select the quantile regression postprocessor for comparison since it is among the most widely used hydrologic postprocessors (Weerts et al. 2011; López López et al. 2014; Dogulu et al. 2015; Mendoza et al. 2016) and has been shown to outperform several postprocessors under different forecasting conditions (Sharma et al. 2019). Quantile regression has several strengths (Koenker 2005): (i) it makes no prior assumptions regarding the shape of the distribution; (ii) it provides conditional quantiles rather than conditional means; and (iii) it is less sensitive to the tail behavior of the streamflow dataset and, consequently, less sensitive to outliers. We used quantile regression to estimate the error distribution, which is then added to the ensemble mean to form a calibrated discrete quantile relationship for a particular lead time and generate an ensemble streamflow forecast (Weerts et al. 2011; López López et al. 2014; Dogulu et al. 2015). Overall, the LSTM postprocessor performs better than quantile regression across all the lead times (Figure 4(a) and 4(b)). The differences in NSE and RMSE between the two postprocessors are minimal at initial lead times (days 1 and 2). However, as the lead time progresses, the LSTM shows substantial improvement over quantile regression. As compared to quantile regression, the relative gain in NSE from the LSTM varies from 0.02 on day 1 to as high as 0.34 on day 7 (Figure 4(a)). Overall, the NSE of LSTM-postprocessed ensemble mean forecasts ranges from ∼0.77 (day 1) to ∼0.52 (day 7) (Figure 4(a)), while the RMSE ranges from ∼6.0 m³/s (day 1) to ∼8.5 m³/s (day 7) (Figure 4(b)).
Brier Skill Score (BSS) of the raw ensemble forecasts (dashed lines) and LSTM-postprocessed ensemble forecasts (solid lines) vs. the forecast lead time during the (a) cool season (October–March) and (b) warm season (April–September), under low–moderate flow and high-flow conditions. Raw ensemble forecasts are generated using the hydrologic model forced with the precipitation and near-surface temperature ensemble forecasts from the National Centers for Environmental Prediction Global Ensemble Forecast System Reforecast version 2 (GEFSRv2). The low–moderate flow category represents flows with a non-exceedance probability of 0.50 (Pr = 0.5), while the high-flow category is for a non-exceedance probability of 0.90 (Pr = 0.9; i.e., flows with exceedance probability less than 0.1 are denoted as high).
The skill of the postprocessed ensemble streamflow forecasts is greater than that of the raw ones across seasons, flow thresholds, and lead times (Figure 5). As expected from our previous results (Figure 4), the relative gain in skill from the LSTM increases with forecast lead time. This indicates that the streamflow forecasts are influenced by systematic biases, and those biases appear to have a strong effect at longer lead times. Indeed, as compared to the raw ensemble, the dependence of skill on lead time is reduced after postprocessing. Improvements in postprocessed forecast skill for baseflow or low flow are large in both seasons and across all the lead times. This means the LSTM can skillfully predict low to moderate flows irrespective of seasonal influences. This is understandable, as hydrologic systems do not show significant alterations during low flow events. However, for the high flows, the skill improvement from the LSTM becomes more apparent at later forecast lead times, demonstrating its added value in forecasting flood conditions at medium-range time scales. High flows result from the direct response of the basins to extreme precipitation events, whereas the low to moderate flows are dominated by subsurface processes. In addition, the hydrologic uncertainties dominate forecast skill at initial lead times, while the meteorological uncertainties are more influential at longer forecast lead times (Siddique & Mejia 2017). Hence, the high-flow forecast at medium-range timescales and in smaller basins can particularly benefit from improved ensemble meteorological forcing (Ghimire et al. 2021), whereas the forecast skill at initial lead times can benefit from the improved representation of hydrologic model states (e.g., soil moisture) and initial conditions. This highlights the need to address both hydrologic and meteorological uncertainties to further enhance the streamflow forecast skill.
As our results show, LSTM can significantly complement hydrologic models in addressing such a challenge. The results are promising because the LSTM derives its enhanced skill from the streamflow persistence in addition to the long memory of the meteorological forcing that LSTM preserves.
The overall skill of both raw and postprocessed forecasts is slightly greater in the warm season than in the cool ones (Figure 5). Seasonal skill differences are more apparent for raw ensemble streamflow forecasts and particularly for low–moderate flows. The reason for seasonal skill variations is that during the cool season the hydrologic conditions are likely influenced by snow accumulation and melting. Thus, a better representation of snow dynamics in hydrologic modeling could contribute to improving the forecast skill. As compared to the raw ensemble, seasonal skill variations for low–moderate flows are somewhat reduced after postprocessing. Note that LSTM benefits from hydrologic persistence which is generally stronger for low–moderate flow conditions (Ghimire & Krajewski 2020). Overall, for low–moderate flow conditions, the BSS of the postprocessed forecast ranges from ∼0.69 (day 1) to ∼0.57 (day 7) during the cool season (Figure 5(a)), and 0.76 (day 1) to 0.67 (day 7) during the warm season (Figure 5(b)). BSS for high-flow conditions ranges from ∼0.52 (day 1) to ∼0.43 (day 7) during the cool season (Figure 5(a)), and 0.62 (day 1) to 0.50 (day 7) during the warm season (Figure 5(b)).
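As an illustration of how a BSS value of this kind can be computed from an ensemble, the following is a minimal sketch and not the verification code used in this study. It assumes that the forecast probability of an event is the fraction of ensemble members exceeding a flow threshold, and that the reference forecast is the sample climatology of the verification period; the reference is an assumption here and may differ from the one behind Figure 5. All function names are hypothetical.

```python
import numpy as np


def brier_score(prob, event):
    """Mean squared difference between forecast probabilities and binary outcomes."""
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=float)
    return float(np.mean((prob - event) ** 2))


def ensemble_event_probability(ensemble, threshold):
    """Fraction of members exceeding the threshold.

    ensemble has shape (n_forecasts, n_members).
    """
    return np.mean(np.asarray(ensemble, dtype=float) > threshold, axis=1)


def brier_skill_score(ensemble, obs, threshold, ref_prob=None):
    """BSS = 1 - BS / BS_ref for the event 'flow exceeds threshold'.

    If ref_prob is None, the reference is the sample climatology, i.e. the
    observed event frequency (an assumption of this sketch).
    """
    obs = np.asarray(obs, dtype=float)
    event = (obs > threshold).astype(float)
    prob = ensemble_event_probability(ensemble, threshold)
    if ref_prob is None:
        ref_prob = np.full_like(event, event.mean())
    bs = brier_score(prob, event)
    bs_ref = brier_score(ref_prob, event)
    if bs_ref == 0.0:
        # The reference is already perfect (e.g. the event never occurs),
        # so the skill score is undefined.
        return float("nan")
    return 1.0 - bs / bs_ref
```

Under this definition, a BSS of 1 corresponds to a perfect forecast, 0 to a forecast no better than the reference, and negative values to a forecast worse than the reference. Computing the score separately for each lead time, season, and flow threshold would give curves comparable in form to those discussed above.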
Reliability diagrams for the low–moderate flow and high-flow across forecast lead times of (a) 1, (b) 3, and (c) 7 days. Different reliability curves represent the raw and LSTM postprocessed streamflow ensembles. Raw ensemble forecasts are generated using the hydrologic model forced with the precipitation and near-surface temperature ensemble forecasts from the National Centers for Environmental Prediction Global Ensemble Forecast System Reforecast version 2 (GEFSRv2). The low–moderate flow category represents flows with a non-exceedance probability of 0.50, while the high-flow category is for a non-exceedance probability of 0.90 (i.e., flows with an exceedance probability less than 0.1 are denoted as high). Reliability curves that tend to align along the diagonal line are preferred (more reliable).
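The points of a reliability curve such as those described in the caption can be obtained by binning the forecast probabilities and comparing each bin's mean forecast probability with the observed relative frequency of the event. The sketch below illustrates this under stated assumptions: the flow threshold is taken as an empirical quantile of the observed flows (e.g., 0.90 for the high-flow category), the number of bins is arbitrary, and the function names are hypothetical. It is not the verification code used in this study.

```python
import numpy as np


def flow_threshold(obs, non_exceedance):
    """Flow value at a given non-exceedance probability (empirical quantile)."""
    return float(np.quantile(np.asarray(obs, dtype=float), non_exceedance))


def reliability_curve(prob, event, n_bins=5):
    """Return (mean forecast probability, observed frequency, count) per bin.

    prob  : forecast probabilities in [0, 1]
    event : binary outcomes (1 if the event occurred, else 0)
    Empty bins are skipped.
    """
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Using only the interior edges maps each probability to a bin index
    # in 0..n_bins-1, with prob == 1.0 falling in the last bin.
    idx = np.digitize(prob, edges[1:-1])
    mean_prob, obs_freq, counts = [], [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            mean_prob.append(prob[mask].mean())
            obs_freq.append(event[mask].mean())
            counts.append(int(mask.sum()))
    return np.array(mean_prob), np.array(obs_freq), np.array(counts)
```

Plotting the observed frequency against the mean forecast probability gives the reliability curve; points on the 1:1 diagonal indicate that events forecast with probability p occur with relative frequency p. The per-bin counts help judge how well sampled each point is, which matters most for the rarer high-flow category.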
CONCLUSION
We present a case study of machine learning applied to postprocessing ensemble streamflow forecasts in a real-time streamflow forecasting setup. We employed the LSTM to correct residual errors and biases in raw ensemble streamflow forecasts generated from a RHEPS (Sharma et al. 2019) at medium-range (1- to 7-day lead times) timescales. We configured LSTM for individual 11-member ensemble streamflow forecasts and for each forecast lead time of days 1–7. We applied the LSTM over the period 2004–2012, using 6 years (2004–2009) to train the LSTM network and the remaining years (2010–2012) for verification. We assessed and verified the performance of LSTM-postprocessed forecasts against different forecasting approaches, including simple persistence, anomaly persistence, climatology, deterministic forecast, raw ensemble forecast, and standalone LSTM.
In summary, based on our analysis and comparison, we found that raw ensemble streamflow forecasts generated using hydrometeorological modeling outperform both the deterministic and persistence-based forecasts. The general tendency is for the standalone LSTM to perform slightly better than the raw forecast from hydrometeorological modeling across lead times. LSTM postprocessing improves streamflow simulations and forecasts as compared to the other forecasting approaches, and it can improve forecast skill and reliability across all seasons, flow thresholds, and forecast lead times. The relative gain from LSTM is generally higher at medium-range timescales (4- to 7-day lead times) than at initial lead times (1- to 3-day lead times), for high flows than for low–moderate flows, and in the warm season than in the cool season. For the high flows, the skill improvement from LSTM becomes more apparent at later forecast lead times, demonstrating its added value in forecasting flood conditions at medium-range timescales. The overall skill of both raw and LSTM-postprocessed forecasts is slightly greater in the warm season than in the cool season. The reliability diagrams show that the LSTM postprocessor can correct biases in the raw ensembles, ultimately making the postprocessed ensembles more reliable than the raw ones across lead times and flow thresholds.
This case study demonstrates the potential of LSTM to postprocess ensemble streamflow forecasts in the medium-sized Upper Susquehanna River basin in the eastern United States. Future studies could explore it under a wider range of forecasting conditions and basin scales across diverse hydroclimatic regions. To continue expanding this research, we plan to explore and evaluate a neural network multimodel ensemble to potentially improve flood forecasts across a range of spatiotemporal scales. Multimodel ensemble forecasting can account for multiple hydrometeorological processes interacting nonlinearly and hence could improve forecast skill over single-model forecasts. A machine learning technique called super-learning or stacked-ensemble (Laan et al. 2007) is a loss-based supervised learning method that allows combining multiple machine learning algorithms. The goal of such implementations would be to bias-correct the raw ensemble streamflow forecast using different machine learning techniques, optimally combine the bias-corrected forecasts to generate the super learner ensembles, and compare the super learner with traditional multimodel forecasting approaches such as Bayesian model averaging.
ACKNOWLEDGEMENTS
The authors are grateful to the anonymous reviewers for their reviews and constructive comments. Daily streamflow observation data for the selected forecast stations can be obtained from the USGS website (https://waterdata.usgs.gov/nwis/). Precipitation and temperature forecast datasets from the Global Ensemble Forecast System Reforecast Version 2 (GEFSRv2) can be obtained from the NOAA Earth System Research Laboratory website (https://www.esrl.noaa.gov/psd/forecasts/reforecast2/download.html).
FUNDING
This research received no external funding.
AUTHOR CONTRIBUTION
All authors contributed to the study design. S.S. led the calculations and wrote the initial draft of the manuscript. All authors revised and edited the manuscript.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.