Accurate river discharge forecasts for short to intermediate time intervals are crucial for decision-making related to flood mitigation, the management of inland waterways, and optimal dredging. Physics-based river routing models, such as RAPID (‘routing application for parallel computation of discharge’) or its variants, are used to forecast river discharge. These physics-based models make numerous assumptions, including linear process modeling, accounting for only adjacent river inflows, and requiring brute-force calibration of hydrological input parameters. As a consequence of these assumptions and the missing information that describes the complex dynamics of rivers and their interaction with hydrology and topography, RAPID produces noisy forecasts that may, at times, substantially deviate from the true gauged values. In this article, we propose hybrid river discharge forecast models that integrate the physics-based RAPID simulation model with advanced data-driven machine learning (ML) models. These hybrid models leverage runoff data of the watersheds in the entire basin, build on the physics-based RAPID model, take into account the variability in predictions made by the physics-based model relative to the true gauged discharge values, and employ state-of-the-art ML models of varying complexity. We deploy two different algorithms to build these hybrid models, namely, delta learning and data augmentation. The results of a case study indicate that a hybrid model for discharge predictions outperforms RAPID in terms of overall performance, improving the prediction accuracy for various rivers in the case study by a factor of four to seven.

  • Hybrid model that combines physics-based model with machine learning model.

  • Two hybrid modeling techniques: delta learning and data augmentation.

  • Hybrid model improves streamflow prediction accuracy over time.

  • Multiple machine learning models are implemented.

The management of stream and river systems entails significant and frequently costly decision-making procedures. These processes include floodplain mapping, citizen water usage, national-level infrastructure designs, and agricultural arrangements. Notable modeling investigations were conducted in the early 1950s, and over the past two decades, hydrologists have made significant efforts to develop large-scale hydrologic models (David et al. 2011a). Prior to the utilization of data-driven analysis through computational simulation, hydrologists predominantly relied on creating mathematical models of river systems as the primary method for generating hydrometeorological forecasts (David et al. 2016). Currently, significant research is dedicated to creating river models that focus on ground-level stream wave propagation across vast geographical boundaries. Notable examples of such models include the routing model proposed by Lohmann et al. (1996), total runoff integrating pathways by Oki & Sud (1998), LISFLOOD-FP by Bates & De Roo (2000), river transport model by Branstetter (2001), RiTHM (river-transfer hydrological model) by Ducharne et al. (2003), hillslope river routing by Beighley et al. (2009), and CaMa-Flood (catchment-based macro-scale floodplain) by Yamazaki et al. (2011).

Despite the advancements in these models, a significant challenge persists: a substantial portion of river stretches remains unmonitored. Streamflow measurements for the United States are primarily provided by the National Water Information System (NWIS) of the US Geological Survey (USGS) (see https://waterdata.usgs.gov/nwis). Local state or municipal institutions also contribute short-duration streamflow observations, which are generally less reliable than the NWIS dataset (Ghimire et al. 2023). As of 2023, the USGS reports that only 10,996 sites are equipped with direct automatic monitoring devices to record current conditions, despite the United States being home to approximately 250,000 rivers and having over 3.5 million miles of waterways. Furthermore, monitoring intervals vary widely and lack consistency. The only monitoring technique available aside from USGS observations entails using a bucket and timer to measure the volume of water collected over a specific period, but this is impractical for most streams except for those of the smallest order. Therefore, the challenge in predicting streams and waterways lies in the limited availability of discharge observations that are obtained from a restricted number of gauged locations. As a result, flow estimations rely on alternative techniques, such as the use of an acoustic Doppler current profiler or the utilization of the routing application for parallel computation of discharge (RAPID) model (McCarthy 1938).

This study focuses on RAPID, which is a river routing model based on the Muskingum algorithm developed by McCarthy (1938). David et al. (2011b, 2013, 2015) expanded on McCarthy’s work by proposing a matrix-based version of the Muskingum method that enables computation of flow and water volume in numerous sections of a river network, even those without measurement data. RAPID was specifically designed for the National Hydrography Dataset Plus (NHDPlus), a comprehensive dataset that effectively captures the water flow wave across a wide range of catchment boundaries (David et al. 2011b; Tavakoly et al. 2017). Within the Geographic Information System (GIS), NHDPlus contains monthly and annual streamflow estimates for over 3 million stream segments and provides a systematic representation of how these streams are interconnected within global river networks (David et al. 2011b).

Despite its high-performance computing capabilities, calibrating parameters in the RAPID model poses challenges due to the necessity for numerous measurements and the intensive computational and data requirements (Zhao et al. 2023). RAPID requires access to streamflow measurement data, which may be unattainable for certain unmonitored catchments. Meanwhile, RAPID’s process-based hydrology simulation also demands a substantial amount of computational processing time (Ghimire et al. 2023). To address these aforementioned difficulties, various research studies have been conducted. One such study integrates anticipated discharges from the variable infiltration capacity model with the RAPID model on a nationally scalable framework, incorporating waterway measurements from the NHDPlusV2 river network. This approach leverages high-level computational efforts to enhance parameter calibration (Ghimire et al. 2023). Another coupled model, AutoRAPID, combines runoff data from RAPID with watershed storm surge values generated by the AutoRoute model. This integration enables swift flood inundation predictions across large geographic areas in response to catastrophic situations (Follum et al. 2017). In addition to these coupled models, data-driven machine learning (ML) models have witnessed tremendous success in hydrological predictions and related fields. In particular, various ML models and hybrid models that combine two different types of ML models have been developed for time series prediction related to hydrology. For instance, a long short-term memory (LSTM) method tuned by ant lion optimization (called LSTM-ALO) has been applied to monthly runoff forecasting in the study by Yuan et al. (2018), where the ALO algorithm is employed to optimize parameters of the LSTM model. 
Similarly, a method called LSTM-INFO has been developed to predict water temperature by optimizing the LSTM model parameters using the reptile search algorithm and the weighted mean of vectors optimizer (Ikram et al. 2023). In addition to LSTM-based methods, support vector machine (SVM)-based methods have been explored in the studies by Adnan et al. (2023b) and Adnan et al. (2022) for evaporation prediction, resulting in the relevance vector machine tuned with improved Manta-Ray foraging optimization (RVM-IMRFO) and the SVM tuned with the firefly algorithm and particle swarm optimization (SVM-FFAPSO). Other ML models studied for time series forecasting include random vector functional link networks (Mostafa et al. 2023), hybrid heuristic algorithms (Adnan et al. 2021), and extreme learning machine methods tuned with metaheuristic algorithms (Adnan et al. 2023a). In the past decade, predicting time series data using ML models has become a very crowded field, and numerous approaches have been proposed in the literature. Among these approaches, deep learning (DL)-based methods have shown promising potential, especially for river discharge prediction. Kratzert et al. (2019) provide convincing evidence that DL, specifically through the utilization of LSTM, produces superior runoff simulations from precipitation data in ungauged sites compared to conceptual models in monitored basins. Similarly, Nearing et al. (2021) highlight the notable advantage of LSTM-based DL models in capturing dynamic watershed behavior and improving hydrological performance across various climatological circumstances. Remarkably, even when catastrophic scenarios were not explicitly included in the training data, the LSTM network and its variants consistently predicted severe events with remarkable accuracy compared to the conceptual Sacramento model and the process-based US National Water Model (Frame et al. 2022).
Furthermore, the hybrid hydrological ML model, which efficiently leverages the strengths of artificial neural networks and the k-nearest neighbor method, has demonstrated accurate and robust forecasting capabilities for extreme events (Kan et al. 2020).

The challenges associated with calibrating RAPID streamflow parameters, coupled with the calibration process’s reliance on gauged measurement data, motivated us to develop a novel data-driven parameter calibration model that utilizes the hydrological and topological features of the river’s watershed (Zhao et al. 2023). These calibrated parameters can be fed into a physics-based model that produces discharge forecasts. However, physics-based models are often simplified by numerous assumptions, such as RAPID being a linear river routing model, and therefore, they can be improved. While the data-driven ML models reviewed above have shown promising potential in river discharge prediction, their performance is often affected by both the quantity and quality of the data used for training. In this article, we extend our research efforts to build a hybrid streamflow discharge prediction model for gauged rivers that leverages the strengths of both the physics-based RAPID simulation model and data-driven ML models. It is worth noting that the ‘hybrid model’ in this article differs from the variety of hybrid data-driven models previously reviewed, which typically amalgamate various ML algorithms or optimization techniques. The ‘hybrid model’ in this article refers to the integration of a physics-based analysis model with a data-driven ML model, a concept also known as physics-informed/enhanced ML. The major contributions and novelty of the proposed methods can be summarized as follows:

  1. Incorporating physics-based principles: Integrating the intrinsic physics of river flow as encapsulated by the RAPID model into the data-driven ML prediction framework.

  2. Leveraging data-informed insights: Harnessing hydrological insights embedded in the runoff data of nearby gauged rivers to address the gaps, discrepancies, and missing information in the physics-based model created as a result of inherent assumptions in physics-based models.

  3. Utilizing cutting-edge ML techniques: Building a hybrid model that captures the capabilities of the physics-based model while being data-informed, by employing cutting-edge ML-based forecasting techniques of varying complexities, including Gaussian process (GP)-based nonlinear autoregressive with exogenous inputs (NARX) regressor (GP-NARX), neural network-based NARX regressor (NARX-Net), LSTM, and bidirectional LSTM (BiLSTM).

  4. Forecasting for extended time periods: The proposed model is capable of predicting river discharge over long time horizons (such as predicting the discharge a year in advance) while capturing the observed seasonality and trends.

In this article, we limit ourselves to forecasting river discharge only for gauged rivers that have available historical true discharge values. The primary idea is to consider a basin that consists of numerous watersheds, each with a respective river crossing the watershed. A small subset of these rivers is gauged, and we aim to forecast the discharge of these rivers. Building a machine learning forecasting model for these gauged rivers will include three primary data sources: (a) the past time series of true discharge values from gauges; (b) the physics-based RAPID discharge predictions; and (c) the runoff data of all the rivers in the basin from a land surface model. These three datasets are useful for two purposes: (a) the physics-based RAPID data, in tandem with runoff data (with reduced dimensionality using singular value decomposition (SVD) or an LSTM-based autoencoder), can be used to build a more informed digital surrogate of the RAPID model, and (b) the RAPID data can be compared with the gauged data to learn about the discrepancies in the physics-based model. We use these capabilities to build two hybrid forecasting models: (a) delta learning and (b) data augmentation. The incorporation of true discharge data and runoff data from the entire basin, along with the adjustment and enhancement of the RAPID model’s performance through these data-driven hybrid models, collectively contributes to the development of a reliable framework for forecasting river discharge. This framework outperforms the currently used physics-based RAPID model in terms of prediction accuracy.

The following sections of this article are structured as follows: Section 2 summarizes the datasets used and the techniques employed for feature extraction from the runoff data. Section 3 provides detailed information on the two algorithms, namely, delta learning and data augmentation, used to build the hybrid model. Section 4 outlines various ML methods employed to implement the delta learning and data augmentation algorithms and obtain the hybrid discharge prediction models. Section 5 presents the case study, and Section 6 concludes the article.

Datasets

Building these forecasting algorithms requires leveraging three data sources. The first data stream consists of the true measured discharge from USGS, obtained using river gauges. However, not all the rivers in a basin are gauged, and the forecasting algorithms presented in this article are limited to rivers with gauges since these algorithms are contingent upon the availability of historic true gauged measurements. The second data stream is RAPID’s historic discharge prediction time series for the river of interest. Readers are referred to Section 2 of our previous work (Zhao et al. 2023) to understand the workings of the RAPID model, which consists of two steps: (1) calibrating the input hydrological parameters of the RAPID model, and (2) predicting future discharge (Figure 1 of Zhao et al. 2023). Unlike the first two data streams, which are associated with the specific rivers whose discharge we aim to forecast, the third data stream used for building the hybrid model is the runoff data for all the rivers/watersheds that are part of the basin of interest. These data are simulated using the Noah with multiparameterization (Noah-MP) land surface model (Niu et al. 2011). We observe that, unlike true gauge data, runoff data can be projected for the intermediate term. Let N_rivers-in-basin denote the number of rivers in the basin of interest. Incorporating the runoff data of all the rivers present in the basin provides information about the dynamics of the river network in the region surrounding the specific river of interest. This helps us gain a better understanding of hydrological conditions, particularly in areas lacking streamflow monitoring networks.
Figure 1

LSTM autoencoder architecture.


The true discharge data and RAPID’s predictions are vector time series, with each element representing a discharge value on a given day. Let N_days denote the total number of days for which the data are available (these days will be split later for the purpose of training and testing the hybrid model). We denote the true gauge data as a vector G and RAPID’s prediction time series as a vector R, with G, R ∈ ℝ^{N_days}. Unlike the true gauge discharge and RAPID values, which are associated with the river whose discharge we aim to predict, the runoff dataset consists of time series data for runoff from all the rivers in the basin. Therefore, the runoff data are represented as a higher-dimensional matrix, denoted by M, with M ∈ ℝ^{N_rivers-in-basin × N_days}.

The jth element of the ith column of the matrix M, denoted as M_{j,i}, represents the runoff value for the jth river in the basin on the ith day. Therefore, the dimensionality of the runoff data is N_rivers-in-basin × N_days. Using these three datasets in their raw format presents two major challenges that require further data processing.

  • First, for basins with a large number of watersheds, the dimension of runoff data can be very high, as there is one time series of runoff data for each watershed. For example, the basin studied in the case study has 431 watersheds, which means that the dimension of the runoff data is 431 × N_days. This high-dimensional data poses significant challenges to the training of the hybrid models. Since these watersheds are in the geographic neighborhood of each other, there is bound to be correlation between the runoff time series of rivers passing through these watersheds. Therefore, we first reduce the high-dimensional data into low-dimensional features using dimension reduction techniques (i.e., reducing the number of rows in the M matrix). Two types of dimension reduction techniques, namely, SVD and LSTM-based autoencoder, have been studied.

  • Second, the RAPID model exhibits considerable variations and noise due to the presence of unmodeled features and underlying assumptions. To enhance the accuracy of hybrid models based on RAPID model predictions, it is essential to employ data smoothing techniques that help eliminate outliers and highlight underlying trends. We use a Hanning filter based on the convolution of a scaled window with the signal to smooth RAPID model predictions (Oppenheim 1999). The window size used in this article is 30.
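The smoothing step can be sketched with NumPy, whose `np.hanning` window matches the scaled-window convolution described above. The function name and the reflect-padding of the series ends are our own assumptions, not the authors' exact implementation.

```python
import numpy as np

def hanning_smooth(signal: np.ndarray, window_size: int = 30) -> np.ndarray:
    """Smooth a 1-D discharge series by convolving it with a scaled Hanning window."""
    window = np.hanning(window_size)
    window /= window.sum()  # scale the window so the filter preserves the signal's level
    # Reflect-pad both ends so the smoothed series keeps the original length
    padded = np.r_[signal[window_size - 1:0:-1], signal, signal[-2:-window_size - 1:-1]]
    smoothed = np.convolve(padded, window, mode="same")
    return smoothed[window_size - 1: window_size - 1 + len(signal)]
```

Applied to RAPID's daily prediction vector R, this suppresses high-frequency noise while leaving the dimensionality unchanged.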

Surface runoff features extracted by singular value decomposition

To reduce the dimension of the runoff data for training the hybrid models, we perform SVD on the matrix M as follows:
M = U Σ Vᵀ, (1)
where U ∈ ℝ^{431×431} and V ∈ ℝ^{N_days×N_days} are orthogonal matrices, and Σ ∈ ℝ^{431×N_days} is a rectangular diagonal matrix with singular values σ_1 ≥ σ_2 ≥ … ≥ σ_431 ≥ 0. In our case, since 431 < N_days, there are at most 431 nonzero singular values.
Following this decomposition, we define another matrix F = Σ Vᵀ that represents the latent features of the runoff data. Here, f_j denotes the jth row of the F matrix. Assuming that we reduce the original dimension of the M matrix to a much lower dimension, denoted by r (with r ≪ 431), we have the runoff data in the reduced space as follows:
M_r = [f_1; f_2; …; f_r] ∈ ℝ^{r×N_days}. (2)
Readers are referred to the study by Hu et al. (2017) for a more detailed discussion and application of the SVD technique.
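A minimal NumPy sketch of this reduction (the function name is hypothetical) keeps the first r rows of the latent feature matrix F = Σ Vᵀ:

```python
import numpy as np

def reduce_runoff_svd(M: np.ndarray, r: int) -> np.ndarray:
    """Project the (n_rivers x n_days) runoff matrix onto its first r latent features.

    Rows of F = S @ Vt are the latent feature time series; keeping the first r rows
    retains the directions with the largest singular values.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    F = np.diag(s) @ Vt   # latent features, shape (n_rivers, n_days)
    return F[:r, :]       # reduced runoff data, shape (r, n_days)
```

Because M = U F, multiplying the reduced matrix by the first r columns of U recovers the best rank-r approximation of the original runoff data.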

Surface runoff features extracted by a long short-term memory autoencoder

LSTM overview

LSTM is a recurrent neural network (RNN)-based architecture specifically developed to address the challenge of learning sequence data with long-term dependencies, which it tackles by ensuring a consistent backward flow of the error signal. LSTM networks not only excel at capturing short-term data dependencies for accurate predictions but also effectively consider the explicit dependencies among multiple outputs over a long-term time horizon.

In this article, we leverage LSTM for both the dimension reduction of the runoff data and hybrid modeling. This section explains the application of LSTM for feature extraction of the runoff data, while the LSTM architecture is later described in Section 4.3, which discusses the hybrid modeling.

Long short-term memory autoencoder network

An autoencoder is a form of unsupervised neural network that uses data to find the best encoding–decoding architecture (Nguyen et al. 2021). It can be regarded as an n-layer neural network that aims to reconstruct the inputs based on the provided code. The autoencoder comprises five components: an input layer, an output layer, an encoder, a decoder, and a bottleneck region known as the latent space.

The encoder reduces the input dimensions to generate a representation for the bottleneck, while the decoder expands the bottleneck’s dimensions to recreate the input layers. The latent space provides a compressed, lower-level representation of the high-dimensional inputs. In this article, considering the advantage of LSTM in dealing with time series data and its promising performance reported in the literature, LSTM is used for constructing the autoencoder. The structures of the encoder and decoder in this article are similar, with two LSTM layers stacked in each, as shown in Figure 1. The encoder takes high-dimensional datasets as inputs and generates compressed datasets as outputs. The decoder uses the compressed dataset stored in the bottleneck to reconstruct the original input datasets.

Encoder structure

The encoder consists of two LSTM layers, followed by a RepeatVector layer. It takes high-dimensional datasets as inputs and generates compressed datasets as outputs.

To implement the encoder, we begin by reshaping the input dataset at each time step into a size of (1, 431), standardizing the input before passing it into the LSTM layers. The first LSTM layer (LSTM layer 1) takes this single-channel, two-dimensional input. The output dimension of LSTM layer 1 is configured as 36, and the kernel initialization is defined as he_uniform, which draws from a truncated normal distribution. Following the first LSTM layer, we create the second LSTM layer (LSTM layer 2). In this layer, the input dimension is set to 36, reflecting the output channel from the previous layer, while the output dimension is configured as 18. As in the first LSTM layer, the kernel is initialized as he_uniform. The final layer in the encoder structure is a RepeatVector layer, which repeats the bottleneck vector a defined number of times so that the decoder receives a sequence.

This suggests that the autoencoder model can be viewed as a generative model, particularly when dealing with sparse gradients. We have:
z = f_enc(m), (3)
where z denotes the compressed reduced-order representation learned by the encoder for the input m, which represents the runoff data at a particular instant of time.

Decoder structure

The structure of the decoder comprises two LSTM layers. Its purpose is to utilize the compressed dataset stored in the bottleneck to reconstruct the original input datasets.

To implement the decoder structure, we begin by utilizing an LSTM layer to create the first decoding layer (LSTM layer 3), which mirrors the second layer in the encoder structure (LSTM layer 2). In this layer, the input size is set to (1, 18), while the output size is 36. Next, we employ a second decoding LSTM layer (LSTM layer 4). The input feature size is 36, and the output feature size is 431. The remaining parameters, including the kernel initialization, are the same as those used in the encoder structure. The river runoff sequence for a particular time instant is reconstructed as:
m̂ = f_dec(z), (4)
where m̂ represents the reconstructed input learned by the decoder from the compressed representation z.

The reconstruction error and the parameters used for LSTM autoencoder architecture

The reconstruction error P, defined in Equation (5), must be minimized when training the LSTM autoencoder:
P = (1/N_days) Σ_{i=1}^{N_days} ||m_i − m̂_i||², (5)
where m_i and m̂_i denote the original and reconstructed runoff vectors on the ith day.
The primary objective of the autoencoder is to reduce the dimensionality of surface runoff features. By introducing a bottleneck with a lower dimension than the original 431-dimensional input, the autoencoder is compelled to capture the most essential features from the runoff dataset. This combination of components empowers us to achieve the intended goals of analyzing large-dimensional input datasets and adjusting their resolutions. Table 1 displays the parameters used in the LSTM autoencoder.
Table 1

LSTM autoencoder architecture model parameters

Model architecture   Description
Number of layers     —
Batch size           32
Number of epochs     500
Optimizer            Adam
Loss function        MSE
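Under the stated layer sizes (36 and 18 units, he_uniform initialization, a RepeatVector layer) and the Table 1 settings, the autoencoder could be assembled in Keras roughly as follows. This is a sketch of one plausible realization, not the authors' exact code; details such as the `return_sequences` flags are our assumptions.

```python
import numpy as np
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import LSTM, RepeatVector

timesteps, n_rivers = 1, 431  # each sample is a (1, 431) reshaped runoff snapshot

model = Sequential([
    Input(shape=(timesteps, n_rivers)),
    # --- Encoder ---
    LSTM(36, kernel_initializer="he_uniform", return_sequences=True),   # LSTM layer 1
    LSTM(18, kernel_initializer="he_uniform"),                          # LSTM layer 2 (bottleneck)
    RepeatVector(timesteps),                                            # repeat the code for the decoder
    # --- Decoder ---
    LSTM(36, kernel_initializer="he_uniform", return_sequences=True),   # LSTM layer 3
    LSTM(n_rivers, kernel_initializer="he_uniform", return_sequences=True),  # LSTM layer 4
])
model.compile(optimizer="adam", loss="mse")
# Training would follow Table 1: model.fit(X, X, batch_size=32, epochs=500)
```

After training, the output of LSTM layer 2 serves as the 18-dimensional compressed runoff feature.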

Data processing workflow

Processing the data involves three steps:

  1. Smoothing the RAPID’s prediction data: The prediction data R from RAPID is smoothed using a Hanning filter, with no change in the dimensionality of the data (Oppenheim 1999).

  2. Reducing the dimensions of the runoff data: By employing SVD or an LSTM autoencoder, we can reduce the number of rows in the M matrix, consequently altering its dimensions from N_rivers-in-basin × N_days to either r × N_days when using SVD or 18 × N_days when using an LSTM autoencoder.

  3. Downsampling the number of days: To further reduce the dimensionality of all three datasets, we downsample the number of days if the ML method deems it necessary. For example, if we downsample by a factor of 2, it would require deleting every other data point in RAPID’s predictions and the true gauge values (since we use daily average RAPID estimates for building the hybrid model), as well as deleting alternate columns in the runoff matrix. Let the number of remaining days after downsampling be denoted by N_d.

Let R_d denote the smoothed and downsampled RAPID data, and let G_d denote the downsampled gauge data. Both R_d and G_d are vectors of size N_d, the number of days remaining after downsampling. In addition, let M_d denote the dimensionally reduced runoff matrix obtained via SVD or the autoencoder, with dimensions of r × N_d when using SVD or 18 × N_d when using an LSTM autoencoder. We split these three datasets into training and testing subsets, such that R_d = [R_train, R_test], G_d = [G_train, G_test], M_d = [M_train, M_test], and N_d = N_train + N_test. Here, N_train and N_test denote the number of training and testing data points, respectively. In the following write-up, we will detail two algorithms deployed to build a hybrid discharge forecasting model for a gauged river.
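The downsampling and chronological splitting steps can be sketched as follows; the function name, the downsampling factor of 2, and the 80/20 split fraction are illustrative assumptions rather than settings taken from the article.

```python
import numpy as np

def downsample_and_split(R_s, G, M_r, factor=2, train_frac=0.8):
    """Downsample the day axis by `factor`, then split chronologically.

    R_s : smoothed RAPID predictions, shape (n_days,)
    G   : true gauge discharge,       shape (n_days,)
    M_r : reduced runoff features,    shape (r, n_days)
    """
    R_d, G_d, M_d = R_s[::factor], G[::factor], M_r[:, ::factor]
    n_train = int(train_frac * len(R_d))
    train = (R_d[:n_train], G_d[:n_train], M_d[:, :n_train])
    test = (R_d[n_train:], G_d[n_train:], M_d[:, n_train:])
    return train, test
```

A chronological (rather than random) split is used so that the test period strictly follows the training period, as required for forecasting.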

Having presented two data compression techniques previously, in this section, we describe two algorithms used in building the hybrid forecasting models: (a) delta learning and (b) data augmentation (Thelen et al. 2022). Both of these hybrid models utilize RAPID as the physics-based model and enhance it by incorporating runoff data from the entire basin and the historical discharge measurements of the specific gauged river whose discharge we aim to forecast.

Delta learning

The delta learning algorithm is based on learning the error between RAPID’s discharge prediction and measured gauge data.

Training

Training the hybrid forecast model using the delta learning algorithm involves building two ML models and includes three steps, as detailed below.
  1. Step 1: The first step is to construct a runoff data-informed surrogate for the RAPID model, enabling the prediction of future RAPID discharge based on the historical time series of RAPID data and runoff data. Incorporating the runoff data into the surrogate model of RAPID helps include some of the missing information about river dynamics not captured by the physics-based RAPID model. This surrogate model can be considered an enhanced physics-based model. The goal of this surrogate is to predict the discharge at the next time step given a window of RAPID data and runoff data in the immediate past. For instance, consider the current time step t and the window sizes w_R and w_M, such that w_R ≤ t and w_M ≤ t. Since Noah-MP provides runoff forecasts for the near future, we also consider the next runoff data m_{t+1} as an input to train the RAPID surrogate since it informs the future; here, m_i denotes the ith column of the reduced runoff matrix. The input to the surrogate model will be the vector [R_{t−w_R+1}, …, R_t, m_{t−w_M+1}, …, m_t, m_{t+1}]. For the current time step t, the discharge predicted by the RAPID surrogate f_RAPID for the next time step is given by:
    R̂_{t+1} = f_RAPID(R_{t−w_R+1}, …, R_t, m_{t−w_M+1}, …, m_{t+1}). (6)
  2. Step 2: The second step is to obtain the error in the anticipated outflow based on the physics-based prediction relative to the gauged measurement. To obtain the physics-based prediction, we utilize the previously trained surrogate defined in Equation (6). However, instead of using the RAPID data as input, we use the gauge measurements along with the runoff data to evaluate the physics-based surrogate response. This tweak is necessary for a like-for-like comparison of the physics-based prediction relative to the gauged measurement. That is,
    G̃_{t+1} = f_RAPID(G_{t−w_R+1}, …, G_t, m_{t−w_M+1}, …, m_{t+1}). (7)
    The error term (or the delta term) is given as follows:
    δ_{t+1} = G_{t+1} − G̃_{t+1}. (8)
    Using this construct, the delta term can be evaluated for each day by considering data points from the preceding days, except for the first w = max(w_R, w_M) time steps. This exception is necessary because evaluating the predicted discharge using the surrogate requires data from the last w time steps. This yields the delta vector denoted by Δ = [δ_{w+1}, …, δ_{N_train}].
  3. Step 3: The goal of this step is to build a predictor for the error in discharge (the delta term surrogate f_δ) capable of predicting the error for a future time step, considering the preceding gauge measurements and runoff data for window sizes of w_R and w_M, respectively. To maintain compatibility in dimensionality between the input and output, we delete the first w columns of G_train and M_train. The predicted error for the (t+1)th step is then defined as follows:
    δ̂_{t+1} = f_δ(G_{t−w_R+1}, …, G_t, m_{t−w_M+1}, …, m_{t+1}). (9)
    Figure 2 illustrates the steps involved in training the hybrid model using the delta learning algorithm.
Figure 2

Schematic flowchart for training the surrogate of the physics-based model and the error (or delta) term in the delta learning algorithm.

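The sliding-window input construction shared by the three training steps can be sketched in NumPy. The helper `build_windows` is hypothetical (here using a single window size w for both series), and the trailing comments indicate how its outputs map onto Equations (6)–(9).

```python
import numpy as np

def build_windows(series, runoff, w):
    """Stack inputs [x_{t-w+1..t}, m_{t-w+1..t+1}] with target x_{t+1} for each valid t.

    series : 1-D array of length n (RAPID data R for step 1, gauge data G for steps 2-3)
    runoff : reduced runoff features, shape (r, n)
    Returns X of shape (n - w, w + r*(w+1)) and y of shape (n - w,).
    """
    r, n = runoff.shape
    X, y = [], []
    for t in range(w - 1, n - 1):                      # t indexes the current day (0-based)
        past = series[t - w + 1: t + 1]                # w past values of the series
        m_win = runoff[:, t - w + 1: t + 2].ravel()    # runoff up to t+1 (forecastable)
        X.append(np.r_[past, m_win])
        y.append(series[t + 1])
    return np.array(X), np.array(y)

# Delta-learning training (sketch):
# 1) X_R, y_R = build_windows(R_train, M_train, w); fit f_RAPID on (X_R, y_R)    # Eq. (6)
# 2) X_G, _  = build_windows(G_train, M_train, w); G_tilde = f_RAPID.predict(X_G)  # Eq. (7)
#    delta = G_train[w:] - G_tilde                                                 # Eq. (8)
# 3) fit f_delta on (X_G, delta)                                                   # Eq. (9)
```

Any regressor with fit/predict semantics (GP-NARX, NARX-Net, LSTM, or BiLSTM, as discussed later) can consume these windowed inputs.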

Testing and forecasting

The two ML models, f_RAPID and f_δ, trained based on the steps defined in the previous subsection, will be deployed to predict the discharge at the next time step using the gauge data and runoff data from preceding time steps, as shown in Figure 3. We aim for a time series forecast of future discharge. This involves two steps:
  1. Step 1: For the current time step t, we obtain the discharge forecast G̃_{t+1} and the delta term δ̂_{t+1} for time step t+1 using the following:
    G̃_{t+1} = f_RAPID(G_{t−w_R+1}, …, G_t, m_{t−w_M+1}, …, m_{t+1}), (10a)
    δ̂_{t+1} = f_δ(G_{t−w_R+1}, …, G_t, m_{t−w_M+1}, …, m_{t+1}). (10b)
    Here, the forecast runoff data for the time step t+1, denoted by m_{t+1}, are obtained from the Noah-MP land surface model.
  2. Step 2: By using the quantities obtained earlier, we obtain the forecasted gauge measurement for future time step t+1 as follows:
    Ĝ_{t+1} = G̃_{t+1} + δ̂_{t+1}. (11)
Figure 3

Schematic flowchart for testing and making discharge forecasts using the delta learning algorithm.

These steps can be repeated to obtain the predicted gauge measurement for the next time step t+2 by using the predicted gauge discharge Ĝ_{t+1} and the forecasted runoff data m_{t+2} as part of the input, such that
G̃_{t+2} = f_RAPID(G_{t−w_R+2}, …, G_t, Ĝ_{t+1}, m_{t−w_M+2}, …, m_{t+2}), (12a)
δ̂_{t+2} = f_δ(G_{t−w_R+2}, …, G_t, Ĝ_{t+1}, m_{t−w_M+2}, …, m_{t+2}), (12b)
Ĝ_{t+2} = G̃_{t+2} + δ̂_{t+2}. (12c)
Following the same line of reasoning, the gauge discharge forecast can be obtained for as long as the runoff forecast is available.
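The recursion of Equations (12a)–(12c) can be sketched as a rolling loop. The function `forecast_recursive` and the callable models are illustrative stand-ins for the trained surrogates, assuming a single window size w for both series.

```python
import numpy as np

def forecast_recursive(f_rapid, f_delta, g_hist, runoff_cols, w):
    """Roll the delta-learning forecast forward while runoff forecasts are available.

    f_rapid, f_delta : callables mapping a feature vector to a scalar prediction
    g_hist           : 1-D array of gauge history (length >= w)
    runoff_cols      : reduced runoff features covering the last w historical days
                       plus the forecast horizon, shape (r, w + horizon)
    """
    g = list(g_hist[-w:])
    forecasts = []
    horizon = runoff_cols.shape[1] - w
    for k in range(horizon):
        m_win = runoff_cols[:, k: k + w + 1].ravel()   # runoff window up to the next step
        x = np.r_[g[-w:], m_win]
        g_tilde = f_rapid(x)              # physics-surrogate prediction, Eq. (12a)
        delta = f_delta(x)                # predicted model error, Eq. (12b)
        g_next = g_tilde + delta          # corrected forecast, Eq. (12c)
        forecasts.append(g_next)
        g.append(g_next)                  # feed the forecast back in as history
    return np.array(forecasts)
```

Each forecast is appended to the gauge history, so the loop runs for as long as Noah-MP runoff forecasts are supplied.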

Data augmentation

Training

Similar to delta learning, training the hybrid forecast model using the data augmentation algorithm involves constructing two ML models and consists of three steps, as outlined below:

  1. Step 1: The first step is identical to that of delta learning as discussed in Section 3.1.1, and it entails creating a runoff data-informed surrogate for the RAPID model, denoted as f_RAPID and defined in Equation (6).

  2. Step 2: At time step , the model can be used to obtain the predicted physics-based discharge for the next time step using Equation (7). The predicted physics-based discharge can be evaluated for each day by considering data points from the preceding days, except for the first time steps. This exception is necessary because evaluating the predicted discharge using the surrogate requires data from the last time steps. This yields the vector of predictions made by the model for the entire training period denoted by .

    We emphasize that to obtain the prediction vector produced by the model , we use gauge measurements in conjunction with runoff data as inputs to evaluate the physics-based surrogate response, rather than relying on RAPID data, which were one of the training inputs for the model in the first step (Equation (7)). This approach is taken because the physics-based prediction vector will be used to augment the inputs for training the surrogate model that forecasts gauge discharge measurements, as will be explained in the next step.

  3. Step 3: This step involves utilizing the gauge data and runoff data , along with the prediction vector obtained in the previous step, to train an ML model capable of directly predicting the gauge measurements. This is unlike the delta learning approach, where the error term was predicted first before the gauge measurements.

    To maintain compatibility in dimensionality between the inputs and output vectors, we delete the first columns of and . The predicted gauge measurement for the step is then defined as follows:
    (13)
    Notice that, in addition to the preceding gauge measurements and runoff data, the output of the physics-based surrogate , denoted as , is included as part of the input to the surrogate . That is, the input for the model was obtained by augmenting the input for the model . This is the reason why this approach is referred to as data augmentation. The term contains information about the immediate future (the next time step). Consequently, the model is trained to incorporate this immediate future information as predicted by the physics-based surrogate (at the next time step).
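The input augmentation described in Step 3 can be sketched as below: each training row stacks lagged gauge values, lagged runoff features, and the physics-surrogate prediction for the target day. The sketch is illustrative; the function and variable names (`build_augmented_training_set`, `n_lag`, etc.) are hypothetical, not from the article.

```python
import numpy as np

def build_augmented_training_set(g, m_feat, q_phys, n_lag):
    """Assemble data-augmentation training pairs.

    g:      gauge discharge series, shape (T,)
    m_feat: reduced runoff features, shape (T, k)
    q_phys: physics-surrogate predictions for each day, shape (T,)
    n_lag:  number of preceding days used as input
    Returns X (augmented inputs) and y (gauge targets).
    """
    X, y = [], []
    for t in range(n_lag, len(g)):
        lagged_gauge = g[t - n_lag:t]                # preceding gauge values
        lagged_runoff = m_feat[t - n_lag:t].ravel()  # preceding runoff features
        # Augment the input with the physics-based prediction for day t,
        # i.e., the "immediate future" information from the surrogate.
        X.append(np.concatenate([lagged_gauge, lagged_runoff, [q_phys[t]]]))
        y.append(g[t])
    return np.array(X), np.array(y)
```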

Testing and forecasting

Making a time series prediction of discharge into the future using data augmentation utilizes the models and . This involves two steps:
  1. Step 1: For the current time step , we obtain the discharge forecast using the following:
    (14)
  2. Step 2: By using the value of obtained earlier, we obtain the forecasted gauge measurement for future time step as follows:
    (15)
    The forecast runoff data for the time step , denoted by , are obtained from the Noah-MP runoff forecasting model.
These steps can be repeated to obtain the predicted gauge measurement for the next time step by using the predicted gauge discharge and the forecasted runoff data as part of the input, such that
(16a)
(16b)
Following the same line of reasoning, the gauge discharge forecast can be obtained for as long as the runoff forecast is available. Figures 4 and 5 illustrate the training and testing process for the data augmentation algorithm.
Figure 4

Schematic flowchart for training the surrogate of the physics-based model and the gauge measurement in the data augmentation algorithm.
Figure 5

Schematic flowchart for testing and making discharge forecasts using the data augmentation algorithm.

As discussed in the previous section, building a hybrid discharge forecast model using the delta learning algorithm requires the construction of two ML models, and . In addition, using the data augmentation algorithm necessitates the creation of two ML models, and . We have investigated four different ML methods with varying complexity and characteristics to train these models. The following subsections discuss these ML methods. To maintain generality in our descriptions, we use the notation of a generic input vector and a scalar output available for training. We denote a matrix of multivariate training inputs as , where each row represents an instance of the input vector . Similarly, we represent the output as a column vector , with each element denoting an instance of the output variable . Finally, let the input vector for a testing point be denoted by and the predicted output be denoted by .

Gaussian process regression-based nonlinear autoregressive network with exogenous inputs

Gaussian process regression

We start by detailing the Gaussian process (GP) regression (GPR) for the training datasets . The goal is to find the mapping between the input vector (consisting of various features) and the output . As described in Williams & Rasmussen (2006), a GPR probabilistically models the output as a realization of the GP, such that
(17) y(x) = m(x) + Z(x) + ε
where m(x) is the mean of the GP, Z(x) is assumed to be a GP with zero mean and covariance function k(x, x′), and ε is the noise in the output, such that
(18) ε ∼ N(0, σ_ε²)
where
(19) k(x, x′) = σ² R(x, x′; Θ)
where σ² is the constant variance of the GP, R(·, ·; Θ) is the correlation function, and Θ is the unknown parameter vector of the covariance function. There are a variety of correlation functions available. The most commonly used one is the Gaussian correlation function given by (Hu & Mahadevan 2016)
(20) R(x, x′; Θ) = exp(−Σ_{i=1}^{d} θ_i (x_i − x′_i)²)
where d is the dimension of the input variables and θ_i is the ith element of Θ.
The GP conditioned on the training data D = {X, y} is also a GP, with the mean function m_c(x) and the covariance function k_c(x, x′), given by
(21) y(x) | D ∼ GP(m_c(x), k_c(x, x′))
Here,
(22) m_c(x) = m(x) + r(x)ᵀ(K + σ_ε² I)⁻¹(y − m(X)), k_c(x, x′) = k(x, x′) − r(x)ᵀ(K + σ_ε² I)⁻¹ r(x′)
where
(23) r(x) = [k(x, x₁), …, k(x, x_N)]ᵀ, K_ij = k(x_i, x_j), i, j = 1, …, N
The hyperparameters σ², Θ, and σ_ε are estimated using the training data through Bayesian parameter estimation or maximum likelihood estimation (Hu & Mahadevan 2016).
From Equations (17) and (21), the predictive distribution of the output y* for the input realization x* conditioned on the training data is given by
(24) y* | D ∼ N(m_c(x*), k_c(x*, x*) + σ_ε²)
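As a concrete illustration of this formulation, the snippet below fits a GPR with a constant-variance Gaussian (RBF) correlation plus a noise term, using scikit-learn's `GaussianProcessRegressor` as a stand-in for the equations above; the toy data are invented.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

# Toy training data: noisy samples of a smooth function (invented for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(X).ravel() + 0.05 * rng.standard_normal(40)

# Constant variance * Gaussian correlation + output noise, mirroring the
# covariance structure described above.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.05)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Predictive mean and standard deviation at a test point.
x_star = np.array([[2.5]])
mean, std = gpr.predict(x_star, return_std=True)
```

The returned standard deviation quantifies the predictive uncertainty, which is one advantage of GPR over purely deterministic regressors.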

NARX model based on GPR

The GPR can be used to build a NARX predictive model for time series data. This model assumes that the current value of the output y_t at time step t is predicted using a nonlinear function of previous inputs and the output, such that
(25) y_t = f(y_{t−1}, …, y_{t−n_y}, u_{t−1}, …, u_{t−n_x}) + ε_t
where the residual noise ε_t is assumed to be Gaussian. The terms n_x and n_y denote the input and output lags, respectively. The nonlinear function f(·) can be employed to train the ML models incorporated in the hybrid discharge prediction model (i.e., , , and ).
For a given training set , this nonlinear function can be estimated probabilistically using the GPR trained for the training set . The training input matrix is constructed such that its tth row, denoted by x_t, is given by
(26) x_t = [y_{t−1}, …, y_{t−n_y}, u_{t−1}, …, u_{t−n_x}]
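Constructing the lagged training matrix amounts to stacking output and input lags row by row. A minimal sketch (function name hypothetical; `n_y` and `n_x` are the output and input lags):

```python
import numpy as np

def narx_design_matrix(y, u, n_y, n_x):
    """Build NARX training pairs: each row stacks the output lags
    y[t-n_y..t-1] and the exogenous input lags u[t-n_x..t-1];
    the corresponding target is y[t]."""
    start = max(n_y, n_x)
    rows, targets = [], []
    for t in range(start, len(y)):
        rows.append(np.concatenate([y[t - n_y:t], u[t - n_x:t]]))
        targets.append(y[t])
    return np.array(rows), np.array(targets)
```

The resulting matrix can then be passed to any regressor, e.g., the GPR described above.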

NARX-Net

The NARX basis discussed in Section 4.1.2 can be implemented using a neural network as the regressor function, as opposed to a polynomial or GP regressor (Xie et al. 2009). In this approach, the predictor basis comprises lags of both autoregressive and exogenous terms, which are then processed through an ML model to predict future time steps of the output. Specifically, the function in Equation (25) is estimated using a neural network model, which may be a fully connected feed-forward network with multiple layers, activation functions, dropout, and numerous parameters in both the neuron weights and biases.

Unlike ordinary least squares regression and GPR, where the model parameters admit an immediate explicit solution, the high nonlinearity of the network function makes iterative training on subsets of the data necessary. This training involves optimization schemes commonly used in the ML/AI community, such as ADAM, RMSProp, Adagrad, and others.

The choice and application of this modeling methodology are motivated by its utilization of potentially sophisticated neural network architectures, which can capture complex patterns and data features, as well as its traditional aspect of establishing a relationship between previous and current values in an autoregressive sense. This characteristic is favorable for physics-based problems that are typically governed by differential equations (Liu et al. 2022). Furthermore, this method serves as a suitable middle ground in terms of computational complexity when compared to GP-NARX and more intensive ML methods, such as LSTM or gated recurrent unit (GRU), when assessing its performance.
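A minimal NARX-Net-style sketch, using scikit-learn's `MLPRegressor` with the Adam optimizer as a stand-in for the fully connected feed-forward network described above; the toy autoregressive series and all names are invented for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy autoregressive series with one exogenous input: y_t = 0.8*y_{t-1} + u_{t-1}.
rng = np.random.default_rng(1)
u = rng.standard_normal(500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * y[t - 1] + u[t - 1]

# Predictor rows stack the lags [y_{t-1}, u_{t-1}]; the target is y_t.
X = np.column_stack([y[:-1], u[:-1]])
target = y[1:]

# Feed-forward network on the NARX basis, trained iteratively with Adam.
net = MLPRegressor(hidden_layer_sizes=(32, 32), solver="adam",
                   max_iter=2000, random_state=0).fit(X, target)
```

Because the toy relation is deterministic given the lags, the fitted network should reproduce it closely; real gauge data are far noisier, which is where regularization choices such as dropout matter.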

Long short-term memory

As mentioned in Section 2.3.1, we recall that LSTM is an RNN-based algorithm that is especially useful for learning sequences of data with long-term dependencies. Apart from using LSTM to extract features from high-dimensional runoff data, we also utilize LSTM to build a hybrid discharge forecasting model. In this section, we present the general description of the LSTM algorithm and its application for hybrid modeling.

Figure 6 illustrates an LSTM cell. A standard LSTM cell consists of an input gate, a forget gate, and an output gate. The input gate integrates new information, which includes the previously observed value from the current iteration, the LSTM unit’s output, and the current input. The forget gate determines whether to retain or discard this information. The output gate regulates how to compute the values for the next neuron (Yu et al. 2019).
Figure 6

Schematic diagram of an LSTM cell.

As illustrated in Figure 6, multiple LSTM cells are connected together to map an input sequence to an output sequence . In each LSTM cell, the operations in the three gates are explained as follows.

Input gate

At a specific time step t, the input gate adds new information from the input to the current cell state, scaling it by the amount we want to update the cell state, as follows:
(27) i_t = σ(W_i [h_{t−1}, x_t] + b_i), C̃_t = tanh(W_c [h_{t−1}, x_t] + b_c)
Here, σ(·) and tanh(·) are, respectively, a sigmoid function and a hyperbolic tangent function with weights W_i and W_c, and biases b_i and b_c.

Forget gate

The forget gate, employing the sigmoid function, processes the input data and outputs a vector of values between 0 and 1, which determines whether to retain or discard information. The LSTM unit uses this vector to decide which data from its previous hidden state should be forgotten. When a value in the sigmoid function's vector is 0, the LSTM forgets the corresponding information from the previous hidden state; when it is 1, the information is retained. The gate outputs the value f_t such that
(28) f_t = σ(W_f [h_{t−1}, x_t] + b_f)
where W_f and b_f denote the weight and bias, respectively, used in the sigmoid function to obtain the value of f_t.

Output gate

The output gate first aggregates the cell state C_{t−1} from the previous cell, the value f_t from the forget gate, and the values i_t and C̃_t from the input gate to update the cell state as follows:
(29) C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t
where ⊙ stands for the Hadamard product operation.
Finally, the output of this LSTM cell is found through the Hadamard product of the output gate result and the refreshed cell state squashed to the scale between −1 and 1, such that
(30) o_t = σ(W_o [h_{t−1}, x_t] + b_o), h_t = o_t ⊙ tanh(C_t)
where W_o and b_o denote the weight and bias used in the sigmoid function to obtain the value of o_t.

The hidden states are ultimately transformed into the output through a fully connected layer. In a conventional LSTM topology, groups of neurons are organized into layers, comprising the input layer, output layer, and multiple hidden layers. The input layer receives input from external sources, while the output layer transmits information to the surroundings. The hidden layers, consisting of several levels, do not have a direct connection to the external physical world but are linked to the input and output layers. In the multilayer LSTM architecture used in this article, feed-forward neural networks are constructed by stacking cells across multiple hidden levels.
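The three gate computations can be collected into a single cell update. The sketch below is a plain NumPy illustration of one LSTM cell step; the parameter containers `W` and `b` are hypothetical stand-ins for trained weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell update. W and b are dicts holding the parameters of the
    input (i), candidate (c), forget (f), and output (o) transforms, each
    acting on the stacked vector [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate cell state
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate
    c_t = f_t * c_prev + i_t * c_tilde      # cell state update (Hadamard products)
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                # hidden state output
    return h_t, c_t
```

Chaining this step over a sequence, and stacking cells across hidden layers, yields the multilayer LSTM architecture used in this article.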

Bidirectional long short-term memory

BiLSTM is a modification of the conventional LSTM. In 2005, Graves and Schmidhuber introduced bidirectional training to an LSTM network for the first time (Graves & Schmidhuber 2005). The fundamental concept behind BiLSTM is to train an additional set of LSTM cells in the reverse direction. Each unit’s output, denoted as , results from the combination of both forward and reversed subunits, represented as and (Ma et al. 2020). If we remove the backward LSTM component, this configuration simplifies into a conventional forward LSTM, illustrated in Figure 7. Likewise, removing the forward LSTM yields a conventional LSTM with a reversed time axis (Schuster & Paliwal 1997).
Figure 7

Architecture of bidirectional LSTM.
The hidden state in BiLSTM is produced by combining the two LSTM layers, as shown in Equation (31). The integration of both forward and backward directions within the same network enables the prompt utilization of input data from both past and future time frames to minimize errors efficiently.
(31)
where is the th BiLSTM hidden state signal at time , is the number of cells, is the BiLSTM layer’s forward direction algorithm, and is the BiLSTM layer’s backward orientation algorithm. Here, is the function that combines the forward and backward computation procedures.
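The forward/backward combination can be sketched generically as below. For brevity, the recurrent unit is abstracted as a step callable (in the article it is an LSTM cell), and the combining function is taken to be concatenation, a common choice; all names are illustrative.

```python
import numpy as np

def run_direction(step, xs, h0):
    """Run a recurrent step function over a sequence, collecting hidden states."""
    h, states = h0, []
    for x in xs:
        h = step(x, h)
        states.append(h)
    return states

def bidirectional(step_fwd, step_bwd, xs, h0):
    """Combine a forward pass and a backward (reversed time axis) pass;
    here the combining function is concatenation."""
    fwd = run_direction(step_fwd, xs, h0)
    bwd = run_direction(step_bwd, xs[::-1], h0)[::-1]
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]
```

Each combined state therefore carries information from both earlier and later time steps of the sequence, which is the essence of bidirectional training.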

Basin and rivers of interest

As discussed by Zhao et al. (2023), the United States is primarily divided into seven catchment zones. We focus our attention on a basin located in Colorado, as shown in Figure 8. This basin is a part of catchment zone 6. This particular basin is home to 431 watersheds with a river passing through each of them. Each river is assigned a unique river ID, and there is a corresponding shape file (.shp file) representing the watershed in the basin. Among these 431 rivers, six have gauges that measure and record discharge data. Our goal is to forecast the discharge of these six rivers. Figure 9 illustrates the river network within the basin of interest. The red lines depict rivers crossing the watersheds, while the dark blue watersheds indicate the locations containing gauged rivers. Our objective is to demonstrate the applicability of the proposed forecasting algorithms in predicting the discharge of these rivers. This particular basin is chosen to demonstrate the proposed approaches for three main reasons: (1) the runoff data of this basin has been verified using a benchmark dataset, which allows us to isolate model uncertainty in runoff data from the studied problem; (2) this basin contains a variety of rivers, including rivers with low discharge rates, rivers with high discharge rates, and a human-controlled river with a dam; and (3) this basin has sufficient historical data, allowing for the training and testing of the hybrid modeling methods developed in this article. It should be noted that even though the proposed approaches are demonstrated using this particular basin, their application is not limited to this basin.
Figure 8

Basin of interest.
Figure 9

River network (marked in red) and the watersheds corresponding to the six gauged rivers (colored in blue).

To develop the hybrid model, three datasets are required, as detailed in Section 2.1. The first and second datasets pertain to gauge measurements and discharge predictions produced by the RAPID model for the six gauged rivers, while the third dataset consists of runoff data for all 431 rivers in the basin, and all these datasets span 5,311 days. Specifically, the dimensions of the gauge measurements , the RAPID prediction vector R, and the runoff data M are 1 × 5000, 1 × 5000, and 431 × 5000, respectively. The six gauged rivers for which we aim to forecast discharge are identified by the following USGS river IDs: 700739316, 700749972, 700777836, 700782048, 700811946, and 700825284. In this article, we refer to these rivers using numerals 1–6, respectively.

Data preprocessing

We perform the following:
  1. Dimension reduction of the runoff data: Since these rivers all belong to the same basin and their dynamics are interrelated, it is expected that the runoff data for all 431 rivers will exhibit some correlation. This correlation arises from the interconnected nature of the river network, where the dynamics of each river are influenced by the dynamics of the entire network and vice versa. Consequently, this correlation allows for the reduction of dimensions in the runoff data to extract essential information in a latent space. Reducing the dimensionality of the runoff dataset M is crucial for enhancing the overall construction of the hybrid model, as it significantly improves computational efficiency. To achieve this, we employ both the singular value decomposition (SVD) technique (Section 2.2) and an LSTM-based autoencoder (Section 2.3) for dimension reduction. With either technique, we extract 18 essential features from the original dataset.

    Figure 10 depicts the reconstruction of the runoff data distributed over the entire basin for a particular day using an LSTM-based autoencoder. The top figure displays the original runoff data, while the bottom figure shows the reconstructed runoff dataset obtained through the LSTM-based autoencoder. Each pixel in this figure represents the runoff volume for a specific catchment, with varying colors indicating different volume ranges as listed on the color-coded legend. The darker hues signify larger volumes, and among the 431 catchments, the largest recorded volume is 54,500 m³/s. The similarity between the original and reconstructed datasets indicates that the autoencoder efficiently captures the properties of the runoff dataset using just 18 dimensions.

  2. Smoothing of the RAPID prediction data: The RAPID prediction is smoothed using a Hanning filter with a window size of 65 days (Oppenheim 1999).

  3. Quality assessment and data standardization: We conduct a thorough data quality assessment to address issues such as mismatched data types, mixed data values, data outliers, and missing data. To ensure standardized data for analysis, the StandardScaler transform in Python is applied to normalize all three datasets. This normalization method ensures that the data are scaled consistently, allowing for accurate outcomes in subsequent studies while limiting the impact of various scales on hybrid model building.

  4. Downsampling the number of days: We downsampled the data by a factor of 2 for GP-NARX. This was necessary due to GP-NARX's inability to handle large dimensionality, as will be discussed in more detail in Section 5.4. No downsampling was performed for the other methods.
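Preprocessing steps 1–3 can be sketched as follows, assuming a toy runoff matrix in place of the basin data; the 18-feature SVD truncation, the 65-day Hanning window, and `StandardScaler` follow the description above, while the array names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
runoff = rng.random((431, 1000))   # toy stand-in for the basin runoff matrix M
rapid = np.sin(np.linspace(0, 20, 1000)) + 0.3 * rng.standard_normal(1000)

# 1. SVD-based reduction of the runoff matrix to 18 latent features per day.
U, s, Vt = np.linalg.svd(runoff, full_matrices=False)
features = (np.diag(s[:18]) @ Vt[:18]).T       # shape (days, 18)

# 2. Hanning smoothing of the RAPID series with a 65-day window.
w = np.hanning(65)
rapid_smooth = np.convolve(rapid, w / w.sum(), mode="same")

# 3. Standardization of the extracted features.
features_std = StandardScaler().fit_transform(features)
```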

Figure 10

An instance of an original runoff dataset compared to the reconstructed runoff dataset.

Performance evaluation metrics

To comprehensively evaluate the performance of the proposed methods, we employed multiple performance metrics, including the well-acknowledged and widely used Kling–Gupta efficiency (KGE) score and other accuracy metrics commonly used in the ML and hydrology domains. In what follows, we explain the metrics used in this article in detail. The mean-squared error (MSE) and its associated normalization, along with the Nash–Sutcliffe efficiency (NSE), are widely used metrics for the calibration and assessment of hydrological simulations using observable information (Nash & Sutcliffe 1970; Murphy 1988). To compute the MSE, we calculate the squared difference between the model's predictions and the ground truth and then average this value across the entire dataset. The root-mean-square error (RMSE) is obtained by taking the positive square root of the MSE. The NSE is obtained by dividing the MSE by the overall variability of the data and subtracting the resulting ratio from 1. Let Q_t^o, Q_t^f, Q̄^o, and Q̄^f denote the observed discharge, forecasted discharge, mean of the observed discharge, and mean of the predicted discharge, respectively, and let n denote the aggregate quantity of samples tested. The MSE and NSE are defined as follows:
(32a) MSE = (1/n) Σ_{t=1}^{n} (Q_t^o − Q_t^f)²
(32b) RMSE = √MSE
(32c) NSE = 1 − Σ_{t=1}^{n} (Q_t^o − Q_t^f)² / Σ_{t=1}^{n} (Q_t^o − Q̄^o)²
Another commonly used metric to evaluate the similarity between the observed discharge and the predicted discharge is the KGE (Gupta et al. 2009; Kling et al. 2012). Let σ_o and σ_f denote the standard deviation of observed and simulated discharge, respectively.
The KGE score is defined as follows:
(33) KGE = 1 − √((ρ − 1)² + (β − 1)² + (γ − 1)²)
where ρ, β = Q̄^f/Q̄^o, and γ = (σ_f/Q̄^f)/(σ_o/Q̄^o) denote the Pearson correlation coefficient, bias ratio, and variability ratio, respectively.

Since MSE can be obtained by squaring RMSE, and RMSE has the same physical units as the predictions, we report RMSE in addition to KGE and NSE error metrics.
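For reference, the three reported metrics can be computed as below; the KGE variant shown uses the coefficient-of-variation form of the variability ratio (Kling et al. 2012), consistent with the description above. Function names are ours.

```python
import numpy as np

def rmse(obs, sim):
    return np.sqrt(np.mean((obs - sim) ** 2))

def nse(obs, sim):
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    rho = np.corrcoef(obs, sim)[0, 1]                            # Pearson correlation
    beta = sim.mean() / obs.mean()                               # bias ratio
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())  # variability ratio
    return 1.0 - np.sqrt((rho - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
```

A perfect forecast yields RMSE = 0 and NSE = KGE = 1; lower RMSE and higher NSE/KGE indicate better agreement.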

Numerical results of hybrid forecasting models

Prediction results

We recall that our aim is to forecast the discharge of six gauged rivers in the chosen basin by using two different algorithms: delta learning (Section 3.1) and data augmentation (Section 3.2). These algorithms are built using four different ML algorithms as detailed in Section 4. In addition, two different feature extraction techniques, namely, SVD and LSTM-based autoencoder, are used to reduce the dimensionality of the runoff data, as detailed in Sections 2.2 and 2.3. Combining the two algorithms with the two feature extraction techniques yields four scenarios: (1) Scenario 1: delta learning algorithm combined with runoff data features extracted using SVD; (2) Scenario 2: delta learning algorithm combined with runoff data features extracted using LSTM-based autoencoder; (3) Scenario 3: data augmentation algorithm combined with runoff data features extracted using SVD; and (4) Scenario 4: data augmentation algorithm combined with runoff data features extracted using LSTM-based autoencoder. We call these the four scenarios of hybrid models, and they are implemented using different ML methods as illustrated in Figure 11. We present prediction results for all four of these scenarios, implemented using the various ML approaches discussed in Section 4.
Figure 11

Flowchart showing different combinations of data reduction techniques and ML methods exploited to build the hybrid prediction models.

LSTM, BiLSTM, and NARX-Net are employed to forecast river discharge for the six gauged rivers in all four scenarios mentioned earlier. The key parameters of the models are tuned through a grid search. Four different dropout rates (0.001, 0.005, 0.01, and 0.1) are used. Four different batch sizes (16, 32, 64, and 128) are also compared to determine the best batch size. The studied numbers of layers are 2, 3, 4, and 5; the studied learning rates are 0.0001, 0.001, 0.01, and 0.1; and the studied numbers of neurons are 30, 40, 60, 80, and 100. The optimal hyperparameters for the models are as follows: dropout rate, 0.01; batch size, 32; number of hidden layers, 3; learning rate, 0.001; and number of neurons, 80. LSTM and BiLSTM use a past window size of . NARX-Net utilizes a window size of and for data augmentation. For delta learning, it employs a window size of and . However, GPR is used only for Scenario 1 and Scenario 3, which use SVD to reduce runoff data dimensionality, with a window size of days. GPR is sensitive to high-dimensional input, and we sidestep this challenge by using a shorter lag time and by utilizing only the most important runoff features obtained by SVD. Although SVD reduces the dimensionality of the runoff data from 431 rivers to just 18 values, most of the information is contained in the first few terms. We exploit this fact and use only the first two of the 18 features extracted by SVD to build the GPR model.
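The grid search described above can be sketched as an exhaustive loop over the stated hyperparameter ranges. The scoring function is abstracted as a callable (assumed to return a validation error, lower being better); this is an illustration, not the article's code.

```python
import itertools

# Hypothetical grid mirroring the ranges reported in the text.
grid = {
    "dropout": [0.001, 0.005, 0.01, 0.1],
    "batch_size": [16, 32, 64, 128],
    "layers": [2, 3, 4, 5],
    "lr": [0.0001, 0.001, 0.01, 0.1],
    "neurons": [30, 40, 60, 80, 100],
}

def grid_search(train_and_score, grid):
    """Exhaustively evaluate every combination; train_and_score is assumed
    to return a validation error (lower is better) for a configuration."""
    best_cfg, best_err = None, float("inf")
    for values in itertools.product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        err = train_and_score(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err
```

This grid has 4 × 4 × 4 × 4 × 5 = 1,280 combinations, so each configuration's training cost dominates the overall tuning time.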

To limit the length of this article, we illustrate the prediction results for Rivers 1 and 6 in the four scenarios discussed earlier. Rivers 1 and 6 are chosen as representative examples, with River 1 having low discharge relative to River 6, which has a much higher discharge value. Figures 12, 13, 14, and 15 present the predictions vs. true discharge measurements for Scenarios 1 to 4, respectively. We note that all rivers are free-flowing except for River 4, which has a dam. Figure 16 illustrates the predictions vs. true discharge measurements for River 4, considering both delta learning and data augmentation with runoff data reduced using SVD (Scenario 1 and Scenario 3). The performance metrics of predicted discharge are depicted in Figures 17, 18, and 19. These figures illustrate the evaluation of various hybrid models proposed in this article, along with RAPID predictions, in comparison to the true measured discharge. The evaluation is conducted using the RMSE, NSE, and KGE metrics for a comprehensive analysis. Lower RMSE values indicate higher accuracy of the predictions, while higher KGE and NSE scores suggest better predictive performance.
Figure 12

Predicted vs. true measured discharge of rivers 1 and 6 obtained using delta learning algorithm considering runoff data reduced using SVD technique (Scenario 1). (a) River 1; ML method: GPR, (b) River 6; ML method: GPR, (c) River 1; ML method: NARX-net, (d) River 6; ML method: NARX-net, (e) River 1; ML method: LSTM, (f) River 6; ML method: LSTM, (g) River 1; ML method: BiLSTM, and (h) River 6; ML method: BiLSTM.
Figure 13

Predicted vs. true measured discharge of Rivers 1 and 6 obtained using delta learning algorithm considering runoff data reduced using LSTM-based autoencoder (Scenario 2). (a) River 1; ML method: NARX-net, (b) River 6; ML method: GPR, (c) River 1; ML method: LSTM, (d) River 6; ML method: LSTM, (e) River 1; ML method: BiLSTM, (f) River 6; ML method: BiLSTM.
Figure 14

Predicted vs. true measured discharge of Rivers 1 and 6 obtained using data augmentation algorithm considering runoff data reduced using SVD technique (Scenario 3). (a) River 1; ML method: GPR, (b) River 6; ML method: GPR, (c) River 1; ML method: NARX-net, (d) River 6; ML method: NARX-net, (e) River 1; ML method: LSTM, (f) River 6; ML method: LSTM, (g) River 1; ML method: BiLSTM, and (h) River 6; ML method: BiLSTM.
Figure 15

Predicted vs. true measured discharge of Rivers 1 and 6 obtained using data augmentation algorithm considering runoff data reduced using LSTM-based autoencoder (Scenario 4). (a) River 1; ML method: NARX-net, (b) River 6; ML method: NARX-net, (c) River 1; ML method: LSTM, (d) River 6; ML method: LSTM, (e) River 1; ML method: BiLSTM, and (f) River 6; ML method: BiLSTM.
Figure 16

Predicted vs. true measured discharge of River 4, which has a dam. (a) Delta learning implemented with GPR, (b) data augmentation implemented with GPR, (c) delta learning implemented with NARX-Net, (d) data augmentation implemented with NARX-Net, (e) delta learning implemented with LSTM, (f) data augmentation implemented with LSTM, (g) delta learning implemented with BiLSTM, and (h) data augmentation implemented with BiLSTM.
Figure 17

The RMSE for hybrid models and RAPID predictions relative to the true measured discharge. (a) RMSE for River 1, (b) RMSE for River 2, (c) RMSE for River 3, (d) RMSE for River 4, (e) RMSE for River 5, and (f) RMSE for River 6.
Figure 18

The NSE for hybrid models and RAPID predictions relative to the true measured discharge. (a) NSE for River 1, (b) NSE for River 2, (c) NSE for River 3, (d) NSE for River 4, (e) NSE for River 5, and (f) NSE for River 6.
Figure 19

The KGE for hybrid models and RAPID predictions relative to the true measured discharge. (a) KGE for River 1, (b) KGE for River 2, (c) KGE for River 3, (d) KGE for River 4, (e) KGE for River 5, and (f) KGE for River 6.

Discussion

Table 2 shows the best-performing ML method for all the rivers and various scenarios of algorithms used in hybrid modeling. We make the following observations based on these analysis results:

  1. Improved overall prediction results relative to the original RAPID model: Based on the error plots for RMSE, GPR-based hybrid algorithms (Scenario 1 and Scenario 3) outperform RAPID 100% of the time. LSTM- and BiLSTM-based algorithms outperform RAPID 91.667% of the time for delta learning and 100% of the time for data augmentation. NARX-Net-based hybrid algorithms outperform RAPID 100% of the time in all four scenarios of hybrid models. This clearly indicates that the predictions of the hybrid algorithms perform better than the RAPID predictions. That is, the ML component in the hybrid model successfully retrieves the information that has not been modeled in the original RAPID model.

  2. Data augmentation vs. delta learning: The error plots indicate that data augmentation is overall a more robust algorithm than delta learning. All the ML methods perform marginally better with data augmentation than with delta learning.

  3. Predictions in human-controlled river with a dam: As discussed earlier, River 4 is an anomaly in that it has a dam that obstructs the flow. Consequently, the true measured discharge is almost static and low (near zero). The RAPID model, unable to incorporate this information, produces a noisy prediction of a large positive discharge. This particular river, however, reveals a very interesting behavior of the delta learning and data augmentation algorithm. Since the delta learning algorithm involves evaluating the error between smoothed RAPID and the true discharge, it is highly sensitive to the deviation of RAPID’s predictions relative to the true value. In cases where the true discharge measurements are near zero, the delta terms become almost equal to smoothed RAPID predictions. As a result, the delta learning algorithm’s predictions are influenced by RAPID’s measurements and follow a similar pattern to RAPID’s predictions. On the other hand, the data augmentation algorithm does not have this sensitivity to RAPID predictions and yields a prediction pattern closer to the true discharge. This also justifies the overall better behavior of data augmentation relative to the delta learning algorithm (the first observation).

  4. Impact of feature extraction techniques on predictions: With the delta learning algorithm, LSTM performs better using autoencoder-based runoff features than SVD-based runoff features. Conversely, with the data augmentation algorithm, SVD-based runoff features outperform autoencoder-based runoff features.

  5. Performance of different ML algorithms in different scenarios: GPR is the overall best performer for Scenario 1 of the hybrid model; LSTM for Scenario 2; BiLSTM for Scenario 4. In Scenario 3, different methods perform the best for different rivers.

  6. Training cost and complexities of different ML algorithms: BiLSTM is harder to train than LSTM. Therefore, even though LSTM and BiLSTM perform relatively similarly, LSTM has an edge in terms of training ease. Ignoring River 4, overall, GPR marginally outperforms both LSTM and BiLSTM, even though it is a less complex architecture in terms of required training resources. This makes GPR a powerful method to support the hybrid model. However, GPR suffers from the curse of dimensionality, as discussed earlier. That is, it cannot handle larger input dimensions and is, hence, less informed than LSTM and BiLSTM.
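To make the training-cost argument concrete, here is a minimal numpy-only GPR sketch; the RBF kernel, jitter value, and toy data are assumptions for illustration, not the paper's configuration. Training is dominated by the O(n^3) Cholesky factorization of the n-by-n kernel matrix, and distance-based kernels lose resolution as input dimension grows, which is why GPR is fed the low-dimensional runoff features rather than the raw runoff data.

```python
import numpy as np

# Minimal GPR regression sketch with an RBF kernel (toy data, not the
# paper's model). The Cholesky factorization of the kernel matrix
# dominates training cost at O(n^3).

def rbf_kernel(a, b, length_scale=1.0):
    # Squared Euclidean distances between all row pairs of a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

rng = np.random.default_rng(2)
X = rng.random((50, 5))        # 50 samples of 5 reduced runoff features
y = np.sin(X.sum(axis=1))      # toy target

K = rbf_kernel(X, X) + 1e-5 * np.eye(len(X))  # jitter for numerical stability
L = np.linalg.cholesky(K)                     # O(n^3) step
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

X_test = rng.random((3, 5))
mean_pred = rbf_kernel(X_test, X) @ alpha     # GP posterior mean
print(mean_pred.shape)                        # (3,)
```
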

  7. Performance of NARX-Net: The NARX-Net models trained with delta learning consistently demonstrated an overly strong correlation with the RAPID prediction. This trend was also observed in the other modeling techniques, albeit to a lesser extent. Two possible explanations arise. First, there may be an excessive bias toward RAPID as the sole predictor of the output, implying that the runoff data and historic gauge information were insufficient to correct the exclusive reliance on RAPID. Alternatively, and more likely, a very small and inadequate delta term is being generated and added back to the original estimate as a correction. This phenomenon could be attributed to noisy, erratic, higher-frequency variance in the training data, as occurs in some instances of the RAPID and gauge data. Models exposed to such data may undesirably learn the variance through overfitting, resulting in a biased model that either inappropriately predicts noise or abandons the natural features in the data, defaulting to a smooth or flat average. In the case of NARX-Net, it appears that, during training, the model produces a very small delta term that provides minimal correction to the RAPID prediction.

Table 2

Best-performing ML method for various rivers and algorithm scenarios based on the three error metrics

Rivers | Metric | Scenario 1 (delta learning, SVD runoff features) | Scenario 2 (delta learning, autoencoder runoff features) | Scenario 3 (data augmentation, SVD runoff features) | Scenario 4 (data augmentation, autoencoder runoff features)
River 1 | RMSE | GPR | LSTM | LSTM | BiLSTM
River 1 | NSE | GPR | LSTM | LSTM | BiLSTM
River 1 | KGE | GPR | LSTM | LSTM | BiLSTM
River 2 | RMSE | GPR | LSTM | GPR | BiLSTM
River 2 | NSE | GPR | LSTM | GPR | BiLSTM
River 2 | KGE | GPR | LSTM | LSTM | BiLSTM
River 3 | RMSE | BiLSTM | LSTM | BiLSTM | BiLSTM
River 3 | NSE | BiLSTM | LSTM | LSTM | BiLSTM
River 3 | KGE | BiLSTM | LSTM | LSTM | BiLSTM
River 4 | RMSE | NARX-Net | LSTM | BiLSTM | BiLSTM
River 4 | NSE | NARX-Net | LSTM | BiLSTM | BiLSTM
River 4 | KGE | NARX-Net | LSTM | BiLSTM | BiLSTM
River 5 | RMSE | GPR | LSTM | GPR | LSTM
River 5 | NSE | GPR | BiLSTM | GPR | LSTM
River 5 | KGE | BiLSTM | LSTM | GPR | BiLSTM
River 6 | RMSE | GPR | LSTM | GPR | LSTM
River 6 | NSE | GPR | LSTM | BiLSTM | LSTM
River 6 | KGE | LSTM | LSTM | BiLSTM | BiLSTM

This article focuses on building a hybrid, data-driven, physics-enhanced model to forecast river discharge. The model is based on three data sources: true discharge values measured by gauges, discharge predicted by the physics-based RAPID model, and runoff data from all the rivers in the basin. Two algorithms, namely delta learning and data augmentation, are proposed for hybrid modeling. Building a hybrid model using either algorithm requires constructing three different ML models. We exploit four ML methods: GP-NARX, NARX-Net, LSTM, and BiLSTM.

The inclusion of runoff data from all rivers within the basin offers a comprehensive view of the dynamics of river flow across the entire surrounding region. The runoff data from neighboring rivers provide insight into the dynamic interactions and interconnected physics of the broader river network, contributing to more informed discharge forecasts for the specific river of interest. However, including the runoff data of all the rivers in the basin makes the runoff dataset high-dimensional. We therefore utilize SVD and LSTM-based autoencoder techniques to extract features of reduced dimensionality.
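As a concrete sketch of the SVD route (the array sizes and random data below are illustrative assumptions, not the case-study dimensions):

```python
import numpy as np

# Illustrative sketch: compress a (time steps x rivers) runoff matrix into
# a few SVD features, analogous to the SVD-based runoff feature extraction.

rng = np.random.default_rng(1)
n_time, n_rivers, k = 365, 120, 5
runoff = rng.random((n_time, n_rivers))  # stand-in for basin-wide runoff

centered = runoff - runoff.mean(axis=0)  # center each river's series
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# k-dimensional feature time series: leading left singular vectors scaled
# by their singular values.
features = U[:, :k] * S[:k]
explained = (S[:k] ** 2).sum() / (S ** 2).sum()  # variance captured by k modes

print(features.shape)  # (365, 5)
```

Each time step is thereby summarized by k features instead of one value per river, which is what keeps the downstream ML models (and GPR in particular) tractable.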

Combining the two algorithms with the two feature extraction techniques yields four scenarios: (1) Scenario 1: delta learning with runoff features extracted using SVD; (2) Scenario 2: delta learning with runoff features extracted using an LSTM-based autoencoder; (3) Scenario 3: data augmentation with runoff features extracted using SVD; and (4) Scenario 4: data augmentation with runoff features extracted using an LSTM-based autoencoder. Each scenario can be built with any of the ML techniques. In the hybrid model, GPR outperforms the other methods in Scenario 1, LSTM excels in Scenario 2, and BiLSTM is the top performer in Scenario 4. In Scenario 3, different methods perform best for different rivers.
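The scenario grid is simply the Cartesian product of the two algorithms and the two feature extractors; a small sketch reproducing the numbering above (the algorithm varies slowest, the feature extractor fastest):

```python
from itertools import product

# Enumerate the four hybrid-model scenarios described in the text.
algorithms = ("delta learning", "data augmentation")
feature_extractors = ("SVD", "LSTM autoencoder")

scenarios = {i: pair for i, pair in enumerate(product(algorithms, feature_extractors), start=1)}
for i, (algo, feats) in scenarios.items():
    print(f"Scenario {i}: {algo} with {feats} runoff features")
```
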

This study has demonstrated encouraging results in discharge forecasting using various hybrid models to address this exceedingly complex problem. The performance metrics examined, namely RMSE, KGE, and NSE, consistently show that the hybrid algorithms outperform RAPID. For example, when considering RMSE as the error measure, GPR-based hybrid algorithms (Scenario 1 and Scenario 3) outperform RAPID 100% of the time. LSTM- and BiLSTM-based algorithms outperform RAPID 91.667% of the time for delta learning and 100% of the time for data augmentation. NARX-Net-based hybrid algorithms (Scenario 2 and Scenario 3) also outperform RAPID 91.667% of the time.
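The three metrics follow their standard definitions (Nash & Sutcliffe 1970; Gupta et al. 2009); a minimal numpy sketch, using made-up sample series rather than the case-study data:

```python
import numpy as np

def rmse(obs, sim):
    # Root-mean-square error: 0 for a perfect match.
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def nse(obs, sim):
    # Nash-Sutcliffe efficiency: 1 is perfect, below 0 is worse than
    # simply predicting the observed mean.
    return float(1 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

def kge(obs, sim):
    # Kling-Gupta efficiency: combines correlation r, bias ratio beta,
    # and variability ratio alpha; 1 is perfect.
    r = np.corrcoef(obs, sim)[0, 1]
    beta = sim.mean() / obs.mean()
    alpha = sim.std() / obs.std()
    return float(1 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (alpha - 1) ** 2))

obs = np.array([3.0, 5.0, 9.0, 6.0, 4.0])   # made-up gauged discharge
sim = np.array([2.5, 5.5, 8.0, 6.5, 4.5])   # made-up model prediction
print(rmse(obs, sim), nse(obs, sim), kge(obs, sim))
```
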

One major requirement upon which these models were contingent is the availability of true discharge measurement data; the developed method is mainly designed for rivers with gauges. The encouraging results of this study have motivated us to extend this research and build a hybrid model capable of predicting the discharge of ungauged rivers, where true discharge measurements are not available; extending the proposed methods to ungauged rivers through transfer learning is an interesting direction worth studying in the future. In addition, for rivers with a large volume of discharge data, purely data-driven ML models can be employed; the method developed in this article is more suitable for situations with only a limited amount of river discharge data. Another important research direction is the selection or ensembling of ML models using optimization algorithms. In this article, we compared the performance of different hybrid modeling techniques; these models and techniques could be integrated to achieve better performance using ensemble learning methods.

Funding for this work was provided by the United States Army Corps of Engineers through the U.S. Army Engineer Research and Development Center Research Cooperative Agreement W912HZ-21-C-0042.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Adnan, R. M., Mostafa, R. R., T. Islam, A. R. M., Kisi, O., Kuriqi, A. & Heddam, S. (2021) Estimating reference evapotranspiration using hybrid adaptive fuzzy inferencing coupled with heuristic algorithms, Computers and Electronics in Agriculture, 191, 106541.

Adnan, R. M., Dai, H.-L., Mostafa, R. R., Parmar, K. S., Heddam, S. & Kisi, O. (2022) Modeling multistep ahead dissolved oxygen concentration using improved support vector machines by a hybrid metaheuristic algorithm, Sustainability, 14(6), 3470.

Adnan, R. M., Dai, H.-L., Mostafa, R. R., T. Islam, A. R. M., Kisi, O., Heddam, S. & Zounemat-Kermani, M. (2023a) Modelling groundwater level fluctuations by ELM merged advanced metaheuristic algorithms using hydroclimatic data, Geocarto International, 38(1), 2158951.

Adnan, R. M., Mostafa, R. R., Dai, H.-L., Heddam, S., Kuriqi, A. & Kisi, O. (2023b) Pan evaporation estimation by relevance vector machine tuned with new metaheuristic algorithms using limited climatic data, Engineering Applications of Computational Fluid Mechanics, 17(1), 2192258.

Bates, P. D. & De Roo, A. (2000) A simple raster-based model for flood inundation simulation, Journal of Hydrology, 236(1–2), 54–77.

Beighley, R. E., Eggert, K., Dunne, T., He, Y., Gummadi, V. & Verdin, K. (2009) Simulating hydrologic and hydraulic processes throughout the Amazon River Basin, Hydrological Processes: An International Journal, 23(8), 1221–1235.

Branstetter, M. L. (2001) Development of a Parallel River Transport Algorithm and Applications to Climate Studies. The University of Texas at Austin, Austin, Texas, USA.

David, C. H., Habets, F., Maidment, D. R. & Yang, Z.-L. (2011a) RAPID applied to the SIM-France model, Hydrological Processes, 25(22), 3412–3425.

David, C. H., Maidment, D. R., Niu, G.-Y., Yang, Z.-L., Habets, F. & Eijkhout, V. (2011b) River network routing on the NHDPlus dataset, Journal of Hydrometeorology, 12(5), 913–934.

David, C. H., Famiglietti, J. S., Yang, Z.-L. & Eijkhout, V. (2015) Enhanced fixed-size parallel speedup with the Muskingum method using a trans-boundary approach and a large subbasins approximation, Water Resources Research, 51(9), 7547–7571.

David, C. H., Famiglietti, J. S., Yang, Z.-L., Habets, F. & Maidment, D. R. (2016) A decade of RAPID—reflections on the development of an open source geoscience code, Earth and Space Science, 3(5), 226–244.

Ducharne, A., Golaz, C., Leblois, E., Laval, K., Polcher, J., Ledoux, E. & de Marsily, G. (2003) Development of a high resolution runoff routing model, calibration and application to assess runoff from the LMD GCM, Journal of Hydrology, 280(1–4), 207–228.

Follum, M. L., Tavakoly, A. A., Niemann, J. D. & Snow, A. D. (2017) AutoRAPID: A model for prompt streamflow estimation and flood inundation mapping over regional to continental extents, JAWRA Journal of the American Water Resources Association, 53(2), 280–299.

Frame, J. M., Kratzert, F., Klotz, D., Gauch, M., Shalev, G., Gilon, O., Qualls, L. M., Gupta, H. V. & Nearing, G. S. (2022) Deep learning rainfall–runoff predictions of extreme events, Hydrology and Earth System Sciences, 26(13), 3377–3392.

Ghimire, G. R., Hansen, C., Gangrade, S., Kao, S.-C., Thornton, P. E. & Singh, D. (2023) Insights from dayflow: A historical streamflow reanalysis dataset for the conterminous United States, Water Resources Research, 59(2), e2022WR032312.

Graves, A. & Schmidhuber, J. (2005) Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Networks, 18(5–6), 602–610.

Gupta, H. V., Kling, H., Yilmaz, K. K. & Martinez, G. F. (2009) Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling, Journal of Hydrology, 377(1–2), 80–91.

Hu, Z. & Mahadevan, S. (2016) Global sensitivity analysis-enhanced surrogate (GSAS) modeling for reliability analysis, Structural and Multidisciplinary Optimization, 53(3), 501–521.

Hu, Z., Ao, D. & Mahadevan, S. (2017) Calibration experimental design considering field response and model uncertainty, Computer Methods in Applied Mechanics and Engineering, 318, 92–119.

Ikram, R. M. A., Mostafa, R. R., Chen, Z., Parmar, K. S., Kisi, O. & Zounemat-Kermani, M. (2023) Water temperature prediction using improved deep learning methods through reptile search algorithm and weighted mean of vectors optimizer, Journal of Marine Science and Engineering, 11(2), 259.

Kan, G., Liang, K., Yu, H., Sun, B., Ding, L., Li, J., He, X. & Shen, C. (2020) Hybrid machine learning hydrological model for flood forecast purpose, Open Geosciences, 12(1), 813–820.

Kling, H., Fuchs, M. & Paulin, M. (2012) Runoff conditions in the upper Danube basin under an ensemble of climate change scenarios, Journal of Hydrology, 424, 264–277.

Kratzert, F., Klotz, D., Herrnegger, M., Sampson, A. K., Hochreiter, S. & Nearing, G. S. (2019) Toward improved predictions in ungauged basins: Exploiting the power of machine learning, Water Resources Research, 55(12), 11344–11354.

Liu, Y., Barthlow, D., Mourelatos, Z. P., Zeng, J., Gorsich, D., Singh, A. & Hu, Z. (2022) Mobility prediction of off-road ground vehicles using a dynamic ensemble of NARX models, Journal of Mechanical Design, 144(9), 091709.

Lohmann, D., Nolte-Holube, R. & Raschke, E. (1996) A large-scale horizontal routing model to be coupled to land surface parametrization schemes, Tellus A, 48(5), 708–721.

Ma, J., Li, Z., Cheng, J. C., Ding, Y., Lin, C. & Xu, Z. (2020) Air quality prediction at new stations using spatially transferred bi-directional long short-term memory network, Science of the Total Environment, 705, 135771.

McCarthy, G. T. (1938) The unit hydrograph and flood routing. In: Proceedings of Conference of North Atlantic Division, US Army Corps of Engineers, 1938, pp. 608–609.

Mostafa, R. R., Kisi, O., Adnan, R. M., Sadeghifar, T. & Kuriqi, A. (2023) Modeling potential evapotranspiration by improved machine learning methods using limited climatic data, Water, 15(3), 486.

Nash, J. E. & Sutcliffe, J. V. (1970) River flow forecasting through conceptual models Part I—A discussion of principles, Journal of Hydrology, 10(3), 282–290.

Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D., Frame, J. M., Prieto, C. & Gupta, H. V. (2021) What role does hydrological science play in the age of machine learning? Water Resources Research, 57(3), e2020WR028091.

Nguyen, H. D., Tran, K. P., Thomassey, S. & Hamad, M. (2021) Forecasting and anomaly detection approaches using LSTM and LSTM autoencoder techniques with the applications in supply chain management, International Journal of Information Management, 57, 102282.

Niu, G.-Y., Yang, Z.-L., Mitchell, K. E., Chen, F., Ek, M. B., Barlage, M., Kumar, A., Manning, K., Niyogi, D., Rosero, E. & Tewari, M. (2011) The community Noah land surface model with multiparameterization options (Noah-MP): 1. Model description and evaluation with local-scale measurements, Journal of Geophysical Research: Atmospheres, 116(D12), 1–19.

Oppenheim, A. V. (1999) Discrete-Time Signal Processing. Prentice-Hall, Upper Saddle River, New Jersey, USA.

Schuster, M. & Paliwal, K. K. (1997) Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing, 45(11), 2673–2681.

Tavakoly, A. A., Snow, A. D., David, C. H., Follum, M. L., Maidment, D. R. & Yang, Z.-L. (2017) Continental-scale river flow modeling of the Mississippi River Basin using high-resolution NHDplus dataset, JAWRA Journal of the American Water Resources Association, 53(2), 258–279.

Thelen, A., Zhang, X., Fink, O., Lu, Y., Ghosh, S., Youn, B. D., Todd, M. D., Mahadevan, S., Hu, C. & Hu, Z. (2022) A comprehensive review of digital twin—Part 1: Modeling and twinning enabling technologies, Structural and Multidisciplinary Optimization, 65(12), 354.

Williams, C. K. & Rasmussen, C. E. (2006) Gaussian Processes for Machine Learning, vol. 2. Cambridge, MA: MIT Press.

Xie, H., Tang, H. & Liao, Y.-H. (2009) Time series prediction based on NARX neural networks: An advanced approach. In: 2009 International Conference on Machine Learning and Cybernetics, vol. 3. IEEE, pp. 1275–1279.

Yamazaki, D., Kanae, S., Kim, H. & Oki, T. (2011) A physically based description of floodplain inundation dynamics in a global river routing model, Water Resources Research, 47(4), W04501, 1–21.

Yu, Y., Si, X., Hu, C. & Zhang, J. (2019) A review of recurrent neural networks: LSTM cells and network architectures, Neural Computation, 31(7), 1235–1270.

Yuan, X., Chen, C., Lei, X., Yuan, Y. & Muhammad Adnan, R. (2018) Monthly runoff forecasting based on LSTM-ALO model, Stochastic Environmental Research and Risk Assessment, 32, 2199–2212.

Zhao, Y., Chadha, M., Olsen, N., Yeates, E., Turner, J., Gugaratshan, G., Qian, G., Todd, M. D. & Hu, Z. (2023) Machine learning-enabled calibration of river routing model parameters, Journal of Hydroinformatics, 25, 1799–1821.

Author notes

These authors contributed equally.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).