ABSTRACT
Data assimilation (DA) integrates observations to enhance model predictions, but its benefits can rapidly decay during forecast for models heavily influenced by boundary conditions. To extend forecast accuracy, we combine machine learning (ML) with DA in a hybrid approach. We utilize ML to create synthetic point observations, which are assimilated into the model and thus propagated to other positions and model variables. In a case study, we apply the approach to a tidal estuary hydrodynamic model using a long short-term memory (LSTM) network for ML forecasts and the Ensemble Kalman Filter (EnKF) for assimilation. Hereby, physical consistency and spatial representation of water levels in the estuary are maintained. Our results from 100 forecast runs show significant improvements: over 40% accuracy increase in forecasts up to 4 h ahead, and a >5% error reduction at 9 h compared with a traditional numerical model without assimilation.
HIGHLIGHTS
Joint utilization of numerical model, data assimilation, and machine learning.
The proposed hybrid model significantly increases forecasting accuracy.
Preservation of data assimilation effect through multiple transfer times.
INTRODUCTION AND MOTIVATION
To ensure nautical safety and enable smooth and cost-efficient port operations, as well as to give early, precise warning in case of extreme events such as flooding, knowledge of the hydrodynamic conditions along coastlines, in estuaries and rivers is of huge importance (Asher et al. 2019; Ricci et al. 2011).
Understanding the dynamics of hydrodynamic systems and their influential factors is essential for accurate forecasting and effective management. Analyses of streamflow and water level trends, both historical and real-time, play a crucial role in revealing patterns and changes in hydrological behaviour (Zaghloul et al. 2022). For instance, historical trend analyses help uncover long-term changes in river flow regimes influenced by climatic variables such as temperature and precipitation. These insights are vital for water resource management and planning. On the other hand, short-term variations, primarily driven by tidal dynamics and immediate meteorological conditions, are critical for applications requiring high temporal resolution. Acknowledging both long-term trends and short-term fluctuations enriches our overall understanding of hydrodynamic systems and informs the development of robust forecasting models.
Hydrodynamic conditions are commonly forecasted using physics-based numerical models which provide a digital twin of a water body (Loucks & van Beek 2017). While a good representation of reality, and able to incorporate a multitude of forcings and physical phenomena, numerical models remain a simplification of nature and inherit imprecisions due to uncertainties in (i) initial conditions, (ii) model parameters, (iii) forcings, (iv) numerical approximations, and (v) constrains in spatial as well as temporal resolution, e.g., suffering from unresolved small scales (Liu et al. 2012; Khanarmuei et al. 2021; Buizza et al. 2022).
To address some of the limitations of physics-based numerical models, data assimilation (DA) has become a popular and successful approach in many disciplines for improving model predictions by including observations in the simulation process while considering respective associated uncertainties (Cho et al. 2020).
Commonly, DA is used for improving the estimation of the initial system state and thereby improving the model forecast (Swinbank 2003; Liu et al. 2012). This can improve the forecast skill significantly for systems governed by initial conditions and internal dynamics.
Ricci et al. (2011) assimilated river water level observations from the Adour River and Marne Village in France into a one-dimensional hydraulic model. During application to representative events, they found significant improvements for water levels and discharge for forecasts up to 12 h lead time. To accomplish this, they proposed a two-step Kalman Filter DA algorithm which was later adapted and further extended by Habert et al. (2016), Barthélémy et al. (2017), and Barthélémy et al. (2018) by exploring the use of Ensemble Kalman Filter (EnKF) (Evensen 1994) and subsequently by coupling of one-dimensional and two-dimensional models.
Each of these improvements, when evaluated for multiple flood events, yielded significant improvements in the model's capability to accurately predict observed water levels for short-term forecasts. For instance, Barthélémy et al. (2017) documented an improvement in the water level RMSE of up to 88% at the analysis time and 40% at a 4-h forecast lead time when comparing an EnKF approach to a deterministic numerical model.
Studies of Ricci et al. (2011), Habert et al. (2016), Barthélémy et al. (2017), and Cooper et al. (2018) also show however, that the advantage gained from DA can diminish quickly during forecasting periods in which no observed data for assimilation is available.
The speed of decline in forecasting accuracy or gains from DA, in systems driven by boundary conditions, can be related to the transfer time as done, e.g., by Barthélémy et al. (2017). The transfer time describes the duration a signal needs to propagate from a boundary to an arbitrary point of interest in the model domain. The time horizon of improvement by DA thus strongly depends on the variability of the forcings, geometric properties, and further parameters such as the bottom friction of the system in question (Cooper et al. 2018).
In conclusion, for systems driven by boundary conditions, the provision of accurate initial conditions from DA will solely improve short-term forecasts in the approximate range of the transfer time or few multiples. But since forecasting is the aim for many practical applications, it is of great importance to find ways of improving the forecasts throughout multiple transfer times.
Purely data-driven approaches could in some cases be considered as an alternative to physics-based numerical models. Methods such as classic autoregressive integrated moving average (ARIMA) models, their extensions or various ML methods can be effective and accurate at producing point forecasts if sufficient historical observations are available for training.
As an added benefit, data-driven approaches can consider sources of information farther away from desired prediction locations (e.g., outside of the numerical model domain) and types of information not applicable to numerical models. Thereby they can potentially circumvent typical limitations in transfer time.
Indeed, data-driven approaches have proven capable of representing non-linear dynamics well in a wide range of applications including hydrodynamic forecasting (Nevo et al. 2019; Di Nunno et al. 2021; Guo et al. 2021). Consequently, Liu et al. (2024) in a comprehensive scientometric analysis found a rapid increase in research applying data-driven techniques to the realm of hydrodynamic forecasting between 2015 and 2022.
In the realm of classical statistical methods, ARIMA models and their extensions – considering exogeneous variables (ARIMAX) and seasonal terms (SARIMAX) – have demonstrated effectiveness in capturing temporal patterns and handling non-stationary data. For instance, Agaj et al. (2024) utilized ARIMA and exponential smoothing (ETS) models to forecast river water levels in Northern Macedonia and Kosovo. They found that both models were effective, with the ARIMA model exhibiting slightly superior accuracy, indicated by lower RMSE and higher correlation coefficients in their study. The study demonstrated that even with, in this context, relatively short dataset of nine years, traditional time series models such as ARIMA and ETS can produce acceptable forecasts, providing a valuable tool for hydrological forecasting in areas with limited data availability.
However, these traditional models often struggle to predict extreme events and may have limitations in capturing complex seasonal patterns inherent in hydrodynamic systems.
Addressing some of these challenges, Ghaderpour et al. (2023) employed the antileakage least-squares spectral analysis (ALLSSA) method for analysing and forecasting precipitation time series in Italy. By estimating trends and seasonal components using ALLSSA, they were able to identify declining precipitation trends and pronounced annual cycles across different regions in Italy. In their application, ALLSSA outperformed ARIMA, ARIMAX, and SARIMAX models in forecasting accuracy. Despite its improved performance, all models, including ALLSSA, performed poorly in predicting extreme precipitation events. Consequently, Ghaderpour et al. (2023) suggested that incorporating additional parameters such as temperature, soil moisture, and wind into hybrid models may lead to better and more reliable forecasting results.
Advanced machine learning (ML) approaches, such as recurrent neural networks and in particular long short-term memory (LSTM) networks, have shown great flexiblility in handling exogeneous variables. LSTMs are comparatively easy to implement due to their availability in a wide range of ML frameworks and consequently have been successfully applied to various use cases involving sequential time-series data, including accurate hydrological forecasts (Kratzert et al. 2018). The advantage of LSTMs compared with traditional feed-forward networks is the ability to model the state of the system by explicitly combining the output from the previous time step in addition to the external input in the current time step, in a manner like a dynamical systems model.
However, even though successful, the applicability of purely data-driven approaches possesses some caveats such as (i) the lack of predictability in non-observed points and variables, (ii) risk of producing unphysical results, (iii) lack of interpretability, (iv) lack of robustness to handle noisy and sparse data well, and (v) ability to generalize to new situations (Nearing et al. 2021; Buizza et al. 2022; Liu et al. 2024).
These drawbacks have strong implications for real-world applicability to hydrodynamic systems.
For instance, observed data is commonly either temporally or spatially sparse, as it stems from point in-situ measurements or remotely sensed data with a comparatively low temporal solution.
Leveraging the strengths of both (i) physics-based numerical models and (ii) efficient data-driven approaches combined via DA is thus appealing.
Examples where data-driven and physics-based numerical models are combined can be found in recent literature but are yet sparse in the context of hydrodynamic systems. Buizza et al. (2022) illustrated in multiple examples spanning different fields ranging from fluid dynamics to econometrics to medical applications, the potential of DA and ML combinations to mitigate weaknesses of each other such as the ML susceptibility to noisy data.
In the realm of geophysics, Brajard et al. (2020) combined DA with an ML approach to address issues with sparse and noisy data for the latter and demonstrated the proposed method in an application to the Lorenz 96 (Lorenz & Emanuel 1998) model. Leveraging DA capability to interpolate and smooth sparse and noisy data in a physically consistent manner, the output is provided to train a neural network that provides a surrogate forward model to DA. They concluded that particularly the combination of DA and ML models leads to the success of the proposed method rather than one of the components. Zhang et al. (2006) addressed the absence of observed data for assimilation during forecasting periods for an oceanic wave model. To extend the positive DA effect on model skill into the forecasting period, model errors were emulated with an artificial neural network (ANN).
A further application to wave modelling is provided by Wahle et al. (2015). Illustrated with a case study for the German Bight, they propose the use of a two-step neural network model that is first used to emulate a wave model and later inversed.
However, the application of a combined, modular DA, and ML approach to an operational hydrodynamic forecast system that is easy to implement, to the author's best knowledge, remains absent from the literature.
Therefore, the main objectives of this study are to:
develop a hybrid ML–DA approach to improve spatially resolved and physically consistent water level forecasting in operational hydrodynamic models;
investigate whether integrating ML-generated synthetic observations into a numerical model via DA can effectively improve forecast accuracy beyond traditional limitations imposed by transfer time;
evaluate the impact of a dynamic weighting scheme between ML predictions and numerical model outputs on sustaining improvements over extended forecast periods; and
demonstrate the practical implementation and benefits of the proposed approach through application to a German estuary.
To achieve these objectives, we propose an ML–DA combination in which ML is used to predict water levels at observation stations and use these as synthetic observations for assimilation into a numerical model during forecasting where real observations are absent.
The confidence of the assimilated observations can be dynamically adjusted and thereby used to control the weighting between the numerical model and the observations, and hence, the strength of the assimilation. High confidence is assigned to ML predictions with short lead times. The confidence will then gradually decrease with lead times until the predictive skill of the ML model vanishes.
Thereby, the positive effect of DA will be prolonged, and the forecast accuracy of the hydrodynamic model enhanced far beyond the transfer time.
The feasibility of implementation and practical merits of the proposed approach are illustrated by application to a German estuary.
The main contributions of this study are summarized as follows:
Introduction of a novel hybrid forecasting approach: We propose a new method that combines ML predictions with DA in physics-based numerical hydrodynamic models to enhance water level forecasts.
Dynamic adjustment of confidence levels in assimilated observations: We incorporate a mechanism to dynamically adjust the confidence assigned to the ML predictions used in DA based on lead time, assigning higher confidence to predictions with shorter lead times and gradually decreasing it as predictive skill diminishes.
Extension of DA benefits beyond transfer time: Our approach prolongs the positive effects of DA, significantly improving forecast accuracy well beyond the typical limitations imposed by the transfer time in boundary-driven hydrodynamic systems.
Demonstration of practical implementation: We illustrate the feasibility and practical merits of the proposed ML–DA combination through an application to a German estuary, showcasing improvements in spatially resolved and physically consistent water level forecasting in an operational setting.
Provision of a modular methodology that can be adopted in various operational systems: By integrating ML and DA in an operational hydrodynamic forecasting system that is easy to implement, we address a gap in the existing literature and provide a modular approach that can be adopted in real-world applications.
This manuscript is organized as follows. In Section 2, we first provide an overview of the area of interest (Section 2.1) and the base hydrodynamic model (Section 2.2) as well as available observations (datasets in Section 2.3) before the methodology for our DA (Section 2.4) and ML approach (Section 2.5) and their combination in a hybrid model (Section 2.6) is described. We proceed to the evaluation of results in Section 3. Section 4 discusses findings, while Section 5 provides conclusions.
MATERIALS AND METHODS
Study area
The Elbe River stretches from the Czech Republic, through Germany, into the North Sea. Our area of interest is the estuarine Elbe-section around the city and port of Hamburg close (around 50 km) from the North Sea and hence influenced strongly by tidal conditions and discharge from upstream catchments.
The Port of Hamburg is one of Europe's largest ports and highly important to the German economy. Smooth and timely operation of the port and safe nautical navigation are therefore of utmost practical importance.
For the aforementioned purposes, the port currently relies on a well-calibrated numerical model that provides hind-, now-, and forecasts of current speeds and water levels.
Two-dimensional numerical model and setup
Map of the numerical model, bathymetry, and observation stations and numerical centred around Hamburg in northern Germany. Stations in the bottom left map inset are used as boundary conditions and/or as input to the LSTM.
Map of the numerical model, bathymetry, and observation stations and numerical centred around Hamburg in northern Germany. Stations in the bottom left map inset are used as boundary conditions and/or as input to the LSTM.
The model is constructed in DHI's MIKE 21 Flexible Mesh numerical modelling software (DHI A/S 2022). The unsteady flow is simulated in two dimensions utilizing the depth-integrated, incompressible shallow water equations with a second-order cell-centred finite volume method.
The model considers flooding and drying by dynamically including and excluding areas depending on local water levels and bathymetry.
The computational mesh utilized in this study consists of 13,118 triangular and quadrangular elements of variable size, with a minimum area of 89 m2 and an average of 5,870 m2. This mesh is a runtime-optimized (coarsened) version of the mesh used in the current operational service but achieves similar quality in forecasting water levels.
The model is controlled by two boundary conditions. Upstream, close to Geesthacht (southeastern boundary), a temporally varying discharge is specified. This discharge is obtained from measurements in Neu Darchau, which is located approximately 50 km upstream from the model boundary and therefore applied with a delay of one day to account for the delay in propagation from the measurement station to the model boundary. This delay has been a best practice in the latest years in the operational system.
Tidal-influenced water levels from measurements at Stadersand are applied on the downstream boundary, near Stadersand (northwestern boundary). A temporal shift of +3 min is applied to account for the distance between the measurement and the model boundary.
The bathymetry of the Elbe estuary in the model domain is obtained from soundings (resolution 1 × 1 m, calculated from sounding averages obtained in 2016).
Bed resistance in the model domain is parameterized spatially variable in five different zones, and fitted to yield good numerical results in hindcasts while being chosen in a physically reasonable range with Manning numbers ranging from kst = 37 to 80 m1/3 s−1. The Smagorinsky factor for horizontal eddy viscosity is set constant to 0.28.
In operational mode, the model is run hourly. Each simulation comprises 1.5 h of hindcast and 8.5 h of forecast and inherits initial conditions (water levels and currents) in the entire domain from a prior run of the model (hotstart).
Model forecasts, where no observations of the forcing boundary conditions exist, are produced here by keeping the upstream discharge boundary condition constant at the last observed value. This is reasonable as the temporal variability of the discharge is typically small within the targeted forecast horizon.
The downstream water level condition possesses stronger variability and is predicted in this article from a first-order autoregressive AR(1) error process with a half-life of 18 h and a standard deviation of 0.15 m. The AR(1) boundary water level prediction in this study mimics the forecast accuracy of externally provided water level forecasts in the operational system which were not available for this study.
Datasets
The hybrid model proposed in this manuscript extends the original numerical model with a DA and ML approach and thus ingests a broader variety of observations.
The observations can fall into these four categories:
1. Boundary condition observations,
2. Assimilation observations (in domain),
3. ML observations (inside and outside domain),
4. Validation observations for skill assessment (in domain).
An overview of these observations is given in Table 1. For water level observations within the numerical model domain, the transfer time is calculated by estimating the phase shift between peaks/troughs at the downstream boundary and the respective station from the numerical model result.
Overview of observation stations, variables, and model components for which they are used
Station . | Variable(s) . | Component(s) . | Transfer timea . |
---|---|---|---|
Helgoland | Water level | ML | – |
Cuxhaven | Water level, wind speed, wind direction | ML | – |
Stadersand | Water level | HD | – |
Blankenese | Water level | ML, DA | 33 min |
Seemannshoeft | Water level | Validation | 43 min |
Sankt Pauli | Water level | Validation | 53 min |
Harburg | Water level | Validation | 63 min |
Schoepfstelle | Water level | DA | 69 min |
Neu Darchau | Discharge | HD, ML | – |
Station . | Variable(s) . | Component(s) . | Transfer timea . |
---|---|---|---|
Helgoland | Water level | ML | – |
Cuxhaven | Water level, wind speed, wind direction | ML | – |
Stadersand | Water level | HD | – |
Blankenese | Water level | ML, DA | 33 min |
Seemannshoeft | Water level | Validation | 43 min |
Sankt Pauli | Water level | Validation | 53 min |
Harburg | Water level | Validation | 63 min |
Schoepfstelle | Water level | DA | 69 min |
Neu Darchau | Discharge | HD, ML | – |
Note: HD is used for free numeric hydrodynamic model, ML is used for training of the LSTM, and DA is used as assimilation stations (predicted by ML).
aMean transfer time evaluated in 2016 via water level peak through detection with the hindcast model and the boundary condition.
For use in this article, all observations have been processed to comprise 5-min time steps except for Neu Darchau, which has daily averages of discharge.
Data assimilation
In the presence of observations, DA can be used to improve the state of the numerical model. The EnKF (Evensen 1994) is a powerful DA method that utilizes a model ensemble to propagate the model error and thereby providing a model consistent way to update the model state.
The DA module of MIKE FM includes several variants of the EnKF. Here, the Potter scheme (Whitaker & Hamill 2002) is used with a novel technique for temporal smoothing of the error covariance described in Sørensen et al. (2004). The MIKE FM DA module creates the ensemble by stochastically perturbing model parameters and forcings. The perturbation vectors are called model errors and can be propagated in time with their own AR(1) model with a user-set half-life. The perturbation vectors are appended to the model state vector allowing for an update of the model errors by the assimilation scheme. This approach gives the system some ‘memory’ which is important for models strongly driven by boundary conditions.
Prior skill analysis of the non-assimilative (free) numerical model has shown that large uncertainties stem from the water level boundary condition. We account for this by perturbing the boundary condition with a standard deviation of 0.2 m (σwl in Table 2) to create an ensemble of 10 members. The perturbation is additive and the magnitude is chosen such that it corresponds well with previously observed uncertainty in externally provided forecasts and consequently the AR(1)-process mimicking it in this study.
Perturbation parameters at water level boundary and observation errors for assimilation stations
Parameter . | Description . | Value . | Unit . |
---|---|---|---|
σwl | Std. dev. of perturbations at water level boundary | 0.2 | m |
ρwl | Temporal perturbation half-life | 3 | h |
σas,hc | Std. dev. of error at assimilation stations during hindcast | 0.2 | m |
σas,fc | Std. dev. of error at assimilation stations during forecast | Dynamic | m |
Parameter . | Description . | Value . | Unit . |
---|---|---|---|
σwl | Std. dev. of perturbations at water level boundary | 0.2 | m |
ρwl | Temporal perturbation half-life | 3 | h |
σas,hc | Std. dev. of error at assimilation stations during hindcast | 0.2 | m |
σas,fc | Std. dev. of error at assimilation stations during forecast | Dynamic | m |
Note: The observational error during forecast is a dynamic parameter, see Figure 2.
The temporal AR(1) propagation half-life is chosen to 3 h (ρwl in Table 2), corresponding approximately to half of the duration between Kenterpoints of the tide.
As the choice of assimilation stations is of great importance (Khanarmuei et al. 2021), different in-situ water level observation stations and combinations thereof were tested in assimilative models for their capability to positively impact model performance (not included). Choosing the number of assimilation stations is typically a compromise, preferring the smallest number of stations while still maintaining a reasonable amount of influence on the state of the system. A lower number of assimilation stations is preferred to (i) keep a higher number of stations for validation and here (ii) robustness of the operational system (iii) as prediction of the assimilation stations for forecasting requires additional effort in ML model training (data preparation and computational effort, see Section 2.5). The use of two assimilation stations, at Blankenese and Schoepfstelle, was found best suited here. The assimilation at these stations is conducted at each time step (every 5 min).
The user-set measurement error, given as a standard deviation, accounts for both instrument error (minor) and representation error (larger). During hindcast, where observations are available, the measurement error standard deviation is set constant at 0.2 m (σas,hc in Table 2). The standard deviation is chosen slightly higher than the actual error due to the higher frequency of observations. During standard model forecasts, assimilation cannot be performed due to missing observations. However, for our proposed hybrid model, where synthetic observations are predicted by an ML model, assimilation will be performed in the forecast period and a time-varying value for the measurement error is assigned (σas,fc). This dynamic value depends on the confidence in synthetic observations provided and is described in more detail after the introduction of the ML component of the hybrid model in Section 2.5.
Machine learning
To forecast water levels at assimilation stations via supervised learning, we use an LSTM. Opposed to classical time series forecasting methods such as exponential smoothing or ARIMA models, it has been found capable to represent the complex deformation of the tidal wave due to an interplay of multivariate causes such as (astronomical) tide, discharge, wind, reflections, and geometry of the river. While the LSTM is known for its effectiveness in handling complex, multivariate time series data, our primary objective with this article is to demonstrate the integration of ML with classical numerical models through DA. Therefore, the selection of LSTM was made pragmatically to illustrate the proposed methodology rather than to identify the optimal ML algorithm for time series forecasting in this context.
To implement the LSTM, we utilize the open-source ML framework Tensorflow (version 2.8.2) (Abadi et al. 2015) and the Keras API (version 2.8.0) (Chollet et al. 2015). It is set up to ingest sequences of multiple features (predictors) as input and from those predict an output sequence for one assimilation station as target (predictor). This implies that the two assimilation stations required in our Elbe setup will be predicted independently of one another and can use different features as input.
The choice of LSTM input features is made depending on the availability and quality of observations and expert judgment as well as trial-and-error evaluation of feature importance. A low number of features is preferred as it limits the amount of input data that must be retrieved, stored, and cleaned. This is particularly important regarding the robustness of the operational system as well as possible overfitting of the LSTM. Water levels at Blankenese, Helgoland, Cuxhaven, as well as discharge in Neu Darchau and wind speed and direction from Cuxhaven are used as features to predict water levels at the station Blankenese. Furthermore, metadata such as the hour of the day and the day of the year is used as features to allow learning from seasonal components. To predict water levels at Schoepfstelle, the same features are utilized but supplemented with observed water levels at Schoepfstelle station itself.
It should be noted here that, by including observations from outside the numerical model domain (e.g., Helgoland and Cuxhaven, see Figure 1), upstream information of the tide is incorporated.
Due to the low number of features used, our LSTM is relatively sparse such that it might yield an added benefit of interpretability to some extent.
Before being used as input for the LSTM, all features are preprocessed to yield a resolution of 5-min time steps. For this purpose, the discharge at the station Neu Darchau is linearly interpolated from available daily values.
Cyclical features such as wind direction, the hour of the day, as well as the day of the year were transformed into vector components. Wind direction was joined with speed and transformed into an x and y component whereas the temporal features were expressed in their sinusoidal and cosinusoidal form. One hundred and forty-four lagged time steps, corresponding to 12 h, of these features are fed into the LSTM to predict 144 lead time steps. It should be noted that, in the following evaluations, only 9 out of 12 lead hours are utilized, corresponding to the forecast horizon of the operational numerical model.
Data for all features and targets are retrieved and preprocessed for the years 2016, 2017, and 2018. The years 2017 and 2018 were used for training and validation and for this purpose a temporal split using the former 80% of the data for testing and the latter 20% of the data for validation was conducted.
For training, all features are standardized by removing the mean and scaling to unit variance.
Even though the aim of this article is to illustrate the proposed hybrid method, rather than to have each component perfectly tuned, various LSTM network configurations have been evaluated. Ultimately, we chose a sequential model with a network structure containing an input LSTM layer, a second LSTM layer, and an output layer. The three layers comprise 144, 256, and 144 nodes. We use hyperbolic tangent (tanh) activation functions in both LSTM layers and a linear activation function for the output layer. For training and validation, a batch size of 144 and 100 epochs are used. The mean absolute error is used as loss function for evaluation.
Hybrid model
The core idea of the hybrid model is to use ML-forecasted water level values as synthetic observations for assimilation in the HD model. In this way, the assimilation scheme can transfer the higher accuracy of the ML point forecast to other parts of the domain and to other variables. The hybrid model consists of the following two steps:
1. Forecast of water level in selected points with the ML model (generation of synthetic observations).
2. DA model assimilating real observation in the hindcast period and synthetic observation in the forecast period.
Dynamic standard deviation representing decreasing confidence in synthetic observation (ML prediction) for assimilation, and vice versa increasing confidence in the numerical model, with forecast time.
Dynamic standard deviation representing decreasing confidence in synthetic observation (ML prediction) for assimilation, and vice versa increasing confidence in the numerical model, with forecast time.
RESULTS
To sample alternating times of the day, the 100 samples are extracted with an even spacing of 85 h between start times. Each of the samples comprises a forecast horizon of 9 h.
Evaluation is conducted at three validation stations (not used for assimilation) and both assimilation stations (see Table 1). The model performance is evaluated here by computing the unbiased root mean square error (URMSE) of the following models using the below labels:
HD: The baseline numerical model
LSTM: The ML model (only in selected points)
DA: The DA model only assimilating in the hindcast period
HY: The hybrid DA model assimilating in both hindcast (from observations) and forecast (from LSTM predictions).
URMSE of water level through forecast hours from the numerical model (HD) and machine learning model (LSTM). (a) Station Blankenese and (b) Schoepfstelle.
URMSE of water level through forecast hours from the numerical model (HD) and machine learning model (LSTM). (a) Station Blankenese and (b) Schoepfstelle.
The LSTM forecast is initially more accurate than the HD model at both assimilation stations until 6–7 lead hours. Afterwards, the quality of the LSTM forecast decays while the numerical model result remains comparatively stable. The forecasts at Blankenese are for both models better than the forecasts at Schoepfstelle for all lead times. The minimum URMSE values at Blankenese are 5.8 cm for the LSTM and 7.5 cm for the HD model whereas the minimum URMSE achieved at Schoepfstelle is 7.5 and 10.4 cm, respectively. These results were used to design the time-varying standard deviation of the observation used in the hybrid model shown in Figure 3.
URMSE of water level through forecast hours from the numerical model (HD), standard data assimilation model (DA) and hybrid model (HY). Top row (a, b) shows assimilation stations, whereas others (c, d, e) are validation stations.
URMSE of water level through forecast hours from the numerical model (HD), standard data assimilation model (DA) and hybrid model (HY). Top row (a, b) shows assimilation stations, whereas others (c, d, e) are validation stations.
In addition to the lead time axis, all plots have an additional x-axis showing the transfer time from the domain boundary as explained in the introduction and Table 1.
Similar trends can be observed at all stations: The DA model shows strong improvements during the first forecast hour, or approximately two transfer times, compared with the baseline model (HD). However, the improvement from DA disappears quickly. The DA results are generally worse than the HD after 3 lead hours, after which the two models perform almost identically as the effect of DA vanishes.
The proposed hybrid model (HY) initially performs similarly well as the DA model. However, the HY model preserves the higher accuracy throughout all investigated forecast hours. The most significant improvements are obtained with the HY model throughout the first 3 to 5 forecast hours, equivalent to approximately four transfer times.
Complementing Figure 5, Tables 3 and 4 present the mean URMSE for each lead hour and the standard deviation, respectively.
URMSE mean through all stations for individual lead hours
Forecast (h) . | 1 . | 2 . | 3 . | 4 . | 5 . | 6 . | 7 . | 8 . | 9 . |
---|---|---|---|---|---|---|---|---|---|
HDmean (cm) | 9.24 | 9.98 | 10.65 | 11.03 | 11.65 | 12.28 | 12.72 | 13.35 | 13.48 |
DAmean (cm) | 5.34 | 9.52 | 11.12 | 11.15 | 11.63 | 12.29 | 12.72 | 13.35 | 13.48 |
HYmean (cm) | 5.37 | 6.65 | 6.69 | 7.35 | 8.55 | 9.77 | 11.02 | 11.68 | 12.77 |
Forecast (h) . | 1 . | 2 . | 3 . | 4 . | 5 . | 6 . | 7 . | 8 . | 9 . |
---|---|---|---|---|---|---|---|---|---|
HDmean (cm) | 9.24 | 9.98 | 10.65 | 11.03 | 11.65 | 12.28 | 12.72 | 13.35 | 13.48 |
DAmean (cm) | 5.34 | 9.52 | 11.12 | 11.15 | 11.63 | 12.29 | 12.72 | 13.35 | 13.48 |
HYmean (cm) | 5.37 | 6.65 | 6.69 | 7.35 | 8.55 | 9.77 | 11.02 | 11.68 | 12.77 |
URMSE std through all stations for individual lead hours
Forecast (h) . | 1 . | 2 . | 3 . | 4 . | 5 . | 6 . | 7 . | 8 . | 9 . |
---|---|---|---|---|---|---|---|---|---|
HDstd (cm) | 1.25 | 1.06 | 1.27 | 1.16 | 0.94 | 0.93 | 0.91 | 0.78 | 0.66 |
DAstd (cm) | 0.67 | 0.94 | 1.58 | 1.29 | 0.95 | 0.92 | 0.91 | 0.77 | 0.65 |
HYstd (cm) | 0.46 | 0.52 | 0.73 | 0.31 | 0.28 | 0.56 | 0.94 | 0.90 | 0.86 |
Forecast (h) . | 1 . | 2 . | 3 . | 4 . | 5 . | 6 . | 7 . | 8 . | 9 . |
---|---|---|---|---|---|---|---|---|---|
HDstd (cm) | 1.25 | 1.06 | 1.27 | 1.16 | 0.94 | 0.93 | 0.91 | 0.78 | 0.66 |
DAstd (cm) | 0.67 | 0.94 | 1.58 | 1.29 | 0.95 | 0.92 | 0.91 | 0.77 | 0.65 |
HYstd (cm) | 0.46 | 0.52 | 0.73 | 0.31 | 0.28 | 0.56 | 0.94 | 0.90 | 0.86 |
The relative change in URMSE for all stations of the hybrid model (HY) compared with HD (a) and DA (b). Negative y-values indicate that the HY model is better than the HD model or the DA model, respectively.
The relative change in URMSE for all stations of the hybrid model (HY) compared with HD (a) and DA (b). Negative y-values indicate that the HY model is better than the HD model or the DA model, respectively.
Compared with the HD model, the HY model achieves a reduction of errors throughout all forecast lead hours. During the first 3–4 h, the HY model can reduce errors most significantly between 35 and 45% (Figure 6(a)). For later forecast hours, the advantage of the HY model slowly fades, however still achieving 10–15% error reduction compared with HD after 9 forecast hours.
Comparison between the standard DA and the HY model (Figure 6(b)) shows similar improvements except for the first forecast hour during which the DA's improved initial state still impacts the results during the forecast.
DISCUSSION
In this study, we proposed a hybrid approach for enhancing water level forecasts integrating and leveraging individual strengths of numerical modelling, ML, and DA. The overarching concept can be summarized as providing a robust, physics-based approach with a numerical model and extending it via different ways of information transfer. The numerical model uses the physical relationship between system states to extrapolate the information contained in the initial and boundary conditions in space and time. To prevent the model from diverging from the actual state of the system, observed data from available locations can be assimilated into the numerical model. The assimilation effectively transfers knowledge from point observations to a full spatial representation, while preserving the physical consistency between system states.
Our results demonstrate that the hybrid model (HY), which assimilates ML-generated synthetic observations during the forecast period, significantly outperforms both the baseline numerical model (HD) and the standard DA model. The HY model reduced the URMSE by approximately 42% in the first forecast hour compared with the HD model, decreasing from 9.24 to 5.37 cm (Table 3). This substantial error reduction was maintained over several hours, with the HY model achieving over 40% improvement in URMSE up to 4 h ahead (Figure 5). Even at a 9-h lead time, the HY model still provided an error reduction of approximately 5% compared with the HD model.
These findings address the rapid decay of DA benefits commonly observed in boundary-driven hydrodynamic systems, a challenge highlighted in previous studies (Ricci et al. 2011; Barthélémy et al. 2017; Cooper et al. 2018). While standard DA methods improve forecast accuracy by assimilating observations, their benefits often diminish quickly during the forecast period due to the influence of boundary conditions and the absence of new observations. Our hybrid approach overcomes this limitation by using ML predictions to provide synthetic observations during the forecast period, effectively prolonging the positive effects of DA well beyond the traditional transfer time.
By integrating an LSTM network to forecast water levels at assimilation stations, we generated synthetic observations that were assimilated into the numerical model during the forecast period. The choice of an LSTM was based on its ability to handle complex, multivariate time-series data and model the non-linear dynamics of hydrodynamic systems (Kratzert et al. 2018). However, the modular nature of our approach allows for flexibility in the choice of ML models. Alternative methods such as ARIMA, multilayer perceptron, or other recurrent neural networks could also be employed, depending on the complexity of the system and the availability of data.
To account for the increasing uncertainty of LSTM forecasts over lead time, we added a dynamic weighting scheme during the assimilation process. This scheme can be flexibly adjusted in its confidence between the numerical model and ML components over time and for each synthetic observation. In our study, we gradually decrease the confidence in the ML predictions as the lead time increases, shifting the weighting to place more trust in the numerical model's forecasts (Figure 2). This balanced interplay ensures that short-term accurate ML predictions enhance the model's performance, while the robust extrapolation capabilities of the physics-based model maintain accuracy at longer lead times.
A key advantage of our hybrid approach is the preservation of physical consistency and spatial representation of water levels throughout the model domain. Unlike purely data-driven models, which may provide accurate point forecasts but lack spatial coverage and physical interpretability, our method ensures that improvements at assimilation points are propagated throughout the system in a physically consistent manner. This is particularly important in operational contexts where spatially resolved forecasts are necessary for decision-making.
Potential drawbacks of our method can be seen in potentially unphysical results from utilized LSTM as it does not inherently guarantee physically reasonable outputs.
The increased computational effort that is required for DA that increases approximately linearly with ensemble members, compared with a standard hydrodynamic model, could be seen as another drawback.
It should be noted here that the ML component of our approach demands only a negligible amount of computing power while enhancing the accuracy benefits from DA. Thus, we argue that the practical advantages of heightened accuracy and reduced uncertainty in predictions more than compensate for the additional computational demands of DA in our hybrid model. Furthermore, the inclusion of the ML component bolsters the case for using DA.
Several opportunities exist to further enhance our hybrid model:
Enhancing the ML component: Extending the time series data available for training the LSTM and incorporating additional features, such as lunar phases or meteorological variables, could improve prediction accuracy for longer lead times and during extreme events.
Exploring alternative ML techniques: As our approach is modular and flexible, investigating the application of other ML models could be beneficial, depending on the specific characteristics of the hydrodynamic system.
Implementing plausibility checks: Incorporating plausibility thresholds or comparing ML predictions against physical or domain-specific reasonable thresholds can enhance the reliability of forecasts. In a more sophisticated approach, information from surrounding observations can be leveraged. These methods ensure that if the ML model does produce unphysical results, a warning is issued or the unphysical forecasts are temporarily excluded from assimilation. This enhances the overall robustness of the forecasting system.
Expanding DA: For the DA component, extending the model to assimilate additional observations and generating the ensemble by perturbing the tidal boundary in temporal dimension to account for phase-uncertainty, could provide additional benefits, though these improvements may require a larger ensemble size.
Analysing extreme events: While our evaluation is based on an aggregate of 100 forecasts, future work should focus on the model's performance during individual events such as extreme hydrological events to assess its robustness under challenging conditions.
CONCLUSIONS
We have demonstrated the value and effectiveness of a hybrid approach that bridges data-driven and physics-based models, in hydraulic forecasting. Our findings underscore the improvements that can be achieved through the fusion of ML, numerical modelling, and DA techniques.
Notably, the modular implementation of our proposed hybrid model not only makes implementation comparatively simple, but also adds robustness. Both are particularly valuable for operational models.
The study set out to address the following research questions:
Can integrating ML-generated synthetic observations into a numerical model via DA effectively improve forecast accuracy beyond traditional limitations imposed by transfer time?
Does a dynamic weighting scheme between ML predictions and numerical model outputs contribute to sustained improvements over extended forecast periods?
Our results affirmatively answer these questions. The hybrid model significantly improved forecast accuracy beyond traditional limitations, maintaining reduced errors over extended forecast horizons. The dynamic weighting scheme effectively balanced the strengths of short-term accurate ML predictions and the robust extrapolation capabilities of the numerical model, ensuring optimal performance across both short and long lead times.
Key conclusions include:
Extended forecast accuracy: The hybrid model significantly improves forecast accuracy, maintaining reduced errors over extended forecast horizons. This addresses the rapid decay of DA benefits in boundary-driven systems, extending the positive effects beyond the traditional transfer time.
Balanced integration of ML and numerical modelling: The dynamic weighting scheme effectively balances the strengths of short-term accurate ML predictions and the robust extrapolation capabilities of the numerical model. This ensures optimal performance across both short and long lead times.
Preservation of physical consistency: By assimilating ML-generated synthetic observations into the numerical model, the approach maintains physical consistency and provides spatially resolved forecasts, which are critical for operational decision-making.
Modularity and operational feasibility: The modular implementation of the hybrid model enhances its adaptability and robustness, making it particularly suitable for operational settings. It allows for the substitution or enhancement of the ML and DA components without requiring extensive restructuring.
Adaptability to other systems: The approach is adaptable to a wide range of hydrodynamic models and environmental systems. Successful implementation requires careful consideration of data availability, system-specific uncertainties, and appropriate adjustment of the dynamic weighting mechanism.
Overall, the hybrid model in our case study promises enhanced risk assessments for both port operators and shipping companies. This improvement can streamline port scheduling for arrivals, departures, and docking. Moreover, heightened accuracy in prediction can tighten safety margins for bridge passage heights and under-keel clearance, potentially yielding significant cost savings, such as decreased dredging expenses.
ACKNOWLEDGEMENTS
The authors also wish to extend our sincere gratitude to the Hamburg Port Authority for their consistent collaboration and for providing us with essential data, including bathymetric information and observations from stations. Their continued support and contribution have been instrumental in driving this research forward.
FUNDING
This work was funded through DHI's research contract with the Danish Ministry of Higher Education and Science.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.