## Abstract

Conceptual hydrologic models have been widely used for flood forecasting, while the long short-term memory (LSTM) neural network has demonstrated a powerful ability to handle time-series prediction. This study proposes a novel hybrid model that combines the Xinanjiang (XAJ) conceptual model and the LSTM model (XAJ-LSTM) to produce accurate multi-step-ahead flood forecasts. The hybrid model uses the flood forecasts of the XAJ model as input variables of the LSTM model to strengthen the physical basis of the hydrological modeling. With the XAJ and LSTM models as benchmarks, the hybrid model was applied to the Lushui reservoir catchment in China. The results demonstrate that all three models provide reasonable multi-step-ahead flood forecasts, and that the XAJ-LSTM model not only effectively captures the long-term dependence between the precipitation and flood series but also produces more accurate forecasts than the XAJ and LSTM models. The hybrid model maintained similar forecast performance after being fed with simulated flood values of the XAJ model during horizons to . The study concludes that the XAJ-LSTM model, which integrates a conceptual model with machine learning, can raise the accuracy of multi-step-ahead flood forecasts while improving the interpretability of the data-driven model internals.

## HIGHLIGHTS

Proposed a novel hybrid XAJ-LSTM model that combines the XAJ model with the LSTM neural network.

Compared the XAJ-LSTM model with the XAJ model and the LSTM neural network for multi-step-ahead flood forecasting.

Employed the MIV algorithm to analyze the relative importance of the input variables.

## INTRODUCTION

Accurate multi-step-ahead flood forecasting is of great practical importance for reservoir operation, flood control, and water resources management (Noori & Kalin 2016; Zhang *et al.* 2016; Young *et al.* 2017), yet it remains challenging because of a dynamic climate environment and complex hydrological processes (Xie *et al.* 2019; Zhou *et al.* 2020). Various hydrological models have been developed for flood simulation and forecasting to capture the characteristics of hydrological processes and to help policy-makers make better decisions (Ba *et al.* 2017; Zuo *et al.* 2020). Hydrological models are generally classified as conceptual, physically based, or data-driven (also referred to as black-box) models (Tokar & Markus 2000; Pang *et al.* 2007; Makwana & Tiwari 2014; Zhang *et al.* 2016; Ren *et al.* 2018).

A perceived advantage of conceptual and physically based models over data-driven models is that they translate hydrological-cycle mechanisms into mathematical formulas that retain physical meaning (Rezaeianzadeh *et al.* 2013; Badrzadeh *et al.* 2015; Xiang *et al.* 2020), so that they obey the law of conservation of mass and capture the desired characteristics of hydrological processes. Conceptual models in particular have been widely used owing to their physical basis and relatively simple structures. For instance, the Xinanjiang (XAJ) model, a conceptual model proposed by Zhao (1992) that has been widely used for flood forecasting in humid and semi-humid regions of China (Zhou *et al.* 2019a), not only has a solid theoretical foundation and intelligible physical meaning but is also easy to apply in practice (Bai *et al.* 2017). However, because the conversion of rainfall into runoff is a non-linear, dynamic, and non-stationary process (Wang *et al.* 2006; Pang *et al.* 2007), hydrologic models are susceptible to various sources of uncertainty, to the extent that they cannot maintain accurate multi-step forecasts (Zhou *et al.* 2019b; Zuo *et al.* 2020).

Data-driven models, by contrast, rely on historical rainfall and runoff series rather than mimicking physical processes, and may be suitable for handling non-linear and highly stochastic systems (Dawson & Wilby 2001; Hu *et al.* 2018; Kratzert *et al.* 2018; Kao *et al.* 2020). Artificial neural networks (ANNs) are representative of this class (Tokar & Johnson 1999; Ha & Stenstrom 2003; Faruk 2010; Chang *et al.* 2015; Chang & Tsai 2016). In recent years, owing to breakthroughs in algorithms and computing power, deep neural networks (DNNs, also referred to as deep learning methods) built on ANNs have received a great deal of attention (LeCun *et al.* 2015). The long short-term memory (LSTM) neural network is one of the most widely used DNN architectures (Kao *et al.* 2020; Tennant *et al.* 2020); it continuously stores useful antecedent information in its memory cell structure. With multiple hidden layers and techniques such as minibatch training and dropout (LeCun *et al.* 2015; Xiang *et al.* 2020), it has been widely used in various fields, including flood forecasting (Hu *et al.* 2018; Le *et al.* 2019; Ding *et al.* 2020; Lin *et al.* 2020; Tennant *et al.* 2020). The single-output LSTM neural network has been found to produce satisfactory hydrographs and to handle potential noise in the series (Nourani *et al.* 2014; Zhang *et al.* 2016; Hu *et al.* 2018; Kratzert *et al.* 2018). Nevertheless, like other ANNs, it cannot account for the physical mechanism, and it produces less accurate forecasts at long lead times because of the limited availability of input variables (Kurian *et al.* 2020). Multi-step-ahead forecasting is also possible with the recursive strategy (Ben Taieb & Atiya 2016), which feeds the simulations of the same LSTM model back as antecedent discharge for longer lead times; however, this strategy is prone to accumulated forecast errors (Chang *et al.* 2014; Tran *et al.* 2016; Zhou *et al.* 2019b).

Pre-processing and post-processing techniques have been widely applied to improve the forecast accuracy of hydrological models. Pre-processing techniques, such as the wavelet transform (Barzegar *et al.* 2018), empirical mode decomposition (Wu *et al.* 2017), and singular spectrum analysis (Poornima & Jothiprakash 2018), are used to remove outliers or noise from the raw data. Post-processing techniques, such as Kalman filtering (Kalman 1960; Liu *et al.* 2016), Bayesian model averaging (Raftery *et al.* 2005; Li *et al.* 2017), and the hydrologic uncertainty processor (Krzysztofowicz 1999; Han *et al.* 2019), are used to correct forecast values in real time. Another approach improves forecast accuracy by combining conceptual models with data-driven models (Maier *et al.* 2010; Humphrey *et al.* 2016; Mount *et al.* 2016; Young *et al.* 2017; Li *et al.* 2020). Using the outputs of a conceptual model as additional input variables of the LSTM is a novel way to improve flood forecasting accuracy. ANNs (e.g., LSTM) can tolerate a certain level of noise in the input variables (Zhang *et al.* 2016), and the discharges simulated by conceptual models may preserve the physical characteristics of flood hydrographs with a certain accuracy (Kurian *et al.* 2020). Therefore, the forecasting accuracy of ANNs can be improved by using simulated discharges of reasonable precision as inputs (Kumar *et al.* 2015). Humphrey *et al.* (2016) constructed a hybrid of the GR4J model and a Bayesian neural network (GR4J-BNN) that exploits the complementary strengths of the two models to obtain a reliable forecast distribution. Noori & Kalin (2016) proposed a hybrid model in which the baseflow and stormflow simulated by the Soil and Water Assessment Tool (SWAT) are used as ANN inputs, and this hybrid model improved daily streamflow forecasting in an ungauged basin.
Young *et al.* (2017) used the flows simulated by the Hydrologic Modeling System of the Hydrologic Engineering Center (HEC-HMS) as additional inputs of a backpropagation neural network (BPNN) and support vector regression (SVR), and the hybrid models greatly improved on the predictions of HEC-HMS. Yang *et al.* (2019) showed that an LSTM-based hybrid model can effectively improve the performance of flood simulation at the global scale. Kurian *et al.* (2020) used the simulation results of the HEC-HMS and SWAT models as inputs of an ANN, and the hybrid models achieved better prediction performance at long lead times. At present, most hybrid models couple conceptual or physically based models with simple ANNs and obtain good results, whereas few apply the LSTM neural network. Konapala *et al.* (2020) confirmed the applicability of LSTM-based hybrid models in watersheds under diverse natural conditions (such as climate, topography, and soil characteristics) in the United States. What these studies of LSTM-based hybrid models have in common is that they demonstrate effectiveness at large spatial scales and analyze single-step rather than multi-step simulation performance. Moreover, no hybrid model has yet been studied in combination with real precipitation forecast information.

This study proposes, for the first time, a hybrid forecast model that integrates the XAJ model and the currently popular LSTM neural network to produce accurate multi-step-ahead flood forecasts. The hybrid model (i.e., XAJ-LSTM) couples the two by taking the simulated values of the XAJ model and forecast precipitation as input variables of the LSTM neural network. The XAJ-LSTM model was applied to the Lushui basin, a closed basin. The contribution of this study is three-fold. First, the XAJ model provides accurate flow simulations as inputs of the hybrid model, which can significantly improve the performance of the LSTM neural network and effectively extend the lead time. Second, the rainfall forecast products of the European Centre for Medium-Range Weather Forecasts (ECMWF) are used to further explore the performance of the hybrid model and its reliability in the real-world operation of the Lushui reservoir. Third, the mean impact value (MIV) method is employed to quantify the relative importance of the XAJ-simulated flow in improving the forecast accuracy of the LSTM neural network. The XAJ model and the LSTM neural network are also constructed for comparison purposes.

## METHODS

Figure 1 illustrates the architecture of the hybrid model that integrates the XAJ model (Figure 1(a)) and the LSTM neural network (Figure 1(b)). The XAJ model and the LSTM neural network serve as benchmark models against which the forecast performance of the XAJ-LSTM model is compared. The related methods are described as follows.

### Benchmark model

#### XAJ model

The XAJ model is a conceptual hydrological model (Zhao 1992) that has been widely used for flood forecasting in humid and semi-humid regions of China. As shown in Figure 1(a), the lumped XAJ model has a four-layer structure comprising evapotranspiration, runoff generation, runoff partition, and runoff concentration (Hu *et al.* 2005). The parameters of the XAJ model are commonly optimized with the shuffled complex evolution Metropolis algorithm (SCEM-UA), using the Nash–Sutcliffe efficiency (NSE) as the objective function.
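The runoff-generation layer is usually written as saturation-excess runoff over a parabolic tension water capacity curve. The minimal sketch below computes runoff for one time step under that textbook formulation. Assumptions: the function name `xaj_runoff` is hypothetical, the pervious area uses a maximum point capacity WMM = (1 + B)·WM, and the impervious fraction IM turns all net rainfall into runoff. Implementations differ in how they combine IM with WMM, so this is not necessarily the exact variant calibrated in this study; parameter names follow Table 3.

```python
def xaj_runoff(pe: float, w: float, wm: float, b: float, im: float) -> float:
    """Runoff depth (mm) for net rainfall pe (mm) and tension water storage w (mm).

    Simplified XAJ saturation-excess runoff generation (a sketch):
    - assumption: pervious area follows the capacity curve with WMM = (1 + b) * wm
    - assumption: the impervious fraction im converts all net rainfall to runoff
    """
    if pe <= 0.0:
        return 0.0  # no net rainfall, no runoff
    w = min(max(w, 0.0), wm)  # keep storage within physical bounds
    wmm = (1.0 + b) * wm  # maximum point tension water capacity
    # Ordinate on the capacity curve that corresponds to the current storage w
    a = wmm * (1.0 - (1.0 - w / wm) ** (1.0 / (1.0 + b)))
    if pe + a < wmm:
        # Only part of the pervious area becomes saturated
        r_perv = pe - (wm - w) + wm * (1.0 - (pe + a) / wmm) ** (1.0 + b)
    else:
        # The whole pervious area is saturated: rainfall beyond the deficit runs off
        r_perv = pe - (wm - w)
    r_perv = max(r_perv, 0.0)  # guard against floating-point round-off
    return (1.0 - im) * r_perv + im * pe
```

For example, `xaj_runoff(10.0, 0.0, 142.30, 0.50, 0.05)` uses the calibrated WM, B, and IM for a dry catchment and returns only a small runoff depth, because most of the rainfall first fills the tension water deficit; on a saturated catchment (`w = 142.30`) all 10 mm runs off.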

#### Long short-term memory (LSTM) neural network

The LSTM neural network, proposed by Hochreiter & Schmidhuber (1997), is a special type of recurrent neural network (RNN). Unlike other ANNs, the hidden layer of an LSTM contains an internal self-loop unit (Zhou 2020), which overcomes the exploding and vanishing gradient problems that RNNs are prone to when trained with the backpropagation through time (BPTT) algorithm (Kao *et al.* 2020). LSTM is therefore well suited to predicting various time series.
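The self-loop referred to above is the cell state, which is updated additively under the control of forget, input, and output gates. The NumPy sketch below shows one forward step of a standard LSTM cell with a forget gate. The stacked weight layout and the gate order (forget, input, candidate, output) are implementation choices made here for illustration and are not necessarily those of the formulas in this study's Appendix A; `lstm_step` is a hypothetical name.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    x: input at time t, shape (n_in,)
    h_prev, c_prev: previous hidden and cell states, shape (n_hid,)
    W: input weights (4*n_hid, n_in); U: recurrent weights (4*n_hid, n_hid); b: biases (4*n_hid,)
    Assumed gate order in the stacked arrays: forget, input, candidate, output.
    """
    z = W @ x + U @ h_prev + b
    f, i, g, o = np.split(z, 4)
    f = sigmoid(f)  # forget gate: fraction of c_prev to keep
    i = sigmoid(i)  # input gate: fraction of new information to write
    g = np.tanh(g)  # candidate values for the cell state
    o = sigmoid(o)  # output gate: fraction of the cell state to expose
    c = f * c_prev + i * g  # additive self-loop that eases gradient flow through time
    h = o * np.tanh(c)
    return h, c
```

Iterating this step over a sequence of lagged inputs and passing the final hidden state to a dense output unit gives a single-output forecast, which is the general shape of the network described here.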

The LSTM usually consists of a three-layer structure (input layer, hidden layer, and output layer) with full connections between layers (Figure 1(b)); see Supplementary Material, Appendix A for the formulas. In this study, the outputs of the hidden layer and the output layer are computed with the sigmoid function, and the Adam algorithm is used for model training. The hyperparameters of the LSTM are optimized with a genetic algorithm (GA), with the NSE as the objective function.

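The data are normalized as follows:

$$X_t = \frac{x_t - \min(x_1, \ldots, x_n)}{\max(x_1, \ldots, x_n) - \min(x_1, \ldots, x_n)}, \quad t = 1, 2, \ldots, n$$

where $X_t$ denotes the normalized data; $x_t$ denotes the sample data; $n$ denotes the total number of samples; and $\min(\cdot)$ and $\max(\cdot)$ denote the minimum and maximum functions.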
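The min-max normalization maps each series to [0, 1], the range of the sigmoid outputs mentioned above, and its inverse maps network outputs back to discharge. A minimal sketch follows; the function names are hypothetical, and fitting min(·) and max(·) on the training data only and reusing them for validation and testing is our assumption (a common way to avoid information leakage), not something the text specifies.

```python
from typing import List, Sequence, Tuple


def minmax_fit(x: Sequence[float]) -> Tuple[float, float]:
    """Return (min, max) of a series; assumption: fitted on training data only."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("series is constant; min-max scaling is undefined")
    return lo, hi


def minmax_transform(x: Sequence[float], lo: float, hi: float) -> List[float]:
    """Scale values to [0, 1] with the fitted min and max."""
    return [(v - lo) / (hi - lo) for v in x]


def minmax_inverse(X: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map normalized values (e.g., sigmoid outputs) back to physical units."""
    return [lo + v * (hi - lo) for v in X]
```

Values outside the training range map outside [0, 1], which a sigmoid output cannot reproduce; this is one reason the range of the training data matters for high flows.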

To facilitate analysis of the relative importance of the input variables and how it varies, a single-output LSTM neural network is constructed for each lead time.

### XAJ-LSTM hybrid model

The hybrid model (i.e., XAJ-LSTM) takes the discharges forecast by the XAJ model, which preserve certain physical characteristics, together with precipitation information as input variables of the LSTM neural network. The observed discharge may be subject to anthropogenic influences, which degrade the performance of the XAJ model built on physical mechanisms but may be captured by the data-driven LSTM model. Therefore, integrating the LSTM, which can handle a certain level of noise from anthropogenic impacts, with the XAJ model, which generates reasonable flood hydrographs, can improve on the forecast performance of the benchmark models and extend their lead times.

The structure of the XAJ-LSTM hybrid model is depicted in Figure 1(c). In the simulation stage, observed precipitation and evaporation are used as inputs of the XAJ model; the XAJ outputs and the observed precipitation are then used as inputs of the LSTM for model training and validation. In the forecast stage, the discharge forecast by XAJ and the precipitation forecast by the ECMWF are used as inputs of the LSTM to test the effectiveness of the hybrid model.

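Let $t$ denote the time at which the forecast is issued. $T_P$ and $T_Q$ denote the maximum lengths of the time-lags for precipitation and discharge, respectively, and $s$ denotes the length of the lead time. The output of the hybrid model can be expressed by the general equation

$$\hat{Q}_{t+s} = f\left(\mathbf{P}_{r},\ \hat{\mathbf{P}}_{v},\ \mathbf{Q}_{k},\ \tilde{\mathbf{Q}}_{l}\right) \tag{2}$$

where $\hat{Q}_{t+s}$ is the forecast discharge of the hybrid model at time $t+s$; $f$ is the LSTM neural network of the hybrid model; $\mathbf{P}_{r}$ is the set of observed precipitation with time-lag $r$; $\hat{\mathbf{P}}_{v}$ is the set of forecast precipitation with time-lag $v$; $\mathbf{Q}_{k}$ is the set of observed discharge with time-lag $k$; and $\tilde{\mathbf{Q}}_{l}$ is the set of simulated discharge with time-lag $l$. The indices $r$, $v$, $k$, $l$, and $t$ are positive integers, and . The values of $v$ and $l$ obey Equation (3):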
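To make the structure of the hybrid model's general equation concrete, the hypothetical helper below assembles the input vector for a lead time of `s` steps (3 h each). It assumes, following the 3, 6, 9, and 12 h time lags selected from the CCF analysis, that observed precipitation and discharge are used at t, t−1, …, t−(4−s), that forecast precipitation is used at t+1, …, t+s, and that the hybrid model additionally receives the XAJ forecast discharge at t+1, …, t+s. Under this assumption the LSTM gets 9, 8, 7, and 6 inputs for s = 1 to 4 and the XAJ-LSTM gets 10 at every lead time, consistent with the number of inputs listed in Table 2; the exact ordering used in the study may differ.

```python
from typing import List, Sequence

WINDOW = 4  # assumption: 12 h concentration time divided by the 3 h time step


def build_inputs(
    p_obs: Sequence[float],  # observed precipitation at t-3, ..., t (last item is time t)
    q_obs: Sequence[float],  # observed discharge at t-3, ..., t
    p_fc: Sequence[float],   # forecast precipitation at t+1, ..., t+4
    q_xaj: Sequence[float],  # XAJ forecast discharge at t+1, ..., t+4
    s: int,                  # lead time in 3 h steps (1..4)
    hybrid: bool,            # True for XAJ-LSTM, False for the plain LSTM
) -> List[float]:
    """Hypothetical input assembly for one forecast issued at time t."""
    if not 1 <= s <= WINDOW:
        raise ValueError("lead time must be between 1 and WINDOW steps")
    n_obs = WINDOW - s + 1  # observed lags that still fall inside the 12 h window
    x = list(p_obs[-n_obs:]) + list(q_obs[-n_obs:]) + list(p_fc[:s])
    if hybrid:
        x += list(q_xaj[:s])  # simulated discharge fills slots that observations cannot
    return x
```

As `s` grows, observed values drop out of the window; the plain LSTM simply loses inputs, whereas the hybrid model replaces them with XAJ forecasts, which is the mechanism the text credits for the longer usable lead time.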

### Evaluation of model performance

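Five indices are used to evaluate model performance: the Nash–Sutcliffe efficiency (NSE), the Kling–Gupta efficiency (KGE), the correlation coefficient (R), the relative error (RE), and the root mean square error (RMSE):

$$\mathrm{NSE} = 1 - \frac{\sum_{n=1}^{N}\left(Q_{o,n} - Q_{f,n}\right)^2}{\sum_{n=1}^{N}\left(Q_{o,n} - \bar{Q}_o\right)^2}$$

$$\mathrm{KGE} = 1 - \sqrt{(R - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}$$

$$R = \frac{\sum_{n=1}^{N}\left(Q_{o,n} - \bar{Q}_o\right)\left(Q_{f,n} - \bar{Q}_f\right)}{\sqrt{\sum_{n=1}^{N}\left(Q_{o,n} - \bar{Q}_o\right)^2}\,\sqrt{\sum_{n=1}^{N}\left(Q_{f,n} - \bar{Q}_f\right)^2}}$$

$$\mathrm{RE} = \frac{\sum_{n=1}^{N}\left(Q_{f,n} - Q_{o,n}\right)}{\sum_{n=1}^{N} Q_{o,n}} \times 100\%$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(Q_{o,n} - Q_{f,n}\right)^2}$$

where $Q_{o,n}$ and $Q_{f,n}$ are the $n$th observed and forecasted discharge, respectively; $\bar{Q}_o$ and $\bar{Q}_f$ are the mean values of the observed and forecasted discharge, respectively; $\alpha$ is the ratio between the standard deviation of the forecasted discharge and that of the observed discharge; $\beta$ is the ratio between the mean of the forecasted discharge and that of the observed discharge; and $N$ is the total number of samples.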
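The five indices can be computed together from paired observed and forecast series. The sketch below is illustrative: `evaluate` is a hypothetical name, KGE is built from the correlation, the ratio of standard deviations, and the ratio of means as described above, and RE is taken as total forecast volume minus total observed volume relative to the observed volume. That RE sign convention is our assumption, so a positive RE here means over-forecasting.

```python
import numpy as np


def evaluate(obs, fc):
    """Return NSE, KGE, R, RE (%) and RMSE for paired observed/forecast discharge."""
    obs = np.asarray(obs, dtype=float)
    fc = np.asarray(fc, dtype=float)
    err = fc - obs
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    r = np.corrcoef(obs, fc)[0, 1]   # Pearson correlation coefficient
    alpha = fc.std() / obs.std()     # ratio of standard deviations
    beta = fc.mean() / obs.mean()    # ratio of means
    kge = 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
    re = 100.0 * err.sum() / obs.sum()  # assumption: positive means over-forecast volume
    rmse = np.sqrt(np.mean(err ** 2))
    return {"NSE": nse, "KGE": kge, "R": r, "RE": re, "RMSE": rmse}
```

A forecast that is uniformly 1 m³/s too high, such as `evaluate([1, 2, 3], [2, 3, 4])`, keeps R = 1 and the variability ratio at 1 but has NSE = −0.5, KGE = 0.5, RE = 50%, and RMSE = 1, which illustrates why correlation alone is not sufficient.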

In terms of their characteristics, R measures the degree of similarity between the observed and forecasted discharge, NSE and KGE reflect the influence of high flows on model performance, and RE reflects the influence of the total flood volume on model performance. Additionally, RMSE is used to evaluate the effect of flood volume at each lead time on model performance. The reasonable range of NSE, KGE, and R is while that of RE and RMSE is and , respectively. The optimum value of NSE, KGE, and R is 1 and that of RE and RMSE is 0, indicating a perfect fit between the forecasted and observed results.

## CASE STUDY

### Study area and data materials

The Lushui River is a primary tributary on the south bank of the middle reaches of the Yangtze River. The Lushui reservoir basin (Figure 2), with a drainage area of 3,950 km², lies in the subtropical monsoon climate zone, with a warm climate and an annual average temperature of about 15.5 °C. The average annual rainfall is 1,550 mm, with high intra- and interannual variability. The rainy season starts early and is generally concentrated from April to September, which accounts for 70% of the annual rainfall. The average annual runoff is about 3.03 billion m³, and the largest floods generally occur in May and June. Elevation varies considerably across the basin, from 1,493 m in the southeast to 29 m in the northwest (Figure 2). After high-intensity rainfall, the time of runoff generation and concentration is very short. The Lushui reservoir, located at the outlet of the mainstream with a capacity of about 742 million m³, plays a major role in regulating and storing floods. Because of its small storage and inaccurate forecasts, the reservoir faces tremendous flood prevention pressure in the rainy season. This has driven an increasing demand for accurate forecasts to provide effective decision support for flood prevention and water resources management.

This study collected data on flood events during the flood season (1 May to 31 October), including 3 h precipitation data from 17 gauge stations and 3 h evaporation and inflow data of the Lushui reservoir for 2012–2019, as well as 3 h ECMWF precipitation forecasts (grid resolution 0.125° × 0.125°) for 2017–2019. Areal precipitation from both the 17 gauge stations and the ECMWF grids was calculated with the Thiessen polygon method. Because forecasts at different lead times are needed to guide reservoir operation, the lead times studied range from 3 to 12 h at an interval of 3 h (defined as ). The dataset was divided into three parts for model training, validation, and testing. The differences between the training, validation, and testing periods and between the simulation and forecasting stages are shown in Table 1.
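Once the Thiessen polygons are drawn, the method reduces to an area-weighted mean of the station (or ECMWF grid) values. The sketch below assumes the polygon areas inside the basin have already been computed, for example in a GIS; the function name and the example weights are hypothetical.

```python
from typing import Sequence


def thiessen_areal_precip(values: Sequence[float], weights: Sequence[float]) -> float:
    """Areal precipitation as a weighted mean of station (or grid) values.

    weights: area (or area fraction) of each Thiessen polygon inside the basin;
    they are normalized here, so they need not sum to 1.
    """
    if len(values) != len(weights):
        raise ValueError("one weight per station is required")
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return sum(v * w for v, w in zip(values, weights)) / total
```

For instance, two gauges reading 10 mm and 20 mm whose polygons cover one quarter and three quarters of the basin give `thiessen_areal_precip([10.0, 20.0], [1.0, 3.0]) == 17.5` mm. Because the weights are fixed for a given station network, they can be computed once and reused for every 3 h step.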

| Stage | Period | Time range |
|---|---|---|
| Simulation | Warm-up period | 3 h data of the flood season^{a} in 2012 |
| | Training period | 3 h data of the flood seasons, 2013–2016 |
| | Validation period | 3 h data of the flood seasons, 2017–2019 |
| Forecasting | Test period | 3 h data at 11:00, 14:00, 17:00, and 20:00 each day in the flood seasons, 2017–2019 |


^{a}Flood season spans from 1 May to 31 October.
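The partition in Table 1 is chronological, which keeps later information out of model training. A minimal sketch of the assignment rule is shown below; the function names are hypothetical, and timestamps are assumed to be local time on the 3 h grid.

```python
from datetime import datetime
from typing import Optional


def assign_period(ts: datetime) -> Optional[str]:
    """Map a timestamp to the simulation-stage period of Table 1.

    Returns None outside the flood season (1 May to 31 October) or outside 2012-2019.
    """
    if not 5 <= ts.month <= 10:
        return None  # outside the flood season
    if ts.year == 2012:
        return "warm-up"
    if 2013 <= ts.year <= 2016:
        return "training"
    if 2017 <= ts.year <= 2019:
        return "validation"
    return None


def is_test_sample(ts: datetime) -> bool:
    """Forecast stage: 11:00, 14:00, 17:00, and 20:00 in the 2017-2019 flood seasons."""
    return assign_period(ts) == "validation" and ts.minute == 0 and ts.hour in (11, 14, 17, 20)
```

The test samples fall in the same years as the validation period; what distinguishes them is that they are driven by ECMWF forecasts issued at 08:00, so the four hours correspond to the 3, 6, 9, and 12 h lead times.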

In addition, the starting time of flood forecast is 8:00 each day in the test period. Therefore, the forecast results for the 3 h, 6 h, 9 h, and 12 h lead time are the instantaneous forecast data at 11:00, 14:00, 17:00, and 20:00 each day in flood season during 2017 up to 2019, respectively.

### Input variable selection and structure

The inputs of XAJ are forecast precipitation and evaporation data, and its output is the inflow of the Lushui reservoir. The inputs of the LSTM neural network and of the LSTM module of the hybrid model are similar, consisting of antecedent precipitation, forecast precipitation, antecedent discharge, and forecast discharge. The target output is the observed discharge, and the inputs of the neural network need to be selected carefully. The maximum time-lags of precipitation and discharge in the input data (Equation (2)) are determined by jointly considering the flow concentration time, the basin characteristics, and the cross-correlation function (CCF) between the input and output data (Srinivasulu & Jain 2006; Wu *et al.* 2014; Ba *et al.* 2017).

We used the CCF between the rainfall and runoff series to estimate the time of concentration in the Lushui basin (Ba *et al.* 2017). Figure 3 shows that the CCF peaks at a rainfall lag of 12 h, which indicates that the runoff concentration time of the basin is about 12 h.
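The lag of maximum correlation can be found by shifting the rainfall series against the runoff series. The sketch below assumes both series lie on the same gap-free 3 h grid, so a peak at lag 4 corresponds to the 12 h concentration time reported above. The helper name `ccf` is hypothetical, and it computes a Pearson coefficient on each overlapping segment with `np.corrcoef`, which differs slightly from the whole-series normalization used by some statistical packages.

```python
import numpy as np


def ccf(x, y, max_lag: int) -> np.ndarray:
    """Correlation between x(t - lag) and y(t) for lag = 0, 1, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = []
    for lag in range(max_lag + 1):
        xs = x[: len(x) - lag]  # rainfall leading by `lag` steps
        ys = y[lag:]            # runoff at the later time
        out.append(np.corrcoef(xs, ys)[0, 1])
    return np.array(out)
```

With 3 h data, `int(np.argmax(ccf(rain, runoff, 8))) * 3` gives the lag in hours at which the correlation peaks; on the synthetic series `runoff = np.roll(rain, 4)` it returns 12.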

Therefore, the precipitation and discharge with 3 h, 6 h, 9 h, and 12 h time lags can be selected as input, and the forecasted precipitation at the forecast time from ECMWF is used as input. In the earlier ANNs, the discharge at the forecast time is not considered because it is unknown. , , , , , , , , and are used to forecast the discharge at in the LSTM model.

However, the forecasted discharge at the forecast time can be obtained in the hybrid model. Thus, , , , , , , , , , and are used to forecast the discharge at in the XAJ-LSTM model. According to Equation (2), the input variables of LSTM during lead time up to are determined and input structures of models are shown in Table 2, where , , and denote the forecasted discharge of XAJ model, LSTM model, and XAJ-LSTM hybrid model, respectively; denotes the forecasted precipitation; *P* denotes the observed precipitation; *Q* denotes the observed discharge.

Model . | Input variables . | Output . |
---|---|---|

XAJ | , | |

, | ||

, | ||

, | ||

LSTM | , , , , , , , , | |

, , , , , , , | ||

, , , , , , | ||

, , , , , | ||

XAJ-LSTM | , , , , , , , , , | |

, , , , , , , , , | ||

, , , , , , , , , | ||

, , , , , , , , , |


The inputs of the LSTM model include observed discharge and observed and forecast precipitation, but not forecast discharge. As the lead time is extended, fewer observed discharges are available, until eventually none are. The XAJ-LSTM model adds the discharge forecast by the XAJ model to the inputs of the LSTM model. When the lead time is , only forecast rainfall and forecast discharge, without any observations, are fed into the neural network.

## RESULTS AND DISCUSSION

Five indices were used to assess the performance of the XAJ-LSTM, XAJ, and LSTM models in the training, validation, and testing stages. To verify the reliability of the three models, this study also assessed their performance on three flood events. Finally, the relative importance of the input variables to model performance was analyzed.

### Evaluation of model performance in the training and validation stages

The XAJ model was calibrated with SCEM-UA over the training period, and the parameter values are summarized in Table 3. The performance of the XAJ and LSTM models for flood forecasting during the training and validation stages is summarized in Table 4.

| Parameter | Description | Value |
|---|---|---|
| WM (mm) | Areal mean tension water capacity | 142.30 |
| WUM (mm) | Averaged soil moisture storage capacity of the upper layer | 34.75 |
| WLM (mm) | Averaged soil moisture storage capacity of the lower layer | 84.36 |
| B | Exponent of the tension water capacity curve | 0.50 |
| KC | Ratio of potential evapotranspiration to pan evaporation | 0.95 |
| C | The coefficient of deep evapotranspiration | 0.18 |
| IM | Percentage of impervious and saturated areas in the catchment | 0.05 |
| SM (mm) | Areal mean of the free water capacity of the surface soil layer | 49.97 |
| EX | Exponent of the free water capacity curve | 1.08 |
| KI | Outflow coefficients of the free water storage to interflow | 0.19 |
| KG | Outflow coefficients of the free water storage to groundwater | 0.46 |
| CI | Recession constants of the lower interflow storage | 0.87 |
| CG | Recession constants of the groundwater storage | 0.98 |
| n | Number of linear reservoirs of Nash unit hydrograph | 3.99 |
| k (h) | Storage coefficient of linear reservoirs of Nash unit hydrograph | 12.94 |
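The runoff concentration parameters in Table 3 describe a Nash cascade of n linear reservoirs with storage coefficient k, whose instantaneous unit hydrograph is u(t) = (t/k)^(n−1) e^(−t/k) / (k Γ(n)). Because the calibrated n is not an integer, the gamma function replaces the factorial. The sketch below only evaluates u(t); converting it into a 3 h unit hydrograph (for example, by differencing the S-curve) is left out, and `nash_iuh` is a hypothetical name.

```python
import math


def nash_iuh(t: float, n: float, k: float) -> float:
    """Nash instantaneous unit hydrograph u(t) in 1/h, for n reservoirs of storage coefficient k (h)."""
    if t <= 0.0:
        return 0.0
    return (t / k) ** (n - 1.0) * math.exp(-t / k) / (k * math.gamma(n))
```

Under this formula the peak occurs at t = (n − 1)k, which for n = 3.99 and k = 12.94 h is about 38.7 h, and the area under u(t) is 1, so the unit hydrograph conserves runoff volume.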


| Period | Index | XAJ model^{a} | LSTM, 3 h | LSTM, 6 h | LSTM, 9 h | LSTM, 12 h |
|---|---|---|---|---|---|---|
| Training | NSE | 0.931 | 0.978 | 0.959 | 0.922 | 0.883 |
| | KGE | 0.894 | 0.960 | 0.953 | 0.914 | 0.907 |
| | R | 0.965 | 0.989 | 0.980 | 0.961 | 0.940 |
| | RE (%) | −2.309 | 5.115 | −2.783 | −4.117 | −4.706 |
| | RMSE (m³/s) | 72.068 | 37.220 | 54.693 | 76.455 | 95.704 |
| Validation | NSE | 0.891 | 0.982 | 0.953 | 0.901 | 0.856 |
| | KGE | 0.884 | 0.911 | 0.924 | 0.883 | 0.879 |
| | R | 0.943 | 0.991 | 0.976 | 0.949 | 0.925 |
| | RE (%) | 0.337 | 5.631 | −2.084 | −3.091 | −5.125 |
| | RMSE (m³/s) | 77.346 | 36.709 | 55.540 | 79.448 | 93.592 |


^{a}The training and validation periods in the XAJ model refer to the flood periods in 2013–2016 and 2017–2019, respectively. Since the flood period in 2012 was taken as the warm-up period of the XAJ model, the data in 2012 were not used for assessing the three models.

Compared with the XAJ model, the LSTM model has higher forecast accuracy in terms of NSE, KGE, R, and RMSE at lead times and , but its accuracy gradually deteriorates as the lead time increases from to .

A possible reason is that the correlation between the input variables and the expected output decreases at longer lead times, weakening the model's ability to capture the relationship between them. This problem could be overcome if input variables highly correlated with the expected output were consistently provided to the LSTM model. The discharge simulated by the XAJ model closely fits the observed discharge and is therefore highly correlated with the expected output of the LSTM model. In terms of NSE, R, RE, and RMSE at the longer lead times, the XAJ model forecasts high discharges and flood magnitudes more accurately than the LSTM model, owing to the stable physical mechanism of its internals. These results suggest that using the flood values simulated by XAJ as inputs to the LSTM is expected to improve forecast accuracy.

The performance of the XAJ-LSTM model for flood forecasting at lead times , , , and in the training and validation stages is shown in Table 5. Notably, the NSE, KGE, and R values of the XAJ-LSTM model all exceed 0.930, an improvement over the simulated results of the XAJ and LSTM benchmark models. At lead times and , the simulated discharges of the LSTM and XAJ-LSTM models have similar statistical characteristics regardless of their different inputs. As the lead time increases to , the performance of the LSTM model begins to deteriorate because inputs that are highly correlated with the observed discharge gradually become unavailable, which severely restrains model performance.

| Period | Index | XAJ-LSTM, 3 h | XAJ-LSTM, 6 h | XAJ-LSTM, 9 h | XAJ-LSTM, 12 h |
|---|---|---|---|---|---|
| Training | NSE | 0.983 | 0.967 | 0.950 | 0.941 |
| | KGE | 0.965 | 0.953 | 0.949 | 0.940 |
| | R | 0.992 | 0.984 | 0.976 | 0.971 |
| | RE (%) | 2.472 | −4.389 | 1.511 | 1.028 |
| | RMSE (m³/s) | 32.641 | 45.928 | 56.104 | 61.299 |
| Validation | NSE | 0.983 | 0.959 | 0.944 | 0.932 |
| | KGE | 0.941 | 0.958 | 0.941 | 0.939 |
| | R | 0.992 | 0.980 | 0.972 | 0.966 |
| | RE (%) | 3.615 | −1.734 | 0.266 | −0.114 |
| | RMSE (m³/s) | 35.266 | 55.162 | 64.670 | 71.343 |


This also highlights the value of supplementing the LSTM inputs with the flows simulated by the conceptual model (XAJ). The hybrid model (XAJ-LSTM) relaxes the limitation imposed by the available input variables, effectively improving the simulation accuracy of the LSTM model and making full use of the powerful learning ability of an ANN.

Figure 4 shows scatter plots of observed and simulated discharge in the training and validation stages. In both stages, the high-discharge points simulated by XAJ lie below the 1:1 line, indicating that the observed discharge was underestimated. The points simulated by LSTM are distributed uniformly around the 1:1 line at shorter lead times, but as the lead time increases the scatter becomes uneven and loose, and the high-discharge points gradually fall below the 1:1 line. In contrast, the points simulated by XAJ-LSTM are distributed uniformly around the 1:1 line in the training stage, although the high-flow points lie below the 1:1 line in the validation stage. This is because few observed high flows were involved in parameter calibration during the training stage, so high flows in the validation stage tend to be underestimated. Nevertheless, the hybrid model is closer to the observations than the benchmark models, indicating that it effectively mitigates the underestimation of observed discharge at longer lead times. The R value of XAJ-LSTM is almost always the highest, and the R values of LSTM are lower than that of XAJ at lead time , indicating that XAJ-LSTM performs best and that XAJ outperforms LSTM at longer lead times.

### Evaluation of model performance in the testing stage

In the forecast (i.e., testing) stage, covering the 2017–2019 flood seasons, precipitation forecasts from the ECMWF are used to drive the hydrological models to forecast discharge for the operation and management of the Lushui reservoir, which is an important step in applying hydrological models in practice. In the hybrid model (XAJ-LSTM), the LSTM module takes the same input data as the standalone LSTM model plus the forecast flows of the XAJ model. Forecasts are issued at 8:00 every day of the test period, so the forecasts for the 3 h, 6 h, 9 h, and 12 h lead times are the instantaneous values at 11:00, 14:00, 17:00, and 20:00, respectively, on each day of the 2017–2019 flood seasons.
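The mapping from issue time to target times is simple clock arithmetic on the 3 h time step. A minimal sketch, assuming a hypothetical helper `target_times` and an arbitrary example date:

```python
from datetime import datetime, timedelta

LEAD_HOURS = (3, 6, 9, 12)  # lead times used in the testing stage


def target_times(issue, step_hours=3, lead_hours=LEAD_HOURS):
    """Return (steps ahead, valid time) for each lead time of one forecast."""
    return [(h // step_hours, issue + timedelta(hours=h)) for h in lead_hours]


issue = datetime(2018, 6, 1, 8, 0)  # forecasts are issued at 8:00 each day
for k, valid in target_times(issue):
    print(k, valid.strftime("%H:%M"))
```

Each lead time therefore corresponds to k = 1, 2, 3, 4 steps ahead of the 8:00 issue time.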

The evaluation indexes of the forecasts in the testing stage are shown in Table 6. The indexes change with lead time in a similar way to the simulation stage, but the performance degrades more, indicating a large gap between simulation and real-world application. As the uncertainty of inputs such as forecast precipitation grows with lead time, NSE and R decrease, KGE generally decreases, and RMSE increases for all three models, whereas RE shows no consistent trend. At the same lead time, the NSE, KGE, and R of the XAJ-LSTM model are higher than those of the XAJ and LSTM models, and its RMSE is lower than those of both; its RE is also lower in absolute value except at the 6 h lead time, where LSTM has the smaller RE (1.774% versus 3.792%). Overall, the hybrid model clearly improves the multi-step-ahead forecast performance of the benchmark models.

| Model | Index | 3 h | 6 h | 9 h | 12 h |
|---|---|---|---|---|---|
| XAJ | NSE | 0.921 | 0.881 | 0.824 | 0.761 |
| | KGE | 0.897 | 0.887 | 0.796 | 0.776 |
| | R | 0.960 | 0.940 | 0.909 | 0.886 |
| | RE (%) | 4.976 | 5.765 | 4.681 | −3.802 |
| | RMSE (m^{3}/s) | 57.854 | 88.896 | 118.680 | 144.159 |
| LSTM | NSE | 0.979 | 0.936 | 0.850 | 0.749 |
| | KGE | 0.909 | 0.858 | 0.773 | 0.765 |
| | R | 0.990 | 0.968 | 0.925 | 0.867 |
| | RE (%) | 6.055 | 1.774 | 7.943 | 1.244 |
| | RMSE (m^{3}/s) | 35.180 | 65.207 | 119.528 | 150.627 |
| XAJ-LSTM | NSE | 0.981 | 0.943 | 0.895 | 0.813 |
| | KGE | 0.930 | 0.918 | 0.823 | 0.830 |
| | R | 0.991 | 0.971 | 0.947 | 0.902 |
| | RE (%) | 4.664 | 3.792 | 4.094 | 0.663 |
| | RMSE (m^{3}/s) | 33.137 | 61.288 | 91.680 | 129.997 |
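For reference, the sketch below computes the five indexes of Table 6 using their common formulations: NSE, KGE in its correlation/variability-ratio/bias-ratio form, Pearson R, RE as the relative volume error in percent, and RMSE. These are assumptions insofar as the paper's own definitions given earlier may differ in detail (for example, the sign convention of RE); the function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(obs, sim):
    """Common forms of NSE, KGE, R, RE (%) and RMSE for one series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    nse = 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
    r = np.corrcoef(obs, sim)[0, 1]          # Pearson correlation coefficient
    alpha = sim.std() / obs.std()            # variability ratio
    beta = sim.mean() / obs.mean()           # bias ratio
    kge = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    re = 100.0 * (sim.sum() - obs.sum()) / obs.sum()  # relative volume error
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    return {"NSE": nse, "KGE": kge, "R": r, "RE": re, "RMSE": rmse}


obs = np.array([120.0, 340.0, 560.0, 910.0, 430.0])
perfect = evaluate(obs, obs)
biased = evaluate(obs, 1.1 * obs)  # uniform 10% overestimation
```

A uniform 10% overestimation leaves R at 1 but is penalized by KGE (through both the variability and bias ratios) and shows up directly as RE = 10%.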


The LSTM neural network can effectively handle data that are non-linear and highly random. Consequently, post-processing by the LSTM network clearly improves the XAJ forecasts at longer lead times. Conversely, because the forecast flows of the XAJ model supplement the insufficient inputs of the LSTM model, the hybrid model (i.e., XAJ-LSTM) effectively improves the LSTM forecasts and extends their useful lead time. This further highlights that the flows simulated by the conceptual hydrological model are essential to maintaining the forecast accuracy of the LSTM network. However, the improvement in NSE, KGE, and R is not unlimited, because the hybrid model is also affected by uncertainties from various sources. For example, as input errors grow with lead time, the forecast performance of the hybrid model also deteriorates markedly.

In addition, the hybrid model improves on XAJ more than on LSTM at shorter lead times, whereas its improvement over LSTM exceeds that over XAJ at longer lead times. This is because LSTM handles time series so well that it outperforms XAJ when sufficient input variables are available at shorter lead times. However, as the lead time increases, the LSTM inputs that are most highly correlated with the expected output progressively become unavailable, resulting in a larger decrease in forecast accuracy.
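These comparisons can be reproduced directly from the NSE and RMSE values in Table 6. The sketch below (plain Python, values copied from the table; helper names are hypothetical) computes, for each lead time, the hybrid model's absolute NSE gain and relative RMSE reduction with respect to each benchmark.

```python
LEADS = ["3 h", "6 h", "9 h", "12 h"]
NSE = {
    "XAJ": [0.921, 0.881, 0.824, 0.761],
    "LSTM": [0.979, 0.936, 0.850, 0.749],
    "XAJ-LSTM": [0.981, 0.943, 0.895, 0.813],
}
RMSE = {
    "XAJ": [57.854, 88.896, 118.680, 144.159],
    "LSTM": [35.180, 65.207, 119.528, 150.627],
    "XAJ-LSTM": [33.137, 61.288, 91.680, 129.997],
}


def nse_gain(bench):
    """Absolute NSE gain of XAJ-LSTM over a benchmark, per lead time."""
    return [h - b for h, b in zip(NSE["XAJ-LSTM"], NSE[bench])]


def rmse_reduction(bench):
    """Relative RMSE reduction (%) of XAJ-LSTM over a benchmark, per lead time."""
    return [100 * (b - h) / b for h, b in zip(RMSE["XAJ-LSTM"], RMSE[bench])]


for i, lead in enumerate(LEADS):
    print(f"{lead:>4}: dNSE XAJ {nse_gain('XAJ')[i]:.3f}, LSTM {nse_gain('LSTM')[i]:.3f}; "
          f"dRMSE XAJ {rmse_reduction('XAJ')[i]:.1f}%, LSTM {rmse_reduction('LSTM')[i]:.1f}%")
```

With these values, the gain over XAJ exceeds the gain over LSTM at 3 h and 6 h under both measures; the relative RMSE reduction over LSTM is the larger one from 9 h onward, while the NSE gain over LSTM is the larger one only at 12 h. The exact crossover point therefore depends on the index chosen to express the degree of improvement.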

This study also assessed the forecast performance for a small flood event (peak discharge 500 m^{3}/s, May 24 to June 2, 2018; Figure 5), a medium flood event (peak discharge 1,000 m^{3}/s, May 19 to May 28, 2017; Figure 6), and a large flood event (peak discharge 1,000 m^{3}/s, May 22 to May 31, 2019; Figure 7). The hydrographs of all three models deviate progressively from the observations as the lead time increases, indicating declining forecast performance. LSTM deteriorates the most, followed by XAJ, while the hybrid model performs best. Taking the large flood event (Figure 7) as an example, LSTM severely overestimates the peak flow and its hydrograph fluctuates during lead time . XAJ, by contrast, underestimates the peak flow because of the uncertainty of the forecast precipitation, but maintains a hydrograph shape similar to the observed one. Although XAJ-LSTM shows the same issues as LSTM, its forecasts are more accurate. Meanwhile, the hydrographs simulated by XAJ-LSTM keep an accurate and reasonable shape (most clearly seen in Figure 5), indicating that the XAJ hydrographs transmit physical characteristics to the hybrid model and allow it to maintain reasonable hydrographs. Although the hybrid model still shows some error at longer lead times, it clearly improves on the forecasting performance of the benchmark models.
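The peak over- and underestimation described for the flood events can be quantified per event with two simple measures, relative peak error and peak timing error. Neither measure is reported in the paper; the sketch below, with the hypothetical function `peak_errors` and synthetic hydrographs, only illustrates how such event-based checks could be computed.

```python
import numpy as np


def peak_errors(obs, sim):
    """Relative peak error (%) and peak timing error (in time steps).

    Positive peak error = overestimated peak; positive timing error =
    simulated peak occurs later than the observed one.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    peak_err = float(100.0 * (sim.max() - obs.max()) / obs.max())
    timing = int(np.argmax(sim) - np.argmax(obs))
    return peak_err, timing


obs = [100, 300, 800, 1000, 700, 400]
late_high = [90, 200, 600, 950, 1150, 500]  # overestimated and one step late
low = [100, 280, 700, 800, 600, 300]        # underestimated, correct timing
print(peak_errors(obs, late_high))
print(peak_errors(obs, low))
```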

The XAJ-LSTM model not only remedies the inability of XAJ to maintain forecast accuracy, but also extends the lead time of LSTM. In practical applications, the hybrid model is not limited by the availability of recent flood observations, because these can be replaced by flows forecast by XAJ or similar models. Given precipitation forecasts of sufficient accuracy, the hybrid model should in principle provide discharge forecasts with longer lead times and higher accuracy, which makes it highly practical for flood management and utilization.

### Relative importance of input variables on model performance

To quantify the impact of the inputs on the hybrid model, this study adopted the MIV algorithm to analyze the relative importance of the inputs. The MIV algorithm, proposed by Dombi *et al.* (1995), reflects changes in the weight matrix of a neural network. It is commonly used to select input variables and is also considered an effective technique for evaluating the impact of input variables on the performance of ANNs. MIV requires an already trained network. For a given input, MIV increases and decreases its value by 10% in each of the *N* training samples to obtain two new training sets, $X_1$ and $X_2$. These are passed through the trained network to obtain two outputs, $Y_1$ and $Y_2$, and the difference $IV = Y_1 - Y_2$ is the impact value (IV) of that input on the output. The arithmetic mean of the IV over the samples is the MIV, and the MIV of each input is obtained in turn. The larger the absolute value of the MIV, the more important the corresponding input. To compare the relative importance of the input variables intuitively across different network structures, and to show how strongly the XAJ simulated flow affects forecast accuracy, the absolute values of the MIV are normalized in this study using the same method as in Equation (1). A normalized MIV of 0 or 1 indicates that the input variable is, respectively, the least or the most important in the given network structure.
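The MIV procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict` stands for any trained model mapping an (N samples × inputs) array to N outputs (for an LSTM, each lagged input value can be treated as its own column), the ±10% perturbation follows the text, and the normalization is assumed to be min–max scaling of |MIV|, consistent with normalized values of 0 and 1 marking the least and most important inputs.

```python
import numpy as np


def mean_impact_values(predict, X, delta=0.10):
    """MIV of each input column for a trained model `predict`.

    For column j, every sample is perturbed by +delta and -delta (relative),
    the impact value IV is the output difference, and MIV is its mean.
    """
    X = np.asarray(X, dtype=float)
    miv = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_up = X.copy()
        X_up[:, j] *= 1.0 + delta  # sample set with input j increased by 10%
        X_dn = X.copy()
        X_dn[:, j] *= 1.0 - delta  # sample set with input j decreased by 10%
        iv = predict(X_up) - predict(X_dn)  # impact value of each sample
        miv[j] = iv.mean()
    return miv


def normalize_abs(miv):
    """Min-max scale |MIV| to [0, 1] (assumed form of the normalization)."""
    a = np.abs(miv)
    return (a - a.min()) / (a.max() - a.min())


def linear_model(X):
    """Stand-in for a trained network: a fixed linear map (illustration only)."""
    return X @ np.array([2.0, 0.5, 0.0])


X = np.ones((5, 3))
print(normalize_abs(mean_impact_values(linear_model, X)))
```

For the linear stand-in, each input's MIV is simply its weight times 0.2 times its value, so the ranking mirrors the weights; for a real LSTM the ranking also depends on the sample values and the non-linear response.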

Figure 8 shows how the MIV of the input variables changes. Because antecedent discharge is more strongly correlated with the observed discharge than precipitation is, antecedent discharge is the relatively more important input, indicating that discharge is the input that mainly determines output accuracy in this study. At every lead time, the relatively most important variable of the LSTM model is the latest discharge information available at the forecast time. As the lead time increases and less observed discharge is available, the relative importance of precipitation gradually increases, which coincides with the decline in forecast accuracy. In contrast, the hybrid model keeps discharge in a relatively important position. At shorter lead times, the observed discharge closest to the forecast moment remains relatively important, because antecedent discharge with a shorter lag is more highly correlated with the observations. As the lead time increases, the MIV of the simulated flow gradually increases; from the lead time , it becomes the relatively most important input and plays a key role in maintaining forecast accuracy at longer lead times. Given that forecast errors are inherent in ECMWF products, the MIV analysis focuses on how the uncertainty of the precipitation input combinations affects model performance. The MIV analysis of the network structure confirms that the simulated discharge of XAJ contributes substantially to improving forecast accuracy.

## CONCLUSIONS

This study proposes a hybrid model that integrates the XAJ conceptual model and the LSTM neural network to make multi-step-ahead flood forecasts driven by forecast precipitation. The XAJ-LSTM hybrid model, which uses the discharge forecast by the XAJ model to make up for the insufficient input variables of LSTM and exploits the noise-handling capability of LSTM to improve forecast accuracy, was applied to the Lushui reservoir basin in China. The main findings are summarized as follows:

Although the uncertainty of real precipitation forecasts strongly affects flood forecasting and causes a large gap in performance between simulation and actual application, the hybrid model can effectively handle the noise in forecast precipitation via the LSTM module and improves the multi-step-ahead forecasting performance of the benchmark models in terms of the NSE, KGE, R, and RMSE indexes and, in most cases, RE. Meanwhile, the hybrid model's improvement over XAJ is greater than that over LSTM at the short lead times ( and ), whereas the reverse holds at the long lead times ( and ).

The MIV analysis shows that the simulated discharge of XAJ plays an important role in maintaining the high forecast accuracy of the hybrid model; it effectively extends the usable lead times of LSTM and contributes directly to the high forecast accuracy.

In the framework of the hybrid model, the LSTM neural network can act as a post-processing approach for correcting the forecast error of the XAJ model, while the XAJ model can act as a pre-processing approach for supplementing physical hydrograph information for the LSTM neural network to improve the flood forecast accuracy. The framework of the hybrid model would provide a reference for subsequent integration of conceptual (or physically based) models and ANNs.

This study demonstrated that the proposed hybrid model can effectively improve forecast accuracy at longer lead times and would be a useful tool for flood management of the Lushui reservoir. However, the proposed model is still affected by the uncertainty of its input variables. Most statistical post-processing methods quantify the uncertainty of flood forecasts by providing probability distributions of future values conditional on the available observations and on deterministic or ensemble forecasts. Todini (2008) proposed the model conditional processor, and Barbetta *et al.* (2017) and Biondi & Todini (2018) further developed a framework to reduce the uncertainty of multi-step-ahead flood forecasting. Papacharalampous *et al.* (2019) applied quantile regression algorithms to probabilistic hydrological post-processing. Uncertainty analysis of multi-step-ahead forecasts along these lines will be the focus of our future research.

## ACKNOWLEDGEMENTS

This study is financially supported by the National Natural Science Foundation of China (No. U20A20317). The authors would like to thank the editors and the anonymous reviewers for their valuable and constructive comments related to this manuscript.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.