## Abstract

In this study, a two-dimensional hydrodynamic and water-quality model is proposed for river-connected lakes to improve calibration accuracy and reduce the computational burden. To this end, the sensitivity of the hydrodynamic model parameters is analyzed using stepwise rank regression and Latin hypercube sampling (LHS), and the roughness coefficient, wind drag coefficient and wind resistance coefficient are identified as the parameters that most strongly affect the hydrodynamics of the Hongze Lake. The ensemble Kalman filter (EnKF) is then used to assimilate observations into the proposed hydrodynamic and water quality model. Assimilating both state variables and model parameters significantly improves the simulation of water level, flow velocity and pollutant concentration in the Hongze Lake.

## HIGHLIGHTS

The roughness coefficient, wind drag coefficient and wind resistance coefficient are the most important hydrodynamic parameters.

The EnKF method can improve model simulation accuracy.

Model assimilation is applicable to lakes with complex water systems.

## INTRODUCTION

There are numerous natural lakes in China, with a total area of approximately 71,780 km^{2}, and these lakes are often used for a wide variety of industrial, agricultural, domestic and ecological purposes. Clearly, there is a need for optimal management of these lakes to allow for a more sustainable use of water resources and management of natural ecosystems (Gong *et al.* 2016). Most lakes are fed and drained by rivers and streams and thus have very complex inflow and outflow conditions (Rasmussen *et al.* 2009), especially for those lakes connected with a large number of rivers, streams and canals (Li *et al.* 2014). River-connected lakes are also used in trans-basin water transfer projects for water storage and flood control, posing additional challenges to the simulation of hydrodynamics and water quality characteristics in these lakes.

Hydrodynamic models can be used to simulate flow velocity and direction, turbulence and water exchange in lakes, and thus provide the basis for water quality models (Wang *et al.* 2008). Simons (1973) made the first attempt to develop three-dimensional numerical models of the Great Lakes and investigated the effects of lake topography on the hydrodynamics. Murthy *et al.* (1986) simulated the transport and compartmental distribution of chlorinated benzenes in the Niagara River bar area using a two-dimensional model that combined coastal physical processes with a chemical partitioning sub-model. Water quality models can be used to characterize temporal and spatial changes in water quality and thus provide an important tool for managing water pollution (Rasmussen *et al.* 2009). Since the development of the Streeter–Phelps model in 1925, various water quality models have been developed to describe the quality of surface water, groundwater, nonpoint-source water and drinking water; the most important include the QUAL model, the Branched Lagrangian Transport Model (BLTM), the One-Dimensional Transport with Inflow and Storage (OTIS) model, the Water Quality Analysis Simulation Program (WASP), Quality Simulation Along River Systems (QUASAR), the Environmental Fluid Dynamics Code (EFDC), the MIKE models, PROTEUS, AQUATOX, the Surface Water Modeling System (SMS), and the Computational Aquatic Ecosystem Dynamics Model (CAEDYM) (Zheng *et al.* 2004; Itoh *et al.* 2018). However, each hydrodynamic and water quality model has inherent limitations that may restrict its practical use. Moreover, because water quality models are driven by the simulated flow field, improving the simulation accuracy of hydrodynamic models is likely to improve that of water quality models substantially (James 2016; Zhang *et al.* 2017).

In recent years, global climate change and anthropogenic activities have substantially altered the hydrodynamics and water quality of the Hongze Lake (Jin *et al.* 2007; Dietzel & Reichert 2014). The situation is further complicated by the lake's role in water storage and regulation on the East Route of the South-to-North Water Transfer Project, which can significantly affect its hydrodynamic characteristics. In this study, the ensemble Kalman filter (EnKF) is used to assimilate observations into the proposed hydrodynamic and water quality model to improve the simulation accuracy of the water level, flow field and pollutant diffusion in the Hongze Lake.

## STUDY AREA

The Hongze Lake (33°06′–33°40′N, 118°10′–118°52′E) is located between Huai'an City and Suqian City in Jiangsu Province, China (Figure 1). It has a humid monsoon climate with a mean annual temperature of 10–16 °C and a mean annual rainfall of 959 mm. It is the fourth largest freshwater lake in China (Huang *et al.* 2010), the largest plain reservoir in the Huaihe River basin, and a key regulating reservoir for the East Route of the South-to-North Water Diversion Project (SNWDP). The lake therefore serves multiple purposes, including flood control, irrigation, water diversion, navigation and aquaculture. Its normal water surface area is 1,597 km^{2} at a water level of 12.5 m in non-flood seasons and 3,500 km^{2} at a water level of 15.5 m in flood seasons.

Rainfall in the rainy season (June to September) accounts for 65.5% of the total annual rainfall. The maximum annual rainfall (1,240.9 mm) was recorded in 1965 and the minimum (532.9 mm) in 1978. The multi-year mean evaporation is 1,593.4 mm, with a monthly mean of 196.5 mm in August and only 34.5 mm in January. The multi-year mean air temperature is 14.8 °C, with a monthly mean of 28.4 °C in August and only 1 °C in January; the maximum and minimum recorded temperatures are 39.8 °C and −16.1 °C, respectively. Water from the Huaihe River accounts for over 70% of the total inflow to the lake. A series of pollution accidents has occurred in the Huaihe River basin since the 1990s, and the major pollutants observed in 2001–2011 include organic matter, phenol and heavy metals such as Cd. According to the plan of the East Route of the SNWDP, the inflow rate of the Hongze Lake is 550 m^{3}/s and the outflow rate is 450 m^{3}/s, raising the storage level from 13.0 m to 13.5 m (an increase of approximately 0.825 billion m^{3}). The storage level should be controlled at 13.31 m in the non-flood season (October to May of the following year) and at 12.31 m in the flood season (June to September).
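The storage figures can be cross-checked with simple arithmetic. The Python sketch below divides the quoted storage increase by the 0.5 m rise in storage level to obtain the implied mean water surface area; it assumes (our simplification, not stated in the source) that the surface area is constant over the 13.0–13.5 m band.

```python
# Consistency check of the storage figures quoted in the text.
# Assumption (ours): the lake surface area is constant over the 13.0-13.5 m band.

storage_increase_m3 = 0.825e9      # approximately 0.825 billion m^3
level_rise_m = 13.5 - 13.0         # storage level raised from 13.0 m to 13.5 m

# Volume divided by level rise gives area in m^2; divide by 1e6 for km^2
implied_area_km2 = storage_increase_m3 / level_rise_m / 1e6
print(f"Implied mean surface area: {implied_area_km2:.0f} km^2")
```

The implied area of about 1,650 km^{2} is close to the surface areas of 1,597 km^{2} (at 12.5 m) and about 1,700 km^{2} (at 13.0 m) reported in this paper, so the quoted storage volume is plausible within the precision of this approximation.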

## DATA AND METHOD

### Data

Meteorological data of eight meteorological stations (Pengbu, Huaiyin, Xuyi, Hongze, Jinhu, Huaian, Gaoyou and Sihong) in 1961–2013 were obtained from China Meteorological Administration; lake water level data of Gaoliangjian, Jiangba, Laozishan and Linhuaitou stations in 1961–2013, runoff data of Jinsuo Town, Sihong, Shuanggou, Mingguang, Xiaoliuxiang, Tuanjie gate, Sanhe gate, Gaoliangjian gate and Er River gate stations in 1961–2013 were collected from the Hydrographic Office (Information Center) of Huaihe River Conservancy Commission; lake water-quality data in 1989–2013 monitored by Laozishan, Sanhe gate and Gaoliangjian gate stations were obtained from Sihong Environmental Protection Bureau. Rainfall and evaporation data of Linhuaitou, Laozishan, Shangju, Gaoliangjian gate, Jinhu and Jinsuo Town stations around the Hongze Lake were collected from Jiangsu Meteorological Bureau.

### Assimilation method and processes

*et al.* 2015). In this study, EnKF is used to assimilate water level and flow velocity observations into the hydrodynamic and water quality model to improve the simulation accuracy for river-connected lakes (Chen & Oliver 2010). Although EnKF relies on the Gaussian assumption, it is also used in practice for nonlinear problems in which this assumption may not hold. Suppose that there exist *N* state variables. A white Gaussian noise is added to the state variable matrix *X*, and an initial ensemble of *M* members that accounts for the initial state error is then generated randomly from *X* using the Monte Carlo method; *t* denotes the assimilation time step and *i* (*i* = 1, 2, …, *M*) indexes the ensemble members. A simulation model with an error covariance matrix at *t* is then established, and the state variables, forcing variables and model parameters are propagated as follows:

$$\hat{x}_{t+1}^{i-} = f\left(x_{t}^{i+},\, u_{t}^{i},\, \theta_{t}^{i}\right) + \omega_{t}^{i}$$

$$u_{t}^{i} = u_{t} + \zeta_{t}^{i}, \qquad \zeta_{t}^{i} \sim N\left(0, \Sigma_{t}^{u}\right)$$

$$\theta_{t}^{i,j} = \theta_{t}^{j} + \tau_{t}^{i,j}, \qquad \tau_{t}^{i,j} \sim N\left(0, \Sigma_{t}^{\theta}\right)$$

where $\hat{x}_{t+1}^{i-}$ is the predicted value of the state variables of member *i* at *t* + 1; $f(\cdot)$ is the model operator; $x_{t}^{i+}$ is the value of the state variables of member *i* at *t*; $u_{t}^{i}$ is the forcing variable at *t* (i.e., inflow runoff, outflow runoff, wind field and initial lake water level), generated by adding a white Gaussian noise $\zeta_{t}^{i}$ to the initial $u_{t}$; $\theta_{t}^{i}$ is the vector of model parameters of member *i*, whose *j*th element $\theta_{t}^{i,j}$ is generated by adding a white Gaussian noise $\tau_{t}^{i,j}$ to the initial $\theta_{t}^{j}$; and $\omega_{t}^{i}$ is the error at *t* that accounts for both imperfections in the model formulation and stochastic variability in the forcing variables and parameters, whose variance is approximately 10% of the simulated values of the state variables (Sakov *et al.* 2010). The index *j* runs over the model parameters.

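Let the observation at *t* + 1 be $y_{t+1}$. The estimates of the state variables and model parameters are then updated with the observations at *t* + 1:

$$x_{t+1}^{i+} = \hat{x}_{t+1}^{i-} + K_{t+1}^{x}\left[y_{t+1} + v_{t+1}^{i} - h\left(\hat{x}_{t+1}^{i-}\right)\right]$$

$$\theta_{t+1}^{i+} = \theta_{t+1}^{i-} + K_{t+1}^{\theta}\left[y_{t+1} + v_{t+1}^{i} - h\left(\hat{x}_{t+1}^{i-}\right)\right]$$

where $K_{t+1}^{x}$ and $K_{t+1}^{\theta}$ are the Kalman gain matrices used to update the state variables and the model parameters, respectively; $y_{t+1}$ is the observation at *t* + 1; $v_{t+1}^{i}$ is the observation error, $v_{t+1}^{i} \sim N\left(0, R_{t+1}\right)$; $h(\cdot)$ is the observation operator that maps the state vector to the observation space; $\theta_{t+1}^{i+}$ is the analysis value of model parameter *j* at *t* + 1; and $\theta_{t+1}^{i-}$ is the predicted value of model parameter *j* carried forward from *t*.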

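The Kalman gain matrices are computed from the forecast ensemble:

$$K_{t+1}^{x} = P_{t+1}^{-}H^{\mathrm{T}}\left(HP_{t+1}^{-}H^{\mathrm{T}} + R_{t+1}\right)^{-1} \approx C_{t+1}^{xy}\left(C_{t+1}^{yy} + R_{t+1}\right)^{-1}$$

$$K_{t+1}^{\theta} = C_{t+1}^{\theta y}\left(C_{t+1}^{yy} + R_{t+1}\right)^{-1}$$

$$C_{t+1}^{\theta y} = \frac{1}{M-1}\sum_{i=1}^{M}\left(\theta_{t+1}^{i-} - \bar{\theta}_{t+1}^{-}\right)\left(\hat{y}_{t+1}^{i} - \bar{y}_{t+1}\right)^{\mathrm{T}}, \qquad C_{t+1}^{yy} = \frac{1}{M-1}\sum_{i=1}^{M}\left(\hat{y}_{t+1}^{i} - \bar{y}_{t+1}\right)\left(\hat{y}_{t+1}^{i} - \bar{y}_{t+1}\right)^{\mathrm{T}}$$

$$\bar{x}_{t+1}^{-} = \frac{1}{M}\sum_{i=1}^{M}\hat{x}_{t+1}^{i-}$$

where *H* is the matrix of the observation operator $h(\cdot)$; $R_{t+1}$ is the observation error covariance; $P_{t+1}^{-}$ is the forecast error covariance of $\hat{x}_{t+1}^{i-}$ at *t* + 1; $\hat{y}_{t+1}^{i} = h\left(\hat{x}_{t+1}^{i-}\right)$ is the model output variable of member *i* and $\bar{y}_{t+1}$ its ensemble mean; $C_{t+1}^{\theta y}$ is the forecast error cross-covariance of $\theta_{t+1}^{i-}$ and the model output variable at *t* + 1; $C_{t+1}^{yy}$ is the forecast error covariance of the model output variable at *t* + 1; $C_{t+1}^{xy}$ is the corresponding cross-covariance of the state variables and the model output, defined in the same way; and $\bar{x}_{t+1}^{-}$ is the average forecast of the state variables at *t* + 1. To avoid the curse of dimensionality of the state variables, the Kalman gain of the state variables is obtained by calculating $P_{t+1}^{-}H^{\mathrm{T}}$ and $HP_{t+1}^{-}H^{\mathrm{T}}$ directly from the ensemble (as $C_{t+1}^{xy}$ and $C_{t+1}^{yy}$), without calculating $P_{t+1}^{-}$ itself.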

… of model parameter *j* at *t* and the analysis value at *t* + 1, respectively.
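The analysis step described above can be illustrated with a minimal NumPy sketch of a perturbed-observation EnKF in which the model parameters are updated alongside the states through their cross-covariance with the predicted observations. It is not the authors' implementation: the function name `enkf_analysis`, the diagonal observation-error matrix and the toy linear "model" in the demo (a water level that responds linearly to roughness) are all hypothetical, chosen only so that the update is easy to check. The gains are formed from ensemble covariances, so the full forecast covariance is never assembled.

```python
import numpy as np


def enkf_analysis(x_f, theta_f, y_obs, obs_std, h, rng):
    """Perturbed-observation EnKF analysis step for states and parameters.

    x_f     : (M, N) forecast ensemble of state variables
    theta_f : (M, P) forecast ensemble of model parameters
    y_obs   : (K,)   observations at t+1
    obs_std : (K,)   observation error standard deviations (diagonal R assumed)
    h       : observation operator mapping one state vector (N,) to (K,)
    """
    M = x_f.shape[0]
    y_f = np.array([h(x) for x in x_f])            # predicted observations, (M, K)
    # Anomalies with respect to the ensemble means
    dx = x_f - x_f.mean(axis=0)
    dth = theta_f - theta_f.mean(axis=0)
    dy = y_f - y_f.mean(axis=0)
    # Ensemble covariances: C^{xy} (~ P H^T), C^{theta y}, C^{yy} (~ H P H^T)
    c_xy = dx.T @ dy / (M - 1)
    c_thy = dth.T @ dy / (M - 1)
    c_yy = dy.T @ dy / (M - 1)
    S = c_yy + np.diag(obs_std ** 2)                # C^{yy} + R
    # K = C S^{-1}; solve a linear system instead of inverting (S is symmetric)
    K_x = np.linalg.solve(S, c_xy.T).T              # (N, K)
    K_th = np.linalg.solve(S, c_thy.T).T            # (P, K)
    # Perturbed observations y + v_i with v_i ~ N(0, R)
    y_pert = y_obs + rng.normal(0.0, obs_std, size=(M, y_obs.size))
    innov = y_pert - y_f                            # innovations, (M, K)
    return x_f + innov @ K_x.T, theta_f + innov @ K_th.T


# Hypothetical toy problem: one state (water level, m) that responds linearly
# to one parameter (roughness); a roughness of 0.015 gives a level of 13.5 m.
rng = np.random.default_rng(42)
M = 120
theta_f = rng.normal(0.0135, 0.0015, size=(M, 1))
x_f = 12.0 + 100.0 * theta_f + rng.normal(0.0, 0.01, size=(M, 1))
x_a, theta_a = enkf_analysis(x_f, theta_f, np.array([13.5]), np.array([0.01]),
                             lambda x: x, rng)
print(theta_f.mean(), theta_a.mean())  # the mean roughness moves toward 0.015
```

With these toy numbers, the ensemble-mean roughness starts near 0.0135 and is pulled toward 0.015, the value consistent with the observed level, because the parameter is strongly correlated with the predicted observation across the ensemble.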

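The root mean square error (*RMSE*) and the index of agreement (*IOA*) are used to evaluate the data assimilation results:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - O_i\right)^2}$$

$$IOA = 1 - \frac{\sum_{i=1}^{n}\left(S_i - O_i\right)^2}{\sum_{i=1}^{n}\left(\left|S_i - \bar{O}\right| + \left|O_i - \bar{O}\right|\right)^2}$$

where $S_i$ is the simulated value, $O_i$ is the measured value, $\bar{O}$ is the mean of the measured values, and *n* is the total number of data points. A smaller *RMSE* and a larger *IOA* indicate better assimilation performance.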
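For concreteness, the two metrics can be computed as in the following Python sketch, which assumes Willmott's form of the index of agreement written above; the sample values are hypothetical and are not Hongze Lake data.

```python
import numpy as np


def rmse(sim, obs):
    """Root mean square error between simulated (S_i) and observed (O_i) values."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


def ioa(sim, obs):
    """Willmott's index of agreement: 1 is a perfect match, 0 means no skill."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    obar = obs.mean()
    num = np.sum((sim - obs) ** 2)
    den = np.sum((np.abs(sim - obar) + np.abs(obs - obar)) ** 2)
    return float(1.0 - num / den)


# Hypothetical water levels (m), not data from the Hongze Lake
obs = [13.0, 13.2, 13.1, 12.9]
sim = [13.1, 13.2, 13.0, 12.9]
print(rmse(sim, obs), ioa(sim, obs))
```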

## HYDRODYNAMIC AND WATER QUALITY MODELS OF RIVER-CONNECTED LAKES

The Hongze Lake is a typical inland shallow lake with a horizontal scale much larger than the vertical scale (Li *et al.* 2020). It is connected to several large rivers (i.e., Huai River, Yangtze River and Yellow River) and numerous small rivers and streams, and it is responsible for water storage and supply for the East Route of the South-to-North Water Transfer Project. The lake area can be strongly affected by the Southeast Asian monsoon. Thus, the water system of the Hongze Lake is more than an ordinary plain river network; it is also regulated by anthropogenic activities and requirements of the water transfer project, making it difficult to model the hydrodynamics and water quality in the Hongze Lake. In order to better address this problem, a two-dimensional hydrodynamic and water quality model was proposed in this study based on the sensitivity analysis and identification of key hydrodynamic parameters.

### Hydrodynamic and water quality modeling

*et al.* 2010). Clearly, the simulation accuracy of the water quality model depends heavily on that of the hydrodynamic model. For a shallow lake such as the Hongze Lake, the hydrodynamics change little in the vertical direction, so the pressure is assumed to follow a hydrostatic distribution with depth (Missaghi & Hondzo 2010). The three-dimensional hydrodynamic equations can thus be simplified into the two-dimensional depth-averaged hydrodynamic (continuity and momentum) equations, in which *ζ* is the water surface elevation above the datum (m); *h* is the water depth below the datum (m); *u* and *v* are the depth-averaged flow velocities along the *x* and *y* axes (m/s); *f* is the Coriolis coefficient (*f* = 2*ω* sin *φ*, where *ω* is the angular velocity of the Earth's rotation and *φ* is the latitude); *f*_{W} is the wind resistance coefficient; *C* is the Chezy coefficient (*C* = (*ζ* + *h*)^{1/6}/*n*, in which *n* is the roughness coefficient); *A*_{x} and *A*_{y} are the eddy viscosities along the *x* and *y* axes; *p* is the static pressure (Pa); *t* is time (s); *W* is the wind speed 10 m above the lake surface, and *W*_{x} and *W*_{y} are its components along the *x* and *y* axes (m/s); *g* is the gravitational acceleration (m/s^{2}); *τ*_{xx}, *τ*_{yx} and *τ*_{xy} are the components of the horizontal shear stress in the *x*–*y* plane; and *C*_{d} is the wind drag coefficient.
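Several of the coefficients defined above can be evaluated directly. The Python sketch below computes the Coriolis coefficient, the Chezy coefficient from the Manning-type relation given above, and a bed-friction term. The friction term uses the standard quadratic form $g u \sqrt{u^2+v^2}/(C^2 H)$ with $H = \zeta + h$, which is our assumption because the momentum equations are not reproduced here; the Earth's angular velocity (7.2921 × 10^{−5} rad/s) is the standard value, and the latitude of 33.4°N is simply a mid-lake value taken from the coordinates given in the Study Area section.

```python
import math

OMEGA = 7.2921e-5  # angular velocity of the Earth's rotation (rad/s)


def coriolis(lat_deg):
    """Coriolis coefficient f = 2 * omega * sin(phi), in 1/s."""
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))


def chezy(n, zeta, h):
    """Chezy coefficient C = (zeta + h)**(1/6) / n, with zeta + h the total depth (m)."""
    return (zeta + h) ** (1.0 / 6.0) / n


def bed_friction_x(u, v, n, zeta, h, g=9.81):
    """Quadratic bed-friction term g*u*|U| / (C**2 * H) of the x-momentum equation.
    Assumed standard form; the paper's momentum equations are not reproduced."""
    depth = zeta + h
    c = chezy(n, zeta, h)
    return g * u * math.hypot(u, v) / (c ** 2 * depth)


# Illustrative values: mid-lake latitude and the calibrated roughness of 0.015
print(coriolis(33.4))                              # roughly 8.0e-05 1/s
print(chezy(0.015, 0.0, 1.0))                      # C = 1/n (about 66.7) for a 1 m column
print(bed_friction_x(0.1, 0.0, 0.015, 0.5, 1.5))   # 2 m deep water, u = 0.1 m/s
```

Since $C^2 H = H^{4/3}/n^2$, the friction term scales as $g n^2 u|U|/H^{4/3}$ and grows quickly as the water becomes shallower, which is consistent with the dominant sensitivity of the roughness coefficient reported in the sensitivity analysis.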

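The transport of pollutants is described by the two-dimensional depth-averaged advection–diffusion equation:

$$\frac{\partial (Hp)}{\partial t} + \frac{\partial (Hup)}{\partial x} + \frac{\partial (Hvp)}{\partial y} = \frac{\partial}{\partial x}\left(HE_x\frac{\partial p}{\partial x}\right) + \frac{\partial}{\partial y}\left(HE_y\frac{\partial p}{\partial y}\right) - kHp + S$$

where *H* = *ζ* + *h* is the total water depth (m); *p* is the concentration of a given pollutant (mg/L); *E*_{x} and *E*_{y} are the diffusion coefficients along the *x* and *y* axes (m^{2}/s); *k* is the degradation coefficient (s^{−1}); and *S* is the pollutant source term (g/(m^{2}·s)). The first term on the left-hand side is the time-derivative term, and the second and third terms represent convection along the *x* and *y* axes. The first and second terms on the right-hand side represent diffusion along the *x* and *y* axes, and the third term represents the biochemical reaction, which can be regarded as the total derivative of the ecological variables with respect to time.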
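The behaviour of the transport equation, in particular the difference between a conservative tracer and a degradable pollutant such as COD_{Mn}, can be illustrated with a deliberately simple explicit finite-difference sketch. This is not the paper's numerical scheme: it assumes constant depth (so *H* cancels), uniform velocity, periodic boundaries, no source term, first-order upwind advection and central diffusion. The diffusion coefficient of 1.4 m^{2}/s follows the calibrated value reported in this paper, while the velocities, the instantaneous release and the decay rate *k* are hypothetical.

```python
import numpy as np


def step_adv_diff_decay(p, u, v, ex, ey, k, dx, dy, dt):
    """One explicit step of
        dp/dt + u dp/dx + v dp/dy = ex d2p/dx2 + ey d2p/dy2 - k p
    on a periodic grid (axis 0 = y, axis 1 = x) with uniform u, v and constant
    depth. First-order upwind advection, central diffusion, first-order decay."""
    # Upwind first derivatives: backward difference for positive velocity
    if u >= 0:
        dpdx = (p - np.roll(p, 1, axis=1)) / dx
    else:
        dpdx = (np.roll(p, -1, axis=1) - p) / dx
    if v >= 0:
        dpdy = (p - np.roll(p, 1, axis=0)) / dy
    else:
        dpdy = (np.roll(p, -1, axis=0) - p) / dy
    # Central second derivatives for the diffusion terms
    d2x = (np.roll(p, -1, axis=1) - 2.0 * p + np.roll(p, 1, axis=1)) / dx ** 2
    d2y = (np.roll(p, -1, axis=0) - 2.0 * p + np.roll(p, 1, axis=0)) / dy ** 2
    return p + dt * (-u * dpdx - v * dpdy + ex * d2x + ey * d2y - k * p)


# Hypothetical demo: 5 km x 5 km periodic box, 100 m cells, 48 h with dt = 60 s
dx = dy = 100.0
dt, n_steps = 60.0, 48 * 60
k = 1.0e-6                          # hypothetical decay rate (1/s)
p0 = np.zeros((50, 50))
p0[25, 5] = 1.0                     # instantaneous 1.0 mg/L release in one cell
tracer, degradable = p0.copy(), p0.copy()
for _ in range(n_steps):
    tracer = step_adv_diff_decay(tracer, 0.02, 0.01, 1.4, 1.4, 0.0, dx, dy, dt)
    degradable = step_adv_diff_decay(degradable, 0.02, 0.01, 1.4, 1.4, k, dx, dy, dt)
print(tracer.sum(), degradable.sum() / tracer.sum())
```

Because advection and diffusion only redistribute mass on a periodic grid, the tracer mass is conserved, while the degradable mass is multiplied by exactly (1 − *k*Δ*t*) per step, so the printed ratio is (1 − *k*Δ*t*)^{2880}, about 0.84 for these illustrative values.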

The irregular physical domain (*Ω*) in the Cartesian coordinate system can be converted into a rectangular computational domain (*Ω*′) in a new *ξ*–*η* coordinate system. The water flow is then described in the orthogonal *ξ*–*η* coordinate system, with the equations expressed in terms of the side lengths of the orthogonal grid cells along the *ξ* and *η* directions.

### Boundary conditions

Considering the spatial distribution of the hydrometric, water level and water quality monitoring stations over the Hongze Lake, the whole lake area is included in the simulation of hydrodynamics and water quality. The lake has a total area of about 1,700 km^{2} at a water level of 13.0 m, with a maximum length of 61 km from north to south and a maximum width of 59 km from east to west. The lake is fed mainly by the Huai River (including the Chi River), New Huaihong River, Old Sui River, New Sui River, New Bian River and Old Bian River, and is drained mainly by the Sanhe, the Erhe, the Xuhong River, the Northern Jiangsu Main Irrigation Canal and the Chenzi River. The remaining 322 minor rivers and streams are not considered because their inflow and outflow are negligible (Osti & Egashira 2009).

In modeling river-connected lakes, flow rate processes, water level processes or both are taken as boundary conditions. Considering the complex water system of the Hongze Lake and the requirements of the East Route of the South-to-North Water Transfer Project, the runoff measured at the Huai River (including the Chi River), New Huaihong River, New Bian River, Bian River, Sui River and Sanhe (Hongze Station) is used as the inflow (upper) boundary condition, and the water levels measured at the Sanhe, the Northern Jiangsu Main Irrigation Canal, the Erhe, the Xuhong River and the Chenzi River are used as the outflow (lower) boundary condition. A no-slip boundary condition is applied along the lakeshore, where the flow velocity is set to 0. The inflow from and outflow to the East Route of the South-to-North Water Transfer Project are determined by the design requirements of the project. The permanganate index (COD_{Mn}) is used to indicate the water pollution of the Hongze Lake; the concentrations monitored at Laozishan, middle Chenhe Town, Linhuai, northern Longji Town, the Sui River entrance, eastern Chenhe Town, western Chenhe Town and northern Chenhe Town are taken as the upper boundary conditions of the water quality model, and those at the Sanhe sluice, Gaoliangjian sluice and Erhe sluice as the lower boundary conditions.

The spatial step is set to 300–500 m in the confluence area and along the lakeside, where the topography is relatively flat, and to 100–200 m at the estuary of the Huai River and around the Sanhe sluice, where the topography is more rugged. The lake is thus divided into a 243 × 198 grid with a total of 7,060 grid nodes. To ensure numerical stability and accuracy, the computational time step is set to Δ*t* = 60 s.

### Identification of key hydrodynamic parameters

The standardized regression coefficient (*SRC*) and the coefficient of determination (*R*^{2}) are calculated to determine the sensitivity of each parameter. The *SRC* indicates the contribution of each parameter to the variation of the output variable: the higher the absolute value of *SRC*, the more sensitive the output is to that parameter. The *R*^{2} indicates the proportion of the variability in the output variable explained by the regression model: the higher the *R*^{2}, the better the model fits the data. In general, *R*^{2} > 0.7 indicates that the regression results are reliable. The stepwise rank regression model can be described as follows:

$$\hat{y}_i = \alpha_0 + \sum_{j=1}^{m} b_j x_{ij}$$

where *i* = 1, 2, 3, …, *n*; *j* = 1, 2, 3, …, *m*; *j* is the parameter index and *m* is the number of parameters (*m* = 4); *n* is the sample size; *α*_{0} is the intercept; *b*_{j} is the regression coefficient of parameter *j*; $\hat{y}_i$ is the estimate of the output variable for sample *i*; and $x_{ij}$ is the value of input parameter *j* for sample *i* (in rank regression, $x_{ij}$ and $y_i$ are replaced by their ranks).

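Because the input parameters have different units and ranges, the regression coefficients *b*_{j} cannot be compared directly. The regression equation is therefore converted into its standardized form:

$$\frac{\hat{y}_i - \bar{y}}{s_y} = \sum_{j=1}^{m} \frac{b_j s_j}{s_y}\,\frac{x_{ij} - \bar{x}_j}{s_j}, \qquad SRC_j = \frac{b_j s_j}{s_y}$$

where $\bar{x}_j$ is the mean of input parameter $x_j$ and $s_j$ is its standard deviation; $\bar{y}$ is the mean of the output variable *y* and $s_y$ is its standard deviation; and $SRC_j$ is the standardized regression coefficient of parameter *j*, which estimates the contribution of that parameter to the variation of the output variable. The larger the absolute value of $SRC_j$, the larger the contribution.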

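The absolute value of *SRC* is used to indicate the sensitivity of each parameter. The total variance of the output variable is decomposed into the part explained by the regression and the residual part, and *R*^{2} is the explained fraction:

$$\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 = \sum_{i=1}^{n}\left(\hat{y}_i - \bar{y}\right)^2 + \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad R^2 = \frac{\sum_{i=1}^{n}\left(\hat{y}_i - \bar{y}\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$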
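The regression and standardization steps can be sketched in a few lines of NumPy. The sketch below computes standardized rank regression coefficients and *R*^{2} from a full least-squares fit on ranks; it omits the stepwise variable selection, and the synthetic inputs (in which the third parameter has no effect) are hypothetical.

```python
import numpy as np


def ranks(a):
    """Ranks 1..n of a 1-D array (ties are ignored; they are negligible for
    continuous sampled values)."""
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(1, len(a) + 1)
    return r


def srrc(X, y):
    """Standardized rank regression coefficients SRC_j = b_j * s_j / s_y and R^2.

    X : (n, m) sampled parameter values; y : (n,) model output.
    Full least-squares fit on ranks (no stepwise selection)."""
    xr = np.column_stack([ranks(col) for col in X.T])
    yr = ranks(y)
    design = np.column_stack([np.ones(len(yr)), xr])    # columns: 1, x_i1, ..., x_im
    coef, *_ = np.linalg.lstsq(design, yr, rcond=None)  # coef[0] = alpha_0, coef[1:] = b_j
    src = coef[1:] * xr.std(axis=0, ddof=1) / yr.std(ddof=1)
    y_hat = design @ coef
    r2 = np.sum((y_hat - yr.mean()) ** 2) / np.sum((yr - yr.mean()) ** 2)
    return src, r2


# Hypothetical example: output driven mainly by parameter 1, weakly by 2, not by 3
rng = np.random.default_rng(3)
X = rng.uniform(size=(200, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0.0, 0.05, 200)
src, r2 = srrc(X, y)
print(np.round(src, 2), round(r2, 3))
```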

In LHS, the range of each parameter is divided into *N* intervals of equal probability, and a random value is then sampled from each interval. The random value *Q*_{h} in the *h*th interval can be described as follows:

$$Q_h = \frac{h - 1 + Q}{N}$$

where *h* = 1, 2, …, *N*; *Q*_{h} is a random value greater than (*h* − 1)/*N* but lower than *h*/*N*; and *Q* is a random value uniformly distributed in the range [0, 1].
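The stratified sampling rule $Q_h = (h - 1 + Q)/N$ translates directly into code. The sketch below draws one value per equal-probability interval for each parameter and shuffles the intervals independently per parameter so that the columns are randomly paired; uniform marginals are assumed, and the parameter ranges in the demo are hypothetical rather than those used in the paper.

```python
import numpy as np


def latin_hypercube(n_samples, bounds, rng):
    """Latin hypercube sample with uniform marginals.

    For each parameter, the h-th value is Q_h = (h - 1 + Q) / N with Q ~ U[0, 1),
    i.e. one value per equal-probability interval; the intervals are then shuffled
    independently per parameter. bounds: list of (low, high) pairs."""
    out = np.empty((n_samples, len(bounds)))
    h = np.arange(1, n_samples + 1)                  # interval index h = 1..N
    for j, (low, high) in enumerate(bounds):
        q = (h - 1 + rng.uniform(0.0, 1.0, n_samples)) / n_samples
        rng.shuffle(q)                               # random pairing across parameters
        out[:, j] = low + q * (high - low)           # map [0, 1) onto the parameter range
    return out


rng = np.random.default_rng(7)
# Hypothetical ranges for n, C_d and f_W (illustrative only, not from the paper)
samples = latin_hypercube(200, [(0.010, 0.030), (1.0e-6, 5.0e-6), (0.5, 1.5)], rng)
# On the unit square, each of the 200 intervals should contain exactly one value
unit = latin_hypercube(200, [(0.0, 1.0), (0.0, 1.0)], rng)
print(samples.shape, np.sort(np.floor(unit[:, 0] * 200)).astype(int)[:5])
```

The sample size of 200 matches the value selected in the sensitivity analysis; stratification guarantees that every part of each parameter range is visited, which is why LHS typically needs fewer model runs than simple random sampling for the same coverage.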

Parameters that affect the simulation accuracy of the two-dimensional hydrodynamic model include the wet–dry boundary, time step, eddy viscosity, bottom roughness coefficient, water temperature, salinity, extinction coefficient, critical wind velocity, wind resistance coefficient and wind drag coefficient. In this study, the sensitivity of these parameters is analyzed using standard stepwise rank regression and LHS. Sample sizes of 50, 100, 200, 300, 400 and 500 are tested for each variable, and a sample size of 200 is found to be sufficient to obtain stable results; 200 samples are therefore used. When the flow velocity is taken as the output variable, all *R*^{2} values are higher than 0.7 and 94% of them are higher than 0.9, indicating reliable regression results. The sensitivity of the hydrodynamic parameters to flow velocity follows the order of roughness coefficient (46.15%), wind drag coefficient (31.02%), wind resistance coefficient (12.46%), eddy viscosity (2.15%), critical wind velocity (1.92%), time step (1.73%), wet–dry boundary (1.56%), water temperature (1.22%), salinity (1.12%) and extinction coefficient (0.67%), indicating that changes in surface flow velocity are mainly determined by the lake bottom topography and the wind field. When the water level is taken as the output variable, all *R*^{2} values are higher than 0.7 and 90% of them are higher than 0.9, again indicating reliable regression results. The sensitivity of the parameters follows the order of roughness coefficient (64.13%), wind drag coefficient (17.76%), wind resistance coefficient (11.38%), eddy viscosity (2.15%), wet–dry boundary (1.32%), critical wind velocity (1.08%), time step (0.97%), water temperature (0.83%), salinity (0.26%) and extinction coefficient (0.12%), indicating that changes in water level are mainly determined by the lake bottom topography.
Therefore, the roughness coefficient (*n*), wind drag coefficient (*C*_{d}) and wind resistance coefficient (*f*_{W}), which together account for 89.63% of the sensitivity of flow velocity and 93.27% of that of water level, are identified as the most important parameters affecting lake hydrodynamics and are calibrated, while the remaining parameters of minor importance are determined from empirical formulas.

### Parameter calibration and model validation

#### Parameter calibration and validation of the hydrodynamic model

The daily water levels measured at four stations (Gaoliangjian, Jiangba, Laozishan and Linhuai) for the period 1961–1993 are used for parameter calibration. Optimal results are obtained with a roughness coefficient of 0.015, a wind drag coefficient of 2.93 × 10^{−6} and a wind resistance coefficient of 1.1. The daily water levels measured at the same four stations for the period 1994–2013 are used to verify the hydrodynamic model, and those measured at the Shangzui Station for the period 1967–2013 are used for additional verification. The relative error (*RE*) and the Nash–Sutcliffe efficiency coefficient (*E*_{ns}) are used to evaluate the simulation accuracy of the hydrodynamic model. Table 1 shows that all *RE* values are within ±8% during the calibration period and within ±7% during the verification period, and that all *E*_{ns} values are at least 0.69 and 0.72 during the two periods, respectively. The simulation error of the hydrodynamic model is therefore within an acceptable range, and the model can reliably simulate the water levels of the Hongze Lake.

| Station | Calibration (1961–1993) RE (%) | Calibration (1961–1993) E_{ns} | Verification (1994–2013) RE (%) | Verification (1994–2013) E_{ns} |
|---|---|---|---|---|
| Gaoliangjian | −6.56 | 0.70 | 6.98 | 0.76 |
| Jiangba | 4.23 | 0.76 | 4.89 | 0.80 |
| Laozishan | 5.87 | 0.69 | −5.57 | 0.72 |
| Linhuaitou | 7.15 | 0.70 | 6.45 | 0.74 |
| Shangzui (verification only, 1967–2013) | – | – | −5.13 | 0.83 |
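The two goodness-of-fit measures used in Table 1 can be computed as follows. The Nash–Sutcliffe efficiency uses its standard definition; the paper does not define *RE*, so the sketch assumes it is the relative error of the simulated mean with respect to the observed mean, expressed in percent. The water levels in the demo are hypothetical.

```python
import numpy as np


def relative_error(sim, obs):
    """Relative error (%) of the simulated mean against the observed mean.
    Assumed definition: RE = (mean(S) - mean(O)) / mean(O) * 100."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float((sim.mean() - obs.mean()) / obs.mean() * 100.0)


def nash_sutcliffe(sim, obs):
    """Nash-Sutcliffe efficiency E_ns = 1 - sum((S - O)^2) / sum((O - mean(O))^2)."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))


# Hypothetical daily water levels (m), not data from the Hongze Lake
obs = [12.0, 12.5, 13.0, 13.5]
sim = [12.1, 12.4, 13.1, 13.4]
print(relative_error(sim, obs), nash_sutcliffe(sim, obs))
```

An *E*_{ns} of 1 indicates a perfect fit and 0 means the model is no better than the observed mean, so the values of 0.69–0.83 in Table 1 indicate that the model captures most of the observed water level variability.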


#### Parameter calibration and validation of the water quality model

Parameters considered in the water quality model include the longitudinal diffusion coefficient, transverse diffusion coefficient, degradation coefficient of COD_{Mn}, dry depth and wet depth. The COD_{Mn} concentrations measured at Laozishan, Sanhe sluice and Gaoliangjian sluice for the period 1989–2013 are used for model calibration, and the best fit is obtained with a COD_{Mn} degradation coefficient of 0.06105, longitudinal and transverse diffusion coefficients of 1.4 m^{2}/s each, a dry depth of 0.006 m and a wet depth of 0.098 m.

The accuracy and reliability of the proposed hydrodynamic and water quality model are verified under the four conditions listed in Table 2. Ammonium molybdate (T4) at 1.0 mg/L is used as a chemical tracer in experiments a, c and d, while COD_{Mn} of the same concentration is used in experiment b. The background concentration is set to 0.0 mg/L, the time step for the water quality simulation is 2 min, the simulation period is 48 h, and the maximum runoff is set to 3,200 m^{3}/s, corresponding to a pollution accident on the Huai River in July 2014.

| Experiment | Pollutant | Pollutant source | Discharge mode | Flow rate (m^{3}/s) |
|---|---|---|---|---|
| a | T4 | Huai River | Continuous | 3,200 |
| b | COD_{Mn} | Huai River | Continuous | 3,200 |
| c | T4 | New Huaihong River | Continuous | 1,800 |
| d | T4 | Linhuai center | Continuous | 25 |


The simulations assume no wind and that the water in the Hongze Lake is transferred and regulated under normal operating conditions. In experiment a, pollutants originating from the Huai River have spread to Linhuaitou 48 h after discharge, and the Laozishan area is completely polluted. In experiment b, the concentration and extent of COD_{Mn} are significantly reduced compared with those of the tracer in experiment a because COD_{Mn} degrades in water, reflecting the self-purification capacity of the Hongze Lake. In experiment c, the Wahu section of the Li River is completely polluted 48 h after pollutants are discharged into the New Huaihong River, and the northwest corner of Laozishan and the southwest corner of Linhuaitou are partially polluted. In experiment d, pollutants discharged at central Linhuaitou spread to the surrounding area in concentric circles, but the polluted area is still small after 48 h. This pattern resembles diffusion in still water and can be attributed to the low flow velocity and the relatively flat bed in the central part of the Hongze Lake. The simulated spreading of pollutants is thus consistent with theoretical expectations, supporting the reliability of the model.

The COD_{Mn} concentrations measured in July 2013 are also used to verify the water quality model. The relative error between the simulated and measured COD_{Mn} concentrations ranges from 4.27% to 12.34%. Changes in COD_{Mn} concentration are observed at the estuaries of the Huaihe River and the New Huaihong River, and the simulation accuracy is high in all central areas. The water quality model can therefore be used to simulate and infer the diffusion of COD_{Mn} in the Hongze Lake.

## RESULTS AND DISCUSSION

### Effects of assimilation on the hydrodynamic model

Given the interdependence of the parameters in the hydrodynamic model, only the roughness coefficient and the wind drag coefficient are considered here. The daily flow velocities measured at Sanhe, Erhe and Gaoliangjian and the daily water levels measured at Gaoliangjian, Jiangba, Laozishan and Linhuai for the period 1988–2013 are assimilated into the hydrodynamic model. The flow velocity and water level data for 1961–1987 are first used until the model reaches a steady state, and the data for 1988–2013 are then assimilated. The calibrated roughness coefficient (0.015) and wind drag coefficient (2.93 × 10^{−6}) are taken as the true values. The ensemble size is set to 120 and the error to 10%, so the initial roughness coefficient is 0.0135 and the initial wind drag coefficient is 2.637 × 10^{−6} (i.e., 10% below the true values). Three assimilation conditions are considered: (1) only the roughness coefficient is estimated, with an initial value of 0.0135 and the wind drag coefficient fixed at 2.93 × 10^{−6}; (2) only the wind drag coefficient is estimated, with an initial value of 2.637 × 10^{−6} and the roughness coefficient fixed at 0.015; and (3) both parameters are estimated, with initial values of 0.0135 and 2.637 × 10^{−6}. The assimilation results are shown in Table 3 and Figure 2.

| Scheme | Parameter | Quantity | Observed flow velocity | Observed water level | Observed flow velocity and water level |
|---|---|---|---|---|---|
| State variables | Roughness coefficient | Value | 0.01535 | 0.01501 | 0.01502 |
| | | Error (%) | 2.33 | 0.07 | 0.13 |
| | Wind drag coefficient | Value | 2.79 × 10^{−6} | 2.60 × 10^{−6} | 2.79 × 10^{−6} |
| | | Error (%) | 5.02 | 12.69 | 5.02 |
| Simultaneous | Roughness coefficient | Value | 0.01531 | 0.01518 | 0.01512 |
| | | Error (%) | 2.07 | 0.93 | 0.80 |
| | Wind drag coefficient | Value | 2.87 × 10^{−6} | 2.72 × 10^{−6} | 2.87 × 10^{−6} |
| | | Error (%) | 2.05 | 7.17 | 2.05 |


The results show that assimilating water level observations yields a very small deviation (0.07%) in the roughness coefficient, whereas larger deviations are found when assimilating flow velocity observations (2.33%) or both water level and flow velocity observations (0.13%). The opposite holds for the wind drag coefficient: the deviation is smaller (5.02%) when assimilating flow velocity observations or both water level and flow velocity observations, but much larger (12.69%) when assimilating water level observations alone. Thus, the roughness coefficient has a more significant effect on the water level than on the flow velocity, whereas the wind drag coefficient has a more significant effect on the flow velocity than on the water level, in agreement with the sensitivity analysis. When both parameters are estimated simultaneously, assimilating water level observations alone gives larger deviations than assimilating both types of observations (0.93% vs 0.80% for roughness and 7.17% vs 2.05% for wind drag), and assimilating flow velocity observations alone gives a larger deviation in roughness (2.07% vs 0.80%) and the same deviation in wind drag (2.05%).
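The initial parameter values and the error percentages can be reproduced with elementary arithmetic. Under our reading (not stated explicitly in the source) that the initial values are the assumed true values lowered by the 10% error and that the errors in Table 3 are absolute relative deviations from the true values, the Python sketch below recovers the initial values 0.0135 and 2.637 × 10^{−6} and two of the tabulated errors.

```python
def perturbed_initial(true_value, error=0.10):
    """Initial guess: the assumed true value lowered by the relative error (10%)."""
    return true_value * (1.0 - error)


def percent_error(estimate, true_value):
    """Absolute relative deviation (%) of an estimate from the assumed true value."""
    return abs(estimate - true_value) / true_value * 100.0


n_true, cd_true = 0.015, 2.93e-6     # calibrated values taken as "true"
print(f"{perturbed_initial(n_true):.4g}, {perturbed_initial(cd_true):.4g}")  # 0.0135, 2.637e-06
print(round(percent_error(0.01535, n_true), 2))   # 2.33 (roughness, flow velocity observations)
print(round(percent_error(2.87e-6, cd_true), 2))  # 2.05 (wind drag, simultaneous scheme)
```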

### Assimilation schemes

Assimilating water level and flow velocity observations (without considering vertical variations) into the hydrodynamic model involves multiple state variables and parameters. The daily flow velocity data measured at the Sanhe, Erhe and Gaoliangjian sluices and the daily water level data measured at Gaoliangjian, Jiangba, Laozishan and Linhuai for the period 1988–2013 are used as observations. The flow velocity and water level data for 1961–1987 are first used until the model reaches a steady state, and the data for 1988–2013 are then assimilated. For the water quality model, the COD_{Mn} concentration is taken as the state variable, and its integrated attenuation coefficient and diffusion coefficient are taken as model parameters, so the model involves a single state variable and two parameters. Similarly, the COD_{Mn} concentration data for 1961–1987 are first used until the model reaches a steady state, and the data for 1988–2013 are then assimilated.

The ensemble size can strongly affect the performance of the EnKF. An excessively large ensemble is computationally demanding or even infeasible, whereas a small ensemble increases sampling errors and yields inaccurate results. The EnKF is therefore run with ensemble sizes of 30, 60, 90, 120, 200, 240, 300 and 500, and *RMSE* and *IOA* are used to evaluate the fit between the assimilated state variables and the observations.

For the hydrodynamic model, increasing the ensemble size from 30 to 120 decreases the *RMSE* and increases the *IOA*, but further increases produce no substantial decrease in *RMSE*, i.e., no further improvement in assimilation accuracy. For the water quality model, the *RMSE* remains stable once the ensemble size exceeds 90, at which point the *IOA* also reaches its maximum.
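The diminishing benefit of larger ensembles reflects ordinary Monte Carlo sampling error, which decreases roughly as $1/\sqrt{M}$. The Python sketch below illustrates this generically (it is not a rerun of the Hongze Lake experiments) by measuring how accurately an ensemble of size *M* estimates the variance of a unit Gaussian, compared with the theoretical value $\sqrt{2/(M-1)}$, for the ensemble sizes tested in this study.

```python
import numpy as np


def sample_variance_rmse(m, n_trials, rng):
    """RMS error of the sample variance of a unit-variance Gaussian estimated
    from ensembles of m members, over n_trials independent ensembles."""
    samples = rng.standard_normal((n_trials, m))
    var_hat = samples.var(axis=1, ddof=1)
    return float(np.sqrt(np.mean((var_hat - 1.0) ** 2)))


rng = np.random.default_rng(0)
for m in (30, 60, 90, 120, 200, 240, 300, 500):
    empirical = sample_variance_rmse(m, 2000, rng)
    theory = np.sqrt(2.0 / (m - 1))   # exact RMS error for Gaussian samples
    print(m, round(empirical, 3), round(float(theory), 3))
```

Going from 30 to 120 members roughly halves the sampling error, whereas going from 120 to 500 costs about four times as much computation for a further halving, which is consistent with choosing a moderate ensemble size.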

Both the observation error and the simulation (model) error are set to 1%, 10%, 20% or 30%, giving 16 combinations. Table 4 shows that, for both the hydrodynamic and water quality models, the best assimilation results are obtained with an observation error of 1% and a model error of 10%. Accordingly, to reduce computational time and improve computational stability, the hydrodynamic model uses an ensemble size of 120, observation and model errors of 1% and 10%, and an observational time step of 30 min; the water quality model uses an ensemble size of 90 with the same error settings and observational time step.

Table 4 | Assimilation results under different combinations of model error and observation error

| Model | Model error | Observation error 1% | Observation error 10% | Observation error 20% | Observation error 30% |
|---|---|---|---|---|---|
| Hydrodynamic model | 1% | 1.567 | 1.359 | 1.269 | 1.638 |
| | 10% | 0.329 | 1.665 | 0.976 | 0.902 |
| | 20% | 0.413 | 1.538 | 1.494 | 1.331 |
| | 30% | 1.457 | 1.086 | 1.173 | 0.329 |
| Water quality model | 1% | 1.257 | 1.045 | 1.287 | 0.942 |
| | 10% | 0.397 | 1.273 | 1.247 | 1.385 |
| | 20% | 0.412 | 0.884 | 0.975 | 1.128 |
| | 30% | 1.698 | 1.655 | 0.937 | 1.32 |
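Table 4 can be read as a 4 × 4 grid search. The short sketch below transcribes its values and returns every (model error, observation error) pair attaining the smallest value, assuming that a lower value indicates a better fit (the table does not name its metric). For the water quality model the unique minimum, 0.397, lies at a model error of 10% and an observation error of 1%. For the hydrodynamic model the minimum, 0.329, occurs both at 10%/1% and at 30%/30%, so for that model the table alone contains a tie.

```python
# Values transcribed from Table 4; keys are model error (%),
# tuple entries follow observation error (%) in the order of LEVELS
LEVELS = (1, 10, 20, 30)
HYDRO = {
    1: (1.567, 1.359, 1.269, 1.638),
    10: (0.329, 1.665, 0.976, 0.902),
    20: (0.413, 1.538, 1.494, 1.331),
    30: (1.457, 1.086, 1.173, 0.329),
}
WQ = {
    1: (1.257, 1.045, 1.287, 0.942),
    10: (0.397, 1.273, 1.247, 1.385),
    20: (0.412, 0.884, 0.975, 1.128),
    30: (1.698, 1.655, 0.937, 1.32),
}


def best_combinations(table):
    """Return (minimum value, sorted list of (model_err, obs_err) pairs attaining it)."""
    cells = {(m, o): v for m, row in table.items() for o, v in zip(LEVELS, row)}
    best = min(cells.values())
    return best, sorted(k for k, v in cells.items() if v == best)


print("hydrodynamic:", best_combinations(HYDRO))
print("water quality:", best_combinations(WQ))
```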

### Performance analysis of assimilation

The observed values, the values simulated by the hydrodynamic model without assimilation, and the estimates obtained by assimilating either the state variables alone or both the state variables and the parameters are compared. Figure 3 shows that, when the water level is taken as the state variable and the roughness coefficient as the model parameter, assimilating water level observations into the hydrodynamic model improves simulation accuracy relative to the direct simulation. Accuracy improves further when both the state variable and the model parameter are assimilated, with some simulated water levels very close to the measured values. Updating the water level and roughness coefficient with water level observations can therefore improve the accuracy of hydrodynamic simulations of river-connected lakes subject to complex climate change and anthropogenic activities.

When the flow velocity is taken as the state variable and the roughness coefficient and wind drag coefficient as model parameters, the best results are again obtained by assimilating both the state variable and the model parameters. The *RMSE* between simulated and measured values is 3.45%, 3.67% and 2.86%, and the *IOA* is 0.86, 0.85 and 0.85, at Sanhe, Erhe and Gaoliangjian, respectively (Figure 4). The assimilated hydrodynamic model therefore provides more reliable and accurate simulations of water levels and flow velocities in the Hongze Lake.

For the water quality model, the COD_{Mn} concentration is taken as the state variable, with its integrated attenuation coefficient and diffusion coefficient as the model parameters. Figure 5 shows the assimilation results for the COD_{Mn} concentration in the Hongze Lake: data assimilation with the EnKF also significantly improves the simulation accuracy of the COD_{Mn} concentration.
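To make joint state-parameter estimation concrete, the following twin-experiment sketch uses a hypothetical zero-dimensional first-order decay model as a stand-in for COD_{Mn}. The concentration C is the observed state, and a single decay coefficient k plays the role of the integrated attenuation coefficient; the diffusion coefficient and the 2D transport are omitted. Synthetic observations with 1% relative error are generated with k = 0.2 per day, and the filter starts from a deliberately biased prior for k. All numbers, including the constant load, are illustrative assumptions rather than Hongze Lake data.

```python
import math
import random


def run_twin_experiment(n_members=90, n_steps=60, k_true=0.2, load=2.0,
                        c0=5.0, dt=1.0, obs_rel_err=0.01, seed=42):
    """Jointly estimate a concentration C and its decay coefficient k with a stochastic EnKF.

    Hypothetical model (not the paper's 2D model): C[t+1] = C[t] * exp(-k * dt) + load * dt
    Returns (true C, ensemble-mean C, ensemble-mean k) after the last analysis.
    """
    rng = random.Random(seed)
    c_true = c0
    # Augmented members [C, k]; the prior for k is biased (mean 0.5 vs truth 0.2)
    members = [[c0 + rng.gauss(0.0, 0.1), rng.gauss(0.5, 0.15)] for _ in range(n_members)]

    for _ in range(n_steps):
        # Synthetic truth and one noisy observation of C per step
        c_true = c_true * math.exp(-k_true * dt) + load * dt
        y = c_true + rng.gauss(0.0, obs_rel_err * c_true)

        # Forecast: each member advances its own C with its own k (k is persistent)
        members = [[c * math.exp(-k * dt) + load * dt, k] for c, k in members]

        # Analysis: C is observed directly (H x = C); k is updated through cov(k, C)
        c_mean = sum(m[0] for m in members) / n_members
        k_mean = sum(m[1] for m in members) / n_members
        var_c = sum((m[0] - c_mean) ** 2 for m in members) / (n_members - 1)
        cov_kc = sum((m[1] - k_mean) * (m[0] - c_mean) for m in members) / (n_members - 1)
        r = (obs_rel_err * y) ** 2  # observation error variance
        gain_c, gain_k = var_c / (var_c + r), cov_kc / (var_c + r)
        for m in members:
            innovation = y + rng.gauss(0.0, math.sqrt(r)) - m[0]  # perturbed observation
            m[0] += gain_c * innovation
            m[1] += gain_k * innovation

    c_est = sum(m[0] for m in members) / n_members
    k_est = sum(m[1] for m in members) / n_members
    return c_true, c_est, k_est


c_true, c_est, k_est = run_twin_experiment()
print(f"true C = {c_true:.3f}, estimated C = {c_est:.3f}, estimated k = {k_est:.3f} (truth 0.2)")
```

Because k does not change during the forecast, it is corrected only through its ensemble covariance with C. Over long runs such parameter ensembles can collapse, which is why covariance inflation or parameter perturbation is often added in practice; neither is included in this sketch.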

## Conclusions

The two-dimensional hydrodynamic and water quality model is an effective way to simulate the water level, flow field and pollutant migration in lakes. Simulation accuracy is crucial, however, especially for shallow lakes. To improve it, the EnKF is used to assimilate observations into the lake hydrodynamic and water quality model. The results show that the roughness coefficient has a more significant effect on the water level than on the flow velocity, whereas the wind drag coefficient has a more significant effect on the flow velocity than on the water level. For the Hongze Lake, assimilating both state variables and model parameters yields more accurate simulations of water level, flow velocity and COD_{Mn} concentration. The EnKF approach used in this paper can also serve as a reference for improving the accuracy of hydrodynamic and water quality simulations of shallow lakes.

## ACKNOWLEDGEMENTS

This work was jointly supported by the National Key R&D Program of China (2018YFC1508706), the National Natural Science Foundation of China (51879240) and the China Postdoctoral Science Foundation (2019M652551).

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this paper is available online at https://dx.doi.org/10.2166/ws.2020.125.