## Abstract

The normal probability density function (PDF) is widely used in parameter estimation for dynamic system models, under the assumption that the random variables are distributed over an infinite interval. In practice, however, these random variables are usually confined to a finite region by the physical process and engineering practice. In this study, we address this issue by applying a truncated normal PDF. The proposed formulation avoids the non-differentiability inherent in the truncated normal PDF at the truncation points, which otherwise restricts the use of analytical methods (e.g., Gaussian approximation). A data assimilation method based on the derived formula is proposed to describe the probability of parameters and measurement noise in the truncated space. Applied to a water distribution system (WDS), the proposed method estimates nodal water demand and hydraulic pressure, which are key to hydraulic and water quality model simulations. Results for a hypothetical WDS and a large field WDS show that the proposed method improves parameter estimation for WDS simulations. This improvement is essential for developing real-time hydraulic and water quality simulation and process control in field applications when the parameters and measurement noise are distributed in a finite region.

## HIGHLIGHTS

The truncated normal probability density functions (PDFs) are developed.

A new data assimilation method utilizing truncated normal PDF is proposed.

The method is used for demand estimation in water distribution systems.

## INTRODUCTION

Parameter estimation and calibration play a crucial role in the modeling and management of water resources systems, as emphasized in several studies (Savic *et al.* 2009; Beckers *et al.* 2020; Scott *et al.* 2022; Yoon *et al.* 2022). The main objective is to minimize discrepancies between model outputs and measured values by adjusting network parameters, including nodal water demand and pipe roughness. However, this task can be challenging due to the large number of parameters involved and the limited availability of measured data. Consequently, parameter estimation often faces ill-conditioning issues, where the insufficient number of measurements leads to non-unique solutions within the search domain (Shao *et al.* 2019). In addition, uncertainties in both the measurements and the model itself can significantly affect the accuracy of parameter estimation. Parameter estimation in such a noisy environment is often a major challenge for the modeling of water resources systems (Chu *et al.* 2021a, 2021b). Data assimilation that accounts for these uncertainties has been used for parameter estimation in model simulations (Vrugt *et al.* 2005; Hutton *et al.* 2014; Zhou *et al.* 2020).

Data assimilation estimates the state of a process based on time-series measurements. The state estimator relies on knowledge of the posterior probability density function (PDF) of the state given the real-time measurements (Bar-shalom *et al.* 2001). Generally, the posterior PDF is approximated recursively in two stages: the prediction stage and the update stage (Garcia-Fernandez *et al.* 2012; Hutton *et al.* 2014). In the prediction stage, the PDF of the state at the current time step is predicted from the historical data; this is referred to as the prior PDF. In the update stage, the likelihood describes the probabilistic relationship between the state and the measurement. The likelihood and the prior PDF are combined using Bayes' rule to approximate the posterior PDF (Garcia-Fernandez *et al.* 2012).
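The two-stage recursion can be illustrated numerically. The following is a minimal sketch on a one-dimensional grid, assuming a random-walk prediction model and a direct, normally distributed measurement of the state; the function names, noise levels, and measurement values are hypothetical and are not taken from the study.

```python
import math

def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def normalize(values, dx):
    """Rescale grid values so that they integrate to one (rectangle rule)."""
    total = sum(values) * dx
    return [v / total for v in values]

def predict(grid, posterior_prev, dx, process_std):
    """Prediction stage: propagate the previous posterior through a random walk."""
    prior = []
    for x in grid:
        mass = sum(p * normal_pdf(x, x_prev, process_std)
                   for x_prev, p in zip(grid, posterior_prev)) * dx
        prior.append(mass)
    return normalize(prior, dx)

def update(grid, prior, dx, measurement, meas_std):
    """Update stage: the posterior is proportional to likelihood times prior."""
    unnormalized = [normal_pdf(measurement, x, meas_std) * p
                    for x, p in zip(grid, prior)]
    return normalize(unnormalized, dx)

def grid_mean(grid, pdf, dx):
    """Mean of a density tabulated on the grid."""
    return sum(x * p for x, p in zip(grid, pdf)) * dx

dx = 0.05
grid = [i * dx for i in range(-200, 401)]            # covers -10 to 20
posterior = normalize([normal_pdf(x, 5.0, 2.0) for x in grid], dx)
for y in [6.1, 6.4, 5.9]:                             # made-up measurements
    prior = predict(grid, posterior, dx, process_std=0.5)
    posterior = update(grid, prior, dx, y, meas_std=1.0)
posterior_mean = grid_mean(grid, posterior, dx)
```

A grid is used here only because it makes both stages explicit; it does not scale to the high-dimensional states considered later in the paper.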

Numerous data assimilation methods have been proposed to solve the state estimation problem, using either sampling-based or analytical methods (e.g., Gaussian approximation). In sampling-based methods, such as the particle filter and Markov chain Monte Carlo, one or more samples (particles) are drawn from the prior PDF, and each sampled value is then evaluated against the likelihood or posterior PDF. If the sampled value agrees with the likelihood or posterior PDF, it is retained, and the algorithm proceeds to the next variable in turn (Bishop 2006). However, sampling-based methods may be time-consuming for large-scale non-linear systems because they require frequent evaluation of the samples' probability density. For this reason, there is considerable interest in computationally efficient analytical methods (e.g., Gaussian approximations) (Garcia-Fernandez *et al.* 2012). The Kalman filter is the best-known analytical method for state estimation in linear systems. For non-linear systems, the extended Kalman filter (Singh *et al.* 2022) and the iterative Kalman filter (Huang *et al.* 2022) have been developed, both of which require a linearized approximation of the system function (Garcia-Fernandez *et al.* 2012; Shao *et al.* 2019). In the linearization, first-order gradient (Jacobian matrix) or second-order gradient (Hessian matrix) information is used to search for the optimal solution, which makes the analytical approach more computationally efficient than sampling.
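To make the role of the linearization concrete, the sketch below shows one measurement update of a scalar extended Kalman filter. It is a textbook-style illustration rather than the method of any cited study; the quadratic measurement function `h` and all numerical values are hypothetical.

```python
def ekf_update(x_prior, p_prior, y, h, h_jacobian, r):
    """One extended Kalman measurement update for a scalar state.

    x_prior, p_prior: prior mean and variance of the state
    y, r:             measurement and its noise variance
    h, h_jacobian:    measurement function and its first derivative
    """
    H = h_jacobian(x_prior)          # first-order gradient at the prior mean
    innovation = y - h(x_prior)      # mismatch between measurement and model
    s = H * p_prior * H + r          # innovation variance
    k = p_prior * H / s              # Kalman gain
    x_post = x_prior + k * innovation
    p_post = (1.0 - k * H) * p_prior
    return x_post, p_post

def h(q):
    """Hypothetical non-linear measurement function."""
    return 0.5 * q * q

def dh(q):
    """Derivative of the hypothetical measurement function."""
    return q

x_post, p_post = ekf_update(x_prior=4.0, p_prior=1.0, y=10.0,
                            h=h, h_jacobian=dh, r=0.25)
```

Note that nothing in this update prevents `x_post` from leaving a physically feasible range, which is the issue the remainder of the paper addresses.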

One common assumption in the analytical method is that the prior PDF of the state variable conforms to a normal PDF over the full range (−∞, ∞) (Law *et al.* 2015; Yang *et al.* 2018; Shao *et al.* 2019). In practice, however, the state variable is usually distributed in a finite region. Likewise, no sensor can report an infinitely large measurement (Garcia-Fernandez *et al.* 2012), so the measurement noise should be bounded within a reasonable region. This is inconsistent with the unbounded likelihood assumed in most models (Garcia-Fernandez *et al.* 2012). Therefore, constraints that confine the state variable and the measurement to the feasible domain must be considered in the data assimilation process (Lauvernet *et al.* 2009), a difficult requirement that calls for efficient algorithms to handle the constraints.

A popular method to address the problem is by incorporating constraints in data assimilation (Ko & Bitmead 2007; Lauvernet *et al.* 2009; Garcia-Fernandez *et al.* 2012; Xu *et al.* 2013). Ko & Bitmead (2007) proposed a constrained Kalman filter based on the projected system method. The method's superiority was demonstrated by comparing the magnitude of the estimation error covariance matrix with those of the unconstrained Kalman filters. Garcia-Fernandez *et al.* (2012) developed a method to solve the state estimation problem under bounded measurement noise. The boundary information of the measurement noise is used to modify the prior PDF of the state variable based on the Bayesian rule, while the Kalman filter is used to fuse the modified prior PDF with the likelihood. Simon & Simon (2010) described a PDF truncation method by incorporating constraints into the Kalman filter and applied this method to an aircraft turbofan engine health estimation problem. Andersson *et al.* (2019) developed a linear state estimation method with linear equality constraints for time-variant systems. Xu *et al.* (2013) constructed a linear equality constraint dynamic model by incorporating the constraints as the prior information about the states into the dynamics modeling. Overall, the above methods address the constraint problem by introducing equality or inequality constraints to data assimilation.

Another approach to address the problem is to modify the normal PDF into a truncated normal PDF, which restricts the value range of the state variables. The truncated normal PDF has many applications in Bayesian inference for problems with a truncated parameter space. Robert (1995) proposed an efficient algorithm for unidimensional truncated normal variables and a multidimensional extension. Generally, the theoretical truncated normal PDF is proportional to the normal PDF inside the feasible region (rescaled so that it integrates to one) and is equal to 0 outside the feasible region (Burkardt 2014).
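As an illustration of this definition, the following sketch evaluates a two-sided truncated normal PDF using only the Python standard library. The function names and the example bounds are illustrative choices, not values from the study.

```python
import math

def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, std):
    """Cumulative distribution of N(mean, std^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def truncated_normal_pdf(x, mean, std, lower, upper):
    """Normal PDF restricted to [lower, upper] and rescaled to integrate to one."""
    if x < lower or x > upper:
        return 0.0
    mass = normal_cdf(upper, mean, std) - normal_cdf(lower, mean, std)
    return normal_pdf(x, mean, std) / mass

# The density jumps from zero to a positive value at the lower truncation point.
just_outside = truncated_normal_pdf(-1e-9, mean=2.0, std=2.0, lower=0.0, upper=6.0)
at_bound = truncated_normal_pdf(0.0, mean=2.0, std=2.0, lower=0.0, upper=6.0)
```

The jump between `just_outside` and `at_bound` is the discontinuity that makes gradient-based (analytical) methods difficult to apply directly.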

The theoretical truncated normal PDF has been widely used in sample-based state estimation problems (Zhou *et al.* 2018). However, it is rarely used in the analytical method since the non-differentiable truncation points lead to intricate numerical integration (Robert 1995). For the analytical method, due to the requirement of the linearization of the system, the function should be differentiable, and the first-order gradient (Jacobian matrix) or second-order gradient (Hessian matrix) information is utilized. The difficulty in applying the truncated normal PDF to the analytical method is the non-differentiability of a truncated normal PDF at the truncation points.

*et al.* 2016; Oikonomou *et al.* 2018; Harmouche & Narasimhan 2020). For real-time control purposes, the model parameter such as the nodal water demand is a time-dependent state variable that should be estimated given real-time measurements, for which the analytical method is suggested for its superiority in terms of computational efficiency. When the PDF of nodal water demand is assumed as a standard normal PDF, the variable ranges from negative infinity to positive infinity (Shao *et al.* 2019). The probability of negative or excessive values is not equal to zero. Therefore, the search space ranges from negative infinity to positive infinity. In addition, when the measured values of some pressure sensors are inaccurate or have large uncertainty, the nodal water demands may adjust to negative or excessive values, which is an infeasible condition in practice. A possible solution to this problem is to introduce a truncated normal PDF. However, the PDF used in analytical methods must be differentiable, whereas the truncated normal PDF cannot meet this condition. As shown in Figure 1, the PDF value drops sharply to 0 on the truncation points, at which the PDF is non-differentiable. Currently, one strategy to solve this problem is to set all negative values to 0 when finishing all the nodal water demand estimations. Another strategy is reassigning the PDF in the infeasible region to the feasible region (Shimada *et al.* 1998; Simon & Simon 2010). Nevertheless, both strategies mentioned above are too inefficient and do not solve the fundamental problem of the non-differentiability of the truncation points.

The primary aim of this study is to propose a truncated normal PDF that overcomes the non-differentiability at the truncation points. The truncated normal PDF is then used to describe the prior PDF and the likelihood, based on probability modeling of the state variable and measurement noise in the truncated space. The prior PDF and likelihood are fused in the data assimilation framework, and analytical solutions for state estimation in a non-linear system are developed. Furthermore, we apply the developed method to estimate the nodal water demand in a hypothetical WDS and, indirectly through nodal pressure estimates, in a field WDS. The results show that the method can handle the state estimation problem when the state variable and measurement noise are distributed in a finite region. Moreover, the developed method can be applied to estimate a wide range of other parameters in WDS models, such as pipe roughness.

## METHODS

### Modeling of truncated normal PDF

#### Two-side truncated normal PDF

*C* is the normalization constant, which ensures that Equation (1) is a valid probability density that integrates to one.

The proposed PDF (red line in Figure 1) carries a modeling error relative to the theoretical truncated normal PDF (black line in Figure 1). The magnitude of this error is mainly controlled by the parameter *λ*: the smaller the value of *λ*, the more closely the proposed PDF replicates the theoretical PDF. The parameter *λ* also affects the convergence of the state estimation for non-linear systems, so its value should be kept within a recommended range.
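The effect of the parameter can be explored with a simple numerical experiment. The sketch below does not reproduce Equation (1); it assumes one possible smooth surrogate, in which a Gaussian kernel is multiplied by a factor that vanishes at both truncation points and is raised to the power `lam`, and measures how far the normalized surrogate is from the theoretical truncated normal PDF. The surrogate form, the function names, and the numerical values are all assumptions made for illustration.

```python
import math

def gaussian_kernel(x, mean, std):
    """Unnormalized normal density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z)

def barrier_kernel(x, mean, std, lower, upper, lam):
    """Assumed surrogate: Gaussian kernel times a factor that is zero at both bounds."""
    half_width = 0.5 * (upper - lower)
    shape = (x - lower) * (upper - x) / (half_width * half_width)  # lies in (0, 1]
    return gaussian_kernel(x, mean, std) * shape ** lam

def normalized_on_grid(kernel, grid, dx):
    """Tabulate a kernel and rescale it to integrate to one on the grid."""
    values = [kernel(x) for x in grid]
    total = sum(values) * dx
    return [v / total for v in values]

def l1_gap(lam, mean=2.0, std=2.0, lower=0.0, upper=6.0, n=6000):
    """Integrated absolute difference between the surrogate and the theoretical PDF."""
    dx = (upper - lower) / n
    grid = [lower + (i + 0.5) * dx for i in range(n)]   # midpoints, strictly inside
    exact = normalized_on_grid(lambda x: gaussian_kernel(x, mean, std), grid, dx)
    smooth = normalized_on_grid(
        lambda x: barrier_kernel(x, mean, std, lower, upper, lam), grid, dx)
    return sum(abs(e - s) for e, s in zip(exact, smooth)) * dx

# Gap to the theoretical PDF for decreasing values of the parameter.
gaps = [l1_gap(lam) for lam in (1.0, 0.1, 0.01)]
```

Under this assumed form, the gap shrinks as `lam` decreases, which mirrors the qualitative behavior described above for the proposed PDF.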

#### One-side truncated normal PDF

### Data assimilation via truncated normal PDF

#### Probability formulation for data assimilation

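In standard state-space notation, the system is described by a prediction model and a measurement model:

$$x_t = F_t x_{t-1} + w_t, \qquad y_t = g(x_t) + v_t,$$

where *t* is the time index; $x_t \in \mathbb{R}^m$ is the state variable to be estimated; $F_t$ is the prediction matrix and $w_t$ is the prediction error; $y_t \in \mathbb{R}^n$ is the measurement; $g(\cdot)$ is the non-linear function mapping $x_t$ to $y_t$; $v_t$ is the measurement noise; and *m* and *n* are the dimensions of $x_t$ and $y_t$. In the classical data assimilation algorithm, $w_t$ and $v_t$ are uncorrelated zero-mean normal noise input sequences:

$$w_t \sim N(0, Q), \qquad v_t \sim N(0, R),$$

where $N(\cdot)$ is the normal PDF, and *Q* and *R* are the covariance matrices for $w_t$ and $v_t$, respectively.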
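A scalar version of such a model is easy to simulate, which helps when testing an estimator against known states. The sketch below assumes a linear prediction step with coefficient `f`, a user-supplied non-linear measurement function `g`, and normal noise with standard deviations `q_std` and `r_std`; these names and the example values are hypothetical.

```python
import random

def simulate(steps, x0, f, g, q_std, r_std, seed=0):
    """Generate states and noisy measurements from a scalar state-space model.

    State:       x_t = f * x_{t-1} + w_t,  w_t ~ N(0, q_std^2)
    Measurement: y_t = g(x_t) + v_t,       v_t ~ N(0, r_std^2)
    """
    rng = random.Random(seed)        # local generator keeps runs reproducible
    states, measurements = [], []
    x = x0
    for _ in range(steps):
        x = f * x + rng.gauss(0.0, q_std)
        y = g(x) + rng.gauss(0.0, r_std)
        states.append(x)
        measurements.append(y)
    return states, measurements

# Example: slowly varying state observed through a quadratic relation.
states, measurements = simulate(steps=10, x0=3.0, f=1.0,
                                g=lambda x: x * x, q_std=0.1, r_std=0.2)
```

Because the noise is drawn from an untruncated normal distribution, simulated values can in principle fall outside any physical bounds, which is the modeling inconsistency discussed in this paper.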

Equation (11) can be solved by a sampling-based method (Do *et al.* 2017; Zhou *et al.* 2018) or an analytical method (Shao *et al.* 2019; Singh *et al.* 2022). This paper focuses on the use of analytical methods to estimate the state variable.

*R* are diagonal matrices. The prior PDF and likelihood can be written as Equations (12) and (14), respectively: where , , , and are the element of , , , and , respectively; and are the diagonal value of and *R*, respectively.

#### Truncated prior PDF and likelihood

### Analytical solutions for non-linear system

#### The objective function

As shown in Equation (20), when a state variable or a simulated measurement approaches a truncation point from the interior of its feasible domain, the objective function increases sharply. Minimizing the objective function therefore effectively confines the variables to the feasible domain. This strategy is similar to the barrier method for solving non-linearly constrained optimization problems, in which *λ* plays the role of the barrier parameter and affects the number of iterations and the convergence behavior (Nocedal & Wright 2006).
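The barrier behavior can be illustrated with a one-dimensional example. The sketch below is not Equation (20); it assumes a generic logarithmic barrier of the kind described by Nocedal & Wright (2006), added to the quadratic term of a normal prior, and includes only the prior term for brevity. The function name and the values are hypothetical.

```python
import math

def barrier_objective(x, prior_mean, prior_var, lower, upper, lam):
    """Quadratic prior misfit plus an assumed logarithmic barrier on [lower, upper]."""
    if not (lower < x < upper):
        return math.inf                      # infeasible points are rejected outright
    quadratic = 0.5 * (x - prior_mean) ** 2 / prior_var
    barrier = -lam * (math.log(x - lower) + math.log(upper - x))
    return quadratic + barrier

# The objective grows without bound as x approaches a truncation point.
center = barrier_objective(3.0, prior_mean=2.0, prior_var=4.0,
                           lower=0.0, upper=6.0, lam=1.0)
near_bound = barrier_objective(1e-6, prior_mean=2.0, prior_var=4.0,
                               lower=0.0, upper=6.0, lam=1.0)
nearer_bound = barrier_objective(1e-9, prior_mean=2.0, prior_var=4.0,
                                 lower=0.0, upper=6.0, lam=1.0)
```

Unlike a hard cutoff, this objective is differentiable everywhere in the interior, so gradient and Hessian information remain available to the solver.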

#### Non-linear solution

*k*th iteration; and are the first-order gradient and the second-order gradient of , respectively. and can be derived as follows:where and , and . These vectors can be computed by

## APPLICATION TO NODAL WATER DEMAND ESTIMATION

The proposed method is applied to nodal water demand estimation in two water distribution networks. Case study 1 uses a simple hypothetical network to verify the proposed method. Case study 2 uses a large-scale field network located in eastern China to evaluate the performance of the proposed method on a real large-scale state estimation problem.

### Case study 1: Simple hypothetical network

*et al.* (2019). The main difference between the two methods is whether the truncated normal distribution is used. A 3-day simulation with a time step of 20 min is carried out.

Random noise is added to the theoretical value to obtain the observed value and the noise is normally distributed with a variance of *R*. The variance for the pressure and flow sensors are and 1 , respectively. The prior nodal water demand is generally predicted from historical data (Chu *et al.* 2021a). In this case study, we assume that the mean value of prior nodal water demand is equal to the estimated nodal water demand in the previous time step (). In the first time-step, the total water demand is equally allocated to each node as the prior node water demand, with . The covariance () is assumed to be a constant, with . The truncation point for the prior PDF is computed as , where is the theoretical value of the nodal water demand. For the pressure sensor, the truncation points for the likelihood are determined by ; for the flow sensor, , where is the observed value of the sensor. The constant parameter = 1 and the maximum number of allowed iterations .

#### Nodal water demand estimation

*Step 1: Set Estimation Parameters*

The pressure at nodes 3 and 7 and the pipe flow at pipes [8], [10], and [11] are selected as the measured values. The observed values and their variances are ,,, and . The truncation points for the likelihood are ; for the prior PDF of nodal water demand , . The truncation points for the prior PDF are . In each time step, the nodal water demand is estimated after several iterations. The prior value of the nodal water demand is set as the value at the first iteration ().

*Step 2: Compute the Model Outputs*

*Step 3: Calculate the Jacobian Matrix*

*Step 7: Reach Termination Conditions*

The next step is to repeat Steps 2–6 until the termination condition is met. The nodal water demands and nodal pressures obtained from the estimation are then compared with the theoretical values.
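The iterate-until-termination structure of these steps can be sketched for a single variable. The example below is a simplified illustration, not the multivariate solver of this study: it minimizes an assumed log-barrier objective (a quadratic prior term plus logarithmic barriers at both truncation points) with Newton's method, halving any step that would leave the feasible interval. The termination rule (step size below a tolerance, or a maximum number of iterations) and all values are assumptions.

```python
def newton_minimize(x0, prior_mean, prior_var, lower, upper, lam,
                    tol=1e-8, max_iter=50):
    """Newton iterations on an assumed log-barrier objective in one variable.

    Objective: 0.5 * (x - prior_mean)^2 / prior_var
               - lam * (log(x - lower) + log(upper - x))
    Returns the estimate and the number of iterations used.
    """
    x = x0
    for k in range(max_iter):
        # First-order gradient and second-order gradient of the objective.
        grad = (x - prior_mean) / prior_var - lam / (x - lower) + lam / (upper - x)
        hess = 1.0 / prior_var + lam / (x - lower) ** 2 + lam / (upper - x) ** 2
        step = -grad / hess
        # Shrink the step until the new point stays strictly inside the bounds.
        while not (lower < x + step < upper):
            step *= 0.5
        x += step
        if abs(step) < tol:                  # termination condition
            return x, k + 1
    return x, max_iter

# A prior mean of -3 is infeasible for a non-negative quantity such as demand;
# the barrier keeps the estimate inside (0, 6).
estimate, iterations = newton_minimize(x0=3.0, prior_mean=-3.0, prior_var=4.0,
                                       lower=0.0, upper=6.0, lam=0.1)
```

In this example the unconstrained minimizer would be the negative prior mean, whereas the barrier formulation returns a small positive value close to the lower truncation point.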

#### Case study 1: Discussion

*et al.* (2019) is significantly higher as compared to the method proposed in this study. For example, Shao *et al.* (2019) estimated the nodal water demand at N4 with an error of 2.30 L/s, whereas the approach presented in this study achieved an error of only 0.24 L/s. The accuracy of the two algorithms is evaluated by calculating the average deviation (error) between the estimated values and the theoretical values. Similar observations can be found in N1, N2, and N5. It is apparent that the proposed method can effectively avoid unrealistic estimates of excessively large or small nodal water demands. This is primarily due to the use of the truncated normal PDF, which restricts the parameters to a reasonable range. As shown in Figure 5, the estimated nodal water demand is confined within the domain defined by the truncation points (gray dashed line). In contrast, Shao *et al.* (2019) used a normal PDF, which allows the nodal water demand to be estimated over the full range (−∞, ∞). This assumption is not consistent with the actual situation, leading to an excessive value of the nodal water demand at N1, N2, N4, and N5. More importantly, these unrealistic nodal demands without the use of the proposed truncated normal PDF cannot be easily uncovered by the use of limited pressure and flow measurements.
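The accuracy measure used in this comparison can be computed as follows. This sketch assumes that the average deviation is the mean absolute difference between the estimated and theoretical series at a node; the function name and the sample numbers are made up for illustration and are not results from the case study.

```python
def average_deviation(estimated, theoretical):
    """Mean absolute difference between estimated and theoretical values."""
    if len(estimated) != len(theoretical) or not estimated:
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs(e - t) for e, t in zip(estimated, theoretical))
    return total / len(estimated)

# Made-up demand series (L/s) for a single node over three time steps.
error = average_deviation([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
```

Evaluating this quantity per node for both methods gives the kind of node-by-node comparison reported above.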

### Case study 2: Large-scale city network

*y* denotes the field measurements. The total water demand is equally allocated to each node as the prior node water demand (). The prior variance is assumed to be . The truncation points for the prior PDF are set as . In this case study, only one time-step simulation is performed.

In contrast, for the method proposed by Shao *et al.* (2019), the deviations for 54 estimation sensors are within 1 m, while the deviations for 7 sensors are above 1 m and the deviations for 3 sensors are above 2 m, with the largest residual being 3.68 m. In the validation data set, the deviations for 6 sensors are above 1 m, and the deviations for 3 sensors are above 2 m with the largest residual being 3.05 m. Moreover, Shao *et al.* (2019) estimated 191 negative nodal water demands, while the proposed method does not estimate any negative nodal water demand.

The results of Shao *et al.* (2019) show that three estimation sensors have deviations greater than 2.5 m, and three validation sensors have deviations greater than 2 m, indicating excessive estimation errors (Figure 8). As shown in Figure 7, the distribution of sensor locations is highly uneven, with a high density of sensors in urban areas and a low density in rural areas. The sensors with excessive deviations are all located in rural areas with low sensor density. As mentioned earlier, demand estimation with a limited number of field measurements is an ill-conditioned problem. The problem is even more severe in locations with sparse sensor density, which leads to greater estimation errors for these sensors. Additionally, in the Bayesian estimation process, there is a competitive relationship between the sensors, and the estimated demand is a compromise between them. Sensors that are closer together tend to adjust the nodal water demand in a similar direction, making them more competitive. Competitiveness therefore increases in areas with a higher sensor density and decreases in areas with a lower sensor density. This imbalance exacerbates the ill-conditioning problem in areas with sparse sensors and leads to excessive deviations there. The truncated likelihood constrains the simulated value to a set range around the observed value, which mitigates the ill-conditioning problem through the constraints implied in the truncated likelihood.

## CONCLUSIONS

Accurate estimation of nodal water demand is crucial for modeling and managing water distribution networks. Recent advances have made it possible to estimate all nodal water demands in real time using ubiquitous pressure sensors and limited flow rate measurements (Shao *et al.* 2019). Nodal demand estimation is often based on an assumed normal PDF for probability modeling in Bayesian inference, which allows the values of random variables to be distributed over an infinite interval. However, engineering practice and field operations commonly confine the state variables and measurement noise to a limited range. This imprecise mathematical representation can lead to unrealistic estimates of nodal pressure and water demand.

This paper proposed a new nodal water demand estimation approach using truncated normal PDF methods. According to the application results and analysis, the main conclusions can be summarized as follows:

- (a) The existing truncated normal PDF is non-differentiable at the truncation points, a problem that seriously limits the use of analytical methods (e.g., Gaussian approximation). The proposed analytical solution based on the truncated normal PDF avoids this non-differentiability through a mathematical approximation at the truncation points.

- (b) When a limited number of sensors is used to estimate or calibrate network parameters, an ill-conditioning problem arises, and the estimation becomes susceptible to overfitting the noise in the data. The proposed method addresses this issue by constraining the range over which the noise can be fitted. This constraint helps prevent overfitting, mitigates the ill-conditioning problem, and makes the estimated parameters more robust and accurate.

- (c) When the measurements are biased, estimated parameters such as nodal water demand can take undesirable values, such as negative or excessively large values. The truncated normal PDFs proposed in this paper mitigate this issue by constraining the parameter estimation to a specific range, so that the estimated parameters remain within reasonable bounds. This enhances the accuracy and reliability of the estimation results.

- (d) In data assimilation, the deviation of the proposed truncated normal PDF from the theoretical truncated normal PDF is primarily controlled by the parameter *λ*, which also affects the number of iterations and the convergence behavior. A fixed value of *λ* is used in this study, which may in some cases increase the number of iterations and reduce computational efficiency. Because the objective function (Equation (20)) resembles that of the barrier method for non-linearly constrained optimization problems, a strategy that updates the parameter during the iterations (Nocedal & Wright 2006) could be adopted in future research to improve algorithm performance.

The proposed method, which uses the truncated normal PDF in a finite region, has been applied to simulate nodal pressure and estimate the corresponding water demand in a simple hypothetical WDS and a field WDS in eastern China. The results show the advantage of the proposed method in avoiding artificial negative nodal water demands and unreasonable errors between the estimated and measured values. This improvement is a constructive step toward WDS simulation and control using real-time network monitoring data. Furthermore, the proposed method can also be used to estimate other hydraulic parameters in WDS models, such as pipe roughness, whose uncertainty can be represented by a truncated normal PDF.

## ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China (No. 52270095) and the National Natural Science Foundation of China (No. 52200119). As a part of the collaborative research, this publication has been cleared through the U.S. EPA administrative and technical review process. The views expressed in this article are those of the authors and do not necessarily represent the views or the policies of the Agency; therefore, no official endorsement should be inferred.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.