The normal probability density function (PDF) is widely used for parameter estimation in the modeling of dynamic systems, under the assumption that the random variables are distributed over an infinite interval. In practice, however, these random variables are usually distributed in a finite region confined by the physical process and engineering practice. In this study, we address this issue by applying a truncated normal PDF. The proposed formulation avoids the non-differentiability of the theoretical truncated normal PDF at the truncation points, a limitation that restricts the use of analytical methods (e.g., Gaussian approximation). A data assimilation method based on the derived formula is proposed to describe the probability of the parameters and measurement noise in the truncated space. Applied to a water distribution system (WDS), the proposed method estimates nodal water demand and hydraulic pressure, which are key to hydraulic and water quality model simulations. Application results for a hypothetical WDS and a large field WDS clearly show the superiority of the proposed method in parameter estimation for WDS simulations. This improvement is essential for developing real-time hydraulic and water quality simulation and process control in field applications where the parameters and measurement noise are distributed in a finite region.
The truncated normal probability density functions (PDFs) are developed.
A new data assimilation method utilizing truncated normal PDFs is proposed.
The method is used for demand estimation in water distribution systems.
Parameter estimation and calibration play a crucial role in the modeling and management of water resources systems, as emphasized in several studies (Savic et al. 2009; Beckers et al. 2020; Scott et al. 2022; Yoon et al. 2022). The main objective is to minimize discrepancies between model outputs and measured values by adjusting network parameters, including nodal water demand and pipe roughness. However, this task can be challenging because of the large number of parameters involved and the limited availability of measured data. Consequently, parameter estimation often faces ill-conditioning, where an insufficient number of measurements leads to non-unique solutions within the search domain (Shao et al. 2019). In addition, uncertainties in both the measurements and the model itself can significantly affect the accuracy of parameter estimation. Parameter estimation in such a noisy environment is often a major challenge for the modeling of water resources systems (Chu et al. 2021a, 2021b). Data assimilation that accounts for these uncertainties has been used for parameter estimation in model simulations (Vrugt et al. 2005; Hutton et al. 2014; Zhou et al. 2020).
Data assimilation estimates the state of a process based on time-series measurements. The state estimator relies on knowledge of the posterior probability density function (PDF) of the state given the real-time measurements (Bar-Shalom et al. 2001). Generally, the posterior PDF is approximated recursively in two stages: the prediction stage and the update stage (Garcia-Fernandez et al. 2012; Hutton et al. 2014). In the prediction stage, the PDF of the state at the current time step, referred to as the prior PDF, is predicted from historical data. In the update stage, the likelihood describes the probabilistic relationship between the state and the measurement. The likelihood and the prior PDF are then combined using Bayes' rule to approximate the posterior PDF (Garcia-Fernandez et al. 2012).
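To make the two-stage recursion concrete, the following minimal sketch assumes a scalar linear-Gaussian model (state transition coefficient `a`, process noise variance `q`, measurement coefficient `h`, measurement noise variance `r`, all hypothetical). Under these assumptions the prediction and the Bayesian update have the closed form of the Kalman filter. It illustrates the recursion only and is not the method proposed in this paper.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    mean: float
    var: float


def predict(posterior: Gaussian, a: float, q: float) -> Gaussian:
    """Prediction stage: propagate the previous posterior through x_k = a * x_{k-1} + w, w ~ N(0, q)."""
    return Gaussian(a * posterior.mean, a * a * posterior.var + q)


def update(prior: Gaussian, y: float, h: float, r: float) -> Gaussian:
    """Update stage: combine the prior with the likelihood N(y; h * x, r) via Bayes' rule."""
    s = h * h * prior.var + r            # innovation variance
    k = prior.var * h / s                # Kalman gain
    mean = prior.mean + k * (y - h * prior.mean)
    var = (1.0 - k * h) * prior.var
    return Gaussian(mean, var)


belief = Gaussian(mean=10.0, var=4.0)    # hypothetical initial state estimate
for y in [11.0, 10.5, 12.0]:             # hypothetical measurement time series
    prior = predict(belief, a=1.0, q=0.5)
    belief = update(prior, y, h=1.0, r=1.0)
    print(f"y={y:5.1f}  mean={belief.mean:.3f}  var={belief.var:.3f}")
```

Each pass through the loop produces a prior from the previous posterior and then sharpens it with the new measurement, so the posterior variance stays below the prior variance at every step.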
Numerous data assimilation methods have been proposed to solve the state estimation problem, using either sampling-based or analytical methods (e.g., Gaussian approximation). In sampling-based methods, such as the particle filter and Markov chain Monte Carlo, one or more samples (particles) are drawn from the prior PDF and then evaluated with the likelihood or posterior PDF. If a sampled value agrees with the likelihood or posterior PDF, it is retained, and the algorithm proceeds to the next variable in turn (Bishop 2006). However, sampling-based methods may be time-consuming for large-scale non-linear systems because they require frequent evaluation of the samples' probability density. For this reason, there is considerable interest in computationally efficient analytical methods (e.g., Gaussian approximations) (Garcia-Fernandez et al. 2012). The Kalman filter is the best-known analytical method for state estimation in linear systems. For non-linear systems, the extended Kalman filter (Singh et al. 2022) and the iterative Kalman filter (Huang et al. 2022) have been developed, both of which require a linearized approximation of the system function (Garcia-Fernandez et al. 2012; Shao et al. 2019). In the linearization, first-order gradient (Jacobian matrix) or second-order gradient (Hessian matrix) information is used to search for the optimal solution. Because they avoid repeated density evaluations over many samples, these analytical methods are more computationally efficient.
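The cost of sampling-based methods can be seen in a minimal bootstrap particle update, sketched below under assumed values: a Gaussian prior on a single flow-like variable, a hypothetical quadratic measurement function `head_loss`, and Gaussian measurement noise. Every particle needs its own likelihood evaluation, which becomes expensive when each evaluation requires a full hydraulic simulation of a large network. This is an illustrative importance-weighting and resampling step, not a reproduction of the specific samplers cited above.

```python
import numpy as np


def particle_update(prior_mean, prior_std, y, h, r_std, n=5000, seed=0):
    """One sampling-based update: draw particles from a Gaussian prior, weight each one by the
    likelihood of measurement y under model h(x) with Gaussian noise, then resample."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(prior_mean, prior_std, size=n)    # samples from the prior PDF
    log_w = -0.5 * ((y - h(particles)) / r_std) ** 2          # Gaussian log-likelihood (up to a constant)
    w = np.exp(log_w - log_w.max())                           # subtract the max to avoid underflow
    w /= w.sum()
    idx = rng.choice(n, size=n, p=w)                          # multinomial resampling
    return particles[idx]


def head_loss(q):
    """Hypothetical non-linear measurement model (quadratic in flow)."""
    return 0.01 * q ** 2


posterior = particle_update(prior_mean=20.0, prior_std=5.0, y=4.0, h=head_loss, r_std=0.5)
print(f"posterior mean={posterior.mean():.2f}, std={posterior.std():.2f}")
```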
A common assumption in analytical methods is that the prior PDF of the state variable follows a normal PDF over the full range (−∞, ∞) (Law et al. 2015; Yang et al. 2018; Shao et al. 2019). In practice, however, the state variable is usually distributed in a finite region. Moreover, no sensor can provide an infinitely large measurement (Garcia-Fernandez et al. 2012), so the measurement noise should be bounded within a reasonable range. This is inconsistent with the unbounded likelihood assumed in most models (Garcia-Fernandez et al. 2012). Therefore, constraints that confine the state variables and measurements to the feasible domain must be considered in the data assimilation process (Lauvernet et al. 2009), a difficult requirement that calls for efficient algorithms to handle the constraints.
A popular way to address this problem is to incorporate constraints into data assimilation (Ko & Bitmead 2007; Lauvernet et al. 2009; Garcia-Fernandez et al. 2012; Xu et al. 2013). Ko & Bitmead (2007) proposed a constrained Kalman filter based on the projected system method, and its superiority was demonstrated by comparing the magnitude of the estimation error covariance matrix with those of unconstrained Kalman filters. Garcia-Fernandez et al. (2012) developed a method to solve the state estimation problem under bounded measurement noise, in which the boundary information of the measurement noise is used to modify the prior PDF of the state variable based on Bayes' rule, and the Kalman filter is used to fuse the modified prior PDF with the likelihood. Simon & Simon (2010) described a PDF truncation method that incorporates constraints into the Kalman filter and applied it to an aircraft turbofan engine health estimation problem. Andersson et al. (2019) developed a linear state estimation method with linear equality constraints for time-variant systems. Xu et al. (2013) constructed a dynamic model with linear equality constraints by incorporating the constraints into the dynamics modeling as prior information about the states. Overall, these methods address the constraint problem by introducing equality or inequality constraints into data assimilation.
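The PDF truncation idea of Simon & Simon (2010) can be sketched, in its simplest scalar form, as replacing an unconstrained Gaussian estimate with the mean and variance of the same Gaussian truncated to the feasible interval. The function below uses the standard moment formulas of the truncated normal distribution; the example (an unconstrained demand estimate of −0.5 with unit variance, truncated to [0, ∞)) is hypothetical, and the multivariate case is not covered.

```python
import math


def std_pdf(z):
    """Standard normal PDF; evaluates to 0 at +/- infinity."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_pdf(z):
    """z * phi(z), defined as 0 for infinite z to avoid computing inf * 0."""
    return 0.0 if math.isinf(z) else z * std_pdf(z)


def truncate_estimate(mean, var, lower=-math.inf, upper=math.inf):
    """Constrain a scalar Gaussian estimate N(mean, var) to [lower, upper] by PDF truncation:
    return the mean and variance of the corresponding truncated normal distribution."""
    sd = math.sqrt(var)
    alpha, beta = (lower - mean) / sd, (upper - mean) / sd
    z = std_cdf(beta) - std_cdf(alpha)                  # probability mass inside the bounds
    ratio = (std_pdf(alpha) - std_pdf(beta)) / z
    new_mean = mean + sd * ratio
    new_var = var * (1.0 + (z_pdf(alpha) - z_pdf(beta)) / z - ratio ** 2)
    return new_mean, new_var


# Hypothetical unconstrained estimate of a nodal demand that came out negative
m, v = truncate_estimate(-0.5, 1.0, lower=0.0)
print(f"constrained mean={m:.4f}, variance={v:.4f}")
```

The constrained mean is pulled into the feasible region and the variance shrinks, which is the effect a truncation step has on a Kalman estimate that violates a bound.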
Another approach is to modify the normal PDF into a truncated normal PDF that restricts the value range of the state variables. The truncated normal PDF has many applications in Bayesian inference for problems with a truncated parameter space. Robert (1995) proposed an efficient algorithm for simulating unidimensional truncated normal variables, together with a multidimensional extension. Generally, within the feasible region the theoretical truncated normal PDF equals the normal PDF rescaled by the probability mass of that region, so that it integrates to one, and it is equal to 0 outside the feasible region (Burkardt 2014).
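The definition of the theoretical truncated normal PDF can be checked numerically. The sketch below, with hypothetical parameters (mean 2.0, standard deviation 1.0, bounds [0, 5]), evaluates the PDF on [a, b] as the normal PDF divided by Z = Φ(b) − Φ(a) and confirms with a midpoint rule that it integrates to one; the function names are illustrative.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def truncated_normal_pdf(x, mu, sigma, a, b):
    """Theoretical truncated normal PDF on [a, b]: the normal PDF rescaled by the
    probability mass Z = Phi(b) - Phi(a) inside the bounds, and zero outside."""
    if x < a or x > b:
        return 0.0
    z = normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
    return normal_pdf(x, mu, sigma) / z


# Hypothetical nodal demand: mean 2.0, standard deviation 1.0, truncated to [0, 5]
mu, sigma, a, b = 2.0, 1.0, 0.0, 5.0
n = 10_000
dx = (b - a) / n
area = sum(truncated_normal_pdf(a + (i + 0.5) * dx, mu, sigma, a, b) for i in range(n)) * dx
print(f"integral over [a, b] = {area:.6f}")
```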
The theoretical truncated normal PDF has been widely used in sampling-based state estimation (Zhou et al. 2018). However, it is rarely used in analytical methods because the non-differentiable truncation points lead to intricate numerical integration (Robert 1995). Analytical methods require linearization of the system, so the function must be differentiable for first-order gradient (Jacobian matrix) or second-order gradient (Hessian matrix) information to be used. The difficulty in applying the truncated normal PDF in analytical methods is therefore its non-differentiability at the truncation points.
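The non-differentiability can be seen with one-sided difference quotients at the lower truncation point a. In the sketch below (a standard normal truncated to the hypothetical interval [−1, 1]), the right-hand quotient converges to the derivative of the rescaled normal PDF, whereas the left-hand quotient grows like f(a)/h because the PDF jumps from 0 to f(a). A gradient-based method using Jacobian or Hessian information therefore has no well-defined derivative at that point.

```python
import math


def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def truncated_normal_pdf(x, mu=0.0, sigma=1.0, a=-1.0, b=1.0):
    """Theoretical truncated normal PDF: rescaled normal PDF on [a, b], zero outside."""
    if x < a or x > b:
        return 0.0
    z = std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi) * z)


a = -1.0
for h in (1e-1, 1e-3, 1e-5):
    left = (truncated_normal_pdf(a) - truncated_normal_pdf(a - h)) / h    # crosses the jump at a
    right = (truncated_normal_pdf(a + h) - truncated_normal_pdf(a)) / h   # stays inside [a, b]
    print(f"h={h:.0e}  left={left:12.2f}  right={right:8.4f}")
```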
The primary aim of this study is to propose a truncated normal PDF that overcomes the non-differentiability at the truncation points. The truncated normal PDF is then used to describe the prior PDF and likelihood based on probability modeling of the state variable and measurement noise in the truncated space. The prior PDF and likelihood are fused in the data assimilation framework, and analytical solutions for state estimation in non-linear systems are developed. Furthermore, we apply the developed method to estimate nodal water demand in a hypothetical WDS and, indirectly through nodal pressure estimates, in a field WDS. The results show that the method can handle state estimation when the state variable and measurement noise are distributed in a finite region. Moreover, the developed method can also be applied to estimate other parameters in WDS models, such as pipe roughness.
Modeling of truncated normal PDF
Two-side truncated normal PDF
The proposed PDF (red line in Figure 1) has an approximation error relative to the theoretical truncated normal PDF (black line in Figure 1). Importantly, the magnitude of this error is mainly controlled by the parameter . As this parameter decreases, the proposed PDF more closely replicates the theoretical PDF; that is, the smaller the , the smaller the error of the truncated normal PDF. Also notable is that the parameter λ affects the convergence of the state estimation for non-linear systems, for which the recommended range is .
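Because the formula of the proposed PDF is not reproduced here, the following sketch uses a different, assumed smoothing purely to illustrate the trade-off described above: the normal kernel is multiplied by two logistic ramps of hypothetical width `lam`, which gives a PDF that is differentiable everywhere and approaches the theoretical truncated normal PDF as `lam` decreases. The L1 distance to the theoretical PDF is computed on a grid. The parameter values are hypothetical, and this surrogate is not the approximation proposed in this paper.

```python
import numpy as np


def logistic(t):
    """Numerically stable logistic function 1 / (1 + exp(-t)), written via tanh."""
    return 0.5 * (1.0 + np.tanh(0.5 * t))


def smooth_truncated_kernel(x, mu, sigma, a, b, lam):
    """Hypothetical smooth surrogate (unnormalized): normal kernel times logistic ramps at a and b."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) * logistic((x - a) / lam) * logistic((b - x) / lam)


def truncated_kernel(x, mu, sigma, a, b):
    """Theoretical truncated normal kernel (unnormalized): zero outside [a, b]."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) * ((x >= a) & (x <= b))


mu, sigma, a, b = 1.0, 1.0, 0.0, 3.0                    # hypothetical values
x = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 200_001)
dx = x[1] - x[0]


def normalise(f):
    return f / (f.sum() * dx)                           # Riemann-sum normalization on the grid


exact = normalise(truncated_kernel(x, mu, sigma, a, b))
errors = []
for lam in (0.5, 0.1, 0.02):
    approx = normalise(smooth_truncated_kernel(x, mu, sigma, a, b, lam))
    errors.append(np.abs(approx - exact).sum() * dx)    # L1 distance between the two PDFs
    print(f"lam={lam:<5} L1 error={errors[-1]:.4f}")
```

The L1 error shrinks as `lam` decreases, mirroring the role of the smoothing parameter described above. The ramps also become steeper near the truncation points (their maximum slope scales like 1/(4·`lam`)), so gradients there grow as `lam` shrinks.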
One-side truncated normal PDF
Data assimilation via truncated normal PDF
Probability formulation for data assimilation
Equation (11) can be solved by a sampling-based method (Do et al. 2017; Zhou et al. 2018) or an analytical method (Shao et al. 2019; Singh et al. 2022). This paper focuses on the use of analytical methods to estimate the state variable.
Truncated prior PDF and likelihood
Analytical solutions for non-linear system
The objective function
As shown in Equation (20), when the parameter or  approaches the boundary from inside the feasible domain, e.g., [,] for  and [,] for , the objective function increases sharply. Minimizing the objective function therefore effectively confines the variables to the feasible domain. This strategy is similar to the barrier method for solving non-linearly constrained optimization problems, in which  is the barrier parameter that affects the number of iterations and the convergence (Nocedal & Wright 2006).
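The barrier analogy can be illustrated with a one-dimensional log-barrier problem. The sketch below is an assumed stand-in for Equation (20), not its actual form: it minimizes J(x) = (x − μ)²/(2σ²) − λ[ln(x − a) + ln(b − x)] by Newton's method, with backtracking that keeps every iterate strictly inside (a, b) and never increases J. With a hypothetical prior mean μ = −1 outside the feasible interval [0, 5] (for instance, a negative demand), the minimizer stays feasible and approaches the bound a = 0 as the barrier parameter λ decreases.

```python
import math


def barrier_objective(x, mu, sigma, a, b, lam):
    """Quadratic (Gaussian) misfit plus a log-barrier that blows up at the bounds a and b."""
    return 0.5 * ((x - mu) / sigma) ** 2 - lam * (math.log(x - a) + math.log(b - x))


def barrier_minimise(mu, sigma, a, b, lam, tol=1e-12, max_iter=200):
    """Newton's method with backtracking that keeps every iterate strictly inside (a, b)."""
    x = 0.5 * (a + b)                                   # strictly feasible starting point
    for _ in range(max_iter):
        grad = (x - mu) / sigma ** 2 - lam / (x - a) + lam / (b - x)
        hess = 1.0 / sigma ** 2 + lam / (x - a) ** 2 + lam / (b - x) ** 2
        step = -grad / hess                             # descent direction because hess > 0
        f_x = barrier_objective(x, mu, sigma, a, b, lam)
        t = 1.0
        while True:                                     # halve the step until feasible and not uphill
            trial = x + t * step
            if a < trial < b and barrier_objective(trial, mu, sigma, a, b, lam) <= f_x:
                break
            t *= 0.5
        if abs(trial - x) < tol:
            return trial
        x = trial
    return x


# Hypothetical prior mean outside the feasible interval [0, 5]
for lam in (1.0, 0.1, 0.01, 0.001):
    x_star = barrier_minimise(mu=-1.0, sigma=1.0, a=0.0, b=5.0, lam=lam)
    print(f"lambda={lam:<6} x*={x_star:.6f}")
```

For small λ the minimizer here is close to λ itself, since (x − μ)/σ² ≈ λ/x near x = 0 when μ = −1 and σ = 1. This is the behavior that keeps estimates inside the feasible domain while letting them approach a bound when the data push against it.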
APPLICATION TO NODAL WATER DEMAND ESTIMATION
The proposed method is applied to nodal water demand estimation in two water distribution networks: a hypothetical simple network and a large-scale field network located in eastern China. Case study 1 verifies the proposed method on the hypothetical simple network, and Case study 2 evaluates its performance on a real large-scale state estimation problem.
Case study 1: Simple hypothetical network
Random noise is added to the theoretical values to obtain the observed values; the noise is normally distributed with variance R. The variances for the pressure and flow sensors are  and 1 , respectively. The prior nodal water demand is generally predicted from historical data (Chu et al. 2021a). In this case study, we assume that the mean of the prior nodal water demand equals the estimated nodal water demand at the previous time step (). At the first time step, the total water demand is allocated equally to each node as the prior nodal water demand, with . The covariance () is assumed to be constant, with . The truncation point for the prior PDF is computed as , where  is the theoretical value of the nodal water demand. For the pressure sensors, the truncation points for the likelihood are determined by ; for the flow sensors, by , where  is the observed value of the sensor. The constant parameter  = 1, and the maximum number of allowed iterations is .
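The synthetic-observation setup can be sketched as follows. The sensor values, the pressure-noise variance, the number of nodes, and the total demand are hypothetical placeholders. Only the flow-sensor variance of 1, the two pressure and three flow sensors of Step 1, and the equal allocation of total demand at the first time step follow the text. At later time steps the prior mean would simply be the previous step's estimate.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)

# Hypothetical theoretical (noise-free) sensor values
true_pressure = np.array([32.0, 27.5])      # two pressure sensors (hypothetical values)
true_flow = np.array([15.0, 9.0, 4.5])      # three flow sensors (hypothetical values)

var_pressure = 0.01   # hypothetical; the variance used in the case study is not reproduced here
var_flow = 1.0        # flow-sensor variance of 1, as stated in the text

# Observed value = theoretical value + zero-mean normal noise with variance R
obs_pressure = true_pressure + rng.normal(0.0, np.sqrt(var_pressure), true_pressure.size)
obs_flow = true_flow + rng.normal(0.0, np.sqrt(var_flow), true_flow.size)


def first_step_prior(total_demand: float, n_nodes: int) -> np.ndarray:
    """Prior mean at the first time step: total demand allocated equally to every node."""
    return np.full(n_nodes, total_demand / n_nodes)


prior = first_step_prior(total_demand=70.0, n_nodes=7)   # hypothetical network size and demand
print(obs_pressure, obs_flow, prior)
```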
Nodal water demand estimation
Step 1: Set Estimation Parameters
The pressures at nodes 3 and 7 and the flows in pipes , , and  are selected as the measured values. The observed values and their variances are ,,, and . The truncation points for the likelihood are ; for the prior PDF of nodal water demand, . The truncation points for the prior PDF are . . At each time step, the nodal water demand is estimated over several iterations. The prior value of the nodal water demand is used as the value at the first iteration ().
Step 2: Compute the Model Outputs
Step 3: Calculate the Jacobian Matrix
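How the Jacobian is obtained is not specified here. One common, assumed option is a forward-difference approximation, in which each column requires one extra model run with a single nodal demand perturbed. The sketch below treats the hydraulic model as a hypothetical callable `model(demand) -> outputs`; `toy_model` is an illustrative stand-in for a hydraulic solver.

```python
import numpy as np


def jacobian_fd(model, demand, rel_step=1e-6):
    """Forward-difference Jacobian J[i, j] ~ d(output_i) / d(demand_j).
    Each column costs one additional model run with demand j perturbed."""
    demand = np.asarray(demand, dtype=float)
    base = np.asarray(model(demand), dtype=float)
    jac = np.empty((base.size, demand.size))
    for j in range(demand.size):
        step = rel_step * max(1.0, abs(demand[j]))      # scale the step to the demand magnitude
        perturbed = demand.copy()
        perturbed[j] += step
        jac[:, j] = (np.asarray(model(perturbed), dtype=float) - base) / step
    return jac


def toy_model(d):
    """Hypothetical stand-in for a hydraulic solver: one pressure-like and one flow-like output."""
    return np.array([50.0 - 0.02 * (d[0] + d[1] + d[2]) ** 2, 0.5 * d[1] ** 2 + d[2]])


print(jacobian_fd(toy_model, [1.0, 2.0, 3.0]))
```

For a network with many demand nodes, the number of model runs grows with the number of estimated demands, which is why analytical or adjoint-based derivatives are often preferred when the hydraulic solver provides them.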
Step 7: Reach Termination Conditions
The next step is to repeat Steps 2–6 until . The comparison of the nodal water demands and nodal pressures between the theoretical values and the estimated results is then discussed.
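The iteration over Steps 2–7 can be sketched with a standard iterated extended Kalman (Gauss–Newton) measurement update. This is an assumed, untruncated Gaussian baseline that shows only the loop structure (model run, Jacobian, update, and termination on either a maximum number of iterations or a negligible change). It does not include the truncated prior PDF and likelihood of the proposed method. The toy model, its Jacobian, and all numbers are hypothetical.

```python
import numpy as np


def iterated_update(model, jacobian, x_prior, P, y, R, max_iter=20, tol=1e-8):
    """Iterated extended Kalman (Gauss-Newton) update of a Gaussian prior N(x_prior, P)
    with measurements y = model(x) + v, v ~ N(0, R). Returns (estimate, iterations used)."""
    x_prior = np.asarray(x_prior, dtype=float)
    x = x_prior.copy()                                   # first iterate: the prior value
    for it in range(1, max_iter + 1):
        hx = model(x)                                    # model outputs at the current iterate
        H = jacobian(x)                                  # linearization at the current iterate
        S = H @ P @ H.T + R                              # innovation covariance
        K = np.linalg.solve(S, H @ P).T                  # gain K = P H^T S^-1 (P and S symmetric)
        x_new = x_prior + K @ (y - hx - H @ (x_prior - x))
        if np.linalg.norm(x_new - x) < tol:              # termination: negligible change
            return x_new, it
        x = x_new
    return x, max_iter                                   # termination: iteration limit reached


# Hypothetical toy "network": a pressure-like and a flow-like output of two nodal demands
def toy_model(d):
    return np.array([50.0 - 0.02 * (d[0] + d[1]) ** 2, d[0]])


def toy_jacobian(d):
    s = d[0] + d[1]
    return np.array([[-0.04 * s, -0.04 * s], [1.0, 0.0]])


x_prior = np.array([5.0, 5.0])
P = np.diag([4.0, 4.0])
R = np.diag([0.01, 1.0])
y = np.array([48.1, 6.2])
estimate, n_iter = iterated_update(toy_model, toy_jacobian, x_prior, P, y, R)
print(estimate, n_iter)
```

At convergence the estimate satisfies the stationarity condition of the Gaussian maximum a posteriori objective, P⁻¹(x − x_prior) = Hᵀ R⁻¹ (y − h(x)), which is a convenient check when adapting the loop.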
Case study 1: Discussion
Case study 2: Large-scale city network
In contrast, for the method proposed by Shao et al. (2019), the deviations for 54 estimation sensors are within 1 m, while the deviations for 7 sensors are above 1 m and the deviations for 3 sensors are above 2 m, with the largest residual being 3.68 m. In the validation data set, the deviations for 6 sensors are above 1 m, and the deviations for 3 sensors are above 2 m with the largest residual being 3.05 m. Moreover, Shao et al. (2019) estimated 191 negative nodal water demands, while the proposed method does not estimate any negative nodal water demand.
The results of Shao et al. (2019) show that three estimation sensors have deviations greater than 2.5 m and three validation sensors have deviations greater than 2 m, indicating excessive estimation errors (Figure 8). As shown in Figure 7, the sensor locations are highly uneven, with a high sensor density in urban areas and a low density in rural areas. The sensors with excessive deviations are all located in rural areas with low sensor density. As mentioned earlier, demand estimation with a limited number of field measurements is an ill-conditioned problem, and the problem is even more severe where sensors are sparse, which leads to larger estimation errors at these sensors. Additionally, in the Bayesian estimation process, the sensors compete with one another, and the estimated demand is a compromise between them. Sensors that are close together tend to adjust the nodal water demand in a similar direction, making them more competitive. Thus, competitiveness increases in areas with higher sensor density and decreases in areas with lower sensor density, exacerbating the ill-conditioning problem where sensors are sparse and leading to excessive deviations. The truncated likelihood constrains the simulated value to a set range () around the observed value (), thereby mitigating the ill-conditioning problem through the constraints implied by the truncated likelihood.
Accurate estimation of nodal water demand is crucial for modeling and managing water distribution networks. Recent advances have made it possible to estimate all nodal water demands in real time using ubiquitous pressure sensors and limited flow rate measurements (Shao et al. 2019). Nodal demand estimation is often based on an assumed normal PDF for probability modeling in Bayesian inference, which allows random variables to take values over an infinite interval. However, engineering practice and field operations commonly confine the state variables and measurement noise to a limited range. This imprecise mathematical representation can lead to unrealistic estimates of nodal pressure and water demand.
This paper proposes a new nodal water demand estimation approach using truncated normal PDFs. Based on the application results and analysis, the main conclusions are as follows:
The theoretical truncated normal PDF is non-differentiable at the truncation points, a problem that seriously limits the use of analytical methods (e.g., Gaussian approximation). The proposed truncated normal PDFs avoid this problem through a mathematical approximation at the truncation points, which enables analytical solutions.
When network parameters are estimated or calibrated with a limited number of sensors, the problem is ill-conditioned and susceptible to overfitting the noise in the data. The proposed method addresses this issue by constraining the range within which the noise can be fitted. This constraint helps prevent overfitting, mitigates the ill-conditioning problem, and yields more robust and accurate parameter estimates.
When the measurements are biased, the estimated parameters, such as nodal water demand, can take undesirable values, such as negative or excessively large values. The truncated normal PDFs proposed in this paper mitigate this issue by constraining the parameter estimation process within a specified range. This keeps the estimated parameters within reasonable bounds, prevents negative or excessively large values, and improves the accuracy and reliability of the estimation results.
In data assimilation, the deviation of the proposed truncated normal PDF from the theoretical truncated normal PDF is primarily controlled by the parameter . As a result, this parameter affects the number of iterations and the convergence in data assimilation. A fixed value of  is used in this study, which in some cases may increase the number of iterations and reduce computational efficiency. Because the objective function (see Equation (20)) is similar to that of the barrier method for non-linearly constrained optimization problems, adaptively updating the parameter in the presence of non-linearities, as is done in barrier methods (Nocedal & Wright 2006), could be adopted in future research to improve the algorithm's performance.
The proposed method, which uses truncated normal PDFs in a finite region, has been applied to simulate nodal pressure and estimate the corresponding water demand in a hypothetical simple WDS and a field WDS in eastern China. The results clearly show the advantage of the proposed method in avoiding artificial negative nodal water demands and unreasonable errors between the estimated and measured values. This improvement supports WDS simulation and control using real-time network monitoring data. Furthermore, the proposed method can also be used to estimate other hydraulic parameters in WDS models, such as pipe roughness, which can be represented by a truncated normal PDF.
This work was supported by the National Natural Science Foundation of China (Nos. 52270095 and 52200119). As part of the collaborative research, this publication has been cleared through the U.S. EPA administrative and technical review process. The views expressed in this article are those of the authors and do not necessarily represent the views or the policies of the Agency; therefore, no official endorsement should be inferred.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.