Abstract
Metamodels accurately reproduce the output of physics-based hydraulic models with a significant reduction in simulation times. They are widely employed in water distribution system (WDS) analysis, since they enable computationally expensive applications in the design, control, and optimisation of water networks. Recent machine-learning-based metamodels offer improved fidelity and speed; however, they are only applicable to the water network they were trained on. To address this issue, we investigate graph neural networks (GNNs) as metamodels for WDSs. GNNs leverage the networked structure of WDSs by learning shared coefficients, thus offering the potential for transferability. This work evaluates the suitability of GNNs as metamodels for estimating nodal pressures in steady-state EPANET simulations. We first compare the effectiveness of GNN metamodels against that of multi-layer perceptrons (MLPs) on several benchmark WDSs. Then, we explore the transferability of GNNs by training them concurrently on multiple WDSs. For each configuration, we calculate model accuracy and speedup with respect to the original numerical model. GNNs perform similarly to MLPs in terms of accuracy and take longer to execute, but may still provide substantial speedups. Our preliminary results indicate that GNNs can learn shared representations across networks, although assessing the feasibility of truly general metamodels requires further work.
HIGHLIGHTS
The accuracy of GNN-based and MLP-based metamodels is comparable on most of the studied water networks.
The proposed model can be trained on several water networks at once and can learn shared representations across them.
By learning shared representations, the model achieves comparable performance while requiring fewer training examples.
GNNs show promising results in terms of transferability, although further study is required.
INTRODUCTION
Water utilities rely on hydrodynamic models to design and control water distribution systems. These physics-based models, such as EPANET (Rossman 2000), compute the state of the system, i.e., the flow rates and pressures at all the pipes and junctions, by solving the underlying equations of mass and energy conservation. The inputs for these computer programs include the layout of the network and the characteristics and settings of components such as pipes, pumps, valves, and reservoirs, among others. Hydrodynamic models provide valuable insight into the functioning of the system. However, the speed of these models is often insufficient for applications such as optimisation (e.g., Bi & Dandy 2014) or criticality assessment (e.g., Meijer et al. 2021), especially in large search space problems (Maier et al. 2014). One alternative to address this issue is developing surrogate models, also referred to as metamodels.
Metamodels are models that aim to significantly reduce simulation times while still obtaining results comparable to those of the hydrodynamic model. A main family of metamodels is response surface models (Razavi et al. 2012). These surrogate models mimic the input–output relation, i.e., the response surface, of the original physics-based model to obtain results in a fraction of the time while retaining sufficient accuracy. Among the multiple algorithms that can be used for creating these metamodels, artificial neural networks (ANNs) have become increasingly popular due to their high performance and execution speed; previous studies, mainly using ANNs, have shown remarkable gains in computational time (Broad et al. 2005; Martínez et al. 2007; Salomons et al. 2007; Behzadian et al. 2009; Broad et al. 2010).
ANNs are models obtained by stacking parametric functions that take an input x and produce an output ŷ approximating a target y. The parameters of an ANN are learned, i.e., calibrated, by minimising the difference between the predicted and the real targets, measured with a loss function. This calibration is usually performed via backpropagation, i.e., the parameters are updated based on the gradient of the loss function. The process of learning these parameters is referred to as training, and the data employed during training are called the training set. Model performance is then evaluated on a separate dataset, known as the validation set, before the model is tested on unseen data.
Fully connected ANNs, also known as multi-layer perceptrons (MLPs), are arguably the most used metamodels for water distribution systems (WDSs). Even though MLPs can approximate any function (Hornik et al. 1989), the amount of training data they require grows exponentially with the dimensionality of the input, making them unsuitable for high-dimensional data; this issue is known as the 'curse of dimensionality' (LeCun et al. 2015). Furthermore, MLPs require a fixed-size input, and consequently a new model needs to be created whenever the size of the input changes, e.g., when new pipes or junctions are added to a WDS. Thus, they do not overcome a major limitation of traditional metamodels: they are only applicable to the water network they were trained on. As noted by Garzon et al. (2022), this implies that new metamodels must be trained with new sets of simulations to account for multiple networks or structural changes in the original system. This characteristic could discourage the use of metamodels or even make them impractical.
Metamodels can resemble the underlying structure of the problem at hand by including inductive biases, i.e., assumptions or knowledge about the data-generating process, the underlying physical processes, or the space of solutions (Battaglia et al. 2018). This resemblance aids model transfer by exploiting the connectivity of the nodes and the physical information of the components. For metamodels of WDSs, this information includes network connectivity and data such as node elevation, pipe roughness, and pipe length.
Graph neural networks (GNNs) are a recent variant of ANNs that can operate on data lying on graphs – mathematical objects that describe how entities, represented as nodes, are connected to each other via edges. WDSs can be represented as graphs by considering junctions, reservoirs, and storage tanks as nodes, and pipes, pumps, and valves as edges. GNNs can then take the information embedded in a WDS and apply the same linear and non-linear operations used in MLPs. The main difference is that GNNs can have permutation-invariant and equivariant properties, which allow them to handle arbitrarily sized graphs. This inductive bias preserves the additional information embedded in the graph structure, which helps decrease the number of trainable parameters in the metamodel.
GNNs have recently found successful applications in water networks. Hajgató et al. (2021) and Xing & Sela (2022) used GNN models to estimate the pressure state of a WDS based on a few sensors in the network. Bonilla et al. (2022) employed a GNN model to predict pump speeds from pressure and flow measurements in the network. However, no work has yet explored the transferability of GNNs for WDSs, i.e., their ability to learn representations and perform predictions across multiple case studies. Furthermore, to the best of our knowledge, no studies on WDSs have compared the performance of GNNs against that of traditional data-driven alternatives, such as MLPs. In this paper, we take first steps in these directions by developing GNN-based metamodels for estimating nodal pressures on six benchmark WDSs and comparing them against several MLP baselines.
The remainder of the paper is organised as follows. In the methodology, we first describe the data generation procedure and the six case studies used in this work. Next, we present the employed MLP- and GNN-based metamodels and describe the metrics used for assessing their performance. The section also includes a description of the conducted experiments and the adopted setup. The results section presents the comparison between MLP- and GNN-based metamodels on the benchmark WDSs and discusses the results on GNN transferability. The last section concludes the paper.
METHODS
Case studies and data generation
Table 1. Main characteristics of the six benchmark WDSs

| Name | ID | Reference | # nodes | # pipes | # reservoirs |
|---|---|---|---|---|---|
| Fossolo | FOS | Bragalli et al. (2012) | 37 | 58 | 1 |
| BakRyan | BAK | Lee & Lee (2001) | 36 | 58 | 1 |
| Pescara | PES | Bragalli et al. (2012) | 71 | 99 | 3 |
| Modena | MOD | Bragalli et al. (2012) | 272 | 317 | 4 |
| Marchi Rural | RUR | Marchi et al. (2014) | 381 | 476 | 2 |
| KL | KL | Kang & Lansey (2012) | 936 | 1,274 | 1 |
For each of these networks, we employed the WNTR Python package (Klise et al. 2018) to generate a dataset of 10,000 samples, divided into training (8,000), validation (1,000), and test (1,000) subsets. Each sample is created by altering all (i) nodal base demands, (ii) pipe diameters, and (iii) pipe roughness coefficients. The altered values are drawn from distributions that reflect commercial ranges for pipe diameters (0–1.5 m at 2.5 cm increments) and Hazen–Williams roughness coefficients (50–150 at increments of 1), as well as reasonable base demands for the selected case studies (0–100 L/s at increments of 0.1 L/s). To avoid unrealistic configurations leading to very low or very high pressures, the distributions were sampled within a range centred on the original characteristics of each node or pipe. While the distributions of pipe roughness and nodal base demands are uniform within these ranges, the selection of pipe diameters is biased towards larger values to reduce the possibility of infeasible setups (e.g., those yielding failed simulations).
The described alterations ensured sufficient variability in the nodal pressures obtained via WNTR pressure-driven simulations. When running the simulations, we kept all other network characteristics constant, including network connectivity, geographical coordinates, elevation, pipe lengths, and boundary conditions (i.e., total head of the reservoirs).
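As an illustration, the snippet below sketches how such a snapshot dataset can be generated with WNTR. The perturbation factor for demands (0.5–1.5×), the diameter bias (same size or up to two sizes larger), and the helper name `sample_snapshot` are assumptions made for this sketch; the exact sampling scheme used in the experiments may differ.

```python
# Minimal sketch of the data generation loop, assuming a WNTR-compatible .inp file.
import numpy as np
import wntr

rng = np.random.default_rng(42)
DIAMETERS = np.arange(0.025, 1.525, 0.025)   # commercial diameters, m (2.5 cm steps)

def sample_snapshot(inp_file):
    wn = wntr.network.WaterNetworkModel(inp_file)      # reload to reset the original network
    for name in wn.junction_name_list:
        junc = wn.get_node(name)
        base = junc.demand_timeseries_list[0].base_value
        # uniform sampling within a range centred on the original demand (assumed factor)
        junc.demand_timeseries_list[0].base_value = base * rng.uniform(0.5, 1.5)
    for name in wn.pipe_name_list:
        pipe = wn.get_link(name)
        # pick a commercial diameter near the original, biased towards larger values
        idx = int(np.abs(DIAMETERS - pipe.diameter).argmin())
        pipe.diameter = float(DIAMETERS[min(idx + rng.integers(0, 3), len(DIAMETERS) - 1)])
        pipe.roughness = float(rng.integers(50, 151))  # Hazen-Williams C, uniform in 50-150
    wn.options.hydraulic.demand_model = 'PDD'          # pressure-driven simulation
    results = wntr.sim.WNTRSimulator(wn).run_sim()
    return results.node['pressure'].iloc[0]            # steady-state nodal pressures
```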
ANN-based metamodels
Contrary to GNNs, which can learn shared representations within the same WDS and across WDSs by exploiting topological and 'static' features, MLPs do not benefit from constant inputs (e.g., elevations, pipe lengths); adding them actually hinders the training process. Training is carried out by means of gradient descent algorithms minimising a loss function; for the nodal regression problem entailed in our metamodelling approach, the loss is the mean squared error (MSE) between the pressure values of the WNTR simulations and those predicted by the MLP, averaged over the entire training dataset.
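A minimal PyTorch sketch of such an MLP metamodel and its MSE loss is given below. The class name, default sizes, and the layout of the flattened input vector are illustrative assumptions; layer widths are drawn from the ranges later reported in Table 3.

```python
import torch
import torch.nn as nn

class MLPMetamodel(nn.Module):
    """Fully connected metamodel mapping flattened WDS inputs to nodal pressures."""
    def __init__(self, n_in, n_out, hidden=256, n_layers=3, dropout=0.25):
        super().__init__()
        layers, d = [], n_in
        for _ in range(n_layers):
            layers += [nn.Linear(d, hidden), nn.ReLU(), nn.Dropout(dropout)]
            d = hidden
        layers.append(nn.Linear(d, n_out))   # one pressure estimate per junction
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# FOS-sized example: 37 demands + 58 diameters + 58 roughness coefficients
# flattened into one input vector (our assumption of the input layout)
model = MLPMetamodel(n_in=153, n_out=37)
# MSE between simulated (here random placeholder) and predicted pressures
mse = nn.MSELoss()(model(torch.randn(128, 153)), torch.randn(128, 37))
```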
GNN-based metamodels
Metamodel performance
Experimental setup
We run two main sets of experiments aimed at (i) assessing the performance of GNN metamodels trained on individual WDSs and (ii) exploring the advantages of a transferable GNN metamodel trained on datasets featuring samples from all WDSs. We compare the results against those of MLP-based metamodels trained on each WDS separately.
The comparison of MLP- vs GNN-based metamodels on individual WDSs is carried out by training the models on all 8,000 samples available for each water network. To facilitate the training process, all variables are scaled using log transformations (elevation, pressure, and pipe length) or min–max scaling (all other features). The scaling parameters derived from the training dataset are applied to the validation and test datasets. We substitute the reservoirs' elevation (set to 0 by default in EPANET solvers) with their base hydraulic head, and replace the now-redundant base head feature with a Boolean flag discriminating reservoirs from junctions. All GNN datasets thus have three features per node (elevation + base head, base demand, node Boolean flag) and three per edge (pipe diameter, pipe length, pipe roughness).
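The scaling step can be sketched as follows. Whether the log-transformed features are additionally min-max scaled is not specified, so here they are only log-transformed; the function names are illustrative.

```python
import numpy as np

class MinMaxScaler:
    """Min-max scaler whose parameters are fitted on the training split only."""
    def fit(self, x):
        self.lo, self.hi = x.min(axis=0), x.max(axis=0)
        return self
    def transform(self, x):
        return (x - self.lo) / (self.hi - self.lo + 1e-12)

def scale_features(train, other, log_cols):
    """Log-transform the indicated columns; min-max scale the remaining ones
    using parameters derived from the training split."""
    train, other = train.astype(float).copy(), other.astype(float).copy()
    mm_cols = [c for c in range(train.shape[1]) if c not in log_cols]
    train[:, log_cols] = np.log1p(train[:, log_cols])
    other[:, log_cols] = np.log1p(other[:, log_cols])
    scaler = MinMaxScaler().fit(train[:, mm_cols])
    train[:, mm_cols] = scaler.transform(train[:, mm_cols])
    other[:, mm_cols] = scaler.transform(other[:, mm_cols])
    return train, other
```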
The study on GNN transferability entails three separate training datasets built using samples from all WDSs, as shown in Table 2. The first dataset contains 1,024 samples for each WDS, for a total of 6,144 data points. Since the GNN loss function is calculated per node, this dataset is unbalanced, as it overrepresents larger networks. To balance the dataset, we undersample the larger WDSs by including a number of data points that is inversely proportional to the number of nodes in the WDS, relative to the smallest systems (FOS, BAK). Using this strategy, we create two extra datasets, named balanced and balanced extended, with 5,722 and 22,350 data points, respectively (the undersampling rule is sketched after Table 2). All training datasets in Table 2 are normalised using the same procedure described above for the individual WDS datasets, but with extreme values computed across all WDSs. Similarly, the derived scaling parameters are then applied to normalise the validation and test datasets, which now consist of data from all WDSs.
Table 2. Number of training samples per WDS in the multi-network datasets

| Name | FOS | BAK | PES | MOD | RUR | KL | Total |
|---|---|---|---|---|---|---|---|
| Unbalanced | 1,024 | 1,024 | 1,024 | 1,024 | 1,024 | 1,024 | 6,144 |
| Balanced | 2,048 | 2,048 | 1,067 | 279 | 199 | 81 | 5,722 |
| Balanced extended | 8,000 | 8,000 | 4,169 | 1,088 | 777 | 316 | 22,350 |
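The undersampling rule can be reconstructed from Table 2: each WDS contributes a number of samples inversely proportional to its node count relative to FOS (37 nodes), capped at a per-network budget. The sketch below reproduces the balanced (budget 2,048) and balanced extended (budget 8,000) rows within rounding; the exact procedure used by the authors may differ slightly.

```python
n_nodes = {'FOS': 37, 'BAK': 36, 'PES': 71, 'MOD': 272, 'RUR': 381, 'KL': 936}

def balanced_counts(budget, ref_nodes=37):
    """budget: max samples per WDS (2,048 -> 'balanced', 8,000 -> 'balanced extended')."""
    return {wds: min(budget, round(budget * ref_nodes / n))
            for wds, n in n_nodes.items()}

print(balanced_counts(2048))
# {'FOS': 2048, 'BAK': 2048, 'PES': 1067, 'MOD': 279, 'RUR': 199, 'KL': 81}
```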
Hyperparameter search
We test MLP architectures of different complexity by changing the number of hidden layers, as well as the number of units in each layer. In general, a larger network performs better but requires additional data and computational power. We also employ dropout layers (Srivastava et al. 2014) after each fully connected layer to improve model generalisation. All activation functions are rectified linear units (ReLUs). Table 3 shows the hyperparameter ranges selected for the MLPs, yielding 36 potential combinations; one way of enumerating this grid is sketched after the table. The upper limits of the hyperparameters are selected based on similar works in the water distribution system domain (Martínez et al. 2007; Hajgató et al. 2021).
Table 3. Hyperparameter ranges for the MLP metamodels

| Hyperparameter | Range |
|---|---|
| Number of hidden layers | 1, 2, 3, 4 |
| Hidden layers dimension | 64, 128, 256 |
| Dropout rate | 0, 0.1, 0.25 |
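One way of enumerating the exhaustive grid is shown below; the exact search code is not specified by the source.

```python
from itertools import product

# 4 x 3 x 3 = 36 MLP configurations (Table 3); the GNN grid of Table 4 is
# enumerated analogously (2 x 3 x 2 x 2 = 24 combinations).
mlp_grid = [
    {'n_layers': n, 'hidden': h, 'dropout': p}
    for n, h, p in product([1, 2, 3, 4], [64, 128, 256], [0.0, 0.1, 0.25])
]
assert len(mlp_grid) == 36
```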
Table 4 shows the values chosen for the optimisation of the GNN hyperparameters, yielding 24 possible combinations. The hyperparameters include the embedding dimension of the shared preprocessing MLP, the number of graph convolutional (ChebNet) layers after the preprocessing MLP, the number of output channels of the graph convolutional layers (i.e., the number of hidden units), and the maximum K-hop neighbourhood considered by the GNN. As for the MLP metamodel described before, all activation functions in the GNN are ReLUs; a minimal sketch of this architecture follows Table 4. No hyperparameter tuning is performed for the transferability experiments, where we use the largest possible GNN, with 64 embedding dimensions, 3 ChebNet layers, 128 hidden output channels, and K = 6.
Table 4. Hyperparameter ranges for the GNN metamodels

| Hyperparameter | Range |
|---|---|
| Embedding dimension | 32, 64 |
| Number of convolutional layers | 1, 2, 3 |
| Hidden layers dimension | 64, 128 |
| K-hop neighbourhood | 3, 6 |
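A minimal PyTorch Geometric sketch of this architecture is given below. Since ChebConv accepts only a scalar weight per edge, the sketch collapses the three edge features to a scalar via a small MLP; this choice, along with the class and layer names, is our assumption rather than the authors' exact design.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import ChebConv

class GNNMetamodel(nn.Module):
    def __init__(self, node_in=3, edge_in=3, emb=64, hidden=128, n_conv=3, K=6):
        super().__init__()
        self.node_emb = nn.Sequential(nn.Linear(node_in, emb), nn.ReLU())  # shared preprocessing MLP
        self.edge_emb = nn.Sequential(nn.Linear(edge_in, emb), nn.ReLU(),
                                      nn.Linear(emb, 1))                   # -> scalar edge weight (assumed)
        convs, d = [], emb
        for _ in range(n_conv):
            convs.append(ChebConv(d, hidden, K=K))    # K-hop Chebyshev convolution
            d = hidden
        self.convs = nn.ModuleList(convs)
        self.readout = nn.Linear(d, 1)                # one pressure estimate per node

    def forward(self, x, edge_index, edge_attr):
        w = self.edge_emb(edge_attr).squeeze(-1)
        h = self.node_emb(x)
        for conv in self.convs:
            h = torch.relu(conv(h, edge_index, edge_weight=w))
        return self.readout(h).squeeze(-1)
```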
We run all the experiments using the PyTorch library (Paszke et al. 2019) for the MLP models and PyTorch Geometric (Fey & Lenssen 2019) for the GNN models. We used the libraries' default weight initialisation methods (Glorot & Bengio 2010) and fixed the random seeds. Each metamodel was trained using the Adam optimisation algorithm with a constant learning rate of 0.001, no weight decay, and the default selection of parameters. Training was carried out for 30 epochs with no early stopping and a batch size of 128. In terms of hardware, we employed a Xeon W-10855M @ 2.8 GHz CPU and an NVIDIA Quadro RTX 5000 GPU with 16 GB of memory.
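A training loop reproducing the reported settings (Adam, learning rate 0.001, no weight decay, 30 epochs, batch size 128) might look as follows; the `DataLoader` is the PyTorch Geometric one (PyG ≥ 2.0), and the dataset and model objects are placeholders.

```python
import torch
from torch_geometric.loader import DataLoader

def train(model, train_set, epochs=30, lr=1e-3, batch_size=128):
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)    # no weight decay, defaults otherwise
    loss_fn = torch.nn.MSELoss()
    for epoch in range(epochs):                          # no early stopping
        model.train()
        for batch in loader:
            opt.zero_grad()
            pred = model(batch.x, batch.edge_index, batch.edge_attr)
            loss = loss_fn(pred, batch.y)                # per-node pressure MSE
            loss.backward()
            opt.step()
    return model
```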
RESULTS AND DISCUSSION
In this section, we first compare the MLP- and GNN-based metamodels in terms of performance and execution time, one WDS at a time. Then, we assess the transferability of GNNs trained on the combined datasets of Table 2.
MLP vs GNN
Table 5 reports the best configurations of the metamodels for each WDS, selected based on the validation R² after considering all hyperparameter combinations described in the previous section. From the comparison of the test R², it emerges that MLPs outperform GNNs on all benchmark datasets apart from BAK, where the performances are almost identical, and FOS, where the GNN largely outperforms the MLP.
Table 5. Best hyperparameter configurations and R² scores of the metamodels for each WDS

| Model | | FOS | BAK | PES | MOD | RUR | KL |
|---|---|---|---|---|---|---|---|
| MLP | # hidden units | 256 | 128 | 256 | 256 | 256 | 64 |
| | # hidden layers | 4 | 3 | 3 | 2 | 2 | 2 |
| | Dropout | 0 | 0.25 | 0.25 | 0 | 0 | 0.25 |
| | R² validation | 0.364 | 0.991 | 0.570 | 0.859 | 0.944 | 0.472 |
| | R² test | 0.360 | 0.993 | 0.561 | 0.868 | 0.929 | 0.482 |
| GNN | Embedding dimension | 32 | 32 | 64 | 32 | 32 | 64 |
| | # conv. layers | 3 | 2 | 3 | 3 | 3 | 3 |
| | # hidden units | 64 | 128 | 128 | 128 | 128 | 128 |
| | K-hop neigh. | 6 | 3 | 6 | 6 | 6 | 6 |
| | R² validation | 0.748 | 0.991 | 0.496 | 0.759 | 0.924 | 0.463 |
| | R² test | 0.815 | 0.993 | 0.445 | 0.763 | 0.906 | 0.468 |
Table 6 presents the comparison also in terms of RMSE and computational speedups, along with the total number of parameters of the best metamodels (the speedup computation is sketched after the table). While the RMSE follows the same trends described for R², it better conveys the average error, in meters, of the nodal pressure estimates. As expected, the best GNNs are usually smaller than the MLPs in terms of parameters, especially for the FOS and RUR case studies, where the GNN achieves better or similar performance. Nevertheless, MLPs are faster, granting execution speedups of three orders of magnitude with respect to WNTR simulations. Similarly, their training time is between one and two orders of magnitude smaller than that of GNNs. That said, the GNNs provide substantial speedups of up to roughly 70× that may justify their utilisation. Furthermore, this gap will likely shrink as optimised GPU implementations of these relatively novel techniques become mainstream.
Table 6. Test R², RMSE, speedup over WNTR, and number of parameters of the best metamodels

| WDS | MLP R² | MLP RMSE (m) | MLP speedup | MLP # params (10³) | GNN R² | GNN RMSE (m) | GNN speedup | GNN # params (10³) |
|---|---|---|---|---|---|---|---|---|
| FOS | 0.379 | 3.38 | 879 | 200 | 0.815 | 1.84 | 71 | 60 |
| BAK | 0.993 | 0.65 | 1,393 | 50 | 0.993 | 0.65 | 56 | 60 |
| PES | 0.561 | 6.76 | 1,241 | 200 | 0.445 | 7.60 | 43 | 200 |
| MOD | 0.868 | 1.22 | 2,223 | 300 | 0.763 | 1.63 | 24 | 200 |
| RUR | 0.929 | 1.27 | 2,029 | 500 | 0.906 | 1.47 | 27 | 200 |
| KL | 0.482 | 6.11 | 4,001 | 300 | 0.468 | 6.19 | 22 | 200 |
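The exact timing protocol is not reported; one plausible way the speedup can be computed is as the ratio of wall-clock times over the same test samples, as sketched below.

```python
import time

def speedup(run_wntr_sims, run_metamodel):
    """Both arguments are zero-argument callables evaluating the full test set."""
    t0 = time.perf_counter(); run_wntr_sims();  t_sim = time.perf_counter() - t0
    t0 = time.perf_counter(); run_metamodel(); t_meta = time.perf_counter() - t0
    return t_sim / t_meta   # e.g., ~879 for the FOS MLP, ~71 for the FOS GNN
```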
Transferability of GNNs
These results also indicate that GNNs may exploit information learned on some case studies for predictions elsewhere. This particularly emerges when considering the performance on RUR and KL of the transferable GNN trained on the balanced extended dataset. This GNN performs on par with the one trained on the unbalanced dataset despite having only around 75 and 30% of the training samples for these two WDSs, respectively (see Table 2), and it only slightly underperforms the best MLPs and GNNs trained individually on 8,000 samples. While the training times of the transferable GNNs grow with the size of the training dataset, the model retains execution speedups of the order of those reported in Table 6 for the individual case studies.
CONCLUSION
In this work, we assessed the performance and transferability properties of GNNs used for metamodelling the pressure response surface of six benchmark WDSs. After generating a large sample of steady-state (snapshot) simulations with WNTR, we first compared multiple configurations of GNNs trained on individual WDSs against MLPs. The results indicate that, while MLPs tend to slightly outperform GNNs in most case studies, there is partial evidence that GNNs may be inherently better architectures for some WDSs. Despite requiring fewer trainable parameters to achieve comparable goodness-of-fit, GNNs are substantially slower than MLPs. At the current stage, the direct applicability of both GNNs and MLPs in downstream tasks might be limited for some WDSs. For the other cases, however, they still provide consistent speedups that could justify their use.
We assessed the transferability property of GNNs by training a single model on datasets with samples from all WDSs. Testing results for the larger WDSs suggest that a general GNN may perform comparably to the best MLP and GNN models while requiring substantially less training data for these case studies. These initial findings may indicate that GNNs can indeed learn shared representations across different water networks. However, the performance drop witnessed for two of the six networks implies that substantial efforts are required to design adequate datasets and test the general validity of this approach.
This exploratory study considered only a limited set of GNN architectures, consisting of ChebNet layers with a shared MLP for embedding nodal and edge features. Future studies should assess the effects of other graph layers and GNN paradigms on individual WDS performance (Wu et al. 2021) and transferability (Ruiz et al. 2020). Additionally, the effect of each hyperparameter could be investigated in more detail. Furthermore, we aim to investigate whether better transferability can be achieved by including more variability in the data generation process, for example, by resorting to a large number of randomly generated WDSs (Sitzenfrei 2016). This could include more complex techniques for sampling the pipe parameters that result in broader pressure distributions in the dataset. New research could also consider the equivalences that occur across different settings, such as when different pipe parameters in consecutive pipes result in the same headloss; these equivalences could be used in a self-supervised setting (Xie et al. 2022). Similarly, we aim to extend this work by considering surrogates of extended-period hydraulic analyses, rather than steady-state simulations.
ACKNOWLEDGEMENTS
The authors acknowledge the financial support through the NTNU Green2050 Centre for Green Shift in the Built Environment and the TU Delft AI Labs programme.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories at https://github.com/rtaormina/GNN_metamodels_wds.
CONFLICT OF INTEREST
The authors declare there is no conflict.
REFERENCES
Author notes
Equal contribution.