The partitioning of the urban water supply network can significantly enhance water supply quality. Nonetheless, the bulk of the recently deployed partition approaches overlooked the question of whether the district's fluctuation regulation of flow data is consistent. When the district is modified, it most likely leads to an increase in pressure at a node. To tackle the problem, the flow data from a city's water supply network was evaluated in this article. The random forest approach was also used to extract time-domain characteristics from flow data, and the water supply network split was optimized using the random forest-hierarchical clustering (RF-HC) strategy. Finally, the results were examined and compared. The results suggest that the RF-HC-based water supply network partition technique can better meet the aim of consistent flow changes in the district, as well as offer a theoretical foundation and technological support for the optimal dispatch of press concerning the water supply network.

  • This paper realized the variables of the degree of flow change in single day.

  • An optimized hierarchical clustering algorithm based on random forest is proposed.

  • Take actual coordinate information as calculation parameters.

  • The result could be helpful with the optimal dispatch of press.

District metering area (DMA) was founded on the partition method of an urban water supply pipe network system. Its primary application is to split the complete water supply pipeline network into various autonomous communities. It can detect and reduce leaks in urban water supply systems. Further study on the partition technique has also piqued the interest of numerous specialists and scholars in recent years.

An early DMA approach was the partition method based on experience. Geographical circumstances and regional functions are the main partitions in this strategy. The approach is straightforward, but it ignores the real hydraulic conditions (Wenqi et al. 2012; Bin et al. 2018).

The partition technique, which is based on the community structure approach, considers several comparable nodes in the pipe network as a group when calculating the optimal way to split. Diao K et al. advocated governing the water supply nodes as a community and calculating the border density inside the community as well as the boundary density between communities (Diao et al. 2013).

The partition processing of the water supply network, based on graph theory, may effectively portray the geographic information link among the nodes to a certain extent. Alvisi et al. proposed a method for obtaining DMA partitions of a specified size by using the depth-first search algorithm in graph theory (Alvisi and Franchini 2014); Liu proposed combining the shortest path method in graph theory with geographic information systems to calculate the optimal partition boundary (Jun & Guoping 2012). Both of these approaches essentially address the growth of cross-regional partitioning. At the same time, it decreases the overall energy consumption of the pipeline network to some amount.

The partition approach, which is based on multi-objective optimization, can handle multi-objective issues that arise during the real operation of the water supply network. Paola et al. devised a partition technique that optimizes the uniformity of the water supply in the partition and the pressure differential between nodes in the partition (Paola et al. 2014).

The final optimization target for the majority of the partition methods listed above is static water flow and pressure homogeneity in the pipe network. This overlooked the variance in regulation between nodes in the same district. When the booster pump in this district was modified, this resulted in varied flow and pressure fluctuations.

The research proposes replacing traditional flow data with variation pattern and time-domain flow characteristics based on DMA partitioning, and then using random forest to improve hierarchical clustering (HC) to accomplish partition planning. Unlike the traditional flow data-based method, the patterns of change and time-domain features may guarantee that the flow rates of nodes within the same partition have comparable patterns of change as much as feasible, which can ultimately assist to the optimized scheduling of the pipe network

It should be mentioned that because the experimental environment of this work comes from a demonstration project in the city of HC. Because some of nodes have been abandoned in later projects, there are now a low number of nodes. This leads a to the unfavorable usage of random forests. However, even after the addition of additional nodes, this strategy is still relevant to the water supply network following node expansion.

The flow data of the urban water supply network expresses the water demand of consumers intuitively. The pipe network's districts might be split based on the characteristics of this data. The goal is to ensure that each node in the same district has a similarly changing flow data control. This research introduces the time-domain properties of the extracted flow data using the random forest technique, as well as the degree of change of a single node in a set period of time, as parameters to complete the partition of the water supply network, based on the HC algorithm. Figure 1 depicts the whole algorithm's flow.

Figure 1

Overall algorithm flow chart.

Figure 1

Overall algorithm flow chart.

Close modal

The following are the procedures for partitioning the water supply network using the random forest-hierarchical clustering (RF-HC) method.

  • (1)

    Partitioning preliminary. To analyze the original data, HC is employed, and the nodes with the least distance between classes are computed to accomplish clustering of nodes with the smallest difference in traffic between each other, resulting in the theoretically ideal partition state.

  • (2)

    Extraction of features. The major goal of this article is to use variation in flow data as the foundation for zoning calculations instead of flow data. The article provides a new parameter, , to capture the degree of pressure change over the day as well as the time-domain properties of the pressure data for this purpose. Simultaneously, to ensure the partitioning's practicability, the location parameter is incorporated as a constraint, i.e. location parameter 1 is given a weight in the optimized hierarchical clustering process to magnify its importance in the clustering process.

  • (3)

    Partitioning schemes were created using RF-HC. The random forest technique is utilized to filter the time-domain features, with the theoretically best partition of each node serving as its node label. The filtered features are then time-domain random forested again to establish the significance parameters. The importance parameters are used as an optimized HC strategy to run clustering computations and give an optimized partitioning scheme.

The flow data used in the research was acquired by pressure and flow sensors installed in a city's water supply network monitoring system.

The topological diagram of the 17 monitoring nodes involved in this paper is shown in Figure 2.

Figure 2

Topological structure diagram of monitoring node.

Figure 2

Topological structure diagram of monitoring node.

Close modal

Partition of water supply network based on hierarchical clustering algorithm

The HC method creates a hierarchical clustering tree by assessing the similarity between data points belonging to distinct groups (Bishop and Tipping 1998; Tong 2006). Cluster trees are often created in two modes: top-down split and bottom-up merging. The former is uncommon in our cluster analysis.

The method is based on the actual data gathered by the urban water supply network information monitoring system. The flow data of the 17 nodes in the water supply network within a week has been gathered and clustered in this paper.

The clustering process in this article is as follows.

  • (1)

    Treat the flow data set of a single node as a cluster, and generate the initial 17 clusters.

  • (2)

    Calculate the various clusters, the distance matrix between two different clusters, and calculate the clusters that are closest to each other.

  • (3)

    Merge and to form a new cluster.

  • (4)

    Repeat step 2 and step 3 until a cluster containing all nodes is formed, and a clustering tree is generated at the same time.

The gathered flow data of 17 nodes in a week was used as the input of the HC method. The node data collection interval is one hour, so we got a 168*17 data set. The Ward's distance was used as the clustering algorithm for calculating the distance between clusters. For two different classes ():
(1)
(2)
(3)

In the formula, and are respectively the sum of squared deviations of () and the union of (). in the formula is the mean value of and the union of .

Calculate the distance between clusters as:
(4)

The basis is that when the clustering effect is good enough, there must be a small value in the cluster, and the distance within the cluster is small enough. At the same time, the distance among the clusters must be large enough, that is, the size of is big enough. In this way, if the calculated value is large enough, a theoretically optimal global partitioning method based on flow data can be obtained.

The premise is that when the clustering effect is strong enough, there must be a small value in the cluster and a small distance inside the cluster. At the same time, the distance between the clusters must be great enough, that is, must be large enough. If the estimated number is big enough, a theoretically optimum global partitioning technique based on flow data can be derived in this manner.

Reduces the dimension of coordinate information by principal component analysis

To solve the problem of the overlap of district, two parameters were given to each node. Which describes the coordinate information of each node on the map, and define a new array . To facilitate the processing of HC, the principal component analysis (PCA) method (Li et al. 1999) was used to reduce the two-dimensional location data to one-dimensional and generate a new location label .

In this paper, the PCA algorithm was implemented based on the eigenvalue decomposition covariance matrix. The main process is as follows:

  • (1)
    Remove the mean:
    (5)

is the newly generated de-averaged coordinate information; is the original coordinate information; is the mean value of . Than get a new matrix .

  • (2)
    Calculate the covariance matrix:
    (6)

n is the number of samples, which means the number of nodes, and C is a symmetric matrix.

  • (3)
    Calculate the covariance eigenvalues and eigenvectors:
    (7)

Q is a matrix composed of eigenvectors of matrix C, and Σ is a matrix whose diagonal elements are eigenvalues.

  • (4)

    Sort the eigenvalues in descending order. Select the larger eigenvalue and the corresponding eigenvector, and defined the corresponding eigenvector as .

Time-domain features and the change degree of the daily flow data

Extract the features of the gathered flow data. Because the flow data was given in discrete time series, the following time-domain properties should be extracted first: mean, standard deviation, variance, effective value, peak value, crest factor, and kurtosis. Table 1 shows the feature parameters for each node.

Table 1

Time-domain features of flow data

Node2Node3Node4Node18
Mean 0.876 12.988 215.276  504.069 
Standard deviation 13.728 9.062 73.922  129.773 
Variance 188.444 82.123 5,464.422  16,840.994 
Root mean square 13.727 15.826 227.564  520.438 
Peak value 98.474 36.381 379.623  636.057 
Crest factor 7.174 2.299 1.668  1.222 
Kurtosis 5.722 2.519 2.771  2.459 
Form factor 1.602 1.219 1.057  1.032 
Node2Node3Node4Node18
Mean 0.876 12.988 215.276  504.069 
Standard deviation 13.728 9.062 73.922  129.773 
Variance 188.444 82.123 5,464.422  16,840.994 
Root mean square 13.727 15.826 227.564  520.438 
Peak value 98.474 36.381 379.623  636.057 
Crest factor 7.174 2.299 1.668  1.222 
Kurtosis 5.722 2.519 2.771  2.459 
Form factor 1.602 1.219 1.057  1.032 

A new data set can be formed by the integrated of mean, standard deviation, variance, root mean square, peak value, crest factor, kurtosis, and form factor.

Simultaneously, the single-day change degree of the flow data regarding water has been proposed, taking into consideration the problem of property in the everyday life of the selected district. It reflects the amount of variation in the daily flow of water. Meanwhile, defined r as the threshold value of the ratio of flow rate data change one hour after a specific point to the flow rate at that point. It is assumed that the starting value of is 0. The calculating procedure for for each node's flow data is as follows.
(8)
Prior to this, some nodes had ‘0’ traffic data when, or they had flow data that was excessively big. Because this condition occurs in real life, this data is saved, and the flow data is normalized.
(9)

Here, is the original flow data, is the mean value of the raw flow data, is the standard deviation, and is the normalized flow data.

Some of , when r = 0.2 are shown in Table 2.

Table 2

of each node in a week

Node2Node3Node4Node5Node17Node18
Day1 −9 −8 −6 −2  −8 −5 
Day2 −5 −5 −4 −1  −9 −5 
        
Day7 −6 −4 −0 −5  −17 −1 
Node2Node3Node4Node5Node17Node18
Day1 −9 −8 −6 −2  −8 −5 
Day2 −5 −5 −4 −1  −9 −5 
        
Day7 −6 −4 −0 −5  −17 −1 

The size of the threshold r has a distinct influence on the partition outcome. The value of r directly influences the value of , which represents the degree of flow change and hence influences the ultimate clustering outcome.

Random forest algorithm to screen time-domain features

Because the Ward's distance algorithm handled all characteristics equally. This would imply that the distance between two clusters might be magnified. To increase clustering reliability, the random forest technique (Ho 1995, 1998; Fang et al. 2011) was employed to filter time-domain characteristics. The less significant features would be deleted, and various weights would be assigned to different features. The and features, on the other hand, will not be checked; only the time-domain features have been examined. Because they were both more essential than the time-domain elements for the outcome.

Prior to calculation, the above data should be standardized:
(10)

In the formula, is the time-domain feature parameters of each node, is the original mean value of each parameter of the node, is the original standard deviation, and is the normalized parameter.

The classification label of each node is based on the previous global optimal partition (), and the district is marked with 0, 1, 2, and 3 in turn.

The construction process of random forest in this article:

  • (1)

    From the training set composed of 12 nodes, there are 12 samples that are replaced to form a new training subset .

  • (2)

    For , three features are extracted from the eight-times domain features without replacement, and the selected features are used as the split features of the nodes on the decision tree.

  • (3)

    Repeat n times of step 1 and step 2 to form n subsets , and form n decision trees, and combine these decision trees to form a random forest.

  • (4)

    Input the remaining 5 node data as the test set into the random forest, repeat the classification 5 times, and calculate the importance of each feature.

In this paper, the Gini index is used as the basis for splitting node in step 2. The calculation formula here is:
(11)

The parameter is the probability that the sample point belongs to the four categories 0,1,2,3.

The results obtained are shown in Table 3.

Table 3

Importance rate of time-domain features of flow data

ParameterImportance rate
Peak value 0.183467 
variance 0.180575 
Mean 0.146953 
Form factor 0.123266 
Crest factor 0.103743 
Root mean square 0.093229 
Kurtosis 0.084399 
Standard deviation 0.084368 
ParameterImportance rate
Peak value 0.183467 
variance 0.180575 
Mean 0.146953 
Form factor 0.123266 
Crest factor 0.103743 
Root mean square 0.093229 
Kurtosis 0.084399 
Standard deviation 0.084368 

Parameters with a significance rate of less than 0.1 are eliminated. Peak value, variance, mean, form factor, and crest factor are more important in flow data grouping.

Further, Random Forest processing has been used again for peak value, variance, mean value, form factor, and crest factor. The weights of the time-domain characteristics were used to determine the new significance rate of peak value, variance, mean, form factor, and crest factor.

Partition results based on hierarchical clustering

The following is the outcome of a computation concerning the similarity of flow data collected by various nodes.

The flow data collected by the 17 nodes was analyzed using the HC technique. Figure 3 depicts the clustering findings achieved.

Figure 3

Clustering dendrogram based on flow data.

Figure 3

Clustering dendrogram based on flow data.

Close modal

Based on the clustering findings, it is possible to deduce that the city's water supply region, which consists of 17 monitoring nodes, may be separated into four districts: . When the distance is less than 3,200, the nodes in each district are as given in Table 4.

Table 4

Hierarchical clustering and partitioning results based on flow data

DistrictNode
 Node2 Node13 Node3 Node14 Node12 Node11 Node7 
 Node15 Node17 
 Node4 Node8 Node9 Node5 Node16 
 Node6 Node10 Node18 
DistrictNode
 Node2 Node13 Node3 Node14 Node12 Node11 Node7 
 Node15 Node17 
 Node4 Node8 Node9 Node5 Node16 
 Node6 Node10 Node18 

The flow curve diagram of node in each district is shown in Figure 4(a)–4(d).

Figure 4

Flow curve diagram of each district (a) Flow curve diagram of district (b) Flow curve diagram of district (c) Flow curve diagram of district (d) Flow curve diagram of district.

Figure 4

Flow curve diagram of each district (a) Flow curve diagram of district (b) Flow curve diagram of district (c) Flow curve diagram of district (d) Flow curve diagram of district.

Close modal

Analyzing Figure 4, we can observe that the nodes in Figure 4(c) and 4(d) have relatively comparable alterations, although there are some inaccuracies in Figure (a) (b). To some extent, the partition approach based on flow data has achieved the aim of global optimization, however there are several nodes spread throughout a large geographic range in the same district.

Partition results of the RF-HC method

The time-domain features of each node (peak value, variance, mean, form factor, crest factor), , have been integrated, and defined as . Obviously, C is a 16*17 data set, . HC was used again, where the importance rate was used as the weight of the time-domain features to calculate the distance between classes. And the other parameters were given a weight value of 1 to keep it unchanged.
(12)
where represents the weight parameter; defined the weight matrix as then the new distance could be calculated as:
(13)
(14)
(15)
(16)

When r = 0.1; r = 0.2; when r = 0.5, the weighted Ward's method was used in the clustering method. The result is shown in Figure 5(a)–5(c).

Figure 5

Clustering results of weighted Ward's method (a) (b) (c) .

Figure 5

Clustering results of weighted Ward's method (a) (b) (c) .

Close modal

The figure shows the nodes of a district where four branch nodes appear, when the distance is 4.5 with r takes different sizes. And the result is shown in Table 5.

Table 5

The partition node where the branch node is located when r takes different sizes

r = 0.1r = 0.2r = 0.5
District with Node3 Node2 Node3 Node4 Node10 Node13 Node2 Node3 Node6 Node8 Node13 Node3 Node6 
District with Node14 Node5 Node11 Node14 Node5 Node11 Node14 Node5 Node11 Node14 
District with Node15 Node12 Node15 Node16 Node12 Node15 Node16 Node7 Node8 Node15 Node9 Node17 Node18 
District with Node1 Node15 Node12 Node16 Node15 Node12 Node16 Node4 Node10 Node12 Node13 Node16 
r = 0.1r = 0.2r = 0.5
District with Node3 Node2 Node3 Node4 Node10 Node13 Node2 Node3 Node6 Node8 Node13 Node3 Node6 
District with Node14 Node5 Node11 Node14 Node5 Node11 Node14 Node5 Node11 Node14 
District with Node15 Node12 Node15 Node16 Node12 Node15 Node16 Node7 Node8 Node15 Node9 Node17 Node18 
District with Node1 Node15 Node12 Node16 Node15 Node12 Node16 Node4 Node10 Node12 Node13 Node16 

Each branch node is clearly split into related sections, either by resemblance or by proximity. When r = 0.5, this partition and the partition in which node 10 is located have some overlap. The findings are satisfactory for r = 0.1 and r = 0.2 for the partitions containing nodes 14, 15, and 16, and there are no issues with overlapping pipe networks. However, with r = 0.1, node 2 causes partition (including nodes 6, 8, 15, and 16) to have a pipe network overlap, indicating that the zoning is unclear. It is obvious that using when the partitioning impact is better, take r = 0.2 here.

After adjusting for the effect of the difference in, the new dataset was determined to be just 13*17. In comparison, the same dataset was partitioned using a Pearson correlation analysis-based partitioning algorithm for any two nodes with their correlation coefficients .
(17)
whereis the sample mean of .

The calculated correlation matrix of size 17*17 was converted to a grey-scale graphical representation, as shown in Figure 6 as a grey-scale schematic of the correlation coefficients for nodes 2–18.

Figure 6

Grayscale diagram of Pearson correlation coefficients for each node.

Figure 6

Grayscale diagram of Pearson correlation coefficients for each node.

Close modal

When is between 0.8 and 1, the nodes are strongly positively correlated; when is between 0.5 and 0.8, the nodes are moderately favorably correlated. To fulfill real-world control needs, only positive correlation is employed as the foundation for partitioning here, and the two nodes with the highest values between them are joined to establish a new partition. Because of the phenomena of overlapping pipe networks across partitions, the partitioning of these 17 nodes is accomplished by removing them based on the connection of the pipe network. Table 6 shows the state of each partition node before and after subjective rejection.

Table 6

Partitioning results based on Pearson correlation analysis

PartitionEach partition node before eliminationEach partition node after elimination
 Node6 Node16 Node12 Node15 Node6 Node16 Node12 Node15 Node2 
 Node8 Node9 Node11 Node18 Node4 Node9 Node18 Node14 Node5 Node11 Node8 
 Node2 Node3 Node5 Node13 Node13 Node3 Node10 
 Node7 Node17 Node7 Node17 
 Node4 Node10 — 
PartitionEach partition node before eliminationEach partition node after elimination
 Node6 Node16 Node12 Node15 Node6 Node16 Node12 Node15 Node2 
 Node8 Node9 Node11 Node18 Node4 Node9 Node18 Node14 Node5 Node11 Node8 
 Node2 Node3 Node5 Node13 Node13 Node3 Node10 
 Node7 Node17 Node7 Node17 
 Node4 Node10 — 

As it can be observed, the correlation for node 8 is too low to fit into the other partitions, resulting in a substantial overlap of partitions ,. Figure 7 shows the flow curve diagrams of nodes in each district () when the distance is around 5 when r = 0.2:

Figure 7

New flow curve diagram of each partition (a) Flow curve diagram of district (b) Flow curve diagram of district (c) Flow curve diagram of district (d) Flow curve diagram of district (e) Flow curve diagram of district (f) Flow curve diagram of district (g) Flow curve diagram of district (h) Flow curve diagram of district .

Figure 7

New flow curve diagram of each partition (a) Flow curve diagram of district (b) Flow curve diagram of district (c) Flow curve diagram of district (d) Flow curve diagram of district (e) Flow curve diagram of district (f) Flow curve diagram of district (g) Flow curve diagram of district (h) Flow curve diagram of district .

Close modal

Figure 7(a) and 7(b) shows that, when compared to the correlation analysis-based technique, Node 2, although being physically adjacent to Nodes 6, 12, and 15, obviously has a distinct traffic pattern than the other nodes, where RH-partitioning HC's results are clearly superior. The two approaches provide substantially the same partitioning results in Figure 7(c)–7(e). Analysis of the data from node 11, where the flow fluctuation is not identical to that of many other nodes, indicates that there may be an issue with the data collection equipment, resulting in what seems to be a lag in the data from node 11. Both approaches divide nodes 7 and 17 into the same partition. Nodes 2, 6, and 8 have similar trends in Figure 7(g), and nodes 3 and 13 have similar trends to the other nodes in most of their trends, for example: from 22 to 30, there is a rising and then falling trend in this partition, and from 33 to 45, there is also a falling and then rising trend, but nodes 3 and 13 can be separated from partition A 23. Of course, nodes 3 and 13 can be split from partition A 23. Node 10 in Figure 7(h) has most of the temporal similarities with 3 and 13, but it cannot be separated again.

Finally, nodes 12, 9, 5, 7, and 6 on the main pipe of water supply network were chosen as regulating nodes for the partition. They can aid in real pressure regulation.

It is obvious that with limited data samples, the partitioning method based on RF-HC gives more accurate results than the method based on Pearson correlation. It also decreases model complexity by avoiding the need to re-plan partitioning due to overlapping pipe networks.

In comparison to the original HC approach based on flow data and the partitioning findings based on Pearson correlation analysis, the RF-HC approach gives better partitioning results and the model is less complex. It ensures the similarity of the flow variation pattern of the nodes in the partition. Also, it can effectively help the actual regulating treatment of the problem of inconsistent flow and pressure variation in the demand about each node. Surely, the value for the degree of change in the flow on a single day, as well as the judgement threshold r, may be optimized further.

It should be noted that this paper proposes a water supply network partitioning strategy with a small amount of data, which is convenient and accurate. However, the paper has a limitation in that it is not as efficient in solving for very large pipe network models as the new machine learning technique.

We would like to give thanks for the funding of the project: Demonstration project of water supply safety guarantee and optimized operation of Chaohu pipe network, and the teachers and students at Anhui Jianzhu University, who helped to completed this project.

Data cannot be made publicly available; readers should contact the corresponding author for details.

Bin
L.
,
Guo-li
Z.
,
Jin-xu
N.
&
Hao
J.
2018
An overview of optimized DMA zonal methods in water distribution system
.
Journal of Guangdong University of Technology
35
(
2
),
19
27
.
Bishop
C. M.
&
Tipping
M. E.
1998
A hierarchical latent variable model for data visualisation
.
IEEE TPAMI
20
(
3
),
281
293
.
Diao
K.
,
Zhou
Y.
&
Rauch
W.
2013
Automated creation of district metered area boundaries in water distribution systems
.
Journal of Water Resources Planning &Management
139
(
2
),
184
190
.
Fang
K.
,
Wu
J.
,
Zhu
J.
&
Xie
B.
2011
A review of technologies on random forests
.
Journal of Statistics and Information
26
(
03
),
32
38
.
Ho
T. K.
1995
Random decision forest
. In:
Proceedings of the 3rd International Conference on Document Analysis and Recognition
,
Montreal, QC, Canada
, Vol.
8
, pp.
278
282
.
Ho
T. K.
1998
The random subspace method for constructing decision forests
.
IEEE Transactions on Pattern Analysis and Machine Intelligence
20
(
8
),
832
844
.
Jun
L.
&
Guoping
Y.
2012
Integration of graph theory and GIS for district metered areas of water distribution systems
.
Journal of Tongji University(Natural Science)
40
(
12
),
1863
1869
.
Li
Y.
,
Zeng
Z.
,
Zhang
M.
&
Yu
S.
1999
Application of primary component analysis in the methods of comprehensive evaluation for many indexes
.
Journal of Hebei University of Technology
01
,
96
99
.
Paola
F. D.
,
Fontana
N.
,
Galdiero
E.
,
Giugni
M.
,
Uberti
G. S. D.
&
Vitaletti
M.
2014
Optimal design of district metered areas in water distribution networks
.
Procedia Engineering
70
(
70
),
449
457
.
Tong
Z.
2006
Research of clustering algorithm based on hierarchical and partitioning method
.
Computer Engineering and Applications
08
,
178
180
.
Wenqi
Y.
,
Xiaoming
Z.
,
Zhaokai
X.
&
Qing
Y.
2012
Research on regional water distribution network planning with district meter area method
.
Water & Wastewater Engineering
48
(
07
),
98
102
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).