ABSTRACT
This study introduces an innovative diagnostic approach for identifying gate-valve failures in water distribution systems. By implementing high-frequency pressure sensors upstream and downstream of the gate valves, we obtained detailed pressure data that are pivotal for fault diagnosis. We explored three distinct machine-learning algorithms and two data-handling techniques to ensure optimal performance in real-world applications. In our methodology, supervised learning algorithms are used to analyze pressure differentials and predict valve behavior. We rigorously tested these algorithms using both raw and feature-engineered data, and the results indicated the effectiveness of the Gaussian-naïve Bayes model with six extracted features. This approach enhances the precision and reliability of diagnostics in water distribution networks.
HIGHLIGHTS
An innovative diagnostic approach is introduced to identify gate-valve failures in water distribution systems.
Three distinct machine-learning algorithms and two data-handling techniques were explored.
Supervised learning algorithms were used to analyze pressure differentials and predict valve behavior.
The selected approach enhances the precision and reliability of diagnostics in water distribution networks.
INTRODUCTION
In the dynamic environment of water distribution networks, gate valves are key components for controlling flow within district-metered areas (DMAs) and along the main water transmission and distribution pipes. Their design minimizes flow resistance and pressure loss. However, these valves are not immune to failures such as seat leakage, packing wear, and corrosion, which develop with age and use. These malfunctions present operational challenges, particularly when they manifest subtly or occur in less-accessible parts of the network. Hence, timely detection is required (Dombor 2023; FCValves 2023).
To mitigate such failures, several methods have been proposed at the valve design stage. Soboleva et al. (2023a) employed the finite element method (FEM) to design butterfly check valves (BCVs) while accounting for temperature. Soboleva et al. (2023b) used FEM to develop an improved model range of BCVs that enhances the reliability and safety of pipeline systems. Zong et al. (2022) optimized the design of explosion-proof safety valves using 3D steady-state simulation, while Yang et al. (2023) applied the non-dominated sorting genetic algorithm II (NSGA-II) to computational fluid dynamics (CFD) results for the multi-objective optimization of multi-way spool valves.
This study introduces a novel approach for identifying gate-valve failures in water distribution systems that involves placing high-frequency pressure sensors both upstream and downstream of gate valves to capture detailed pressure data. These data feed our machine-learning (ML) algorithms, which use supervised learning to analyze the pressure differentials. This analysis allows our models to predict valve behavior and identify potential malfunctions before they become critical.
To determine the most effective algorithm and data processing method that meet our objectives, we explored three distinct ML algorithms and two data-handling techniques. This comprehensive evaluation was crucial to ensuring that our approach is not only theoretically sound but also practically viable in real-world applications. The algorithms were tested using raw and feature-engineered data, which allowed us to refine our models to achieve optimal performance in fault diagnosis.
METHODS
Design and fabrication of device for diagnosing gate-valve failures
In water distribution networks, hydraulic components such as valves and pumps produce a pressure differential between their inlet and outlet that depends on their hydraulic characteristics. For a valve, the opening ratio is a key determinant of these characteristics: the flow inside the valve changes with the degree of opening, which in turn changes the pressure drop. This dependence of the pressure drop on the opening ratio can be expressed through valve coefficients, and the variation of these coefficients with the opening ratio is compiled into the valve characteristic curve provided by valve manufacturers.
In this study, we aimed to diagnose the operational status of the gate valves by utilizing the characteristic pressure drop based on the opening ratio of the valve. To diagnose the gate-valve failure, we measured the pressures before and after the gate valve and evaluated the difference. This diagnosis was based on whether the observed pressure differential was consistent with the normal pressure differential for a given gate-valve opening ratio.
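The consistency check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation. It interpolates a reference pressure differential from a characteristic curve and flags a reading that deviates by more than a tolerance. The reference points are the normal-state means from Table 2, used here as a stand-in for a manufacturer's curve. The function names `expected_dp` and `is_consistent` and the 5% relative tolerance are hypothetical choices.

```python
import numpy as np

# Hypothetical reference curve: normal-state mean pressure differences (mH2O)
# from Table 2, standing in for a manufacturer's characteristic curve.
# np.interp requires the x-coordinates (lift ratios) to be increasing.
LIFT_PCT = np.array([10.0, 30.0, 47.0])
NORMAL_DP = np.array([13.19, 2.40, 1.09])


def expected_dp(lift_pct: float) -> float:
    """Linearly interpolate the normal pressure differential at a valve lift ratio."""
    return float(np.interp(lift_pct, LIFT_PCT, NORMAL_DP))


def is_consistent(p_up: float, p_down: float, lift_pct: float,
                  rel_tol: float = 0.05) -> bool:
    """True if the measured differential lies within rel_tol of the reference value."""
    dp = p_up - p_down
    ref = expected_dp(lift_pct)
    return abs(dp - ref) <= rel_tol * ref


print(is_consistent(26.0, 23.6, 30))  # differential 2.4 mH2O at 30% lift -> True
print(is_consistent(26.0, 22.0, 30))  # differential 4.0 mH2O at 30% lift -> False
```

Linear interpolation between only three points is crude for a strongly nonlinear curve. Table 2 also shows that leakage shifts the mean differential by less than the sensor error. A threshold test of this kind therefore cannot separate normal and leakage states on its own, which is why the study turns to ML classifiers.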
ML algorithms for fault diagnosis
In this study, we treated the detection of gate-valve failures as a binary classification problem that distinguishes between normal conditions and leakage. Many classification algorithms are available, for example, random forest, XGBoost, and support vector machine (SVM); however, applying numerous models to numerous datasets is impractical. Hence, we first identified classifiers suited to our collected data using PyCaret (Ali 2020), a low-code ML library in Python. PyCaret automates ML workflows, enabling rapid experimentation with different ML models to determine which one best fits the characteristics of the data. We screened 14 models provided by PyCaret.
Feature engineering
Conventional ML models were not designed for time-series data, although raw time series can still be used directly as input. However, raw series have a large feature length, which leads to high computational costs. In addition, unless the informative time steps are identified, the input can be dominated by noise while the relevant signal remains weak. Feature creation, which summarizes the characteristics of the raw time-series data as numerical values, can therefore be useful for ML classification. It improves computational efficiency and reduces the risk of overfitting.
In this study, we addressed gate-valve failure diagnosis using six basic features computed with cesium, an open-source library for extracting features from raw time-series data (Naul et al. 2016). Each feature contributes uniquely to the diagnostic process.
The amplitude, calculated as half the difference between the observed maximum and minimum values, served as a major indicator of the variability range of the data. A higher amplitude suggests greater fluctuations, which could indicate anomalies or considerable changes in the pressure data.
The percentage of values exceeding one standard deviation from the weighted mean quantified the degree of deviation within the dataset. This metric highlights the proportion of values considerably different from the average and thus reveals potential outliers or extreme changes.
The percentage of values near the median, that is, within a specified window around the median value, provides insight into the central tendency and spread of the data.
Skewness measures the asymmetry of the data distribution. A skewness near zero indicates a symmetric distribution, as in a normal (Gaussian) distribution. Positive values indicate a longer tail toward high values, and negative values a longer tail toward low values. This feature therefore reveals whether the pressure values are biased toward high or low extremes.
The weighted mean accounts for measurement errors and enables a more nuanced assessment of the average pressure values. Each data point is weighted by the inverse square of its measurement error. The result is a more accurate and representative mean, which is essential for precise fault detection.
Finally, the maximum slope is the largest absolute rate of change between consecutive samples. It plays a crucial role in detecting rapid or considerable changes over time. This feature is particularly useful for identifying sudden pressure changes, which often serve as primary indicators of system faults or anomalies.
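The six features can be sketched in NumPy as shown below. This is a re-implementation written from the descriptions above, not cesium's own code. The function names mirror cesium's feature names, but several details are assumptions: the inverse-squared-error weighting, the use of the unweighted standard deviation, the 10% window (of the value range) for "close to the median", and the use of fractions rather than percentages.

```python
import numpy as np


def amplitude(x):
    """Half the difference between the maximum and minimum values."""
    return float(0.5 * (np.max(x) - np.min(x)))


def weighted_average(x, e):
    """Mean weighted by the inverse squared measurement error (assumed weighting)."""
    w = 1.0 / np.asarray(e, dtype=float) ** 2
    return float(np.sum(w * x) / np.sum(w))


def percent_beyond_1_std(x, e):
    """Fraction of values more than one (unweighted) std from the weighted mean."""
    return float(np.mean(np.abs(x - weighted_average(x, e)) > np.std(x)))


def percent_close_to_median(x, window_frac=0.1):
    """Fraction of values within window_frac * (max - min) of the median (window assumed)."""
    window = window_frac * (np.max(x) - np.min(x))
    return float(np.mean(np.abs(x - np.median(x)) < window))


def skew(x):
    """Biased sample skewness: third central moment over the 1.5 power of the second."""
    d = x - np.mean(x)
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)


def max_slope(t, x):
    """Largest absolute rate of change between consecutive samples."""
    return float(np.max(np.abs(np.diff(x) / np.diff(t))))


def six_features(t, x, e):
    """Compute all six features for one segment with times t, values x, errors e."""
    t, x = np.asarray(t, dtype=float), np.asarray(x, dtype=float)
    return {
        "amplitude": amplitude(x),
        "percent_beyond_1_std": percent_beyond_1_std(x, e),
        "percent_close_to_median": percent_close_to_median(x),
        "skew": skew(x),
        "weighted_average": weighted_average(x, e),
        "max_slope": max_slope(t, x),
    }


feats = six_features(np.arange(5.0), [0.0, 1.0, 2.0, 3.0, 4.0], np.ones(5))
print(feats)
```

With equal errors the weighted mean reduces to the ordinary mean, so the toy ramp above has amplitude 2.0, weighted mean 2.0, zero skewness, and a maximum slope of 1.0.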
By combining the aforementioned features to analyze the pressure data, we form an overarching framework to effectively identify and classify the gate-valve failures within water distribution systems. The combined application of these features enhances the precision and reliability of the diagnostic model, thus ensuring accurate fault detection and timely intervention.
EXPERIMENTAL SETUP
The proposed fault diagnosis algorithm for the gate valves was tested on a pilot-scale water distribution system at the Korea Water Cluster in Daegu, South Korea. The system included two reservoirs, 20 junctions, and 20 pipelines totaling 1,428.38 m in length and consisting of four different pipe materials: polyvinyl chloride (PVC), ductile iron pipe (DCIP), polyethylene (PE), and steel pipe (SP). Table 1 presents the key specifications of the pilot-scale water distribution network.
Pipe ID | Node 1 | Node 2 | Diameter (mm) | Length (m) | Material |
---|---|---|---|---|---|
P1 | RUpstream | D | 300 | 64.5 | DCIP |
P2 | D | N1 | 300 | 15.94 | DCIP |
P3 | N1 | N2 | 300 | 70.05 | PVC |
P4 | N2 | N3 | 300 | 43.33 | PE |
P5 | N3 | N4 | 300 | 218.59 | PVC |
P6 | N4 | C | 300 | 134.26 | PVC |
P7 | C | B | 300 | 135.08 | PVC |
P8 | B | N5 | 300 | 68.73 | PVC |
P9 | N5 | A | 300 | 21.09 | PE |
P10 | A | V1 | 300 | 29.45 | PE |
P11 | V1 | N6 | 300 | 22.32 | DCIP |
P12 | N4 | N6 | 300 | 261.34 | PVC |
P13 | N6 | N7 | 300 | 27.98 | DCIP |
P14 | N7 | N8 | 300 | 50.36 | PVC |
P15 | N8 | N9 | 300 | 50.25 | SP |
P16 | N9 | E | 300 | 77.24 | PVC |
P17 | E | N10 | 300 | 34.11 | PVC |
P18 | N10 | N11 | 300 | 26.77 | PE |
P19 | N11 | N12 | 300 | 70 | SP |
P20 | N12 | RDownstream | 300 | 6.99 | PVC |
A data acquisition system based on the NI-9253 module from National Instruments Inc. was set up to capture the pressure and flow data at a sampling rate of 1,000 Hz. A LabVIEW program was developed to configure and operate the data acquisition system.
RESULTS AND DISCUSSION
Simulation of gate-valve leakage conditions
In the experiments, pumps transferred water from the upstream reservoir (RUpstream) to the downstream reservoir (RDownstream). The flow rate was maintained between 769.4 and 773.9 m3/h, and the average water pressure was 25.78 m of water column (mH2O).
Table 2 presents the mean pressure differences across the gate valve for each valve lift ratio and leakage condition. Four scenarios were tested: normal operation, leakage upstream of the valve (L1), leakage downstream of the valve (L2), and simultaneous leakage at both ends of the valve (L1 and L2). Twelve experiments were conducted across the three valve lift settings. As the lift ratio of the gate valve decreased, the mean pressure difference between the upstream and downstream ends increased. The pressure differences under normal and leakage conditions did differ at the same lift ratio. However, these differences were within the error range of the sensors, so leakage could not be detected from the pressure differences alone.
Mean pressure difference (mH2O) | Valve lift ratio 47% | Valve lift ratio 30% | Valve lift ratio 10% |
---|---|---|---|
Normal | 1.09 | 2.4 | 13.19 |
Only L1 open | 1.08 | 2.39 | 13.08 |
Only L2 open | 1.1 | 2.36 | 12.9 |
L1 & L2 open | 1.09 | 2.38 | 13.17 |
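Each cell of Table 2 condenses many pressure samples into a single mean. A minimal pandas sketch of that aggregation is shown below. It assumes a long-format data frame with the hypothetical columns `p_up` and `p_down` (pressures in mH2O), `condition`, and `lift_pct`. The column names and the toy values are illustrative only and are not measurements from this study.

```python
import pandas as pd


def mean_dp_table(df: pd.DataFrame) -> pd.DataFrame:
    """Mean pressure differential per condition (rows) and valve lift ratio (columns)."""
    return (
        df.assign(dp=df["p_up"] - df["p_down"])
          .pivot_table(index="condition", columns="lift_pct", values="dp", aggfunc="mean")
    )


# Toy samples for illustration only.
toy = pd.DataFrame({
    "condition": ["Normal", "Normal", "L2"],
    "lift_pct": [47, 47, 47],
    "p_up": [26.0, 26.2, 26.1],
    "p_down": [25.0, 25.0, 25.0],
})
tab = mean_dp_table(toy)
print(tab)
```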
Data preprocessing
For each valve lift ratio and leakage scenario, the data frame for the ‘normal’ state was concatenated with the data frame for the corresponding ‘leakage’ state. The result was a balanced dataset with a 1:1 ratio of ‘normal’ to ‘leakage’ samples and approximately 1,200 rows. With three valve lift ratios and three leakage scenarios in the experimental setup, this procedure produced nine datasets.
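A minimal pandas sketch of building the nine balanced datasets is shown below. It labels normal rows 0 and leakage rows 1 and concatenates them per (lift ratio, scenario) key. Random downsampling of the larger set to enforce the 1:1 ratio is an assumption; the recorded sets may already have been of equal size. The function names and dictionary layout are hypothetical.

```python
import pandas as pd


def make_balanced(normal: pd.DataFrame, leakage: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Label normal rows 0 and leakage rows 1, downsampling the larger set to a 1:1 ratio."""
    n = min(len(normal), len(leakage))
    parts = [
        normal.sample(n=n, random_state=seed).assign(label=0),
        leakage.sample(n=n, random_state=seed).assign(label=1),
    ]
    return pd.concat(parts, ignore_index=True)


def build_datasets(normal_by_lift: dict, leakage_by_case: dict) -> dict:
    """One balanced dataset per (lift ratio, leakage scenario) key, e.g. (47, "L1")."""
    return {
        (lift, scenario): make_balanced(normal_by_lift[lift], leak_df)
        for (lift, scenario), leak_df in leakage_by_case.items()
    }


# Toy frames for illustration; real frames would hold the recorded pressure data.
normal = {47: pd.DataFrame({"dp": range(10)})}
leaks = {(47, "L1"): pd.DataFrame({"dp": range(7)}),
         (47, "L2"): pd.DataFrame({"dp": range(12)})}
datasets = build_datasets(normal, leaks)
print({k: v["label"].value_counts().to_dict() for k, v in datasets.items()})
```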
For fault classification, two data-handling approaches were employed, resulting in a total of 18 datasets. The first approach uses the acquired data in raw time-series form, so the classifier learns directly from the temporal information in each column to distinguish normal from failure states. The second approach extracts features from segments of the raw data and uses these derived characteristics, rather than the time series itself, for fault diagnosis.
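The two approaches can be contrasted with a small NumPy sketch. Several choices here are assumptions rather than details from the study: non-overlapping fixed-length windows, the window length, and the stacking of upstream and downstream samples side by side in the raw approach. The function names and the `summarize` callback are hypothetical. In the raw approach, each row is a window of samples. In the feature approach, each row holds summary features of the pressure differential in one window.

```python
import numpy as np


def segment(signal, window):
    """Split a 1-D series into non-overlapping windows, dropping the incomplete tail."""
    x = np.asarray(signal, dtype=float)
    n = len(x) // window
    return x[: n * window].reshape(n, window)


def raw_rows(p_up, p_down, window):
    """Approach 1: each row is one window of raw upstream then downstream samples."""
    return np.hstack([segment(p_up, window), segment(p_down, window)])


def feature_rows(p_up, p_down, window, summarize):
    """Approach 2: each row holds summary features of the differential in one window."""
    dp = segment(np.asarray(p_up, dtype=float) - np.asarray(p_down, dtype=float), window)
    return np.array([summarize(w) for w in dp])


p = np.arange(1000.0)
raw = raw_rows(p, 0.5 * p, window=100)
feat = feature_rows(p, 0.5 * p, window=100, summarize=lambda w: [w.mean(), w.std()])
print(raw.shape, feat.shape)
```

The example shows the dimensionality gap discussed earlier: 200 values per raw row versus 2 summary values per feature row.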
Table 3 presents the five best models from the PyCaret analysis of the raw and selected-feature data for a 30% valve lift with combined leakage (L1 and L2). Of the 18 datasets, only these two, representing ‘combined leakage (L1 and L2)’ at ‘30% valve lift,’ were used for this screening. We evaluated 14 models, using 70% of the dataset (n = 774) for training and the remaining 30% (n = 332) for testing. The models were compared by F1 score using stratified 10-fold cross-validation. Based on the top five models for the raw data (R) and the selected features (F), naive Bayes and random forest were judged to be the most suitable models. Logistic regression, a standard approach for binary classification, was also included. It has been reported to achieve higher accuracy than random forest as the variance of the explanatory and noise variables increases.
Data | Rank | Model | F1 | Accuracy | AUC | Recall | Precision |
---|---|---|---|---|---|---|---|
Raw data (R) | 1 | Naive Bayes | 0.6481 | 0.5982 | 0.6086 | 0.7391 | 0.5783 |
Raw data (R) | 2 | Extra trees | 0.6069 | 0.645 | 0.7031 | 0.5476 | 0.6912 |
Raw data (R) | 3 | Decision tree | 0.6027 | 0.606 | 0.6062 | 0.5972 | 0.6120 |
Raw data (R) | 4 | Random forest | 0.6003 | 0.6371 | 0.6997 | 0.5451 | 0.6751 |
Raw data (R) | 5 | LightGBM | 0.5995 | 0.6268 | 0.6720 | 0.5606 | 0.6514 |
Selected feature (F) | 1 | Random forest | 0.6711 | 0.6836 | 0.7340 | 0.6458 | 0.7046 |
Selected feature (F) | 2 | Extra trees | 0.6622 | 0.6745 | 0.7279 | 0.6405 | 0.6921 |
Selected feature (F) | 3 | Ridge classifier | 0.6612 | 0.6293 | 0.0000 | 0.7260 | 0.6087 |
Selected feature (F) | 4 | Gradient boosting | 0.6491 | 0.6551 | 0.7264 | 0.6356 | 0.6671 |
Selected feature (F) | 5 | Logistic regression | 0.6481 | 0.6086 | 0.6752 | 0.7235 | 0.5881 |
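PyCaret performs this screening internally. The scikit-learn sketch below approximates the same protocol without PyCaret: a 70/30 stratified split, stratified 10-fold cross-validation on the training portion, and ranking by mean F1. It covers only five of the 14 candidates and uses default hyperparameters. It also runs on synthetic data from `make_classification`, so it illustrates the procedure rather than reproducing Table 3. The names `CANDIDATES` and `screen_models` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# A small subset of candidate classifiers with default hyperparameters.
CANDIDATES = {
    "Naive Bayes": GaussianNB(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "Extra trees": ExtraTreesClassifier(random_state=0),
    "Logistic regression": LogisticRegression(max_iter=1000),
}


def screen_models(X, y, top=5, seed=0):
    """Rank candidates by mean F1 from stratified 10-fold CV on a 70% training split."""
    X_train, _, y_train, _ = train_test_split(
        X, y, train_size=0.7, stratify=y, random_state=seed)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = {
        name: cross_val_score(model, X_train, y_train, cv=cv, scoring="f1").mean()
        for name, model in CANDIDATES.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Synthetic stand-in data; real inputs would be raw-window or feature rows.
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
ranking = screen_models(X, y)
for name, f1 in ranking:
    print(f"{name}: mean F1 = {f1:.3f}")
```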
Application of ML for diagnosing gate-valve failures
Table 4 presents the fault diagnosis classification results obtained with the raw time-series pressure data (R) and with six selected features extracted from the raw data (F). The analysis covered different valve lift ratios and leakage conditions: L1 denotes leakage at the front of the valve, L2 leakage at the back of the valve, and L1 and L2 simultaneous leakage at both ends. The classification models used in this study (random forest, Gaussian-naïve Bayes, and logistic regression) were evaluated on their ability to detect each gate-valve fault case.
Valve lift ratio (%) | Leakage condition | Data (R = raw, F = features) | Random forest recall | Random forest F1 score | Gaussian-naïve Bayes recall | Gaussian-naïve Bayes F1 score | Logistic regression recall | Logistic regression F1 score |
---|---|---|---|---|---|---|---|---|
47 | L1 | R | 0.48 | 0.49 | 0.46 | 0.48 | 0.46 | 0.47 |
47 | L1 | F | 0.51 | 0.51 | 0.79 | 0.62 | 0.54 | 0.52 |
47 | L2 | R | 0.51 | 0.51 | 0.79 | 0.62 | 0.54 | 0.52 |
47 | L2 | F | 0.52 | 0.51 | 0.7 | 0.6 | 0.56 | 0.55 |
47 | L1 and L2 | R | 0.51 | 0.52 | 0.54 | 0.54 | 0.51 | 0.5 |
47 | L1 and L2 | F | 0.53 | 0.53 | 0.77 | 0.61 | 0.54 | 0.51 |
30 | L1 | R | 0.5 | 0.5 | 0.5 | 0.49 | 0.52 | 0.52 |
30 | L1 | F | 0.54 | 0.52 | 0.68 | 0.59 | 0.57 | 0.54 |
30 | L2 | R | 0.54 | 0.52 | 0.68 | 0.59 | 0.57 | 0.54 |
30 | L2 | F | 0.59 | 0.57 | 0.67 | 0.6 | 0.57 | 0.56 |
30 | L1 and L2 | R | 0.68 | 0.68 | 0.51 | 0.52 | 0.48 | 0.49 |
30 | L1 and L2 | F | 0.54 | 0.53 | 0.67 | 0.59 | 0.6 | 0.57 |
10 | L1 | R | 0.6 | 0.62 | 0.43 | 0.49 | 0.51 | 0.5 |
10 | L1 | F | 0.56 | 0.56 | 0.44 | 0.51 | 0.58 | 0.59 |
10 | L2 | R | 0.69 | 0.7 | 0.37 | 0.42 | 0.49 | 0.5 |
10 | L2 | F | 0.6 | 0.6 | 0.79 | 0.66 | 0.54 | 0.54 |
10 | L1 and L2 | R | 0.6 | 0.59 | 0.71 | 0.61 | 0.61 | 0.6 |
10 | L1 and L2 | F | 0.6 | 0.59 | 0.71 | 0.61 | 0.61 | 0.6 |
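Each recall and F1 pair in Table 4 comes from one classifier evaluated on one dataset. The scikit-learn sketch below shows that evaluation for the three selected classifiers. It assumes leakage is encoded as the positive class (label 1), so recall is the fraction of leakage samples detected. It also assumes a 70/30 stratified hold-out split. Hyperparameters are library defaults rather than the study's settings, the data are synthetic, and `evaluate` is a hypothetical name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def evaluate(X, y, seed=0):
    """Hold-out recall and F1 (positive class = leakage, label 1) for three classifiers."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    models = {
        "Random forest": RandomForestClassifier(random_state=seed),
        "Gaussian-naive Bayes": GaussianNB(),
        "Logistic regression": LogisticRegression(max_iter=1000),
    }
    results = {}
    for name, model in models.items():
        y_pred = model.fit(X_tr, y_tr).predict(X_te)
        results[name] = (recall_score(y_te, y_pred), f1_score(y_te, y_pred))
    return results


# Synthetic stand-in for one of the 18 datasets.
X, y = make_classification(n_samples=400, n_features=6, random_state=1)
results = evaluate(X, y)
for name, (rec, f1) in results.items():
    print(f"{name}: recall = {rec:.2f}, F1 = {f1:.2f}")
```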
Considering the feature importance from the random forest and the assumptions of Gaussian-naïve Bayes (normality and independence of the features), we replaced the skewness feature in the six initially considered features with flux_percentile_ratio_mid50. This feature measures how widely the central portion of the distribution of observed values is spread, relative to the overall variability range:
$$\mathrm{flux\_percentile\_ratio\_mid50} = \frac{F_{50+x} - F_{50-x}}{F_{95} - F_{5}},$$
where $F_p$ is the flux (observed value) at the $p$th percentile and $x = 25$ is half of the 50-percentile range, so the numerator is the interquartile range $F_{75} - F_{25}$. This metric is particularly useful for time-series data. An increase in variability may indicate that an event or change has occurred, and the ratio quantifies such shifts.
To explore suitable features, we used random forest to analyze the importance of 24 cesium features, including the initial six, on the L2 data at a 47% valve lift ratio. The importance values differed negligibly, indicating that the features contributed roughly uniformly to the predictions of the model. We then examined the pairwise correlations between the features to support the independence assumption of the Gaussian naive Bayes model. High correlations were found among certain features, for instance, between semantically similar features (weighted_average and median) and among the various flux percentile ratios. Highly correlated features were therefore excluded, leaving a set of six features. The selected features are visualized in Figure 6 using histograms.
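Both steps in the paragraph above can be sketched briefly. The ratio follows the formula with $x = 25$, computed with NumPy's default linear percentile interpolation; cesium's own implementation may differ in such details. For the correlation screen, the study does not state its exclusion rule or threshold. The greedy rule below (keep a column only if its absolute Pearson correlation with every already-kept column is below 0.9) is a hypothetical choice, as are the function names.

```python
import numpy as np
import pandas as pd


def flux_percentile_ratio_mid50(x):
    """(75th - 25th percentile) / (95th - 5th percentile) of the observed values."""
    p5, p25, p75, p95 = np.percentile(np.asarray(x, dtype=float), [5, 25, 75, 95])
    return float((p75 - p25) / (p95 - p5))


def drop_correlated(features: pd.DataFrame, threshold: float = 0.9) -> list:
    """Greedily keep columns whose |Pearson r| with every kept column is below threshold."""
    corr = features.corr().abs()
    kept = []
    for col in features.columns:
        if all(corr.loc[col, k] < threshold for k in kept):
            kept.append(col)
    return kept


ratio = flux_percentile_ratio_mid50(np.arange(101.0))  # (75 - 25) / (95 - 5)
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                    "b": [2.0, 4.0, 6.0, 8.0],      # perfectly correlated with "a"
                    "c": [1.0, -1.0, 1.0, -1.0]})
kept = drop_correlated(toy)
print(ratio, kept)
```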
The adherence of these features to a normal distribution was evaluated using the Shapiro–Wilk test, which tests the null hypothesis that a sample was drawn from a normally distributed population. For three of the six features, the test statistic was close to 1 and the p-value exceeded 0.05, so normality could not be rejected at the 5% significance level. The Gaussian naive Bayes model assumes that each feature is normally distributed within each class, so these results may benefit the performance of the model.
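The normality screen can be sketched with `scipy.stats.shapiro`, which returns the W statistic and the p-value. The input is assumed to be a dictionary that maps each feature name to its values across segments. `normality_report` is a hypothetical helper, and the two example samples are synthetic. A p-value above α means only that normality is not rejected; it does not prove that the feature is normally distributed.

```python
import numpy as np
from scipy import stats


def normality_report(features: dict, alpha: float = 0.05) -> dict:
    """Shapiro-Wilk W, p-value, and whether normality is not rejected at level alpha."""
    report = {}
    for name, values in features.items():
        w, p = stats.shapiro(np.asarray(values, dtype=float))
        report[name] = (float(w), float(p), bool(p > alpha))
    return report


rng = np.random.default_rng(0)
example = {
    # Evenly spaced standard-normal quantiles: a normal-shaped sample.
    "normal_like": stats.norm.ppf((np.arange(1, 201) - 0.5) / 200),
    # A strongly right-skewed sample.
    "exponential": rng.exponential(size=200),
}
report = normality_report(example)
for name, (w, p, ok) in report.items():
    print(f"{name}: W = {w:.3f}, p = {p:.3g}, normality not rejected: {ok}")
```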
With raw data at a valve lift ratio of 47%, the random forest model achieved a recall of 0.48 and an F1 score of 0.49 for front leakage (L1). Both metrics improved slightly, to about 0.51, for back leakage (L2) and combined leakage (L1 and L2). The Gaussian-naïve Bayes model was particularly effective at identifying back leakage (L2), achieving a recall of 0.79 with an F1 score of 0.62, the highest recall among all the raw-data results. Logistic regression performed consistently, with recall and F1 scores ranging from 0.46 to 0.54 across the leakage conditions.
When the valve lift ratio was reduced to 30%, the random forest model improved notably in detecting combined leakage (L1 and L2), with a recall and F1 score of 0.68. At a valve lift ratio of 10%, its performance improved further and peaked for back leakage (L2), with a recall of 0.69 and an F1 score of 0.70, the highest scores for this model across all conditions.
Feature selection improved the recall and F1 scores for most, though not all, models and conditions. At a valve lift ratio of 47%, the recall and F1 score of the random forest model for front leakage (L1) improved to 0.51, close to its performance for back leakage (L2) and combined leakage (L1 and L2). Gaussian-naïve Bayes remained the best detector of back leakage, with a recall of 0.70 and an F1 score of 0.60, although both values were lower than with raw data (0.79 and 0.62). Logistic regression improved moderately, particularly for back leakage (L2), where the recall and F1 score increased to 0.56 and 0.55, respectively.
At lower valve lift ratios, feature selection continued to benefit most models. At a 30% valve lift ratio, the recall and F1 score of the random forest model for back leakage (L2) increased to 0.59 and 0.57, respectively. Logistic regression achieved its best result at this lift ratio for combined leakage (L1 and L2), with a recall of 0.60 and an F1 score of 0.57.
The impact of feature selection was most pronounced for Gaussian-naïve Bayes at a 10% valve lift ratio. For back leakage (L2), its recall increased from 0.37 to 0.79 and its F1 score from 0.42 to 0.66, indicating a considerably better precision–recall balance. By contrast, the random forest model reached recall and F1 scores of about 0.60 for back leakage (L2) and combined leakage (L1 and L2); for back leakage, these values are lower than its raw-data result.
These results indicate that feature selection plays a key role in enhancing the performance of classification models for fault diagnosis. Selecting the most relevant features improved the recall and F1 scores for most models and leakage scenarios. Feature extraction also reduces data dimensionality, which can lower the risk of overfitting and may improve the applicability of the models in diverse industrial settings. Feature engineering therefore appears to be a critical step in developing dependable fault diagnosis systems.
CONCLUSION
We propose a novel device and algorithm for diagnosing gate-valve failures. Our diagnostic method compares the pressure readings from sensors placed around the gate valve with the normal pressure differential and identifies faults from deviations from this reference. We assembled an actual device and installed it in a pilot-scale water distribution network to collect data. For fault diagnosis, we applied three ML algorithms with two data processing methods: one used the raw data and the other used features extracted from the data. The results showed that a Gaussian-naïve Bayes model using six extracted features was the most effective at diagnosing failures.
Implementing our diagnostic methodology in actual water networks faces challenges such as space and electricity constraints. To address them, our ongoing research focuses on microelectromechanical system (MEMS) sensors, which offer energy efficiency and precision, both crucial for integration into water metering stations. To circumvent data transmission issues, we are investigating the viability of edge computing and on-device AI. These technologies can reduce communication requirements by processing data at or near the source, which lowers overhead and improves responsiveness. The developed algorithms are promising for identifying malfunctions in hydraulic structures at aboveground facilities such as plants. With further development, these strategies could bridge the gap toward practical application in subterranean water distribution infrastructure.
Moreover, this study used data sampled at 1,000 Hz, and the resulting data volume entails substantial and potentially unnecessary storage and operating costs. Further research is therefore needed to establish the minimum sampling frequency at which the algorithm remains effective. This research will guide the practical deployment of our methods in subterranean water distribution infrastructure.
It is neither feasible nor necessary to equip every valve within a water distribution system with our diagnostic device. The methodology should primarily be applied to valves where failure could have severe repercussions as well as those located at operationally vulnerable points that consistently exhibit issues, even post-replacement. Implementing our method in such critical scenarios is expected to offer substantial operational advantages.
Our diagnostic methods are applicable to a broad spectrum of hydraulic structures, both subterranean and aboveground. Our ongoing research and adaptation of algorithms indicate a potential for enhancing the operational reliability of these systems. By continuously refining our approach to consider the diverse components within water distribution systems, we aim to transform the maintenance landscape and thus improve overall efficiency and reliability.
ACKNOWLEDGEMENT
This work was supported by Korea Planning & Evaluation Institute of Industrial Technology funded by the Ministry of the Interior and Safety (MOIS, Korea). [Project Name: Development of water quality platform to prevent tap/drinking water accidents/Project Number: 20025188]
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare no conflict of interest.