With the development of IoT monitoring equipment, an increasing number of monitoring indicators are employed to track the operational status of water pumps, which introduces the challenge of data redundancy. This paper proposes an algorithm for predicting the health status of pumps that integrates multiple monitoring variables. First, the original dataset is grouped using the max-relevance and min-redundancy (mRMR) method. Next, principal component analysis (PCA) is applied to reduce the dimensionality of each group. Finally, a long short-term memory (LSTM) neural network is employed to construct the association model between monitoring data and equipment health. The proposed algorithm accounts for the correlation between variables and for the negative impact of long-term dependencies on the prediction results, and it is capable of predicting abnormal working conditions, as verified experimentally at the Xiasha Pumping Station in Hangzhou. Compared with the LR, SVM, and RNN algorithms, the proposed algorithm achieved the highest prediction accuracy.

  • Sensors measuring vibration, magnetic flux, temperature, and sound loudness have been installed on a water pump to collect data on its operational status.

  • To address the high dimensionality of the monitoring data, the mRMR-principal component analysis method was employed to reduce the impact of irrelevant variables on the results.

  • Long short-term memory was used to establish associations between the health status and monitoring data.

As control equipment is applied more deeply in water plants, water plant systems have become increasingly complex. As essential equipment in water plants, the health condition of pumps has a direct impact on operational efficiency and plays a crucial role in determining whether the water transmission and distribution system can operate smoothly (Bayindir & Cetinceviz 2011). Ensuring that pumps have better quality, higher reliability, and greater availability has become an urgent issue to be addressed (Luna et al. 2019).

Factors such as excessive vibration, heightened current, and abnormal magnetic flux are the main indicators of equipment malfunction and damage to vulnerable parts (Bhuiyan et al. 2022; Zhao et al. 2022). Numerous researchers have demonstrated how internal and external electrical faults generate parasitic radial forces (rotating waves) and tangential forces (pulsating torques) within induction motors, leading to a deterioration of equipment health (Tsypkin 2017). As a result, various monitoring sensors, including self-powered vibration sensors and sensors for stator vibration, stator line-bar vibration, air gap, and magnetic field intensity, have been employed to monitor the operational conditions of mechanical gear systems (Li et al. 2022).

In addition to the proliferation of monitoring device sensors, there has been rapid development in various methods for assessing device health (Elsheikh et al. 2019; Ma et al. 2019). These methods can be generally classified into physics-based (or model-based), data-driven, and hybrid approaches (Li & Li 2017). Physics-based approaches aim to establish explicit mathematical models based on the fundamental physical principles that govern the underlying mechanisms of mechanical systems or their components. Computer programs have been utilized for automated simulation, aiming to replicate the reasoning and decision-making processes of experts in the field (Qian et al. 2021). However, physics-based approaches heavily rely on expert domain knowledge of physical models, making them infeasible for complex systems (Dutta et al. 2022; Pan et al. 2022).

On the other hand, machine learning techniques strive to learn the nonlinear nature of mechanical systems by treating them as black-box models (Zhao et al. 2019b; Goyal et al. 2020). Consequently, obtaining proper relationships between the inputs and outputs of machinery systems requires high-quality (low-noise), highly sampled, and large-volume time-series data. Unfortunately, most industrial time-series data exhibit significant noise, posing challenges for machine learning in the field of equipment health assessment (Wang et al. 2019). For industrial time-series data in particular, sensor noise makes model fitting difficult, as the numerical rank of noisy data may be far larger than the dimension of the true dynamical features (Lee et al. 2018; Zhao et al. 2019a; Cheng et al. 2020).

In summary, most existing studies on equipment health assessment have primarily focused on the development of monitoring equipment or the analysis of specific monitoring indices. However, there is a noticeable lack of research considering multiple monitoring indicators, especially when dealing with high-dimensional data (Bolón-Canedo et al. 2016). With the advancements in Internet of Things (IoT) monitoring sensor technology, there is a growing trend toward shifting from single sensors to the fusion monitoring of multiple sensors for device health monitoring. This leads to the use of high-dimensional datasets in data-driven models. Furthermore, accurately determining the exact time of equipment failure poses an inherent challenge. Much of the existing research relies on data collected in controlled experimental environments, which often suffer from insufficient volume and quality. Although some unsupervised learning methods have been utilized for abnormal state monitoring, ensuring their robustness remains a challenging task (Dai et al. 2020).

Addressing the shortcomings and challenges in existing research, this article proposes a data-driven approach as the framework for an equipment health assessment algorithm. In the data collection phase, multiple sets of diverse monitoring sensors were installed in six pumps.

To overcome the difficulties arising from massive data features, data redundancy, and long-term dependencies in the original monitoring dataset, the algorithm integrates the max-relevance and min-redundancy (mRMR) method, principal component analysis (PCA), and long short-term memory (LSTM) neural network. In the data preprocessing stage, the mRMR method is utilized to address correlation and redundancy issues among the original monitoring data. Subsequently, the PCA method is employed to reduce the dimensionality of the categorical data, aiding in the screening of important features.

Leveraging accurate data on pump failure times, this study classifies the equipment's health state based on the severity of the failures, using the post-classification fault degree as the target for the model. Finally, the LSTM method was employed to construct an association model between IoT monitoring data and device health.

The key features of the algorithm proposed in this article are summarized as follows:

  • 1. Data preprocessing stage: the algorithm accounts for the complexity of the variables in the original dataset and automatically categorizes the data. To eliminate redundant variables, the mRMR method, which captures nonlinear correlation between variables, is used.

  • 2. Training set generation stage: the algorithm labels the period preceding an equipment abnormality as abnormal. By learning a data-driven model, it predicts the equipment health degree in advance rather than diagnosing faults in real time, which reserves sufficient time for equipment maintenance.

  • 3. Accurate classification labels: the presence of dedicated maintenance personnel facilitated the prompt and precise identification of abnormal operating conditions in the six pumps, and the failure times were further validated through video surveillance installed in the pump station.

The process of water pump health assessment is outlined as follows. First, the mRMR method is employed to partition the original data, followed by the implementation of the PCA method to reduce the dimensionality of the partitioned data. Second, the pump's maintenance records are utilized to annotate its health index. Finally, the preprocessed data are input into an LSTM neural network to establish the correlation between the feature set and health states. This approach effectively achieves the objective of evaluating the pump's health index.

Figure 1 depicts the flowchart for the water pump health assessment.

mRMR

The purpose of data categorization is to maximize the correlation between features and categorical variables, which involves selecting the top k variables with the highest correlation with the categorical variables. However, in feature selection, there is a possibility of high correlation between features, which may result in redundancy in the feature variables, rendering individual good features useless for enhancing the classifier's performance (Mao 2004).
Figure 1: The flowchart of water pump health assessment.

To address this issue, the mRMR method is employed. It measures the relevance of a feature subset as the mean of the mutual information between each feature and the category, and it measures feature–feature redundancy as the sum of the pairwise mutual information between features divided by the square of the number of features in the subset. This approach considers not only the correlation between features and the categorical variable but also the correlation among the features themselves.

Using mutual information as a metric, given two random variables x and y with probability density functions p(x), p(y), and p(x, y), the mutual information I(x; y) is given as follows:
$$I(x; y) = \iint p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)}\,dx\,dy \tag{1}$$

Given the total feature set X of the original monitoring data, the mRMR algorithm aims to find a subset of features S containing m features. Using an incremental search to find near-optimal features, it starts by randomly selecting a feature to form the set S_{m-1} and then finds the m-th feature from the remaining features X − S_{m-1}. The feature is selected to maximize Φ(·):
$$\max_{x_j \in X - S_{m-1}} \Phi(x_j) = \max_{x_j \in X - S_{m-1}} \left[\, I(x_j; c) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right] \tag{2}$$
where c denotes the categorical (target) variable.

This subloop is repeated continuously; when Φ(·) falls below the set threshold, the feature search stops, a remaining feature is randomly selected to seed a new feature set, and eventually all features in the feature set X are categorized.
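To make the selection loop concrete, the following is a minimal Python sketch of the incremental criterion in Equation (2), assuming the monitoring data sit in a NumPy array X (samples × features) with class labels y; scikit-learn's mutual-information estimators stand in for the density integral of Equation (1), the 0.25 stopping threshold mirrors the value used later in this study, and the outer loop that re-seeds a new group from the remaining features is omitted for brevity. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, threshold=0.25, seed=0):
    """Greedy incremental mRMR selection for one feature group.

    Stops when the criterion Phi in Equation (2) drops below `threshold`.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    relevance = mutual_info_classif(X, y, random_state=seed)   # I(x_j; c) for each feature
    remaining = list(range(n_features))
    selected = [int(rng.choice(remaining))]                    # seed with a random feature
    remaining.remove(selected[0])

    while remaining:
        best_phi, best_j = -np.inf, None
        for j in remaining:
            # Redundancy: mean mutual information between candidate j and selected features
            redundancy = np.mean([
                mutual_info_regression(X[:, [i]], X[:, j], random_state=seed)[0]
                for i in selected
            ])
            phi = relevance[j] - redundancy                    # Equation (2)
            if phi > best_phi:
                best_phi, best_j = phi, j
        if best_phi < threshold:                               # stop; a new group would be seeded here
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```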

PCA

The main goal of the dimensionality reduction phase is to reduce the feature dimension while minimizing information loss. For a sample matrix, the original features are replaced and combined so that the number of new features is smaller than the number of original features (Jollife & Cadima 2016).

The PCA method maps the n-dimensional original features to dimension k (k < n) and calls these k-dimensional features the principal components. The k-dimensional orthogonal features are entirely new, reconstructed features, and the newly generated k-dimensional data retain as much information as possible from the original n-dimensional data. Given features x and y observed over m samples, the mean of feature x, the variance S after removing the mean, and the covariance of features x and y are calculated as follows:
$$\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i \tag{3}$$
$$S = \frac{1}{m-1}\sum_{i=1}^{m} \left(x_i - \bar{x}\right)^2 \tag{4}$$
$$\mathrm{Cov}(x, y) = \frac{1}{m-1}\sum_{i=1}^{m} \left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right) \tag{5}$$

When there are multiple features, the covariance matrix is used to represent the correlation between multiple features. The PCA algorithm is constructed with the goal of linear independence between new features, i.e., the covariance between new features is 0. The essence is to make the covariance matrix of the new features a diagonal matrix, with the diagonal elements as the variance of the new features and the off-diagonal elements as 0, indicating that the covariance between the new features is 0 and the corresponding feature vectors are orthogonal.

Given a dataset with m samples and n features, the sample data matrix is
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix} \tag{6}$$
where each row represents a sample, each column represents a feature, the current dimension is n, and the target dimension for dimensionality reduction is k.

The decentralized (mean-centred) matrix is expressed as follows:
$$D = X - \mathbf{1}\bar{x}^{\mathrm{T}} \tag{7}$$
where $\bar{x}$ is the vector of column (feature) means.

The covariance matrix of the decentralized matrix is expressed as follows:
$$C = \frac{1}{m-1} D^{\mathrm{T}} D \tag{8}$$

An eigendecomposition of the covariance matrix C is carried out to obtain its eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ and the corresponding eigenvectors $w_1, w_2, \ldots, w_n$. The eigenvectors are arranged in descending order of their eigenvalues to form a matrix, and the first k columns are taken as the matrix W. The final sample features are obtained by projecting the centred data matrix onto W:
$$Y = D W \tag{9}$$
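As an illustration of Equations (6)–(9), the following is a minimal NumPy sketch of the covariance-based PCA step, assuming the grouped monitoring data arrive as an (m × n) array; it is a sketch rather than the exact implementation used in this study.

```python
import numpy as np

def pca_reduce(X, k=2):
    """Reduce an (m, n) sample matrix to k principal components via Equations (6)-(9)."""
    D = X - X.mean(axis=0)                 # Equation (7): centre each feature (column)
    C = D.T @ D / (X.shape[0] - 1)         # Equation (8): covariance of the centred data
    eigvals, eigvecs = np.linalg.eigh(C)   # eigendecomposition of the symmetric matrix C
    order = np.argsort(eigvals)[::-1]      # sort eigenvectors by descending eigenvalue
    W = eigvecs[:, order[:k]]              # first k eigenvectors form the matrix W
    return D @ W                           # Equation (9): project onto the principal components

# Example: reduce a random 1,000-sample, 5-feature block to 2 dimensions
X = np.random.rand(1000, 5)
X_reduced = pca_reduce(X, k=2)             # shape (1000, 2)
```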

LSTM neural network

The LSTM neural network is a variant of the recurrent neural network (RNN), designed to solve the long-term dependency problem of the general RNN (Thakkar & Chaudhari 2021). All LSTMs have a chain-like form of repeating neural network modules, and each repeating module contains four interacting structures (the input, forget, and output gates and the memory cell). As shown in Figure 2, $i_t$, $f_t$, $o_t$, $h_t$, $C_t$, and $\tilde{C}_t$ represent the input gate, forget gate, output gate, hidden state, memory cell, and candidate state, respectively (Hochreiter & Schmidhuber 1997).
Figure 2: LSTM loop cell structure (Lee et al. 2017).
First, the forget gate decides which information is discarded from the loop cell structure: it reads the current input $x_t$ and the previous hidden state $h_{t-1}$ and outputs
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{10}$$
Next, the input gate determines which new information is stored in the loop cell structure, and a tanh layer creates a new candidate value vector $\tilde{C}_t$ to be added to the state:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{11}$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{12}$$
The cell state is then updated from $C_{t-1}$ to $C_t$ by multiplying the old state by $f_t$, discarding part of the old state, and adding the new candidate state:
$$C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t \tag{13}$$
Finally, the output is determined by the output gate together with a tanh layer applied to the cell state:
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{14}$$
$$h_t = o_t \ast \tanh(C_t) \tag{15}$$
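For concreteness, the following is a minimal NumPy sketch of a single forward step through the cell defined by Equations (10)–(15); the weight matrices W_f, W_i, W_C, W_o and biases b_f, b_i, b_C, b_o are assumed to be given parameters, and the code is illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    """One LSTM cell step; each W_* has shape (hidden, hidden + input)."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)               # Equation (10): forget gate
    i_t = sigmoid(W_i @ z + b_i)               # Equation (11): input gate
    C_tilde = np.tanh(W_C @ z + b_C)           # Equation (12): candidate state
    C_t = f_t * C_prev + i_t * C_tilde         # Equation (13): cell state update
    o_t = sigmoid(W_o @ z + b_o)               # Equation (14): output gate
    h_t = o_t * np.tanh(C_t)                   # Equation (15): hidden state
    return h_t, C_t
```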

Health status level classification

According to the historical monitoring data and the maintenance records of the equipment, the health status is divided into four levels, namely, trouble free, 30 min before failure, minor failure, and downtime failure, which are assigned the values 0, 1, 2, and 3, respectively, as shown in Table 1.

Table 1

Health status assignment method

Failure status | Assignment
Trouble free | 0
30 min before failure | 1
Minor failure | 2
Downtime failure | 3
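To illustrate how these labels can be attached to the monitoring time series, the following is a hypothetical pandas sketch that assumes each maintenance record provides a failure start time, a repair end time, and a severity ('minor' or 'downtime'); the actual record format used in this study is not shown here.

```python
import pandas as pd

def label_health(df, failures):
    """df: time-indexed monitoring DataFrame; failures: list of (start, end, severity) records."""
    df = df.copy()
    df["health"] = 0                                               # 0: trouble free
    for start, end, severity in failures:
        start, end = pd.Timestamp(start), pd.Timestamp(end)
        warn = (df.index >= start - pd.Timedelta(minutes=30)) & (df.index < start)
        df.loc[warn, "health"] = 1                                 # 1: 30 min before failure
        fault = (df.index >= start) & (df.index <= end)
        df.loc[fault, "health"] = 2 if severity == "minor" else 3  # 2: minor, 3: downtime
    return df
```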

Softmax integration in LSTM for classification

The LSTM model is commonly employed in the analysis of time-series data and is proficient in predicting continuous values for regression tasks. By associating specific numerical values with the health condition of the devices, we are able to effectively categorize them into distinct classes based on the predefined threshold or criteria. This transformation allows us to leverage classification techniques and tools to analyze and predict the health status of devices, enabling more accurate and efficient assessment compared to traditional evaluation methods. To facilitate this transformation from the conventional LSTM model to a classification problem, we introduced an additional fully connected layer tailored to accommodate the number of categories, followed by the utilization of the Softmax function. The number of neurons in this fully connected layer should be equal to the number of classification categories.

Following the fully connected layer, the Softmax function is employed. The Softmax function transforms the output into a vector that represents a probability distribution, ensuring that the probabilities of all categories sum to 1. For an input vector of logits $z = (z_1, z_2, \ldots, z_K)$, where $z_i$ represents the value of the $i$-th element, the Softmax function can be expressed as follows:
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \tag{16}$$

Through this computation, the Softmax function can transform the original values into numerical values representing relative probabilities, where larger values correspond to higher probabilities.

Consequently, through the application of the Softmax function to the output of an LSTM model, a probability distribution that represents the various classes can be obtained. The class with the highest probability can then be selected, enabling effective classification.

During model training, the loss is calculated using labeled data, and backpropagation is employed to update the model parameters accordingly. This iterative process aims to optimize the model and improve its classification performance.

By incorporating this modification, the LSTM model can be effectively repurposed to address classification challenges, allowing for the generation of probability distributions for different classes. Consequently, the class with the highest assigned probability can be selected to determine the category of interest.
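To make the resulting model structure concrete, the following is a minimal PyTorch sketch of an LSTM followed by a fully connected layer sized to the four health classes and a Softmax output; the eight input features correspond to the four mRMR categories reduced to two principal components each, while the hidden size, window length, and learning rate are illustrative assumptions rather than the configuration used in the study.

```python
import torch
import torch.nn as nn

class PumpHealthLSTM(nn.Module):
    """LSTM encoder followed by a fully connected layer over 4 health classes."""
    def __init__(self, n_features=8, hidden_size=64, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, n_classes)    # one neuron per health class

    def forward(self, x):                              # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])                  # logits from the last time step

model = PumpHealthLSTM()
criterion = nn.CrossEntropyLoss()                      # applies Softmax + log-likelihood internally
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 60, 8)                             # dummy batch: 32 windows of 60 time steps
y = torch.randint(0, 4, (32,))                         # health labels 0-3
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
probs = torch.softmax(model(x), dim=1)                 # Equation (16): class probabilities
```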

The six booster pumps at the Xiasha Pumping Station in Hangzhou are used as the research objects, as shown in Figure 3. Table 2 lists their performance parameters, which exhibit slight variations between pumps. Given the capability of neural networks to automatically extract relevant features from raw data, the reliance on hand-crafted features is reduced; this inherent capacity to extract meaningful representations and hierarchical features can effectively enhance both performance and generalization. To accommodate the differences between the pumps, we partitioned the data by pump serial number and then amalgamated them into a unified dataset for model training.
Table 2

Performance parameters of the pumps

ID | Pump speed (RPM) | Rated power (kW) | Rated flow (m³/h) | Rated head (m)
1# | 1450 | 185 | 790 | 58
2# | 1450 | 185 | 790 | 58
3# | 1490 | 139.05 | 1,200 | 37
4# | 1480 | 160 | 1,050 | 37
5# | 1480 | 160 | 1,050 | 37
6# | 1490 | 139.05 | 1,200 | 37
Figure 3: Picture of the monitoring sensor installation schematic.

Vibration, magnetic flux, temperature, noise, and electric current are monitored at the motor bearing, the near-end motor bearing, and the far-end motor bearing of each pump. The monitoring sensors are programmed to trigger data collection every hour, capturing measurements continuously for 10 s during each sampling event. This periodic and consistent sampling scheme guarantees the collection of representative observations at regular intervals throughout the monitoring process. Interpolation is used to compensate for the low monitoring frequency of some indicators. The monitoring principles of the indicators are presented in Table 3, and the detailed monitoring indicators are listed in Table 4.
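As an illustration of this alignment step, the following hypothetical pandas sketch upsamples a slowly sampled indicator (here, a bearing temperature logged hourly) onto a finer common timeline by linear interpolation; the actual target interval used in the study is not specified.

```python
import pandas as pd

# Hypothetical example: temperature is logged once per hour, faster indicators once per minute.
temp = pd.Series(
    [41.2, 41.8, 43.1],
    index=pd.date_range("2023-04-01 00:00", periods=3, freq="h"),
    name="bearing_temp_C",
)

# Align the slow indicator to a 1-minute grid and fill the gaps by linear interpolation.
temp_aligned = temp.resample("1min").interpolate(method="linear")
print(temp_aligned.head())
```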

Table 3

The monitoring principle of the monitoring indicators

Monitoring indicator | Working principle
Motor speed | As the rotor of the pump approaches the Hall sensor, it induces a magnetic field, generating periodic voltage signals. By processing and counting these voltage signals, the rotational speed of the pump can be accurately measured
Instantaneous current | Smart power meter device
Vibration | Piezoelectric devices generate electrical signals in response to the pressure applied by vibrations. By recording and analyzing these electrical signals, the magnitude of the vibration acceleration can be calculated
Magnetic flux | Hall element
Temperature | Resistance sensor
Sound | Vibrating-film principle
Table 4

Monitoring position and frequency of pump monitoring indicators

Serial number | Monitoring content | Unit | Monitoring frequency (Hz)
1 | Motor speed | RPM |
2 | Instantaneous current | mA |
3 | Motor bearing-vibration-x-axis | m/s² | 8,000
4 | Motor bearing-vibration-y-axis | m/s² | 8,000
5 | Motor bearing-vibration-z-axis | m/s² | 8,000
6 | Motor bearing-magnetic flux-x-axis | Wb | 50
7 | Motor bearing-magnetic flux-y-axis | Wb | 50
8 | Motor bearing-magnetic flux-z-axis | Wb | 50
9 | Motor bearing-instantaneous temperature | °C |
10 | Motor bearing-sound loudness | dB | 4,000
11 | Near-end motor bearing-vibration-x-axis | m/s² | 8,000
12 | Near-end motor bearing-vibration-y-axis | m/s² | 8,000
13 | Near-end motor bearing-vibration-z-axis | m/s² | 8,000
14 | Near-end motor bearing-magnetic flux-x-axis | Wb | 50
15 | Near-end motor bearing-magnetic flux-y-axis | Wb | 50
16 | Near-end motor bearing-magnetic flux-z-axis | Wb | 50
17 | Near-end motor bearing-sound loudness | dB | 4,000
18 | Near-end motor bearing-instantaneous current | mA |
19 | Far-end motor bearing-vibration-x-axis | m/s² | 8,000
20 | Far-end motor bearing-vibration-y-axis | m/s² | 8,000
21 | Far-end motor bearing-vibration-z-axis | m/s² | 8,000
22 | Far-end motor bearing-magnetic flux-x-axis | Wb | 50
23 | Far-end motor bearing-magnetic flux-y-axis | Wb | 50
24 | Far-end motor bearing-magnetic flux-z-axis | Wb | 50
25 | Far-end motor bearing-instantaneous temperature | °C |
26 | Far-end motor bearing-sound loudness | dB | 4,000

Our monitoring sensors collected data from October 2022 to June 2023. The initial time of each pump failure was reported by onsite maintenance personnel, and the exact time was confirmed through video surveillance. Because of the large volume of raw data, we applied undersampling to the dataset. To mitigate the potential risk of data leakage, we performed a strict time-based partitioning of the training and testing sets, as outlined in Table 5.

Table 5

Training dataset partitioning

Dataset | Time span | Data points
Training set | 2022.10.1 00:00:00–2023.4.30 23:59:59 | 2,442,240
Validation set | 2023.5.1 00:00:00–2023.6.30 23:59:59 | 702,720
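A minimal sketch of such a leakage-free, time-based split is shown below, assuming the labeled, preprocessed data live in a time-indexed pandas DataFrame; the cut-off corresponds to the boundary in Table 5.

```python
import pandas as pd

def time_based_split(df, cutoff="2023-05-01 00:00:00"):
    """Split strictly by time so that no future observation leaks into training."""
    cutoff = pd.Timestamp(cutoff)
    train = df[df.index < cutoff]
    valid = df[df.index >= cutoff]
    return train, valid

# Usage (df is the labeled, preprocessed monitoring DataFrame):
# train_df, valid_df = time_based_split(df)
```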

The six groups of pumps experienced 73 minor failures and nine downtime failures during the monitoring period. We selected a subset of the monitoring data from four pumps for the months of April, May, and June, and displayed it in Figure 4.
Figure 4: Example of partial monitoring data.

Figure 5: Example of PCA dimensionality-reduced data.

Figure 6: Confusion matrix of the results calculated by the algorithm proposed in this paper.

The classification of the monitoring indicators presented in Table 4 is accomplished using the mRMR method described in the section 'mRMR' above. The threshold value for $\Phi(\cdot)$ is set at 0.25. Table 6 shows the classification of the 26 monitoring indicators into four distinct categories.

Table 6

Classification of mRMR monitoring indicators

Classification number | Monitoring indicators
1 | Motor bearing-sound loudness, far-end motor bearing-y vibration, motor bearing-x magnetic flux, far-end motor bearing-x vibration, motor bearing-y vibration
2 | Instantaneous current, motor bearing-z vibration, near-end motor bearing-sound, near-end motor bearing-z vibration
3 | Near-end motor bearing-temperature, far-end motor bearing-z vibration, motor bearing-z magnetic flux, far-end motor bearing-sound, near-end motor bearing-z magnetic flux
4 | Near-end motor bearing-sound loudness, near-end motor bearing-x magnetic flux, far-end motor bearing-temperature, far-end motor bearing-loudness, motor bearing-sound loudness

Applying the PCA method as described above under section ‘PCA’, the four categories of monitoring indicators are individually subjected to dimensionality reduction, with the target dimension set to 2. Taking several pumps as examples, Figure 5 illustrates the data after dimensionality reduction for monitoring indicators during select time intervals.
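Putting the two preprocessing steps together, the following hypothetical sketch applies scikit-learn's PCA to each mRMR category separately with a target dimension of 2, assuming the category-to-column mapping from Table 6 is stored in a dictionary; it is illustrative rather than the authors' code.

```python
import pandas as pd
from sklearn.decomposition import PCA

def reduce_by_group(df, groups, n_components=2):
    """Apply PCA separately to each mRMR category and concatenate the components."""
    reduced = []
    for name, cols in groups.items():
        comps = PCA(n_components=n_components).fit_transform(df[cols])
        reduced.append(pd.DataFrame(
            comps,
            index=df.index,
            columns=[f"{name}_pc{i + 1}" for i in range(n_components)],
        ))
    return pd.concat(reduced, axis=1)

# groups = {"cat1": [...], "cat2": [...], "cat3": [...], "cat4": [...]}  # columns from Table 6
# features = reduce_by_group(df, groups)   # 4 categories x 2 components = 8 model inputs
```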

The LSTM model was trained using the methods outlined in the 'Methods' section above. The model's classification performance was evaluated using the accuracy metric, i.e., the ratio of correctly predicted samples to the total number of samples in the testing dataset. On the validation set, the prediction accuracy was 99.1%.

Considering the presence of data imbalance in our dataset, where the majority of instances indicate normal pump operation, we employed a confusion matrix to assess the performance of our model across various classification targets. Figure 6 portrays the confusion matrix, elucidating the predictive outcomes across various health states.

To verify the accuracy of the algorithm proposed in this article, the data before and after mRMR-PCA dimensionality reduction are fitted using logistic regression (LR), support vector machine (SVM), RNN, and LSTM methods. Given that the pump operates normally for the majority of the time, there is a challenge of sample imbalance within the dataset. To address this issue, a resampling technique is employed during the model training phase. Figure 7 illustrates the confusion matrix presenting the classification results of various models, both before and after applying data dimensionality reduction to the dataset.
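For reference, a hypothetical evaluation sketch along these lines is shown below: a simple random-oversampling step to counteract class imbalance during training, followed by accuracy and confusion-matrix computation with scikit-learn; the resampling scheme actually used in the study is not detailed here.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

def oversample_minority(X, y, seed=0):
    """Randomly oversample every class up to the size of the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=target, replace=True) for c in classes
    ])
    return X[idx], y[idx]

# After training each model on the (resampled) training windows:
# y_pred = model_predict(valid_X)          # hypothetical prediction helper
# print("accuracy:", accuracy_score(valid_y, y_pred))
# print(confusion_matrix(valid_y, y_pred, labels=[0, 1, 2, 3]))
```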
Figure 7: Confusion matrix depicting the classification results of various models before and after data dimensionality reduction.

Based on the results, it can be observed that the proposed method presented in this study achieves the highest level of accuracy among the compared models. The comparative analysis of the algorithmic models reveals that LR exhibits the poorest performance due to the high dimensionality of the dataset, which causes the linear model to underperform. SVM performs relatively worse than RNN due to its simplistic model structure, which is typically only suitable for a restricted and fixed set of features. Comparatively, LSTM performs slightly better than RNN as the gate function integrated into the LSTM architecture allows it to effectively capture long-term dependencies, which is a challenge for RNNs. Moreover, LSTM can automatically learn meaningful features from the original signal without requiring feature engineering.

The analysis of the results before and after applying the data dimensionality reduction method shows that the prediction accuracy significantly improves after dimensionality reduction using the mRMR-PCA method proposed in this article. The dataset used in this study is high dimensional and of large volume, and employing the dimensionality reduction technique helps in eliminating unnecessary and redundant features, thereby reducing the impact of noise and enhancing the model's accuracy and generalization ability.

This article presents a method for assessing the health status of water pumps by fusing and analyzing multiple monitoring and sensing indicators. The proposed approach classifies the monitoring data using the mRMR method, reduces the dimensionality of the classified data using PCA, and feeds the processed data into an LSTM neural network to evaluate the health status of the pump. The effectiveness of the proposed algorithm is demonstrated through experiments conducted on six pumps at the Xiasha Pumping Station in Hangzhou. The performance of the proposed method is compared with that of the LR, SVM, and RNN algorithms, and the experimental results show that the proposed algorithm achieves the highest accuracy for identifying the health status of pumps. Moreover, the proposed method reduces the number of features in the dataset, leading to lower computation and storage requirements, and it can respond promptly to abnormal situations during the pump health assessment process.

However, owing to the limited number of samples, we collected operational data from only six pumps. Although the proposed method performed well on the collected dataset, its generalizability to a wider range of pumps remains uncertain. Collecting additional sample data is therefore necessary and is the direction of our future work.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

References

Bhuiyan, M. H. R., Arafat, I. M., Rahaman, M., Toha, T. R. & Alam, S. M. M. 2022 Towards devising a vibration based machinery health monitoring system. Materials Today: Proceedings 56, 2490–2496.

Bolón-Canedo, V., Sánchez-Maroño, N. & Alonso-Betanzos, A. 2016 Feature selection for high-dimensional data. Progress in Artificial Intelligence 5 (2), 65–75.

Dai, J., Wang, J., Huang, W., Shi, J. & Zhu, Z. 2020 Machinery health monitoring based on unsupervised feature learning via generative adversarial networks. IEEE/ASME Transactions on Mechatronics 25 (5), 2252–2263.

Dutta, N., Kaliannan, P. & Subramaniam, U. 2022 Bearing fault detection for water pumping system using artificial neural network. In: ICPC2T 2022 - 2nd International Conference on Power, Control and Computing Technologies, pp. 1–6.

Goyal, D., Choudhary, A., Pabla, B. S. & Dhami, S. S. 2020 Support vector machines based non-contact fault diagnosis system for bearings. Journal of Intelligent Manufacturing 31 (5), 1275–1289.

Hochreiter, S. & Schmidhuber, J. 1997 Long short-term memory. Neural Computation 9 (8), 1735–1780.

Jollife, I. T. & Cadima, J. 2016 Principal component analysis: A review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374 (2065). https://doi.org/10.1098/rsta.2015.0202.

Lee, K., Kim, J.-K., Kim, J., Hur, K. & Kim, H. 2018 CNN and GRU combination scheme for bearing anomaly detection in rotating machinery health monitoring. In: 2018 1st IEEE International Conference on Knowledge Innovation and Invention (ICKII), IEEE, 23–27 July 2018, Jeju, South Korea, pp. 102–105.

Li, S. & Li, J. 2017 Condition monitoring and diagnosis of power equipment: Review and prospective. High Voltage 2 (2), 82–91. Available from: https://onlinelibrary.wiley.com/doi/10.1049/hve.2017.0026.

Li, X., Shao, H., Lu, S., Xiang, J. & Cai, B. 2022 Highly efficient fault diagnosis of rotating machinery under time-varying speeds using LSISMM and small infrared thermal images. IEEE Transactions on Systems, Man, and Cybernetics: Systems 52 (12), 7328–7340.

Mao, K. Z. 2004 Orthogonal forward selection and backward elimination algorithms for feature subset selection. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34 (1), 629–634.

Pan, J., Li, Y. & Wu, P. 2022 A predict method of water pump operating state based on improved particle swarm optimization of support vector machine. Journal of Physics: Conference Series 2160 (1). https://doi.org/10.1088/1742-6596/2160/1/012056.

Qian, H., Yifan, Z. & Rongyong, Z. 2021 Design of intelligent diagnosis operation and maintenance system for circulating water pump. In: 2021 Global Reliability and Prognostics and Health Management, PHM-Nanjing 2021, 15–17 October 2021, Nanjing, China, pp. 1–4.

Thakkar, A. & Chaudhari, K. 2021 A comprehensive survey on deep neural networks for stock market: The need, challenges, and future directions. Expert Systems with Applications 177, 114800. https://doi.org/10.1016/j.eswa.2021.114800.

Tsypkin, M. 2017 Induction motor condition monitoring: vibration analysis technique – diagnosis of electromagnetic anomalies. In: 2017 IEEE AUTOTESTCON, 9–15 September 2017, IEEE, Schaumburg, IL, USA, pp. 1–7.

Zhao, R., Yan, R., Chen, Z., Mao, K., Wang, P. & Gao, R. X. 2019a Deep learning and its applications to machine health monitoring. Mechanical Systems and Signal Processing 115, 213–237.

Zhao, H., Shu, M., Ai, Z., Lou, Z., Sou, K. W., Lu, C., Jin, Y., Wang, Z., Wang, J., Wu, C., Cao, Y., Xu, X. & Ding, W. 2022 A highly sensitive triboelectric vibration sensor for machinery condition monitoring. Advanced Energy Materials 12 (37), 2201132. Available from: https://onlinelibrary.wiley.com/doi/10.1002/aenm.202201132.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).