Fault diagnosis using machine learning techniques is of great importance in wastewater treatment plants (WWTPs). A key factor limiting diagnostic accuracy is the imbalance between the samples in minority classes (i.e., faulty conditions) and those in majority classes (i.e., normal conditions), which can cause faults to be misclassified and lead to failure in practical use. This study proposes a novel pre-processing method in which a fast relevance vector machine (Fast RVM) reduces the majority class samples and the synthetic minority over-sampling technique (SMOTE) expands the minority class samples. A case study indicates that this pre-processing method could be a promising solution for imbalanced data classification in WWTPs, and that the pre-processed data can be diagnosed well by back-propagation neural network, support vector machine, RVM and Fast RVM models.
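
The over-sampling half of the pre-processing described above can be illustrated with a minimal sketch of the synthetic minority over-sampling idea: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the study's implementation. The function name `smote`, its parameters, and the toy data are assumptions made for illustration, and the Fast RVM reduction of the majority class is not reproduced here.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Illustrative sketch: create n_new synthetic minority samples by
    interpolating between a sample and one of its k nearest minority
    neighbours. Name and signature are hypothetical."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances within the minority class.
    dist = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate with a random gap in [0, 1) along the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Toy data (assumed, not from the case study): 10 faulty samples, 3 features.
toy_rng = np.random.default_rng(1)
X_fault = toy_rng.normal(3.0, 0.5, size=(10, 3))
X_synth = smote(X_fault, n_new=40, k=3)
X_fault_expanded = np.vstack([X_fault, X_synth])
print(X_fault_expanded.shape)  # (50, 3)
```

Because every synthetic point is a convex combination of two existing faulty samples, the expanded minority class stays inside the region already covered by the observed faults rather than duplicating them exactly.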

