Water quality prediction is fundamental to water resource management and pollution control, and accurately predicting how pollutant concentrations in water bodies change over time is crucial. Such predictions provide data support for the effective assessment of water quality and indirectly help protect water resources and the environment. A variety of water quality prediction methods exist, but they still have shortcomings. In this paper, data on the main water quality pollution indicators, such as dissolved oxygen (DO), ammonia nitrogen (NH3-N) and total phosphorus (P), are used to build a water quality prediction model. Water quality indicators exhibit numerous nonlinear correlations, which results in low training efficiency on large-scale data. Therefore, a combined water quality prediction model based on ensemble empirical mode decomposition (EEMD) and a cascade support vector machine (Cascade SVM) is proposed. First, the EEMD method is used to highlight the real characteristics of the original water quality data series. Then, the traditional Cascade SVM is parallelized on Spark, a distributed computing engine, so that training and prediction run in parallel. The experimental results show that the proposed combined model is superior in several aspects of performance, including training efficiency and prediction accuracy.

  • Proposes a combined water quality prediction model based on EEMD and Cascade SVM.

  • Improves the accuracy of the prediction results.

  • The combined water quality prediction model proposed in this paper achieves higher accuracy.

  • The combined water quality prediction model proposed in this paper requires less prediction time.

  • The proposed combined model shows strong superiority in training efficiency and prediction accuracy.
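
The cascade idea named in the abstract can be illustrated with a minimal single-machine sketch. It assumes scikit-learn's `SVR` as the base learner and a simple pairwise merge of support-vector sets; the function names `support_vectors` and `cascade_svr`, the synthetic series, the lag window and the hyperparameters are all hypothetical choices for illustration, not the authors' implementation. Spark and the EEMD step are not used here: the list comprehension over partitions only marks where independent tasks could be distributed.

```python
import numpy as np
from sklearn.svm import SVR


def support_vectors(X, y, **svr_params):
    """Fit an SVR on (X, y) and keep only its support vectors and their targets."""
    model = SVR(**svr_params).fit(X, y)
    idx = model.support_  # indices of the training points that are support vectors
    return X[idx], y[idx]


def cascade_svr(X, y, n_partitions=4, **svr_params):
    """Train an SVR through a cascade of support-vector filters (illustrative)."""
    # First layer: shuffle and split the training set into disjoint partitions.
    order = np.random.default_rng(0).permutation(len(X))
    parts = [(X[i], y[i]) for i in np.array_split(order, n_partitions)]
    # Each partition is filtered independently; in a distributed setting
    # these fits are the units of work that could run in parallel.
    parts = [support_vectors(Xp, yp, **svr_params) for Xp, yp in parts]
    # Later layers: merge the surviving sets pairwise and filter again.
    while len(parts) > 1:
        merged = []
        for k in range(0, len(parts), 2):
            group = parts[k:k + 2]
            Xm = np.vstack([g[0] for g in group])
            ym = np.concatenate([g[1] for g in group])
            merged.append(support_vectors(Xm, ym, **svr_params))
        parts = merged
    # Final model: trained on the support vectors that survived the cascade.
    X_final, y_final = parts[0]
    return SVR(**svr_params).fit(X_final, y_final)


# Synthetic stand-in for one indicator series (not real water quality data).
t = np.linspace(0, 20, 400)
series = np.sin(t) + 0.05 * np.random.default_rng(1).normal(size=t.size)

# Predict the next value from the previous `lags` values.
lags = 5
X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
y = series[lags:]

model = cascade_svr(X[:300], y[:300], n_partitions=4, C=10.0, epsilon=0.01)
pred = model.predict(X[300:])
```

Because each layer passes on only support vectors, later fits see fewer points than the full training set, which is the usual motivation for a cascade on large data. How much is filtered out depends on the noise level and on `epsilon`.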

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).