The success of any forecasting model depends heavily on reliable historical data, among other factors. Data are needed to calibrate, fine-tune and verify any simulation model. However, data are very often contaminated with noise of varying levels originating from different sources. This study proposes a scheme that extracts the most representative data from a raw data set, using the Subtractive Clustering Method (SCM) and a Micro Genetic Algorithm (mGA). SCM (a) removes outliers and (b) discards superfluous points, while mGA, a search algorithm, determines the optimal values of SCM's parameter set. The scheme was demonstrated in (1) water level forecasting in Bangladesh with Neural Network and Fuzzy Logic models and (2) forecasting of two chaotic river flow series (Wabash River at Mt. Carmel and Mississippi River at Vicksburg) with the phase space prediction method. The scheme significantly reduced the data set, and forecasting models trained on the reduced set yielded prediction accuracy equal to or higher than that of models trained on the whole original data set. The resulting fuzzy logic model, for example, has fewer rules, which makes it easier for humans to interpret. In phase space prediction of chaotic time series, which is known to require a long data record, a data reduction of up to 40% has almost no effect on prediction accuracy.
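
To illustrate how subtractive clustering can pick out representative points while ignoring isolated ones, the following is a minimal sketch of a simplified, commonly described form of the method. It is not the authors' implementation. The function name, the parameter names (`ra`, `rb_factor`, `reject_ratio`), their default values, and the single-threshold stopping rule are all assumptions made for illustration; in the proposed scheme, parameters of this kind are what the mGA would tune.

```python
import math


def subtractive_clustering(points, ra=0.5, rb_factor=1.5, reject_ratio=0.15):
    """Return indices of points chosen as cluster centres.

    Simplified sketch: ra, rb_factor and reject_ratio are illustrative
    assumptions, not values taken from the study.
    """
    if not points:
        return []
    alpha = 4.0 / ra ** 2                # neighbourhood radius for potentials
    beta = 4.0 / (rb_factor * ra) ** 2   # wider radius for potential reduction

    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # A point's potential is high when many other points lie close to it.
    potential = [
        sum(math.exp(-alpha * sqdist(p, q)) for q in points) for p in points
    ]
    first_peak = max(potential)

    centres = []
    while len(centres) < len(points):
        k = max(range(len(points)), key=potential.__getitem__)
        # Stop once the best remaining potential is small relative to the
        # first peak; isolated points (outliers) never pass this threshold.
        if potential[k] < reject_ratio * first_peak:
            break
        centres.append(k)
        peak = potential[k]
        # Reduce the potential near the new centre so that its close
        # neighbours (redundant points) are not selected as well.
        for i, p in enumerate(points):
            potential[i] -= peak * math.exp(-beta * sqdist(p, points[k]))
    return centres


# Two dense 3x3 groups plus one isolated point (hypothetical data).
group_a = [(0.01 * i, 0.01 * j) for i in range(3) for j in range(3)]
group_b = [(1 + 0.01 * i, 1 + 0.01 * j) for i in range(3) for j in range(3)]
data = group_a + group_b + [(0.5, 3.0)]
centres = subtractive_clustering(data)
```

In this toy data set the isolated point has a low potential and is never selected, and only one point per dense group survives as a centre. Full formulations of the method usually add a second acceptance threshold and a distance check, which are omitted here for brevity.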
