In this study, basic interpolation and machine learning data augmentation were applied to scarce input data for the Water Quality Analysis Simulation Programme (WASP) and a Continuous Stirred Tank Reactor (CSTR) model, both of which were used to model nitrogenous compound degradation in a river reach. Model outputs were assessed for statistically significant differences. Furthermore, artificial data gaps were introduced into the input data to study the limitations of each augmentation method. The Python Data Analysis Library (Pandas) was used to perform the deterministic interpolation. In addition, the effect of missing data at local maxima was investigated. The results showed little statistical difference between the deterministic interpolation methods used for data augmentation, but larger differences when the infilled input data were located where extrema occurred.

  • Basic interpolation methods did not produce statistically significant differences in augmented datasets.

  • Increasing the gaps yielded greater differences between augmented datasets.

  • Differences between augmented datasets with artificial gaps appear to depend more on gap size than on gap location.

  • ML methods produced acceptable results when infilling artificial gaps.

  • Difference between the WASP and Basic Model on real and artificial input.
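The gap experiment described in the abstract can be illustrated with a minimal sketch: remove a run of values from a series, infill it with Pandas, and compare the result with the original. The concentration values, the helper names `introduce_gap` and `max_infill_error`, and the choice of linear interpolation are assumptions made for illustration; they are not the study's data or its exact procedure.

```python
import numpy as np
import pandas as pd

# Hypothetical concentrations (illustrative values, not data from the study).
# The series rises to a local maximum at index 4 and then falls again.
observed = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0])


def introduce_gap(series, start, length):
    """Return a copy with `length` consecutive values from `start` set to NaN."""
    gapped = series.copy()
    gapped.iloc[start:start + length] = np.nan
    return gapped


def max_infill_error(series, start, length, method="linear"):
    """Largest absolute difference between infilled and original values."""
    filled = introduce_gap(series, start, length).interpolate(method=method)
    return float((filled - series).abs().max())


# Same gap length, two locations: on the rising limb and across the maximum.
error_on_slope = max_infill_error(observed, start=1, length=3)
error_at_peak = max_infill_error(observed, start=3, length=3)
print(error_on_slope, error_at_peak)
```

In this toy series the gap on the rising limb is recovered exactly, while the gap spanning the maximum is flattened by linear interpolation, which is the kind of effect the local-maxima test is meant to expose. Varying `length` in the same way gives a rough view of how gap size affects the infilled values.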

Graphical Abstract

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).