In this paper, data from 105 soil and groundwater remediation projects at BP gasoline service stations located in the state of Illinois were mined for lessons to reduce cost and improve management of remediation sites. Data mining software called D2K was used to train decision tree, stepwise linear regression and instance-based weighting models that relate hydrogeologic, sociopolitical, temporal and remedial factors in the site closure reports to remediation cost. The most important factors influencing cost were found to be the amount of soil excavated and the number of groundwater monitoring wells installed, suggesting that better management of excavation and well placement could result in significant cost savings. The best model for predicting cost classes (low, medium and high cost) was the decision tree, which had a prediction accuracy of approximately 73%. The misclassification of approximately 27% of the sites by even the best model suggests that remediation costs at service stations are influenced by other site-specific factors that may be difficult to accurately predict in advance.
Data mining to improve management and reduce costs of environmental remediation
Dara M. Farrell, Barbara S. Minsker, David Tcheng, Duane Searsmith, Jane Bohn, Dennis Beckman; Data mining to improve management and reduce costs of environmental remediation. Journal of Hydroinformatics 1 March 2007; 9 (2): 107–121. doi: https://doi.org/10.2166/hydro.2007.004