Conventional oil-in-water analyzers used by waterworks have hydrocarbon detection limits at mg/L levels and do not identify the type of oil compounds. The objectives of this study were to evaluate a more sensitive optical instrument and analysis method in order to (1) determine the signature excitation and emission matrices of each type of oil (such as diesel, heavy oil, gasoline and kerosene) or their indicator organic compounds and enter them into the instrument's software library and (2) test the effectiveness of the instrument in detecting these oils in local waterworks’ source and treated water. The patented simultaneous absorbance-transmittance excitation-emission matrix (A-TEEM) instrument method was used to identify and quantify low levels of organic contaminants present in a much higher background of other dissolved organic matter components in raw and treated water. Multivariate regression and machine learning techniques were applied and shown to have potential for alerting plant operators to organic contamination events.
The A-TEEM instrument method is an efficient method for identifying and quantifying oil contaminants in water.
The A-TEEM method is effective in detecting water-soluble fractions of oil contaminants at ppb levels in raw and treated water matrices.
The A-TEEM method provides early warning alerts for proactive prevention and corrective action in response to contamination events.
Current oil-in-water analyzers used by Singapore PUB waterworks can only detect hydrocarbons in the mg/L range and cannot provide information on the likely identity of the dissolved organic contaminants present in the water. Some organic components of oils, if carried over into the treated water at trace levels, can be aesthetically objectionable to customers. Hence, there is a need for a highly sensitive instrument method that can detect and identify trace levels of organic pollutants early, in the source water entering the waterworks.
Conventionally, the determination of oil contaminants in water relies on the laborious, non-specific gravimetric method with solvent extraction (EPA Method 1664). Additionally, the gas chromatography (GC) with the solid-phase extraction method (EPA Method 8270) is slow and requires specialized laboratories that are not available at every waterworks. Field spectroscopic probes lack the sensitivity and selectivity of GC methods (Conmy et al. 2014). In contrast, fluorescence 3D excitation and emission matrix (EEM) spectroscopy, which can yield molecular selectivity and high sensitivity (μg/L) (Gilmore & Chen 2020), is becoming more widely adopted in water quality monitoring (Trueman et al. 2016; O'Driscoll et al. 2020; Vines & Terry 2020). Patented developments including charge-coupled device detectors and simultaneous absorbance-transmittance fluorescence excitation-emission matrices (A-TEEM) with inner filter effect (IFE) correction (Gilmore & Tong 2014; Gilmore et al. 2014) have greatly reduced acquisition times from hours to minutes. IFE correction is important for measuring low levels of contamination in water samples because the strong absorbance of colored dissolved organic matter in the background interferes with the identification and quantification of contaminants. The IFE-corrected EEMs can thus facilitate the detection of dissolved aromatic compounds qualitatively and quantitatively (Gilmore et al. 2014). In this study, the A-TEEM instrument was investigated as a novel rapid optical analytical method for detecting oil contaminants in water, specifically oil aromatic fractions, because of its capability to detect many dissolved aromatic organic pollutants in water. With calibrated and optimized prediction models, the method reports individual concentrations of the specified fuels in the μg/L range (Driskill et al. 2018). The instrument is easy for trained personnel to use, requires no chemical reagents and has potential for online implementation.
The two key objectives of the study were to (1) determine the signature EEMs of each type of oil (such as diesel and heavy oil) or their indicator organic compounds and enter them into the instrument's software library and (2) test the effectiveness of the instrument in detecting these oils in local waterworks’ source and treated water. To this end, this study preliminarily evaluated several target hydrophilic compounds, such as phenol and naphthalene, as they represent the water soluble fraction (WSF) typically associated with oil contaminants in water (Gilmore & Chen 2020). The study also evaluated the effectiveness of multivariate and conventional EEM data analysis methods for contaminant identification, quantification and reporting. For contaminant identification, Support Vector Machine Discriminant Analysis (SVMDA) was evaluated to assign Extreme Gradient Boost Regression (XGBR) or Classical Least Squares (CLS) regression models for specific target compounds (Sádecká & Tóthová 2007; Bridgeman et al. 2011; Kumar et al. 2014). The goal of these different algorithms is to reduce multidimensionality and eliminate spectral noise, so that only important and relevant information is extracted and correlated with selected parameters. Additionally, conventional EEM regional integration for target compounds and compound classes was evaluated as a chemometric-independent method for rapid identification of abnormal EEMs (Chen et al. 2003). Further, we evaluated Parallel Factor Analysis (PARAFAC) for characterizing the dissolved organic carbon composition of different water matrices (Stedmon & Bro 2008; Murphy et al. 2013).
This paper summarizes the positive findings with respect to the A-TEEM instrument method and data analysis developments, in addition to describing the problems encountered and ideas for method improvement and adaptation. We compare results for both hydrophobic materials, including crude oils and the fuels themselves, and fluorescent hydrophilic compounds contained in the raw oil/fuel mixtures, such as polycyclic aromatic hydrocarbons (Driskill et al. 2018), which are well-established indicators of the contamination source. The potential deployment of this A-TEEM technique and multivariate modeling could close the gap between source water contamination detection and lab analytical turnaround time, especially in cases where samples need to be sent to an outside organic laboratory for testing, thus providing early warning alerts for proactive prevention and corrective actions in contamination events. The A-TEEM method with the classification model on-site could also save manpower and reduce the response time for necessary actions when an abnormality is detected.
A-TEEM instrument and sample conditions
A-TEEM data were collected with an Aqualog UV-800 (HORIBA Scientific, NJ) running v4.0 software. Typical instrument sample conditions included the use of a 1 × 1 cm quartz cuvette (3.5 mL) at room temperature with constant stirring. The instrument calibration was checked daily with a sealed pure water standard (Starna RM-H2O, Starna Cells, Atascadero, CA, USA), which was also used to collect the Raman scattering unit normalization factor that accounts for the instrument lamp throughput and integration time conditions. The deionized (DI) water used for sample compound dilution was also used to collect the sample blanks. Solvent blanks were recorded for background signal subtraction. All EEMs were corrected for spectral excitation and emission response, dark offset and blank signals, then corrected for both primary and secondary IFE, and first- and second-order Rayleigh masking (RM) was applied (scattering lines were replaced with zero values) in the Aqualog v4.0 software before further data pre-processing and calibration modeling. The EEMs were normalized daily to water Raman scattering units for the specified conditions. Cuvettes were cleaned with Decon® Decomatic detergent, then rinsed with tap water and DI water. Cuvette cleanliness was confirmed by comparing EEM and absorbance background signals against an acceptance threshold comparable to the signals obtained with the sealed water standard.
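The correction sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Aqualog software's implementation: the function name, the array layout (emission rows, excitation columns) and the 12 nm mask half-width are hypothetical, and the IFE step uses the commonly cited absorbance-based form F_corr = F_obs × 10^((A_ex + A_em)/2).

```python
import numpy as np

def correct_eem(raw, blank, abs_ex, abs_em, ex_nm, em_nm,
                raman_area, mask_width_nm=12.0):
    """Blank-subtract, IFE-correct, Raman-normalize and Rayleigh-mask one EEM.

    raw, blank     : 2D arrays shaped (n_emission, n_excitation)
    abs_ex, abs_em : absorbance (1 cm path) at each excitation / emission wavelength
    raman_area     : integrated water Raman signal from the sealed water standard
    mask_width_nm  : half-width of the Rayleigh mask (assumed value)
    """
    eem = raw - blank
    # Absorbance-based inner filter effect correction (assumed form).
    eem = eem * 10.0 ** ((abs_em[:, None] + abs_ex[None, :]) / 2.0)
    # Express intensities in water Raman units.
    eem = eem / raman_area
    # Replace first- and second-order Rayleigh scattering lines with zeros.
    em_grid = em_nm[:, None]
    ex_grid = ex_nm[None, :]
    first_order = np.abs(em_grid - ex_grid) <= mask_width_nm
    second_order = np.abs(em_grid - 2.0 * ex_grid) <= mask_width_nm
    eem[first_order | second_order] = 0.0
    return eem
```

The absorbance spectrum measured simultaneously with the EEM supplies `abs_ex` and `abs_em`, which is what allows the IFE step to be applied to every sample without a separate measurement.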
Raw water and treated water samples in this study were collected from local waterworks treating the same river water source using different treatment technologies. Plant 1 employs conventional treatment technology (coagulation–sedimentation–sand filtration), while Plant 2 uses advanced treatment technology (coagulation–sedimentation–sand filtration–ozonation–biologically activated carbon filtration). In this paper, Raw refers to samples from the river water source, Coag-Filter refers to treated water samples from Plant 1, and Bio-Filter refers to treated water samples from Plant 2.
Preparation of standard compound stock solutions and spiked samples
Stock aqueous compound solutions were prepared serially to minimize the content of solvent (methanol) carried over from the original standards and thus the solvent-associated fluorescent background. Compound calibration was achieved by serial addition (spiking) of the stock solutions to a given water source sample in a given cuvette while continually accounting for the volume and total concentrations. Spiked samples were prepared with filtered raw water (0.45 μm) to reduce turbidity, while treated water samples were unfiltered. Nylon syringe filters were pre-conditioned with DI water to minimize UV leachates.
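The volume and concentration bookkeeping for serial spiking into a single cuvette can be illustrated as follows. This is a minimal sketch: the function name and the example volumes and stock concentration are hypothetical and are not taken from the study.

```python
def spiking_series(initial_volume_ml, stock_conc_ug_per_l, spike_volumes_ul):
    """Return the cumulative analyte concentration (ug/L) after each spike.

    Each spike adds analyte mass and also increases the total volume in the
    cuvette, so the concentration is recomputed from the running totals.
    """
    volume_ml = initial_volume_ml
    mass_ug = 0.0
    concentrations = []
    for spike_ul in spike_volumes_ul:
        spike_ml = spike_ul / 1000.0
        mass_ug += stock_conc_ug_per_l * (spike_ml / 1000.0)  # ug/L * L = ug
        volume_ml += spike_ml
        concentrations.append(mass_ug / (volume_ml / 1000.0))
    return concentrations

# Hypothetical example: 3.0 mL of sample, 1000 ug/L stock, two 30 uL spikes.
series = spiking_series(3.0, 1000.0, [30.0, 30.0])
```

Because the volume grows with every addition, the second spike in this example raises the concentration by slightly less than the first, which is why the running totals are tracked rather than adding a fixed increment per spike.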
Multivariate and regional integration analysis
Solo v.8.5 through v.8.7 (Eigenvector Inc., USA) was used for all multivariate analyses, including SVMDA, XGBoost, CLS and PARAFAC, covering EEM pre-processing, EEM transformation, model calibration, cross-validation and split validation. EEM regional integration was applied using Aqualog v.4.0 software.
RESULTS AND DISCUSSION
Water source organic matter characterization
Decomposition of waterworks organic composition EEMs by PARAFAC
PARAFAC analysis indicated unique dissolved organic matter compositions for various types of water matrices: untreated raw source water (Raw) and treated water, Coag-Filter and Bio-Filter. Raw exhibited the highest levels of humic- and fulvic-like compounds, in addition to significant levels of protein-like components consistent with untreated water affected by natural and anthropogenic activities. Raw also exhibited very high turbidity (up to 500 NTU) and high dissolved organic carbon (DOC) concentration (1.2–8.37 mg/L). PARAFAC analysis showed that Coag-Filter exhibited reduced levels of humic- and fulvic-like compounds and very low levels of protein-like components, while Bio-Filter showed greatly reduced levels of fulvic- and protein-like components and a unique altered humic-like component consistent with ozone degradation effects on this humic-like class of compounds.
One key observation was that the unfiltered Raw samples, with high particulates and turbidity, exhibited high absorbance and light scattering that obscured the spectral features in the ultraviolet (UV) absorbance and fluorescence signals (data not shown). Filtered Raw and unfiltered treated water (Coag-Filter and Bio-Filter) were therefore evaluated using the Aqualog. The filtered Raw samples showed reduced absorbance and light scattering and typical surface water EEM spectral contours. The EEM characteristics of the filtered raw water and unfiltered treated water were further evaluated over an extended period of time and decomposed using PARAFAC with a four-component model. The four PARAFAC components included fulvic acid-like (C1, excitation 240–320 nm; emission 375–425 nm), humic acid-like (C2, excitation 320–360 nm; emission 425–475 nm), degraded humic acid-like (C3, excitation 260–300 nm; emission 400–525 nm) and protein-like components (C4, excitation 270–280 nm; emission 325–375 nm). The PARAFAC component score analysis resolved the filtered Raw, Coag-Filter and Bio-Filter samples as three distinct classes consistent with their unique DOC compositions (Figure 1). The Raw class exhibited the highest levels of fulvic (C1) and humic (C2), the highest C2:C1 ratio and the highest protein-like (C4). Thus, the Raw composition was consistent with typical surface water pre-treatment conditions of higher levels of humic acid and possibly wastewater-effluent sourced protein-like components. Notably, the C4 component of the filtered Raw samples also overlaps partly with the hydrophobic and hydrophilic contaminant spectral profiles in the study.
Coag-Filter showed lower fulvic (C1) and humic (C2) levels, a reduced C2:C1 ratio and reduced protein-like (C4), consistent with the effects of conventional coagulation–sedimentation–filtration (Gilmore & Tong 2017). Bio-Filter was distinguished primarily by the absence of humic (C2), which was replaced by degraded humic acid-like (C3); this appeared to be consistent with ozone treatment followed by biologically active carbon filtration degrading and altering the spectral properties of the humic acid-like materials.
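To make the decomposition concrete, the trilinear PARAFAC model (sample scores × excitation loadings × emission loadings) can be fitted with a plain alternating least squares loop. The sketch below is an assumption-laden illustration and not the Solo implementation used in the study: it is unconstrained, whereas EEM applications normally impose non-negativity, and the function names and iteration count are hypothetical.

```python
import numpy as np

def parafac_als(X, n_components, n_iter=500, seed=0):
    """Fit X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares.

    X is a (samples, excitation, emission) array of corrected EEMs.
    A holds the per-sample component scores; B and C hold the excitation
    and emission loadings. No non-negativity constraint is applied here.
    """
    rng = np.random.default_rng(seed)
    n_i, n_j, n_k = X.shape
    A = rng.random((n_i, n_components))
    B = rng.random((n_j, n_components))
    C = rng.random((n_k, n_components))
    for _ in range(n_iter):
        # Each update solves a linear least squares problem for one mode
        # while the other two modes are held fixed.
        A = np.einsum('ijk,jr,kr->ir', X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

def reconstruct(A, B, C):
    """Rebuild the three-way array from the fitted factors."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)
```

In this picture, the rows of `A` play the role of the component scores used to separate the Raw, Coag-Filter and Bio-Filter classes, and ratios such as C2:C1 are ratios of columns of `A` after a consistent scaling of the loadings.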
A-TEEM detection of oil contaminants
Regional integration analysis
Figures 2 and 3, for heavy oil and diesel, respectively, compare the EEM results for Bio-Filter (left), Coag-Filter (center) and filtered Raw (right) samples as a function of increasing spiked concentrations from top to bottom. The rectangular region (yellow) shows the targeted area that was integrated to evaluate the changes in fluorescence intensity for both heavy oil and diesel. The increasing intensity contours in this region visually correlate with the increased concentrations in both Coag-Filter and Bio-Filter samples. Notably, however, the selected region indicated little if any visible changes in the filtered Raw samples for either heavy oil or diesel.
Figure 4 compares the integrated change in intensity for the area (Δ), relative to the 0 μg/L spiked samples from Figures 2 and 3, respectively, as a function of the spiked concentrations for heavy oil (left) and diesel (right). Consistent with the visualization of the EEM contours, the spiked concentrations in the Coag-Filter and Bio-Filter samples could be resolved for both heavy oil and diesel in the observed μg/L range, whereas neither oil was well resolved in the filtered Raw samples.
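The regional integration quantity Δ can be sketched as the summed intensity inside a rectangular excitation/emission window, minus the same sum for the unspiked (0 μg/L) sample. The function names and the array layout are hypothetical, and the window bounds must be supplied by the user, since the bounds of the integrated rectangle are not stated in the text.

```python
import numpy as np

def region_sum(eem, ex_nm, em_nm, ex_range, em_range):
    """Sum EEM intensities inside a rectangular window.

    eem is shaped (n_emission, n_excitation); ranges are inclusive (low, high) in nm.
    """
    ex_sel = (ex_nm >= ex_range[0]) & (ex_nm <= ex_range[1])
    em_sel = (em_nm >= em_range[0]) & (em_nm <= em_range[1])
    return float(eem[np.ix_(em_sel, ex_sel)].sum())

def region_delta(spiked_eem, baseline_eem, ex_nm, em_nm, ex_range, em_range):
    """Integrated change in intensity relative to the unspiked baseline sample."""
    return (region_sum(spiked_eem, ex_nm, em_nm, ex_range, em_range)
            - region_sum(baseline_eem, ex_nm, em_nm, ex_range, em_range))
```

Plotting `region_delta` against the spiked concentration gives the kind of curve compared across water matrices above; a flat curve, as for the filtered Raw samples, indicates that the window does not resolve the spike.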
Support Vector Machine Discriminant Analysis
Representative EEM calibration sets for baseline (uncontaminated) and targeted contaminant samples were used to validate classification models for qualitative contaminant recognition (Figure 5). The filtered Raw samples exhibited a large decrease in the contents of several hydrophobic oil types, compared with the known pre-filter spike concentrations, such that qualitative and quantitative determination of these oils was unsuccessful with the spiking sequence method used (data not shown).
Importantly, however, for the Coag-Filter and Bio-Filter samples, it was possible to develop effective SVMDA models to recognize and distinguish between baseline samples and samples spiked with hydrophobic oil contaminants, namely diesel and heavy oil. Example models including Coag-Filter and Bio-Filter sources were calibrated and successfully validated by independent sets of samples that were either unspiked or spiked with diesel and heavy oil samples in the μg/L range (0–40 μg/L). Notably, PARAFAC analysis also clearly resolved the components of diesel and heavy oil contamination, compared with the unspiked samples, in the Coag-Filter and Bio-Filter samples (data not shown).
Extreme Gradient Boost Regression
Following the effective classification by SVMDA, source-specific quantitative calibration curves for diesel and heavy oil were evaluated using XGBR for Coag-Filter and Bio-Filter samples, respectively. Figure 6 shows the XGBR curves using independent calibration and validation datasets. For the diesel in the Coag-Filter model, there were 75 samples in the calibration set and four samples in the validation set. The linear fit R² reached 0.987, with regular residuals of ±0.762 μg/L (Figure 6(a)). Similarly, for the diesel in Bio-Filter model, there were 98 samples in the calibration set and four samples in the validation set. The linear fit R² reached 0.957, with regular residuals of ±1.83 μg/L (Figure 6(b)).
For the heavy oil in the Coag-Filter model, there were 70 samples in the calibration set and four samples in the validation set. The linear fit R² reached 0.971, with regular residuals of ±0.971 μg/L (Figure 6(c)). Similarly, for the heavy oil in the Bio-Filter model, there were 96 samples in the calibration set and four samples in the validation set. The linear fit R² reached 0.950, with regular residuals of ±1.01 μg/L (Figure 6(d)).
| Water type | Fuel type | Slope value | Standard error of intercept | LOD (μg/L) | LOQ (μg/L) | SD residuals (μg/L) |
In summary, the XGBR models successfully detected the targeted oils, diesel and heavy oil, with quantification limits near 1 μg/L.
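The figures of merit named in the table columns (slope, standard error of the intercept, LOD, LOQ and SD of residuals) can be derived from a straight-line fit of predicted against spiked concentrations. The sketch below assumes the widely used convention LOD = 3.3σ/slope and LOQ = 10σ/slope with σ taken as the standard error of the intercept; the text does not state which formula the authors applied, and the function name is hypothetical.

```python
import math

def calibration_stats(spiked, predicted):
    """Ordinary least squares line of predicted vs. spiked, with LOD/LOQ estimates."""
    n = len(spiked)
    mean_x = sum(spiked) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in spiked)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(spiked, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(spiked, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in predicted)
    sd_residuals = math.sqrt(ss_res / (n - 2))  # two fitted parameters
    se_intercept = sd_residuals * math.sqrt(sum(x * x for x in spiked) / (n * sxx))
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1.0 - ss_res / ss_tot,
        "sd_residuals": sd_residuals,
        "se_intercept": se_intercept,
        "lod": 3.3 * se_intercept / slope,   # assumed convention
        "loq": 10.0 * se_intercept / slope,  # assumed convention
    }
```

With this convention the LOQ is always about three times the LOD, so a quantification limit near 1 μg/L would correspond to a detection limit of a few tenths of a μg/L.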
A-TEEM detection of hydrophilic contaminants
In preliminary investigations, the target hydrophilic compounds (phenol and naphthalene) were readily detectable in the μg/L range when spiked into DI water. Figures 7 and 8 show the concentrations of individually spiked samples predicted using Classical Least Squares (CLS) regression models. The linear calibration ranges were 0.5 to 9.8 μg/L for phenol (linear fit R² = 0.996) and 0.5 to 50.4 μg/L for naphthalene (linear fit R² = 0.962).
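The idea behind CLS can be sketched as follows: each unfolded EEM is modeled as a concentration-weighted sum of pure-component spectra, the spectra are estimated from calibration samples of known concentration, and an unknown sample is then projected onto those spectra. This is a generic illustration with hypothetical function names, not the Solo model configuration used in the study, and it assumes that every contributing component (including any background) is represented in the calibration concentrations.

```python
import numpy as np

def cls_calibrate(eems, concentrations):
    """Estimate pure-component spectra from calibration EEMs.

    eems           : (n_samples, n_emission, n_excitation)
    concentrations : (n_samples, n_components) known concentrations
    Returns S with shape (n_components, n_emission * n_excitation).
    """
    X = eems.reshape(len(eems), -1)  # unfold each EEM into a row vector
    S, *_ = np.linalg.lstsq(concentrations, X, rcond=None)
    return S

def cls_predict(eem, S):
    """Estimate component concentrations for one EEM by least squares."""
    c, *_ = np.linalg.lstsq(S.T, eem.reshape(-1), rcond=None)
    return c
```

When a matrix contributes signal that is not captured by the calibrated spectra, the projection absorbs it into the reported concentrations, which is one plausible reason a CLS model calibrated in DI water can degrade in more complex water matrices.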
Figure 9 shows predicted concentrations against spiked concentrations for each water matrix (DI water, Coag-Filter and Bio-Filter, and filtered Raw). The results indicated that there was good predictability for phenol in all water matrices, including DI water, filtered Raw, and Coag-Filter and Bio-Filter, and for naphthalene in treated water Coag-Filter and Bio-Filter, respectively. However, the CLS models for the hydrophobic oils which include the targeted oil contaminants were unable to provide good results for all matrices; thus, the authors explored Support Vector Machine Discriminant Analysis (SVMDA), Partial Least Squares Regression (PLSR) and XGBR.
As discussed in the previous section, hydrophobic compounds including raw fuels such as diesel and heavy oil were much more difficult to detect when spiked into raw water before filtering and analyzed after filtering; this was attributed to particulate binding and filter retention of the hydrophobic compounds. The hydrophilic or WSF compounds (such as phenol and naphthalene) are less susceptible to filter retention and presumably to particulate binding; however, the adverse effects of high turbidity must be considered where high turbidity raw water is present. Therefore, the upper limit of site-specific raw water turbidity should be tested as an impact factor for method robustness. We found in this study that raw water with high turbidity, up to 500 NTU, presented challenges for fuel detection and identification. It is also possible that equilibration and dissolution of the hydrophilic components of these raw fuels were incomplete prior to filtration and sample analysis. However, these hydrophobic compounds were all readily detectable at μg/L levels and could be identified by SVMDA when spiked into the unfiltered treated water samples; for these samples, EEM regional integration was also successful at identifying abnormal EEM samples containing the hydrophobic contaminants.
The A-TEEM method was shown to be effective for characterizing the dissolved organic matter composition of water samples produced by different treatment processes. This is important for evaluating process effectiveness and helping to detect abnormalities. Regional integration analysis could be employed for early detection of contamination by hydrophobic oils in treated water. Importantly, hydrophilic or WSF components with good fluorescence and absorbance, such as phenol, can be detected in filtered raw water and treated water. These hydrophilic components could be used as indicators of an oil contamination event; however, they may not pinpoint the specific oil type when such an event occurs, because WSF compounds are commonly present in most oil types. The high turbidity of the raw water made it difficult to detect and identify the specific fuels investigated, such as heavy oil and diesel, and other types of hydrophobic fuel contamination. Future steps to deal with this may include investigating longer equilibration times for the fuel contaminants prior to filtration as well as other filtration techniques. It is expected that each type of fuel should exhibit a unique composition of fluorescent WSF compounds that could be used to ‘fingerprint’ different types of fuels and oils in the raw water. Other problems to be addressed before deployment by plant operators include (1) the ability to use and maintain good techniques to prevent sample cross-contamination and (2) the implementation of an intuitive, robust and user-friendly software interface. To this end, future tests could involve both (1) the use of a fully automated sampling apparatus that greatly reduces sample carry-over and (2) the use of an automated data analysis dashboard with a simple standard operating protocol.
This dashboard's output first identifies each contaminant in the classification model library (if present) and then reports the concentration from the regression model of each contaminant with respect to user-defined thresholds/warning levels.
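The two-stage dashboard logic (classify first, then quantify and compare with thresholds) might look like the following sketch. Everything here is hypothetical: the function name, the dictionary layout, the status labels and the idea of passing the classifier and regression models as plain callables are illustrative assumptions, not a description of existing software.

```python
def dashboard_report(eem, classifier, regressors, thresholds_ug_per_l):
    """Identify a contaminant, quantify it, and rate it against user thresholds.

    classifier          : callable returning a class label for the EEM
    regressors          : dict mapping contaminant label -> callable returning ug/L
    thresholds_ug_per_l : dict mapping label -> {"warning": x, "alarm": y}
    """
    label = classifier(eem)
    if label not in regressors:
        # Baseline or unrecognized class: nothing from the library to quantify.
        return {"contaminant": None, "concentration_ug_per_l": None, "status": "normal"}
    concentration = regressors[label](eem)
    limits = thresholds_ug_per_l[label]
    if concentration >= limits["alarm"]:
        status = "alarm"
    elif concentration >= limits["warning"]:
        status = "warning"
    else:
        status = "below warning"
    return {"contaminant": label,
            "concentration_ug_per_l": concentration,
            "status": status}
```

Keeping the classifier and the per-contaminant regression models as separate, swappable pieces mirrors the approach in this study, where a classification step selects which source-specific quantitative model is applied.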
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.