A series of ultraviolet-visible (UV-Vis) spectra of seawater samples collected at sites along the coastline of Tianjin Bohai Bay, China, were subjected to multivariate partial least squares (PLS) regression analysis. Calibration models were developed for monitoring chemical oxygen demand (COD) and total organic carbon (TOC) concentrations. Three PLS models were developed using spectra from raw samples (Model-1), diluted samples (Model-2), and raw and diluted samples combined (Model-3). The experimental results showed that: (i) possible nonlinearities in the signal-concentration relationships were well accounted for by the multivariate PLS models; (ii) the predicted COD and TOC values agreed well with the analytical values, and the high correlation coefficients and small root mean squared errors of cross-validation (RMSECV) indicate that the method can be used for seawater quality monitoring; and (iii) compared with Model-1 and Model-2, Model-3 had the highest coefficient of determination (R²) and the fewest latent variables. This last finding suggests that only large data sets representing different combinations of conditions (i.e., various seawater matrices) will produce stable site-specific regressions. The results illustrate the effectiveness of the proposed method and its potential as a seawater quality monitoring technique.

INTRODUCTION

Monitoring seawater quality is of growing importance worldwide because of the ever-increasing exploitation and utilization of marine resources. However, currently available seawater quality monitoring technologies have several drawbacks, including difficulties in controlling and optimizing monitoring procedures (Pouet et al. 1999) and problems with sampling and sample storage. In addition, existing standard analytical methods are not designed for real-time monitoring and process control (APHA-AWWA-WEF 1999; Bonastre et al. 2005). Effective online sensors are therefore needed to improve seawater quality monitoring.

Since the late 1990s, new online sensors for in situ water quality monitoring have developed rapidly (Namieśnik 2000). Among these, ultraviolet-visible (UV-Vis) spectrometers can be used to estimate pollutant concentrations through mathematical processing of their absorbance spectra (Roig et al. 2002; Fogelman et al. 2006). Because the marine environment changes frequently and conditions are sometimes harsh, especially in estuaries (Cha et al. 2009; Su et al. 2011; Hur & Cho 2012), monitoring techniques can be compromised. Under such conditions, combining methods such as UV-Vis spectral analysis and multivariate analysis is necessary to improve monitoring performance (Thomas & Constant 2004). Among the available multivariate methods, partial least squares (PLS) regression has been shown to reduce dimensionality while retaining quality-related variables (Bastien et al. 2005). UV-Vis spectroscopy coupled with various modelling methods has been used effectively for wastewater quality analysis (Lourenço et al. 2008; Platikanov et al. 2014). However, this approach has not been widely applied to seawater.

Seawater quality affects the health of coastal and marine ecosystems, and marine pollution and its impacts have given rise to a number of environmental issues. China's seawater quality standard, The Specification for Marine Monitoring Part 4: Seawater analysis (AQTSPRC 2007), lists chemical oxygen demand (COD) and total organic carbon (TOC) as two meaningful parameters that represent typical chemical or physical characteristics of seawater (e.g., its degree of pollution). The objective of the present study was to determine whether UV-Vis spectrophotometry can be used to monitor the COD and TOC of seawater. UV-Vis spectra of seawater samples collected from Tianjin Bohai Bay, China, were acquired and used to develop multivariate PLS calibration models for predicting these seawater quality indices. To the best of our knowledge, this is the first study to use UV-Vis spectroscopy coupled with PLS modelling for COD and TOC analysis of seawater. The study highlights the potential of UV-Vis spectroscopy for assessing seawater contamination.

MATERIALS AND METHODS

Sample collection and seawater characteristics

In the present study, a total of 80 seawater samples were collected randomly from Tianjin Bohai Bay, China. Collection conditions varied slightly, but all samples were taken at a depth of 50 cm and stored at 4 °C until use. All experiments were performed within 1 or 2 days of collection. Both raw and diluted samples were analyzed; diluted samples were prepared by diluting raw samples 1:10 with artificial seawater, which was made by dissolving 31 g NaCl, 10 g MgSO₄·7H₂O, and 0.5 g NaHCO₃ in 1 L of distilled water. After the samples were filtered through a 0.45 μm pore-size membrane to remove suspended particles, UV-Vis spectra were acquired using a UV-3000 spectrophotometer (Mapada Instruments, Shanghai, China). Samples were placed in a quartz cell with a 10 mm path length, and absorbance was recorded from 190 to 800 nm at 1 nm increments against distilled water as the reference. COD and TOC were measured using standard methods (SEPA 2002); their ranges are given in Table 1. Each sample set was randomly divided into a training subset and a testing subset for model development and evaluation (Table 1). The multivariate PLS calibration models were developed using the UV-Vis spectra and the corresponding COD and TOC values. Model-1 was developed using 60 UV-Vis spectra of raw samples. To assess the effect of nonlinearity caused by signal saturation in the UV region, Model-2 was developed using 60 spectra of 1:10 diluted samples. To assess whether the narrow concentration interval of the original data sets could be expanded, Model-3 was developed using the 120 spectra of both raw and diluted samples.
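The random partitioning into training and testing subsets can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function name split_indices, the fixed seed, and the choice to reuse one random split for raw and diluted spectra (so that each diluted spectrum stays in the same subset as its raw parent) are all assumptions made for the example.

```python
import random

def split_indices(n_total, n_train, seed=0):
    """Randomly partition sample indices 0..n_total-1 into training and testing subsets."""
    rng = random.Random(seed)          # fixed seed only so the sketch is reproducible
    idx = list(range(n_total))
    rng.shuffle(idx)
    return sorted(idx[:n_train]), sorted(idx[n_train:])

# Model-1 (raw) and Model-2 (diluted): 80 samples each, 60 for training and 20 for testing.
train, test = split_indices(80, 60)

# Model-3 pools both spectrum types. Assumption: each diluted spectrum is kept in the same
# subset as the raw spectrum of its parent sample, so no sample appears in both subsets.
model3_train = [("raw", i) for i in train] + [("diluted", i) for i in train]
model3_test = [("raw", i) for i in test] + [("diluted", i) for i in test]

print(len(train), len(test), len(model3_train), len(model3_test))  # 60 20 120 40
```

Keeping a raw spectrum and its diluted counterpart in the same subset is one way to avoid an optimistic test error, since both spectra come from the same water sample.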

Table 1

Characteristics of seawater samples and partition of subsets

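| Model | Parameter | Range (mg/L) | Training samples | Testing samples |
|---|---|---|---|---|
| Model-1 | COD | 5.20–21.00 | 60 | 20 |
| Model-1 | TOC | 21.20–79.55 | 60 | 20 |
| Model-2 | COD | 0.52–2.10 | 60 | 20 |
| Model-2 | TOC | 3.92–9.76 | 60 | 20 |
| Model-3 | COD | 0.52–21.00 | 120 | 40 |
| Model-3 | TOC | 3.92–79.55 | 120 | 40 |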

Multivariate PLS model

The aim of PLS regression is to predict a response matrix Y from a predictor matrix X and to describe their common structure (Mark 2000). Consider a data set {X, Y}, where X is an n × p matrix of p predictor variables for n samples (in the present study, n seawater samples whose UV-Vis spectra each contain absorbances at p wavelengths), and Y is the corresponding n × q matrix of q dependent variables (here, the COD and TOC measurements of the same n samples). PLS regression captures both the outer relations within the X and Y blocks and the inner relation between the two blocks:
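$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathrm{T}} + \mathbf{E} \tag{1}$$

$$\mathbf{Y} = \mathbf{U}\mathbf{Q}^{\mathrm{T}} + \mathbf{F} \tag{2}$$

$$\hat{\mathbf{u}}_i = b_i\,\mathbf{t}_i \tag{3}$$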
where T, P, and E are the score, loading, and residual matrices of the X space, respectively, and U, Q, and F are the score, loading, and residual matrices of the Y space, respectively (Otto 1999). Equations (1) and (2) describe the outer relations, and Equation (3) describes the inner relation between the X and Y spaces, where $b_i$ is the regression coefficient between the PLS component $\mathbf{t}_i$ from the X space and the PLS component $\mathbf{u}_i$ from the Y space. Several methods are available for choosing the number of latent variables (LVs) in PLS (Geladi & Kowalski 1986; Hoskuldsson 1988); cross-validation was used in the present study.
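To show how Equations (1)–(3) arise computationally, the following is a minimal NIPALS-style sketch after Geladi & Kowalski (1986). It is not the SIMCA-P implementation used in this study; the function names pls_fit and pls_predict, the convergence tolerance, and the synthetic data are assumptions for illustration. Each extracted component yields scores t_i and u_i, loadings p_i and q_i, and an inner coefficient b_i, after which X and Y are deflated before the next component is extracted.

```python
import numpy as np

def pls_fit(X, Y, n_lv, tol=1e-12, max_iter=500):
    """NIPALS PLS regression (after Geladi & Kowalski 1986).

    X: n x p predictor matrix (e.g. spectra); Y: n x q response matrix (e.g. COD, TOC).
    Returns weights W, X scores T, X loadings P, Y scores U, Y loadings Q, inner
    coefficients b, and the final residuals E (X space) and F (Y space).
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean                     # mean-centred working copies
    parts = {k: [] for k in ("W", "T", "P", "U", "Q")}
    b = []
    for _ in range(n_lv):
        u = F[:, [int(np.argmax(F.var(axis=0)))]]     # start: most variable Y column
        t_old = None
        for _ in range(max_iter):
            w = E.T @ u
            w = w / np.linalg.norm(w)                 # X weight vector
            t = E @ w                                 # X score t_i
            q = F.T @ t
            q = q / np.linalg.norm(q)                 # Y loading q_i
            u = F @ q                                 # Y score u_i
            if t_old is not None and np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break
            t_old = t
        tt = (t.T @ t).item()
        p = E.T @ t / tt                              # X loading p_i
        b_i = (u.T @ t).item() / tt                   # inner relation, Eq. (3)
        E = E - t @ p.T                               # remove component i from X
        F = F - b_i * (t @ q.T)                       # remove the part of Y predicted by t_i
        for key, val in zip(("W", "T", "P", "U", "Q"), (w, t, p, u, q)):
            parts[key].append(val)
        b.append(b_i)
    model = {k: np.hstack(v) for k, v in parts.items()}
    model.update(b=np.array(b), E=E, F=F, x_mean=x_mean, y_mean=y_mean)
    return model

def pls_predict(model, X_new):
    """Predict Y by projecting new rows onto each stored component in turn."""
    E = X_new - model["x_mean"]
    Y_hat = np.tile(model["y_mean"], (X_new.shape[0], 1))
    for i in range(model["W"].shape[1]):
        t = E @ model["W"][:, [i]]                    # score of the new rows on LV i
        E = E - t @ model["P"][:, [i]].T
        Y_hat = Y_hat + model["b"][i] * (t @ model["Q"][:, [i]].T)
    return Y_hat

# Synthetic stand-in data (not the study's spectra): 40 samples, 6 variables, 1 response.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = X @ rng.normal(size=(6, 1)) + 0.1 * rng.normal(size=(40, 1))
model = pls_fit(X, y, n_lv=6)
# Eq. (1) holds by construction: centred X equals T P^T plus the residual E.
print(np.allclose(X - X.mean(axis=0), model["T"] @ model["P"].T + model["E"]))  # True
```

With as many LVs as there are linearly independent predictor variables, PLS reproduces the ordinary least-squares fit; choosing fewer LVs by cross-validation is what keeps the model from fitting noise.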

The UV-Vis spectral data were imported directly into SIMCA-P software (Umetrics, San Jose, CA, USA), which was used to develop the PLS calibration models and to validate them.

Several indices were used to evaluate the performance of the proposed models. The predictive ability of the multivariate PLS models for the training set was evaluated using the root mean squared error of cross-validation (RMSECV) following the cross-validation (leave-one-out) procedure: 
formula
4
where n is the number of samples in the training set, is the measured value of sample i, and is the predicted value of sample i when the model is constructed without sample i. The optimal number of LVs to retain in the PLS models was established by applying this cross-validation procedure to the training set.
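The leave-one-out procedure behind Equation (4), and the choice of the LV count at the RMSECV minimum, can be sketched generically as below. The callable fit_predict stands in for any regression model (for instance, a PLS model with a fixed number of LVs); it and the other names are hypothetical, and the toy numbers are illustrative only, not data from this study.

```python
import math
from statistics import mean

def rmsecv_loo(X, y, fit_predict):
    """Leave-one-out RMSECV (Equation (4)).

    X and y are Python lists (one entry per sample). fit_predict(X_train, y_train, x) must
    build a model from the training data and return its prediction for the left-out x.
    """
    n = len(y)
    sq_err = 0.0
    for i in range(n):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]   # drop sample i
        y_hat = fit_predict(X_train, y_train, X[i])                # model built without i
        sq_err += (y[i] - y_hat) ** 2
    return math.sqrt(sq_err / n)

def choose_n_lv(rmsecv_by_lv):
    """Number of LVs at the minimum of an RMSECV-versus-LV curve (dict: n_lv -> RMSECV)."""
    return min(rmsecv_by_lv, key=rmsecv_by_lv.get)

# Toy check with a mean-only "model": left-out predictions are 2.5, 2.0 and 1.5.
print(rmsecv_loo([[0.0], [0.0], [0.0]], [1.0, 2.0, 3.0],
                 lambda X_tr, y_tr, x: mean(y_tr)))   # sqrt(1.5), about 1.2247
# Hypothetical RMSECV curve (not study values): the minimum is at 3 LVs.
print(choose_n_lv({1: 3.0, 2: 2.0, 3: 1.5, 4: 1.7}))  # 3
```

In practice, rmsecv_loo would be called once per candidate number of LVs, and the resulting curve passed to choose_n_lv.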

RESULTS AND DISCUSSION

Figure 1 shows typical UV-Vis spectra of the seawater samples used to construct Model-1, Model-2, and Model-3.

Figure 1

Typical UV-Vis spectra of the seawater samples for the multivariate PLS models: (a) Model-1, (b) Model-2, (c) Model-3.

As seen in Figure 1, the UV-Vis spectra are noisy because of strong interactions between the incident light and particles in the seawater, and some of the information in the full spectrum is redundant. Thus, only a portion of the variables in the UV region should be used to construct the prediction models. Choosing appropriate wavelengths is a crucial step in establishing a PLS model, because it excludes signals measured at wavelengths that mainly contain noise or uninformative content (Luis et al. 2004). The 240–380 nm range is known to contain information related to organic contamination (Lourenço et al. 2008), and a distinct absorption peak near 200 nm is visible in the seawater spectra shown in Figure 1. Together, these observations indicate that this short-wavelength UV region should be informative for predicting the COD and TOC of seawater samples. Therefore, the 190–380 nm range was selected for developing the multivariate PLS models.
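Because absorbance was recorded from 190 to 800 nm in 1 nm steps, each spectrum has 611 points, and restricting it to 190–380 nm leaves 191 predictor variables (p = 191 in the notation of the PLS section, assuming both endpoints are included). A minimal sketch of this cropping, with a hypothetical list-based data layout and hypothetical function names:

```python
# Assumed layout: absorbances recorded from 190 to 800 nm at 1 nm steps, endpoints included.
wavelengths = list(range(190, 801))

def window_indices(wavelengths, lo_nm, hi_nm):
    """Indices of the wavelengths that fall inside [lo_nm, hi_nm], inclusive."""
    return [j for j, wl in enumerate(wavelengths) if lo_nm <= wl <= hi_nm]

def crop(spectrum, cols):
    """Keep only the selected columns of one spectrum (a list aligned with wavelengths)."""
    return [spectrum[j] for j in cols]

uv_cols = window_indices(wavelengths, 190, 380)
print(len(wavelengths), len(uv_cols))  # 611 191

# A hypothetical flat spectrum, just to show the call; real data would come from the instrument export.
dummy = [0.0] * len(wavelengths)
print(len(crop(dummy, uv_cols)))  # 191
```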

In PLS analysis, choosing the optimal number of LVs is a key step in estimating the model structure, because the estimated parameters depend strongly on the extracted latent vectors. Extracting LVs is a form of feature extraction, and a model with fewer LVs is simpler and less prone to overfitting. Here, the number of LVs in each PLS model was determined from the RMSECV (Figure 2).

Figure 2

Root mean squared error of cross-validation (RMSECV) as a function of the number of latent variables used in the multivariate PLS calibration models developed for estimation of COD and TOC of seawater: (a) Model-1, (b) Model-2, (c) Model-3.

In Figure 2, the point with the lowest RMSECV value indicates the number of LVs that should be used in the model. Thus, Model-1 and Model-2 have five LVs whereas Model-3 has only three LVs. The corresponding values for the RMSECV were 0.96, 0.11, and 1.04 mg/L for COD-1, COD-2, and COD-3, respectively, and 3.28, 0.32, and 3.98 mg/L for TOC-1, TOC-2, and TOC-3, respectively. The resulting PLS calibration models for COD and TOC estimation are compared in Figures 3 and 4. The corresponding results are summarized in Table 2.

Table 2

Summary of the multivariate PLS models developed for estimation of COD and TOC

| | COD-1 (Model-1) | TOC-1 (Model-1) | COD-2 (Model-2) | TOC-2 (Model-2) | COD-3 (Model-3) | TOC-3 (Model-3) |
|---|---|---|---|---|---|---|
| UV-Vis region (nm) | 190–380 | 190–380 | 190–380 | 190–380 | 190–380 | 190–380 |
| Latent variables | 5 | 5 | 5 | 5 | 3 | 3 |
| R² calibration | 0.963 | 0.967 | 0.951 | 0.966 | 0.978 | 0.976 |
| R² validation | 0.963 | 0.960 | 0.965 | 0.939 | 0.974 | 0.975 |
| RMSECV (mg/L) | 0.96 | 3.28 | 0.11 | 0.32 | 1.04 | 3.98 |
| RMSEP (mg/L) | 0.90 | 4.23 | 0.13 | 0.36 | 1.06 | 3.97 |

Figure 3

Scatter plots of predicted versus analytical COD values for the multivariate PLS models. Training set: (a) Model-1, (b) Model-2, (c) Model-3. Testing set: (d) Model-1, (e) Model-2, (f) Model-3.

Figure 4

Scatter plots of predicted versus analytical TOC values for the multivariate PLS models. Training set: (a) Model-1, (b) Model-2, (c) Model-3. Testing set: (d) Model-1, (e) Model-2, (f) Model-3.

Although Model-2 had the lowest RMSECV values, this largely reflects the lower concentrations of the diluted samples (Table 1); Model-1 and Model-3 had higher calibration R² values and satisfactory RMSECV values for the concentration ranges analyzed. These results suggest that possible nonlinearities in the signal-concentration relationships can be well accounted for by Model-1 and Model-3 in the selected spectral range. The sample dilution step can therefore be omitted, making the overall estimation faster. Figure 2 and Table 2 show that Model-3 was a better calibration model than either Model-1 or Model-2: it had the highest R² and the fewest LVs, suggesting that it was the most effective at suppressing quality-irrelevant variables and reducing model complexity. It also had the highest R² for the testing set. This finding illustrates that multivariate PLS regressions are highly sensitive to the data sets used and that only large data sets representing different combinations of conditions (i.e., various seawater matrices) will produce stable site-specific regressions.

The models were validated using the testing sets, and the COD and TOC estimates are compared in Figures 3 and 4. As Table 2 shows, the coefficients of determination between the measured and predicted COD and TOC values for Model-3 were R² = 0.974 and 0.975, respectively. The developed and validated multivariate PLS models therefore produced highly satisfactory predictions of these important seawater quality indices.
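Table 2 reports R² for the calibration and validation sets and an RMSEP for the testing sets, but the text does not define RMSEP or state whether R² is the squared correlation coefficient or the coefficient of determination. The sketch below assumes the usual definitions and computes both R² variants; the function names and toy values are hypothetical.

```python
import math

def rmsep(y_meas, y_pred):
    """Root mean squared error of prediction over an independent testing set."""
    n = len(y_meas)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_meas, y_pred)) / n)

def r2_determination(y_meas, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = sum(y_meas) / len(y_meas)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_meas, y_pred))
    ss_tot = sum((a - m) ** 2 for a in y_meas)
    return 1.0 - ss_res / ss_tot

def r2_correlation(y_meas, y_pred):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(y_meas)
    ma, mb = sum(y_meas) / n, sum(y_pred) / n
    s_ab = sum((a - ma) * (b - mb) for a, b in zip(y_meas, y_pred))
    s_aa = sum((a - ma) ** 2 for a in y_meas)
    s_bb = sum((b - mb) ** 2 for b in y_pred)
    return s_ab ** 2 / (s_aa * s_bb)

# Toy values (not study data): predictions with a constant +0.5 bias.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(rmsep(measured, predicted))             # 0.5
print(r2_determination(measured, predicted))  # about 0.8
print(r2_correlation(measured, predicted))    # 1.0
```

The toy example shows why the distinction matters: a constant bias leaves the squared correlation at 1 but lowers the coefficient of determination.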

UV-Vis spectroscopy is commonly applied to freshwater and wastewater analysis. Our study demonstrated that UV-Vis spectrophotometry combined with multivariate PLS regression can also be applied to seawater analysis. It should also be noted that, although the PLS models developed in this work are promising in terms of their predictive performance for seawater quality indices, other issues, such as chloride interference and the effects of rainfall and pH changes, merit further study. COD and TOC measure the total amount of organic compounds and hence depend strongly on the water matrix. Performance could be improved by selecting sets of wavelengths that show good sensitivity and linearity for the organics in seawater without emphasizing the absorbance of chloride and other potentially interfering components.

CONCLUSIONS

UV-Vis spectra of seawater samples collected from Tianjin Bohai Bay were combined with multivariate PLS regression analysis to develop calibration models for monitoring the COD and TOC of seawater. The experimental results showed that the predicted values agreed well with the analytical values, with high R² values and small RMSECV values. These findings illustrate the effectiveness of using information extracted from UV-Vis spectra for seawater quality monitoring. Compared with existing techniques, this novel method has marked advantages, including higher precision, speed, cleanliness, and safety, and it requires no expensive or toxic reagents.

ACKNOWLEDGEMENT

This work was funded by the Tianjin Oceanic Administration (Promote Marine Program no. KJXH2011-11).

REFERENCES

APHA-AWWA-WEF 1999 Standard Methods for the Examination of Water and Wastewater, 20th edn. American Public Health Association/American Water Works Association/Water Environment Federation, Washington, DC, USA.

AQTSPRC 2007 The Specification for Marine Monitoring Part 4: Seawater analysis. The Administration of Quality and Technical Supervision of the People's Republic of China, Beijing, China.

Bastien, P., Vinzi, V. E. & Tenenhaus, M. 2005 PLS generalised linear regression. Computational Statistics & Data Analysis 48 (1), 17–46.

Bonastre, A., Ors, R., Capella, J. V., Fabra, M. J. & Peris, M. 2005 In-line chemical analysis of wastewater: present and future trends. Trends in Analytical Chemistry 24 (2), 128–137.

Cha, S. M., Ham, Y. S., Ki, S. J., Lee, S. W., Cho, K. H., Park, Y. & Kim, J. H. 2009 Evaluation of pollutants removal efficiency to achieve successful urban river restoration. Water Science & Technology 59 (11), 2101–2109.

Fogelman, S., Blumenstein, M. & Zhao, H. 2006 Estimation of chemical oxygen demand by ultraviolet spectroscopic profiling and artificial neural networks. Neural Computing & Applications 15 (3–4), 197–203.

Geladi, P. & Kowalski, B. 1986 Partial least-squares regression: a tutorial. Analytica Chimica Acta 185, 1–17.

Hoskuldsson, A. 1988 PLS regression methods. Journal of Chemometrics 2 (3), 211–228.

Mark, H. 2000 Quantitative spectroscopic calibration. In: Encyclopedia of Analytical Chemistry (R. A. Meyers, ed.). John Wiley & Sons Ltd, Chichester, pp. 1–19.

Namieśnik, J. 2000 Trends in environmental analytics and monitoring. Critical Reviews in Analytical Chemistry 30 (2–3), 221–269.

Otto, M. 1999 Pattern recognition and classification. In: Chemometrics: Statistics and Computer Application in Analytical Chemistry (M. Otto, ed.). Wiley-VCH, Weinheim, Germany, pp. 128–130.

Platikanov, S., Rodriguez-Mozaz, S., Huerta, B., Barceló, D., Cros, J., Batle, M., Poch, G. & Tauler, R. 2014 Chemometrics quality assessment of wastewater treatment plant effluents using physicochemical parameters and UV absorption measurements. Journal of Environmental Management 140, 33–44.

Pouet, M.-F., Thomas, O., Jacobsen, B. N. & Lynggaard-Jensen, A. 1999 Conclusions of the workshop on methodologies for wastewater quality monitoring. Talanta 50 (4), 759–764.

SEPA 2002 The Analytical Method of Monitoring Water and Waste Water, 4th edn. State Environmental Protection Administration, Beijing, China.

Thomas, O. & Constant, D. 2004 Trends in optical monitoring. Water Science & Technology 49 (1), 1–8.