## Abstract

Passive air samplers have proven to be a widely used technique in measuring hydrophobic organic compounds (HOCs) in the atmosphere. Chemical partition coefficients between the sorbent material and air (*K*_{PE-a}) are significant for calculating chemical concentrations from passive sampling devices. We have established a theoretical linear solvation energy relationship (TLSER) model for predicting *K*_{PE-a} values by molecular descriptor. With the TLSER model, McGowan volume (*V*), chemical hardness (*η*) and dipole moment (*DM*) were screened as the most relevant variables. The model had a high correlation coefficient (*R*^{2}), leave-one-out cross-validation coefficient (*Q*^{2}_{LOO}) and bootstrapping coefficient (*Q*^{2}_{BOOT}). To our knowledge, this is the first attempt to predict the LDPE-air partition coefficient using the TLSER model. Statistical parameters, determination coefficient (*R*^{2}) and cross-validation coefficients (*Q*^{2}) ranged from 0.896 to 0.931 and 0.883 to 0.909, respectively, which indicated that the TLSER model appropriately fitted the results, and also showed robustness and predictive capacity. Mechanism interpretation suggested that the factors governing the partition process for LDPE and air were the McGowan volume and molecular orbital energies. The results of this study provide a good tool for predicting log *K*_{PE-a} values of HOCs, within the applicability domains to reduce cost and time for innovation.

## INTRODUCTION

Hydrophobic organic compounds (HOCs) are usually carcinogenic, teratogenic and mutagenic, and pose a great danger to the natural and human survival environment (Fujii *et al.* 2007). The atmosphere is one of the core matrices for global monitoring of persistent organic pollutants (POPs). The urgent need for the effective evaluation of air pollutants has contributed to the innovative development of atmospheric POPs sampling technology. Great progress has been made in the development and application of passive atmospheric sampling (PAS) technology in the last decade (Hazrati & Harrad 2007; Tuduri *et al.* 2012). PAS is becoming an effective complementary sampling technology for high volume air samplers, with its extraordinary advantage in synchronic monitoring of POPs in the atmosphere across vast territories. Passive sampling technology is a kind of equilibrium sampling technology for enriching organic pollutants in environmental media based on molecular diffusion or the infiltration principle. It integrates sampling, separation, concentration and analysis without the need for external energy. Many different materials have been used as sorbents in various passive sampling devices, including semipermeable membrane devices (Huckins *et al.* 1990; Petty *et al.* 1993; Lohmann *et al.* 2001; Khairy & Lohmann 2014), low density polyethylene (LDPE) ﬁlm (Bartkow *et al.* 2006; Denberg *et al.* 2007; Khairy & Lohmann 2014; Zhu *et al.* 2018), and polyurethane foam (PUF) devices (Jaward *et al.* 2004, 2005; Tuduri *et al.* 2006; Hazrati & Harrad 2007; Devi *et al.* 2011). Among them, previous studies have shown that a thin sheet of LDPE is the cheapest, simplest effective material for measuring many different classes of HOCs in the field and in laboratory experiments (Ockenden *et al.* 1998; Lohmann *et al.* 2001; Harner *et al.* 2003; Jaward *et al.* 2004; Khairy & Lohmann 2014; Smedes *et al.* 2017).

The partition coefficients of HOCs between LDPE and air (*K*_{PE-a}) are vitally important for designing passive sampler devices and calculating chemical concentrations of the air phase from LDPE. The equilibrium partition coefficients are required to calculate the sampling rates and estimate the concentrations of compounds. However, it is difficult to measure *K*_{PE-a} for all potential pollution, since experimental determination of *K*_{PE-a} is generally laborious, time-consuming and expensive (Khairy & Lohmann 2014).

So it would be helpful to develop mathematical models for predicting *K*_{PE-a} values with the physicochemical properties of chemicals. Several models have been proposed by research groups for calculating *K*_{PE-a} values of HOCs based on the logarithm of octanol-air partition coefﬁcient (log *K*_{oa}) (Khairy & Lohmann 2013), but the *K*_{PE-a} – *K*_{oa} relation for organochlorine pesticides (OCPs) explained only 32% of the total data. A strong linear relationship between log *K*_{PE-a} and their subcooled liquid vapor pressures (log *P*_{L}) was also investigated for 49 HOCs. This field calibration model was only performed for polycyclic aromatic hydrocarbons (PAHs), polychlorinated biphenyls (PCBs), polybrominated diphenylethers (PBDEs) and OCPs. And *K*_{PE-a} values are not available for other classes of HOCs.

The linear solvation energy relationship (LSER) and the theoretical linear solvation energy relationship (TLSER) are considered as useful modeling methods for predicting partition coefficients or toxicity of solutes (Lyakurwa *et al.* 2014). LSER descriptors based on linear free energy relationships are successful in correlating a wide range of chemical and physical properties involving solute-solvent interactions, which specifically identify and evaluate the individual solute-solvent interactions. The coefficients of the molecule descriptors in the correlation equation are expected to provide insight into the physical nature of the solute-solvent interactions related to the experimentally observed phenomena or data.

TLSER has been proven to be useful in three areas (Karelson *et al.* 1996), the first being its ability to calculate easily the parameters of most chemical species. The relevant capabilities of the method can be significantly improved, by increasing the number of compounds available for the data set, and by increasing the ease of calculation of the parameters. Second, the resulting correlations relate an empirical property to molecular parameters. In this way, how changes in molecular structures affect the observed property can be deduced (or identified). Third, since the parameters are orthogonal, problems of cross-correlation between independent variables are usually minimized.

The LSER parameters are usually obtained from experimental measurements, which are not readily available for most of the polar and non-polar neutral organic chemicals. It is an important means to overcome the insufficiency of the classical LSER model by using the quantum chemical parameters of the theoretical calculation to replace the molecular parameters measured by the experiment or to develop a model that can predict the LSER parameters. The TLSER parameters can be calculated by quantum chemical methods, which is more convenient to develop a model.

Therefore, the aim of this study was to develop a TLSER model by quantum chemical descriptors for predicting polyethylene-air partition coefficients. The TLSER models were based on a large data set, including 97 compounds, including most common HOCs, within the applicability domains to reduce experimental cost and time for innovation.

## MATERIALS AND METHODS

### Data sets

The experimental data sets of *K*_{PE-a} were collected from literature (Tables S1–S5 in the supplementary material, available with the online version of this paper) for five different classes of HOCs, including 41 PAH congeners, 22 PCB congeners, 4 PBDEs, 10 different OCPs, and 20 polychlorinated dibenzo-p-dioxins (PCDDs) and dibenzofurans (PCDFs).

The total data set covers 97 compounds with a wide range of log *K*_{PE-a} values (from 4.93 to 11.7). The normal distribution for the total data set with a mean of 8.40 and a standard deviation of 1.42 is shown in Figure 1. All compounds were randomly divided into a training set and validation set in the ratio of 4:1. The data in the training set was used to establish the model and the data in the validation set was used to validate the model.

### Calculation of molecular descriptors

The initial organic molecular structure was generated by ChemBio3D Ultra 12.0 software. The molecular structures were optimized with the MM2 method contained in the ChemBio3D software (at the minimum root mean square (RMS) gradient = 0.001) (Zhu *et al.* 2018). Then, MOPAC2016 software was applied to generate molecular descriptors (MTLSER parameters) with the minimize energy/geometry method, using keywords: PM7, Dielectric Constant = 78.6, EF, GNORM = 0.01, MULLIK POLAR SHIFT = 80 (MOPAC2016 & Stewart 2016). Based on the optimized geometric structures from MOPAC, molecular descriptors were calculated by using the PaDEL-Descriptor software (Version 2.21) (Yap 2011). As a result, 1051 descriptors were retained in the final set.

### Model development and evaluation

*et al.*1993; Lyakurwa

*et al.*2014): In the scheme of this model, property represents solute-solvent interactions. The McGowan volume (

*V*) describes the cavity term. The dipole moment (

*DM*) represents the polarity term. The

*q*and

^{−}*q*are the net charge of the most negative and most positive atoms, respectively, which describe the hydrogen bonding term. The electron donor-acceptor terms are represented by chemical hardness (

^{+}*η,*(E

_{LUMO}– E

_{HOMO})/2), chemical potential (

*μ,*(E

_{LUMO}+ E

_{HOMO})/2), ionization potential (

*I*, -E

_{HOMO}), electrophilicity index (ω,

*μ*

^{2}/2

*η*) and electron affinity (

*A*, -E

_{LUMO}).

The lowercase letters *a, b, c, d, e*, *f, g, h* and *i* are regression coefficients. The best model was selected, with the lowest number of chemical descriptors and the maximum values of the determination coefﬁcient (*R*^{2}_{adj}). The statistical parameters *R*^{2}_{adj}, leave-one-out cross-validated (*Q*^{2}_{LOO}), bootstrap method (*Q*^{2}_{BOOT}) (1/5, 5,000 iterations) and the root mean squared error (*RMSE*) were calculated to assess the predictive ability of the model. *R*^{2}_{ext}, *RMSE*_{ext} and external explained variance (*Q*^{2}_{ext}) were employed to evaluate the predictive capacity of the models. The severity of multicollinearity of each predictor variance was also characterized by the variance inflation factor (*VIF*). The applicability domain (AD) was characterized by the Euclidean distance method (OECD 2007). The calculation method of the Williams plot and the statistical parameters of the models are presented in the supplementary material (available with the online version of this paper).

## RESULTS AND DISCUSSION

### Development and validation of the model

*V*(McGowan volume),

*η*(chemical hardness) and

*DM*(dipole moment), was developed as follows:

*n*

_{tra=}77,

*R*

^{2}

_{adj}= 0.896,

*Q*

^{2}

_{LOO}= 0.883,

*Q*

^{2}

_{BOOT}= 0.887,

*RMSE*

_{tra}= 0.421,

*p*< 0.001,

*n*

_{ext}= 20,

*R*

^{2}

_{ext}= 0.931,

*Q*

^{2}

_{ext}= 0.909,

*RMSE*

_{ext}= 0.465, where

*n*

_{tra}and

*n*

_{ext}are the numbers of chemicals in the training set and the validation set, respectively.

The TLSER model showed satisfactory statistical parameters with *R*^{2}_{adj}, *Q*^{2}_{LOO} and *Q*^{2}_{BOOT} of 0.896, 0.883 and 0.887 in the training set, and *R*^{2}_{ext} and *Q*^{2}_{ext} of 0.931 and 0.909 in the validation set, respectively. According to the criteria (*Q*^{2} > 0.5, *R*^{2} > 0.6) proposed by Golbraikh *et al.* (2003), these statistical results suggested that the TLSER model had satisfactory stability and predictive ability. The results of the current study show that the importance of molecule descriptors follows the order: cavity term > electron donor-acceptor term > polarity term. The cavity term (McGowan volume*, V*) is the most significant variable in the TLSER model. The observations from this study are consistent with the results from Liu *et al.*'s (2017*)* investigations.

The model for HOCs had smaller *RMSE*_{tra} values (0.421 and 0.465), indicating goodness of fit and predictive capacity. The correlation between observed and predicted log *K _{PE-a}* values is presented in Figure 2. The external predictive ability of the model can be validated by a small

*RMSE*

_{ext}(0.465) and a high

*Q*

^{2}

_{ext}(0.909). The statistical results indicated that the predicted log

*K*values were consistent with the observed values in this study. In addition, the predictive variable involved in the models and corresponding Student's t test (

_{PE-a}*t*), statistical significance (

*p*) and variance inflation factor (

*VIF*) values are also listed in Table 1. The statistical parameters of the models indicated that there was no multicollinearity among predictive variables (descriptors), with the

*VIF*values <10 for all the predicted descriptors.

Model | Descriptor | t | p | VIF |
---|---|---|---|---|

TLSER | V | 18.328 | 0.000 | 1.638 |

η | −9.620 | 0.000 | 2.359 | |

DM | 2.085 | 0.000 | 2.361 |

Model | Descriptor | t | p | VIF |
---|---|---|---|---|

TLSER | V | 18.328 | 0.000 | 1.638 |

η | −9.620 | 0.000 | 2.359 | |

DM | 2.085 | 0.000 | 2.361 |

### Applicability domain

The evaluation results of Euclidean distance for the TLSER model are shown in Figure 3. All the compounds in the training and validation sets were located at the accepted domain, which indicated that the training sets for the TLSER model had excellent representativeness. Considering the standardized residuals (δ), there was one chemical (2,3,7-chlorinated dibenzo-p-dioxin) in the training set with |δ| > 3, identified as an outlier (Figure 4). This showed that there are no outliers in the TLSER model except for the dioxin. But dioxin was in the TLSER model because other chemicals with similar structure were calculated correctly (Lyakurwa *et al.* 2014). The overestimated log *K*_{PE-a} values for dioxin may have resulted from the limited number of data points for this class of compounds, or the observed (experimental) values have been inaccurate (Liu *et al.* 2017). Experimental errors might be the reason why dioxin was an outlier 3δ*** as discussed by Gramatica (2007). This erroneous prediction could probably be attributed to incorrect experimental data rather than to molecular structure. And there was no significant impact on the model when the outlier (2,3,7-chlorinated dibenzo-p-dioxin) was removed from the training set. In general, the AD of the TLSER model covered a multitude of structurally diverse compounds, including PAHs, PBDEs, PCBs, OCPs, polychlorinated dibenzo-p-dioxins and dibenzofurans.

### Mechanism interpretation

In the TLSER model, *V*, *η* and *DM* are the most important descriptors. Among them, *V* and *DM* were positively correlated with log *K*_{PE-a}, and *η* was negatively correlated with log *K*_{PE-a}. *V* is the McGowan volume. This descriptor, which describes the cavity term, has been widely used to analyze physicochemical properties of solutes in chemistry. It can be calculated from the sum of all atomic contributions for the compound, subtracting 6.56 cm^{3} mol^{−1} for each bond, regardless of whether it is a single, double or triple bond (Zhao *et al.* 2003a; Liu *et al.* 2017). The McGowan volume is a measure of the energy needed to overcome the solvent molecular interactions to form a cavity for the solute molecule. Furthermore, the McGowan volume is also related to polarizability and/or hydrophobicity of the solute. So, the McGowan volume can be a measure of the exoergic solute-solvent general dispersion interaction that arises through solute polarizability (Zhao *et al.* 2003b; Sprunger *et al.* 2007; Liu *et al.* 2017). The molecular orbital theory as formulated by Mulliken and Hund has been very successful in explaining and predicting chemical behaviors for an enormous number of molecules (Zhou & Parr 1990). The highest occupied molecular orbitals (HOMO) and the lowest unoccupied molecular orbitals (LUMO) play an important role in governing chemical reactions. The energy difference between the LUMO of an electron acceptor and the HOMO of an electron donor has long been used as a reactivity index (Zhou & Parr 1990). Previous investigations have revealed that the E_{HOMO}–E_{LUMO} gap is an important stability index for the individual species concerned (Zhou & Parr 1990). So, chemical hardness (*η,* (E_{HOMO}–E_{LUMO})/2) is a good index of orientation of electrophilic aromatic substitution. A large HOMO–LUMO gap implies high stability. High stability of a molecule reflects its lower reactivity toward chemical reactions in some sense (Zhou & Parr 1990). In the sense of lower reactivity in chemical reactions, a large *η* indicates high stability for the molecule of the compound. Dipole moment (*DM*) is an overall descriptor of the electronic interaction among the solvent and solute molecules. It is well known that *DM* is extremely important for various physicochemical properties, and many descriptions have been made to quantify the polarity effect. For example, molecular polarity indicates chromatographic retention on a polar stationary phase. The most obvious and often used quantity that describes polarity is the molecular dipole moment (Karelson *et al.* 1996). The dipole moment decreases the adsorption coefficients (*K*_{PE-a} value). This is reasonable, because greater *DM* would imply greater dipole-dipole and dipole-induced dipole interactions between solutes and air, resulting in a greater tendency to be partitioned into air and a lower toxicity (Chen *et al.* 1996). As demonstrated in this study, McGowan volume, chemical hardness and dipole moment were selected as the most relevant variables in the TLSER model. It can be seen that large molecules cannot pass through the membrane easily, and need more energy to permeate into the polymer membrane, which is not conducive to partitioning behaviors (Yang *et al.* 2007; Bao *et al.* 2011; Liu *et al.* 2017).

### Model comparison

Several researchers have developed different models for predicting the *K _{PE-a}* value (Table 2). Bartkow

*et al.*(2004) presented

*K*

_{PE-a}for three PAHs based on the comparison between their accumulation in PE film and gaseous concentrations from active high-volume sampling (Bartkow

*et al.*2004). In a follow-up study by Kennedy

*et al.*(2007), a more rigorous correlation of

*K*

_{PE-a}versus octanol-air partition coefficient (

*K*

_{oa}) was suggested for six PAHs. Ma

*et al.*(2010) reported a correlation of

*K*

_{oa}versus

*K*

_{aw}values for predicting

*K*

_{PE-a}.

*K*

_{PE-a}values reported by Khairy & Lohmann (2012) were in very good agreement with those obtained according to the equation proposed by Bartkow

*et al.*(2004), with factor difference less than 46% in all the investigated individual PAHs and similar to the prediction by Lohmann (2012) based on the partitioning of PAHs between PE and water (Khairy & Lohmann 2012). In previous work, Khairy & Lohmann (2013) reported that the

*K*

_{PE-a}versus

*K*

_{oa}relation for OCPs explained only 32% of the total variability in the data, and

*K*

_{oa}values were not suitable as predictors for

*K*

_{PE-a}. And they obtained a strong linear relationship between log

*K*

_{PE-a}and log

*P*

_{L}(subcooled liquid vapor pressure), which included only 59 compounds from three different chemical classes (PCBs, PBDEs and OCPs) (Khairy & Lohmann 2014). The relationship with log

*K*

_{PE-a}and log

*K*

_{oa}is shown in Figure S1 (in the supplementary material, available with the online version of this paper). A linear relationship between log

*K*

_{PE-a}and log

*K*

_{oa}, was obtained. This suggested that uptake of gas-phase HOCs by PE is not as efficient as their uptake by octanol, probably due to the rigid matrix of the polymer (Lohmann 2012). This field calibration model was only performed for PAHs, PCBs, PBDEs and OCPs. And

*K*

_{PE-a}are not available for other classes of HOCs (Table 2).

Parameter | m^{a} | n^{b} | R^{2} | Chemicals | RMSE^{c} | Reference |
---|---|---|---|---|---|---|

K_{oa} | 1 | 3 | 0.95 | PAHs | NR | Bartkow et al. (2004) |

K_{oa} | 1 | 6 | 0.84 | PAHs | NR | Kennedy et al. (2007) |

K_{oa} | 1 | 15 | NR | PAHs | NR | Ma et al. (2010) |

K_{oa} | 1 | 12 | 0.99 | PAHs and alkylated PAHs | 0.052 | Khairy & Lohmann (2012) |

P_{L} | 1 | 17 | 0.90 | OCPs | 0.19 | Khairy & Lohmann (2013) |

P_{L} | 1 | 59 | 0.96 | PCBs, PBDEs and OCPs | 0.22 | Khairy & Lohmann (2014) |

K_{oa} | 1 | 59 | 0.96 | PCBs, PBDEs and OCPs | 0.23 | Khairy & Lohmann (2014) |

V, η, DM | 3 | 97 | 0.90 | PAHs, PCBs, PBDEs, OCPs, PCDDs and PCDFs | 0.34 | Current study |

Parameter | m^{a} | n^{b} | R^{2} | Chemicals | RMSE^{c} | Reference |
---|---|---|---|---|---|---|

K_{oa} | 1 | 3 | 0.95 | PAHs | NR | Bartkow et al. (2004) |

K_{oa} | 1 | 6 | 0.84 | PAHs | NR | Kennedy et al. (2007) |

K_{oa} | 1 | 15 | NR | PAHs | NR | Ma et al. (2010) |

K_{oa} | 1 | 12 | 0.99 | PAHs and alkylated PAHs | 0.052 | Khairy & Lohmann (2012) |

P_{L} | 1 | 17 | 0.90 | OCPs | 0.19 | Khairy & Lohmann (2013) |

P_{L} | 1 | 59 | 0.96 | PCBs, PBDEs and OCPs | 0.22 | Khairy & Lohmann (2014) |

K_{oa} | 1 | 59 | 0.96 | PCBs, PBDEs and OCPs | 0.23 | Khairy & Lohmann (2014) |

V, η, DM | 3 | 97 | 0.90 | PAHs, PCBs, PBDEs, OCPs, PCDDs and PCDFs | 0.34 | Current study |

^{a}*m*: the number of parameters; ^{b}*n*: the number of compounds; ^{c}*RMSE*: the root mean squared error; *NR*: not reported; *K*_{oa}: the octanol-air partition coefficient; *P*_{L}: subcooled liquid vapor pressure; *V*: McGowan volume; *η* : chemical hardness; *DM*: dipole moment.

Few TLSER models existed in previous studies, which could predict the distribution behavior of molecules between LDPE and the gas phase. In the current study, a TLSER model was developed with the aim of optimally correlating with the LSER model and hence being generally applicable to solute/solvent interactions. Comparing with previous results, the models proposed in this study had many advantages, such as higher correlation coefficient (*R*^{2}), smaller value of *RMSE* and higher cross-validated coefficient (*Q*^{2}). Furthermore, the present model had a broad application domain, which included 97 compounds and contained much more structurally diverse compounds.

### Practical applications

PAS has proven to be a widely used technique for measuring HOCs in the atmosphere. Chemical partition coefficients between the sorbent material and air (*K*_{PE-a}) are significant for calculating chemical concentrations from passive sampling devices. We have established a TLSER model for predicting *K*_{PE-a} values by molecular descriptor. The model had high correlation coefficient (*R*^{2}), leave-one-out cross-validation coefficient (*Q*^{2}_{LOO}) and bootstrapping coefficient (*Q*^{2}_{BOOT}). Statistical parameters indicated that the TLSER model appropriately fitted the results, and also showed robustness and predictive capacity. The results of the current study provide a good tool for predicting log *K*_{PE-a} values of HOCs, within the applicability domains to reduce cost and time for innovation.

Furthermore, the current study proved the advantages of the TLSER model: (i) the chemical descriptors in the TLSER model can be easily obtained by computation instead of experimentation, so a large amount of expense and time can be saved, and (ii) the quantum chemical descriptors have clear physicochemical interpretations, and interpretation of the correlation equations can suggest modes of interaction between solute and solvent molecules. Future work should focus on developing tasks that assess the application of TLSER models for a wider range of chemicals, including emerging contaminants, the relationship between log *K*_{PE-a} and many different predictors, and ways to account for the McGowan volume, chemical hardness and dipole moment that result in distribution behavior in PE among classes of chemicals or individual chemicals.

## CONCLUSIONS

In the current study, the TLSER model was successfully constructed by quantum chemical descriptors for predicting low density polyethylene-air partition coefficients. The TLSER model was based on a large data set, including 97 compounds from five different chemical classes, which implied that it had a wide AD. With the TLSER model, McGowan volume, chemical hardness and dipole moment were screened as the most relevant variables. Mechanism explanation for the TLSER model was also performed. Statistical analysis suggested that the model had good adaptability and high prediction performance. This study demonstrated that the TLSER approach is useful to predict log *K*_{PE-a} values for complex molecules, within the applicability domains to reduce experimental cost and time for innovation. The method is easy to use and it has general applicability. Meanwhile, it can provide a theoretical basis and technical support for quickly obtaining the partition coefficient of HOCs in passive sampling materials.

## ACKNOWLEDGEMENTS

The study was supported by the National Natural Science Foundation of China (Grant No. 21607123) and the science and technology projects fund of Yangzhou City (Grant No. YZ2016112). This study was also sponsored by the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions.