ABSTRACT
This study investigated the characteristics of dissolved organic matter (DOM) in two distinct water bodies using three-dimensional fluorescence spectroscopy coupled with the self-organizing map (SOM) methodology. Specifically, the analysis concentrated on neurons 3, 14, and 17 within the SOM model, identifying notable differences in the DOM compositions of a coal subsidence water body (TX) and the MaChang Reservoir (MC). The humic substance content of DOM in TX exceeded that in MC. The origin of DOM in TX was primarily linked to agricultural inputs and rainfall runoff, whereas the DOM in MC was associated with human activities, displaying distinctive autochthonous features and heightened biological activity. Principal component analysis revealed that humic substances dominated the DOM in TX, while the natural DOM in MC was primarily autochthonous. Furthermore, a multiple linear regression (MLR) model determined that external pollution was responsible for 99.11% of the variation in the humification index (HIX) of the water bodies.
HIGHLIGHTS
Study of the variation in DOM spectral characteristics of different waterbodies.
Neural networks were applied to spectral resolution.
Source analysis was performed to identify the origin of water pollution.
INTRODUCTION
Dissolved organic matter (DOM) is a soluble fraction of organic matter, consisting of humic-like substances, proteins, hydrophilic organic acids, and polysaccharides, with complex chemical properties and environmental effects (Zhang et al. 2020). DOM exists widely in aquatic environments, such as lakes, rivers, and soils, typically originating from rainfall runoff, soil leaching, surface runoff, and sewage discharge (Inamdar et al. 2012). DOM rich in elements (such as C, N, and P) affects carbon and nutrient cycles, participating in geochemical and photochemical reactions (Fellman et al. 2011; Shatrughan et al. 2013). Previous studies have shown that the chemical composition and properties of DOM in aquatic environments are closely related to their source, while also being affected by the soil–water transport of DOM and release processes from submerged silt (Sulzberger & Durisch-Kaiser 2009; Yang et al. 2014). Therefore, understanding the sources and characteristic properties of DOM in aquatic environments is essential to better understand the role of this complex fraction in different ecosystems.
A considerable body of research is available on DOM in water (Zhang et al. 2010). Studies have indicated that dynamic changes in DOM and the release of endogenous nutrients can contribute to eutrophication, such as in Chaohu Lake (China), causing the deterioration of lake ecosystems (Bao et al. 2023). In coal-mining areas, high groundwater levels can lead to land subsidence, with only about half of the subsidence areas in Huainan (China) being actually submerged in water (Hu et al. 2014; Xiao et al. 2014). Large-scale ground subsidence is an important issue because it alters the local topography and brings about fundamental changes in the structure and function of local ecosystems, significantly affecting the surrounding environment. Currently, some coal-mining subsidence areas have been designated as water source protection zones, fishing areas, and wetland ecological restoration zones, exhibiting considerable variations in their nutrient status. With the continuous increase in coal-mining subsidence areas and the ongoing deterioration of mining area ecological environments, some subsidence areas are experiencing aquatic eutrophication or severe levels of pollution.
The self-organizing map (SOM) methodology is an unsupervised artificial neural network (ANN) technique that exhibits powerful clustering capabilities when dealing with nonlinear data, constructing a competitive learning network that reduces high-dimensional data matrices into one- or two-dimensional topological structures (Santos et al. 2020). The SOM model is composed of interconnected neurons forming an adaptive system. During the learning phase, these neurons adjust their structure and weight vectors based on the specific input features, minimizing the overall error (Kohonen 2013). As an unsupervised clustering algorithm, SOM classifies all participating objects without specifying the name of the classified object. Compared to parallel factor analysis (PARAFAC), which has been widely used for the analysis of fluorescence data, the SOM approach demonstrates higher noise tolerance, without the need for a cumbersome data preprocessing stage. Additionally, SOM excels in the analysis of short-excitation wavelength fluorescence data (Jin et al. 2023). By employing SOM models to analyze fluorescence data from two different types of water body, the distribution characteristics of DOM can be effectively established.
This study utilized three-dimensional fluorescence spectroscopy (3D-EEMs) combined with an SOM model to investigate samples collected from the MaChang Reservoir (MC) (Kongxiangdian, Huainan City, China) and a coal subsidence waterbody (TX) in Suntuan (Huaibei City, China). The relationships between DOM content and structure were investigated in both the coal subsidence waterbody and reservoir samples, and other water quality parameters were determined to establish distinct nutrient patterns for the different waterbodies.
MATERIALS AND METHODS
Study area
Sample analysis
Temperature (T), conductivity (EC), salinity, pH, turbidity, total dissolved solids (TDS), and dissolved oxygen (DO) were measured in the field, while total nitrogen (TN), ammonia nitrogen (NH₄⁺-N), nitrate nitrogen (NO₃⁻-N), total phosphorus (TP), and chemical oxygen demand (COD) were determined in the laboratory, as shown in Table 1. Water samples were filtered through 0.45 μm glass fiber membranes for three-dimensional fluorescence spectra (EEMs) and ultraviolet-visible absorption spectra (UV-vis) analysis.
Water indices and the experimental methods applied
Parameter | Experimental method |
---|---|
TN | Water quality – determination of total nitrogen – alkaline potassium persulfate digestion UV spectrophotometric method (HJ 636-2012) |
NH₄⁺-N | Water quality – determination of ammonia nitrogen – Nessler's reagent spectrophotometry (HJ 535-2009) |
NO₃⁻-N | Water quality – determination of nitrate nitrogen – ultraviolet spectrophotometry (HJ/T 346-2007) |
TP | Water quality – determination of total phosphorus – ammonium molybdate spectrophotometric method (GB 11893-89) |
COD | Water quality – determination of the chemical oxygen demand – dichromate method (HJ 828-2017) |
DOM measurement and characterization
The absorbance of water samples was measured (200–800 nm) using a UV-Vis photometer (N5000PLUS, Yoke, Shanghai, China). Milli-Q water was used as the blank, and determinations were performed using a quartz cuvette with a 1 cm optical path length. The scanning interval was 1 nm and spectral calibration was performed using the 680–800 nm band. EEM data were obtained for water samples using a fluorescence spectrophotometer (F-7000, Hitachi, Japan) with an excitation wavelength (Ex) from 200 to 500 nm, an emission wavelength (Em) from 200 to 550 nm, a scanning speed of 1,200 nm/min, and a scanning interval of 5 nm. Measurements were performed using a 1 cm quartz cuvette, and EEM data for ultrapure water were simultaneously recorded as a quality control reference.
Data analysis and processing tools
This study employed Excel and IBM SPSS Statistics v.27 for statistical correlation analysis, while principal component analysis (PCA) was conducted using the ‘ggplot2’ and ‘ggrepel’ packages of R v.4.2.3. SOM processing of EEM data, K-means clustering, Davies–Bouldin index (DBI) analysis, and the establishment of SOM models were performed using the ‘dream’, ‘statisticstoolbox’, and ‘somtoolbox’ tools in Matlab v.2022b (Agoubi 2018; Chen et al. 2022). The PARAFAC model was established using the ‘stardom’ v.1.1.21 package in R (Krylov et al. 2020), while graphical representations were prepared using ArcMap v.10.8.1, Bigemap GIS Office, Origin v.2022b (educational version), Matlab, and the ggplot2 package in R.
RESULTS AND DISCUSSION
SOM model analysis
The SOM model was used to cluster the EEMs dataset, with the clustering results then used as the output for the data mapping method. The SOM model algorithm consisted of two steps: self-organizing training and output feature mapping, requiring the adjustment of weights and neural network parameters in the input layer to connect output layer neurons with input layer neurons. The specific steps involved in this process have been described in detail in previously reported literature (Bieroza et al. 2011).
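The two steps described above (self-organizing training, then mapping each sample to an output neuron) can be illustrated with a minimal sketch in Python. This is not the SOM Toolbox implementation used in this study. The grid size, the linearly decaying learning rate, the Gaussian neighbourhood function, and the function names `train_som` and `best_matching_unit` are illustrative assumptions, and each EEM is assumed to have been unfolded into a one-dimensional intensity vector beforehand.

```python
import math
import random


def train_som(samples, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM.

    samples: list of equal-length numeric vectors (e.g. unfolded EEMs).
    Returns a dict mapping grid position (row, col) -> weight vector.
    All hyperparameters are illustrative defaults, not values from the study.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialise each neuron's weight vector from a randomly chosen sample.
    weights = {(r, c): list(rng.choice(samples))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.3)  # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Pull the neuron towards the sample, scaled by grid proximity to the BMU.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def best_matching_unit(weights, x):
    """Return the grid position of the neuron whose weights are closest to x."""
    return min(weights, key=lambda pos: sum((wi - xi) ** 2
                                            for wi, xi in zip(weights[pos], x)))
```

After training, calling `best_matching_unit(weights, sample)` for every sample gives the output feature mapping; counting how many samples from each water body land on each neuron is one way to see which neurons characterise which group.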
SOM-based visualization diagram of neurons 3, 14, and 17.
Three-dimensional fluorescence spectral information for neurons 3, 14, and 17.
Under natural conditions, the composition of DOM is complex and susceptible to the effects of biogeochemical processes and human activities (Du et al. 2021). TX was located near residential villages and agricultural land, resulting in a significant influence from external inputs, with humic substances being dominant. Both MC and TX contained humic substance signals in the visible light range, which are primarily associated with forested streams, agricultural discharge, and wetlands, with a combined influence from terrestrial inputs, aquatic microbial activity, and photochemical oxidation processes (Fellman et al. 2008). In the case of MC, humic substances were likely to be produced by aquatic plants, other organisms, and biological processes within the waterbody, while also being influenced by sewage discharge from surrounding anthropogenic activities, although these inputs have a relatively low humic substance content (Yu et al. 2020). Additionally, the poor mobility of water and the long hydraulic retention time, coupled with the significant influence of bioavailable small molecules (anthropogenic origin), contributed to the high humic substance content in the water (Zhang et al. 2006; Solomon et al. 2015; Gao & Wang 2019).
Fluorescence parameters
The fluorescence indices, such as the fluorescence index (FI), biological index (BIX), and humification index (HIX), were analyzed to determine the spectral characteristics of DOM in water (Huguet et al. 2009; Lavonen et al. 2015), with the results presented in Table 2. The fluorescence indices FI and BIX exhibited no significant differences between the two waterbody types. The FI values for both waterbodies ranged from 1.4 to 1.9, indicating both external and internal DOM sources. Similarly, analysis of the biological index BIX showed that the average BIX values for DOM ranged between 0.8 and 1.0, suggesting that DOM originated from both terrestrial and internal sources, which is consistent with the FI index results.
Water fluorescence index
Water | FI | BIX | HIX** |
---|---|---|---|
MC | 1.73 ± 0.11 | 0.98 ± 0.056 | 0.66 ± 0.059 |
TX | 1.69 ± 0.13 | 0.95 ± 0.063 | 0.85 ± 0.034 |
Note: Numerical values are presented as mean ± standard deviation (SD).
**denotes significant differences between the same indicators (P < 0.01).
The degree of DOM humification in water is characterized using the HIX, where a higher HIX value indicates a higher degree of DOM aromaticity (Huguet et al. 2009). The mean HIX value for TX (0.85 ± 0.034) was significantly higher than that for MC (0.66 ± 0.059) (P < 0.01). However, both HIX values were <1.5, indicating that the DOM in water exhibited the characteristics of weakly humified substances, suggesting that in the short term, DOM was primarily influenced by algal or bacterial organic matter.
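For readers who wish to reproduce indices of this kind, the following Python sketch computes FI, BIX, and HIX from a single EEM. The wavelength definitions are assumptions based on commonly used conventions, not values stated in this paper: FI as the ratio of emission at 470 to 520 nm at Ex 370 nm, BIX as the ratio of emission at 380 to 430 nm at Ex 310 nm, and HIX in its normalised 0 to 1 form (summed emission over 435–480 nm divided by the sum over 300–345 nm plus 435–480 nm) at Ex 254 nm. Because the EEMs here were recorded on a 5 nm grid, the sketch uses the nearest available grid point. The data layout (a nested dict) and all function names are hypothetical.

```python
def nearest(values, target):
    """Return the grid value closest to target."""
    return min(values, key=lambda v: abs(v - target))


def intensity(eem, ex, em):
    """Intensity at the grid point nearest to (ex, em).

    eem is a nested dict: {excitation_nm: {emission_nm: intensity}}.
    """
    ex_key = nearest(eem.keys(), ex)
    em_key = nearest(eem[ex_key].keys(), em)
    return eem[ex_key][em_key]


def band_sum(eem, ex, em_lo, em_hi):
    """Sum of intensities for emission in [em_lo, em_hi] at the nearest excitation."""
    ex_key = nearest(eem.keys(), ex)
    return sum(v for em, v in eem[ex_key].items() if em_lo <= em <= em_hi)


def fluorescence_indices(eem):
    """Return FI, BIX and HIX under the assumed wavelength definitions."""
    fi = intensity(eem, 370, 470) / intensity(eem, 370, 520)
    bix = intensity(eem, 310, 380) / intensity(eem, 310, 430)
    low = band_sum(eem, 254, 300, 345)    # protein-like emission band
    high = band_sum(eem, 254, 435, 480)   # humic-like emission band
    hix = high / (low + high)             # normalised form, bounded by 0 and 1
    return {"FI": fi, "BIX": bix, "HIX": hix}
```

The normalised HIX was chosen for this sketch because the values reported in Table 2 lie between 0 and 1; if the unnormalised ratio `high / low` is intended instead, only the last line of the calculation changes.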
DOM in natural water bodies is primarily influenced by terrestrial inputs and microbial turnover (Johnston et al. 2019). The endogenous fluorescent signals present in MC water samples were pronounced, resembling those reported for artificially regulated water (Jiang et al. 2018; Yingxin et al. 2021). The MC BIX values further confirmed that biogenic DOM production was stimulated by anthropogenic and natural factors. FI values indicated that endogenous and exogenous sources had a combined influence, with anthropogenic inputs having a dual effect on DOM microbial processes. On the one hand, anthropogenic DOM inputs enhance microbial activities in the water, leading to the degradation of external inputs and an increase in the endogenous contribution (Lambert et al. 2017). On the other hand, anthropogenic inputs disrupt the balance of DOM in the aquatic environment and hinder microbial adaptation to environmental changes, thereby suppressing the growth of endogenous microbial sources within a certain timeframe (Butman et al. 2015).
Anthropogenic influences can alter the composition of DOM in natural water bodies, affecting source determination. Previous studies have indicated that agricultural land can be responsible for a significant influx of terrestrial soil into waterbodies, leading to an increase in exogenous DOM (Wilson & Xenopoulos 2009). The high frequency of agricultural activities around TX is likely to contribute to changes in the nutrient composition and organic content of water, leading to the enrichment of terrestrial and biogenic humic substances. The input of large amounts of bioavailable organic matter alters the metabolism of microbes and algae within the waterbody, increasing DOM degradation and reducing its production, resulting in endogenous sources being dominant. This pattern is similar to those reported in previous studies on the contribution of anthropogenic sewage discharge to DOM in natural water bodies (Williams et al. 2016).
Principal component analysis
Physical–chemical indicators determined for each waterbody
Water type | EC (μS·cm−1) | TDS (g·L−1) | pH | DO (mg·L−1) | TN** (mg·L−1) | TP** (mg·L−1) | NH₄⁺-N (mg·L−1) | NO₃⁻-N (mg·L−1) | COD** (mg·L−1) |
---|---|---|---|---|---|---|---|---|---|
MC | 545.85 ± 184.61 | 0.35 ± 0.12 | 8.12 ± 0.29 | 8.72 ± 0.65 | 3.40 ± 1.46 | 0.22 ± 0.15 | 0.97 ± 0.59 | 0.39 ± 0.15 | 19 ± 5 |
TX | 648.97 ± 50.67 | 0.42 ± 0.034 | 8.22 ± 0.92 | 8.85 ± 2.16 | 0.79 ± 0.14 | 0.71 ± 0.23 | 0.11 ± 0.01 | 0.31 ± 0.13 | 72 ± 20 |
Note: Numerical values are presented as mean ± standard deviation (SD).
**denotes significant differences between the same indicators (P < 0.01).
To simplify the structure of the factor loading matrix and aid in source apportionment, two principal components extracted from PARAFAC were utilized as factors. The contribution rate of the first principal component was 54.4%. It was characterized by higher loadings on the fluorescence components and TN, and was primarily influenced by the proteinaceous fraction (Figure 6(a)). The contribution rate of the second principal component was 12.9%, exhibiting a high loading on the BIX. Previous studies showed that humic substances produced by microbial activities exhibit a good correlation with nitrogen-containing elements (Hur & Cho 2012), suggesting that the second principal component primarily reflects the influence of endogenous microbial sources (Figure 6(a)).
The two water samples were subjected to Adonis multivariate analysis of variance (based on Euclidean distance) with 999 permutations (P < 0.001) (Sylvain et al. 2019). As shown in Figure 6(b), TX was distributed along the positive axis of PC1, indicating that TX DOM was dominated by humic substances, while MC DOM was associated with autochthonous sources.
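The permutation test reported above can be illustrated with a minimal one-way PERMANOVA sketch in Python. It is not the Adonis routine itself, only an outline of the same idea under simple assumptions: a pseudo-F statistic is computed from squared Euclidean distances, the group labels are shuffled a fixed number of times, and the P value is the share of permutations whose pseudo-F is at least as large as the observed one. The function names and the input layout (a list of sample vectors plus a parallel list of labels) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_f(data, labels):
    """One-way PERMANOVA pseudo-F from squared Euclidean distances."""
    n = len(data)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: all pairwise squared distances divided by N.
    ss_total = sum(sq_dist(data[i], data[j])
                   for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same quantity computed per group.
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(sq_dist(data[i], data[j])
                         for k, i in enumerate(idx) for j in idx[k + 1:]) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(data, labels, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation P value)."""
    rng = random.Random(seed)
    f_obs = pseudo_f(data, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(data, shuffled) >= f_obs:
            hits += 1
    # The observed labelling counts as one of the permutations.
    return f_obs, (hits + 1) / (permutations + 1)
```

With 999 permutations the smallest attainable P value is 1/1000, which is why results of this kind are usually reported as a bound rather than an exact figure.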
Source apportionment


Contribution of public factors to indicators
Parameter | F1 exogenous pollution contribution rate (%) | F2 endogenous pollution contribution rate (%) | Measured average concentration | Predicted average concentration | R² |
---|---|---|---|---|---|
FI | 87.21% | 12.79% | 1.72 ± 0.11 | 1.72 ± 0.074 | 0.66 |
HIX | 99.11% | 0.89% | 0.72 ± 0.10 | 0.72 ± 0.081 | 0.82 |
BIX | 20.94% | 79.06% | 0.97 ± 0.063 | 0.97 ± 0.047 | 0.74 |
TN | 78.51% | 21.49% | 2.59 ± 1.61 | 2.59 ± 1.55 | 0.97 |
NH₄⁺-N | 68.85% | 31.35% | 0.74 ± 0.65 | 0.74 ± 0.57 | 0.88 |
NO₃⁻-N | 43.69% | 56.31% | 0.36 ± 0.064 | 0.36 ± 0.053 | 0.83 |
Comparison of the predicted and measured concentrations for pollution parameters and indices.
Based on the constructed absolute principal component score–multiple linear regression (APCS-MLR) and pollution source contribution models, the contributions of pollution sources to water parameters were deduced (Table 4), showing that exogenous pollution contributed 99.11% of HIX and 78.51% of TN. This suggests that the nitrogen-rich pollutants in the water originated mainly from agricultural and domestic sewage discharges in the surrounding areas. Endogenous pollution contributed 79.06% of BIX, indicating that microbial activities were the primary endogenous source. These calculations demonstrate that the sources of DOM were influenced by both exogenous and endogenous pollution, in which exogenous pollution was mainly attributed to anthropogenic inputs and agricultural non-point source pollution, while endogenous sources were primarily influenced by the intensification of microbial activities.
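The regression step behind contribution rates of this kind can be sketched in Python as follows. This is a hedged illustration rather than the exact procedure used in this study. It assumes that the absolute principal component scores (APCS) for each sample have already been computed, fits an ordinary least-squares model of one parameter on those scores, and expresses each factor's share as the absolute value of its coefficient times its mean score, normalised over the identified factors so that the shares sum to 100% (an assumption chosen to match the layout of Table 4). All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(apcs, conc):
    """Least-squares fit of conc on the APCS columns.

    apcs: list of per-sample score vectors, one entry per factor.
    Returns [b0, b1, ..., bk] with b0 the intercept.
    """
    design = [[1.0] + list(row) for row in apcs]
    p = len(design[0])
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, conc)) for i in range(p)]
    return solve_linear(xtx, xty)


def source_contributions(apcs, conc):
    """Percentage share of each factor, normalised over the identified factors."""
    coefs = fit_mlr(apcs, conc)
    n = len(apcs)
    k = len(apcs[0])
    means = [sum(row[j] for row in apcs) / n for j in range(k)]
    parts = [abs(coefs[j + 1] * means[j]) for j in range(k)]
    total = sum(parts)
    return [100.0 * part / total for part in parts]
```

The predicted concentration for a sample is the intercept plus the sum of each coefficient times that sample's score, which is the quantity compared against the measured values when judging the fit (R²).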
CONCLUSIONS
(1) The content of humic components affected by agricultural non-point source pollution was higher in the DOM from the subsidence waterbody than from natural reservoir water.
(2) The results of the SOM model showed that neurons 14 and 17 (a) representing water samples from the MC exhibited characteristics of protein-like substances, indicating a higher contribution from autochthonous biological activity. Neurons 3 and 17 (b), representing humic-like substances, suggested that the DOM in TX mainly originated from agricultural non-point source pollution, while the humic substances in MC were associated with anthropogenic activities in the surrounding area.
(3) The APCS-MLR model results indicated that humic-like components of DOM in the water primarily originated from agricultural non-point source pollution, with an exogenous pollution contribution rate to HIX of 99.11%. The contribution rate of endogenous sources to BIX was 79.06%, suggesting that proteins in the water originated from plankton and microorganisms.
(4) PCA yielded results consistent with the SOM model analysis. The SOM model, based on the original excitation–emission matrix fluorescence spectroscopy (EEMs) data for water samples, effectively analyzed the components of DOM in both waterbodies. In addition, the multiple linear regression model can be used to incorporate monitoring data from various waterbodies to help predict the relative contribution of different pollution sources, providing support for environmental pollution source apportionment studies.
DISCLOSURE STATEMENT
This study was supported by the Anhui Provincial Key Research and Development Project (202004i07020012).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.