ABSTRACT
Escherichia coli and total coliforms are important tools for identifying potential faecal contamination in drinking water. When E. coli or total coliforms are detected, however, metagenomics offers a powerful approach for examining the bacterial community in greater depth. Metagenomics can identify microbes native to water systems, track community changes and potential pathogens introduced by contamination events, and evaluate the effectiveness of treatment processes. Here, we demonstrate how the dual application of traditional monitoring practices and metagenomics can improve monitoring and surveillance for water resource management. The robustness of long-read metagenomics is demonstrated across replicates, and its value is illustrated by the interaction between manganese filters and bacterial communities and by the impact of chlorination following coliform detection. These examples show how metagenomics can characterise the complex bacterial communities in the distribution system and in the source waters supplying drinking water treatment plants (DWTPs). The knowledge gained increases confidence in the identified causes of, and mitigations for, potential contamination events. Exploring bacterial communities provides additional insight into the impact of faecal contamination events and treatment processes, enabling more precise remediation actions and greater confidence when communicating health risks to drinking water operators and the public.
HIGHLIGHTS
Full-length 16S rRNA amplicon sequencing aids species-level resolution of potential bacterial pathogen risks in drinking water.
Bacterial community analysis using amplicon metagenomics characterises disturbance and contamination events in drinking water.
Compared with total coliform and E. coli monitoring in drinking water, bacterial community analysis provides greater resolution to guide mitigation efforts.
INTRODUCTION
Access to safe drinking water is a basic requirement in developed countries. However, ensuring the supply of clean drinking water requires vigilance and responsiveness to potential microbial contamination events. Additionally, climate change is affecting water resources, such as groundwater, rivers, and rainfall, and is exacerbating events such as drought, floods, and sea level rise (Boholm & Prutzer 2017; Dvorak et al. 2018; Bartlett & Dedekorkut-Howes 2022). This requires appropriate adaptation strategies to address the increased impacts of heavy rainfall on faecal contamination ingress to source waters and the effects of drought on microbial water quality.
In New Zealand, the current Drinking Water Quality Assurance Rules 2022 (Water Services Regulations 2022) identify Escherichia coli and total coliforms (TCs) as key targets for monitoring microbial water quality in drinking water distribution systems and source waters. E. coli enumerated by cultivation is a widely validated indicator of faecal contamination in water and, hence, of the potential presence of pathogens (Haramoto et al. 2012; Price & Wildeboer 2017; Hu et al. 2018). However, E. coli detection alone cannot differentiate between animal sources of faecal contamination, a task for which targeted genomic approaches have shown greater power (Derx et al. 2021; Demeter et al. 2023; Li et al. 2023; Liu et al. 2023a; Vargha et al. 2023; Wu et al. 2023). Attempts have been made to improve monitoring and surveillance for water resource management by implementing next-generation sequencing and metagenomic tools to characterise microbial contaminants (Del Olmo et al. 2021; Guo et al. 2021; Potgieter et al. 2021; Mahajna et al. 2022; Vargha et al. 2023). These tools allow the investigation of entire bacterial communities within water samples and the identification of naturally present microbes (i.e., from environmental sources) or those introduced by external ingress. This can lead to the identification of health risks from, for instance, faecal contamination events (Ashbolt 2015; Thom et al. 2022).
Compared with surface waters (i.e., lakes and rivers), drinking water and groundwater are more challenging environments for metagenomic analyses because of their lower microbial concentrations; even at these low concentrations, pathogens with low infectious doses can still present a health risk. Nevertheless, the low bacterial concentrations typical of drinking water (10³–10⁶ cells/mL) can be used to investigate the native microbial communities and how they are perturbed by adverse events and contaminants (Prest et al. 2016; Mahajna et al. 2022). Furthermore, longer-term studies of the microbial communities associated with water systems can provide insights into methods to improve the control of microbial biofilms in pipes (Potgieter et al. 2018; Del Olmo et al. 2021).
Metagenomic analyses can identify how bacterial communities are altered by drinking water treatment and processing (Potgieter et al. 2018; Schijven et al. 2019; Potgieter et al. 2021; Bai et al. 2022; Liu et al. 2023b). For instance, drinking water containing high manganese levels requires treatment, as excessive manganese consumption can lead to cognitive problems, muscle tremors, and headaches (Kullar et al. 2019). Elevated manganese levels also have aesthetic impacts, such as imparting an unpleasant metallic taste to food (Daughney 2003; Frisbie et al. 2012). Consequently, the New Zealand drinking water standards specify a maximum acceptable value (MAV) of 0.4 mg/L for manganese (Water Services Regulations 2022) and, for aesthetic purposes, a guideline value of 0.04 mg/L (Aesthetic Values for Drinking Water Notice 2022). To reduce manganese concentrations, water can be treated using biological filters that pass aerated water through a sand filter, enriching bacteria capable of oxidising and removing manganese (Piazza et al. 2019; Yuan et al. 2023; Timmers et al. 2024).
Most metagenomic bacterial community analyses have used 250–500 base pair (bp) amplicon sequencing of the 16S ribosomal RNA (rRNA) gene to assign bacterial taxonomy (Potgieter et al. 2018; Douterelo et al. 2020; Del Olmo et al. 2021; Thom et al. 2022). However, these relatively short 16S rRNA amplicons provide low taxonomic resolution, often resolving only to genus level, which reduces the ability to discriminate between pathogenic species and closely related environmental species (Acharya et al. 2019; Johnson et al. 2019). Oxford Nanopore Technologies (ONT) sequencing can generate reads of up to hundreds of thousands of bp at relatively low capital cost (Jain et al. 2018). Improvements in nanopore sequencing, such as reduced basecalling errors and new chemistries, enable the generation of full-length (∼1,500 bp) 16S rRNA amplicons that can improve bacterial taxonomic assignment (Kerkhof 2021; Lu et al. 2023; Zhang et al. 2023). However, the reliable classification of potential pathogens remains a challenge, especially in drinking water, where DNA concentrations may be low (Benítez-Páez et al. 2016; Benitez-Paez & Sanz 2017; Acharya et al. 2019). Here, we discuss the application of full-length 16S rRNA amplicon sequencing to New Zealand municipal drinking water systems to increase confidence in the accurate identification of bacterial communities, potential pathogens, and their origin. These data will interest water suppliers who wish to identify the source(s) of contamination and implement cost-effective remediation.
This study investigated the application of ONT sequencing to drinking water analyses to evaluate its value and potential biases. Using three examples of drinking water treatment plants (DWTPs), we outline collaborative investigations with water suppliers interested in source water quality and in total coliform detections within their supplies. The utility of metagenomics for characterising bacteria, including potential pathogens and contamination events, is highlighted. The DWTP examples provide insight into the impact of a manganese filter system on bacterial communities over time and into changes in bacterial communities before and after chlorination following total coliform detections. This work is part of ongoing research towards implementing cost-effective metagenomic approaches that support suppliers in mitigating contamination events, protecting water resources, and improving surveillance and remediation practices.
METHODS
Evaluation of ONT sequencing consistency
Water samples were collected from two sites, the headworks and a downstream reticulation site, on 6 days (three per week over two consecutive weeks; Figure 1 and Supplementary Table S1). Sampling and DNA extraction yielded 12 samples from the headworks and multiple replicates from the reticulation site (n = 48 water samples) to test variation in bacterial community diversity (Figure 1). Each day, the single 2 L headworks sample was split into two 1 L subsamples, each of which was filtered. At the reticulation site, three 2 L grab samples were collected 2 minutes apart each day, and 1 L from each was filtered to monitor variability in bacterial composition between grab samples (experiment (Expt.) A, Figure 1). The remaining 1 L from each grab sample was pooled into a 3 L composite, which was then filtered as three individual 1 L samples to monitor community variability arising from filtering and DNA extraction (Expt. B). After DNA extraction, one of the Expt. B samples was sequenced as three replicates within a single sequencing run to measure intra-run variability (Expt. C). In addition, selected Expt. C triplicate sets were compared across three sequencing runs to assess inter-run variability.
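Because the replicate structure is easy to lose track of, the sketch below enumerates the filtered 1 L samples implied by the design described above and checks that they add up to n = 48 (12 headworks, 18 Expt. A, 18 Expt. B). The site and experiment labels are hypothetical, and Expt. C is deliberately omitted because its replicates are sequencing libraries made from an existing extract rather than additional water samples.

```python
from collections import Counter
from itertools import product

# Hypothetical labels for the sampling design described above (6 sampling days).
DAYS = range(1, 7)

def sampling_design():
    """Return one (site, experiment, day, replicate) tuple per filtered 1 L sample."""
    samples = []
    # Headworks: one 2 L sample per day, split into two 1 L subsamples.
    for day, rep in product(DAYS, (1, 2)):
        samples.append(("headworks", "split", day, rep))
    # Reticulation, Expt. A: 1 L filtered from each of three daily grab samples.
    for day, grab in product(DAYS, (1, 2, 3)):
        samples.append(("reticulation", "Expt. A", day, grab))
    # Reticulation, Expt. B: remaining 3 x 1 L pooled, then re-filtered as 3 x 1 L.
    for day, rep in product(DAYS, (1, 2, 3)):
        samples.append(("reticulation", "Expt. B", day, rep))
    return samples

def design_summary(samples):
    """Count filtered samples per (site, experiment)."""
    return Counter((site, expt) for site, expt, _, _ in samples)

if __name__ == "__main__":
    design = sampling_design()
    print(len(design))  # 48
    print(design_summary(design))
```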
Investigation of total coliforms
Routine sampling of network locations in a town's municipal drinking water supply had previously detected TCs in several samples, but E. coli was not detected. To investigate these TC detections, samples were collected in August 2022 from two pump stations and 10 reticulation locations on a single day (Supplementary Table S1).
Evaluation of manganese biofilter treatment
A town with historically high manganese concentrations in its groundwater bores had implemented biological manganese filters as part of drinking water treatment. Sampling before our investigation had identified low concentrations of TCs. This case study investigated temporal changes in the bacterial communities of the manganese filters and the reticulation system. Samples were collected from up to 14 locations within the water supply (six groundwater bores, two manganese filters, two sites within the headworks, and four reticulation sites): twice in September 2022 (all locations), twice in February 2023 (Bore 1 and the two manganese filters), and once in April 2023 (six bores and two manganese filters) (Supplementary Table S1).
Water quality parameter assessment
E. coli and total coliforms were enumerated in 100 mL drinking water samples using the Colilert enzyme-substrate coliform test with the 97-well Quanti-Tray Most Probable Number (MPN) method (IDEXX, Maine, USA).
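Colilert results are normally read from the manufacturer's MPN table. To illustrate the statistical principle behind an MPN estimate, the sketch below computes a maximum-likelihood MPN for any design of wells with known volumes by solving the standard score equation Σ pᵢvᵢ / (1 − e^(−λvᵢ)) = Σ nᵢvᵢ for λ (organisms per mL), where group i has nᵢ wells of volume vᵢ mL and pᵢ positives. The function name and the demo well volumes are hypothetical and are not the Quanti-Tray specification; for reporting, the IDEXX table should be used.

```python
import math

def mpn_per_100ml(groups, rel_tol=1e-10):
    """Maximum-likelihood MPN per 100 mL.

    groups: iterable of (n_wells, well_volume_ml, n_positive) tuples.
    Solves sum(p*v / (1 - exp(-lam*v))) == sum(n*v) for lam (organisms per mL).
    """
    groups = list(groups)
    if all(p == 0 for _, _, p in groups):
        return 0.0       # no positive wells: below detection
    if all(p == n for n, _, p in groups):
        return math.inf  # all wells positive: no finite MLE (above upper limit)
    total_volume = sum(n * v for n, v, _ in groups)

    def score(lam):
        # Strictly decreasing in lam; its root is the MLE.
        # -expm1(-x) == 1 - exp(-x), computed accurately for small x.
        return sum(p * v / -math.expm1(-lam * v) for _, v, p in groups if p > 0) - total_volume

    lo, hi = 1e-12, 1.0
    while score(hi) > 0:           # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > rel_tol * hi:  # bisection on the monotone score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 100.0 * 0.5 * (lo + hi)

if __name__ == "__main__":
    # Hypothetical two-size design: 10 wells of 9 mL and 10 wells of 1 mL (100 mL total).
    print(round(mpn_per_100ml([(10, 9.0, 6), (10, 1.0, 1)]), 1))
```

With a single well size the equation has the closed form λ = −ln(1 − p/n)/v, which is a convenient check on the bisection.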
Turbidity was measured following APHA method 2130 B using turbidity meters (TL2350 or 2100Q, Hach, Loveland, Colorado, USA). Free available chlorine (FAC) concentrations were measured following APHA method 4500-Cl G using FAC meters (DR300 or handheld Pocket Colorimeter II, Hach).
DNA extraction
For DNA extraction, 1 L samples of drinking water were filtered through 0.2 μm Supor 47 mm membrane filters (Pall Corporation, Michigan, USA). CD1 buffer (800 μL) from the PowerSoil Pro kit (Qiagen, Venlo, The Netherlands) was added to each filter, which was then vortexed and stored at −20 °C until DNA extraction. DNA was extracted using the PowerSoil Pro protocol on the QIAcube extraction robot (Qiagen, Clayton, Victoria, Australia). Modifications to the PowerSoil Pro protocol, including bead beating prior to extraction, are outlined in the Supplementary Material. Each filtering and DNA extraction procedure included a negative sample-processing control and a no-template negative extraction control. To minimise cross-contamination, separate laboratories were used for each step of the sample-processing workflow, and measures to reduce laboratory contamination were applied, including UV decontamination of biohazard units and laboratory benches.
Library preparation
Sequencing libraries were prepared using the SQK-16S024 16S rapid barcoding kit (Oxford Nanopore Technologies, Oxford, UK) according to the manufacturer's instructions. Water samples from the DWTP with prior TC detections were sequenced on a MinION, and all other samples on a GridION (Oxford Nanopore Technologies). All sequencing was performed on R9.4.1 flow cells for 72 h. The ZymoBIOMICS Microbial (Mock) Community DNA standard (Zymo Research, California, USA), diluted 1:100 in 10 mM Tris, was used for testing (0.1 and 0.2 ng post-amplification) and as a positive control (0.1 ng post-amplification).
Bioinformatic and statistical analysis
Basecalling and barcode and adapter trimming were performed with guppy-gpu v6.0.1 using the super-accurate basecalling model (dna_r9.4.1_450bps_sup). A total of 22,420,513 reads were produced, with an average median read length of 1,447 bp, an N50 of 1,448 bp, and a median Q score of 12.64. Quality control was performed using pycoQC v2.5.0.3 (Leger & Leonardi 2019), and passing reads (mean quality score >10) for each barcode were concatenated. Concatenated reads were then processed with Trimmomatic v0.36 (Bolger et al. 2014): reads shorter than 1,200 bp were removed and reads longer than 1,600 bp were cropped, retaining near-full-length query sequences while minimising misalignments. Trimmed reads were classified with Kraken 2 v2.1.2 against the SILVA 138.1 16S rRNA database, and reads identified as plastid or mitochondrial in origin were removed (Quast et al. 2013; Wood et al. 2019). The remaining reads were assigned taxonomy using Emu v3.4.4 with its default database, and the outputs were analysed using the phyloseq v1.44.0 and vegan v2.6-4 R packages (Dixon 2003; McMurdie & Holmes 2013; R Core Team 2018; Curry et al. 2022). Principal coordinate analysis (PCoA) was performed on Bray–Curtis distances calculated from data rarefied to the lowest sample size in each set, and permutational multivariate analysis of variance (PERMANOVA) using adonis2 was applied for significance testing.
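The read-level filters described above (mean quality >10, minimum length 1,200 bp, cropping at 1,600 bp) can be sketched in a few lines. This is not the guppy, pycoQC or Trimmomatic code path. It assumes Phred+33 FASTQ input and assumes that "mean quality" is computed by averaging per-base error probabilities and converting back to a Q score, rather than as the arithmetic mean of Phred values; which convention a given tool uses should be checked against its documentation. All names are hypothetical.

```python
import math

MIN_MEAN_Q = 10.0  # keep reads with mean quality > 10
MIN_LEN = 1200     # drop reads shorter than 1,200 bp
MAX_LEN = 1600     # crop reads longer than 1,600 bp

def parse_fastq(lines):
    """Yield (read_id, sequence, phred_scores) from 4-line FASTQ records (Phred+33 assumed)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header[1:].split()[0], seq, [ord(c) - 33 for c in qual]

def mean_qscore(phred):
    """Mean Q via averaged error probabilities (assumed convention, see text)."""
    mean_err = sum(10 ** (-q / 10) for q in phred) / len(phred)
    return -10 * math.log10(mean_err)

def filter_read(seq, phred):
    """Return (seq, phred) cropped to MAX_LEN, or None if the read fails QC."""
    if not seq or mean_qscore(phred) <= MIN_MEAN_Q:
        return None
    if len(seq) < MIN_LEN:
        return None
    return seq[:MAX_LEN], phred[:MAX_LEN]

if __name__ == "__main__":
    fastq = ["@r1", "A" * 1700, "+", "5" * 1700]  # '5' encodes Q20 in Phred+33
    for rid, seq, q in parse_fastq(fastq):
        kept = filter_read(seq, q)
        print(rid, None if kept is None else len(kept[0]))  # r1 1600
```

Averaging error probabilities penalises low-quality stretches more than averaging Phred values does; for example, two bases at Q10 and Q30 give a mean Q of about 13 rather than 20.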
For species-level validation, amplicon_sorter v2023-06-19 was used with a 0.99 species threshold to build consensus sequences for the taxa of interest, which were then aligned against the 16S rRNA RefSeq database (accessed 19 September 2023) using BLASTN v2.13.0 (NCBI 1988; Vierstraete & Braeckman 2022).
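The species-level check described above ultimately comes down to reading the BLASTN alignments of each consensus sequence. A minimal sketch, assuming BLASTN was run with the default tabular output (-outfmt 6, whose 12 standard columns are listed in the code), keeps the best hit per consensus by bit score and applies identity and query-coverage thresholds. The 99% identity and 95% coverage cut-offs and all function names are hypothetical choices for illustration; the 0.99 amplicon_sorter threshold above is a clustering similarity, not a BLAST identity cut-off.

```python
import csv
from io import StringIO

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(outfmt6_text):
    """Keep the highest-bit-score hit for each query (consensus) sequence."""
    best = {}
    for row in csv.reader(StringIO(outfmt6_text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(OUTFMT6, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        for key in ("length", "qstart", "qend"):
            hit[key] = int(hit[key])
        q = hit["qseqid"]
        if q not in best or hit["bitscore"] > best[q]["bitscore"]:
            best[q] = hit
    return best

def classify(hit, query_len, min_pident=99.0, min_qcov=0.95):
    """Hypothetical rule: call species level only if identity and query coverage both pass."""
    qcov = (abs(hit["qend"] - hit["qstart"]) + 1) / query_len
    if hit["pident"] >= min_pident and qcov >= min_qcov:
        return "species-level match"
    return "genus or higher; possible uncharacterised taxon"

if __name__ == "__main__":
    # Hypothetical BLASTN rows for two consensus sequences of ~1,450 bp.
    demo = ("cons1\tsubjA\t100.000\t1450\t0\t0\t1\t1450\t1\t1450\t0.0\t2678\n"
            "cons1\tsubjB\t98.500\t1450\t22\t0\t1\t1450\t1\t1450\t0.0\t2500\n"
            "cons2\tsubjC\t96.200\t1440\t50\t4\t5\t1444\t3\t1440\t0.0\t2200\n")
    for query, hit in best_hits(demo).items():
        print(query, hit["sseqid"], classify(hit, 1450))
```

A low-identity best hit, as reported below for some Legionella consensus sequences, falls into the second category rather than being forced to a named species.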
RESULTS AND DISCUSSION
Evaluation of ONT sequencing consistency
The relative DNA input between multiplexed samples can affect the efficiency and throughput of direct sequencing approaches such as ONT more than sequencing by synthesis (e.g., Illumina sequencing), because molecules must compete for active pores (Kerkhof 2021). Since drinking water supplies often have low DNA levels, using the recommended amounts of mock community DNA during library preparation can yield more sequencing data from the positive control than from the samples of interest. In this study, diluted mock community DNA inputs (0.1 and 0.2 ng) were trialled; the mock community proved a useful positive control, with consistent relative abundances of species across sequencing runs and dilution inputs. This finding agrees with Acharya et al. (2019), who compared Illumina and ONT sequencing of the same bacterial mock community.
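Because barcoded libraries compete for the same pores, the amount of each library added to the pool strongly influences its share of the reads. The sketch below shows the arithmetic for pooling by mass, capping the volume when a low-biomass sample is too dilute to reach its target, and converting nanograms to femtomoles with an assumed average of 660 g/mol per base pair of double-stranded DNA. The function names, volume cap and target masses are hypothetical; the ONT 16S kit protocol should be consulted for actual pooling recommendations.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximation)

def ng_to_fmol(ng, length_bp):
    """Convert a dsDNA mass (ng) to femtomoles for a fragment of length_bp."""
    return ng * 1e6 / (length_bp * AVG_BP_MASS)

def pooling_plan(conc_ng_per_ul, target_ng, max_ul):
    """Volume of each barcoded library to pool so that it contributes target_ng[barcode].

    Returns {barcode: (volume_ul or None, note)}. A library too dilute to reach its
    target within max_ul is capped and flagged as likely under-represented.
    """
    plan = {}
    for barcode, conc in conc_ng_per_ul.items():
        if conc <= 0:
            plan[barcode] = (None, "no measurable DNA")
            continue
        volume = target_ng[barcode] / conc
        if volume > max_ul:
            plan[barcode] = (max_ul, "capped: under-represented")
        else:
            plan[barcode] = (volume, "ok")
    return plan

if __name__ == "__main__":
    # Hypothetical barcodes, concentrations (ng/uL) and targets (ng); the mock
    # control gets a much smaller target so it does not dominate the run.
    conc = {"BC01": 4.0, "BC02": 0.8, "BC03": 0.05, "mock": 0.1}
    target = {"BC01": 10.0, "BC02": 10.0, "BC03": 10.0, "mock": 0.1}
    for bc, (vol, note) in pooling_plan(conc, target, max_ul=15.0).items():
        print(bc, vol, note)
    print(round(ng_to_fmol(1.0, 1500), 2))  # 1.01
```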
Sequencing run variability
To investigate the spatial and temporal variability of bacterial communities, multiple samples were collected from the headworks and a downstream reticulation site within a single water supply. PCoA of grab samples collected from the headworks (n = 12) and the reticulation site (n = 18) shows differences between the communities at the two sites and greater temporal variability at the headworks (indicated by a larger 95% confidence ellipse) than at the reticulation site (Supplementary Figure S1). The greater variation at the headworks may have been due to the mixing of source waters from five groundwater bores prior to reticulation.
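The site comparison above rests on the workflow given in the Methods: rarefaction to the smallest library, Bray–Curtis distances, PCoA, and a PERMANOVA test of group differences. The study used phyloseq and vegan (adonis2); the NumPy sketch below reimplements the same techniques from their textbook definitions (classical multidimensional scaling for PCoA, and Anderson's one-factor pseudo-F with a label-permutation p-value for PERMANOVA) to show what those functions compute. It is not a substitute for the R packages, the function names are hypothetical, and the demo data are simulated.

```python
import numpy as np

def rarefy(counts, depth=None, seed=0):
    """Subsample each row (sample) without replacement to a common read depth."""
    counts = np.asarray(counts, dtype=np.int64)
    depth = int(counts.sum(axis=1).min()) if depth is None else depth
    rng = np.random.default_rng(seed)
    return np.vstack([rng.multivariate_hypergeometric(row, depth) for row in counts])

def bray_curtis(x):
    """Pairwise Bray-Curtis dissimilarity: sum|xi - xj| / sum(xi + xj)."""
    x = np.asarray(x, dtype=float)
    diff = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    total = (x[:, None, :] + x[None, :, :]).sum(axis=2)
    return diff / total

def pcoa(d, k=2):
    """Classical PCoA: double-centre -D^2/2 and keep the top-k eigenvectors."""
    n = d.shape[0]
    j = np.eye(n) - 1.0 / n  # centering matrix I - (1/n) * ones
    b = -0.5 * j @ (d ** 2) @ j
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

def permanova(d, groups, permutations=999, seed=0):
    """One-factor PERMANOVA pseudo-F with a label-permutation p-value."""
    d2 = np.asarray(d, dtype=float) ** 2
    groups = np.asarray(groups)
    n, labels = len(groups), np.unique(groups)
    a = len(labels)
    ss_total = d2[np.triu_indices(n, 1)].sum() / n

    def pseudo_f(g):
        ss_within = 0.0
        for lab in labels:
            idx = np.flatnonzero(g == lab)
            sub = d2[np.ix_(idx, idx)]
            ss_within += sub[np.triu_indices(len(idx), 1)].sum() / len(idx)
        return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(groups)
    hits = sum(pseudo_f(rng.permutation(groups)) >= f_obs for _ in range(permutations))
    return f_obs, (hits + 1) / (permutations + 1)

if __name__ == "__main__":
    # Simulated counts for five taxa at two sites with different dominant taxa.
    rng = np.random.default_rng(1)
    site_a, site_b = [0.5, 0.3, 0.1, 0.05, 0.05], [0.05, 0.05, 0.1, 0.3, 0.5]
    counts = np.array([rng.multinomial(rng.integers(900, 1200), p)
                       for p in [site_a] * 5 + [site_b] * 5])
    sites = np.array(["headworks"] * 5 + ["reticulation"] * 5)
    d = bray_curtis(rarefy(counts))
    print(pcoa(d).round(3))
    print(permanova(d, sites))
```

Rarefying first keeps Bray–Curtis values comparable when libraries differ in depth, which is common on a shared flow cell.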
Overall, bacterial communities at both the headworks and the reticulation site were temporally stable, suggesting minimal changes or impacts from external factors over the sampling period (Figures 3 and 4, Supplementary Figure S1). Previous studies of drinking water distribution systems have also reported stable bacterial communities; such a stable baseline could allow disturbances in community composition to serve as an indicator of contaminant ingress (Hwang et al. 2012; Potgieter et al. 2021; Thom et al. 2022).
The grab and composite sampling strategies employed in this study produced similar bacterial community profiles (Figures 3–5). Analysis of multiple replicates within and between sequencing runs showed that samples collected simultaneously had similar bacterial compositions, with low variability between samples (Figures 4 and 5). These results confirm that ONT full-length 16S rRNA amplicon sequencing is suitable for investigating bacterial communities in drinking water, producing comparable community profiles within and between sequencing runs.
Investigation of total coliforms
Routine sampling of reticulated water in a New Zealand municipal water supply detected TCs at several locations. It was hypothesised that the contamination arose from biofilm dislodged into the reticulation system when a rarely used pressure release valve, located upstream of the sampling locations, was opened. The pump station and reticulation water samples were analysed to determine the potential origins of the TCs whose detection had triggered subsequent chlorination events.
The samples with TCs were largely dominated by families of the phylum Pseudomonadota (formerly Proteobacteria; Oren & Garrity 2021): Oxalobacteraceae, Aeromonadaceae, Pseudomonadaceae, and Cellvibrionaceae (Luo et al. 2013; Potgieter et al. 2018; Remple et al. 2021; Vargha et al. 2023). Members of the Pseudomonadota are highly prevalent in drinking water, are often biofilm-associated, and occur in diverse environmental habitats such as water and soil, with some members adapted to oligotrophic (low-nutrient) environments (Baldani et al. 2014; Douterelo et al. 2014; Rosenberg et al. 2014; Talagrand-Reboul et al. 2017; Narenkumar et al. 2021; Vargha et al. 2023).
A total of 19,407 reads were identified as Listeria in a single sample (Retic E). Further analysis of the putative Listeria consensus sequences showed 100% identity to L. innocua, in agreement with the initial taxonomic classification. Listeria species, including L. innocua, have been identified in urban and rural aquatic environments, including source waters for drinking water supplies (Stea et al. 2015), and L. innocua presents a low pathogenicity risk (Perrin et al. 2003; Perni et al. 2006; Orsi & Wiedmann 2016). Legionella was identified in eight samples, with 33 to 2,707 reads per sample, and consensus sequences showed these to be non-pathogenic species. However, low-identity matches and alignments to uncharacterised taxa indicated that these were potentially novel species. The potential for false-positive pathogen assignments highlights a limitation of current microbial sequence databases: they focus on clinically relevant microbes, whereas difficult-to-culture environmental microorganisms are under-represented (Walk et al. 2009; Walk 2015). Adding sequences from the growing number of environmental studies to reference databases could reduce this false-positive risk; false negatives, by contrast, are less likely given the high representation of clinically relevant pathogens in taxonomic databases.
Post-chlorination samples from reticulation sites F–H had FAC concentrations of 0.68–0.82 mg/L (Figure 6). Samples with residual FAC ≥0.68 mg/L had markedly lower bacterial read counts than samples with FAC <0.10 mg/L (Retic A–E). The few microorganisms detected in samples with FAC ≥0.68 mg/L belonged primarily to the Comamonadaceae, Burkholderiaceae, and Bacillaceae families (Peters et al. 2018; Bai et al. 2023). The Bacillaceae species detected were all non-faecal and associated with soil and water environments. Comamonadaceae was the dominant family post-chlorination, although it was represented by only 79 reads across samples, comprising the genera Limnohabitans, Malikia, Rhodoferax, and Variovorax; it was followed by the Burkholderiaceae genus Polynucleobacter, with a total of 21 reads. These low read counts may reflect residual DNA from non-viable bacteria (Peters et al. 2018).
Reticulation samples 1 and 2 had bacterial community profiles similar to those of the two pump stations, with relatively low read counts, even though their FAC was <0.10 mg/L (Figure 6). These two sites were on separate lines, independent of the other reticulated sites.
This snapshot of a water network, sampled during the implementation of chlorination, suggests that the TCs detected did not originate from compromised source water but were associated with environmental and biofilm-related taxa that had become established within the reticulation network. These organisms were likely released when the rarely used pressure valve was opened. Where residual FAC was present, the bacterial community was represented by low read counts (<260). Using this approach, the bacteria involved in the contamination event were identified as environmental taxa that presented a low risk to human health and could be adequately removed by chlorination.
Studies have shown that E. coli enumerated by cultivation (viable counts) is difficult to identify in, or correlate with, short-read 16S rRNA amplicon sequencing data (Hu et al. 2018; Acharya et al. 2019). Furthermore, because metagenomic approaches also detect non-viable bacteria and free DNA, a dual approach combining cultivation and molecular detection is required. Therefore, combining E. coli as the faecal indicator with ONT rapid sequencing of full-length 16S rRNA amplicons to identify faecal sources offers the potential for rapid assessment of drinking water and its sources.
Evaluation of manganese biofilter treatment
The drinking water for this DWTP originated from six groundwater bores ranging in depth from 140 to 250 m. These bores naturally contained high manganese concentrations, and low TCs had been detected in prior years. In February 2022, manganese concentrations of 0.04–0.1 mg/L were reported in these bores, and levels above the MAV (0.4 mg/L) had been observed previously. To reduce manganese concentrations, water from the six bores was mixed and then processed in parallel through two biological filters that passed aerated water through a sand filter. After treatment, manganese concentrations typically decreased to <0.0005 mg/L, and the filters were backwashed twice a week.
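Using the limits given in the Introduction (MAV 0.4 mg/L; aesthetic guideline value 0.04 mg/L), the sketch below classifies the manganese values reported above and estimates removal across the filters. The function names are hypothetical, and the 0.5 mg/L value in the demo is illustrative only. Because the post-treatment value is reported as below 0.0005 mg/L, a removal percentage computed from it is a lower bound.

```python
MAV_MG_PER_L = 0.4         # maximum acceptable value (Water Services Regulations 2022)
AESTHETIC_MG_PER_L = 0.04  # aesthetic guideline value (Aesthetic Values for Drinking Water Notice 2022)

def classify_manganese(mg_per_l):
    """Compare a manganese concentration with the health and aesthetic limits."""
    if mg_per_l > MAV_MG_PER_L:
        return "exceeds MAV"
    if mg_per_l > AESTHETIC_MG_PER_L:
        return "exceeds aesthetic guideline"
    return "within guideline values"

def percent_removal(inlet_mg_per_l, outlet_mg_per_l):
    """Percentage removed across treatment; a lower bound if the outlet is '<' a reporting limit."""
    return 100.0 * (inlet_mg_per_l - outlet_mg_per_l) / inlet_mg_per_l

if __name__ == "__main__":
    for mn in (0.04, 0.1, 0.5, 0.0005):
        print(mn, classify_manganese(mn))
    print(round(percent_removal(0.1, 0.0005), 2))  # 99.5
```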
The filters sampled in February 2023 had begun to accumulate a higher abundance of Methylococcaceae, while the proportions of Zoogloeaceae and Comamonadaceae increased markedly in April (Figure 8). Similarly, the April bore samples were dominated by Comamonadaceae and Methylococcaceae, with Bore 1 dominated by Zoogloeaceae and Bore 6 by Polyangiaceae. Although the source of the community disturbance could not be identified, the perturbations observed in the February sampling (decreasing diversity and increasing Methylococcaceae abundance) could serve as indicators of a significant change that may signal the need for filter maintenance.
In April 2023, Zoogloeaceae from Bore 1 may have been the source of the increase in Zoogloeaceae in the Filter 2 sample, as this family did not appear in any other bore sample. Consensus sequences confirmed this taxon to be Zoogloea resiniphila, a floc-forming bacterium that produces an extracellular polymeric matrix which acts as a glue, holding the bacteria together and increasing resistance to chlorination (Douterelo et al. 2013). It is an environmental taxon that is not recognised as a human pathogen and is associated with the removal of organic contaminants in water systems. The high dominance of a single species indicates an environmental disruption affecting previously diverse communities. Because of its chlorine resistance, remediation methods other than chlorination would be required.
Aside from the major perturbation seen in April 2023, findings from this manganese filter study point to the maintenance of stable but complex bacterial communities in individual bores and within the reticulation supply after manganese filtration. The repeated observations in this study of stable bacterial communities associated with drinking water supplies agree with previous studies (Hwang et al. 2012; Pinto et al. 2012; Potgieter et al. 2021). If a water distribution system has a recognised stable biological signature, it would be worthwhile to monitor it through disturbance events such as heavy rainfall or drought, to determine whether shifts in community richness and diversity have long-term impacts and whether such shifts could act as sentinels alerting water managers to contamination issues within the supply.
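Shifts in richness and diversity, such as the decreasing diversity seen in the February filter samples, could be tracked with standard indices. A minimal sketch, assuming taxon counts have already been rarefied to a common depth: Shannon's H′ = −Σ pᵢ ln pᵢ, Pielou's evenness J′ = H′/ln S (S = observed richness), and a hypothetical alert that fires when H′ falls more than a chosen fraction below the mean of baseline samples. The 25% threshold and all names are placeholders, not a validated trigger.

```python
import math

def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pielou(counts):
    """Pielou's evenness J' = H' / ln(S), where S is observed richness."""
    richness = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(richness) if richness > 1 else 0.0

def diversity_alert(baseline, current, max_drop=0.25):
    """Hypothetical sentinel: flag if H' falls more than max_drop below the baseline mean."""
    base = sum(shannon(c) for c in baseline) / len(baseline)
    now = shannon(current)
    return now < (1.0 - max_drop) * base, round(base, 3), round(now, 3)

if __name__ == "__main__":
    # Simulated rarefied counts for four taxa; the last sample is dominated by one taxon.
    baseline = [[30, 25, 25, 20], [28, 27, 24, 21]]
    print(diversity_alert(baseline, [97, 1, 1, 1]))
    print(round(pielou([97, 1, 1, 1]), 3))
```

Evenness separates the two effects that H′ mixes together: a bloom of one taxon (as with Z. resiniphila) lowers J′ even if richness is unchanged.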
Limitations
Some limitations of this study should be noted. Guppy may over-trim barcodes and adapters, and 16S rRNA sequences, even at full length, may not always discriminate between species (Benítez-Páez et al. 2016; Lan et al. 2016). Therefore, where 16S rRNA sequencing cannot differentiate at the species or strain level (e.g., between commensal and pathogenic E. coli), or where confirmation of a pathogen detection is required, additional genetic methods should be employed, such as shotgun metagenomic sequencing and/or species-specific virulence gene qPCR (Lan et al. 2016; Ferreira et al. 2023).
Samples from the rest of the reticulation supply of the manganese filter DWTP were not collected in April 2023; therefore, unlike during the September samplings, it could not be ascertained whether the changes noted in the post-filter samples extended to the rest of the water supply. Additionally, because sampling did not span a full year, seasonal variation cannot be ruled out as a cause of the bacterial community changes, and it cannot be determined whether the bore that clustered with the post-April samples was anomalous (Figure 9).
Furthermore, due to incomplete database records, particularly for environmental and endemic groundwater species unique to the local environment, the initial identification of potential pathogens requires additional investigation to exclude false positives. However, false negatives are less likely due to the high representation of medically relevant pathogens in reference databases.
CONCLUSIONS
This study aimed to evaluate the effectiveness of ONT amplicon sequencing for characterising bacterial communities in drinking water and identifying potential method biases.
ONT sequencing of full-length 16S rRNA amplicons produced robust long-read metagenomic profiles across replicates and over time, enabled species-level classification of bacteria, and yielded comparable complex bacterial community profiles across sequencing runs.
The mock microbial community DNA standard was a useful positive control on the ONT platform, with consistent relative abundances of species across sequencing runs and dilution inputs.
Full-length 16S rRNA amplicons provided sufficient information to differentiate between pathogenic and environmental species of the same bacterial genus. However, reference databases for species native to local environments need to be improved to increase classification accuracy and avoid false-positive pathogen detections.
Temporally stable bacterial communities were identified within source waters and DWTP distribution systems. In addition, metagenomics provided the ability to monitor complex bacterial community changes as a consequence of disturbance events and in post-treatment processes (e.g., manganese filters and chlorination).
When investigating the sources of total coliforms, metagenomics identified microbes native to water systems and provided species-level information to differentiate between pathogenic and environmental species of the same genus, which aided effective remediation actions.
These DWTP studies have demonstrated how the dual application of traditional monitoring practices (e.g., E. coli enumeration) and metagenomics can advance monitoring and surveillance to improve the management of microbial health risks affecting drinking water.
ACKNOWLEDGEMENTS
The authors acknowledge the Ministry of Business, Innovation and Employment for funding through the Strategic Science Investment Funding (SSIF) system and are grateful for the in-kind contributions of operators and sample collectors from the drinking water supplies at Christchurch City Council and Waimakariri District Council, including Hayley Profitt.
AUTHOR CONTRIBUTIONS
B.G., W.T., M.D., K.R., C.R., H.P., and J.W. designed the study. S.L., W.T., C.R., H.P., and J.W. were involved in technical and sequencing analysis. W.T. carried out the bioinformatic analysis and visualised the work. M.D., W.T., B.G., and K.R. wrote and reviewed the manuscript.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories. Sequencing reads can be found in the Sequence Read Archive (SRA) under BioProject PRJNA1052754 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1052754/).
CONFLICT OF INTEREST
The authors declare no conflict of interest.