Abstract
Wastewater-based epidemiology (WBE) is a valuable tool for monitoring the circulation of COVID-19. However, while variations in population size are recognised as major sources of uncertainty, wastewater SARS-CoV-2 measurements are not routinely population-normalised. This paper aims to determine whether dynamic population normalisation significantly alters SARS-CoV-2 dynamics observed through wastewater monitoring, and whether it is beneficial or necessary to provide an understanding of COVID-19 epidemiology. Data from 394 sites in England are used, and normalisation is implemented based on ammoniacal nitrogen and orthophosphate concentrations. Raw and normalised wastewater SARS-CoV-2 metrics are evaluated at the site and spatially aggregated levels and compared against indicators of prevalence based on the Coronavirus Infection Survey and Test and Trace polymerase chain reaction test results. Normalisation is shown, on average, to have a limited impact on overall temporal trends. However, significant variability in the degree to which it affects local-level trends is observed. This is not evident from previous WBE studies focused on single sites and, critically, demonstrates that while the impact of normalisation on SARS-CoV-2 trends is small on average, this may not always be the case. When averaged across many sites, normalisation strengthens the correlation between wastewater SARS-CoV-2 data and prevalence indicators; however, confidence in the improvement is low.
HIGHLIGHTS
394 sites were analysed to determine the impact of population normalisation on SARS-CoV-2 trends.
Comparison made with the Coronavirus Infection Survey and NHS Test and Trace polymerase chain reaction test results.
Significant variability between sites in the impact of normalisation was revealed.
On average, population normalisation strengthens the correlation between wastewater SARS-CoV-2 data and prevalence indicators.
INTRODUCTION
Wastewater-based epidemiology (WBE) has been widely recognised as a valuable tool for monitoring the circulation of COVID-19 (e.g. Ahmed et al. 2020; European Commission 2020; Wade et al. 2020; Prado et al. 2021; Westhaus et al. 2021) and has been implemented in at least 66 countries worldwide (University of California 2022). The concentration (gene copies per volume), or load (gene copies per individual), of RNA fragments from the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been shown to correlate well with clinical case data (Hoffmann et al. 2021; Huisman et al. 2021), and can complement clinical surveillance (Medema et al. 2020). Previous studies have also shown that SARS-CoV-2 can be detected in wastewater several days before cases are reported (Medema et al. 2020; Fongaro et al. 2021; La Rosa et al. 2021), and thus WBE could provide a useful early warning of the emergence or re-emergence of the disease and timely insights for public health interventions (although the extent of this benefit will be dependent on the coverage and response of local public health actions).
There remain, however, multiple uncertainties in the use of WBE to infer the prevalence of the disease – including, for example, due to variability in faecal shedding rates between infected individuals and throughout the period of infection and vaccination status, as well as factors associated with the sewer network characteristics, sampling, and analysis methodologies (Wade et al. 2022). Variable population size in the area from which wastewater is captured (e.g. due to commuters, students, and tourists) is one potentially major source of uncertainty in WBE – Thomas et al. (2017), for example, found population estimates to account for uncertainties of up to 55% in WBE for illicit drug monitoring – and multiple studies have highlighted the importance of accounting for fluctuations in the population (Daughton 2012; Béen et al. 2014; Chen et al. 2014; O'Brien et al. 2014; Hou et al. 2021; Wade et al. 2022).
Census data may be used to provide static population estimates; however, these are infrequently updated, account only for the permanent residential population (Lai et al. 2011), and do not address transient changes (Daughton 2012). The design capacity of sewage treatment works (STWs) may also be used to estimate population size, but this too is not usually reflective of the real-time load in the system (Hou et al. 2021). Instead, it is preferable to use a dynamic estimate of the actual population in the catchment, for example, based on metabolites in wastewater (Zuccato et al. 2008).
To date, population normalisation approaches using dynamic population estimates have been investigated for WBE applications such as illicit drug monitoring (Béen et al. 2014; Thomas et al. 2017) and monitoring of pharmaceutical use (Zhang et al. 2019). Some recent COVID-19 studies have also considered normalisation using faecal markers such as Pepper Mild Mottle Virus (PMMoV), F+ RNA coliophage groups II and III and β-2 Microglobulin (Wu et al. 2020; Acosta et al. 2021; Li et al. 2021b; Zhan et al. 2022), or ammoniacal nitrogen (Sakarovitch et al. 2022). However, previous studies have focused on only a small number of sites and, therefore, do not allow more general conclusions about the potential impact of population normalisation to be drawn. Furthermore, in the case of COVID-19 monitoring, wastewater SARS-CoV-2 RNA concentration (i.e. not a dynamic population-normalised load per capita value) is still commonly reported and considered as an indicator of prevalence (e.g. Cao & Francis 2021; Karthikeyan et al. 2021; Prado et al. 2021; Saththasivam et al. 2021).
This study, using data from measurements collected at 394 sampling locations in England, aims to investigate whether dynamic population normalisation (i) significantly alters the observed trends from measured SARS-CoV-2 RNA in wastewater and (ii) is beneficial or necessary to provide an understanding of prevalence from wastewater SARS-CoV-2 concentrations. This knowledge will provide greater confidence in the interpretation of wastewater SARS-CoV-2 data and provide insights into the degree by which changes in concentration correspond to those in disease prevalence in comparison to fluctuations in the underlying population. The inclusion of data from a greater number of sampling locations than in previous studies will improve confidence in the results. The knowledge provided will be of particular importance in periods with zero or low restrictions on movement and thus increased population mobility.
The availability of alternative indicators of SARS-CoV-2 prevalence in England (the Office for National Statistics (ONS) randomised Coronavirus Infection Survey (CIS) (Office for National Statistics 2021) and National Health Service (NHS) Test and Trace Pillars I and II test data (Department of Health & Social Care 2020)) for benchmarking of the wastewater-based insights makes this study particularly important, as similar epidemiological datasets have not been available for previous studies investigating population normalisation for WBE in the context of applications such as illicit drug-use monitoring. Furthermore, due to the widespread implementation of WBE for SARS-CoV-2 monitoring in England, this study has been able to analyse 42,267 samples collected from 394 different sewerage sites at different points in the network (i.e. STW, within sewer/in-network, and near-to-source) over an 11-month period, thus providing insights into variability between sites. This is of key interest since previous studies into population normalisation for WBE have analysed only one site (e.g. Béen et al. 2014; Lai et al. 2015; Thomas et al. 2017) or a relatively small number of sites and/or samples (e.g. Chen et al. 2014 – 14 sites with 10–13 samples; Zheng et al. 2019 – 10 sites with 1 month of 24 h composite samples; Hou et al. 2021 – 20 sites with 2 days of samples; Sakarovitch et al. 2022 – seven sampling points across three sites, with an average of 40 samples from each; Hutchinson et al. 2022 – two sites with 24 weeks of samples; Zhan et al. 2022 – four sampling points across two sites, with up to 11 months of weekly samples) and have, therefore, been limited in the evidence they can provide for more general and transferable conclusions on the impacts of dynamic population normalisation.
MATERIALS AND METHODS
Wastewater sampling and analysis


Figure 1. Wastewater sampling site locations. All sites shown have sufficient data for, and are analysed using, normalised metric 1 (low data approach); sites highlighted in yellow also have sufficient data availability for analysis using normalised metric 2 (high data approach), as discussed in the text. Please refer to the online version of this paper to see this figure in colour: http://dx.doi.org/10.2166/wh.2023.318.
Samples were collected 4–7 days per week, either as a grab sample (80%) or a composite sample collected via autosamplers (20%) and transported to laboratories at 4 °C. Grab samples were taken once a day during estimated peak flow (the time of which varied between sites and throughout the monitoring period) and composite samples were collected over a 24 h period from hourly sub-sampling (i.e. at set intervals).
Concentrations of the N1 gene from the SARS-CoV-2 virus in samples collected before 1 January 2021 were quantified using reverse transcriptase qPCR (RT-qPCR) with polyethylene glycol (PEG) precipitation. Full details of the RT-qPCR method implemented, including control measures, are given by Farkas et al. (2021). Briefly, large particulate matter was eliminated using centrifugation; pH adjustment was applied (when necessary); the supernatant was incubated in a PEG8000/NaCl solution (40% PEG8000, 8% NaCl); the viruses bound to PEG were pelleted using centrifugation; the pellet was resuspended; and the N1 gene from the SARS-CoV-2 virus was quantified using an RT-qPCR assay. From 1 January 2021 onwards, ammonium sulphate was used instead for precipitation (as described by Kevill et al. (2022)), followed by lysis with guanidine thiocyanate for the viral RNA extraction.
NH3-N and orthophosphate concentrations were determined using colorimetric assays.
Further operational and technical details are available in Wade et al. (2020), Hoffmann et al. (2021) and Jones et al. (2020).
Population normalisation
NH3-N and orthophosphate concentrations in the wastewater are used for population normalisation. Indicators such as crAssphage and PMMoV have also been used previously for WBE (Crank et al. 2020; Ahmed et al. 2021; Wilder et al. 2021), but can be difficult to reliably detect. NH3-N and orthophosphate are selected for this study as such traditional water quality parameters are routinely monitored (Lai et al. 2011), are well understood, have established and standard analysis methods, and have been widely used to estimate real-time population size (e.g. Béen et al. 2014; Rico et al. 2017; Xiao et al. 2019). They are also less time-consuming and expensive to monitor, and potentially subject to less uncertainty than other biomarkers that may be present at lower concentrations (Xiao et al. 2019), although there is the disadvantage that they may be subject to bias due to the contribution of industrial discharges (Béen et al. 2014). It is also noted that NH3-N is considered indicative of the total contributing upstream population as it is a metabolite of urea (Wilde et al. 2022) (i.e. associated with urine production). Whilst detection of SARS-CoV-2 in wastewater is primarily associated with faeces (Castro-Gutierrez et al. 2022) and urea-only events may not discharge large quantities of the virus, NH3-N is still a proxy for toilet usage (particularly when considering larger populations), and can be used in the same way as other human-specific biological or chemical markers. Orthophosphate derives from human faeces (Wilde et al. 2022), thus matching the dominant source of SARS-CoV-2 detected in wastewater.
Figure 2. Input data required for calculation of two normalised SARS-CoV-2 metrics: (1) SARS-CoV-2 gene copies per unit of biomarker (low data requirements) and (2) SARS-CoV-2 gene copies per capita per day (high data requirements).
Normalised metric 1: SARS-CoV-2 gene copies per unit of biomarker (low data approach)


This approach to normalisation is based on the assumption that the daily discharge per capita of each water quality parameter is constant for any given site (i.e. does not vary with time, but may differ between sites), and that industrial sources and wet weather runoff have minimal influence on temporal variation in total NH3-N and orthophosphate loads. Catchment-level data on non-anthropogenic NH3-N and orthophosphate discharges are not available to validate this assumption; however, it is in alignment with the common understanding that domestic wastewater typically contributes the majority of influent to a domestic treatment plant (Henze et al. 2001), and that industrial sites increasingly employ pre-treatment to minimise their impact on the downstream network.
Given the above assumption, Sd/Xb,d values will be directly proportional to the SARS-CoV-2 gene copies per capita per day values for any given site (irrespective of any variation in dilution due to wet weather runoff) and, thus, reveal the same trends. However, as the daily discharge per capita of each water quality parameter may vary between sites, the values provided by Equation (1) are not directly comparable between sites. Furthermore, the uncertainty resulting from any variability in the daily discharge per capita of each water quality parameter at a given site cannot be quantified.
This approach has wide applicability, as only wastewater SARS-CoV-2 and water quality parameter data for the period of interest are required; flow data (to calculate biomarker loads) and mean population estimates (to enable the calculation of site-specific daily discharge per capita of each water quality parameter) are not required. Sufficient data for normalisation using this approach is available for all 394 of the sites previously discussed.
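The low data approach can be illustrated with a minimal Python sketch. It assumes that Equation (1) takes the form of the ratio Sd/Xb,d referred to above, with Sd read as the SARS-CoV-2 concentration (gc/L) and Xb,d as the biomarker concentration (mg/L) in the same sample; the function name and the numerical values are hypothetical and are chosen only to show that dilution leaves the ratio unchanged.

```python
def gc_per_unit_biomarker(sars_gc_per_l: float, biomarker_mg_per_l: float) -> float:
    """Normalised metric 1 (assumed form): SARS-CoV-2 gene copies per mg of biomarker."""
    if biomarker_mg_per_l <= 0:
        raise ValueError("biomarker concentration must be positive")
    return sars_gc_per_l / biomarker_mg_per_l


# Hypothetical sample: 40,000 gc/L SARS-CoV-2 and 40 mg/L NH3-N.
undiluted = gc_per_unit_biomarker(40_000.0, 40.0)

# The same sample after 1:1 dilution by runoff that is assumed to carry
# neither SARS-CoV-2 nor NH3-N, so both concentrations halve.
diluted = gc_per_unit_biomarker(20_000.0, 20.0)

print(undiluted, diluted)
```

Because both concentrations are scaled by the same dilution factor, the ratio is identical in the two cases, which is why no flow data are needed for this metric.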
Normalised metric 2: SARS-CoV-2 gene copies per capita per day (high data approach)
The second normalisation approach incorporates the calculation of a site-specific daily biomarker discharge per capita to enable reporting of SARS-CoV-2 gene copies discharged per capita per day.



In the absence of catchment-level data on non-anthropogenic NH3-N and orthophosphate discharges, this method again assumes that industrial contributions are constant at any given site.
In addition to wastewater SARS-CoV-2, NH3-N, and orthophosphate concentrations for the period of interest, this approach requires flow rate data, biomarker concentrations for a corresponding time period and a mean population estimate, and is thus less widely usable.
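A minimal Python sketch of how these inputs might be combined is given below. The exact equations are not reproduced here, so the forms used are assumptions: the per capita daily biomarker load is taken as the mean of concentration multiplied by flow over a set of reference samples, divided by the mean population estimate, and the per capita SARS-CoV-2 load is taken as the gene copies per unit of biomarker multiplied by that per capita biomarker load. Function names and all numerical values are hypothetical.

```python
from statistics import mean


def per_capita_daily_load(conc_mg_per_l, flow_l_per_day, population):
    """Assumed form: mean(concentration x flow) / population, in mg/capita/day."""
    daily_loads = [c * q for c, q in zip(conc_mg_per_l, flow_l_per_day)]
    return mean(daily_loads) / population


def gc_per_capita_per_day(sars_gc_per_l, biomarker_mg_per_l, load_mg_per_capita_day):
    """Assumed form: (gc per mg biomarker) x (mg biomarker per capita per day)."""
    return sars_gc_per_l / biomarker_mg_per_l * load_mg_per_capita_day


# Hypothetical reference samples for one site serving 10,000 people.
nh3n_ref = [30.0, 40.0, 50.0]        # NH3-N concentration, mg/L
flow_ref = [2.0e6, 2.0e6, 2.0e6]     # daily flow, L/day
load = per_capita_daily_load(nh3n_ref, flow_ref, population=10_000)

# Hypothetical later sample: 50,000 gc/L SARS-CoV-2 with 25 mg/L NH3-N.
metric_2 = gc_per_capita_per_day(50_000.0, 25.0, load)
print(load, metric_2)
```

Under these assumptions, flow and population data are needed only to fix the site-specific per capita biomarker load; on any later sampling day the metric still depends only on the two measured concentrations.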
Under the EMHP programme, flow rates were only monitored at a subset of the sites (15 STWs, one network, and four near-to-source – in all instances, at 2-min intervals and resampled to provide daily values). The mean population served by sampling sites is estimated based on Office for National Statistics (ONS) mid-2019 population estimates (Office for National Statistics 2020), aggregated at the lower-layer super output area (LSOA) level. LSOAs are then assigned to the site catchment in which their population-weighted centroid falls, and the total LSOA population is adjusted proportionally, based on the fraction of the identified LSOA that is overlapped by the site catchment.
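The population assignment described above can be sketched as follows. This is an illustration only: it assumes the spatial operations (testing whether the population-weighted centroid lies in the catchment, and computing the overlapped fraction of each LSOA) have already been carried out, and the `Lsoa` record, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lsoa:
    code: str                     # hypothetical identifier
    population: int               # mid-year population estimate for the LSOA
    centroid_in_catchment: bool   # population-weighted centroid falls in catchment
    overlap_fraction: float       # fraction of the LSOA overlapped by catchment (0-1)


def catchment_population(lsoas):
    """Sum the proportionally adjusted populations of LSOAs assigned to the catchment."""
    return sum(
        area.population * area.overlap_fraction
        for area in lsoas
        if area.centroid_in_catchment
    )


example = [
    Lsoa("A", 1500, True, 1.0),    # fully inside the catchment
    Lsoa("B", 2000, True, 0.5),    # assigned, but only half overlapped
    Lsoa("C", 1800, False, 0.25),  # centroid outside, so not assigned
]
estimate = catchment_population(example)
print(estimate)
```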
Population estimates are available for only 14 of the sites with flow data (13 STWs and one network). Consequently, there is only sufficient data for the application of the second normalisation approach to 14 sites in this study. Without knowledge of how representative these sites are of the wastewater monitoring sites more widely, they cannot be used to draw any general conclusions. However, this metric is still calculated and used to provide further insight into the impacts of population normalisation. The locations of the 14 sites analysed are highlighted in Figure 1.
Clinical cases and prevalence indicators
Wastewater SARS-CoV-2 metrics are evaluated against three indicators of clinical cases and disease prevalence at the site level:
1. Positivity rates from the ONS CIS (Office for National Statistics 2021) (estimate of the percentage of the population testing positive). These are available for 48 STW sites until 8 February 2021 (prevalence was too low beyond this point to provide sub-regional estimates).
2. The test positivity rate, based on Pillar I and II cases reported by NHS Test and Trace at the LSOA level, projected onto wastewater sampling catchments and aggregated by specimen date. These are available for 171 STW and 197 in-network sites. Pillar I data captures swab testing in Public Health England laboratories, NHS hospitals, and by health and care workers; Pillar II captures swab testing for the wider population (Department of Health & Social Care 2020).
3. The total number of Pillar I and II cases, calculated as above. This is converted to cases per 100,000 using mean population estimates (calculated using ONS data, as detailed above) when required to compare sites on a like-for-like basis.
It is noted that none of the reference metrics is expected to provide a perfect measure of prevalence in the upstream population (for instance, there may be delays between an individual becoming infected and presenting to NHS Test and Trace for testing); there remains uncertainty and, thus, complete agreement with wastewater data is not a realistic expectation.
Data analysis
Preparation
In all correlation coefficient calculations (described in the following section), log-10-transformed values for the SARS-CoV-2 concentrations, normalised SARS-CoV-2 metrics, and clinical case and prevalence indicators were used.
Where SARS-CoV-2 concentrations are below the limit of detection (LOD) or limit of quantification (LOQ), values equal to half the corresponding limit (which varies between laboratories) are used for the visualisation of trends in the following analysis, but these samples are omitted from the evaluation of the impacts of normalisation and from any calculation of correlation coefficients due to their uncertainty. Zero values for each clinical case and prevalence metric are omitted from analyses when the corresponding sample is not already omitted due to the SARS-CoV-2 concentration being below the LOQ (0% of positivity rates from the CIS and 6.38% of Pillars I and II positivity rate and case numbers), as these cannot be captured in a log model.
Samples with NH3-N or orthophosphate concentrations reported as <LOD, <LOQ or zero (0.92% of total) are omitted from analyses. Samples with an NH3-N concentration below 12 mg/L (lower bound of the typical range of NH3-N concentrations for wastewater (Henze et al. 2001)) are categorised as dilute.
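The preparation rules above can be summarised in a short Python sketch. The function name, the use of `None` to represent a result reported as below a limit, and the dictionary keys are all hypothetical conventions adopted for illustration; only the half-limit substitution, the omission rules and the 12 mg/L threshold come from the text.

```python
from typing import Optional

DILUTE_NH3N_MG_PER_L = 12.0  # lower bound of the typical NH3-N range, as stated above


def prepare_sample(sars_gc_per_l: Optional[float],
                   limit_gc_per_l: float,
                   nh3n_mg_per_l: Optional[float]) -> dict:
    """Classify one sample; None means the value was reported as <LOD or <LOQ."""
    below_limit = sars_gc_per_l is None
    biomarker_missing = nh3n_mg_per_l is None or nh3n_mg_per_l <= 0
    return {
        # Samples without a usable biomarker value are dropped from all analyses.
        "omitted": biomarker_missing,
        # Half the laboratory limit is used only for plotting trends.
        "plot_value": limit_gc_per_l / 2 if below_limit else sars_gc_per_l,
        # Below-limit samples never enter correlation calculations.
        "use_in_correlation": not below_limit and not biomarker_missing,
        "dilute": (not biomarker_missing) and nh3n_mg_per_l < DILUTE_NH3N_MG_PER_L,
    }


below = prepare_sample(None, 1000.0, 30.0)      # SARS-CoV-2 below the limit
dilute = prepare_sample(5000.0, 1000.0, 8.0)    # quantified, but dilute
missing = prepare_sample(5000.0, 1000.0, None)  # NH3-N not quantified
print(below, dilute, missing)
```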
Analysis
To evaluate the agreement between normalised SARS-CoV-2 metrics calculated using different water quality parameters (i.e. gc/mg NH3-N, gc/mg orthophosphate, and gc/d/capita estimated based on either NH3-N or orthophosphate), a Pearson's correlation coefficient is calculated for each site.
Similarly, to evaluate the extent to which normalisation alters the SARS-CoV-2 trends, the correlation between the SARS-CoV-2 concentration (gc/L) and each normalised metric at each individual site is quantified using Pearson's correlation coefficients. For each normalised metric, the correlation coefficients are then averaged across all sites to provide an aggregate measure of the impact of normalisation.
The impact of normalisation on the correlation between prevalence and wastewater SARS-CoV-2 metrics is evaluated by calculating two Pearson correlation coefficients for each site: one for the correlation between the SARS-CoV-2 concentration (gc/L) and a prevalence metric, and one for the correlation between the normalised SARS-CoV-2 value and the prevalence metric. For each site type, the mean and standard deviation of the difference between the correlation coefficients are used to provide aggregate measures of the impact of normalisation on the correlation between prevalence and the wastewater SARS-CoV-2 metric, and the significance of this difference is tested using an alpha level of 0.05. 95% confidence intervals are also calculated where results are presented for sites individually. This is repeated for each combination of the prevalence metric and the normalised SARS-CoV-2 metric.
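The per-site comparison described above can be sketched in Python as follows. This is a minimal illustration using only the standard library: the data layout (a list of dictionaries with `prevalence`, `concentration` and `normalised` series) and the example values are hypothetical, and the hypothesis test and confidence intervals are not reproduced.

```python
import math
from statistics import mean, stdev


def pearson_log10(x, y):
    """Pearson's r between log10(x) and log10(y); all values must be positive."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mx, my = mean(lx), mean(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var_x = sum((a - mx) ** 2 for a in lx)
    var_y = sum((b - my) ** 2 for b in ly)
    return cov / math.sqrt(var_x * var_y)


def change_in_correlation(sites):
    """Per-site r(normalised, prevalence) minus r(concentration, prevalence)."""
    return [
        pearson_log10(s["normalised"], s["prevalence"])
        - pearson_log10(s["concentration"], s["prevalence"])
        for s in sites
    ]


# Two hypothetical sites with three paired samples each.
sites = [
    {"prevalence": [1, 10, 100], "concentration": [10, 1000, 100], "normalised": [1, 10, 100]},
    {"prevalence": [1, 10, 100], "concentration": [1, 10, 100], "normalised": [10, 1000, 100]},
]
changes = change_in_correlation(sites)
mean_change = mean(changes)
sd_change = stdev(changes)
pct_increased = 100 * sum(c > 0 for c in changes) / len(changes)
print(changes, mean_change, sd_change, pct_increased)
```

The three summary values correspond to the aggregate measures named in the text: the mean and standard deviation of the change in correlation coefficient, and the share of sites at which normalisation increases the correlation.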
RESULTS AND DISCUSSION
Normalised metric 1: SARS-CoV-2 gene copies per unit of biomarker (low data approach)
SARS-CoV-2 gc per unit of water quality parameter
Normalisation using NH3-N and orthophosphate concentrations provides two separate estimates for SARS-CoV-2 gc per unit of the water quality parameter, both of which are expected to be directly proportional to SARS-CoV-2 gc per day per capita, and thus directly proportional to each other. The correlation between SARS-CoV-2 gc/mg NH3-N and gc/mg orthophosphate varies between the 394 sites analysed, with a mean correlation coefficient of r = 0.956, a minimum of r = 0.592, and a maximum of r = 0.999. On average, the correlation is weakest at near-to-source sites (mean r = 0.920, N = 14) and similarly strong at STW and network sites (mean r = 0.957, N = 175 and mean r = 0.958, N = 205, respectively), which suggests weaker confidence in the results of normalisation for near-to-source sites. This may be explained by the possibility that individuals spend variable amounts of time within near-to-source sites and, thus, there is increased uncertainty in the per capita contribution to NH3-N and orthophosphate in the wastewater collected. However, this near-to-source site mean is based on only a small number of sites and still indicates a very strong correlation on average.
The distribution of correlation coefficients at all sites is provided in Supplementary Material, Figure S1.
Impact on trends identified




Figure 3. Cumulative distribution of Pearson correlation coefficients across all sites for correlation between: (a) SARS-CoV-2 concentration (gc/L) and SARS-CoV-2 gc/mg NH3-N and (b) SARS-CoV-2 concentration (gc/L) and SARS-CoV-2 gc/mg orthophosphate. Samples with SARS-CoV-2 concentrations below LOQ are omitted from the calculation of correlation coefficients.
Figure 3 also reveals differences in the effects of normalisation between site types. Sites with the lowest correlation coefficients (i.e. where the impacts of population normalisation are greatest) are all either in-network or STW sites. This is somewhat counterintuitive since greater variability in population may be expected at near-to-source sites; however, it is also noted that only 14 near-to-source sites are included in the analysis, compared with 175 STW and 205 in-network sites, and therefore confidence in their representativeness of that site type more widely is lower. Figure 3 also shows that correlation coefficients are typically greater when dilute samples are omitted from the analysis, suggesting that the normalisation is addressing variable levels of dilution as well as population.

Figure 4. Example of the impact of normalisation using NH3-N or orthophosphate on SARS-CoV-2 trends identified in wastewater: (a) all results (concentrations and normalised values) as a time series, showing trends; (b) correlation between SARS-CoV-2 gc/L and SARS-CoV-2 gc/mg NH3-N; and (c) correlation between SARS-CoV-2 gc/L and SARS-CoV-2 gc/mg orthophosphate. Squares indicate dilute samples; crosses indicate <LOQ samples (including <LOD); grey shading indicates the period under full national lockdown.
Figure 4(a) shows that normalisation reduces the relative magnitude of several peaks with high SARS-CoV-2 concentrations; the sample collected on 1 April (‘A’ in Figure 4(a)), for example, has the ninth highest concentration recorded during the monitoring period, but this ranks only 70th when normalised with NH3-N, or 69th when normalised with orthophosphate. This is notable as, based on concentrations, this sample may suggest concerning levels of COVID-19 in the upstream population, whereas once normalised it appears less exceptional. Conversely, there are also occasions where normalisation increases the relative significance of SARS-CoV-2 detection (e.g. 17 January, ‘B’ in Figure 4(a)) and may, therefore, reveal high SARS-CoV-2 loads that would be obscured if relying on concentrations alone.
Figures 4(b) and 4(c), notably, show that omitting samples affected by dilution has a negligible impact on the lines of best fit and the strength of the correlations. This indicates that the alteration in the SARS-CoV-2 trends provided by normalisation cannot be attributed (solely) to the requirement for flow normalisation to account for dilution effects and that the impacts of population change are important. However, it is also noted that there are only six dilute samples, and thus their omission would not be expected to have a significant impact on the correlation coefficient.
Impact on correlation with indicators of prevalence

Table 1. Summary of the impact of using population-normalised wastewater SARS-CoV-2 metrics instead of SARS-CoV-2 concentration on correlation with prevalence indicators for each site
Reference indicator of prevalence | Site type | Number of sites with data | Mean (st. dev.) correlation between prevalence and SARS-CoV-2 gc/L | Proposed wastewater SARS-CoV-2 metric | Mean (st. dev.) correlation between prevalence and proposed wastewater metric | Percentage of sites with increased correlation | Mean (st. dev.) change in correlation coefficient
---|---|---|---|---|---|---|---
CIS positivity rate | STWs | 45 | 0.545 (0.200) | gc/mg NH3-N | 0.617 (0.191) | 97.8 | 0.072 (0.046)
CIS positivity rate | STWs | 45 | 0.545 (0.200) | gc/mg orthophosphate | 0.618 (0.195) | 95.6 | 0.073 (0.049)*
Pillars I and II positivity rate | Network | 196 | 0.186 (0.282) | gc/mg NH3-N | 0.222 (0.291) | 71.9 | 0.036 (0.088)
Pillars I and II positivity rate | Network | 196 | 0.186 (0.282) | gc/mg orthophosphate | 0.213 (0.283) | 65.3 | 0.027 (0.087)
Pillars I and II positivity rate | STWs | 171 | 0.199 (0.298) | gc/mg NH3-N | 0.204 (0.309) | 61.4 | 0.005 (0.085)
Pillars I and II positivity rate | STWs | 171 | 0.199 (0.298) | gc/mg orthophosphate | 0.218 (0.310) | 63.7 | 0.019 (0.080)
Pillars I and II positivity rate | Any | 367 | 0.192 (0.289) | gc/mg NH3-N | 0.213 (0.299) | 67.0 | 0.021 (0.088)
Pillars I and II positivity rate | Any | 367 | 0.192 (0.289) | gc/mg orthophosphate | 0.215 (0.295) | 64.6 | 0.023 (0.084)
Pillars I and II total cases | Network | 194 | 0.266 (0.257) | gc/mg NH3-N | 0.292 (0.260) | 68.0 | 0.027 (0.092)
Pillars I and II total cases | Network | 194 | 0.266 (0.257) | gc/mg orthophosphate | 0.274 (0.263) | 59.8 | 0.009 (0.095)
Pillars I and II total cases | STWs | 171 | 0.235 (0.298) | gc/mg NH3-N | 0.246 (0.307) | 68.4 | 0.010 (0.083)
Pillars I and II total cases | STWs | 171 | 0.235 (0.298) | gc/mg orthophosphate | 0.232 (0.315) | 55.0 | −0.003 (0.087)
Pillars I and II total cases | Any | 365 | 0.251 (0.277) | gc/mg NH3-N | 0.270 (0.284) | 68.2 | 0.019 (0.088)
Pillars I and II total cases | Any | 365 | 0.251 (0.277) | gc/mg orthophosphate | 0.254 (0.289) | 57.5 | 0.003 (0.092)
Pearson correlation coefficients are calculated using log10 prevalence and wastewater metrics.
*Change in correlation coefficient hypothesis test p-value is <0.05.
Figure 5. Comparison of ‘correlation between prevalence indicators and wastewater SARS-CoV-2 concentration’ (x axis) and ‘correlation between prevalence indicators and normalised wastewater SARS-CoV-2 metrics’ (y axis). Pearson correlation coefficients (r values) are calculated using log10 prevalence and wastewater metrics. Percentages indicate the proportion of sites above/below the y=x line. Symbol shapes and colours identify different site types as indicated in the legend.
Table 1 shows that, on average, across all sites, correlation with all three indicators of prevalence is improved by normalising the wastewater SARS-CoV-2 using either NH3-N or orthophosphate. However, as for population normalisation, increases in the mean correlation coefficients are small (maximum 0.073), of low confidence (95% confidence intervals overlap in all cases), and are statistically significant (based on a Mann–Whitney U-test with a significance level α = 0.05) only for correlation with the CIS positivity rate. Figure 5 also shows that the benefits are not universal for either STW or in-network sites, and there is no indication that the sites with the weakest correlations (i.e. where improvement is most needed) are improved more consistently or more significantly.
The wastewater SARS-CoV-2 concentrations are most strongly correlated with the CIS positivity estimate (mean r = 0.545, compared with r = 0.192 and 0.251 for Pillars I and II positivity and cases, respectively), and the improvement provided by normalisation is most consistent here too, with the correlation strengthened for at least 95% of sites (depending on the water quality parameter selected). However, the impacts of normalisation shown in Table 1 are not directly comparable between prevalence indicators, as results related to correlation with CIS data are based on a smaller sample of sites than those related to Pillars I and II data. Results based on only the 45 sites with ONS data (Supplementary Material, Table S5) show a stronger correlation with metrics based on Pillar I and II data, but are still weaker (on average) than the CIS positivity estimate. The normalisation continues to increase the mean correlation coefficients.
On average, the correlations shown here between wastewater SARS-CoV-2 metrics and prevalence indicators are lower than those calculated for the 14 sites to which population normalisation was applied. For all metrics except the CIS positivity estimate (which was not available for sites with population normalisation), this difference is statistically significant (based on α = 0.05), indicating that the 14 sites used for population normalisation are not representative of wastewater sampling sites more widely and that conclusions drawn from these cannot be assumed to be applicable to other sites. However, for both the full and reduced sets of sites, normalisation is shown, on average, to strengthen the correlation between prevalence metrics and wastewater metrics. In both cases, it is also found that the relative increase in the mean correlation coefficients provided by normalisation is less than that provided by using Pillar I and II cases instead of positivity, thus highlighting the importance of the choice of prevalence indicator.
Normalised metric 2: SARS-CoV-2 gene copies per capita per day (high data approach)
Site-specific water quality parameter loads









Estimated daily per capita water quality parameter (NH3-N and orthophosphate) loads, based on samples collected during periods of national lockdown. Only available for sites with flow data and an ONS population estimate. N indicates the number of samples upon which the estimate is based.
The standard deviation also varies considerably between sites. In the best case, it represents only 2.0% of the mean (at STW22), indicating stable nutrient contributions to the catchment and providing higher confidence in population estimates at this site. In the worst case, it represents 96.7% of the mean (at NH3), suggesting that population estimates at this site will be subject to a very high degree of uncertainty. A high standard deviation indicates that either the population was not constant during the lockdown periods or that the assumption of constant NH3-N and orthophosphate discharge per capita is not valid for this site (for example, due to variable contributions from industry and surface runoff).
Although SARS-CoV-2 concentrations may be normalised using only the water quality parameter concentrations (i.e. omitting xb in Equation (4) and calculating Sd/Xb,d only, not a per capita value), as detailed in the methodology for the normalised metric 1, these results confirm that the values obtained with such an approach will not be directly comparable between sites since the daily per capita water quality parameter loads (xb values) are highly site-specific.
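The distinction between the two normalised metrics can be sketched as follows. This assumes that Equation (4) has the form (Sd/Xb,d) multiplied by xb, which is inferred from the description above rather than reproduced from the methodology; the function names, units and numerical values are hypothetical.

```python
def normalised_concentration(s_d, x_bd):
    """Normalised metric 1: SARS-CoV-2 concentration (gc/L) divided by the
    water quality parameter concentration (mg/L), giving gc/mg."""
    if x_bd <= 0:
        raise ValueError("water quality parameter concentration must be positive")
    return s_d / x_bd

def per_capita_load(s_d, x_bd, x_b):
    """Normalised metric 2 (assumed form of Equation (4)): metric 1 scaled by
    the site-specific per capita parameter load x_b (mg/capita/day),
    giving gc/capita/day."""
    return normalised_concentration(s_d, x_bd) * x_b

# Two hypothetical sites with identical sample concentrations but different
# site-specific per capita loads (x_b).
s_d, x_bd = 1.0e4, 40.0            # gc/L and mg/L (illustrative values)
x_b_site_a, x_b_site_b = 8000.0, 4000.0  # mg/capita/day (illustrative values)

metric1 = normalised_concentration(s_d, x_bd)
load_a = per_capita_load(s_d, x_bd, x_b_site_a)
load_b = per_capita_load(s_d, x_bd, x_b_site_b)
print(metric1, load_a, load_b)
```

In this sketch, metric 1 is identical at both sites while the per capita loads differ by a factor of two, which is why values of metric 1 are not directly comparable between sites. Within a single site, xb is a constant multiplier, so the two metrics follow the same trend.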
SARS-CoV-2 gc per day per capita
Separate SARS-CoV-2 gc per day per capita estimates are generated using (i) NH3-N and (ii) orthophosphate. Agreement between the estimates is strong at all 14 sites, with a minimum Pearson's correlation coefficient (based on log10 values) of r = 0.954, and a mean of r = 0.991. Estimates for all sites are provided in Supplementary Material, Figure S2, and correlation coefficients in Supplementary Material, Table S2.
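A Pearson correlation coefficient based on log10 values, as used for the agreement statistics above, can be computed along the following lines. This is a minimal sketch; the function names and the example values are hypothetical, and all inputs are assumed to be strictly positive (samples at zero would need separate handling before the log transform).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def log10_pearson_r(a, b):
    """Pearson r calculated on log10-transformed values."""
    return pearson_r([math.log10(v) for v in a], [math.log10(v) for v in b])

# Hypothetical gc/d/capita estimates from two water quality parameters.
# The second series is a constant multiple of the first.
est_1 = [1.0e5, 1.0e6, 1.0e7]
est_2 = [2.0e5, 2.0e6, 2.0e7]
r = log10_pearson_r(est_1, est_2)
print(round(r, 6))
```

Because a constant multiplicative offset becomes an additive shift on the log scale, two estimates that differ only by a fixed factor are perfectly correlated under this measure.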
Impact on trends identified
Impact of population normalisation on SARS-CoV-2 trends identified in wastewater at an example site: (a) SARS-CoV-2 metrics as a time series, showing trends; (b) and (c) correlation between SARS-CoV-2 gc/L and each SARS-CoV-2 gc/d/capita estimate. Squares indicate dilute samples; crosses indicate <LOQ samples (including <LOD); vertical lines around each data point in (b) and (c) indicate standard deviation of estimate (note that this is imperceptibly small for many points); grey shading in (a) indicates period under full national lockdown.
Results for all 14 sites are summarised in the Supplementary Material, Table S2.
Impact on correlation with indicators of prevalence
Wastewater SARS-CoV-2 concentration exhibits a moderate correlation with Pillars I and II positivity and with Pillar I and II case rates when each of the 14 sites is considered independently (mean correlation coefficients r = 0.592 and 0.632, respectively, based on log10 values). Note that these correlation coefficients are based on time-synchronised data (i.e. there is no time window lag) and do not account for the possibility that SARS-CoV-2 is detected in wastewater before an individual develops symptoms and presents for clinical testing, and are thus likely to be an underestimate.
At 11 of the 14 sites, this correlation is strengthened when using population-normalised wastewater SARS-CoV-2 metrics instead (mean correlation coefficients increased to r = 0.610 and 0.655 if normalising using NH3-N; r = 0.617 and 0.659 if normalising using orthophosphate), suggesting that population normalisation is beneficial if using wastewater data to provide a better understanding of SARS-CoV-2 prevalence. This supports the observation of Feng et al. (2021) that per capita normalisation of SARS-CoV-2 concentrations slightly improves correlation with COVID-19 incidence.
However, although the improvement is relatively consistent across sites, the 95% confidence intervals for the r values are wider than the increase in r in all cases, indicating that confidence in the improvements is low. The relatively minor impact of population normalisation may be attributed to most of the data having been collected during periods of national lockdown and/or local restrictions, and thus suppressed population variability. Individual correlation coefficients for each site are provided in Supplementary Material, Tables S4 and S5.
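The comparison between the width of a confidence interval and the size of an increase in r can be illustrated with a minimal sketch. The method used to derive the confidence intervals is not stated here, so the Fisher z-transformation below is an assumption, and the sample count is hypothetical; only the two r values are taken from the text.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson r from n paired
    samples, using the Fisher z-transformation (an assumed method)."""
    if n <= 3:
        raise ValueError("need more than 3 samples")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

r_raw, r_norm = 0.592, 0.610   # mean r before and after NH3-N normalisation
n_samples = 60                 # hypothetical number of paired samples

lower, upper = fisher_ci(r_raw, n_samples)
increase = r_norm - r_raw
print(f"95% CI for r = {r_raw}: ({lower:.3f}, {upper:.3f}); increase = {increase:.3f}")
print("interval wider than increase:", (upper - lower) > increase)
```

Under these assumptions the interval is far wider than the increase of 0.018, so the normalised and raw correlations cannot be distinguished with confidence at this sample size.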

Correlation between prevalence indicators and wastewater SARS-CoV-2 metrics across all sites with enough data to estimate daily SARS-CoV-2 loads per capita. Different colours represent different sites.
In the majority of cases, the correlations with population-normalised wastewater metrics identified in Figure 8 are weaker than when sites are analysed individually (true for 100% of sites when evaluating correlations with case numbers, and for 83–92% of sites when evaluating correlations with positivity rate). This suggests that population normalisation alone is insufficient to enable direct inter-site comparison of wastewater SARS-CoV-2 levels and that other site-specific characteristics to which concentrations are sensitive (such as hydraulic retention time and sampling technique (Li et al. 2021b, Wade et al. 2022)) must be accounted for. Potential inaccuracies in the estimation of the mean population, as discussed above, may also contribute to the ineffectiveness of population normalisation for improving inter-site comparisons.
CONCLUSIONS AND IMPLICATIONS
This study has shown that wastewater SARS-CoV-2 data can be normalised using NH3-N or orthophosphate concentrations to account for population, and both provide strongly correlated results. If a site-specific daily per capita load of either NH3-N or orthophosphate is known or can be calculated, then per capita SARS-CoV-2 loads can be calculated. However, although the results illustrate that the relationship between water quality parameter load and population size is highly site-specific, they also show that normalisation based on this alone is insufficient to enable direct comparison of SARS-CoV-2 loads between sites. Alternatively, SARS-CoV-2 concentrations can be normalised without knowledge of these site-specific water quality parameter loads and, whilst this does not provide per capita values, it does provide the same impact on trends and correlation with prevalence indicators. This approach also has the benefit of lower data requirements, facilitating wider application, but does not enable quantification of uncertainty in the normalised SARS-CoV-2 value resulting from uncertainty in the daily water quality parameter load per capita.
The normalisation of wastewater SARS-CoV-2 data with NH3-N or orthophosphate is shown, on average, to have little impact on the overall trends. This suggests that the significance of fluctuations in the upstream population size is typically negligible in comparison with that of variability in the total SARS-CoV-2 loads, which matches previous observations regarding the impact of population change in WBE for illicit drug monitoring (Béen et al. 2014). However, this study also reveals significant variability between the impacts of population normalisation at different sites, which is not evident from previous WBE studies that focus on a single site. Critically, this research demonstrates that while the impact of normalisation on SARS-CoV-2 trends is small on average, it is not reasonable to conclude that it is always insignificant.
When averaged across a large number of sites, normalisation using either population estimates or water quality parameter concentrations strengthens the correlation between wastewater SARS-CoV-2 data and reference indicators of prevalence. However, confidence in this improvement is low and, as with the impact on trends, there is significant variability in the benefit (or otherwise) between sites. This may be attributed partly to the use of grab samples since SARS-CoV-2 calculations have been shown to be sensitive to single-sample variability (Curtis et al. 2020).
Finally, it is noted that most of the data used in this study were collected during periods of national lockdown and/or local restrictions, and thus the movement of people is expected to have been significantly lower than usual. Variations in population size, and thus the impacts and benefits of population normalisation, may increase when normal travel habits resume.
ACKNOWLEDGEMENTS
The sampling, testing and data analysis of wastewater in England is funded by the United Kingdom Government (Department of Health and Social Care). We acknowledge the work of the Environment Agency and the Centre for Environment, Fisheries and Aquaculture Science (Cefas) to process and analyse the wastewater samples reported in this study. We also thank the Water Utilities (Anglian Water, Northumbrian Water, Severn Trent Water, South West Water, Southern Water, Thames Water, United Utilities Water, Wessex Water and Yorkshire Water) who provided support on the wastewater sampling design and execution.
DISCLAIMER
The views expressed in this paper are those of the authors and do not necessarily reflect the views or policies of the Department of Health and Social Care.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.