Scotland introduced wastewater monitoring for COVID-19 early in the pandemic. From May 2020, samples have been taken and analysed using quantitative polymerase chain reaction (qPCR). The programme was expanded to over 100 sites accounting for around 80% of the population. Data are presented publicly via a dashboard and regular reports are produced for both the public and health professionals. Wastewater-based epidemiology (WBE) offers opportunities and challenges. It offers an objective means of measuring COVID-19 prevalence and can be more practical or timely than other methods of mass testing. However, it also has substantial variability impacted by multiple environmental factors. Methods for data collection and analysis have developed significantly through the pandemic, reflecting the evolving situation and policy direction. We discuss the Scottish experience of wastewater monitoring for COVID-19, with a focus on the analysis of data. This includes our approach to flow normalisation, our experience of variability in measurements and anomalous values, and the visualisation and presentation of data to stakeholders. Summarising the Scottish methodology as of March 2022, we also discuss how wastewater data were used for informing policy and public health actions. We draw lessons from our experience and consider future directions for WBE in Scotland.

  • The Scottish COVID-19 WBE programme covered around 80% of the population with regular COVID-19 wastewater testing for over a year.

  • WBE reports complemented case data to assist policymakers and other stakeholders.

  • Techniques and decisions (e.g. normalisation and visualisation) were driven by changing context and feedback.

  • The practical pros and cons of WBE were made apparent, suggesting future directions.

Graphical Abstract


In 2020, the COVID-19 pandemic rapidly spread across the world, with the first cases seen in Scotland on 1 March. By the end of April, cases exceeded 10,000. Monitoring of the pandemic and its effects was seen as essential for directing efforts to contain its spread.

While the changing rates of reported cases as well as randomised survey data offer some insight into the progression of the pandemic, one alternative approach was wastewater (WW) monitoring. Wastewater-based epidemiology (WBE) had previously been used to monitor diseases such as norovirus and rotavirus (Santiso-Bellón et al. 2020) and Aichi virus (Lodder et al. 2013) and holds potential advantages over traditional methods. During the pandemic, WBE has been used in many countries for monitoring COVID-19 levels (COVIDPoops19; Naughton et al. 2021), including more widely in the UK (Wade et al. 2022).

From May 2020, samples began to be collected and tested for the SARS-CoV-2 virus in Scotland (Fitzgerald et al. 2021), with the programme expanding greatly over time. By 2022, the programme covered around 80% of the population with 120 sites being sampled. The programme is ongoing as of the time of writing, with weekly reports being produced.

Unlike typical research projects, the dynamic nature of the pandemic led to an equally dynamic set of project parameters, with requirements changing rapidly over time. The programme involved multiple organisations: a water company, Scottish Water, to provide samples; the Scottish Environment Protection Agency (SEPA) to conduct qPCR and other chemical analyses; a group of statisticians, Biomathematics and Statistics Scotland (BioSS), to conduct data analysis and reporting; and the Scottish Government to coordinate, disseminate and provide policy direction. There was close collaboration with scientists in Scotland and the rest of the United Kingdom, and laboratories were identified to provide emergency capacity, with some duplicate samples sent to other laboratories to measure inter-lab variability. Results of the monitoring programme were distributed via dashboards and reports to stakeholders, including the Scottish Government, public health officials, researchers, and the public. Feedback from the Scottish Government drove many of the decisions and improvements in the work.

This paper discusses the Scottish COVID-19 WBE programme up to March 2022, the time of writing. This includes the evolving sampling regime but particularly focuses on the development of data analysis techniques in Scotland, and the production and use of data outputs. We focus on specific components which are particular to the Scottish programme. Despite being motivated by the unique circumstances of the COVID-19 pandemic, these methodologies have potential application in a range of environmental monitoring scenarios. A summary of the overall methodology is provided. We also discuss the pros and cons of WBE, the lessons learnt, and proposed future directions for WBE in Scotland.

Data produced from the programme are available online at BioRDM Team (2022).

Scotland adopted WW sampling for COVID-19 early in the pandemic, with an arrangement where sampling was performed by Scottish Water, and qPCR testing conducted by the SEPA. Initial trials attempting to detect SARS-CoV-2 virus fragments in WW began in April 2020 with the development of a national monitoring programme starting in late May 2020 (Fitzgerald et al. 2021).

The main sampling sites used in the Scottish programme consisted of sewage and WW treatment plants, where auto-samplers were installed. There are over 1,800 wastewater treatment works (WWTWs) in the Scottish Network, and it would be uneconomic to sample all of them. However, WWTWs are diverse in terms of the size of the population covered, ranging from fewer than a hundred to over half a million inhabitants. Selecting the larger sites allowed samples from an initial list of 28 sites to cover around half the sewered population. This was the setup for much of 2020 when this programme used resources released by the pandemic restrictions that interrupted other work at the SEPA and Scottish Water. Dedicated resources from the Scottish Government expanded the programme from the winter of 2020, rapidly exceeding 100 sampling sites, and growing to 120 sampling sites by 2022. These covered around 80% of the population. An overview of the WWTWs sampled and the resulting population coverage is shown in Figure 1.

Auto-samplers used across the network varied in model. A typical sampler would collect a fixed subsample of around 10 ml every 15 min, producing a time composite sample of influent WW over a 24-h period. Generally, samplers were refrigerated to best preserve the samples taken. A higher sampling frequency would reduce sampling variability by more evenly sampling from waste packets in the system but introduce practical issues due to the increased volume of samples produced and transported.
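As a quick check of the volumes involved, the arithmetic below works through the typical figures quoted above (around 10 ml every 15 min over 24 h). The function name and the 5-minute comparison interval are illustrative assumptions, not programme settings.

```python
def composite_volume_ml(subsample_ml, interval_min, duration_h=24):
    """Total volume of a time-composite sample built from fixed subsamples."""
    n_subsamples = (duration_h * 60) // interval_min
    return n_subsamples * subsample_ml

# Typical sampler described in the text: 10 ml every 15 min for 24 h.
typical = composite_volume_ml(10, 15)

# Hypothetical higher-frequency setting, for comparison only.
frequent = composite_volume_ml(10, 5)
```

Under these assumptions the typical composite is just under a litre, and tripling the sampling frequency triples the volume to be stored and transported.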

With the additional resources from the Scottish Government, in addition to increased sampling at treatment plants, NHS health boards were able to request sampling from within the sewer network, allowing the monitoring to generate data for smaller areas, such as a university campus, street, or even a few individual buildings. These ‘network samples’ were so-called grab samples, that is, samples taken directly from the WW stream at a specific position in the sewage network via manholes. The aim was to collect these samples at the same time (particular to each site) each day, though obtaining access in some cases complicated this. Around 44 such network sites have been sampled, with about 23 continuing to be sampled after the start of 2022. These samples were not used as part of regular reporting, but for specific one-off work.

All samples were sent to SEPA's laboratory for analysis. To measure the level of COVID-19 RNA in samples, the virus in the sample was concentrated before extraction. The N1 gene for SARS-CoV-2 was then detected and measured using RT-qPCR (Fitzgerald et al. 2021). By March 2022, over 10,000 samples had been analysed. The sampling frequency varied over this period across WWTW sites, ranging from approximately weekly to three or four times a week at the most frequently sampled sites. Sampling was generally targeted to be most frequent at those sites with the most populated catchment areas in order to balance the need for reliable smoothed estimates with the need for high population coverage, both nationally and within local authority areas.

To extract useful information from the WW data, it was necessary to develop a statistical analysis strategy including a comparison to other data sources. In late 2020, BioSS was commissioned to process and produce WW data output. A data processing pipeline was developed, including functional elements that were in some cases particular to the Scottish programme.

This pipeline involved pre-processing the data, adjustments and computation of aggregate summaries, and the semi-automated production of reports, with all processing done using the R programming language (R Core Team 2021). Here, we describe normalisation – an important part of data pre-processing, adjustments required to produce data on different geographic bases, decisions made in data visualisation, and other methodologies we developed.

Normalisation for dilution

Measurements of viral levels by PCR were represented as a concentration estimate, specifically the estimated number of gene copies per litre. Early in the programme, the decision was made to normalise these values by the flow levels measured at each WWTW, and by an estimate of the population covered by the WWTW.

Rationale behind flow normalisation

The decision to normalise was driven by a mechanistic understanding of the processes leading to WW collection. In Scotland, WW at each WWTW usually combines both waste collected from properties and rainwater. During periods of high precipitation, viral material in WW would be diluted, leading to unrepresentatively low measurements. In some cases, data on each day's intake of WW at the WWTW, measured by meters operated by Scottish Water, were available. In those cases, we multiplied the measured RNA concentrations by that day's flow to obtain a daily gene copy measure. In addition, because WWTWs serve a wide range of catchment sizes, we also made use of the population served by each WWTW, as estimated by Scottish Water internal methods. The flow-adjusted gene copy rate was thus divided by this population quantity, to facilitate easier cross-site comparisons. The result was a measurement of the gene copies per day per person.
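The normalisation described above amounts to a single multiplication and division. The following is a minimal Python sketch of that arithmetic (the programme's own pipeline was written in R); the function name, argument names, and numerical values are hypothetical and chosen only for illustration.

```python
def gene_copies_per_person_per_day(conc_gc_per_litre, flow_litres_per_day, population):
    """Convert an RNA concentration into a per-capita daily viral load."""
    if population <= 0:
        raise ValueError("population must be positive")
    daily_gene_copies = conc_gc_per_litre * flow_litres_per_day  # gene copies per day
    return daily_gene_copies / population  # gene copies per person per day

# Hypothetical dry-day sample: 20,000 gc/l, 50 million litres/day, 100,000 people.
load = gene_copies_per_person_per_day(20_000, 50_000_000, 100_000)

# Hypothetical wet day: the same viral load diluted into twice the flow.
wet_day_load = gene_copies_per_person_per_day(10_000, 100_000_000, 100_000)
```

In this example the wet-day concentration is half the dry-day value, yet both samples give the same normalised load, which is the behaviour the adjustment is intended to achieve.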

A major component in flow changes was seasonal effects, with observed flow levels sometimes twice as high during winter as opposed to summer. There could also be notable effects during extreme weather, especially during periods of intensive rainfall, when dilution effects could be sizable for some WW systems. In periods of relatively stable weather, the effect of normalisation was small compared to the variability seen in RNA concentrations.

Flow normalisation is a relatively straightforward way of encompassing environmental covariates in the assessment of COVID-19 levels. Compared to more dynamic models (Wade et al. 2022), flow normalisation is simple to incorporate into dashboards, allowing easy and timely access to adjusted data. It is also easy to understand and gives values in gene copies per person per day that allow simple comparison between locations.

Estimation of flow from ammonia concentrations

In practice, flow measurements were frequently unavailable (especially later in the programme) or were published with such a delay after sampling that they could not be incorporated into a timely analysis. Instead, we used ammonia concentrations, which were subject to the same dilution processes as the viral material, as a covariate for predicting the flow rate. Measurements of ammonia, obtained through colorimetry at Scottish Water laboratories, were more readily available because they are both cheaper and quicker to obtain than flow measurements.

We fitted a cross-site linear mixed model to predict log flow from log population and log ammonia, allowing different WWTW sites to differ in intercept and slope coefficients. Though theoretically, a direct dilution relationship would imply that flow should be inversely proportional to ammonia concentration, estimation of the coefficients through modelling allowed us to compensate for uncertainty in the concentration measurements and the flow–ammonia relationship.
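To illustrate the log-log relationship that underlies this model, the sketch below fits an ordinary least-squares line of log flow on log ammonia for a single site. This is a deliberately simplified stand-in, not the cross-site mixed model described above: it omits the log population term and the site-level random intercepts and slopes. All names and data values are hypothetical.

```python
import math

def fit_log_log(ammonia_mg_per_l, flow_l_per_day):
    """Least-squares fit of log(flow) = intercept + slope * log(ammonia)."""
    x = [math.log(a) for a in ammonia_mg_per_l]
    y = [math.log(f) for f in flow_l_per_day]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_flow(ammonia_mg_per_l, intercept, slope):
    """Back-transform the fitted line to predict flow from ammonia."""
    return math.exp(intercept + slope * math.log(ammonia_mg_per_l))

# Synthetic data obeying pure dilution: ammonia * flow is constant.
ammonia = [10.0, 20.0, 40.0]
flow = [4.0e7, 2.0e7, 1.0e7]
intercept, slope = fit_log_log(ammonia, flow)
predicted = predict_flow(25.0, intercept, slope)
```

With data that follow pure dilution exactly, the fitted slope is -1, matching the inverse proportionality mentioned above. Real measurements would give a slope that departs from -1, which is why the coefficients were estimated rather than fixed.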

Even the availability of ammonia data was often delayed by a number of days compared to the RNA concentration. In the meantime, a temporary estimate was required for normalisation. Initially, we opted to estimate flow by simply taking an average of flow at that site. This approach did not adequately account for the seasonal patterns in precipitation, resulting in excessively low estimates of flow during winter months. Subsequently, we adopted an approach where a generalised additive model (GAM, Wood 2017) was fitted across sites. This included an overall non-linear time trend in ammonia levels with individual site effects added. This model allowed ammonia, and thus flow, to be interpolated from the available data.

The limitation of ammonia-based adjustments is that they assume that the total amount of ammonia in the WW remains fairly constant over time and is accurately measured. Thus, uncertainty may arise due to errors and variability in the laboratory process for measuring ammonia, or due to changes in ammonia unrelated to dilution (such as agricultural or industrial sources). Examination of individual WWTW sites showed that the strength of the relationship between flow and ammonia varied. In some locations, the relationship was poor, or not linear as the model assumed. In some cases, this may be the result of ‘flow capping’, where extremely high levels of precipitation lead to some WW flow being redirected instead of passing through treatment works. In these cases, it would be more accurate to use ammonia-based metrics to estimate dilution. Nevertheless, we prioritised concerns about the uncertainty of the ammonia adjustment, and thus used measurements of flow where available.

In-network samples

The flow and ammonia adjustments described above focused on autosampler outputs from WWTWs covering wide areas. In our work, we also looked at grab samples taken within the sewage network. In these cases, normalisation was complicated by two factors:

  • Specific characteristics used for normalisation may be unavailable. In particular, we usually did not have a usable measure of flow, and in many cases, there could be no accurate population estimate because that population can change greatly from sample to sample.

  • Individual small subnetworks may have dilution properties that do not respect the assumptions of the normalisation we otherwise use. For example, some subnetworks were closed systems where there is no dilution from rainwater, while others may only be in use at certain times.

For network samples, we normalised the RNA concentration measurements by dividing them by the ammonia concentration. This produced a ratio that should be independent of the degree of use of WW services or dilution, assuming that the ammonia concentration experiences the same environment as the RNA concentration. The nature of in-network samples varied widely, but measurements from them tended to be noisier than WWTW samples, probably due to sampling effects, or the presence of inhibitory substances.
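A minimal sketch of this ratio, with hypothetical names and values, shows why it is insensitive to dilution: if a sample is diluted, both the RNA and the ammonia concentration fall by the same factor.

```python
def rna_to_ammonia_ratio(rna_gc_per_litre, ammonia_mg_per_litre):
    """Gene copies per mg of ammonia; unchanged if both are diluted equally."""
    if ammonia_mg_per_litre <= 0:
        raise ValueError("ammonia concentration must be positive")
    return rna_gc_per_litre / ammonia_mg_per_litre

# Hypothetical grab sample, and the same wastewater diluted two-fold.
undiluted = rna_to_ammonia_ratio(40_000, 40)
diluted = rna_to_ammonia_ratio(20_000, 20)
```

The two ratios agree, but only under the stated assumption that ammonia and RNA are subject to the same dilution.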

Geographic matching and aggregation

For comparison with the WW viral level data, it was desirable to produce appropriately matching data sets – in particular, case data. These comparisons give a direct sense of the relationship between WW and case rates over time. For example, a comparison of the variability of WW viral levels to case levels may be used to determine if particular sites had an excessively high day-to-day variation in WW viral levels. Changes in WW trends may also be observed and compared to changes in case levels, acting as a confirmation of case trends or identifying locations where one or the other may have issues. Case data were desirable as a comparator because they were measured daily and available at a high degree of spatial resolution. Case data were sourced from an open data site, published by Public Health Scotland (PHS 2022).

However, making these comparisons was complicated by the fact that cases and WW viral levels were measured and reported on different geographic bases. For WW, the basic unit was the catchment area of a given WW treatment works (or portion of the network, in the case of in-network samples), which covered between a few thousand and several hundred thousand inhabitants. Meanwhile, cases were associated with individual addresses, then for the sake of privacy, aggregated into larger groupings. In Scotland, these groupings formed a cascading hierarchy of levels, composed of data zones of around a thousand inhabitants, intermediate zones (more commonly called neighbourhoods) of around 5,000 inhabitants, local authority areas of 20,000 to 600,000 inhabitants, and health board areas of 20,000 to 1.2 million inhabitants. While there are strong restrictions on data use due to confidentiality for case data at the data zone level, censored and smoothed data were available at the intermediate zone level, and fuller data were present at larger aggregations. Owing to these confidentiality restrictions and to sample size issues, Office for National Statistics (ONS) survey data were only available on the level of large subregions, which each contain multiple health boards.

To match one to the other, we used a postcode and census-based methodology. By quantifying each WWTW catchment as a collection of postal districts with population data specified by the latest census, we were able to determine the degree (in terms of the number of inhabitants) by which each catchment intersected with each geographic grouping. This overlap population was used to weight averages of case rates across neighbourhoods to obtain corresponding case rates for each catchment area.
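The weighting step can be sketched as a population-weighted mean. In the Python example below, the zone identifiers, overlap populations, and case rates are all hypothetical; only the weighting logic follows the description above.

```python
def catchment_case_rate(overlap_population, zone_case_rate):
    """Overlap-population-weighted mean of zone case rates for one catchment.

    overlap_population: zone id -> inhabitants of that zone inside the catchment
    zone_case_rate:     zone id -> case rate (e.g. daily cases per 100,000)
    """
    total = sum(overlap_population.values())
    if total == 0:
        raise ValueError("catchment has no overlapping population")
    weighted_sum = sum(
        population * zone_case_rate[zone]
        for zone, population in overlap_population.items()
    )
    return weighted_sum / total

# Hypothetical catchment overlapping two intermediate zones; IZ_C lies outside it.
overlap = {"IZ_A": 3000, "IZ_B": 1000}
rates = {"IZ_A": 20.0, "IZ_B": 40.0, "IZ_C": 100.0}
rate = catchment_case_rate(overlap, rates)
```

Zones that do not intersect the catchment carry zero weight, so their case rates do not affect the result.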

We used data zone-level data to evaluate the loss of precision arising from using catchment case rates based on aggregating intermediate zones. For most WWTW sites, correlations between measures based on data zone and intermediate zone aggregations were good (>0.95 in most cases). For a small number of very small or isolated sites, intermediate zone level aggregation gave a poor approximation compared to the use of data zones.

As well as showing results for individual sites, there was policy interest in summaries for local authorities, health boards, and nationally. To produce plots for these different political areas, data were aggregated over sites. In this case, the overlap population computed above was used instead to weight site-level WW viral data. A rolling weekly average was used to smooth over sampling day effects.
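The aggregation and smoothing described above can be sketched as follows. The text does not specify how the rolling window was aligned or how days without samples were handled, so this example assumes a trailing 7-day window that skips missing days; all names and values are hypothetical.

```python
def weighted_site_mean(site_values, site_weights):
    """Population-weighted mean over the sites that have a value on a given day."""
    sampled = [site for site in site_values if site in site_weights]
    total_weight = sum(site_weights[site] for site in sampled)
    if total_weight == 0:
        return None  # no sampled site overlaps this area on this day
    return sum(site_values[s] * site_weights[s] for s in sampled) / total_weight

def trailing_mean(series, window=7):
    """Trailing mean over up to `window` days, ignoring days with no value (None)."""
    smoothed = []
    for i in range(len(series)):
        chunk = [v for v in series[max(0, i - window + 1): i + 1] if v is not None]
        smoothed.append(sum(chunk) / len(chunk) if chunk else None)
    return smoothed

# Hypothetical local authority served by two sites (weights are overlap populations).
area_value = weighted_site_mean({"site_A": 10.0, "site_B": 30.0},
                                {"site_A": 3000, "site_B": 1000})

# Hypothetical daily area series with one unsampled day.
smoothed = trailing_mean([1.0, None, 3.0])
```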

Production of visualisations

At the start of the pandemic, it was difficult to predict what the relationship between daily WW RNA gene copy levels and other monitoring metrics, such as positive test levels and daily hospitalisations/deaths, would be. Furthermore, changes in circumstances, such as the emergence of new variants, vaccination programmes, and changes in testing policy, were anticipated to affect the relationships between monitoring measurements. This meant that a more formal modelling framework may have been difficult to maintain effectively, as basic assumptions that hold at one time-point may not hold at another. While we produced more specific comparisons from time to time, visual representations of site-level data and aggregates for specific geographic areas formed the core component of the reporting of WW viral loads.

Comparative time series

The most common visualisation we used was to jointly plot normalised daily WW viral RNA levels and the daily new case levels against time on the same axes. An example is shown in Figure 2.
Figure 1

Left: A map of wastewater treatment works sampled so far in the Scottish COVID-19 wastewater monitoring project. The type of point denotes the population covered by the site. Right: A map of local authority areas in Scotland with shading denoting the proportion of the population covered by WW COVID-19 testing on an average week.

Figure 2

An example plot produced during the Scottish WW COVID-19 project. This graph shows Scottish population-weighted daily averages for WW viral levels alongside daily case levels from January 2021 to mid-March 2022. Wastewater viral levels are fairly well aligned to case rates, up until December 2021, apart from peaks in case/WW viral levels where WW viral levels tend to overshoot.


For these comparative time series plots, a number of considerations were made. First, we opted not to use logarithmic scaling for our graphs. This was because our outputs were to be directed potentially towards a less technically experienced audience who may not be skilled at interpreting such graphs (Romano et al. 2020). Furthermore, the presence of values below the limit of detection at some sites meant that an offset would be required for logarithmic transforms to make sense, and the choice of the offset might greatly impact interpretation. While a logarithmic axis may be justified by the greater variation seen with higher COVID-19 levels, this was offset by the needs of the stakeholders.
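The sensitivity to the choice of offset can be seen in a small worked example. The values and offsets below are hypothetical and chosen only to show the effect on the plotted distance between a below-detection value and a moderate value.

```python
import math

def log_gap(low, high, offset):
    """Distance between two values on a log10 axis after adding an offset."""
    return math.log10(high + offset) - math.log10(low + offset)

# A below-detection value recorded as 0 against a moderate value of 100.
gap_small_offset = log_gap(0.0, 100.0, offset=1.0)
gap_large_offset = log_gap(0.0, 100.0, offset=100.0)
```

With an offset of 1 the two points sit about two decades apart, while with an offset of 100 they sit about a third of a decade apart, so the same data can look very different depending on an arbitrary choice.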

Second, the juxtaposition of case and WW viral levels required a choice of the relative scaling of WW and new case rates. Early on, we experimented with the use of the least squares fit to find the best scaling factor to relate cases to WW. However, soon it emerged that having different scaling factors in each graph created confusion, as a reader would have to refer to each graph's axis to determine whether WW viral levels were relatively high or low. Refitting these scaling factors with each new set of data would also create inconsistent graphing between reports. Coincidentally, across most of our sites, WW viral levels as measured in millions of gene copies per person per day had an approximately one-to-one relationship with case rates, if the latter was given by the daily rate of new COVID-19 cases per 100,000 inhabitants. Therefore, we opted to adopt this constant scaling in our graphing. We found the relationship to be remarkably stable up until the appearance of the Omicron variant in December 2021.
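The early least-squares experiment can be sketched as below. The exact form of the fit is not given in the text, so this example assumes a single scaling factor fitted through the origin; the function name and data are hypothetical.

```python
def least_squares_scale(ww_levels, case_rates):
    """Scale k minimising sum((ww - k * cases)^2), i.e. a fit through the origin."""
    numerator = sum(w * c for w, c in zip(ww_levels, case_rates))
    denominator = sum(c * c for c in case_rates)
    if denominator == 0:
        raise ValueError("case rates are all zero")
    return numerator / denominator

# Hypothetical site: WW in millions of gene copies per person per day,
# cases as daily new cases per 100,000 inhabitants.
ww = [12.0, 18.0, 33.0]
cases = [10.0, 20.0, 30.0]
k = least_squares_scale(ww, cases)
```

A fitted value near 1 for most sites is what would justify replacing per-graph fitted factors with the constant one-to-one scaling adopted in the reports.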

Map outputs

We also used maps to display aspects of the data. Initially, we used bubble maps where the size of the points for each sampled location showed the level of COVID-19 found. These maps could be made using both WW and case levels. However, this visualisation became difficult to interpret when a larger number of sites were sampled, with too many points preventing specific locations from being identified. This was especially problematic in the densely populated central belt of Scotland. When COVID-19 levels were high, points also tended to overlap, and when levels were low, it could be difficult to identify differences.

Instead, we mapped population-weighted mean WW levels for each of Scotland's 32 local authorities using a colour scale. This was done initially only on the basis of 2-week aggregate values, though faster changes in COVID-19 levels led to us adopting weekly maps as well. Examples of 2-week maps are shown in Figure 3. Alongside maps of current WW COVID-19 levels, we also produced maps showing the changes in these levels relative to the previous 2-week period. We chose to represent absolute values for the changes, rather than relative values (such as percentage change) because relative values amplify the visual significance of changes at locations with very low levels. In practice these were the least reliable measurements, and unlikely to represent a consistent trend, and were not as important from a policy point of view as regions with higher COVID levels.
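A small worked example, using hypothetical values in millions of gene copies per person per day, shows how relative change can exaggerate movements at low levels.

```python
def describe_change(previous, current):
    """Return (absolute change, percentage change) between two periods."""
    absolute = current - previous
    relative_pct = 100.0 * absolute / previous if previous > 0 else None
    return absolute, relative_pct

# Hypothetical low-level area and high-level area over consecutive 2-week periods.
low_area = describe_change(0.5, 1.5)
high_area = describe_change(50.0, 60.0)
```

The low-level area shows a 200% rise from an absolute change of 1, whereas the high-level area shows only a 20% rise from an absolute change of 10. A map of percentage change would therefore highlight the less reliable and less policy-relevant of the two.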
Figure 3

An example map produced during the programme. The left map shows levels of wastewater COVID-19 at each local authority, aggregated within a 2-week period. The right compares the level to the prior 2-week period.


We used the Viridis and Turbo colour scales (Garnier et al. 2022) to reduce problems for those with colour blindness and ensure uniform perceptual distances. For both maps, we used a logarithmically scaled colour bar. Although the logarithmic scaling makes direct interpretation harder, this allowed the colour range on the map to cover a wide range of values without requiring constant re-calibration, thus ensuring consistency in plotting COVID-19 levels from report to report.
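The idea of a fixed, logarithmically scaled colour bar can be sketched as a mapping from a value to a position between 0 and 1 on the colour scale. The limits used here are hypothetical; the text does not state the limits used in the reports.

```python
import math

def log_colour_position(value, vmin, vmax):
    """Position in [0, 1] on a log10 colour scale with fixed limits.

    Values outside [vmin, vmax] are clamped to the ends of the scale.
    """
    clipped = min(max(value, vmin), vmax)
    span = math.log10(vmax) - math.log10(vmin)
    return (math.log10(clipped) - math.log10(vmin)) / span

# Hypothetical fixed limits covering three orders of magnitude.
VMIN, VMAX = 1.0, 1000.0
position_low = log_colour_position(10.0, VMIN, VMAX)
position_high = log_colour_position(100.0, VMIN, VMAX)
```

Because the limits are fixed, a given value maps to the same colour in every report, and each ten-fold change moves the same distance along the colour bar.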

Interactive dashboard

To satisfy stakeholder requirements for live access to data, an interactive dashboard was developed by SEPA. This was implemented using TIBCO Spotfire® in the summer of 2020 and made publicly available in October 2020. This was gradually updated over time to enhance the data available to the public and for public health professionals, providing interactive graphs of WW levels, details on the monitoring network, and at-a-glance summaries of WW data across large regions of Scotland.

The data feeds for the dashboard came from both the SEPA laboratories which provided the raw concentration values and Scottish Water which provided the data required to normalise the raw concentrations. The data feeds were merged upon opening of the dashboard with the normalised results and various analyses being calculated each day.

The dashboard complemented WW COVID-19 reports, produced by BioSS on a weekly, then twice-weekly basis. While the dashboard output was more timely, the written reporting allowed individual commentary informed by specific analyses, to assist readers in identifying what aspects are most important or statistically reliable.

Additional methodological changes

A key aspect of the COVID-19 WW monitoring project is its dynamic nature. It was necessary to balance the need for consistent reporting with the need to readjust goals over time. Requirements changed in tandem with the state of the pandemic, through times of high and low COVID-19 levels, and as policy priorities changed. This resulted in a number of additional procedures being added, including:

  • Smoothing of major sites: High levels of day-to-day variation in WW COVID-19 levels were seen at most sites. However, the WW sampling was generally not frequent enough to make site-level running-means useful. Thus, a smoothing model based on a GAM was used to smooth this variation and derive estimates of the underlying trend. A Tweedie distribution (Tweedie 1984) was used for the modelling because it is a flexible non-negative distribution that broadly captures the mean-variance characteristics we observed. The Tweedie GAM not only produces a smooth line to help interpretation but also gives a summary of the variability characteristics of each site. These in turn can be used to generate confidence intervals on the estimated trend and other probabilistic metrics to inform decision-making.

  • Outlier detection: Occasionally, WW-based estimates of viral levels gave single individual values that were extremely high relative to levels immediately before and after, and which were not reflected in the case-level data. These spike events could potentially have an outsized effect on other methodologies and could dominate visualisations. We used a GAM to classify points with a high propensity of being a spike, taking into account case data and WW data up to each observation. A threshold was used to remove some of these points according to a desired level of false positives and negatives on manually labelled data.
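The final thresholding step of the outlier procedure can be illustrated independently of the classifier. The sketch below assumes that each observation already has a spike score between 0 and 1 (from the GAM or any other model) and a manual label; the scores, labels, and candidate thresholds are hypothetical.

```python
def errors_at_threshold(scores, is_spike, threshold):
    """Count false positives and false negatives when flagging scores >= threshold."""
    false_positives = sum(
        1 for s, label in zip(scores, is_spike) if s >= threshold and not label
    )
    false_negatives = sum(
        1 for s, label in zip(scores, is_spike) if s < threshold and label
    )
    return false_positives, false_negatives

# Hypothetical spike scores with manual labels.
scores = [0.05, 0.2, 0.6, 0.9, 0.4]
labels = [False, False, True, True, False]

lenient = errors_at_threshold(scores, labels, 0.3)
balanced = errors_at_threshold(scores, labels, 0.5)
strict = errors_at_threshold(scores, labels, 0.7)
```

Lowering the threshold removes more genuine observations (false positives), while raising it leaves more spikes in the data (false negatives), so the threshold reflects the relative cost of the two errors.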

By March 2022, pandemic monitoring was influenced by additional concerns. Firstly, the appearance of the new Omicron variant has led to reductions in detected virus per infected individual across the world (Rasenberg 2022), probably due to reductions in the COVID-19 shedding rate. At the same time, case-based metrics have been impacted by the discontinuation of PCR testing for certain types of cases and the incorporation of Lateral Flow Device (LFD) testing. In response to these challenges, we have worked on incorporating other sources of data into the ongoing analyses, including the UK Office for National Statistics (ONS) Coronavirus Infection Survey (CIS; ONS 2022), as well as sequencing of WW samples for detection and quantification of COVID-19 variants. Sample analysis methodology will also pursue greater automation to better handle the volume of analyses required.

The methodology for COVID-19 WBE in Scotland may then be summarised in the following steps.

A network of auto-samplers operated at WWTW sites across Scotland, collecting composite WW samples. At regular times according to the sampling plan, these samples were collected and transported to the SEPA for laboratory qPCR analysis. Separately, samples were taken for ammonia analysis.

Once at the SEPA laboratories, samples were analysed for SARS-CoV-2 RNA according to ‘Method 6’ of Fitzgerald et al. (2021). This comprised spiking each sample with porcine reproductive and respiratory syndrome virus (PRRSv) as a check for chemical inhibition. Next, samples were clarified and then concentrated by centrifuge. Viral RNA was extracted using the QIAamp RNA extraction kit. After elution, one-step qPCR reactions were conducted using Luna® Universal One-Step RT-qPCR Kit, focusing on the N1 gene. From this, concentrations of viral RNA were computed.

Twice a week, the data from SEPA laboratories together with available sample flow or ammonia data from Scottish Water were taken by BioSS for processing and reporting. The first step involved adjusting sample RNA concentrations for dilution using flow or ammonia data, using coefficients previously derived. Data from other sources, such as case data, were also obtained at this stage, and geographically aggregated to produce comparable datasets to the WW data. Next, a preliminary report was produced, which allowed the data to be checked for errors, both via automatic means and by visual inspection. This report also included smoothing of WW outputs at major sites and mapping, which allowed the identification of significant patterns and changes in the data with respect to the estimation of COVID-19 prevalence. A report was then drafted, reviewed, and submitted together with a variety of data outputs as requested by stakeholders.

WW data have been integral in providing insight to better inform decisions made by the Scottish Government during the pandemic. Estimates based on WW data help the Scottish Government, local authorities and Scottish Health boards plan and put into place what is needed to keep the population of Scotland safe. Results from WW go to Ministers in a weekly report, providing additional information to help inform decision-making around the use of restrictions. Results also go to the general public as part of the weekly Research Findings output on COVID-19 (Scottish Government 2020–2022), and to scientists aiming to model COVID-19.

WW testing is used in Scotland to complement existing case testing, to confirm case trends, or provide an alternative means of surveillance as it is a COVID-19 indicator that is independent of healthcare-seeking behaviours and access to clinical testing. In some parts of Scotland, WW data have been used to inform decisions to deploy mobile testing. This has been most useful when prevalence levels have been low, such as in early summer 2021.

The Scottish Government's use of data from WW sampling allows for assessment of the progress of the pandemic in Scotland, including estimates of the R value and growth rate (Scottish Government 2020–2022). Since June 2021, Scotland has been submitting estimates based on WW data to be considered, discussed, and combined at the Epidemiology Modelling Review Group (EMRG) within the UK Health Security Agency (UKHSA).
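To make the notion of a growth rate concrete, the sketch below fits a straight line to the logarithm of a series of normalised WW loads and converts the slope into a doubling time. This is one common, simple approach and is offered only as an illustration; the source does not state how the estimates submitted to the EMRG were derived, and the function names are hypothetical.

```python
import math


def growth_rate(days, loads):
    """Least-squares slope of log(load) against day.

    Under exponential growth, load(t) = load(0) * exp(r * t), so the slope
    of the log-transformed series is the daily growth rate r.
    """
    if len(days) != len(loads) or len(days) < 2:
        raise ValueError("need at least two paired observations")
    logs = [math.log(v) for v in loads]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    return sxy / sxx


def doubling_time(r):
    """Days for the load to double at rate r (negative means halving)."""
    return math.inf if r == 0 else math.log(2) / r


# Synthetic series that doubles every 7 days (illustrative numbers only).
days = list(range(14))
loads = [3.0 * 2 ** (d / 7) for d in days]
r = growth_rate(days, loads)
print(f"growth rate: {r:.4f} per day, doubling time: {doubling_time(r):.1f} days")
```

With real WW data, the fit would normally be applied to a smoothed or multi-site series over a week or more, because individual measurements are too noisy to support a slope estimate.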

At the time of writing, the COVID-19 pandemic persists in Scotland and around the world, and WBE still plays a critical role. From our experience of using WBE for COVID-19, we see several advantages of WW monitoring:

  • WW data are efficient to collect for large areas or large populations, depending on the WW network. We were able to monitor the majority of Scotland's population on a regular basis, sufficient to capture some intra-week effects despite RNA analysis being done for only 100–300 samples per week. Information at this scale and resolution is very difficult to match through population surveys. For example, the ONS CIS requires testing several thousand participants in Scotland each week and can only give weekly estimates at the resolution of subregions (which are typically made up of several health boards).

  • In contrast to other indicators, WW appears relatively unaffected by behavioural changes made in response to the state of the epidemic and policy changes. We saw this clearly in early 2022 when policy changes in individual PCR testing protocols greatly affected case-level measurements. When LFD-based statistics were introduced, an artificial increase appeared in certain areas, with large differences between genders or different socioeconomic statuses. We do not expect there to be similar effects with WW data.

  • WW can produce accurate and timely data. While there are significant levels of noise in the observations, trends in WW measurements of COVID-19 levels closely followed case-level data up until December 2021, especially once smoothed, with only occasional exceptions during rapid rises in viral activity. Trends in WW viral levels do not consistently precede case data, although this depends on the practicalities of how rapidly each data set can be collated. Independent of reporting lag, WW trends do appear to precede those seen in survey and hospitalisation data.

These aspects of WBE are generally independent of the particulars of the COVID-19 pandemic. If, in any future epidemic, infected individuals shed pathogen DNA/RNA into WW, WBE-based monitoring should be similarly effective.

Our work has also highlighted a number of issues with WBE:

  • While the methodology described here tackles some of the uncertainty, WW-based measurements were noisier than other measures of disease levels. There was a substantial amount of day-to-day variability, and thus determining whether the virus is increasing or decreasing cannot be based on an individual measurement. Rather, it requires a week or more of data, preferably across several sites in a region. This sets a limit on the degree of immediacy that can be offered by WW data. In addition, some individual measurements gave very high readings that did not appear to reflect real processes. The reasons for such anomalies are still not entirely understood. The variability of WW-based measurements can be reduced by more frequent sampling, but this has cost implications.

  • Some regions were much easier to sample from than others. While larger WWTW catchments can cover several hundred thousand inhabitants, there were many smaller WWTW in Scotland that, in some cases, only cover a few hundred inhabitants. While a small number of sample sites can cover the majority of the Scottish population, monitoring of smaller, more remote locations is unlikely to be cost effective. In turn, large-catchment sample sites, which are more cost effective, do not offer a great deal of spatial resolution. In-network sampling can supplement these measurements and add resolution to the data but suffers from further increased uncertainties.

  • The relationship between WW viral levels and other metrics of disease epidemiology was affected by some external factors. While for the majority of the period so far, the case-level to WW-level ratio has remained constant, the hospitalisation to WW (and case) ratio fell rapidly in early 2021, probably due to increased vaccination. The case-to-WW ratio itself changed during the rise in prevalence of the Omicron strain. Leads and lags are also not necessarily consistent. At least part of these changes represents real changes in the nature of the pandemic. Cases, hospitalisation, and survey-based measures themselves present different trends over time, and hence no single measure represents a gold standard against which the others can be judged. While comparisons between different measures are useful for validation purposes, deviations can also provide useful information.
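The problem of isolated, implausibly high readings noted in the first point above can be illustrated with a simple robust screen. The Python sketch below flags a value when its logarithm lies far from the median of its neighbours, measured in units of the median absolute deviation (MAD). It is an assumption-laden illustration rather than the checking procedure used in the programme; the window length, threshold, and minimum scale are arbitrary choices.

```python
import math
import statistics


def flag_anomalies(loads, window=7, threshold=3.0, min_scale=0.1):
    """Flag readings that sit far from their neighbours on the log scale.

    For each reading, take the other readings in a centred window, compute
    their median and MAD, and flag the reading if it is more than
    `threshold` robust standard deviations from that median. `min_scale`
    stops a run of near-identical neighbours from making the test
    over-sensitive. All three defaults are illustrative assumptions.
    """
    logs = [math.log(v) for v in loads]
    half = window // 2
    flags = []
    for i, y in enumerate(logs):
        lo, hi = max(0, i - half), min(len(logs), i + half + 1)
        neighbours = logs[lo:i] + logs[i + 1:hi]
        med = statistics.median(neighbours)
        mad = statistics.median([abs(v - med) for v in neighbours])
        scale = max(1.4826 * mad, min_scale)  # 1.4826 scales MAD to a normal SD
        flags.append(abs(y - med) > threshold * scale)
    return flags


# Made-up series with one spike at position 4.
series = [10, 12, 11, 9, 500, 10, 11, 12, 10]
print(flag_anomalies(series))
```

A flagged value would still need manual review: a screen of this kind cannot distinguish a laboratory or sampling artefact from a genuine sharp local rise, which is why the trend should be judged over a week or more of data.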

More formal evaluations and reviews are in progress, and we expect more sophisticated research to be conducted once the immediate pressures of the pandemic are over.

From the work done in the programme, we can derive a number of broad lessons about the future use of WBE.

  • The general methodology used here, involving normalisation, aggregation, and visualisation, is an effective means of displaying and interpreting WBE data.

  • That said, further methodological developments may help, in particular focusing on the derivation and communication of uncertainty. This is a challenge due to the multiple sources of uncertainty present in WBE data.

  • In an epidemic context, WBE projects require flexibility to adapt to changing requirements. Effective analysis methods and sampling strategies in low virus-level contexts may differ from high virus-level contexts.

  • A framework of validation based on other data sources is beneficial to authenticate the quality of WW data and to generate confidence in its usage.

  • The emergence of variant strains in a pandemic could impact WW-based prevalence estimates. Analyses should be integrated with data on variant prevalence to allow adjustment for differing levels of shedding. Using WW to quantify the levels of variants in time and space is still experimental.

  • Sampling effort should consider the objectives required. Large numbers of samples at major population sites allow for more accurate assessments of national-level disease prevalence, while sampling at smaller sites may provide more useful data on local outbreaks. Some medium-sized sites are important to improve population coverage in certain locations, while spare sampling capacity may allow experimental development.

  • To inform a better and more rapid response to future epidemics, retrospective research based on the data gathered in Scotland and elsewhere, conducted without the pressure of immediate policy requirements, is critical. Research focussing on developing an improved understanding of WBE and improving methodology in this area is an important aspect of this. To this end, the data and protocols have been curated and shared (BioRDM Team 2022; Roberts et al. 2022; Scorza et al. 2022). We aim to make our code for modelling and visualisations available in the future.
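The aggregation step named in the first lesson above can be sketched as a population-weighted average of per-person loads across the sites in an area. This is a hypothetical illustration in Python, assuming that each site's load has already been normalised to gene copies per person per day; the site names and numbers are invented, and the authors' own code is not reproduced here.

```python
def aggregate_loads(site_loads, site_populations):
    """Population-weighted mean of per-person loads across sites.

    site_loads: mapping of site name -> normalised load (per person per day)
    site_populations: mapping of site name -> catchment population
    Sites without a load on the day in question are simply left out, so the
    result describes only the covered population.
    """
    covered = [s for s in site_loads if s in site_populations]
    total_pop = sum(site_populations[s] for s in covered)
    if total_pop <= 0:
        raise ValueError("no covered population to aggregate over")
    weighted = sum(site_loads[s] * site_populations[s] for s in covered)
    return weighted / total_pop


# Invented example: two sites in one hypothetical health board.
loads = {"Site A": 2.0, "Site B": 1.0}
populations = {"Site A": 300_000, "Site B": 100_000}
print(aggregate_loads(loads, populations))
```

Weighting by population means that the large sites dominate an area-level figure, which matches the observation that many samples at major sites support national-level assessment while small sites are more informative about local outbreaks.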

In the medium term, as the COVID-19 pandemic continues, WW-based monitoring is likely to continue to see use in many countries, potentially taking a greater role as other data sources face budget-based limitations. In addition, WW-based sequencing work to detect emerging variants of COVID-19 is expected to come to the forefront.

Retrospective analyses of WW data are also likely to be done. These would, as mentioned earlier, facilitate better evaluation of WBE outputs. They may also apply more sophisticated analysis methodologies coupled with other measures and understanding of behaviour to reveal, for example, the changing dynamics of the COVID-19 virus at different points of the pandemic.

Further into the future, post-pandemic, the sampling infrastructure established during the COVID-19 pandemic could be extended to monitoring indicators of other diseases in the population, including both viruses and bacteria. WBE opens the potential to begin or to expand the simultaneous monitoring of the spread of antimicrobial resistance, drug use including antidepressants, or other chemicals (OHBP 2021; Sims et al. 2021). Such non-epidemic monitoring of population health will require careful consideration of ethical and political concerns.

The Scottish Government (Rural & Environmental Science and Analytical Services division) funds the sampling, testing and data analysis of wastewater in Scotland. We are particularly grateful for advice and direction from Professor Andrew Millar, University of Edinburgh.

All relevant data are available from an online repository or repositories (https://covid-ww-scotland.github.io/PrevalenceData).

The authors declare there is no conflict.

BioRDM Team. 2022 COVID Wastewater Scotland – Open Outputs of Monitoring COVID in Wastewater in Scotland.

COVIDPoops19: Summary of Global SARS_CoV-2 Wastewater Monitoring Efforts.

Fitzgerald S. F., Rossi G., Low A. S., McAteer S. P., O'Keefe B., Findlay D., Cameron G. J., Pollard P., Singleton P. T. R., Ponton G., Singer A. C., Farkas K., Jones D., Graham D. W., Quintela-Baluja M., Tait-Burkard C., Gally D. L., Kao R. & Corbishley A. 2021 Site specific relationships between COVID-19 cases and SARS-CoV-2 viral load in wastewater treatment plant influent. Environmental Science & Technology 55 (22), 15276–15286. https://doi.org/10.1021/acs.est.1c05029.

Garnier S., Ross N., Rudis B., Sciaini M., Camargo A. P. & Scherer C. 2022 Viridis – Colorblind-Friendly Color Maps for R. https://doi.org/10.5281/zenodo.4679424.

Lodder W. J., Rutjes S. A., Takumi K. & de Roda Husman A. M. 2013 Aichi virus in sewage and surface water, the Netherlands. Emerging Infectious Diseases 19 (8), 1222–1230. https://doi.org/10.3201/eid1908.130312.

Naughton C. C., Roman F. A., Alvarado A. G. F., Tariqi A. Q., Deeming M. A., Bibby K., Bivins A., Rose J. B., Medema G., Ahmed W., Katsivelis P., Allan V., Sinclair R., Zhang Y. & Kinyua M. N. 2021 Show us the Data: Global COVID-19 Wastewater Monitoring Efforts, Equity, and Gaps. medRxiv 2021.03.14.21253564. https://doi.org/10.1101/2021.03.14.21253564.

Office for National Statistics 2022 Coronavirus (COVID-19) Infection Survey: Methods and Further Information.

OHBP 2021 Chemical Investigation Project 3 Scotland (CIP3 Scotland).

PHS 2022 Scottish Health and Social Care Open Data.

Rasenberg E. 2022 Water News Europe: Omicron Hard to Detect in Wastewater.

R Core Team. 2021 R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Roberts A., Fang Z., Mayer C.-D., Frantsuzova A., Cameron G. J. & Scorza L. C. T. 2022 Data Normalisation of RT-qPCR Data for Detection of SARS-CoV-2 in Wastewater.

Romano A., Sotis C., Dominioni G. & Guidi S. 2020 The scale of COVID-19 graphs affects understanding, attitudes, and policy preferences. Forthcoming, Health Economics. https://dx.doi.org/10.2139/ssrn.3588511.

Santiso-Bellón C., Randazzo W., Pérez-Cataluña A., Vila-Vicent S., Gozalbo-Rovira R., Muñoz C., Buesa J., Sanchez G. & Rodríguez J. R. 2020 Epidemiological surveillance of norovirus and rotavirus in sewage (2016–2017) in Valencia (Spain). Microorganisms 8 (3). https://doi.org/10.3390/microorganisms8030458.

Scorza L., Cameron G., Murray-Williams R., Findlay D., Bolland J., Cerghizan B., Campbell K., Thomson D., Corbishley A., Gally D., Fitzgerald S., Low A., McAteer S., Roberts A., Fang F., Mayer C., Frantsuzova A., Baby A., Zieliński T. & Millar A. J. 2022 SARS CoV-2 RNA levels in Scotland's wastewater. Scientific Data 9, 713. https://doi.org/10.1038/s41597-022-01788-3.

Scottish Government 2020–2022 Coronavirus (COVID-19): Modelling the Epidemic.

Sims N., Avery L. & Kasprzyk-Hordern B. 2021 Review of Wastewater Monitoring Applications for Public Health and Novel Aspects of Environmental Quality (CD2020_07). Scotland's Centre of Expertise for Waters (CREW). ISBN: 978-0-902701-85-4. Available from: crew.ac.uk/publications

Tweedie M. C. K. 1984 An index which distinguishes between some important exponential families. In: Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference (Ghosh J. K. & Roy J., eds.). Indian Statistical Institute, Calcutta, pp. 579–604.

Wade M. J., Lo Jacomo A., Armenise E., Brown M. R., Bunce J. T., Cameron G. J., Fang Z., Farkas K., Gilpin D. F., Graham D. W., Grimsley J. M. S., Hart A., Hoffmann T., Jackson K. J., Jones D. L., Lilley C. J., McGrath J. W., McKinley J. M., McSparron C., Nejad B. F., Morvan M., Quintela-Baluja M., Roberts A. M. I., Singer A. C., Souque C., Speight V. L., Sweetapple C., Walker D., Watts G., Weightman A. & Kasprzyk-Hordern B. 2022 Understanding and managing uncertainty and variability for wastewater monitoring beyond the pandemic: lessons learned from the United Kingdom national COVID-19 surveillance programmes. Journal of Hazardous Materials 424, 127456. https://doi.org/10.1016/j.jhazmat.2021.127456.

Wood S. N. 2017 Generalized Additive Models: an Introduction with R, 2nd edn. Chapman and Hall, Boca Raton.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).