COVID-19 case prediction via wastewater surveillance in a low-prevalence urban community: a modeling approach

Estimating and predicting the epidemic size from wastewater surveillance results remain challenging for the practical implementation of wastewater-based epidemiology (WBE). In this study, by employing a highly sensitive detection method, we documented the time series of SARS-CoV-2 RNA occurrence in wastewater influent from an urban community with a 360,000 population in Japan, from August 2020 to February 2021. The detection frequency of the viral RNA increased during the outbreak events of COVID-19 and the highest viral RNA concentration was recorded at the beginning of January 2021, amid the most serious outbreak event during the study period. We found that: (1) direct back-calculation still suffers from great uncertainty dominated by inconsistent detection and the varying gap between the observed wastewater viral load and the estimated patient viral load, and (2) the detection frequency correlated well with reported cases and the prediction of the latter can be carried out via data-driven modeling methods. Our results indicate that wastewater virus occurrence can contribute to epidemic surveillance in ways more than back-calculation, which may spawn future wastewater surveillance implementations.


INTRODUCTION
The COVID-19 pandemic has caused over 5 million deaths as of 16 November 2021 (WHO 2021) and is still deeply interrupting the global society. In a new era where communities are adapting to revive, there is a substantial need for a surveillance tool that can identify both asymptomatic and symptomatic patients with high efficiency, preferably also having a short turnaround time, wide-coverage, non-invasive, and anonymous nature, and low implementation cost.
Previous studies have confirmed the persistent shedding of SARS-CoV-2 viral RNA in the feces and urine from patients during the infection course (Wölfel et al. 2020), and the viral shedding at the asymptomatic stage (Tang et al. 2020;Zhang et al. 2020). A review on viral shedding demonstrated that the shedding lasts 14-28 days (Jones et al. 2020). Detections of SARS-CoV-2 viral RNA from the wastewater have been reported in a number of studies, including in low-prevalence areas (Haramoto et al. 2020;Randazzo et al. 2020;D'Aoust et al. 2021;Hata et al. 2021) and prior to the surge of an outbreak (Peccia et al. 2020). These studies provided a clear rationale to monitor the viral components in wastewater to perceive the spread of disease in a given catchment. Improved detection sensitivity had been reported by using the solid fraction of wastewater (Peccia et al. 2020;Balboa et al. 2021;Kitamura et al. 2021), and the continuous detection of viral RNA in the suspended solid rather than the supernatant throughout the surveillance period was reported (Kitamura et al. 2021). Pre-amplification of the target sequence is one option to improve sensitivity in RT-PCR assay. A previous study reported a 100-fold improvement of quantification limit in an RT-PCR assay for coronavirus in clinical tests (Lau et al. 2003). Therefore, it may also be suitable for wastewater samples that may contain a low amount of target genetic materials.
Although much has already been achieved in concentrating and quantifying SARS-CoV-2 from wastewater, wastewater surveillance implementations remain limited, mainly due to a lack of means to interpret the results. Efforts to convert the results into a prevalence level, often referred to as 'back-calculation', began with the global emergence of detection reports. Being able to quantitatively interpret the wastewater surveillance outcome can greatly facilitate situation assessment and countermeasure formation (Sims & Kasprzyk-Hordern 2020; Thompson et al. 2020). The mass balance function commonly seen in drug and pesticide monitoring applications has been adopted by some recent studies. Briefly, the measured biomarker concentration is multiplied by the wastewater flow rate to obtain the total load (Ahmed et al. 2020a; Hart & Halden 2020; Fernandez-Cassi et al. 2021; Hasan et al. 2021; Saththasivam et al. 2021), which can then be divided by an average load per shedder to obtain an approximate shedder base. In addition, coefficients such as the degradation factor and the shedding ratio are often included to improve the estimation. However, as research went deeper, it was realized that the shedding profile of infected individuals is rather erratic and cannot be accurately represented by a fixed rate (Walsh et al. 2020; Buonerba et al. 2021), and other sources of uncertainty may further undermine the credibility of prevalence estimated in this manner (Li et al. 2021).
Unquantifiable data points did not receive sufficient consideration either. The wastewater virus concentration sometimes exceeds the limit of detection (LoD) or sensitivity threshold, but not the limit of quantification (LoQ), especially in the early stage of an epidemic or in a low-prevalence region. Data points that fall into this range can only be identified as 'positive' without a quantifiable value, making them binomial (either positive or negative). This creates a grey area of data as the previous mass balance approach fails to apply to data that fall within this range. Thus, it is critical to develop alternative modeling approaches that manage to extract information from these data, yet related studies are scarce.
This study aimed to analyze the relationship between the number of reported COVID-19 cases and the RT-qPCR-based viral RNA detection in wastewater influent. To achieve this, we monitored the time series of SARS-CoV-2 RNA concentrations in wastewater influent from Sendai city, Japan, from August 2020 to February 2021, with a highly sensitive detection method that consists of the recovery of viral RNA from wastewater solids and pre-amplification of cDNA before qPCR assay. Then, using the detection data, we present two metrics that help researchers extract information from the wastewater surveillance results. Specifically, we applied and evaluated two modeling approaches that apply to quantifiable and binomial wastewater virus detections, respectively.

Sample collection
A total of 51 influent wastewater samples were collected from a municipal wastewater treatment plant (WWTP) in Sendai, Miyagi, Japan that receives approximately 69% of wastewater generated in the city. Samples were taken twice a week, at 10 a.m., on Tuesday and Thursday from August 2020 to February 2021, from an influent line that serves the major urban area and about 360,000 people. All samples were grab samples (250 mL). Samples were immediately transported to the laboratory after collection and stored at −80°C until analysis.

Virus concentration, nucleic acid extraction, and cDNA synthesis
SARS-CoV-2 RNA was recovered from a 40 mL influent wastewater sample. The suspended solid was concentrated by centrifugation at 5,000 × g for 10 min at 4°C. After the supernatant was removed, 1 mL of TRIzol reagent (Thermo Fisher Scientific, MA, USA) was added to the concentrated suspended solid, then the suspension was homogenized using a vortex mixer. The total volume of the suspension was less than 3.5 mL. A 140 μL aliquot of the concentrate was processed for viral RNA extraction using the QIAamp Viral RNA Mini Kit (QIAGEN, Hilden, Germany) following the manufacturer's instructions, and the viral RNA was eluted in 60 μL of the elution buffer provided in the kit. The whole process recovery of the SARS-CoV-2 RNA was verified based on the concentration of Pepper Mild Mottle Virus (PMMoV) in wastewater samples; for this purpose, a 140 μL aliquot was extracted from samples both before and after concentration. A 10 μL aliquot of extracted RNA was used to obtain 20 μL of cDNA with the high-capacity cDNA RT Kit (Thermo Fisher Scientific). To synthesize cDNA for SARS-CoV-2, the random primer included in the kit was substituted with CDC nCOV_N1-R Primer (CN 10006831, Integrated DNA Technologies, Inc., Iowa, USA) (10 μM) (CDC 2021).

SARS-CoV-2 genome pre-amplification and quantification by qPCR assay
The PCR-based pre-amplification method was applied to cDNA prior to the qPCR assay. The pre-amplification was performed with TaKaRa Ex Taq® Hot Start Version (Takara Bio, Kusatsu, Japan) and the CDC nCOV_N1 Primers (CN10006830 and CN10006831, Integrated DNA Technologies, Inc.) (CDC 2021). Each 50 μL reaction contained 20 μL of cDNA, 0.25 μL of TaKaRa Ex Taq HS (5 U/μL) (Takara Bio), 5 μL of 10× Ex Taq Buffer (Mg2+ plus) (20 mM) (Takara Bio), 4 μL of dNTP Mixture (Takara Bio), and 400 nM of forward and reverse primers. The PCR cycling condition was 2 min at 94°C, followed by 10 cycles of 30 s at 94°C, 30 s at 55°C, and 1 min at 72°C. The pre-amplification step was simultaneously applied to 18 or 20 μL of standard DNA ($2.0$ to $2.0 \times 10^{4}$ copies/μL) created via a 10-fold dilution series of 2019-nCoV_N_PositiveControl (CN10006625, Integrated DNA Technologies, Inc.). The pre-amplification step was also applied to 20 μL of TE buffer as the negative control for the pre-amplification and qPCR.
The concentrations of SARS-CoV-2 and PMMoV viral RNA were determined by real-time qPCR on a CFX96 real-time PCR detection system (Bio-Rad, Hercules, CA, USA). The amplification reaction was performed with SsoAdvanced Probes Supermix (Bio-Rad). For SARS-CoV-2 viral RNA, we used the same forward/reverse primers from the previous pre-amplification step with nCOV_N1 Probe Aliquot (CN10006832, Integrated DNA Technologies, Inc.) (CDC 2021), and RT-qPCR was performed only on pre-amplified samples. Each 20 μL reaction contained 5 μL of pre-amplified cDNA, 10 μL of SsoFast Probes Supermix (Bio-Rad), 500 nM of forward and reverse primers, and 200 nM of fluorogenic probe. The PCR cycling condition was 30 s at 95°C, followed by 40 cycles of 10 s at 95°C and 30 s at 60°C. The number of SARS-CoV-2 genome copies was determined by a standard curve generated with the pre-amplified standard samples ($1.0 \times 10^{1}$ to $1.0 \times 10^{5}$ copies/reaction). Each sample was quantified in triplicate. The amplification efficiency in the real-time PCR was at least 80%.
The concentration of SARS-CoV-2 viral RNA in the influent wastewater sample, $C_{v,w}$ (copies/mL), was determined by Equation (1), where $V$ is the volume of concentrated wastewater suspended with TRIzol reagent (mL) (1.1-3.5 mL), $C_{qPCR}$ is the concentration of cDNA applied to qPCR (copies/μL), and $V_w$ is the volume of raw wastewater (mL). $V_{f,ex}$, $V_{s,ex}$, $V_{f,syn}$, $V_{s,syn}$, $V_{f,pre}$, and $V_{s,pre}$ are the volumes of samples in the intermediate steps (μL). Subscripts $f$ and $s$ stand for the final and starting volumes, while subscripts $ex$, $syn$, and $pre$ stand for the RNA extraction, the cDNA synthesis, and the pre-amplification, respectively. The LoQ was 118-375 copies/mL based on the quantification limit in a qPCR assay (2 copies/μL of pre-amplified standard samples) and Equation (1).
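The printed form of Equation (1) is not available in this copy, so the following Python sketch is a reconstruction under stated assumptions rather than the authors' exact formula. It assumes that $C_{v,w}$ equals $C_{qPCR}$ scaled by the final-to-starting volume ratio of each intermediate step, multiplied by the suspension volume $V$ (converted from mL to μL) and divided by $V_w$. The function and argument names are illustrative; the default volumes are the ones stated in the methods above.

```python
def wastewater_concentration(
    c_qpcr,                       # C_qPCR: copies/uL in the template applied to qPCR
    v_suspension_ml,              # V: TRIzol suspension volume (mL), 1.1-3.5 in the text
    v_wastewater_ml=40.0,         # V_w: raw wastewater volume (mL)
    v_f_ex=60.0, v_s_ex=140.0,    # RNA extraction: eluate / aliquot (uL)
    v_f_syn=20.0, v_s_syn=10.0,   # cDNA synthesis: product / RNA input (uL)
    v_f_pre=50.0, v_s_pre=20.0,   # pre-amplification: reaction / cDNA input (uL)
):
    """Assumed form of Equation (1); returns copies per mL of raw wastewater."""
    # Each step dilutes the template by (final volume / starting volume).
    dilution = (v_f_pre / v_s_pre) * (v_f_syn / v_s_syn) * (v_f_ex / v_s_ex)
    copies_per_ul_suspension = c_qpcr * dilution
    # Convert the suspension volume from mL to uL to get total copies recovered.
    total_copies = copies_per_ul_suspension * v_suspension_ml * 1000.0
    return total_copies / v_wastewater_ml


# Quantification limit of 2 copies/uL at the smallest and largest suspension volume
loq_low = wastewater_concentration(2.0, 1.1)
loq_high = wastewater_concentration(2.0, 3.5)
print(round(loq_low), round(loq_high))
```

With the quantification limit of 2 copies/μL, this assumed form evaluates to roughly 118 and 375 copies/mL at the two ends of the suspension volume range, which agrees with the LoQ range reported in the text and is the main reason for preferring this reconstruction.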

PMMoV genome quantification by qPCR assay
For the quantification of PMMoV viral RNA, the cDNA was applied to real-time qPCR without pre-amplification. The amplification reaction was performed with SsoFast Probes Supermix (Bio-Rad), the reverse and forward primers, and probe. Each 20 μL reaction contained 5 μL of cDNA, 10 μL of SsoFast Probes Supermix (Bio-Rad), 500 nM of forward and reverse primers, and 200 nM of fluorogenic probe. The PCR cycling condition used for the detection of CDC N1 was also used for PMMoV. The number of PMMoV genome copies in one reaction was determined by a standard curve generated with the standard samples ($1.0 \times 10^{2}$ to $1.0 \times 10^{6}$ copies/reaction). Each sample was quantified in triplicate.
The concentration of PMMoV viral RNA in influent wastewater samples was determined by Equation (2), where $C_{qPCR}$ is the concentration of cDNA applied to qPCR (copies/μL).

Viral load calculation
For data points with quantified SARS-CoV-2 RNA concentration, we applied a model developed in a prior study (Equation (3)) (Zhu et al. 2021). Briefly, patients are assigned different shedding rates according to a pre-defined shedding function that takes into account how many days the patients are into the infection course. The transition of the shedding rate is governed by a matrix whose dimension is determined by the patient report coverage and the shedding duration of infected individuals. Thus, with a consecutive record of patient numbers, the viral load excreted by the active shedder base on a given day can be calculated with Equation (3). In this model, $L_p[d]$ is the patient viral load on calendar day $d$, $j$ is the shedding duration, and $P[n,d]$ represents the number of patients on the $n$th day of the infection course on day $d$, which can be called from a matrix used to store and track the infection status of reported cases. $C_{f,n}$ is the fecal shedding rate (copies/g feces) that is calculated from a separate function, $r_f$ is the ratio of patients who develop fecal shedding, and $m_f$ stands for the daily feces load (g/day). More details can be found in Zhu et al. (2021).
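As an illustration of the bookkeeping described above, the sketch below assumes that Equation (3) takes the form $L_p[d] = r_f \, m_f \sum_{n=1}^{j} P[n,d] \, C_{f,n}$, which is consistent with the symbol definitions but is not copied from the original equation. It further assumes that a patient is on day 1 of the infection course on the day the case is reported. The shedding profile, $r_f$, and $m_f$ values are hypothetical placeholders, not the values of Zhu et al. (2021).

```python
def patient_viral_load(daily_cases, shedding_profile, r_f, m_f):
    """Return the assumed L_p[d] for every day d covered by daily_cases.

    daily_cases[d]        : newly reported cases on day d
    shedding_profile[n-1] : C_f,n, fecal shedding rate (copies/g) on course day n
    r_f                   : fraction of patients who develop fecal shedding
    m_f                   : daily feces mass per person (g/day)
    """
    j = len(shedding_profile)  # shedding duration in days
    loads = []
    for d in range(len(daily_cases)):
        total = 0.0
        for n in range(1, j + 1):
            report_day = d - (n - 1)  # day on which these patients were reported
            if report_day < 0:
                break  # no case record before the start of the series
            p_nd = daily_cases[report_day]  # P[n, d]
            total += p_nd * shedding_profile[n - 1]
        loads.append(total * r_f * m_f)
    return loads


# Hypothetical inputs: four days of case reports, three-day shedding course
cases = [1, 0, 2, 0]
profile = [1e3, 1e4, 1e2]
loads = patient_viral_load(cases, profile, r_f=0.5, m_f=200.0)
print(loads)
```

Looking up $P[n,d]$ as the case count reported $n-1$ days earlier is equivalent to shifting a status matrix by one row per day, which is one simple way to realize the tracking matrix mentioned in the text.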
The number of daily COVID-19 cases in Sendai city was obtained from the Sendai municipal government. The case report covers the entire population of Sendai city (about 1 million), whereas the catchment area covers only 360,000 people; the final patient viral load was therefore multiplied by 0.3.
The patient viral load $L_p$ was then compared to the wastewater viral load $L_w$ calculated by Equation (4):
$$L_w = C_{v,w} \times Q \times k_1 \qquad (4)$$
In this equation, the measured concentration of SARS-CoV-2 RNA in the wastewater $C_{v,w}$ is multiplied by the wastewater flow rate $Q$ and a dilution coefficient $k_1$ to obtain the wastewater viral load $L_w$. The daily flow rate was acquired from the treatment plant operator. The dilution coefficient $k_1$ was introduced to help normalize the wastewater viral load, because toilet flushing is not evenly distributed throughout the day. Previous studies have reported that a peak of toilet flushing in the morning accounts for about 25% of the total daily flushes (Butler et al. 1995; Campisano & Modica 2015). However, due to a lack of in-depth information, $k_1$ was set to 1 in this study.

Positive rate modeling
To test whether wastewater surveillance can provide information about how the epidemic may unfold, a prediction model framework was established for data points with positive yet unquantifiable results (Figure 1). First, the correlation between the reported cases and the positive rate was assessed by Spearman's rank-order correlation and a generalized linear model (GLM). Rolling two- and four-week windows were used to ensure enough data points in a calculation window. The positive rate was calculated as the number of positive signals divided by the total number of samples in the given calculation window (rolling two- or four-week). Then, the positive rates from consecutive calculation windows were used as inputs to predict the reported cases in the last calculation window. Both the positive rate and reported cases were assigned to the last week of the calculation window.
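The rolling positive rate can be sketched in a few lines of Python. The function name, the data layout (one list of detection results per calendar week), and the example results are assumptions made for illustration; only the definition of the rate and its assignment to the last week of the window come from the text.

```python
def rolling_positive_rate(weekly_results, window_weeks):
    """Compute the positive rate in each rolling calculation window.

    weekly_results : list with one inner list per calendar week, holding
                     True (positive) or False (negative) for each sample
    window_weeks   : window length in weeks (two or four in the study)
    Returns a list of (index_of_last_week_in_window, positive_rate).
    """
    rates = []
    for end in range(window_weeks - 1, len(weekly_results)):
        window = weekly_results[end - window_weeks + 1 : end + 1]
        samples = [result for week in window for result in week]
        if samples:  # skip windows that contain no samples at all
            rates.append((end, sum(samples) / len(samples)))
    return rates


# Hypothetical detections, two grab samples per week
weeks = [[False, False], [True, False], [True, True], [False, True], [False, False]]
two_week = rolling_positive_rate(weeks, 2)
four_week = rolling_positive_rate(weeks, 4)
print(two_week)
print(four_week)
```

With two samples per week, a two-week window holds only four samples, so the rate can take just five distinct values; a four-week window holds eight, which gives a finer-grained indicator at the cost of slower response.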
Three models, GLM, artificial neural network (ANN), and random forest (RF), were employed to perform the prediction tasks for their ability to solve nonlinear regression problems and learn from available data. For each model, a pre-determined portion of the dataset (80%) was randomly selected for model training; the trained models were then used to perform prediction on the remaining part of the dataset (testing data). This random sampling, training, and prediction process was repeated 5,000 times. The mean squared error (MSE) of actual versus predicted values was calculated each time for performance evaluation. The optimal number of inputs was determined through a performance analysis (Supplementary Table S2). The data pretreatment, model configuration, prediction, and statistical analysis were all performed using the R programming language, and the related code is provided in the Supplementary material.

COVID-19 prevalence summary
As a low-prevalence region, Sendai was not severely hit by COVID-19 during the study period. A total of 2,142 cases were reported in Sendai city from August 3, 2020, to February 28, 2021, and these can be approximately assigned to two outbreak events (Figure 2). The first one lasted between late October and late November 2020. The peak appeared on October 27, 2020, when 38 patients were reported. The second outbreak event, which struck between mid-December and late January, was more critical with a higher daily case count. There were 8 days when the daily reported cases exceeded the peak of the first outbreak, and 63 cases were reported on January 14, 2021, marking an all-time high. It is worth mentioning that although the daily reported patient number had a temporary dip during the New Year holiday, this was more likely due to reduced testing capacity and delayed reporting than to an actual easing of the epidemic. Following the second outbreak event, the daily reported cases dropped to a low level and remained that way until the end of the study period.

Wastewater surveillance summary
The concentration of PMMoV RNA in wastewater influent ranged from 5.2 log10 to 5.8 log10 genome copies/mL, while that in concentrated influent samples ranged from 5.4 log10 to 6.4 log10 genome copies/mL. As the occurrence of PMMoV RNA in concentrated wastewater samples was stable, this may suggest there was no significant loss of SARS-CoV-2 RNA in the whole quantification process.

Figure 1 | A brief illustration of the model framework. The positive rate and cumulative cases in a rolling week are calculated from a calculation window of two or four weeks, and the values are assigned to the last week included in the calculation window. For example, when using rolling two-week, the positive rate during calendar weeks 1 and 2 is denoted as p (rolling week #1). In the prediction models, the model inputs are the positive rates in consecutive calculation windows (in chronological order), while the output is the cumulative cases of the following calculation window.

In total, 51 samples were examined throughout the surveillance period, and 33 of them were negative. The genome concentration corresponding to the highest Ct value in our assay (referred to as qualitative sensitivity hereafter) was estimated to be 0.025 copies/mL by extrapolating the standard curve. Seventeen samples recorded concentrations above this value but below the LoQ, which ranged from $1.18 \times 10^{2}$ to $3.75 \times 10^{2}$ copies/mL with a median of $1.61 \times 10^{2}$ copies/mL. The amplification efficiency in real-time PCR ranged from 80 to 120%, which is the acceptable range according to the MIQE guidelines (Bustin et al. 2009). The coefficient of determination of the standard curves was greater than 0.99 in each assay. No PCR products were detected in negative controls. We considered a sample positive if it tested positive in at least one well of the triplicate analysis.
During the study period, the measured viral RNA concentration exceeded the LoQ only once, on January 5, 2021 (Figure 2). With a concentration of $2.67 \times 10^{2}$ copies/mL, the daily wastewater viral load calculated from Equation (4) would be $6.74 \times 10^{13}$ copies. However, the viral load contributed by reported patients on that day, calculated using Equation (3), would be $3.40 \times 10^{9}$ copies with 203 cumulative cases in the 26-day patient viral load calculation window, meaning an $L_w/L_p$ value of $1.98 \times 10^{4}$. Also, this quantifiable signal occurred prior to the peak of reported cases, which came 9 days later, on January 14, 2021. The wastewater virus concentration did not exceed the LoQ again despite a higher daily case count reported in the following days.
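The arithmetic behind the January 5, 2021 comparison can be checked with the short Python sketch below. It uses only the values reported in the paragraph above and the relation $L_w = C_{v,w} \times Q \times k_1$ with $k_1 = 1$; the daily flow is not reported in the text, so the flow printed here is merely the value implied by the reported concentration and load, and the variable names are illustrative.

```python
# Values reported in the text for January 5, 2021
c_vw = 2.67e2   # measured concentration (copies/mL)
l_w = 6.74e13   # daily wastewater viral load (copies)
l_p = 3.40e9    # daily patient viral load (copies)
k1 = 1.0        # dilution coefficient, set to 1 in the study

# Flow implied by L_w = C_vw * Q * k1 (derived here, not reported in the text)
implied_flow_ml = l_w / (c_vw * k1)   # mL/day
implied_flow_m3 = implied_flow_ml / 1e6

# Dimensionless gap between wastewater and patient viral load
ratio = l_w / l_p

print(f"implied daily flow: {implied_flow_m3:.3g} m^3/day")
print(f"L_w / L_p: {ratio:.3g}")
```

The ratio evaluates to about $1.98 \times 10^{4}$, matching the reported value, which means the observed wastewater load was roughly four orders of magnitude larger than the load expected from reported patients.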
On the other hand, due to the lack of quantifiable data points, non-quantitative detection gave us a more consistent dataset to work on. The first positive signal occurred on August 7, 2020, with just 21 cumulative cases in the patient viral load calculation window and an estimated patient viral load of $3.42 \times 10^{8}$ copies, which translates into a theoretical wastewater virus concentration of only $2.70 \times 10^{-3}$ copies/mL, far below the qualitative sensitivity. However, assuming the concentration sits somewhere between the qualitative sensitivity and the LoQ, the wastewater viral load would have a range of $6.32 \times 10^{9}$ to $4.06 \times 10^{13}$ copies, and thus an $L_w/L_p$ range of $1.85 \times 10^{1}$ to $1.19 \times 10^{5}$, covering the $L_w/L_p$ value estimated from the quantifiable detection on January 5, 2021. Over the study period, a total of 18 (35.29%) samples tested positive. Although a positive signal does not directly translate into wastewater viral concentration, consecutive positives may indicate a high viral load with higher confidence. In that sense, two consecutive positives appeared four times, all of which occurred during the two outbreak events.

Positive rate modeling
A stronger correlation was found between the four-week positive rate and cumulative cases than between the two-week positive rate and cumulative cases (Figure 3). Therefore, the prediction models were used to predict the cumulative cases using the four-week positive rate. By testing different numbers of inputs, we found the optimal value was two (Supplementary Table S2). Among the three models, ANN offered the best overall performance (median MSE: 7,520.22), followed by GLM (median MSE: 9,038.60) and RF (median MSE: 12,021.26). For about half of the data points (45.83%, 11 of 24), the actual four-week cumulative cases were within the 95% CI range of the prediction. For the remaining data points, the average error was 17.51%.
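The rank-order correlation used to compare the two window lengths can be sketched without external libraries. The Python code below assigns average ranks to ties (which are common in positive rates, because a window holds only a handful of samples) and then takes the Pearson correlation of the ranks. The function names and the example values are hypothetical and are not the study data.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical positive rates and cumulative cases for five rolling windows
positive_rate = [0.0, 0.25, 0.25, 0.5, 0.75]
cumulative_cases = [12, 30, 25, 80, 150]
rho = spearman_rho(positive_rate, cumulative_cases)
print(f"Spearman rho = {rho:.3f}")
```

Because the statistic depends only on ranks, it captures a monotonic association between positive rate and case counts without assuming that the relationship is linear.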

DISCUSSION
By employing a highly sensitive detection method, we monitored the time series of SARS-CoV-2 RNA occurrence in wastewater influent from an urban community with a population of 360,000. Eighteen out of the 51 influent samples yielded positive signals, and 17 samples had SARS-CoV-2 RNA concentrations lower than the LoQ. By examining the reported cases, we found the positive rate of detection has a strong correlation (four-week rolling window, ρ = 0.7598, p < 0.05) with the cumulative cases in the same time frame and established prediction models, hoping to extend the knowledge on wastewater-based epidemiology (WBE) implementation strategies.
In this study, the LoQ was around $1.61 \times 10^{2}$ copies/mL, about 1 log above the values reported in previous studies (Ahmed et al. 2020a; Hokajärvi et al. 2021). This is due to the reduced sample volume carried through the analysis in this study. Specifically, the volume of sample input was approximately 1/10, 1/6, and 1/10 of the total volume of the suspension obtained in the previous step, in RNA extraction, cDNA synthesis, and qPCR, respectively. The LoQ may be improved by using other extraction kits that accept a larger sample volume. Nevertheless, the method we employed is a feasible option because the data are obtained within 1 day, and our analysis has not been restricted by manufacturers' stock shortages.
We used PMMoV as an internal process control for SARS-CoV-2 detection from suspended solids. PMMoV is detected throughout the year and is abundant in wastewater ($10^{6}$-$10^{10}$ copies/L) (Kitajima et al. 2014; Symonds et al. 2018; Ahmed et al. 2020b), which may allow for using it as an internal control of RT-qPCR for a wastewater sample (Haramoto et al. 2020). A previous study reported that the recovered load of PMMoV correlated with that of murine hepatitis virus, suggesting that PMMoV is a potential indicator of the recovery efficiency of SARS-CoV-2 (Torii et al. 2022). We concluded that there was no significant loss throughout the analysis because the PMMoV concentration was consistent in both influent and concentrated samples. Future studies should decide the best whole process control for extraction from suspended solids. Better options are human coronaviruses 229E and HKU1, although their longitudinal concentrations have not been reported (Bibby et al. 2011; Bibby & Peccia 2013).

Figure 4 | The cumulative cases predicted by the three models. The inputs were the positive rates in two consecutive rolling four-weeks, while the output was within the latter. Normalization was applied to inputs and output before model training and all data were denormalized back to the original scale once the prediction was conducted. The three rows of figures are the time series of actual versus predicted (median and 95% CI) four-week cumulative cases for each model, the scatter plot of 1,000 randomly selected pairs of actual and predicted values, and the boxplot of MSE distribution from the 5,000 predictions for each model, respectively.

The pre-amplification employed in this study increases the number of amplicons in the downstream qPCR. The theoretical qualitative sensitivity should have the same Ct value as that obtained from one copy per reaction, which in this study is 32.9. However, this value was surpassed by multiple samples whose Ct values reached up to 40.0. A possible explanation for the results was that organic compounds in the influent samples inhibited the amplification efficiency in PCR. We evaluated the effect of inhibitors using a commercially available RNA positive control according to the Gibson et al. (2012) study but did not observe lowered efficiency in pre-amplification and qPCR of the positive control RNA, indicating that other reasons were responsible for the greater-than-expected Ct values. It may be explained by the different affinity efficiency of primers and polymerase to the target amplicons between the cDNA derived from viral RNA and plasmid DNA used as the positive control.
From the perspective of early warning, getting a positive signal from wastewater can be solid proof that the virus has started circulating in the community (Randazzo et al. 2020; Fernandez-Cassi et al. 2021). So far, different studies have reported varied sensitivity. Hong et al. (2021) reported that a positive signal in hospital wastewater requires 253-409 positive cases out of 10,000 individuals, while Hata et al. (2021) detected the presence of SARS-CoV-2 RNA in municipal wastewater when the number of cases was <1.0 per 100,000 people, and Betancourt et al. (2021) reported a positive detection when there were only one symptomatic and two asymptomatic individuals among a total of 311 residents in a student dormitory. In practice, the varied detection sensitivity can be mainly attributed to the different experimental methods used as well as the characteristics of the sewage system in which samples are collected. A standardized method may contribute to the comparison and integration of studies (Weidhaas et al. 2021). In this study, when the first positive signal was recorded, the number of active shedders estimated from the clinical reports was only 21 in the catchment area with about 360,000 people. Nevertheless, a bigger active shedder base does not guarantee a positive signal in subsequent detections. Even during the peak of the second outbreak event, which produced the only signal above the LoQ, negatives were still recorded. Such inconsistent detection has also been reported in other studies (Fernandez-Cassi et al. 2021; Hong et al. 2021), adding another layer of complexity.
Despite the strong interest in quantitative wastewater surveillance, a streamlined solution has yet to be formed. In particular, although the experimental side has received substantial attention, which has led to more sensitive and reliable detection, the analytical side still lacks adequate investigation and verification. There is a noteworthy knowledge gap in how to associate the measured wastewater virus concentration with the epidemic size in the catchment area. So far, most studies have tried directly correlating the abundance of viral RNA with reported cases. However, three findings of our study, namely that (1) a positive signal occurred when the speculated active shedder group was supposedly far from large enough to enable a successful detection, (2) higher reported cases did not translate to a high wastewater virus concentration, and (3) there is a significant yet uncertain gap between the observed wastewater viral load and the viral load contributed by the supposed active shedder base, presented as $L_w/L_p$ in this study, all point to the conclusion that, at the current stage, the uncertainty associated with the wastewater viral load is still a great hindrance to reliable back-calculation.
The dimensionless metric $L_w/L_p$ largely determines the robustness of back-calculation. Although a stable $L_w/L_p$ is ideal for back-calculation and was therefore assumed in some recent studies, its value seems to be both time- and location-specific due to various factors. For instance, August and September are the rainy season in Sendai; because the major urban area is served by a combined sewer, this likely aggravated the dilution of viral RNA and led to a lower $L_w/L_p$. This is supported by the lower bound of $L_w/L_p$ estimated for August 7, 2020, when the first positive signal appeared. The way the patient viral load $L_p$ was calculated also implies it can be impacted by societal factors. For instance, a high level of underreporting may occur under limited testing capacity, leading to a smaller speculated active shedder base, and thus a smaller $L_p$ and a larger $L_w/L_p$. Similarly, if the asymptomatic infection ratio increases, a larger $L_w/L_p$ can also be expected. Knowing this, some critical epidemic-related information may be drawn by keeping a close eye on $L_w/L_p$. Nevertheless, it should be pointed out that our calculations of $L_w$ and $L_p$ were based on a set of assumptions, including the shedding profile, which may be further refined once more medical evidence becomes available.
As shown in this study, in wastewater surveillance projects, researchers may obtain positive yet unquantifiable signals, especially in a low-prevalence period or region. As far as we know, no wastewater surveillance study has yet utilized binomial data other than as an occurrence indicator. However, the detection frequency, or positive rate, might serve as a suitable indicator of the virus occurrence upon which further analysis can be performed. This indicates that binomial results may also be utilized to help with epidemic surveillance, while establishing a precise connection between the wastewater viral load and the prevalence level remains challenging and entails further research. A rolling four-week window was used as the calculation window in this study, but it may be shortened by increasing the sampling frequency or the number of samples collected each time, albeit at a greater cost in time and resources. It should be noted, though, that using a positive rate as an indicator may only be feasible in a low-prevalence region or at the early stage of an outbreak event. There exists an upper bound of epidemic size beyond which its linear correlation with the positive rate becomes invalid, which may explain why the epidemic peaks were not successfully modeled in this study. On the other hand, a higher prevalence level means a higher chance of getting quantifiable signals, and back-calculation models should take over once developed.
When a causal relationship between input and output is difficult to establish, data-driven methods like those used in this study may be employed. However, being data-driven also means training data need to be accumulated to fine-tune the model, and prediction does not always match the reality. With all the uncertainties, it should be reiterated that wastewater surveillance ought not to be a stand-alone tool and its outcome should be interpreted along with other information sources before reaching any conclusion. For instance, in the early stage of an epidemic when clinical testing capacity is often compromised, wastewater detection may be put into action quicker and cover a larger area. Nevertheless, our study shows that a positive rate may be an important indicator, as also recognized by a recent study (Fernandez-Cassi et al. 2021). Also, in terms of prediction accuracy, as stated above, environmental and societal factors may affect the detection result; thus, adding explanatory variables into the model may improve the model performance.
Several limitations of this study should be noted. First, the lag between symptom onset and reporting was not included in modeling. Accounting for the delay in case reporting may explain the 9-day delay between the quantified virus concentration and the case peak; it may also improve the correlation between the positive rate and cumulative cases, as cases would be assigned to an earlier date. However, existing studies on the delay between symptom onset and hospitalization reported varied estimates ranging from 7 days (Huang et al. 2020) to a much shorter 1.2 days (Lauer et al. 2020). Therefore, without enough information about the local testing and reporting practice, integrating this factor into the model may introduce further error. Second, grab samples are prone to short-term heterogeneity of viral RNA abundance, which may affect the representativeness of samples. While composite samples collected by an autosampler may improve the consistency of detection, the viral RNA may also get highly diluted as toilet flushing mainly occurs during certain times, resulting in false negatives. Designing a sampling strategy that captures the toilet flushing peak may therefore be a viable solution, as suggested by recent studies (Betancourt et al. 2021).

CONCLUSIONS
As the world is still under the shadow of COVID-19, on top of timely medical and societal intervention, each and every tool that helps monitor the situation and alerts the society is worth looking into. In this study, using a highly sensitive assay, we (1) monitored the occurrence of SARS-CoV-2 viral RNA in the wastewater of an urban area in Japan for over 7 months and (2) established a model framework to help extend the existing knowledge base about analyzing and interpreting the surveillance results. Particularly, we found that although quantitative epidemic size estimation based on measured virus concentration is still challenging, the positive rate of wastewater virus detection is strongly correlated with reported cases and can be used for its prediction, which may guide toward novel wastewater surveillance strategies. Our findings may not only strengthen the application of wastewater surveillance in the current COVID-19 pandemic but also help the scientific community prepare for other public health challenges.