Abstract
The uncertainty associated with the determination of load parameters, a key step in the design of wastewater treatment plants (WWTPs), was investigated on the basis of data sets from 58 WWTPs. A further aspect analysed was the variation of organic loads with sewage temperature. Data from 26 WWTPs with a high inflow sampling frequency were used in a Monte Carlo approach to simulate the effect of lower sampling frequencies. The calculation of 85-percentile values for chemical oxygen demand (COD) loadings based on only 26 samples per year is associated with a variability of up to ±18%. Approximately 90 samples per year are necessary to reduce this uncertainty in the estimated COD loadings to below ±10%. Hence, a low sampling frequency can lead to under- or overestimation of design parameters. Using an analogous approach, an uncertainty of ±11% was identified for weekly average COD loads calculated from four samples per week. Finally, a tendency towards lower COD input loads at higher temperatures was identified, with a reduction of about 1% of the average loading per degree Celsius.
HIGHLIGHTS
Uncertainty of statistical measures for COD loads relating to sampling frequency.
Determination of yearly 85-percentile values of COD loads from one sample per week results in an uncertainty of up to ±13%.
Calculation of weekly averages of COD loads with four samples per week is associated with an uncertainty of up to ±11%.
Relative influent COD loading decreases with higher temperatures.
INTRODUCTION
The prognosis of the processes taking place within domestic wastewater treatment plants (WWTPs) is a highly complex task that requires identifying uncertainties at several levels and from different sources (Oliveira & von Sperling 2008; Belia et al. 2009). For the design of these facilities, the determination of organic and nutrient loadings from sewage is a fundamental step that requires careful consideration of its associated uncertainty. In locations with existing sewage systems, the design loadings are commonly derived from a pool of inflow samples, preferably volume-proportional samples for load calculations. A combination of average values and peak loading factors is commonly suggested as the basis for the design of WWTPs (e.g. Tchobanoglous et al. 2013). Alternatively, percentile values can be used to estimate design loadings, and several guidelines for the design of WWTPs recommend percentile values to quantify loading parameters (Henze et al. 2002; ATV-DVWK 2003). An international overview of activated sludge dimensioning guidelines, comprising design models from the United States, Denmark, Germany, Austria, Switzerland, Japan, and South Africa, additionally reports the 50th, 60th, 80th, and 90th percentiles for the calculation of oxygen demand and nutrient loadings (Wichern 1996). However, information on the ideal sampling frequency for deriving organic and nutrient loading parameters is scarce. So far, no information is available on how the sampling frequency affects the accuracy of input parameters; for example, how much accuracy is gained when a parameter is derived from 100 inflow samples instead of 50. This question is important for understanding the quality of design data, for dimensioning WWTPs, and for planning cost-effective measuring campaigns.
For instance, quantifying antibiotic loadings with an accuracy of ±20% might require from 10 to 160 samples per year, depending largely on the seasonality of the target substance (Marx et al. 2015).
Regarding simulation studies, several recommendations and strategies exist to incorporate stochastic influent variations and prognostic uncertainties (e.g. Ort et al. 2005; Martin & Vanrolleghem 2014; Talebizadeh et al. 2016). Devisscher et al. (2006) suggested using seasonal average load data calculated from at least 2 years of operation data to perform simulations for cost estimation and evaluation of control strategies, and recommended using only values within the 5th and 95th percentile limits to calculate the average load. Moreover, the calibration of WWTP models depends strongly on the amount and quality of influent data. Hence, the number of samples and tests should be kept at an adequate but minimal level to allow cost-effective use of the models (Hulsbeek et al. 2002). Still, recommendations for sampling campaigns focus much more on wastewater characterization than on the estimation of statistical measures for design purposes.
In this work, we quantify the expected variance ranges of key inflow parameters for the design of WWTPs as a function of the sampling frequency. The chemical oxygen demand (COD) is the reference parameter for influent organic loadings in the present analysis, in line with most WWTP design guidelines (e.g. DWA 2016) and simulation models (e.g. Hauduc et al. 2010). The analysis concentrates exclusively on the variability of the measured data and does not consider other sources of uncertainty and potential systematic errors, such as the precision of flow measurement and of the analytical techniques, or sample preservation in the automatic sampler (Belia et al. 2009; Rieger et al. 2012). The influence of the sample pool size was investigated for two representative statistical measures for dimensioning and modelling purposes: the yearly 85-percentile and weekly averages. Additionally, a correlation analysis between wastewater temperature and influent COD load was performed. The main hypothesis for this analysis was that degradation rates in the sewage system increase at higher temperatures and that this correlation can be visualised in a large sample collective from different WWTPs. As discussed by Ahnert et al. (2005), this temperature dependency may also have important implications for the determination of degradable and non-degradable organic fractions.
MATERIALS AND METHODS
Database
Inflow data from a total of 58 activated sludge WWTPs (52 in Germany and six in Switzerland) were available for this study. The data were collected by members of the work group KA 6.4 (Deutsche Vereinigung für Wasserwirtschaft, Abwasser und Abfall e.V., German Water Association; DWA) and anonymized prior to this study. Together, these 58 WWTPs correspond to a load of ∼14 million population equivalents, with a total of 64,705 daily COD load data points determined from 24-h composite samples; each daily organic load was obtained by multiplying the total flow of the day by the COD concentration of that day's composite sample. Data series with samples from the effluent of grit chambers and from the effluent of primary clarifiers were both considered in these analyses. However, only a single sampling point per WWTP was used; the number of samples shown in Figure 1 therefore refers only to the sampling location with the largest pool of data. A threshold of 260 samples per year (i.e. five samples per week) was defined to identify WWTPs with quasi-complete data sets, which could be used for the stochastic reduction of the sampling pool. Data series from WWTPs connected to separate and combined sewer systems and with variable contributions of industrial wastewater were included, but these sewer system characteristics were not analysed further.
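The load calculation and the selection of quasi-complete years described above can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code; the function names, the record layout (date, flow in m³/d, COD in mg/L) and the grouping by calendar year are assumptions made for the example. Because 1 mg/L equals 1 g/m³, flow times concentration gives grams per day, which is divided by 10^6 to obtain tonnes per day.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple

QUASI_COMPLETE_THRESHOLD = 260  # samples per year (five per week), as stated in the text


def daily_cod_load_t_per_d(flow_m3_per_d: float, cod_mg_per_l: float) -> float:
    """Daily COD load from the total daily flow and the 24-h composite COD concentration.

    Since 1 mg/L = 1 g/m3, flow [m3/d] * COD [g/m3] gives g/d; dividing by 1e6 gives t/d.
    """
    return flow_m3_per_d * cod_mg_per_l / 1e6


def quasi_complete_years(
    records: Iterable[Tuple[date, Optional[float], Optional[float]]],
) -> Dict[int, List[float]]:
    """Group (date, flow, COD) records by calendar year and keep quasi-complete years.

    Days with a missing flow or COD value (None) are skipped; a year is kept when it
    has at least QUASI_COMPLETE_THRESHOLD daily loads. Hypothetical helper.
    """
    loads_by_year: Dict[int, List[float]] = defaultdict(list)
    for day, flow, cod in records:
        if flow is None or cod is None:
            continue
        loads_by_year[day.year].append(daily_cod_load_t_per_d(flow, cod))
    return {year: loads for year, loads in loads_by_year.items()
            if len(loads) >= QUASI_COMPLETE_THRESHOLD}
```

For example, a hypothetical daily flow of 40,000 m³/d with a composite COD concentration of 500 mg/L corresponds to a load of 20 t COD/d.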
Average daily chemical oxygen demand (COD) loadings and average number of samples per year that were available for each WWTP. Striped columns indicate that the COD loading and sample availability refer to measurements at the effluent of the grit chamber, while grey columns refer to influent data to the activated sludge tanks. The red dashed horizontal line indicates the threshold for WWTPs with a complete or quasi-complete data set, which were used for the uncertainty analysis regarding 85-percentile loads. The red letters ‘T’ identify the WWTPs used for the COD and temperature correlation analysis. The full colour version of this figure is available in the online version of this paper, at http://dx.doi.org/10.2166/wst.2020.588.
Calculation platform
All calculations were performed in MATLAB (release 2017b; MathWorks, USA). The Monte Carlo simulation used the function randsample for randomized sample selection, and percentile values were calculated with the function prctile. In short, prctile sorts the data vector and assigns to each sorted data point a percentile proportional to its rank within the sample (100(i − 0.5)/n for the i-th of n sorted values); percentiles between these points are obtained by linear interpolation.
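The percentile rule described above can be reproduced outside MATLAB. The following is a minimal Python sketch assuming MATLAB's documented default behaviour for prctile (rank i of n corresponds to the 100(i − 0.5)/n percentile, linear interpolation in between, minimum and maximum beyond the outermost points); the function name prctile_matlab_style is hypothetical.

```python
import math
from typing import Sequence


def prctile_matlab_style(data: Sequence[float], p: float) -> float:
    """p-th percentile (0-100) following the rule described for MATLAB's prctile.

    The i-th smallest of n values (1-based) is taken as the 100*(i - 0.5)/n
    percentile; percentiles between these points are linearly interpolated, and
    percentiles below the first or above the last point return the minimum or maximum.
    """
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must lie in [0, 100]")
    x = sorted(data)
    n = len(x)
    pos = n * p / 100 + 0.5  # 1-based fractional rank solving 100*(pos - 0.5)/n = p
    if pos <= 1:
        return float(x[0])
    if pos >= n:
        return float(x[-1])
    lower = math.floor(pos)
    frac = pos - lower
    return x[lower - 1] + frac * (x[lower] - x[lower - 1])
```

With 26 samples per year, for instance, the 85-percentile sits at the fractional rank 26 × 0.85 + 0.5 = 22.6, i.e. it is interpolated between the 22nd and 23rd smallest daily loads. NumPy 1.22 and later should offer the same plotting position through numpy.percentile with method="hazen", but this equivalence should be checked before relying on it.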
Quantification of the effects of the sampling frequency on determination of the COD loading


The stochastic re-sampling of the yearly data sets was performed with two different year-matrix structures: (i) 52 weeks of 7 days; and (ii) 13 pseudo-months of 28 days. In this way, the removal of sampling points was always evenly distributed over the year (for example, 208 samples per year correspond to four samples per week or 16 samples per pseudo-month; Figure 2(b)), and a seasonal over-representation (e.g. more samples in summer than in winter) was avoided. All years started on 1 January, and no distinction was made between weekdays.
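The week-based stochastic reduction and the resulting RSD of the 85-percentile can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' MATLAB implementation: a year is represented as 364 daily loads (52 × 7; how the remaining day is handled is not stated), missing days are None, weeks with fewer measured days than requested keep all of them, random.Random.sample stands in for randsample (both draw without replacement), and the RSD is taken as sample standard deviation divided by mean, since Equation (1) is not reproduced in this section. The function names and the default number of Monte Carlo runs are hypothetical.

```python
import math
import random
import statistics
from typing import List, Optional, Sequence

DAYS_PER_WEEK = 7
WEEKS_PER_YEAR = 52  # 52 x 7 = 364 days


def _prctile(x: Sequence[float], p: float) -> float:
    # Interpolation rule described for MATLAB's prctile: rank i <-> 100*(i - 0.5)/n
    s = sorted(x)
    n = len(s)
    pos = n * p / 100 + 0.5
    if pos <= 1:
        return s[0]
    if pos >= n:
        return s[-1]
    lower = math.floor(pos)
    return s[lower - 1] + (pos - lower) * (s[lower] - s[lower - 1])


def reduce_by_week(year: Sequence[Optional[float]], per_week: int,
                   rng: random.Random) -> List[float]:
    """Keep `per_week` randomly chosen measured days in every 7-day block.

    Missing days (None) cannot be selected; weeks with fewer measured days keep
    all of them (an assumption; the text does not specify this case).
    """
    kept: List[float] = []
    for w in range(WEEKS_PER_YEAR):
        block = year[w * DAYS_PER_WEEK:(w + 1) * DAYS_PER_WEEK]
        measured = [v for v in block if v is not None]
        kept.extend(rng.sample(measured, min(per_week, len(measured))))
    return kept


def rsd_of_p85(year: Sequence[Optional[float]], per_week: int,
               runs: int = 1000, seed: int = 0) -> float:
    """RSD [%] of the 85-percentile load over `runs` stochastically reduced years.

    Assumes RSD = sample standard deviation / mean * 100.
    """
    rng = random.Random(seed)  # fixed seed for reproducible runs
    p85 = [_prctile(reduce_by_week(year, per_week, rng), 85) for _ in range(runs)]
    return statistics.stdev(p85) / statistics.mean(p85) * 100
```

The pseudo-month variant would follow by replacing the 52 blocks of 7 days with 13 blocks of 28 days and selecting four times as many samples per block.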
Schematic visualization of the methodology to quantify the effects of the sampling frequency on the determination of the COD loading for design purposes. (a) Sequential steps of the calculation with weeks and pseudo-months; (b) exemplary visualization of the yearly data sets with selection of four samples per week and 16 samples per pseudo-month from a complete yearly data set. Filled squares indicate the samples selected for the 85-percentile calculation, and empty squares indicate randomly eliminated samples and/or previously missing samples.
The RSD for the calculation of weekly average COD loads was determined analogously to the uncertainty analysis for 85-percentile COD loadings. To this end, all weeks with measurements on all seven days were first selected from the data sets of all 58 WWTPs, yielding a total of 3,179 weeks with seven data points each. Through a Monte Carlo simulation, these data sets were reduced to lower sampling frequencies of 1, 2, 3, 4, 5, and 6 samples per week. For each week and sampling frequency, 10,000 reduced data sets were generated, and the RSD between their average values was calculated with Equation (1).
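For a single complete week, the procedure above can be sketched in Python as follows. Assumptions: subsets are drawn without replacement, the 10,000 repetitions use a fixed seed, and the RSD is taken as sample standard deviation divided by mean, since Equation (1) is not reproduced in this section; the function name is hypothetical.

```python
import random
import statistics
from typing import Sequence


def rsd_weekly_mean(week: Sequence[float], per_week: int,
                    runs: int = 10_000, seed: int = 0) -> float:
    """RSD [%] between the means of `runs` random subsets of one complete week.

    Each subset holds `per_week` of the seven daily loads, drawn without
    replacement. RSD is taken as sample standard deviation / mean * 100, an
    assumed form of Equation (1).
    """
    if len(week) != 7:
        raise ValueError("expected a complete week with seven daily loads")
    if not 1 <= per_week <= 7:
        raise ValueError("per_week must be between 1 and 7")
    rng = random.Random(seed)  # fixed seed for reproducible runs
    days = list(week)
    means = [statistics.mean(rng.sample(days, per_week)) for _ in range(runs)]
    return statistics.stdev(means) / statistics.mean(means) * 100
```

Applying such a function to every full week and taking the 95th percentile of the resulting RSD values would give a quantity comparable to the RSD95 reported for each sampling frequency.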
Wastewater temperature and COD loading measurements
RESULTS AND DISCUSSION
Sampling frequency effects on calculation of COD loadings
The analysis with data from 26 WWTPs, comprising 92 individual years with at least 260 samples per year, indicated that re-sampling based on weekly and pseudo-monthly data structures results in very similar RSD values (Table S1 in the supplementary materials); the pseudo-month-based re-sampling yields RSD95 values that are only 0.1 to 0.6% higher. Because both discretizations of the yearly data converge, their results were combined in one graph (Figure 3(a)). The lowest sampling frequency of 13 samples per year resulted in an RSD95 (95% confidence threshold) of up to 25%. Increasing the sampling frequency to one sample per week almost halved the RSD95, and at least 208 samples per year are necessary to obtain an RSD95 below 5%. To reach a similar RSD95 for weekly average values, six samples per week are necessary (Figure 3(b)). Using four samples per week, which is the minimum number according to the German standard A198 for determining the influent parameters of WWTPs (ATV-DVWK 2003), results in an RSD95 of about 11%.
Relative standard deviations (RSD) using COD load series that were stochastically reduced to lower sampling pools. (a) RSD for the calculation of 85-percentile loads; (b) RSD for weekly arithmetic mean values. The grey area gives the RSD confidence interval between the 5th and 95th percentiles, and the median RSD is indicated by the black line and dots. Data from both diagrams are given in Tables S1 and S2.
An increase in the catchment area and in the number of connections to the sewer system is expected to result in lower stochastic variation of the organic loadings at the influent of WWTPs (Tchobanoglous et al. 2013; DWA 2016). Although the three largest WWTPs in this study (COD loading above 80 t/d) lie within the lowest RSD range (Figure 4), this trend cannot be confirmed here: the correlation between the mean RSD and the average COD loading yields coefficients of determination (R2) of only 0.17 and 0.16 for 52 and 104 samples per year, respectively. This low R2 may be explained by the uneven distribution of COD loadings among the available WWTPs with high influent sampling frequency; the range from 3 to 20 t COD per day covers more than half of the WWTPs (15 of 26). Moreover, the number of yearly data series per WWTP varied from 2 to 7 years (Figure 4(a)), and the analysis integrates two sampling points (effluent of the grit chamber and influent of the activated sludge tanks). These factors limit the identification of potential correlations between the RSD and specific characteristics of the WWTPs.
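The coefficient of determination quoted above can be computed from the per-plant mean RSD values and average daily COD loads with an ordinary least-squares line. This is a minimal sketch; the text does not state whether the loads were log-transformed or how the fit was set up, so a straight-line fit on untransformed values is an assumption here, and the function name is hypothetical.

```python
from typing import Sequence


def r_squared_linear(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of an ordinary least-squares line y = a + b*x.

    For a simple linear regression with intercept, R^2 equals the squared
    Pearson correlation coefficient between x and y.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length (at least 2)")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)  # spread of x around its mean
    syy = sum((yi - my) ** 2 for yi in y)  # spread of y around its mean
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # co-variation
    return sxy * sxy / (sxx * syy)
```

Here x would hold the average daily COD loads of the 26 WWTPs and y their mean RSD for a given sampling frequency.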
Mean relative standard deviations (RSD) for each of the 26 wastewater treatment plants (WWTPs) with at least 260 samples per year versus their respective average daily chemical oxygen demand (COD) loading. Data from the effluent of the grit chamber are depicted with white-filled circles, while blue filling refers to influent data to the activated sludge tanks. (a) Mean RSD calculated for 52 samples per year; the number to the left of each symbol indicates the number of data-years available for each WWTP and used for the mean RSD calculation; (b) mean RSD calculated for 104 samples per year. The full colour version of this figure is available in the online version of this paper, at http://dx.doi.org/10.2166/wst.2020.588.
COD loadings and temperature
To visualize the dependency of the COD load on the wastewater temperature, temperature intervals with a high and a low number of data points were distinguished (Figure 5). A threshold of nT = 400 data points was defined to help discard potential artefacts resulting from low data density, and the same threshold was applied to the P data (Figure 5(b)). Because about 20% of the daily data points lack information on P loads, the high-density temperature interval for LP and RP/COD is narrower.
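The grouping of daily loads into temperature intervals with the nT threshold can be sketched as follows. Only the threshold of 400 comes from the text; the normalisation of each daily load by the mean load of its own WWTP, the 1 °C bin width and the function name are assumptions made for this illustration.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

N_T_THRESHOLD = 400  # bins with more daily data points count as high-density (from the text)


def relative_load_by_temperature(
    rows: Iterable[Tuple[str, float, float]], bin_width: float = 1.0,
) -> Dict[float, Tuple[float, int, bool]]:
    """Mean relative load per temperature bin, flagged by data density.

    rows: (wwtp_id, wastewater temperature in deg C, daily load).
    Assumptions not stated in the text: each load is divided by the mean load
    of its own WWTP, and temperatures are grouped into bins of `bin_width` deg C.
    Returns {bin lower edge: (mean relative load, n_T, n_T > N_T_THRESHOLD)}.
    """
    rows = list(rows)
    loads_per_plant: Dict[str, List[float]] = defaultdict(list)
    for plant, _, load in rows:
        loads_per_plant[plant].append(load)
    plant_mean = {p: statistics.mean(v) for p, v in loads_per_plant.items()}

    bins: Dict[float, List[float]] = defaultdict(list)
    for plant, temp, load in rows:
        edge = math.floor(temp / bin_width) * bin_width  # lower edge of the bin
        bins[edge].append(load / plant_mean[plant])
    return {edge: (statistics.mean(v), len(v), len(v) > N_T_THRESHOLD)
            for edge, v in sorted(bins.items())}
```

Bins whose flag is False would correspond to the open symbols (low data density) in Figure 5.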
Mean relative COD loads and P to COD ratios at variable temperatures. Results of the evaluation of 17 WWTPs with 19,238 daily data points (nT) for the COD and temperature analysis and 15,133 daily data points for the analysis including phosphorus data. Filled symbols indicate nT > 400, and open symbols indicate temperature clusters with low sample availability. (a) Arithmetic means of the relative COD loadings at the given temperature are depicted as red lines and diamonds, while black lines and circles indicate 85-percentile values of the relative COD loadings at the given temperature. (b) Arithmetic means of the relative P loadings are depicted as black lines and triangles, while red lines and squares indicate the relative P to COD ratios at the given temperature. Data from the diagram are detailed in Table S3. The full colour version of this figure is available in the online version of this paper, at http://dx.doi.org/10.2166/wst.2020.588.
Overall, lower loads tend to reach the sewage treatment plant at higher temperatures. Within the high-density data interval from 10 to 23 °C, the average relative COD load decreases by 14%. This decline is more pronounced for the relative 85-percentile values, with a 19% lower loading at 23 °C than at 10 °C (Figure 5(a)). This is in line with the expected increase in biological activity within the sewage system (Ahnert et al. 2005; Sun et al. 2018). However, there is also a similar trend for the phosphorus loads (Figure 5(b)), which decrease by 7% between 11 and 22 °C. Since no net phosphorus losses from the wastewater are expected during transport to the WWTP (i.e. no long-term phosphorus accumulation and no release as gas), this might point to a seasonal variability of the organic loadings. Still, the stronger temperature dependence of the COD loads compared with the phosphorus loads (Figure S1), combined with the concomitant increase of RP/COD at higher temperatures (4% from T = 11 to 22 °C), is an important indicator of biological degradation of organic matter.
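Dividing the reported decreases by the widths of the corresponding temperature intervals relates them to the figure of about 1% per degree Celsius given in the abstract. This worked arithmetic assumes a roughly linear trend within each high-density interval, which is a simplification:

```latex
\frac{14\,\%}{(23-10)\,^{\circ}\mathrm{C}} \approx 1.1\,\%/^{\circ}\mathrm{C}, \qquad
\frac{19\,\%}{(23-10)\,^{\circ}\mathrm{C}} \approx 1.5\,\%/^{\circ}\mathrm{C}, \qquad
\frac{7\,\%}{(22-11)\,^{\circ}\mathrm{C}} \approx 0.6\,\%/^{\circ}\mathrm{C}
```

The mean relative COD load thus falls by roughly 1% per degree (about 10% per 10 °C), the 85-percentile loads decline faster, and the phosphorus loads decline at roughly half the rate of the mean COD loads.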
Implication of these uncertainties estimations for the design of WWTPs
Regarding the determination of influent organic loadings, there are both irreducible and reducible sources of uncertainty (Belia et al. 2009). While variations due to prognostic scenarios, including weather and demographic changes, lead to an intrinsic uncertainty, the variability of an existing system can be evaluated through intensive monitoring. Indeed, according to Figure 3(a), the uncertainty in the estimation of organic loads decreases along an exponential-like curve as the sampling frequency increases from 13 to 104 samples per year, with the RSD95 dropping from 24.9 to 8.4%. Beyond this, the RSD95 decreases quasi-linearly, reaching 2.9% at 312 samples per year.
Nonetheless, although this uncertainty is quantified in terms of RSD95 values, translating this information into practical design might not be completely straightforward. First, this analysis does not help to define an acceptable uncertainty level for the design of WWTPs; that is, how much uncertainty in the estimation of the 85-percentile COD loads can be afforded? To answer this, the effects of these stochastic variations on the discharge characteristics need to be considered further (Oliveira & von Sperling 2008). Second, the absolute variations depend on the calculation approach developed here. Other thresholds for the quasi-complete data sets, a differentiation between working days and weekends or between rain and dry weather conditions, or merging the yearly data series independently of the WWTP (instead of separating each WWTP, as in Figure 4) would potentially affect the RSD calculations. However, further modifications or improvements to this calculation might require expanding the data set to include more WWTPs with high sampling frequency. The proposed methodology (Figure 2) was therefore developed to use as few assumptions as possible to sort the data and to allow an integrated analysis of all yearly data series. Indeed, a larger pool of WWTPs with quasi-complete data sets is expected to allow the dependence of influent load variations on other variables to be identified, such as the catchment area size (Figure 4), the type of sewer system (separate or combined), the density of industries, or climatic conditions.
Conversely, the calculation of RSD95 for weekly average values with missing samples could rely on a much larger data pool (3,179 full-data weeks) than the yearly percentile calculations. Reducing the complete week data series down to two samples per week results in an almost linear increase of the RSD95 by ∼4% per missing sample. Interestingly, the RSD95 obtained with a single sample per week, ±32.1%, provides an estimate of the potential variation when weekly values are interpolated from one measurement per week, an approach often reported in simulation studies. These loading uncertainties can be integrated with other process-related uncertainties in model-based studies of WWTP operation and optimization (e.g. Sin et al. 2009).
The relatively lower COD loadings at higher temperatures highlight the importance of considering seasonality and potential degradation processes within sewer systems. The approximate decrease of 10% in COD load per 10 °C (Figure 5) quantifies the impact of these temperature effects on the organic loads at WWTPs. A pre-degradation of the biodegradable wastewater fraction within the sewer system might increase the inert COD fractions in the influent. This would have important implications for the operation of WWTPs, affecting sludge production, oxygen demand, and denitrification, among other aspects (Wichern et al. 2002; Ahnert et al. 2005). Moreover, in combined sewer systems, rain and dry weather conditions can strongly influence the transport of solids (Lange & Wichern 2013), which might affect the COD loads. Hence, the climatic conditions (i.e. temperature and precipitation) have potentially overlapping effects in this analysis. Seasonal dependencies and wastewater temperature are therefore important factors to be considered during sampling campaigns.
CONCLUSIONS
The methodology for RSD determination presented here provides an approach to quantify the uncertainty of the influent COD loading at WWTPs for different sampling frequencies. The 95th percentiles of the RSD from 92 yearly data sets with at least 260 measured points were used as the uncertainty measure. About 90 samples per year are necessary to ensure a variability below ±10% in the estimation of the 85-percentile COD loading. Estimating a weekly average COD loading with a similar precision requires four daily loadings per week (RSD95 of about 11%). Although the implications of these estimated uncertainties are not fully addressed here, this study provides a novel reference, based on real data, for the potential deviations of statistical measures of organic loadings that might arise from insufficient sampling. These findings can be used in simulation studies as well as to interpret data and sampling campaigns at WWTPs.
This method of stochastic reduction of sample pools can be applied to any other WWTP or sewer system, and also to other parameters, provided that data series with a sufficient sampling frequency are available. Furthermore, the scope of this analysis might be expanded in future investigations by identifying the probability distribution of the data and its corresponding parameters. The analysis, based on a large sample pool of WWTPs, also allowed potential effects of increased wastewater temperatures on organic loading variations to be identified. However, further investigations including a detailed overview of the catchment area are necessary to provide a comprehensive description of the underlying degradation mechanisms and to disclose effects of precipitation events (e.g. Rodríguez et al. 2013).
ACKNOWLEDGEMENTS
The described investigation is part of the revision of the German manual ATV-DVWK (2003) by the DWA technical board KA 6 and its work group KA 6.4. The authors would like to thank all the WWTP operators who provided the data, and the KA 6.4 members: M. Ahnert, K. Alt, M. Blunschi, S. Keller, M. Klingel, R.-L. Lange, K.-H. Rosenwinkel, B. Teichgräber, D. Thöle, A. Spindler, and T. Schmitt. T. Gehring acknowledges funding from the German Federal Ministry of Education and Research (BMBF; Project 02WA1450B). We also thank an anonymous reviewer for suggesting the inclusion of the phosphorus data in these analyses.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.