ABSTRACT
This study presents a novel approach for the automatic optimization of sizing French vertical flow wetlands, a widely used solution in wastewater treatment across France, with over 6,000 operational units. By leveraging a comprehensive database collected by the French National Research Institute for Agriculture, Food and Environment (INRAE), we have developed robust regression models that link total Kjeldahl nitrogen to surface and total chemical oxygen demand to material depth. By integrating these models into a unified optimization framework, our methodology enhances the design of the entire treatment train, particularly addressing how the sizing of the first stage influences the second stage sizing while complying with stringent effluent quality standards. Extensive validation confirms the effectiveness and reliability of our approach, which is tailored to specific influent characteristics and regulatory requirements. Our strategy improves upon traditional sizing methods, which often rely on surface loads, by allowing for refined optimization based on specific performance targets. The findings contribute to improved design practices and suggest potential directions for future developments, including the integration of additional treatment processes and multi-criteria decision analysis for broader applicability in wastewater management.
HIGHLIGHTS
A novel optimization framework for sizing treatment wetlands considering the entire treatment train.
It combines regression models with global optimization under constraints.
Uncertainties are provided for a more secure design.
Filter depths over 30 cm are necessary for stringent outlet levels or heavily loaded influents.
Sizing for TKN outlet levels down to 10 mg/L is achievable with acceptable uncertainty.
INTRODUCTION
The French vertical flow wetland (Dotro et al. 2021) is a two-stage design whose main feature is the direct discharge of raw wastewater in its first treatment stage (Molle et al. 2005). The first experiments in France were initiated in the 1980s, and the design rules were derived empirically by observing the successive developments (Liénard 2010; Morvannou et al. 2015).
At present, there are over 6,000 operational French vertical flow wetlands serving small communities in France and a growing number abroad (Ministère de la Transition Écologique 2023). The regulatory monitoring results have been collected in a database (Morvannou et al. 2015) by the French National Research Institute for Agriculture, Food and Environment (INRAE). This includes the collection of accurate design data for approximately 100 wastewater treatment plants, which has been used to define design equations based on regression models.
Further data collection has enhanced existing data-driven models, particularly through the incorporation of filter depth (Millot et al. 2016). Furthermore, the advent of more stringent outflow level quality standards has also prompted the necessity for these improvements. Regression models represent one among several other numerical models that have been developed over the last three decades with the objective of understanding and predicting the performance of treatment wetlands (Meyer et al. 2015; Defo et al. 2017; Yuan et al. 2020).
These models can be classified into two categories:
- Data-driven models. These models are typically concerned with forecasting outcomes rather than processes, focusing on inputs and outputs based solely on the collected data. These models range from linear regressions to simple machine learning regressors to deep learning methods (Nayak et al. 2006; Naz et al. 2009).
- Mechanistic process-based models. These models are designed to provide a more detailed description of the various mechanisms of contaminant transformation and degradation that occur at the core of a treatment wetland. The hydrodynamic is represented either using tank-in-series models (Kadlec & Knight 1996; Kadlec 2000; Stone et al. 2004; Stein et al. 2007; Sun et al. 2008; Kadlec & Wallace 2009) or by solving the Richards equation for water flow in variably saturated porous media, using for example the finite element method (Langergraber & Šimůnek 2005). Biodegradation is estimated using either first-order models (Rousseau et al. 2004) or Monod models (Mitchell & McNevin 2001). Various models and software packages are available for use, with the HYDRUS wetland module (Langergraber & Šimůnek 2005, 2012; García et al. 2010; Morvannou et al. 2014) being the most widely used. Other favored choices include FITOVERT (Brovelli et al. 2007; Giraldi et al. 2010), BIO_PORE (Samsó & Garcia 2013a, b), among others.
The majority of model applications have concentrated on predicting performance, with only a small number of studies focusing on design optimization (Pálfy et al. 2016; Ruiz 2017). Furthermore, the studies that have addressed this issue have focused on a specific type of wetland. However, the majority of wastewater treatment plants consist of multiple treatment wetlands arranged in a train. When considering the optimization of trains, it is imperative to consider the issue of combining reactors, which is often overlooked and can present challenges in terms of dimensionality. Optimization strategies for comprehensive treatment trains have been simultaneously investigated for conventional treatment systems (Grau et al. 2007; Rieger et al. 2010).
The initial hurdle in optimizing wetland design is the lack of high-quality performance data (Guillaume-Ruty et al. 2024), restricting the use of both empirical and mechanistic models. This research tackles this issue by enhancing current empirical models with new data derived from French vertical flow wetlands. Furthermore, unlike earlier studies that usually examine each treatment stage separately, our methodology considers the entire treatment train, including the interactions between the first and second stages. Through the integration of robust regression models into an optimization algorithm, we present an automated strategy to identify the minimum reactor volume necessary to meet the desired effluent quality standards.
This study has two main objectives:
1. To develop an optimization framework that refines the sizing of French vertical flow wetlands using regression-based models.
2. To assess the impact of uncertainty on design decisions, improving the reliability of optimized solutions.
This study thereby enhances the design practices for wetlands as well as the general domain of wastewater treatment optimization, offering potential applications that extend beyond French vertical flow wetlands.
MATERIALS AND METHODS
Regression model
The proposed design support model utilizes regression models implemented using the non-linear least squares (nls) function in R (R Core Team 2024) for each treatment stage. Each pollutant's removal was modeled independently for each stage, with the complexity of the models varying according to the pollutants of interest, namely total suspended solids (TSS), biochemical oxygen demand (BOD5), total chemical oxygen demand (CODt), and total Kjeldahl nitrogen (TKN).
The regression correlations were established following the works of Molle et al. (2005, 2008), with some adaptations for CODt and TKN.
Total suspended solids


Biochemical oxygen demand


Total Kjeldahl nitrogen
While the removals of CODt, BOD5, and TSS demonstrate a linear relationship with inlet mass loads (Min), this is not the case for TKN whose treatment efficiency diminishes as influent loads increase (Molle et al. 2005, 2008). As a result, the TKN treatment objective has a significant impact on the determination of filter surface area.


These are analogous to the equations presented by Molle et al. (2005, 2008), but offer a superior fit and are less prone to bias in the upper range of TKN loads or when very stringent treatment objectives are required. Please refer to Table S1 in Supplementary Material 1 for a detailed display of the original literature equations, corrected versions, and R2 coefficient of determination. For further insight, an uncertainty analysis to visualize the propagation of regression errors is also available in Supplementary Material 1, Figure S1.
Total chemical oxygen demand
For CODt, the standard design approach assumes that removal performance is stable and independent of filter depth. However, recent research has demonstrated that filter depth can significantly impact dissolved biodegradable COD (CODdb) removal in first stage filters (Millot et al. 2016) and total organic carbon (TOC) removal in sand-based vertical filters (Olsson 2012). The proposed model incorporates the impact of depth on CODdb removal, thereby enabling the calculation of the total volume of the filter. This model integrates findings from Millot et al. (2016) and Olsson (2012) and assumes that CODdb undergoes first-order degradation as a function of depth.
According to Olsson (2012), dissolved TOC values were converted into dissolved COD (CODd). Several studies have previously established relationships between TOC and CODt concentrations (Meijer 2004; Henze et al. 2008; Dubber & Gray 2010). The equation developed by Meijer (2004) was selected for its simplicity and its alignment with the upper limit of the range of the CODt/TOC ratio as reported by Henze et al. (2008) ([CODt]/[TOC] ranges from 2 to 2.5 gCODt/gTOC). The resulting ratio is expressed as follows: [TOC]/[CODt] = 0.4 gTOC/gCODt.


The equation used to model the removal of the CODp is identical to that used for TSS.



To verify the accuracy of the regression model, we used the R2 statistical metric. Additionally, leave-one-out (LOO) cross-validation was performed to estimate the standard deviations for each stage (for further details, see James et al. 2013).
Summary of the various equations and numerical values used in our optimization methodology (simplified version). The abbreviations used for the parameters correspond to the same elements presented in the equations above. The complete version is available in Supplementary Material 2, Figure S2.
Summary of the various equations and numerical values used in our optimization methodology (simplified version). The abbreviations used for the parameters correspond to the same elements presented in the equations above. The complete version is available in Supplementary Material 2, Figure S2.
Validation
The model was validated using the INRAE database (Morvannou et al. 2015) for each stage individually. The database contains data from approximately 700 treatment wetlands with various configurations. Of these, 67% are two-stage French vertical flow wetlands arranged in series, while 6% consist of only one stage of French vertical flow wetlands. A sample of 415 French vertical flow wetlands plants was selected to assess treatment performance, using 24-h flow-proportional composite samples collected from both the inlet and outlet. These samples included measurements of CODt, BOD5, TSS, and TKN, conducted according to APHA (2012) standards. The data from the 24-h composite samples were verified against standard literature ranges and typical ratios utilizing the Chauvenet criterion (Guillaume-Ruty et al. 2024). Samples lacking flow rate measurements at the inlet or outlet were excluded.
Constraints and bounds
To propose a coherent and relevant optimization problem, it is necessary to first establish sizing rules expressed as constraints or bounds. There are three distinct categories: daily surface load constraints, outlet concentration constraints, and technical bounds.
The distinction between a constraint and a bound depends entirely on the optimization algorithm. Bounds prevent the optimization algorithm from generating candidate solutions with values outside of the specified limits. Constraints are conditions that impose a severe penalty on the quality of the candidate solution, but in principle solutions, violating constraints can be produced during the optimization process, even if the correct solution should ideally abide by all given constraints.
Daily surface load constraints
Dotro et al. (2021) provided recommendations on the maximum pollutant load allowable in French vertical flow wetlands. These recommendations are listed in Table 1.
Daily surface loads for classical French vertical flow wetlands design under dry weather conditions
Treatment stage . | TSS (gTSS/m2/d) . | BOD5 (gBOD5/m2/d) . | TKN (gTKN/m2/d) . | CODt (gCODt/m2/d) . |
---|---|---|---|---|
First stage | 150 | 150 | 50 | 350 |
Second stage | 30 | 20 | 25 | 70 |
Treatment stage . | TSS (gTSS/m2/d) . | BOD5 (gBOD5/m2/d) . | TKN (gTKN/m2/d) . | CODt (gCODt/m2/d) . |
---|---|---|---|---|
First stage | 150 | 150 | 50 | 350 |
Second stage | 30 | 20 | 25 | 70 |
Values given are per square meter of bed in operation.
Note that the values for TKN differ slightly from those recommended by Dotro et al. (2021), who suggest 30 gTKN/m2/d for the first stage and 15 gTKN/m2/d for the second stage. The TKN load constraints indicated here are less stringent than those indicated by Dotro et al. (2021). The previous approach was to ensure the reliability of the system by setting very stringent values. However, as we are optimizing, we can set less stringent values for cases where low nitrification is required. For the remaining parameters (TSS, BOD5, and CODt), we align with established literature values to mitigate the risk of long-term clogging.
Outlet concentration constraint





Technical bounds
The state-of-the-art recommends keeping the hydraulic loading rate (HLR) between 0.25 m/d (to prevent plant water stress) and 0.7 m/d (to avoid aeration issues). The HLR can be further reduced to 0.1 m/d if a recirculation system is installed (Prost-Boucle & Molle 2012).
Regarding filter layer depths, the minimum 0.3 m, for the filtration layer, and depths exceeding 0.6 m are considered irrelevant (Morvannou et al. 2015; Paing et al. 2015). These values are consistent for both the first and second stages and are detailed in Table S2 in Supplementary Material 3.
All constraints and bounds are summarized in Figure 1 (simplified version; the complete version is available in Supplementary Material 2, Figure S2).
Optimization
Inputs and outputs
After defining the model equations, the objective was to optimize the surface area and depth of each stage while meeting the treatment objectives. The resulting system's inputs and outputs are presented in Table 2a and 2b.
Inputs (a) and outputs (b) of the methodology
a . | Type . | Unit . | b . | Type . | Unit . | ||
---|---|---|---|---|---|---|---|
Input | Inlet concentrations | [TSS]in | mgTSS/L | Output | Outlet concentration | [TSS]out | mgTSS/L |
[BOD5]in | mgBOD5/L | [BOD5]out | mgBOD5/L | ||||
[TKN]in | mgTKN/L | [TKN]out | mgTKN/L | ||||
[CODt]in | mgCODt/L | [CODt]out | mgCODt/L | ||||
Hydraulic flow rate | L/d | Optimized first stage surface area (S1) | m2 | ||||
Objective concentrations | [TSS]obj | mgTSS/L | Optimized second stage surface area (S2) | m2 | |||
[BOD5]obj | mgBOD5/L | Optimized first stage depth (Z1) | m | ||||
[TKN]obj | mgTKN/L | Optimized second stage depth (Z2) | m | ||||
[CODt]obj | mgCODt/L |
a . | Type . | Unit . | b . | Type . | Unit . | ||
---|---|---|---|---|---|---|---|
Input | Inlet concentrations | [TSS]in | mgTSS/L | Output | Outlet concentration | [TSS]out | mgTSS/L |
[BOD5]in | mgBOD5/L | [BOD5]out | mgBOD5/L | ||||
[TKN]in | mgTKN/L | [TKN]out | mgTKN/L | ||||
[CODt]in | mgCODt/L | [CODt]out | mgCODt/L | ||||
Hydraulic flow rate | L/d | Optimized first stage surface area (S1) | m2 | ||||
Objective concentrations | [TSS]obj | mgTSS/L | Optimized second stage surface area (S2) | m2 | |||
[BOD5]obj | mgBOD5/L | Optimized first stage depth (Z1) | m | ||||
[TKN]obj | mgTKN/L | Optimized second stage depth (Z2) | m | ||||
[CODt]obj | mgCODt/L |
System equations
The objective function is the core equation for optimizing the system. This is the equation for which a minimum must be found, taking into account the constraints and bounds. In optimization problems, the objective function is typically defined so that its minimum corresponds to the optimal solution. This equation was developed to meet two objectives: First, to propose an economically viable solution by minimizing the material volume – and, consequently, the cost of the treatment wetland – while accounting for the cost differences between construction materials (gravel for the first stage and sand for the second stage). Second, to ensure technical viability by maintaining the proposed sizing as close as possible to the optimum CODt loads.









This equation consists of several components: the outputs to be optimized and the model parameters also referred to as model hyperparameters.
As shown in Table 2b, the outputs represent the values to be minimized while adhering to the system constraints. These values include the surface area (S1) and depth (Z1) of the first stage, as well as the surface area (S2) and depth (Z2) of the second stage.
The value LCODt 1 represents the CODt load entering the first filter stage. It is also an output that will be optimized, as it includes the surface area of the first stage. It is essential to maintain this load as close as possible to the optimal load (Lopti CODt 1) to ensure proper system operation. It is set to the maximum CODt load (350 gCODt/m2/d, see Table 1) until an output of 12 mgTKN/L is achieved (threshold value, set by expert knowledge). Once the threshold is reached, the optimal load will decrease linearly until it reaches half the optimal load for an output of 6 mgNTK/L (i.e. 175 gCODt/m2/d). This approach reduces oxygen demand in the first stage when nitrification objectives are particularly stringent.
The parameters rgravel and rsand represent the economic cost ratio between gravel and sand. Since sand is generally more expensive, setting a higher rsand coefficient than rgravel will favor a second stage with a smaller volume compared with the first stage.
is a model hyperparameter that does not have physical reality. α ensures a fair balance between the value of the CODt load and the material cost. The addition of this model hyperparameter enables the control of the weight of the term associated with the CODt load. Its proper calibration guides the design toward standard values. The procedure for optimizing its value is described below.
The objective function clearly highlights the four parameters to be optimized: the surface areas and depths of each stage. Subsequently, an optimization algorithm, the covariance matrix adaptation evolution strategy (CMA-ES) (Hansen & Ostermeier 2001), was employed and is presented in the following section. Its goal is to find the optimal values for these four parameters in order to minimize the objective function.
Covariance matrix adaptation evolution strategy
When addressing a non-convex objective function, it can be challenging to find the best overall solution. Multiple local minima may exist that closely resemble the desired global minimum (Horst et al. 2000). Some optimization algorithms are limited to finding only local minima, while others make assumptions that may not apply to all situations, such as the search space being convex (Nocedal & Wright 2006). To solve our optimization problem, which has no guarantee of being convex, the evolutionary algorithm CMA-ES, first introduced by Hansen & Ostermeier (2001), was employed. CMA-ES is a stochastic optimization algorithm designed for the robust optimization of non-linear and non-convex problems. A key benefit of CMA-ES is its ability to accumulate information across iterations and adjust its parameters, particularly the covariance matrix of the candidate solutions. This adaptability enhances resilience in the presence of noise (Hansen et al. 2010; Hansen 2023). The Pycma library was used to implement CMA-ES (Hansen et al. 2019).
Model hyperparametrer
Before implementing the optimization, it was essential to determine the optimal value for the α parameter within the objective function. To determine this value, a simultaneous optimization using CMA-ES was performed to obtain a parameter value that satisfies the two sizing rules outlined below. These rules were established using expert knowledge:
For outlet target concentrations equal to or exceeding 12 mgTKN/L, the sizing should be as close as possible to 1.2 m2/people equivalent (PE) for the first stage.
For an outlet target concentration of 6 mgTKN/L, the sizing should approach 1.5 m2/PE for the first stage.
These sizing rules apply to an average influent, with its characteristics detailed in Table 3. Through the strategic setting of inlet and outlet concentrations, we have optimized the coefficient to achieve the desired sizing, aligning with these specific guidelines.
Characteristics of the two tested influents
. | [TSS]in . | [BOD5]in . | [TKN]in . | [CODt]in . | Flow (Q) . |
---|---|---|---|---|---|
Unit | mgTSS/L | mgBOD5/L | mgTKN/L | mgCODt/L | m3/d |
Average influent | 288 | 265 | 67 | 646 | 226 |
95% centile influent | 696 | 570 | 123 | 1,341 | 105 |
. | [TSS]in . | [BOD5]in . | [TKN]in . | [CODt]in . | Flow (Q) . |
---|---|---|---|---|---|
Unit | mgTSS/L | mgBOD5/L | mgTKN/L | mgCODt/L | m3/d |
Average influent | 288 | 265 | 67 | 646 | 226 |
95% centile influent | 696 | 570 | 123 | 1,341 | 105 |
Examples for benchmarking
It is essential to tailor the various model parameters to align with the specific local context is essential. In this project, we used parameters corresponding to a temperate climate and a mainland France context. Typical values for rgravel and rsand were selected based on local economic conditions. Materials costs (gravel and sand) were sourced from various suppliers. The cost of gravel ranged from €60–€85/m3, while sand costs ranged from €80–€120/m3. The reference values used in this study were €75/m3 for gravel and €100/m3 for sand, representing the corresponding mean values. For the number of filters in parallel, conventional sizing values for temperate climates were used: three cells for the first stage and two cells for the second (Molle et al. 2005). Please refer to Table S3 in Supplementary Material 3 for a comprehensive summary of the values utilized.
Two synthetic datasets were used to benchmark the model: one representing an average influent concentration (Case 1) and the other a heavily concentrated influent (Case 2). Influent concentrations were estimated using a dataset collected by Mercoiret (2009), focusing on agglomerations with fewer than 2,000 PE in mainland France (see Table 3). For Case 1, average concentrations were used, while the concentrations corresponding to the 95% centile influent concentration were used for Case 2.
The characteristics of these influents are summarized in Table 3.
To better understand how the model responds to desired values, the model outputs were plotted across a range of objective output concentrations (see Table 4). These ranges remain consistent across the two tested cases. As the sizing is based solely on TKN and CODt, only these parameters are presented. Objective values are fixed for TSS and BOD5.
Test intervals and values for objectives concentrations
. | [TSS]obj . | [BOD5]obj . | [TKN]obj . | [CODt]obj . |
---|---|---|---|---|
Unit | mgTSS/L | mgBOD5/L | mgTKN/L | mgCODt/L |
Value/Interval | 20 | 20 | [5; 20] | [20; 80] |
. | [TSS]obj . | [BOD5]obj . | [TKN]obj . | [CODt]obj . |
---|---|---|---|---|
Unit | mgTSS/L | mgBOD5/L | mgTKN/L | mgCODt/L |
Value/Interval | 20 | 20 | [5; 20] | [20; 80] |
Uncertainty analysis
A Monte Carlo analysis was performed to evaluate the reliability of the optimizations (Metropolis & Ulam 1949; Metropolis et al. 1953). The analysis examined how uncertainties in the empirical data used for model calibration affect the results.
The method is outlined as follows: (i) Randomly vary experimental data values using a Normal distribution to simulate measurement uncertainties; (ii) Perform linear regressions on the new experimental values to obtain updated model equations; and (iii) Integrate the updated model into the optimization system and carry out a new optimization. By repeating this method a thousand times using Monte Carlo simulation, it is possible to obtain a range of uncertainties related to sizing and the various quantiles. This study focuses exclusively on surface area sizing with TKN.
RESULTS AND DISCUSSION
Model validation
This section provides a comparison of the model results with those obtained from the INRAE database.
Total Kjeldahl nitrogen
(a and b) Treated load as a function of applied load in gTKN/m2/d for the first stage (a) and the second stage (b) of a French vertical flow wetland. Blue dots: experimental values from the INRAE database (Morvannou et al. 2015); blue curve: obtained by power regression with the values from the database; orange curves: corrected Molle et al. (2008) model for the first stage and corrected Molle et al. (2005) for the second stage.
(a and b) Treated load as a function of applied load in gTKN/m2/d for the first stage (a) and the second stage (b) of a French vertical flow wetland. Blue dots: experimental values from the INRAE database (Morvannou et al. 2015); blue curve: obtained by power regression with the values from the database; orange curves: corrected Molle et al. (2008) model for the first stage and corrected Molle et al. (2005) for the second stage.
Figure 3(a) and 3(b) compares the results of fitting the power model using the data from Molle et al. (2005, 2008) and the INRAE database. A discrepancy exists between the two fitted models, with the difference being more pronounced for the second stage. The INRAE dataset represents a larger pool of treatment wetlands, operating conditions, and design features and, therefore exhibiting a greater variability. The model partially fails to capture this variability (R2 of the corrected Molle et al. (2005, 2008): 0.4880 (first stage), 0.4970 (second stage); R2 of the power regression with the values from the database: 0.5245 (first stage), 0.7058 (second stage)), as it does not account for all the contributing factors (e.g., the frequency of batches, number of days in operation, temperature, operation quality, and so on). Consequently, we consider our model relevant, especially since the treatment plants where the 24-h composite samplings were taken by Molle et al. (2005, 2008) have a known, reliable design, and the measurements were also conducted under known and reliable conditions. Nevertheless, the discrepancies between the model and the experimental data in our database remain substantial, highlighting the need for a Monte Carlo study to assess the impact of uncertainties.
Total chemical oxygen demand



These equations and their use are summarized in Figure 1.
The R2 obtained for the regression models are 0.823 (Equation (10a)) and 0.990 (Equation (10b)). The results of the LOO statistic test give us the following frames for the parameters obtained: 3.136 within [2.498; 3.703] (Equation (10a)) and 7.008 within [6.320; 7.413] (Equation (10b)). This suggests a significant impact from certain data points and underscores the need for further experiments to improve the model's predictive efficacy.
As with TKN, we compared our model with 24-h composite samplings from the INRAE database. The data, clustered by depth, are presented in Supplementary Material 5. However, the INRAE database lacks sufficient variation in depth, as it mainly contains filter layers of 0.6 m for the first stage and 0.3 m for the second stage. Thus, it was not possible to statistically validate the results using the database.
Benchmark results
The results of the hyperparameter fitting process resulted in a value of α = 200.
(a–d) Optimized total treatment wetland surface areas and volume of each of the two stages as a function of the TKN treatment target. (a) and (b) show the surface area optimizations (yellow for the first stage and green for the second stage), for the average influent (a) and the 95% centile influent (b). Dashed lines stand for the sizing recommendation currently in use in mainland France: 0.4 m2/PE per filter in operation, so 1.2 m2/PE for the first stage and 0.8 m2/PE for the second stage (in yellow and green, respectively). (c) and (d) are showing the total volume optimizations, for the average influent (c) and the 95% centile influent (d).
(a–d) Optimized total treatment wetland surface areas and volume of each of the two stages as a function of the TKN treatment target. (a) and (b) show the surface area optimizations (yellow for the first stage and green for the second stage), for the average influent (a) and the 95% centile influent (b). Dashed lines stand for the sizing recommendation currently in use in mainland France: 0.4 m2/PE per filter in operation, so 1.2 m2/PE for the first stage and 0.8 m2/PE for the second stage (in yellow and green, respectively). (c) and (d) are showing the total volume optimizations, for the average influent (c) and the 95% centile influent (d).
(a–d) Optimized total treatment wetland depths and volume of each of the two stages as a function of the CODt treatment target. (a) and (b) are showing the depth optimizations (yellow for the first stage and green for the second stage), for the average influent (a) and the 95% centile influent (b). Dashed lines stand for the sizing recommendation currently in use in mainland France: 0.3 m per filter in operation, both for the first and the second stage (in pink). (c) and (d) show the total volume optimizations, for the average influent (c) and the 95% centile influent (d).
(a–d) Optimized total treatment wetland depths and volume of each of the two stages as a function of the CODt treatment target. (a) and (b) are showing the depth optimizations (yellow for the first stage and green for the second stage), for the average influent (a) and the 95% centile influent (b). Dashed lines stand for the sizing recommendation currently in use in mainland France: 0.3 m per filter in operation, both for the first and the second stage (in pink). (c) and (d) show the total volume optimizations, for the average influent (c) and the 95% centile influent (d).
The initial observations on these results are as follows: (i) The higher the TKN treatment objective, the greater the surface areas and volumes required to achieve the desired output. (ii) The total volume is higher for heavily loaded effluents compared with the average one. (iii) For average influents, increasing the surface area of the first stage (in yellow) ensures the desired output at very low concentrations (below 8 mgTKN/L). This is because increasing the surface area of the first stage represents a lower cost in the objective function. For the 95% centile influent, the proposed solution for a very high TKN treatment level involves a compromise between increasing the surface areas of the first (yellow) and second (green) stages to optimize the objective function.
Current sizing in France is based on surface load limits (Table 1) and theoretical pollutant production per PE. This typically results in a design with 0.4 m2/PE per filter in operation (represented by dashed lines in Figure 4(a) and 4(b)). An identical surface area is automatically allocated to the filters of the second stage. For typical treatment objectives (from 20 mgTKN/L to 9 mgNTK/L) and considering three filters in parallel for the first stage, the model proposes a total surface area of 1.3 m2/PE. This exceeds the usual sizing of 1.2 m2/PE for the first stage, which can be explained by the TSS load limit, reached as early as 0.43 m2/PE. The filter in operation cannot be sized below this boundary. However, this is offset by a second stage with a smaller surface area than typically required. Overall, for the average influent, the model proposes a sizing that is not significantly lower than 2 m2/PE (approximately 1.3 m2/PE for the first stage and around 0.65 m2/PE for the second stage). Conversely, from an economic perspective, the proposed sizing is more compelling in scenarios where there is a substantial discrepancy in the volumetric cost of sand and gravel.
As the outlet target concentration of TKN drops below 8.2 mgTKN/L, the surface area required per PE to achieve the desired treatment level increases. At approximately 7 mgTKN/L, the surface area exceeds 0.5 m2/PE per filter in operation. These results demonstrate that current guidelines allow for a high level of nitrification, but would need to be adapted if outlet TKN requirements were below 8 mg/L. Furthermore, these results illustrate that sizing of the second stage could be optimized to reduce construction costs and the system's footprint.
In light of the 95% centile influent, it is clear that an increase in surface area is necessary to accommodate higher loads. The first stage of the process remains constrained by the TSS load until the outlet TKN target is below 8 mg/L. The oversizing of the first stage surface area results in a less TKN-concentrated effluent at the outlet, allowing the second stage to have a slightly lower sizing than the second stage of the average influent. However, for more stringent outlet requirements for TKN, the surface of the second stage must also be increased. The model allows for the quantification of the point at which the system approaches its limits and requires adaptation.
In addition to the TKN analysis, a similar investigation into CODt is presented in Figure 5, illustrating the evolution of the sizing parameter related to CODt treatment, focusing specifically on the depths of each of the two stages for the two tested influents and the corresponding material volumes. The surface areas of each stage are not shown as they remain constant regardless of the CODt treatment target.
The initial observations on these results are as follows: (i) The more restrictive the CODt treatment objective, the greater the depths and volumes required to guarantee the desired output. (ii) The total volume is greater by about 5% for treating the more heavily loaded effluent than the average one. (iii) To reach high levels of treatment, after a phase of increasing depth, the two stages reach a plateau. This is linked to the bounds, which limit the depth of each stage. Consequently, below 35 mgCODt/L for the average influent and below 52 mgCODt/L for the 95% influent percentile, this two-stage treatment train is unable to achieve the desired CODt treatment objective. It will therefore be necessary to select an alternative treatment solution.
With regard to depth evolution, the main observation is that oversizing the filtering layer of materials above 30 cm for each of the filters in parallel (see dashed lines in Figure 5(a) and 5(b)) is unnecessary, except in cases where demanding treatment levels are sought (below 50 mgCODt/L in the case of conventional influent) or in the presence of a highly loaded influent, like the 95% centile influent.
The approach enhances cost-effective sizing within a standard output scope by ensuring the best distribution of volume at each stage, taking into account material costs. For more stringent output targets, it adjusts as needed, possibly allowing for more intensive solutions.
Uncertainty analysis results
(a and b) Optimized total volume sizing as a function of the desired outlet TKN concentration. The reference, in hashed lines, corresponds to the sizing with model data, while the various quantiles show the dispersion of the results according to a Monte Carlo analysis. For both average influent (a) and 95% centile influent (b).
(a and b) Optimized total volume sizing as a function of the desired outlet TKN concentration. The reference, in hashed lines, corresponds to the sizing with model data, while the various quantiles show the dispersion of the results according to a Monte Carlo analysis. For both average influent (a) and 95% centile influent (b).
The results demonstrate that uncertainty on model fitting does not significantly affect the optimized sizing value for discharge levels up to 10 mgTKN/L, for both average and 95% centile influents. As illustrated in the figures, the curves corresponding to four quantiles of all the results of the uncertainty analysis are perfectly superimposed, indicating that there is no variation in the optimization results. For outlet required levels below 10 mgTKN/L, the optimization remains unaffected for the average influent (with only slight variation in the 95th quantile when the treatment target is at its lowest, below 6 mgTKN/L). In contrast, discrepancies between the quantile curves of the uncertainty analysis results are observed for the 95% centile influent. As the level of treatment becomes more stringent, optimization results become more sensitive to model-fitting uncertainties. It can therefore be concluded that the optimization process is not significantly affected by model-fitting uncertainty for normal treatment levels, but less so when aiming for very high treatment levels. This is likely because the models fit particularly well the data for a specific range of input concentrations where most of the observation exists. The uncertainty analysis exhibits that using the model outside the range for which it has been specifically trained may lead to erroneous sizing values. Therefore, using the model for average influents is a more reliable approach. In the future, a more precise model for handling heavy loads will be necessary. Nevertheless, accounting for uncertainties allows the designer to determine when it is preferable to oversize the treatment train design (e.g., by following 75th or 95th quantiles) to ensure specific outlet requirements.
An additional Monte Carlo study is provided in Supplementary Material 6, demonstrating that the dispersion of results also increases as inlet concentrations vary. If inlet concentrations are very high, uncertainties in volume sizing also increase, particularly when aiming for very stringent discharge levels.
CONCLUSION
This work introduces a new approach to sizing French vertical flow wetlands. By defining simple regression models that link TKN to surface sizing and COD to depth sizing, the model employs robust equations that facilitate the independent sizing of each stage of the treatment train. This work introduces a new approach to sizing French vertical flow wetlands. Moreover, by integrating constraint equations and establishing a precise objective function, the entire treatment train can be optimized. This approach allows for meeting discharge criteria while reducing the volume of materials used. It enables more precise design, and, in some cases, reduces construction costs. The optimization process has yielded a design that is tailored to both the characteristics of the influent and current regulatory requirements. This approach differs from traditional methods which rely solely on the treatment plant capacity (population equivalents), as it allows for further refinement by optimizing for specific performance targets. Extensive validation work was carried out on both the models and the optimization, particularly in relation to surface area sizing. These validations confirm the effectiveness and reliability of our approach for systems with influents similar to those in our database. They also provide designers with valuable insight into the optimal design choices. These are connected to constraints on output concentration or, if these are not limiting, to the economic factor.
This method also paves the way for future optimizations and broader applications in wastewater treatment. The framework is versatile and, given the availability of data, additional removal equations can be incorporated to address varying climates and local conditions. The proposed approach can be enhanced by adding other treatment wetland processes, to widen the possibilities in terms of treatment systems and combinations (e.g. recirculation, three-stage or parallel processes, etc.). In addition, this approach can be integrated into a broader decision-support process that incorporates new parameters such as social and environmental concerns.
ACKNOWLEDGEMENTS
The CARIBSAN project is co-financed by the INTERREG Caribbean program under the European Regional Development Fund, the French Development Agency, and the Martinique and Guadeloupe Water Offices.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository at https://forgemia.inra.fr/reversaal/nature-based-solutions/caribsan/wetlandoptimizer.
CONFLICT OF INTEREST
The authors declare there is no conflict.