## Abstract

River water quality degradation is a risk to human health. Hence, many water quality models have been developed to predict the future states of water bodies and understand how the current water treatment systems will respond to future pollution loads and climatic drivers. A Japanese river was evaluated with the River Water Quality Model No.1 (RWQM1), and parameter sensitivity and identifiability analyses were executed on the model output using parameter sensitivity ranking, collinearity index, and Fisher Information Matrix-derived criterion. Among RWQM1 kinetic parameters, those related to hydrolysis, growth of aerobicheterotrophs, and first-stage nitrifiers were the most influential. Reactive soluble organic substances included in untreated gray waters, in addition to a prevalence ratio of the most advanced on-site treatment facility, strongly contributed to the model output variability. A remediation analysis revealed that a renewal to the most advanced on-site treatment facility by 20% increment was almost equivalent to the 70% decrease in the effluent concentration from an on-site treatment facility producing the highest pollutant load in terms of a BOD concentration decrease in the stream. This study provided baseline data assisting in policy implementation regarding the management of effluents from on-site treatment facilities.

## HIGHLIGHTS

The impact of wastewater discharge from on-site treatment facilities was studied.

Sensitivity-identifiability analyses provided effluent management perspectives.

A prevalence ratio of the most advanced on-site treatment facility was the most influential parameter.

Easily soluble organic matter in untreated gray water should be prioritized as a wastewater constituent to be treated in the studied area.

## INTRODUCTION

Today, the world's most crucial river water quality problem is posed by areas not connected to centralized sewer systems (Yates *et al.* 2019). In many countries, most houses in rural or isolated areas are not connected to centralized sewer systems. These houses rely on on-site domestic wastewater treatment facilities such as septic tank systems to treat household sewage (Withers *et al.* 2011). One of the challenges of on-site domestic wastewater treatment facilities is that it is not inspected as routinely as centralized wastewater treatment plants. This suggests that underperforming systems, or ones that exceed a functional lifespan, could be sources of pollutant loading. Withers *et al.* (2011) pointed out that the semi-continuous release of nutrients can considerably impact local stream ecosystem health, particularly rivers with low dilution effects. Jarvie *et al.* (2006) showed that septic tank effluents posed more significant risks to river eutrophication than diffuse agricultural pollution. Gill & Mockler (2016) demonstrated that annual nutrient emissions from septic tank systems could reach 22% for P and 13% for N in small catchments, indicating that they could be significant sources of nutrient loading. Based on high temporal resolution monitoring data, it is apparent that even small clusters of poorly performing septic tank systems could become a significant source of nutrients in the spring and summer seasons when rural watersheds are ecologically active (Macintosh *et al.* 2011). Sato *et al.* (2013) have demonstrated that approximately 90% of the untreated wastewater in low-income countries is discharged into the surrounding water bodies, such as rivers, wetlands, lakes, and streams. The findings agree with World Health Organization (WHO) Fact-sheet, which indicates that at least two billion people worldwide drink from fecal-contaminated water sources. The WHO fact-sheet further shows that microbiologically contaminated drinking water causes 485,000 deaths each year, which are linked to transmitting diseases such as diarrhea (WHO 2022).

United Nations Sustainable Development Goal 6 addresses the global water quality and sanitation challenges. It emphasizes eliminating dumping and minimizing the release of hazardous chemicals and materials, reducing pollution, halving the proportion of untreated wastewater, and substantially increasing recycling and safe reuse. Several integrated water quality monitoring programs and approaches have been developed over the past years to achieve this goal. The water quality modeling has been proven effective in predicting the future states of water bodies and understanding how the current water treatment systems will respond to future pollution loads and climatic drivers. They have been used for risk assessment, identifying and quantifying the sources of water quality constituents, and exploring potential outcomes of climate, hydrology, and management scenarios.

However, complex nonlinear environmental simulation models have many parameters. It is often not clear which parameter subset should be estimated from observed data. Hence, an identifiability analysis is an effective tool to obtain insights into reasonable parameter subsets that describe observed data adequately (Soares *et al.* 2020). A river often receives wastewater from treatment facilities at different treatment levels. The sensitivity and identifiability analyses can reveal which parameters or variables are most influential on the river water quality. This could provide practical guides to determine the priority of managing effluents from treatment facilities and characterize biological kinetic parameters that dominate the conversion processes of pollutants in a river. Hence, the main objective of this study was to investigate the most influential water quality variables in wastewater discharged from on-site treatment facilities and kinetic parameters included in a river water quality model. Our main goal was to generate baseline data that may provide a clue in identifying critical elements in the water quality models for improving river basin management.

## METHODS

### Test case

^{3}s

^{−1}on average), mainly relying on discharged wastewater for maintaining water volume, as recognized by the local municipality. This study considered three types of on-site wastewater treatment facilities used in households of the study area, namely Gappei-Jyokasou, Tandoku-Jyokasou, and Kumitori (hereafter abbreviated as GJ, TJ, and KU, respectively). The first two are equivalent to septic tank systems: the former treats both gray and black water, while the latter does only black water, and gray water is discharged without treatment. The third one is the system in which black water stored in the underground is vacuumed; thus, only gray water is discharged without treatment. According to the statistics published in 2021 (Ministry of Forestry and Fisheries, the Japanese Government), Jyokaso serves the estimated 11.75 million people in Japan, which is equivalent to ca. 10% of the total population. Since GJ is the most advanced treatment facility among the three, it is recommended to replace TJ and KU with GJ. The prevalence of GJ was represented by the ratios of the usage of GJ to that of the three on-site treatment systems (TJ and KU in addition to GJ) in areas of the CSN connecting to Chubu and of the direct discharge (denoted as and , respectively); and were 0.21 and 0.43, respectively as of 2019 (the detailed calculation procedure is presented in the Supplementary Material). Those values were assumed to be constant during the whole period. The prevalence ratio of GJ is incremented by ca. 0.7% annually in Honjo City (Saitama Prefecture 2023). The prevalence ratio of the whole area of Honjo City in 2005 was estimated at 46.9%, while that in 2019 was reported at 56.1%. The study area was clearly behind in the prevalence of GJ, and the increment rates could be smaller for the study area. If the increment rate was assumed to be 0.3%, the total change in 14 years would be around 4%. Hence, the assumption that the prevalence ratios of GJ were constant could be reasonable. Three types of on-site treatment facilities are summarized in Table S1 in the Supplementary Material. The area covered by the simulation was limited to the segment between the two confluence points of the river and CSNs, which are called Chubu and Kujyo (1,490 m long), as can be seen in Figure 1. Although the study area was somewhat limited, this setting was necessary to investigate the detailed effect of on-site treatment facilities on the river water quality with comprehensive data regarding domestic wastewater discharge and hydraulic conditions in this area. The river receives partially treated wastewater from neighboring houses as well as rainwater at Chubu and Kujyo, where CSN outlets exist. In between Chubu and Kujyo, partially treated wastewater is directly discharged from inhabitants into the river without a pass via a CSN. For those pollution sources, constant concentrations were imposed throughout the year. Water flow from CSNs as well as lateral inflow of wastewater in the segment between Chubu and Kujyo were estimated based on the data kindly provided by the Honjo City Office. Monthly samplings have been conducted slightly upstream of the Chubu and Kujyo confluence points. Water quality monitoring activities including data handling done by the local municipality have been governed by the ministerial notification on water quality surveys issued on 30 September 1971; hence, the way of sampling, laboratory analyses, and data quality control is assumed to be credible. Water quality data monitored at upstream of Chubu were treated as a boundary condition; thus, evaluation of the modeling was done for monitored data in Kujyo only, which is considered reasonable, because Kujyo is the outlet location for the region covered in this study. Monitoring data included hydrological (flow velocity and rate, and water temperature) and physico-chemical data [dissolved oxygen (DO), ammonium ions (), nitrate and nitrite (), suspended solids (SS), and biochemical oxygen demand (BOD)]. The currently available dataset (data from April 2005 to March 2019) was split into three: time period 1 from 2005 to 2010; time period 2 from 2011 to 2014; time period 3 from 2015 to 2019. They were considered as either an in-program period or a post-program period, and the post-program period was split into two to make the number of years covered in each period balanced. Differences in mean concentrations among the three periods were statistically evaluated and will be shown in a later section.

### Model description

^{−1}) was calculated by the following equation, which is suitable for Japanese rivers (The Japan Society of Civil Engineers 2004):where

*n*is a Manning roughness coefficient (m

^{−1/3}s),

*u*is a cross-sectional averaged flow velocity (m s

^{−1}), and

*H*is a mean depth (m). We used in-series continuously stirred tank reactors (CSTR) model to simplify the complex system of a real river. It was assumed that a river stream is segmented into conceptual reservoirs, in which complete mixing is achieved. The mass balance equation for a segment

*i*can be described as:where is the volume of the segment

*i*(m

^{3});

*Q*is a flow rate of water entering or leaving the segment

*i*(m

^{−3}day

^{−1}); and are concentrations of the segments

*i*− 1 and

*i*, respectively; is a volumetric reaction rate (day

^{−1}). This simplification is widely used in the field of water and environmental disciplines and facilitates the reduction of computational costs, which is important because this study required numerous times of simulations. Although physical and hydrological conditions influence the fate of biochemical processes, these conditions cannot be identified through the variation of biological parameters. To optimize the present model, a monitored flow rate at Chubu () was used to expect more accurate flow velocity estimations. Flow rates in the reach between Chubu and Kujyo (denoted as ) were estimated using a discharge due to precipitation () and an amount of wastewater discharged from on-site wastewater treatment facilities (). Estimation of varied, depending on the type of on-site wastewater treatment facility, as described in the Supplementary Material.

^{3}day

^{−1}) was assumed to be constant through a month (i.e., a value was changed 12 times for the 1-year simulation), and estimated from the following equation:where is a factor of rainwater discharge into the river (–) that was set at 0.74, which is a typical value for an urban area in Japan (The Japan Society of Civil Engineers 2004);

*A*is a covered land area of the combined sewer system network (m

^{2}); is precipitation data (mm month

^{−1}), measured at a meteorological station of the Japan Meteorological Agency near the study area. Monthly precipitation data, along with those of light intensity (W m

^{−2}), were retrieved from their website.

*u*(m s

^{−1}) was calculated with the empirical expression as shown in the following equation.where

*n*is a Manning roughness coefficient,

*R*is a hydraulic radius (m), and

*I*is a slope (–).

*u*can be also calculated using the following equation:where

*B*is a mean river width (m). Equating the right-hand sides of Equations (5) and (6) allows us to calculate a depth

*H*using an iterative method. Although values for

*B*,

*n*, and

*I*are fixed during simulation,

*R*is calculated by:

*n*, a width of the river

*B*, and a slope

*I*, using the weighted least-square estimation to minimize the error between predicted results and data observed at Kujyo. The fitting results as well as error metrics for the three different time periods are presented in the Supplementary Material. The longitudinal dispersion coefficient () and division number () are estimated by the following equations (The Japan Society of Civil Engineers 1999):where

*g*is a gravitational acceleration (m s

^{−2}); is a Peclet number (–) and calculated by ;

*L*is a length of a river reach between confluence points (m). At Chubu, values of a concentration were updated along with a flow velocity and the number of segments downstream. The constant lateral inflow (denoted as ) between Chubu and Kujyo was introduced as a constant model input for a segment; thus, the mass balance equation for this region was modified from Equation (2) as the following:where is a flow rate of water entering the segment

*i*; and are a flow rate and a concentration of partially treated wastewater discharged from households, respectively; is a factor of attenuation multiplied with the input loading (). was used for the update of concentrations at Chubu, while was for calculation of in Equation (10). Fixed values were used for

*C*

_{dis}, based on data of effluent concentrations for three types of on-site treatment facilities measured in this region, which were published in a technical report (Ministry of the Environment 2018). Based on the assumption of the prevalence of GJ, it was assumed that no considerable change in terms of effluent water quality from on-site treatment facilities happened during the studied time period, since the same on-site treatment facilities tended to be kept in use in this area. The conceptual diagram of the CSTR model is illustrated in Figure 2. It was assumed that four components, namely, , , , and , included in the effluent from on-site treatment facilities, evenly existed, but and were not included. Since the direct discharge entails great uncertainty, the attenuation factor, was first adjusted. The value was varied in the range of 0.01–1, and observed the sum of squares of weighted deviations between measurements and model outputs (denoted as ). A specific value for was then chosen when it was minimized and was used in the subsequent analyses. Regarding the comparability of monitoring data with RWQM1 components, DO, , , and were comparable without any conversions to the RWQM1 components; however, BOD and SS were assumed to equal to the sum of multiple RWQM1 components as follows:where represents dissolved organic substances; represents organic particulates, assumed to be available for bio-degradation after hydrolysis; represents a particulate component. A period of a month was modeled, with varying inputs such as average water temperature, irradiation, and precipitation, which were assumed to be constant during a month. This process was repeated 12 times (i.e., a period of a year). The final values of each month were considered as initial values for the subsequent month, except for the boundary segment. The fourth-order Runge–Kutta method was used to solve the differential equation for the CSTR model presented earlier and modeled a period of 1 year. Model evaluation metrics generated in this study included a correlation coefficient and root mean squared error (RMSE), which is defined as the following:where and are an observed value and simulation result, respectively (monthly data points in a year). The model and associated sensitivity analyses (described in the next section) were implemented using Fortran. The analysis of parameter identifiability was carried out using Numpy in Python on the output produced from the Fortran program. Programs were developed based on the code implemented in AQUASIM (Reichert 1994). All code and datasets accompanying this article are available on GitHub at https://github.com/gakkykun/IWA-RWQM-package.

### Sensitivity measures

*et al.*2001):whereis a derivative matrix evaluated at with a shape of ( is the number of observations;

*m*is the number of parameters), is the absolute sensitivity function, representing an absolute change of variable due to the change of parameter value ( refers to a parameter) at observation

*i*. The choice of a local sensitivity analysis is due to a simple implementation of the numerical finite difference method. In addition, this linear approximation is the common practice in the field of model-based design of experiments (Maheshwari

*et al.*2013). Other approaches such as bootstrap and Monte Carlo simulations can be used to characterize the uncertainty in estimated parameters; however, these approaches require a considerable number of iterations of a model run, resulting in high computational costs. Brun

*et al.*(2001) proposed the sensitivity measure, , which is calculated for every parameter, separately. We used results derived from to determine the sensitivity rankings of the parameters. When is high, the parameter is said to influence the simulation result such that individual parameter importance is able to be assessed. is defined as:where is a non-dimensional local sensitivity of the model result:The dimensional scaling parameters, , represents the uncertainty range of the parameter based on prior knowledge. Brun

*et al.*(2001) proposed three classes of relative uncertainty, which are (i) accurately known parameters (defined as Class 1), (ii) moderately inaccurate known parameters (defined as Class 2), and (iii) very poorly known parameters (defined as Class 3). Relative uncertainty percentages are set at 5, 20, and 50%, respectively. Every examined parameter is assigned to one of the three classes. Model parameters can be typically grouped into four types: physical parameters, stoichiometric parameters, kinetic parameters, and parameters related to boundary conditions (Omlin

*et al.*2001). Among them, biochemical kinetic parameters contained in rate expressions generally have more uncertainty than other parameters (Anh

*et al.*2006). Thus, kinetic parameters including half-saturation concentrations, specific death, and respiration rates were classified into Class 3 with the exception of growth rates, which were classified into Class 2 (Anh

*et al.*2006). and were treated as parameters related to boundary conditions and were classified into Class 2. Initial parameter values were set to the ones proposed by the RWQM1 model developers. A list of parameter subject to the analysis is shown in Table S10 in the Supplementary Material. Note that and are not involved in the transformation process of the model; hence, they were excluded from the sensitivity analysis. Also, uncertainty in stoichiometric coefficients, temperature-dependence coefficients, and mass fractions of C, H, O, N, and P for organic matter were not considered in this study because they are usually transferable from one system to another; thus, fixed values the same as Reichert

*et al.*(2001) suggested were used. The scale factor, , represents the typical magnitude of a measured variable and can be represented by the mean values of overall experimental observations (Brun

*et al.*2001); were mean concentrations of the five water quality variables in the different months as shown in Table S11 in the Supplementary Material. Analyses were separately conducted covering RWQM1 model parameters or newly introduced parameters (i.e., and ) and water quality values of wastewater from on-site treatment facilities (i.e., inputs or boundary conditions of the model). In other words, we conducted the sensitivity and identifiability analyses for the three different time periods twice: the one was for the evaluation of kinetic parameters; the other for water quality variables of effluents from on-site treatment facilities and prevalence ratios of GJ. The model was first run with the initial parameter values and then by successive runs in which every parameter value was increased by 1% of initial values, while other parameters' value was kept at the initial values, thereby quantifying the effect of variation of the parameter on the model output.

### Collinearity index

*et al.*(2001) applied linear regression diagnostic techniques (Belsley 1991) to the problem of parameter subset selection for parameter estimation in large environmental simulation models, including the analysis of parameter inter-dependencies. The following normalized non-dimensional sensitivity function was defined to select identifiable model parameter subsets:

This index represents the degree of approximate linear dependence of the sensitivity functions of the parameters, which corresponds to the columns of the matrix. is equal to unity if the sensitivity functions are orthogonal, while it goes to infinity if these functions are exactly linearly dependent. A high collinearity index indicates that changes in model results induced by small changes in a parameter can be compensated by changes in other parameters in *K*. The value of is preferable in a study where available data are limited and identifiability problems start to arise when is between 10 and 15 (Omlin *et al.* 2001)

### FIM-based criterion

*et al.*(2001), this FIM approach is less affected by a subjective definition of a threshold that is used for determining the maximum size of identifiable parameter subsets. A trajectory sensitivity function defined in Equation (16) is the incremental ratio of the variation of output variable

*i*to that of the perturbed parameter

*j*. Derivatives in this sensitivity function were calculated by finite differences with a perturbation factor of . This function forms the FIM together with a diagonal matrix consisting of weighing factors that are inversely proportional to measurement errors (Freni

*et al.*2011):where, for

*n*output variables and

*m*parameters, is a matrix; is an inverse covariance matrix under the assumption that measurement errors were uncorrelated; is a transposed matrix of ; thus, is an matrix. The FIM inherently contains the importance of each model parameter over outputs and can be used for the selection of identifiable parameter subsets. Criteria based on the determinant and condition number (the ratio of the highest FIM eigenvalue to the lowest one) of the FIM, denoted as

*D*and

*modE*criterion, respectively, have been frequently used for kinetic models in previous studies (Reichert & Vanrolleghem 2001; Freni & Mannina 2012; Machado

*et al.*2014; Li

*et al.*2018). The

*D*and

*modE*criteria are measures of the size and shape of the modeling confidence region in the parameter space (Freni

*et al.*2009) and can be a reasonable indicator measuring the correlation of a set of model parameters (Weijers & Vanrolleghem 1997). A large value of

*D*criterion indicates smaller confidence intervals of the parameters and represents the importance of model parameters, while a

*modE*criterion value close to the unity means that all the involved parameters are equally important, independently affecting the model outputs (Freni

*et al.*2011). The

*D*criterion was first calculated by taking the 2

*m*th root of the determinant of FIM (), where

*m*is the number of parameters in the subset, as a more universal measure suggested by Reichert & Vanrolleghem (2001), and was then normalized (denoted as the

*normD*criterion), introducing the square Euclidean norm of the parameter vector () evaluated at the middle point of the parameter uncertainty range (see Equation (23)), since this criterion is affected by the magnitude of parameters included in a subset (Freni

*et al.*2011).

*normD*and

*modE*criteria, defined as the ratio between the two (denoted as the

*DE*criterion), provides the advantages of both criteria such that evaluation of the aggregated impact of parameters as well as the individual impact of each parameter is possible (Freni

*et al.*2011). The logarithmic form of this combination can be represented as:

### Procedures for the parameter identifiability analysis

### Statistical analysis

Two-way analysis of variance (ANOVA) was used to assess variability in mean concentrations of water quality variables observed at Kujyo. The time period (1–3) and month (January to December) were used as the independent categorical variables. All statistical analyses were performed using Python with the statsmodels library version 0.14.0 (Seabold & Perktold 2010). All tests were considered statistically significant at *p* 0.05.

## RESULTS

### Comparisons of observed and simulation results

*p*0.05) between different time periods, although the results from the two-way ANOVA for SS confirm that there was no statistically significant difference observed between different time periods. This provided a rationale for the split of the whole time period into three. The other main effect of the month was significant (

*p*0.05) for all of the water quality variables. An interactive effect between month and time period was found to be not significant, except for DO (however, the

*p*-value for DO was very close to the threshold (i.e., 0.05). The ANOVA table is presented in the Supplementary Material. Figure S2 in the Supplementary Material shows variations of values in response to that was changed in the range of 0.01–1. When was 0.1, the value was minimized. Subsequent simulations were conducted with a fixed value of at 0.1. Simulation results compared with observed data are presented in Figure 4. The model simulated well trends of BOD and for all three periods, and for time periods 2 and 3 (0.82

*r*0.99, where

*r*is a Pearson correlation coefficient). Moderate correlations were found in SS for time periods 2 and 3 (

*r*= 0.54 and 0.79, respectively), DO for time periods 2 and 3 (

*r*= 0.49 and 0.63, respectively), and for time period 1 (

*r*= 0.68), while SS for time period 1 and DO for time period 2 did not show strong correlations (

*r*= 0.37 and 0.31, respectively). RMSE was used as a model validation metric. This metric was calculated for every combination of time period and water quality variable (in total, 15 cases), as shown in Figure 4. In 11 cases, RMSEs were smaller than 2.0 mg L

^{−1}. Hence, the current simulation provided reasonably accurate results. Modeling of SS exhibited relatively large deviations, since this water quality variable was calculated with the sum of concentrations of five different particulate components. Simulation results for BOD for all three periods were satisfactory in terms of correlation coefficients and RMSE. concentrations remained at high levels that reached close to 10 mg-N L

^{−1}in the winter season (November to January). concentrations were higher for the first half of a year, as can be seen during the period from January to May, which spans winter to spring and early summer. In other months, stable low concentrations of were observed and well predicted by the model; however, concentrations increased in December. There were marked increases in SS concentrations observed from June to September. The discrepancy between the model and observed result reached up to 16 mg L

^{−1}in June of time period 3, while, for time period 1, the model overestimated SS concentration in February and March. The simulation results of DO were maintained within the small range (3–4 mg L

^{−1}), and tended to be not as dynamic as the observed data. Although the model underestimates the DO concentrations during the cold months (i.e., January to March), the model was able to reproduce the DO concentrations well in other months.

### Screening for important parameters

### Identifiability of parameter subsets

*B*) water quality and prevalence ratios, respectively (hereafter termed as ‘Analysis A’ and ‘Analysis B’, respectively). There were huge leaps of in Analysis A (13.394–100.209 in time period 1; 12.4585–95.121 in time period 2; 15.374–99.143 in time period 3) as the set size increased from 5 to 6, exceeding 15.0, which is considered to be the threshold as identifiable parameter subsets (Brun

*et al.*2001). Similar leaps were observed in Analysis B (2.353–592.0 in time period 1; 2.159–587.9 in time period 2; 2.068–220.9 in time period 3) as the set size increased from 3 to 4. Hence, the maximum identifiable subset sizes were determined to 5 and 3 in the current study. Parameters not included in an identifiable subset were considered to be irrelevant to water quality variables, or their interactions in the mathematical model can be compensated by other model parameters according to available measured data (Brun

*et al.*2001). Interestingly, regardless of the starting parameter, others included in the maximum identifiable subsets were common: , , , and for Analysis A; and for Analysis B. is a parameter associated with the degradation of particulate organic compounds as well as generation of soluble ones. and control the growth of algae; and do the growth of heterotrophs. and are easily degradable soluble organic components included in untreated gray water discharged from households with TJ and KU. The chronological change of datasets over the three time periods did not affect the selection of model parameters for an identifiable subset. Based on sensitivity and identifiability analysis results, three important parameters were identified: one was the prevalence ratio of GJ in the direct discharge area; the other two were water quality components associated with organic matter. A remediation analysis was conducted, with the BOD data in April at time period 3, which was the highest concentration observed throughout a year. Values of the prevalence ratio varied to make it 53 and 63% (corresponding to 10 and 20% increments from the default case), while concentrations of the two components were decreased by 10, 40, and 70%, and were increased by 10, 50, and 100%. Those simulation output results are shown in Figure 6.

. | Time period 1 (2005–2010) . | Time period 2 (2011–2014) . | Time period 3 (2015–2019) . | |||
---|---|---|---|---|---|---|

Set size . | A
. | B
. | A
. | B
. | A
. | B
. |

1^{a} | ||||||

2 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |

() | () | () | () | () | () | |

3 | 2.919 | 2.353 | 3.093 | 2.159 | 2.127 | 2.068 |

() | () | () | () | () | () | |

4 | 2.924 | 592.0 | 3.093 | 587.0 | 2.137 | 220.9 |

() | () | () | () | () | () | |

5 | 13.394 | – | 12.585 | – | 15.374 | – |

() | () | () | ||||

6 | 100.209 | – | 95.121 | – | 99.143 | – |

() | () | () |

. | Time period 1 (2005–2010) . | Time period 2 (2011–2014) . | Time period 3 (2015–2019) . | |||
---|---|---|---|---|---|---|

Set size . | A
. | B
. | A
. | B
. | A
. | B
. |

1^{a} | ||||||

2 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |

() | () | () | () | () | () | |

3 | 2.919 | 2.353 | 3.093 | 2.159 | 2.127 | 2.068 |

() | () | () | () | () | () | |

4 | 2.924 | 592.0 | 3.093 | 587.0 | 2.137 | 220.9 |

() | () | () | () | () | () | |

5 | 13.394 | – | 12.585 | – | 15.374 | – |

() | () | () | ||||

6 | 100.209 | – | 95.121 | – | 99.143 | – |

() | () | () |

^{a}The initial subset consists of the most influential parameter evaluated with . A parameter was added as the set size increases and is presented inside parentheses.

## DISCUSSION

Sensitivity and identifiability analyses in combination with RWQM1 were used as tools to investigate the effect of the discharge of wastewater on the river water quality. RWQM1 was originally developed, partially aiming at evaluating the impact of wastewater treatment plant operation and control (Reichert *et al.* 2001), which means that smooth integration of effluents from wastewater treatment facilities with river water quality modeling is possible in terms of interchangeability of model components. Sensitivity and identifiablity analyses were proposed in the context of river water quality modeling two decades ago (Reichert & Vanrolleghem 2001); however, they are still usable in various fields such as agriculture (Coudron *et al.* 2021) and biochemistry (Gábor *et al.* 2017). The usefulness of these analytical techniques is due to the facilitated understanding of the model's intrinsic behavior, principally evaluated based on the magnitude of output variations in response to parameter changes (Soares *et al.* 2020). In this study, sensitivity measures () capture all of the sensitives as shown in Equation (19), regardless of the sensitivity difference among the target water quality variables (e.g., BOD and DO), thereby being affected by a water quality variable most relevant to each parameter or input. It was possible to identify governing model kinetic parameters, water quality components and newly introduced parameters in the currently applied model. Important insights gained through the present analyses is that the model behavior could be explained largely based on hydrolysis, growth of aerobic heterotrophs, and first-stage nitrifiers, since kinetic parameters relevant to those processes were high-sensitive. Hydrolysis leads to a strong increase in degradable organic matter (DOM), while a higher heterotrophic growth facilitates DOM consumption. Those microbial activities could have affected dynamic variations of BOD. Nitrification processes did not seem to work very much to decrease these concentration levels. The high concentrations of observed in this river might be partly due to the influx of nitrate-contaminated groundwater (Inagaki *et al.* 2020). plays a role in the consumption of and hence influences the decrease in which was observed in a period from May to August, particularly for time period 1. concentrations increased in December and kept high in winter seasons. This could be due to a lower activity of stage 1 nitrifiers as the water temperature decreased. Parameters associated with on-site wastewater treatment facilities affect the accumulation of pollutants in the river. Easily soluble organic substances included in untreated gray water (i.e., and ) were found to be the most important among water quality components. This result was consistent with estimated BOD load quantities from point and nonpoint sources covered in this study. For example, in the direct discharge area, the BOD loads of TJ and KU (2.02 10^{4} and 9.84 10^{3} g day^{−1}, respectively) were one to two orders of magnitude higher than that of GJ (9.52 10^{2} g day^{−1}). Lowering effluent concentrations from those on-site treatment facilities was found to be important from the standpoint of pollution control. However, in fact, TJ and KU do not treat gray water, which is released into the stream without any treatment. Achieving a decrease in effluent concentrations is practically impossible, and a viable option is to increase the prevalence ratio of GJ. As can be seen in Figure 6, the 20% increment of the prevalence ratio is almost equivalent to the 70% decrease in the effluent concentration from TJ. This finding is aligned with a coherent policy encouraging the renewal of TJ or KU to GJ, and showcases the practical implementation for water quality governance and environmental policy, based on the findings obtained via those analyses. It is evident that and strongly contributed to the variability in the model outputs. A variation of sensitivities of model parameters was related to the chronological change of the three datasets; however, the top parameters included in the sensitivity rankings were somewhat similar for each time period. The differences in terms of the model structure were only attributed to changes in model inputs (e.g., inflow at Chubu, meteorological forcing data) that varied depending on time periods. However, actual increases in values were observed, which shows that the river condition had become more sensitive to the discharge, as the water quality had been improved. The statistical analysis also suggested that the mean concentrations of water quality variables (except for SS) differed significantly between the three time periods. The efforts to mitigate pollution loading should be continued by encouraging renewal of TJ or KU to GJ as well as maintaining facilities properly.

A subset of parameters is identifiable if the model output is sensitive enough to small variations of all parameters in the subset. The collinearity index and FIM-derived criterion were used to define an identifiable parameter subset, or the maximum number of subset dimensions. A combination of different indices leads to developing better judgments, instead of relying on a single index (Freni *et al.* 2009). The identifiable subset included parameters that appear in Figure 5 (except for ), which indicates that high sensitivities of these parameters and inputs stemmed from different causes (i.e., different impacts of parameters and inputs on different water quality variables), as they were linearly independent. With regard to water quality components, soluble organic substances included in untreated gray water had a great impact on the simulation. Reduction of untreated gray water would be a priority measure to effectively reduce the impact of discharge from those domestic wastewater facilities on this river. Identifiable parameter subsets did not change over the three time periods. The reason why this occurred is because sampling points and frequencies in addition to monitored variables were not changed (Freni & Mannina 2012).

Although our analyses were tailored to a specific condition of the test case, the procedures taken in this study produced relevant information on a situation where a river receives wastewater from treatment facilities at different treatment levels, and also are likely to be replicated to identify dominant water quality variables in other similar systems: The same technique is applicable to other sites where domestic wastewater is discharged, which could be used to assess the effectiveness of implemented regulatory measures. The expansion of the model should be also considered, covering other important water quality variables such as heavy metals and pathogens. The approach utilized in this study still has some limitations. The best available values were used for water quality of effluents from on-site treatment facilities as well as prevalence ratios of GJ. In reality, they were not necessarily constant. However, the variations of those model input values were considered very small, which backed the assumption that they were constant. Also, the current method for a sensitivity analysis is categorized as a local method, which relies on a linear approximation of the model around the expectation of a parameter value. A global sensitivity analysis method explores a wider input space of parameter values (Saltelli *et al.* 2010) and could be worth implementing to reveal more in-depth interrelationships among parameters. The current study paid attention to newly introduced model parameters. As mentioned earlier, their values were determined based on collected data for this specific region and expected to reflect a real situation, so that the local method was still considered as a good option.

## CONCLUSION

The sensitivity and identifiability analyses can effectively reveal which parameters or variables are most influential on model outputs when a large and complex model is involved. This study covered an urban river, which has been subject to long-term monitoring campaigns since the environmental program was implemented. This river receives partially treated wastewater from on-site wastewater treatment facilities such that it could be a good test case to evaluate the impact of wastewater discharge on river water quality. The three different time periods were considered as either an in-program or a post-program period. The model behavior could be explained largely based on hydrolysis, growth of aerobic heterotrophs, and first-stage nitrifiers, since kinetic parameters relevant to those processes were high-sensitive. Easily soluble organic substances included in untreated gray water were found to be the most important among water quality components. Also, prevalence ratios of the most advanced on-site treatment facility type strongly contributed to the variability in the model outputs. These analytical results were in turn used for a remediation analysis: the 20% increment of the prevalence ratio was almost equivalent in terms of a BOD concentration decrease in the stream to the 70% decrease in the effluent concentration from an on-site treatment facility with the highest pollutant load. This finding indicated that a renewal to the most advanced on-site treatment facility is effective. This study provided quantitative evidence that can assist in the implementation of policy for the management of effluents from on-site treatment facilities.

## ACKNOWLEDGEMENTS

The authors would like to thank students of Dr Sakakibara's research group for their assistance in the early stage of model development, especially Mr Jun Yasuoka, Mr Takahede Hasegawa, and Mr Atsushi Miyata. The observed data in the field were kindly provided by the Honjo City Office. The views expressed in this article are those of the authors and do not necessarily reflect the views or policies of the Japanese government and local municipalities. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.