ABSTRACT
This study investigates pollutant source identification using different transport models. The data series, collected by the United States Geological Survey at eight monitoring stations over a 41.45 km reach of Antietam Creek, is used. The advection–dispersion equation (ADE), transient storage model (TSM), and fractional advection–dispersion equation (FADE) models were compared to identify the most suitable model for accurate source identification and pollutant transport prediction. The statistical analysis indicates that the TSM performs best, with the lowest RMSE (0.52 ppm) and the highest NSE (0.95), because it accounts for temporary retention in the storage zone. A new relationship for injection distance is presented using dimensional analysis, with R² = 0.94. The dispersion coefficient is also found to scale with distance as D ∝ x^0.7. Sensitivity analysis indicates that the model is most sensitive to the dispersion–advection ratio, with an elasticity of −0.18. Monte Carlo simulation shows that 95% confidence intervals range from ±3.5% for near-field to ±7.2% for far-field predictions.
HIGHLIGHTS
The TSM outperforms the ADE and FADE models in modeling pollutant transport.
A novel dimensionless relationship for pollutant injection distance is developed, incorporating key parameters of dispersion and storage zone dynamics.
Sensitivity analysis and Monte Carlo simulation quantify model robustness across flow regimes and provide confidence intervals for near-field (±3.5%) and far-field (±7.2%) predictions.
INTRODUCTION
Research into the modeling and identification of pollutant entry points in rivers addresses key challenges that are fundamental to environmental science and water resource management today. The identification of locations of pollution sources with a reasonable degree of accuracy has become essential for strategies concerning mitigation measures and policymaking as anthropogenic activities continue to affect aquatic ecosystems. Indeed, earlier works had laid the foundation for locating sources of river pollution through the use of chemical fingerprinting methods that traced contaminants to their point of origin (Atmadja & Bagtzoglou 2001; Chabokpour 2020, 2024; Saadat et al. 2024).
Identifying the sources that contaminate rivers is important for environmental protection and public health. Source identification of injected pollutants relies on methods for tracing and quantifying contaminants that are based on specific analytical techniques and computational models. It is an interdisciplinary area of research that combines environmental science, fluid dynamics, and data analysis to identify and characterize pollutants introduced by a source into different media, such as water bodies, soil, or air. Seminal works laid the foundation of source identification with inverse modeling techniques in this field (Mrozik et al. 2021).
Direct-injection liquid chromatography-tandem mass spectrometry (LC-MS/MS) methods have already been efficient for the quantitative determination of contaminants of emerging concern (CECs) in river water, with high sensitivity and speed of analysis (Albergamo et al. 2018; Köppe et al. 2020; Egli et al. 2021).
High-resolution mass spectrometry coupled with solid-phase extraction is already applied to detect a wide range of polar organic pollutants and has been used to trace known and unknown contaminants along river systems (Ruff et al. 2015; Köppe et al. 2020). Hybrid optimization models that couple genetic algorithms with simulated annealing have been developed to reconstruct contaminant release histories and identify the locations, times, and quantities of injections (Jiang et al. 2021). The backward probability method has been modified for surface water applications to allow accurate tracing of the release time and location of a pollutant source, specifically for finite-duration discharges (Khoshgou & Neyshabouri 2021). Random forest models have been used to interpret changes in concentration measured by a sensor network and to systematically locate contamination sources in a river system (Lee et al. 2018).
Machine learning-assisted suspect screening combined with passive samplers has been used for rapid, time-integrated monitoring of CECs, enhancing the identification of contaminants in river catchments (Richardson et al. 2021). High-throughput methods have been applied to differentiate sources of CECs at pre- and post-wastewater treatment plant discharge points, together with an assessment of environmental risk at specific impacted sites (Egli et al. 2021). Effect-directed analysis using the recently developed reduced human transcriptome (RHT) has been used to detect important organic contaminants and their impacts in source and tap waters, supporting risk management (Guo et al. 2022).
Technological advances in monitoring have dramatically increased the ability to locate entry points of pollutants into riverine systems. Begum et al. (2023) demonstrated the effectiveness of high-resolution spatial sampling combined with statistical analysis for pinpointing pollution sources along river stretches. Building on this, a machine learning approach that integrates multiple data sources, including remote sensing and in-situ measurements, was developed to improve the accuracy of identifying pollution entry locations in complex river networks (Gupta & Gupta 2021). Modeling pollutant transport and fate from these entry points is challenging because fluvial systems are very dynamic. To address this, Ciotti et al. (2021) developed a hybrid model that links hydrodynamic simulation with water quality modules and predicts pollutant dispersion from identified entry points under a variety of flow conditions.
One recent approach, by Jin et al. (2021), couples Bayesian inference with genetic algorithms to estimate the location and intensity of multiple pollution sources along a river system. This kind of approach can be effective for a particular problem: locating complex industrial and urban runoff entry points and assessing their overall impact on river water quality. As environmental regulations tighten, the need for timely and accurate source identification keeps growing. For instance, Nazir et al. (2023) examined the potential of real-time sensor networks combined with artificial intelligence for the immediate detection and localization of pollution events in rivers, which reflects a growing trend toward proactive monitoring and management of river systems. Geospatial technologies have also brought significant advances to the modeling and identification of pollution entry locations in rivers. Fusing high-resolution satellite imagery with spectral analysis has helped detect subtle changes in water color and turbidity, which may indicate pollutant inputs. This technique is particularly useful in large river systems where traditional sampling would be logistically difficult or too costly (Gupta & Gupta 2021; Rakib et al. 2022).
Interdisciplinary collaborations have further enriched the field, with input from hydrology, chemistry, and data science. One approach investigated in detail is isotope analysis combined with machine learning algorithms, which can help differentiate between human and natural sources of pollutants and thus allow mitigation efforts to be adequately targeted (Orlowski et al. 2018). This approach can also increase public awareness and environmental stewardship. Advances in predictive modeling have led to early warning systems for probable pollution events. Combining hydrological forecasts from ensemble modeling techniques with historical data on pollution events makes it possible to forecast and locate pollution risks in river systems affected by seasonal variation and extreme weather events. Such predictive tools are increasingly valuable to water resource managers and environmental agencies in proactively addressing pollution threats (Cassagnole et al. 2021). As river pollution problems grow more complex, innovative data management solutions are increasingly being used for effective environmental governance (Gad et al. 2022).
The source identification of river pollutants has been improved in the past few years by integrating computational models, machine learning, and high-resolution datasets. Bayesian inference combined with cellular automata (CA) modeling was applied to successfully identify pollution sources in rivers such as the Fen River, with relative errors of less than 19% for key parameters (Wang et al. 2024). Combined with multi-observation reconstruction, enhanced ensemble Kalman filter methods allow accurate identification of pollution sources and dispersion coefficients, keeping the relative errors within 4% (Jing et al. 2024). Furthermore, the HydroFATE model has been developed to predict contaminant fate in global riverine systems by using high-resolution information to identify priority exposure areas (Ehalt Macedo et al. 2024). These developments indicate the increasing accuracy and efficiency of contaminant source identification for river reaches.
Previous research has focused on various methods for pollutant source identification in rivers, including chemical fingerprinting, inverse modeling, and machine learning approaches. However, challenges remain in accurately modeling complex river systems with transient storage and non-Fickian transport. This study aims to develop a dimensionless approach for enhanced pollutant source identification and transport through the river reaches.
MATERIALS AND METHODS
The transport of pollutants in river systems is governed by physical processes including advection and dispersion, coupled with numerous retention mechanisms. Each of these processes acts at different spatial and temporal scales, and the resulting pattern of solute movement and transformation is complex. Mathematical models of such processes customarily begin with a statement of mass conservation coupled with constitutive relationships that define the flux terms. Channel heterogeneity, transient storage areas, and non-Fickian behavior complicate transport in natural river systems and produce irregular dispersion patterns. These complexities require increasingly sophisticated modeling approaches that account for the major transport mechanisms and the subtle influences of river morphology and flow dynamics. The theoretical framework has accordingly evolved from simple advection–dispersion models to more refined representations, including fractional derivatives and storage zone dynamics, which allow better prediction of the fate and transport of pollutants in natural environments.
Model descriptions
Advection–dispersion equation
Transient storage model
The transient storage model (TSM) extends the ADE by incorporating transient storage. It accounts for areas where water is stagnant or slow moving, such as hyporheic zones or dead-end pores, in which solutes can be stored for some time before being released again into the primary flow. The typical form of the TSM includes two coupled differential equations: one for the primary channel and another for the storage zone. This approach describes solute transport in rivers and streams more accurately, especially the long tails of breakthrough curves (Equations (2) and (3)) (Bencala 1983; Runkel 1998; Marion et al. 2008).
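For orientation, a commonly cited constant-coefficient form of the two coupled transient storage equations is sketched below, assuming uniform flow with no lateral inflow or decay. The notation is an assumption and may differ from the authors' Equations (2) and (3): C and C_s are the main-channel and storage-zone concentrations, u is the mean velocity, D is the longitudinal dispersion coefficient, A and A_s are the main-channel and storage-zone cross-sectional areas, and α is the exchange coefficient.

```latex
\begin{align}
\frac{\partial C}{\partial t} + u\,\frac{\partial C}{\partial x}
  &= D\,\frac{\partial^{2} C}{\partial x^{2}} + \alpha\,(C_s - C), \\
\frac{\partial C_s}{\partial t}
  &= \alpha\,\frac{A}{A_s}\,(C - C_s).
\end{align}
```

In this form, setting α = 0 removes the exchange term and reduces the first equation to the classical ADE.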
Fractional advection–dispersion equation
All of these models have advantages and limitations, and the choice of model depends on the characteristics of the system being studied, the available data, and the modeling objective. The selection and parameterization of models therefore need to be considered carefully in order to properly represent solute transport in complex natural systems.
Moment analysis
Case study operated data
Antietam Creek is an important tributary of the Potomac River in the Mid-Atlantic region of the United States. Along its 41 miles of length, it drains about 290 square miles in Maryland, Pennsylvania, and West Virginia. The varying topography of the watershed, characterized by rolling hills, agricultural lands, and forested areas, influences its hydrological dynamics. Antietam Creek has a dendritic drainage pattern, in which several smaller streams and tributaries converge to form the primary channel. Peak discharges are mainly driven by precipitation, usually during spring snowmelt and after large summer thunderstorms. Contributions from the underlying karst aquifer system help maintain baseflow during dry weather. As a result of its climatic and geomorphic setting, the creek exhibits large seasonal fluctuations in flow, with an annual hydrograph that ranges from about 100 cubic feet per second during low flow to more than 1,000 cubic feet per second during high-flow events. The overall water quality of Antietam Creek is good, but it is still affected by agricultural runoff, urban development, and legacy industrial activities in its watershed. The hydrological characteristics of the creek have been altered by human activities over time, including small dams, diversion of water for irrigation, and changes in land-use patterns. These modifications have implications for sediment transport, nutrient cycling, and aquatic habitat availability. Despite the anthropogenic influence, Antietam Creek remains an important regional water resource that supports a variety of ecosystems, provides recreation, and supplies water to local communities. The data collected by the United States Geological Survey (Nordin & Sabol 1974) on Antietam Creek consist of concentrations of a pollutant injected upstream, measured at downstream stations. Details of the experiments are given in Table 1.
Table 1. Field data and test properties of breakthrough curves (gathered by the USGS)
Test reach | Site number | Distance from point of injection, L (km) | River discharge, Q (m³/s) | Maximum concentration, Cmax (ppb) | Time to Cmax, tp (h) |
---|---|---|---|---|---|
Antietam Creek | 1 | 1.6 | 4.86 | 274.4 | 1.3 |
Antietam Creek | 2 | 5.95 | 5 | 150.2 | 5.5 |
Antietam Creek | 3 | 13.35 | 7.02 | 48.98 | 16.0 |
Antietam Creek | 4 | 18.4 | 7.42 | 24.48 | 23.6 |
Antietam Creek | 5 | 26.25 | 8.1 | 13.51 | 33.2 |
Antietam Creek | 6 | 30.55 | 9.72 | 8.51 | 38.0 |
Antietam Creek | 7 | 36.8 | 11.61 | 6.27 | 43.2 |
Antietam Creek | 8 | 41.45 | 12.15 | 5.06 | 46.8 |
The aim of the current research is to determine the best-fit model for source identification. For this purpose, the data used comprise pollutant concentration measurements from eight stations on the river at distances of 1.6 to 41.45 km downstream of the injection point, with measured flows ranging from 4.86 to 12.15 m³/s. The models commonly used to describe pollutant transport in rivers include ADE, FADE, and TSM.
RESULTS
Computed ADE parameters versus distance parameter, (a) peak concentration versus distance, (b) travel time versus distance, and (c) dispersion coefficient versus distance.
Both the ADE and the TSM were applied to the dataset using an inverse modeling approach to locate the source. The ADE model was run with a variable dispersion coefficient and flow rate for each river segment. The TSM was parameterized with additional storage zone parameters estimated from the tails of the breakthrough curves. The ADE model gave reasonable estimates for source identification, placing the estimated injection point within 2 km of the actual location, but it performed very poorly in simulating tail behavior downstream. The TSM reproduced both the peak concentrations and the extended tails of the pollutant profiles much better, and it estimated the source location to within 1 km. This improved fit suggests that transient storage processes could play an important role in pollutant transport in this river system. In Figure 1(c), the term ‘average distance’ refers to the arithmetic mean of successive measurement station distances along the river reach. Moment analyses of the concentration–time profiles at each measurement station were also conducted. The zeroth moment, M0, represents the total mass of pollutant, while the first, M1, and second, M2, moments describe the center of mass and the spread of the pollutant cloud, respectively. Moment analysis at the first station (1.6 km) gives M0 = 3,885 ppm h, M1 = 1.47 h, and M2 = 0.18 h². At the last station, these values changed to M0 = 3,750 ppm h, M1 = 46.9 h, and M2 = 12.5 h². The rise in M1 and M2 downstream quantifies the longitudinal spreading and mixing processes occurring along the river. The Peclet number was also examined for each river segment. It is estimated from Pe = uL/D, where u denotes the mean velocity, L is the length of the reach, and D is the dispersion coefficient.
For the first reach, from 1.6 to 5.95 km, Pe ≈ 21, which indicates that advection dominates over dispersion. In the last reach, from 36.8 to 41.45 km, Pe decreased to about 12, indicating an enhanced role of dispersion in pollutant transport in the downstream reaches. The storage zone exchange fluxes, qs, were also estimated for each river segment to examine the possible role of transient storage zones in pollutant transport. For the first segment, qs = αAs ≈ 2 m²/h, while for the last segment, qs was about 0.8 m²/h. This decrease of qs with distance downstream is consistent with the decrease in tailing in the concentration–time profiles at the downstream stations.
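The moment and Peclet calculations described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: it assumes that M1 and M2 are reported as the normalized mean arrival time and the central temporal variance, and it uses a synthetic Gaussian breakthrough curve and hypothetical reach values instead of the USGS measurements. The function names are introduced here for illustration only.

```python
import numpy as np

def _trapz(y, x):
    """Trapezoidal integral of y over x (written out to stay NumPy-version neutral)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def temporal_moments(t, c):
    """Return (M0, mean arrival time, temporal variance) of a breakthrough curve."""
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    m0 = _trapz(c, t)                                   # area under C(t)
    mean_t = _trapz(t * c, t) / m0                      # normalized first moment
    var_t = _trapz((t - mean_t) ** 2 * c, t) / m0       # central second moment
    return m0, mean_t, var_t

def peclet(u, length, d):
    """Peclet number Pe = u L / D (consistent units assumed)."""
    return u * length / d

# Synthetic Gaussian curve (assumption): peak at 5 h, standard deviation 0.5 h
t = np.linspace(0.0, 10.0, 2001)
c = 100.0 * np.exp(-0.5 * ((t - 5.0) / 0.5) ** 2)
m0, mean_t, var_t = temporal_moments(t, c)

# Hypothetical reach: u = 0.5 m/s, L = 1000 m, D = 25 m^2/s
pe = peclet(0.5, 1000.0, 25.0)
```

Applied to measured breakthrough curves, the same functions would give the downstream growth in mean arrival time and variance that the text reports.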
The local sensitivity analysis was performed by changing each model parameter by ±10% from its estimated value while the other parameters were held fixed. Sensitivity coefficients (Si) were calculated as Si = (ΔY/Y)/(ΔXi/Xi), where ΔY/Y is the relative change in model output (concentration profiles) and ΔXi/Xi is the relative change in parameter i. The analysis thus indicates how sensitive the model is to each examined parameter and identifies the parameters that matter most for pollutant transport predictions. The parameters studied included velocity (u), dispersion coefficient (D), storage zone exchange coefficient (α), and storage zone cross-sectional area (As).
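The one-at-a-time perturbation described above can be sketched as a short function. This is an illustrative sketch under assumptions: the model output is reduced to a single scalar, a central difference over the ±10% perturbations is used, and `toy_model` is a hypothetical power-law stand-in, not the transport model of the study.

```python
def elasticity(model, params, name, rel_step=0.10):
    """Sensitivity coefficient S_i = (dY/Y) / (dX_i/X_i) from +/- rel_step perturbations."""
    base = model(**params)
    up = dict(params)
    up[name] = params[name] * (1.0 + rel_step)
    down = dict(params)
    down[name] = params[name] * (1.0 - rel_step)
    # Relative output change across the two perturbed runs, other parameters fixed
    rel_dy = (model(**up) - model(**down)) / base
    return rel_dy / (2.0 * rel_step)

def toy_model(a, b):
    """Hypothetical power-law output used only to exercise the function."""
    return 3.0 * a ** -0.18 * b ** 0.24

params = {"a": 0.5, "b": 0.25}
s_a = elasticity(toy_model, params, "a")
s_b = elasticity(toy_model, params, "b")
```

For a pure power-law model the coefficient recovered this way is close to the exponent of the perturbed parameter, which is why elasticities and fitted exponents can be compared directly.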
It was found that the estimated source location is most sensitive to changes in the main channel velocity (∂x/∂u ≈ −1.2 km/m/s) and relatively sensitive to the dispersion coefficient (∂x/∂D ≈ −0.05 km/m2/s). The sensitivity to transient storage parameters (α and As) was relatively low (∂x/∂α ≈ −0.02 km/h and ∂x/∂As ≈ −0.01 km/m2), suggesting that while these parameters improve overall model fit, they have a limited impact on the estimated source location.
For a better understanding of the general transport behavior, the temporal moments of the residence time distribution (RTD) were determined for each river segment. The average residence time changed from 4.3 h for the first segment to 5.8 h for the last segment, while the variance of the RTD (σ²) grew from 2.1 to 4.7 h². Skewness of the RTD increased downstream from 1.2 to 1.9, which could indicate an increasingly asymmetric pollutant distribution, likely due to the cumulative effects of transient storage processes. A correlation analysis was carried out to explain the variation of pollutant transport characteristics with river discharge. A strong correlation (r = 0.89) was found between the river discharge and the pollutant travel time between stations. This relationship can be described by the power-law equation T = 3.2Q^−0.65, where T is the travel time in hours and Q is the discharge in m³/s; the negative exponent means that travel time decreases as discharge increases. This equation allows travel times to be estimated under different flow conditions, which is crucial for real-time pollutant tracking and management. The non-dimensional analysis also considered the dependency of the longitudinal dispersion coefficient on the hydraulic parameters of rivers. To do this, a dimensionless dispersion coefficient K = D/(H u*), where H is the mean depth and u* is the shear velocity, was computed for every river segment, giving values in the range of 10.5–15.2 with an average of 12.8. This value is within the typical range of 5–20 for natural rivers, but it is at the higher end, indicating considerable mixing within the channel. The next step was to examine possible scale dependency in the dispersion process by looking at the relationship between the dispersion coefficient and distance downstream. In this research, it is shown that D ∝ x^0.7, which represents a sublinear increase of dispersion with distance.
This relationship is very useful in working out how mixing processes develop along the river and can be used in extrapolating dispersion coefficients to unmonitored reaches.
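Power-law relationships such as those for travel time and dispersion are commonly estimated by linear regression on log-transformed data. The sketch below shows that technique as an assumption about how such exponents can be obtained; the text does not state the authors' fitting method. The distances are the station distances from Table 1, while `d_synth` is synthetic data generated from a known power law, not measured dispersion coefficients.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on log-transformed data; returns (a, b)."""
    log_x = np.log(np.asarray(x, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))
    b, log_a = np.polyfit(log_x, log_y, 1)   # slope is the exponent, intercept is ln(a)
    return float(np.exp(log_a)), float(b)

# Station distances (km) from Table 1
x = np.array([1.6, 5.95, 13.35, 18.4, 26.25, 30.55, 36.8, 41.45])

# Synthetic values following D = 4 * x**0.7 (assumption, for illustration only)
d_synth = 4.0 * x ** 0.7

a, b = fit_power_law(x, d_synth)
```

With noisy field data the fitted exponent carries uncertainty, so extrapolation to unmonitored reaches should be treated with corresponding caution.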
To assess the potential for hyporheic exchange (Liu et al. 2012), the hyporheic exchange flux was estimated: qH = (As/A) × (α/L), where L is the reach length. It was found that qH values range from 1.2 × 10⁻⁵ to 3.8 × 10⁻⁵ s⁻¹, with higher values in the upstream segments. This suggests a more active hyporheic zone in the upper reaches, which could have implications for pollutant retention and transformation processes.
The relationships between TSM parameters can be used to infer valuable information about the dynamics of pollutant transport in this river system. The mean velocity decreases with distance, from 0.34 m/s at 1.6 km to 0.25 m/s at 41.45 km. This 26.5% decrease in velocity over the reach likely reflects a gradual widening or deepening of the river channel, which is consistent with the increasing cross-sectional area. The longitudinal dispersion coefficient, D, also decreased with distance, from 61.92 to 29.52 m²/s, a 52.3% drop. The reduction in D was greater than the reduction in velocity, which indicates that factors other than simple scaling with flow characteristics were operating in the dispersion process.
TSM parameters variation versus distance, (a) velocity versus distance, (b) dispersion coefficient versus distance, (c) area versus distance, (d) hyporheic zone area versus distance, (e) exchange coefficient α versus distance, and (f) β parameter variation along river distance.
TSM parameters variation versus each other, (a) dispersion coefficient versus flow velocity, (b) flow area versus velocity, (c) flow area versus hyporheic zone area, and (d) exchange coefficient α versus β parameter.
A fractional advection–dispersion analysis (FADA) was also conducted to further characterize the river's transport properties for possible anomalous dispersion behavior. To that end, a fractional order of differentiation α was introduced in the classical ADE; using the method of moments, its value was estimated to vary from 1.85 in the upstream reaches to 1.72 in the downstream segments. This slight departure from the classical ADE (α = 2) can be considered evidence of mild non-Fickian transport processes, possibly induced by the heterogeneous structure of the river and dead-zone effects. The scaling properties of pollutant transport were further explored by analyzing the relationship between the variance of the travel time distribution (σ²) and the mean travel time (t). It was found that σ² ∝ t^β, where β ≈ 1.2. This super-linear scaling (β > 1) indicates that the spread of the pollutant plume increases faster than would be expected under pure Fickian dispersion, further supporting the use of more complex models like TSM or FADA for accurate predictions.
Table 2 presents the statistical criteria for model adequacy based on different methods.
Table 2. Statistical criteria for model adequacy
Model | RMSE (ppm) | NSE | AIC | R² | MAPE (%) |
---|---|---|---|---|---|
ADE | 0.86 | 0.89 | −245 | 0.91 | 12.5 |
TSM | 0.52 | 0.95 | −318 | 0.97 | 7.8 |
FADA | 0.61 | 0.93 | −292 | 0.95 | 9.2 |
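Three of the criteria in Table 2 have standard definitions that can be sketched directly. The block below uses those textbook definitions of RMSE, Nash–Sutcliffe efficiency (NSE), and MAPE; whether the authors used exactly these forms is an assumption, and AIC is omitted because it depends on the likelihood formulation, which the text does not specify. The observed and simulated values are hypothetical.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def mape(obs, sim):
    """Mean absolute percentage error (observations must be non-zero)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(100.0 * np.mean(np.abs((obs - sim) / obs)))

# Hypothetical concentrations for illustration only
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.1, 1.9, 3.2, 3.8]
scores = {"RMSE": rmse(obs, sim), "NSE": nse(obs, sim), "MAPE": mape(obs, sim)}
```

An NSE of 1 corresponds to a perfect fit, while values near 0 mean the model predicts no better than the mean of the observations.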
Table 3. Estimated versus actual pollutant injection distances
Station | Actual distance (km) | ADE estimate (km) | TSM estimate (km) | FADA estimate (km) |
---|---|---|---|---|
1 | 1.6 | 1.2 | 1.5 | 1.4 |
2 | 5.95 | 5.3 | 5.8 | 5.6 |
3 | 13.35 | 12.1 | 13.1 | 12.8 |
4 | 18.4 | 16.9 | 18.2 | 17.8 |
5 | 26.25 | 24.5 | 26.0 | 25.6 |
6 | 30.55 | 28.7 | 30.3 | 29.9 |
7 | 36.8 | 34.9 | 36.5 | 36.1 |
8 | 41.45 | 39.4 | 41.2 | 40.7 |
Model performance (RMSE) for different operated models through eight stations.
Comparing the calculated and actual pollutant injection distances at each station indicates the performance and limitations of each model along the river's length. The TSM performed best, with an average absolute error of only 0.25 km across all stations. This superior performance is particularly evident in the downstream reaches, where the ability of the TSM to account for temporary retention in storage zones is more important. The FADA model improves on the classical ADE, especially in the middle reaches of the river, suggesting that non-Fickian transport processes play a significant role in these reaches. The performance of the hybrid model is roughly comparable to that of the TSM, with an average absolute error of only 0.31 km. This suggests that although transient storage is the dominant process affecting pollutant transport, accounting for fractional dynamics may bring some additional benefit, mostly in the mid-reaches of the river. The ADE model underpredicts the injection distance everywhere, with errors increasing in the downstream direction. This systematic bias shows that simple advection–dispersion assumptions are not adequate to accurately predict pollutant transport in river systems, especially over longer distances. These results underline that model choice should depend on both the properties of the river system and the spatial scale of interest.
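The station-by-station comparison can be reproduced from the tabulated distance estimates with a few lines of code. The sketch below computes the mean absolute error and the mean signed bias for each model from the values in the table of estimated versus actual injection distances. Because the table entries are rounded, the results need not match the averages quoted in the text exactly; the function names are introduced here for illustration.

```python
import numpy as np

# Actual and estimated injection distances (km), as tabulated
actual = np.array([1.6, 5.95, 13.35, 18.4, 26.25, 30.55, 36.8, 41.45])
estimates = {
    "ADE": np.array([1.2, 5.3, 12.1, 16.9, 24.5, 28.7, 34.9, 39.4]),
    "TSM": np.array([1.5, 5.8, 13.1, 18.2, 26.0, 30.3, 36.5, 41.2]),
    "FADA": np.array([1.4, 5.6, 12.8, 17.8, 25.6, 29.9, 36.1, 40.7]),
}

def mean_absolute_error(obs, est):
    """Average magnitude of the distance error (km)."""
    return float(np.mean(np.abs(est - obs)))

def mean_bias(obs, est):
    """Average signed error (km); negative values indicate underprediction."""
    return float(np.mean(est - obs))

mae = {name: mean_absolute_error(actual, est) for name, est in estimates.items()}
bias = {name: mean_bias(actual, est) for name, est in estimates.items()}

for name in estimates:
    print(f"{name}: MAE = {mae[name]:.2f} km, bias = {bias[name]:.2f} km")
```

The ranking of the models by mean absolute error follows the ordering discussed in the text, with the TSM lowest and the ADE highest.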
This equation relates the spread of the pollutant plume to the key dimensionless parameters of the TSM.

(a,b) Dimensionless parameters of the presented relationships versus distance from injection point.
The negative exponent (−0.15) for the dispersion term D/(u × L) indicates that, when dispersion is larger relative to advection, the pollutant spreads out more and the perceived injection distance is slightly shortened. The positive exponent (0.22) for the storage zone ratio (As/A) indicates that larger storage zones relative to the main channel give a longer perceived injection distance. This supports the view that storage zones hold pollutants and hence extend the effective transit time and distance. The small negative exponent (−0.08) for the dimensionless exchange coefficient (αL/u) implies that faster exchange between the main channel and storage zones slightly reduces the perceived injection distance. Rapid exchange may explain the more dispersed distribution, since it allows the pollutant to enter storage zones quickly. The very small negative exponent (−0.03) for the normalized concentration term (CAut/M) indicates that the injection distance is relatively insensitive to the initial concentration, as would be expected for a conservative tracer. To validate this relationship, the coefficient of determination (R²) was computed between the predicted and observed dimensionless injection distances. This gives R² = 0.94, which means that the empirical relationship explains 94% of the variance in the observed data. The model's performance can be assessed further by calculating the relative error for each measurement station (Table 4).
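Taken together, the exponents quoted above imply a power-law relationship of the general form sketched below. This is a reconstruction under assumptions, not the authors' equation as published: the leading coefficient k is not recoverable from the text and is left symbolic, and the grouping of the normalized concentration term as CAut/M is inferred from the requirement that it be dimensionless.

```latex
\begin{equation}
\frac{x}{L} = k
\left(\frac{D}{u L}\right)^{-0.15}
\left(\frac{A_s}{A}\right)^{0.22}
\left(\frac{\alpha L}{u}\right)^{-0.08}
\left(\frac{C A u t}{M}\right)^{-0.03}
\end{equation}
```

In a relationship of this form, each exponent is also the elasticity of x/L with respect to the corresponding dimensionless group.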
Table 4. Actual, predicted, and relative error for calculated contaminant position using the dimensionless relationship
Station | Actual x/L | Predicted x/L | Relative error (%) |
---|---|---|---|
1 | 0.0952 | 0.0913 | 4.10 |
2 | 0.3542 | 0.3628 | −2.43 |
3 | 0.7946 | 0.7805 | 1.77 |
4 | 1.0952 | 1.1103 | −1.38 |
5 | 1.5625 | 1.5408 | 1.39 |
6 | 1.8185 | 1.8302 | −0.64 |
7 | 2.1905 | 2.1763 | 0.65 |
8 | 2.4673 | 2.4821 | −0.60 |
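The relative errors in the table follow from the actual and predicted x/L values. The short sketch below assumes the sign convention (actual − predicted)/actual, which is consistent with the tabulated signs, and recomputes the column from the table entries.

```python
# Actual and predicted dimensionless injection distances, as tabulated
actual = [0.0952, 0.3542, 0.7946, 1.0952, 1.5625, 1.8185, 2.1905, 2.4673]
predicted = [0.0913, 0.3628, 0.7805, 1.1103, 1.5408, 1.8302, 2.1763, 2.4821]

def relative_error_percent(obs, pred):
    """Signed relative error in percent, positive when the prediction is too low."""
    return 100.0 * (obs - pred) / obs

errors = [relative_error_percent(a, p) for a, p in zip(actual, predicted)]
max_abs_error = max(abs(e) for e in errors)

for station, err in enumerate(errors, start=1):
    print(f"Station {station}: {err:+.2f}%")
```

The largest error occurs at the first station, where the actual x/L is smallest and a given absolute deviation therefore weighs most.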
The low relative errors, below 5% at all stations, show that the presented dimensionless relationship works well over the entire river length. Such a simplified dimensionless relationship can provide faster estimates than solving the full TSM equations numerically. To further validate the presented dimensionless model, sensitivity analyses were conducted for each of the dimensionless parameters by perturbing them by ±10% and observing the effect on the predicted injection distance. On average, results showed that the model is most sensitive to changes in the dispersion–advection ratio D/(u × L), with an elasticity of −0.18. This means that a 10% increase in D/(u × L) leads to a 1.8% decrease in the predicted injection distance. The model showed moderate sensitivity to the storage zone ratio (As/A), with an elasticity of 0.24, while it was less sensitive to the dimensionless exchange coefficient (αL/u) and normalized concentration (CAut/M), with elasticities of −0.09 and −0.03, respectively. These elasticities align well with the exponents in our empirical equation, providing further confidence in the model's structure (Table 5).
Table 5. Sensitivity analysis results
Parameter | Base value | −10% perturbation | +10% perturbation | Elasticity |
---|---|---|---|---|
D/(u × L) | 0.0038 | +1.8% in x/L | −1.8% in x/L | −0.18 |
As/A | 0.25 | −2.4% in x/L | +2.4% in x/L | 0.24 |
αL/u | 2.8 | +0.9% in x/L | −0.9% in x/L | −0.09 |
CAutM | 0.0029 | +0.3% in x/L | −0.3% in x/L | −0.03 |
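The ±10% perturbation procedure can be illustrated on a pure power-law form built from the exponents discussed above and the base values in Table 5. This is a minimal sketch, not the authors' implementation: the prefactor `K` is hypothetical (it cancels in the elasticity), and the dictionary keys are illustrative labels. For a pure power law the central-difference elasticity is close to the exponent itself, so the sketch shows the method and is not expected to reproduce the slightly different elasticities reported in Table 5.

```python
# Exponents quoted in the text; base values taken from Table 5.
EXPONENTS = {"D/(uL)": -0.15, "As/A": 0.22, "alpha*L/u": -0.08, "CAutM": -0.03}
BASE = {"D/(uL)": 0.0038, "As/A": 0.25, "alpha*L/u": 2.8, "CAutM": 0.0029}
K = 1.0  # hypothetical prefactor; it cancels when computing elasticity


def x_over_L(params):
    """Power-law estimate of the dimensionless injection distance."""
    result = K
    for name, value in params.items():
        result *= value ** EXPONENTS[name]
    return result


def elasticity(name, rel_step=0.10):
    """Central-difference elasticity: % change in x/L per % change in input."""
    up = dict(BASE)
    down = dict(BASE)
    up[name] = BASE[name] * (1.0 + rel_step)
    down[name] = BASE[name] * (1.0 - rel_step)
    relative_change = (x_over_L(up) - x_over_L(down)) / x_over_L(BASE)
    return relative_change / (2.0 * rel_step)


for name in BASE:
    print(f"{name}: elasticity = {elasticity(name):+.3f}")
```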
The performance of the relationship was also examined across flow regimes by splitting the dataset into low-, medium-, and high-flow conditions according to the calculated Reynolds number (Re). Under low-flow conditions (Re < 10,000), the model slightly overestimated the injection distances, with a mean absolute percentage error (MAPE) of 3.2%. Under high-flow conditions (Re > 50,000), the model underestimated the injection distances, with a MAPE of 2.8%. The best performance was obtained for medium-flow conditions (10,000 ≤ Re ≤ 50,000), where the MAPE was 1.5%. This suggests that while the dimensionless approach is robust across a range of flow conditions, some flow-dependent processes are not captured by the current formulation. To consider possible scale-dependent effects, the dimensionless model was used to predict injection distances for hypothetical longer river reaches of up to 100 km by extrapolating the observed hydraulic and geomorphological parameters. Model predictions for these extended reaches agreed well with the power-law scaling of peak arrival time with distance, t_peak ∝ x^1.05, reported in the literature for other river systems. This suggests that the dimensionless approach may be valid for predicting pollutant transport over longer distances than those directly observed in this study.
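The regime-wise evaluation described above amounts to grouping the records by Reynolds number and computing a MAPE for each group. The sketch below uses the thresholds given in the text; the records are hypothetical placeholder values chosen only to exercise the code, and the function names are illustrative.

```python
def flow_regime(reynolds):
    """Classify a record using the Re thresholds quoted in the text."""
    if reynolds < 10_000:
        return "low"
    if reynolds <= 50_000:
        return "medium"
    return "high"


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def mape_by_regime(records):
    """records: iterable of (Re, actual x/L, predicted x/L) tuples."""
    groups = {}
    for reynolds, a, p in records:
        groups.setdefault(flow_regime(reynolds), []).append((a, p))
    return {
        regime: mape([a for a, _ in pairs], [p for _, p in pairs])
        for regime, pairs in groups.items()
    }


# Hypothetical records, NOT data from the study.
records = [
    (8_000, 1.00, 1.03),
    (9_500, 2.00, 2.06),
    (20_000, 1.00, 0.99),
    (60_000, 2.00, 1.94),
]
print(mape_by_regime(records))
```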
Several functional forms for the longitudinal dispersion coefficient were tested for reliability. Three relationships were examined: the power law D ∝ UH^2/u*, the exponential form D ∝ exp(B/H), and the polynomial form D ∝ aU^2 + bH + c, where U is the mean flow velocity, H is the flow depth, B is the channel width, and u* is the shear velocity. Of these, the power-law form gave the best fit to the data, with R² = 0.94 over the range of flow conditions considered. This form, D/(u*H) = 5.93(B/H)^0.7(U/u*)^0.5, outperformed the exponential form (R² = 0.85) and the polynomial form (R² = 0.82). The advantage was most apparent in reaches where the width-to-depth ratio exceeded 15, which is typical of natural rivers. In terms of prediction accuracy, the power-law formulation predicted D to within ±20% of the observed values in approximately 85% of cases, compared with about 72% of cases for the exponential model and only 68% for the polynomial model.
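As an illustration, the power-law form quoted above can be evaluated directly, and the "within ±20%" score can be computed as a simple hit fraction. The sketch reads the normalising velocity in the formula as the shear velocity u*, which is an assumption based on the variable definitions; the function names and the numerical inputs are hypothetical.

```python
def dispersion_coefficient(U, H, B, u_star):
    """Power-law estimate D = 5.93 * u* * H * (B/H)^0.7 * (U/u*)^0.5.

    U: mean velocity (m/s), H: depth (m), B: width (m),
    u_star: shear velocity (m/s). Returns D in m^2/s.
    Assumes the 'u' in the quoted formula denotes u*.
    """
    return 5.93 * u_star * H * (B / H) ** 0.7 * (U / u_star) ** 0.5


def fraction_within(observed, predicted, tol=0.20):
    """Fraction of predictions within +/- tol (relative) of the observations."""
    hits = sum(1 for o, p in zip(observed, predicted) if abs(p - o) <= tol * o)
    return hits / len(observed)


# Hypothetical reach, not a station from the study.
D = dispersion_coefficient(U=0.5, H=1.0, B=20.0, u_star=0.05)
print(f"D = {D:.2f} m^2/s")

# Hypothetical observed and predicted values.
print(fraction_within([10, 10, 10, 10], [11, 12.5, 8.5, 9]))
```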
A Monte Carlo simulation was conducted to propagate uncertainty in the input parameters through the dimensionless model. Using 10,000 realizations with input parameters sampled from their respective probability distributions, the 95% confidence interval for the predicted injection distance was found to range from ±3.5% of the mean value for near-field predictions (x/L < 0.5) to ±7.2% for far-field predictions (x/L > 2). This quantification of prediction uncertainty is valuable for risk assessment and decision-making in water quality management.
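A Monte Carlo propagation of this kind can be sketched as follows. The input distributions are not specified in the text, so the sketch assumes independent lognormal scatter around the Table 5 base values with hypothetical spreads (`SIGMA`); the prefactor is again set to 1 because only relative intervals are of interest. It illustrates the procedure and is not expected to reproduce the reported ±3.5% and ±7.2% intervals.

```python
import random
import statistics

EXPONENTS = {"D/(uL)": -0.15, "As/A": 0.22, "alpha*L/u": -0.08, "CAutM": -0.03}
BASE = {"D/(uL)": 0.0038, "As/A": 0.25, "alpha*L/u": 2.8, "CAutM": 0.0029}
# Hypothetical log-space standard deviations (roughly 5-10% scatter).
SIGMA = {"D/(uL)": 0.10, "As/A": 0.10, "alpha*L/u": 0.10, "CAutM": 0.05}


def x_over_L(params, k=1.0):
    """Power-law estimate of x/L; k is a hypothetical prefactor."""
    result = k
    for name, value in params.items():
        result *= value ** EXPONENTS[name]
    return result


def monte_carlo(n=10_000, seed=42):
    """Return (mean, lower, upper) of x/L with a 95% percentile interval."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        draw = {
            name: BASE[name] * rng.lognormvariate(0.0, SIGMA[name])
            for name in BASE
        }
        samples.append(x_over_L(draw))
    samples.sort()
    mean = statistics.fmean(samples)
    lower = samples[int(0.025 * n)]
    upper = samples[int(0.975 * n) - 1]
    return mean, lower, upper


mean, lower, upper = monte_carlo()
half_width_pct = 100.0 * (upper - lower) / (2.0 * mean)
print(f"95% interval half-width: +/-{half_width_pct:.1f}% of the mean")
```

Because the exponents are all small in magnitude, input scatter of 5 to 10% translates into a much narrower interval on x/L.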
To deepen the understanding of parameter effects and prediction reliability, an uncertainty quantification framework based on a hybrid granular computing-neural network approach was applied. This approach combines the information-processing advantages of granular computing with the pattern-recognition capabilities of a neural network. The framework was used to investigate the longitudinal dispersion coefficient, a critical parameter in pollutant transport modeling. A three-layer neural network incorporating granular computing principles was developed to process the input parameters: flow velocity (v), hydraulic radius (R), shear velocity (u*), and channel width (W). Granulation was based on Gaussian membership functions, with each input parameter divided into five granules according to its physical range. The uncertainty analysis indicated that the prediction intervals for the dispersion coefficient ranged from ±12% under low-flow conditions to ±18% under high-flow conditions. Sensitivity analysis within this framework showed that flow velocity and channel width were the dominant parameters affecting prediction uncertainty, with normalized sensitivity coefficients of 0.65 and 0.48, respectively. The model achieved a correlation coefficient of 0.92 in predicting the dispersion coefficient, with a MAPE of 15.3%. The analysis gives additional confidence in the model predictions while quantifying the inherent uncertainties in parameter estimation.
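The granulation step can be pictured as mapping each input onto five overlapping Gaussian membership functions spread over its physical range. The sketch below is one plausible reading and not the authors' implementation: the evenly spaced centers, the width rule, and the velocity range in the example are all assumptions, since the text specifies only the function type and the number of granules.

```python
import math


def granule_centers(lo, hi, n=5):
    """Evenly spaced granule centers over [lo, hi] (assumed layout)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def gaussian_memberships(x, lo, hi, n=5):
    """Membership of x in each of n Gaussian granules.

    The width is set to half the center spacing (an assumption), so that
    neighbouring granules cross at a membership of exp(-0.5).
    """
    centers = granule_centers(lo, hi, n)
    sigma = (hi - lo) / (2.0 * (n - 1))
    return [math.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centers]


# Hypothetical flow-velocity range of 0.1-1.5 m/s.
memberships = gaussian_memberships(0.6, lo=0.1, hi=1.5)
print([round(m, 3) for m in memberships])
```

The resulting membership vector, rather than the raw value, would then be passed to the network's input layer.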
Furthermore, a detailed analysis of the uncertainty associated with the longitudinal dispersion coefficient was carried out within a Monte Carlo framework. Several sources of uncertainty were integrated into the analysis: measurement uncertainties in flow velocity (±3%), channel geometry (±2%), and concentration (±5%), together with parameter estimation uncertainties for dispersion coefficients obtained by moment analysis (±8%), storage zone parameters (±10%), and exchange coefficients (±12%). The overall uncertainty in the estimated longitudinal dispersion coefficient ranged from ±15% in upstream reaches to ±22% in downstream reaches. This increase in uncertainty with distance was attributed to the cumulative effects of parameter interactions coupled with measurement errors. Flow measurements were found to account for approximately 45% of the total uncertainty, while geometric parameters and concentration measurements contributed 30% and 25%, respectively. A first-order error propagation method was applied to examine how this uncertainty influenced the predicted concentration profiles.
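For orientation, first-order error propagation for independent sources combines the individual relative uncertainties in quadrature, each weighted by a sensitivity coefficient. The sketch below applies this to the source uncertainties listed above, assuming independence and unit sensitivity coefficients; neither assumption is stated in the text, so the result illustrates the method and is not a reproduction of the reported ±15% to ±22% range or of the reported contribution shares.

```python
import math

# Relative uncertainties (%) quoted in the text.
components = {
    "flow velocity": 3.0,
    "channel geometry": 2.0,
    "concentration": 5.0,
    "moment-analysis dispersion": 8.0,
    "storage zone parameters": 10.0,
    "exchange coefficients": 12.0,
}


def combined_uncertainty(sources, sensitivities=None):
    """Root-sum-square combination of independent relative uncertainties.

    sensitivities maps a source name to its sensitivity coefficient;
    missing entries default to 1.0 (an assumption).
    """
    sensitivities = sensitivities or {}
    total = 0.0
    for name, u in sources.items():
        s = sensitivities.get(name, 1.0)
        total += (s * u) ** 2
    return math.sqrt(total)


def variance_shares(sources):
    """Share of the combined variance contributed by each source (%)."""
    total = sum(u ** 2 for u in sources.values())
    return {name: 100.0 * u ** 2 / total for name, u in sources.items()}


print(f"Combined uncertainty: +/-{combined_uncertainty(components):.1f}%")
for name, share in variance_shares(components).items():
    print(f"{name}: {share:.1f}% of variance")
```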
CONCLUSION
This study used data series acquired from Antietam Creek in Pennsylvania and Maryland, USA. It provides a detailed analysis of pollutant transport and source identification using specific modeling techniques and dimensional analysis. Three models, ADE, TSM, and FADE, were compared. The results demonstrated that the TSM was better at simulating the complex pollutant transport dynamics in this river system. Statistical analysis showed that, among the models tested, the TSM had the lowest RMSE (0.52 ppm) and the highest NSE (0.95). This performance was attributed to the fact that the TSM accounts for temporary retention in storage zones, a process that becomes increasingly important downstream. The ADE model consistently underpredicted the injection distance, and its errors increased in the downstream direction, reflecting the inability of the advection–dispersion assumptions to represent pollutant transport over a relatively long, complex river system. A key contribution of this research is the development of a dimensionless relationship for estimating pollutant injection distance. This compact equation brings together dimensionless parameters that represent dispersion, storage zone dynamics, and mass conservation. The high coefficient of determination, R² = 0.94, indicates that this empirical relationship explains 94% of the variance in the observed data, which supports its potential for practical application in similar river systems. Moreover, regarding the scale dependency of the dispersion process, the dispersion coefficient was found to scale with distance as D ∝ x^0.7. This sublinear increase in dispersion with distance provides useful information on how mixing processes evolve along the river and can be used to extrapolate dispersion coefficients to unmonitored reaches.
Sensitivity analysis shows that, on average, the model is most sensitive to changes in the dispersion–advection ratio, with an elasticity of −0.18. This implies that a 10% increase in D/(u × L) leads to a 1.8% decrease in the predicted injection distance. The model was moderately sensitive to the storage zone ratio (As/A), with an average elasticity of 0.24, but much less sensitive to the dimensionless exchange coefficient (αL/u) and the normalized concentration (CAutM). The model's performance across different flow regimes was also investigated, revealing slight tendencies to overestimate injection distances in low-flow conditions and to underestimate them in high-flow conditions. The best performance was observed in medium-flow conditions, suggesting that while the dimensionless approach is robust across a range of flow conditions, some flow-dependent processes may not be fully captured by the current formulation. A Monte Carlo simulation propagating input parameter uncertainty through the dimensionless model found that the 95% confidence interval for predicted injection distances ranged from ±3.5% of the mean value for near-field predictions (x/L < 0.5) to ±7.2% for far-field predictions (x/L > 2). This quantification of prediction uncertainty is essential for risk assessment and decision-making in water quality management.
AUTHOR CONTRIBUTIONS
J.C. designed the study, performed the data analysis and simulations, interpreted the results, drafted the manuscript, and served as the corresponding author.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The author declares there is no conflict.