Urban flooding has made it necessary to better understand how gully pots perform when overwhelmed by solids deposition driven by climatic and anthropogenic variables. This study investigates solids deposition in gully pots through a review of eight models, comprising four deterministic models, two hybrid models, a statistical model, and a conceptual model, representing a wide spectrum of solids depositional processes. Traditional models are used to understand and manage the impact of climatic and anthropogenic variables on solids deposition, but they are prone to uncertainty due to inadequate handling of complex and non-linear variables, restricted applicability, inflexibility and data bias. Hybrid models, which integrate traditional models with data-driven approaches, have been shown to improve predictions and support the development of uncertainty-resilient models. Despite their effectiveness, hybrid models lack explainability. Hence, this study presents the significance of eXplainable Artificial Intelligence (XAI) tools in addressing the challenges associated with hybrid models. Finally, crossovers between the various models are examined and a representative workflow for solids deposition modelling in gully pots is suggested. The paper concludes that explainable hybrid modelling can serve as a valuable tool for gully pot management, as it addresses key limitations present in existing models.

  • Existing models are presented and discussed.

  • Integrating data-driven and traditional models enhances performance.

  • Explainability could pose a challenge to adopting hybrids.

  • A review study is conducted on crossovers between different models to explore their limitations and propose potential improvements.

  • A workflow is developed to address the challenges associated with the implementation of explainable hybrids for prediction.

Stormwater is usually evacuated from road surfaces by gullies (Figure 1) connected to municipal sewers via unperforated subsurface pipes made to specification (Butler et al. 2018). The kerb and gully drainage systems (Figure 1) are one of the most common forms of road drainage used in the United Kingdom.
Figure 1: The kerb and gully drainage system.
Gullies are placed at defined intervals along the kerb and on the low side of the road surface. In the UK, gullies are placed 50 m apart or one gully for every 200 m2 of road surface (Department for Transport 2020). In practice 100 gullies may be expected to yield 7 m3 of debris (Butler et al. 2018). The gully grate provides cover to a buried component known as the gully pot (Figure 2).
Figure 2: Gully pot with an integral solids trap (all dimensions are in millimeters).

Surface runoff and underground drainage networks are connected by these gully pots. They are designed to minimise solids deposition in drainage systems, which contributes to blockages in the drainage network, reduced sewer system efficiency, urban flooding and increased pollution of receiving water bodies (Forty 1998; British Standards Institution 2021). Trapped gully pots have a solids collector, sometimes known as a solids trap, which captures sediments that would otherwise escape through the grate. The deposition of solids in road gullies is influenced by climatic- and anthropogenic-driven processes, which can be categorised into three phases: Solids Build-Up (SB), Solids Wash-Off (SW), and Solids Retention (SR) (Rietveld et al. 2020b). SB processes are mostly time-dependent and include variables in the contributing area such as traffic intensity, road surface type, solids particle size, and street sweeping frequency. SW processes are mostly climate-dependent and include rainfall amount and intensity, surface runoff, wind action and temperature. SR processes directly impact the accumulation of solids in gully pots and include flow rate, gully cross-section, depth of the solids trap, solids type and gully filling degree or position of outlet pipes. It is important to assess techniques for predicting the interaction between these processes to better understand their potential effects on the deposition of solids in gully pots (Rietveld et al. 2020b).

On average, gully pots are 750–900 mm deep and 450 mm wide (Figure 2; British Standards Institution 2021). The average urban road has a solids deposition rate of 14–24 mm/month (Butler & Karunaratne 1995). These numbers imply that gully maintenance needs to follow a defined interval. Nonetheless, maintenance cycles are not strictly defined (Forty 1998). Counties decide their own independent road sweeping and gully maintenance cycles, with decision-making based on budget constraints, expert judgment, environmental vulnerability and public complaints (Forty 1998; Fenner 2000), an approach that is largely ineffective. For example, South Gloucestershire Council in southwest England has ∼51,175 inventoried gullies along its 1,533 km of carriageway, 1,391 km of footway, and 118 km of cycleways (South Gloucestershire 2015, 2022). From 2017 to 2020, the levels of solids in these gully pots were recorded 28,123 times, and 56% of the inspected gully pots were found to be half-filled (marginal) (Figure 3). This highlights the burden of unnecessary inspection. In a related study, Entwistle (2021) conducted a survey of eight road networks encompassing one million gullies and found that 20–60% of these gullies undergo unnecessary inspection each year, emphasising the need for an optimised approach to solids deposition prediction.
Figure 3: A county's gully inspection data showing solids deposition levels (South Gloucestershire Council 2022).

To find an optimised approach to the deposition of solids in gully pots, conceptual, deterministic, statistical, and hybrid models need to be further explored to determine the differences and overlap (Obropta & Kardos 2007). For example, a deterministic model may include some stochastic elements to account for uncertainty or variability in the system. Similarly, a statistical model may use deterministic equations to model the relationship between variables. Since the boundaries between models are not always clear, it is important to understand the strengths and limitations of these models. By understanding the crossovers between them, they can be integrated to produce explainable hybrids that offer improved prediction accuracy. Following the above discussion, the objectives of this work are as follows:

  • Highlight the strengths and limitations of existing models that predict solids deposition processes in gully pots.

  • Explore the fusion of data-driven and traditional models to develop hybrid models that enhance the overall performance.

  • Consider the challenges involved in deploying hybrid models for solid deposition prediction.

  • Develop a clear workflow for explainable hybrid modelling.

The remaining sections of this study are organised as follows: Section 2 describes solids deposition procedures in gully pots. Section 3 presents a review of existing models for the prediction of solids deposition. The limitations in utilising existing models are discussed in Section 4 along with how explainable hybrid models can be used to enhance them. Finally, Section 5 explores ways to ensure that the hybrid models used for prediction are explainable to all stakeholders.

The amount of deposited solids in gully pots is influenced by climatic and anthropogenic processes. These processes are divided into three major phases as described below.

SB processes within a defined catchment area give rise to accumulation of debris, sediments and other particles that are eventually transported by wash-off processes into gully pots. Some of these time-dependent processes include characteristics of the contributing area such as leaf fall (Nix 2002), road slope (Muthusamy et al. 2018), traffic intensity (Chow et al. 2015), road surface roughness (Zhao et al. 2018), particle size distribution, mass and density (Xiao et al. 2022), and street sweeping frequency (Egodawatta et al. 2013).

SW processes are those by which solids are transported into gully pots. They are mostly climate-dependent and include the contributing area, rainfall characteristics (Sartor et al. 1974; Egodawatta et al. 2007), surface runoff (Zhao et al. 2018), wind action (Butler & Karunaratne 1995), daily sunshine hours and solar radiation (Nix 2002), temperature (Post et al. 2016), antecedent dry weather period (ADWP) (Post et al. 2016; Rietveld et al. 2020b), solids geometry, which includes initial sediment load and particle size distribution, mass and density (Grottker & Hurlebush 1987; Butler & Karunaratne 1995), and street sweeping frequency.

SR processes that directly impact accumulation of solids in gully pots are the result of their design geometry (Post et al. 2016) and flow rate (Deletic et al. 1997). These processes have the potential to reduce a gully's hydraulic capacity at any given time and can impact on a gully's trapping efficiency. SR processes include contributing area, rainfall characteristics, gully cross-sectional area and depth of solid trap or gully pot (Post et al. 2016), gully grate design pattern (Rietveld et al. 2020a), solids geometry and type, and gully filling degree/position of outlet pipes (Post et al. 2016; Rietveld et al. 2020b).

SB over a catchment is washed off by rainfall and other sediment-transport processes and is then transferred through gully pots into sewers, although, owing to their trapping efficiency, gully pots can capture and retain these pollutants through retention variables. This implies that SB, SW, and SR processes are interrelated by a range of overlapping variables: the contributing area is an example of a variable relevant to all processes, while specific variables may be important to individual processes.

To understand and manage the impact of climatic and anthropogenic changes on SB, SW, and SR processes, several models have been developed. However, certain models lack the resilience required to handle the uncertainty that arises from limitations such as restricted scope and applicability (Litwin & Donigian 1978; Driver & Troutman 1989), the use of complex and non-linear variables (Bertrand-Krajewski et al. 1993), inflexibility due to reliance on fixed constants and processes (Grottker & Hurlebush 1987), bias from the use of limited and erroneous variables (Sartor et al. 1974; Egodawatta et al. 2007), and sensitivity to outliers, precision and data mismatch (Rietveld et al. 2020b).

Although some models are developed to recognise patterns in non-linear and complex problems and to handle uncertainty, they may not be explainable (Lundberg & Lee 2017; Geng et al. 2022), thereby leading to the risk of misapplication (Almutairi et al. 2021). Models such as bootstrap aggregating and Adaptive Boosting (AdB) (Behrouz et al. 2022), generally referred to as ensemble learning, can explicitly integrate deterministic, statistical and stochastic models and potentially exploit the advantages of each approach to reduce prediction error and uncertainty. Other useful hybrid models in this regard include Artificial Neural Networks (ANNs), Random Forests (RF) (Breiman 2001), Gradient Boosting Machines (GBM) (Friedman 2001), and Monte Carlo simulations. Nevertheless, these hybrid models are not explainable and may require complex and resource-intensive computations (Clark 2005; Gelman & Hill 2006; Post et al. 2015; Lee et al. 2021). XAI involves developing models whose behaviour humans can understand and explain; it is important for ensuring trust, accountability and transparency in the model.

This study reviewed eight models applied in the literature for the study of solids deposition in gully pots and beyond. The reviewed models are first summarised in Table 1 and subsequently discussed in Sections 3.1–3.3.

Table 1: A summary of the existing modelling techniques used for the prediction of solids deposition

s/n | Model | Type | Deposition phase
1 | Stormwater Management Model (SWMM) (Sartor et al. 1974; Alley & Smith 1981) | Deterministic | SB
2 | Butler and Karunaratne's gully pot trapping efficiency model (Butler & Karunaratne 1995) | Deterministic | SR
3 | Grottker's solids retention model (Grottker & Hurlebush 1987; Grottker 1990; Butler & Karunaratne 1995) | Deterministic | SR
4 | Modified Sartor and Boyd's model (Sartor et al. 1974; Egodawatta et al. 2007) | Deterministic and statistical | SW
5 | The Non-Point Source model (NPS) (Litwin & Donigian 1978) | Conceptual | SW
6 | Driver and Troutman's model (Driver & Troutman 1989) | Statistical | SW
7 | Debris flow volume model (Lee et al. 2021) | Hybrid | SW
8 | Gully pot sediment accumulation model (Post et al. 2015, 2016) | Hybrid | SR

Deterministic modelling techniques

Deterministic models are based on mathematical equations. Therefore, they may not be applicable to all types of processes, for example the complex processes involved in solids deposition (Bertrand-Krajewski et al. 1993). These models struggle with complex variables, since they are limited in terms of scope and applicability, inflexibility, data limitations, bias, and sensitivity to outliers. These limitations further aggravate uncertainty, and such models are often calibrated by trial and error (Alley & Smith 1981), with little understanding of their sensitivity to the variables driving solids deposition. Deletic et al. (1997) and Rietveld et al. (2020b) further suggested that deterministic models may not adequately handle missing data or measurement errors, leading to inaccurate predictions or inferences. Furthermore, Bertrand-Krajewski et al. (1993) stated that the precision of a deterministic model relies on how well the calculated values agree with the observed values. This agreement can be measured through objective functions such as the mean square error (MSE) and the least square method (LSM) (Egodawatta et al. 2007). Nonetheless, this precision is affected by discrepancies such as measurement errors in sampling and by the fact that the deterministic model is only a rough approximation of the complex physical processes.

A widely used deterministic model in stormwater management applications for estimating temporal SB in a defined catchment is the exponential asymptotic-based Stormwater Management Model (SWMM), which is represented by Equation (1) (Sartor et al. 1974; Alley & Smith 1981). Software applications such as InfoSWMM (Environmental Systems Research Institute n.d.) and SWMM5 (United States Environmental Protection Agency 2023) have deployed the operational principles of this model to simulate SB. Suárez et al. (2013) utilised SWMM5 to develop a sand filter system for managing highway runoff by analysing SB and SW within a catchment.

$M_t = \frac{ACCU}{DISP}\left(1 - e^{-DISP \cdot t}\right)$  (1)

where $M_t$ represents the accumulated mass of solids at time t (kg); ACCU is the daily accumulation rate (kg/d); DISP is the disappearing coefficient (d⁻¹); and ACCU and DISP are coefficients that should be calibrated for every catchment prior to the use of the model, since ACCU is influenced by a series of anthropogenic and hydrometeorological patterns such as wind action, traffic intensity, street sweeping frequency, contributing area (land use) and ADWP. Equation (1) considers SB and SW variables in the deposition process but cannot confidently capture the significance of ADWP, an important variable that influences first flush (Bach et al. 2010).
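To make the build-up dynamics concrete, the following is a minimal Python sketch of Equation (1); the ACCU and DISP values are illustrative placeholders, not calibrated coefficients.

```python
import numpy as np

def solids_buildup(t_days, accu=0.5, disp=0.08):
    """Exponential asymptotic build-up, Equation (1).

    t_days : elapsed build-up time (d)
    accu   : daily accumulation rate ACCU (kg/d) -- illustrative value
    disp   : disappearing coefficient DISP (1/d) -- illustrative value

    The accumulated mass tends towards the asymptote ACCU/DISP.
    """
    return (accu / disp) * (1.0 - np.exp(-disp * t_days))

# Example: build-up over a 30-day antecedent dry weather period
t = np.arange(0, 31)
mass = solids_buildup(t)
print(f"Asymptote: {0.5 / 0.08:.2f} kg; mass at day 30: {mass[-1]:.2f} kg")
```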
Another deterministic model was developed by Butler & Karunaratne (1995) to assess gully pot trapping efficiency (Equation (2)):
$\eta = K \cdot \frac{g\,d^{2}(S-1)}{18\,\nu} \cdot \frac{\pi D^{2}}{4\,Q}$  (2)

where $\eta$ represents solids trapping efficiency; K is a turbulence correction factor; g is the acceleration due to gravity (m/s²); d represents solids particle diameter (mm); D is the gully pot diameter (mm); Q is the flow rate (L/s); $\nu$ is kinematic viscosity (m²/s); and S is the particle specific gravity (PSG). However, Equation (2) only accounts for uniform and laminar flows, which can result in an inaccurate estimation of solids transport and trapping efficiency due to the lack of consideration for turbulence. Additionally, the model assumes that solid particles are spherical, which can be unrealistic since particles can have non-uniform shapes (Collinson et al. 2006). Rietveld et al. (2020b) highlighted the need for a more comprehensive understanding of the interactions between the limitations in Equation (2), the deposition of solids and the solids' depth in the gully pot to enhance the model's accuracy.
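The sketch below evaluates the settling-based reading of Equation (2) reconstructed above (Stokes settling velocity multiplied by the pot's plan area and divided by the flow rate); the functional form, the unit conversions and all parameter values are assumptions for illustration.

```python
import math

def trapping_efficiency(q_ls, d_mm, pot_d_mm, s=2.65, nu=1.0e-6, k=1.0):
    """Settling-based trapping efficiency in the spirit of Equation (2).

    q_ls     : flow rate Q (L/s)
    d_mm     : solids particle diameter d (mm)
    pot_d_mm : gully pot diameter D (mm)
    s        : particle specific gravity S
    nu       : kinematic viscosity (m^2/s)
    k        : turbulence correction factor (1.0 = no correction)
    """
    d = d_mm / 1000.0                               # mm -> m
    D = pot_d_mm / 1000.0                           # mm -> m
    q = q_ls / 1000.0                               # L/s -> m^3/s
    v_s = 9.81 * d**2 * (s - 1.0) / (18.0 * nu)     # Stokes settling velocity (m/s)
    plan_area = math.pi * D**2 / 4.0                # pot plan area (m^2)
    return min(1.0, k * v_s * plan_area / q)        # capped at 100%

print(trapping_efficiency(q_ls=2.0, d_mm=0.05, pot_d_mm=450))
```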
Grottker (1990) developed a deterministic SR model to study gully pot performance given in Equation (3):
$M_p = M_w \cdot x\,Q^{\,y}$  (3)

where $M_p$ represents the mass of solids passing through the gully pot (kg); $M_w$ is the mass of solids washed off by rainfall (kg); Q is the discharge through the gully pot (L/s); and x and y are solids geometry (diameter)-dependent numerical coefficients within a specified range. It has been further argued that the proposed range of fixed numerical coefficients x and y might not always represent real-world scenarios, indicating that variations from these fixed values could significantly impact the model's accuracy (Bertrand-Krajewski et al. 1993). As a result, deterministic models may be rigid because of their reliance on fixed constants and processes, making it challenging to adapt the model to changes in processes or new data.
Sartor et al. (1974)'s deterministic pollutant wash-off model, given in Equation (4), assumes that every storm event has the capacity to remove all the available solids from a given surface if the storm continues for an adequate duration. Egodawatta et al. (2007) further experimented with Equation (4) to replicate actual wash-off behaviours and proposed the modification, represented by Equation (5), which includes the capacity factor parameter $C_F$:

$W = W_0\left(1 - e^{-kIt}\right)$  (4)

The need for this modification arose because rainfall events have the capacity to mobilise only a fraction of solids on a given road surface, with the value of $C_F$ ranging from 0 to 1. Egodawatta et al. (2007) also cautioned that wrong assumptions could introduce uncertainty into the modified model (Equation (5)) due to the incorporation of $C_F$:

$F_w = \frac{W}{W_0} = C_F\left(1 - e^{-kIt}\right)$  (5)

where $F_w$ represents the fraction of SW after a storm event; W is the weight of solids mobilised after time t; $W_0$ is the initial weight of solids on the surface; $C_F$ represents the capacity factor; k is a wash-off coefficient (mm⁻¹); and I is rainfall intensity (mm/h). Egodawatta et al. (2007) used the LSM to determine the optimal values of k and $C_F$. Moreover, they evaluated Equation (5) at three sites using statistical techniques such as the mean and coefficient of variation to understand each site's characteristic data. The coefficient of variation revealed significant inaccuracies in data estimation due to the use of non-representative build-up data for each site. Xiao et al. (2022) investigated the applicability of Equation (5) for solids transport rate and the influence of particle size distribution on wash-off. They recommended the calibration of parameters k and $C_F$ for different particle size distributions on a road surface to reduce the model's uncertainty.
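Calibration of k and $C_F$ by least squares, as described above, can be sketched with scipy's curve fitting; the observations here are synthetic and the fitted values are illustrative only.

```python
import numpy as np
from scipy.optimize import curve_fit

def washoff_fraction(It, cf, k):
    """Fw = CF * (1 - exp(-k * I * t)) -- Equation (5); It is the product I*t (mm)."""
    return cf * (1.0 - np.exp(-k * It))

# Synthetic observations: cumulative rainfall depth I*t and measured wash-off fraction
rng = np.random.default_rng(0)
It = np.linspace(0, 40, 25)                                  # mm of rainfall
fw_obs = washoff_fraction(It, cf=0.6, k=0.2) + rng.normal(0, 0.02, It.size)

# Least-squares calibration (LSM), bounding CF to [0, 1]
(cf_hat, k_hat), _ = curve_fit(washoff_fraction, It, fw_obs,
                               p0=[0.5, 0.1], bounds=([0, 0], [1, np.inf]))
print(f"CF = {cf_hat:.2f}, k = {k_hat:.2f} per mm")
```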

Conceptual modelling techniques

Litwin & Donigian (1978) were the first to develop the Non-Point Source (NPS) conceptual model, a pollutant loading model that estimates the quantity of pollutants transported from land surfaces to a watercourse using Equations (6) and (7):

$S_w(t) = S_a(t)\left(1 - e^{-k\,q(t)}\right)$  (6)

$S_a(t+1) = S_a(t) - S_w(t) + c$  (7)

where $S_w(t)$ represents SW over a period of time t (kg/m²); $S_a(t)$ represents the accumulated solids over a time period t (kg/m²); k is a wash-off coefficient; $q(t)$ is surface runoff on the impervious area (mm); and c is a numerical coefficient.

NPS model development was generalised to consider non-point pollutants from a maximum of five land use categories, including urban, agricultural, forested, and construction areas, whilst interfacing with the water quality parameters of temperature, dissolved oxygen, suspended solids, and biochemical oxygen demand (Litwin & Donigian 1978). The model also considers seasonal variables stemming from construction activities, gritting, and leaf fall. The NPS model aids in estimating the solids transported by runoff and deposited into gully pots. This helps assess how well gully pots perform in reducing solids deposition in drainage systems and minimising pollution in receiving water bodies (Post et al. 2016). However, the mathematical representation of solids deposition and wash-off requires rigorous and separate simulation and evaluation. In the absence of sufficient data, this introduces algorithmic complexity; to address this, a simplified representation of the processes controlling non-point pollution must be established. This raises the dilemma of trading algorithmic complexity for reduced uncertainty and simplicity of application.

Bertrand-Krajewski et al. (1993) further argued that the simplified representation of processes used in the NPS Model (Equations (6) and (7)) may not accurately reflect the reality of SW and pollutant transport. This is due to its failure to account for the temporal fluctuations in precipitation, runoff, and pollutant concentrations commonly observed in hydrometeorological variables. As a result, the NPS model must be meticulously calibrated whenever it is applied to a new watershed which is a time-consuming and complex process (Yuan et al. 2020).

Statistical and hybrid modelling techniques

Driver & Troutman (1989) established a statistical model based on linear regression, presented in Equation (8). The model was developed to estimate storm-runoff solids deposition in urban watersheds across the United States:

$\log S_w = \beta_0 + \beta_1 \log R + \beta_2 \log A + \beta_3 \log T$  (8)

where $S_w$ represents SW during a rainfall event (kg); R is the total rainfall depth (mm); A is catchment area (km²); T is rainfall duration (minutes); and $\beta_0, \ldots, \beta_3$ are regression coefficients. Linear regression models were developed for three different regions, which were delineated based on mean annual rainfall to improve the accuracy of the models. However, the validity of the model is limited to the arid western United States, where annual rainfall is less than 500 mm.
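A hedged sketch of fitting a Driver & Troutman-style log-linear regression to synthetic storm data follows; the variable names and the 'true' exponents used to generate the data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 200
rain_mm = rng.uniform(2, 60, n)        # total rainfall depth R
area_km2 = rng.uniform(0.1, 5.0, n)    # catchment area A
dur_min = rng.uniform(10, 300, n)      # rainfall duration T

# Synthetic loads following an assumed power-law (log-linear) relationship
load_kg = 3.0 * rain_mm**0.9 * area_km2**0.8 * dur_min**-0.1 \
          * np.exp(rng.normal(0, 0.2, n))

# Fitting Equation (8): ordinary least squares in log space
X = np.log(np.column_stack([rain_mm, area_km2, dur_min]))
model = LinearRegression().fit(X, np.log(load_kg))
print("exponents:", model.coef_, "intercept (beta0):", model.intercept_)
```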

According to Kunin et al. (2019) and Burden & Winkler (2008), Bayesian regularisation is a mathematical process that converts a non-linear regression into a statistical problem. Using a Bayesian Regularised Artificial Neural Network (BRANN) model, i.e. a hybrid model, Lee et al. (2021) analysed solids deposition prediction accuracy using historical extreme rainfall events and 15 climatic and anthropogenic solids variables. They found that BRANN had a higher accuracy, with a coefficient of determination (R²) of 0.911, compared with the Multiple Linear Regression Equations (MLRE) of Marchi & D'Agostino (2004) and Chang et al. (2011), which had R² values of 0.693, 0.688, and 0.670. Although Lee et al. (2021)'s research is not unique to solids deposition prediction in gully pots, the BRANN model has also been applied in natural gas explosion risk analysis (Shi et al. 2019), in a rainfall prediction model for debris flow (Zhao et al. 2022) and in the optimisation of diesel engine combustion events (Ankobea-Ansah & Hall 2022). BRANN combines the deterministic Bayesian Regularisation Algorithm (BRA) with the stochastic ANN and is known to improve the prediction accuracy of complex systems, incorporate stochasticity, handle uncertainty, and capture data variability (Papananias et al. 2017).

According to Burden & Winkler (2008), BRANN offers a probabilistic interpretation of both model parameters and predictions. It aims to find the optimal model parameters by balancing the model's fit to the data and its complexity, and by minimising the objective function found in simplified linear adaptations of Burden & Winkler (2008) (Equations (9) and (10)):
$F(\mathbf{w}) = L(\mathbf{w}) + \lambda\,\Omega(\mathbf{w})$  (9)

where $\mathbf{w}$ is a vector of the model parameters; $L(\mathbf{w})$ is the loss function that measures the difference between the predicted values of the model and the actual values of the data; $\Omega(\mathbf{w})$ is the regularisation term that penalises the complexity of the model by adding a penalty that increases with the magnitude of the model parameters; and $\lambda$ is the regularisation parameter that controls the trade-off between the fit of the model to the data and the complexity of the model.
By adding a prior distribution over the model parameters, which represents the belief about the parameter values before seeing the data, the model is encouraged to learn simpler representations that are more likely under the prior distribution. This can prevent overfitting. The objective function is then modified to include the negative log-likelihood of the data given the prior distribution over the model parameters (Equation (10)):
$F(\mathbf{w}) = L(\mathbf{w}) + \lambda\,\Omega(\mathbf{w}) - \log P(\mathbf{w}\,|\,\alpha)$  (10)

where the prior distribution is characterised by the hyperparameter $\alpha$, which controls the strength of the prior distribution, and $P(\mathbf{w}\,|\,\alpha)$ is the prior distribution over the model parameters.
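The following numeric sketch evaluates the objective in Equations (9) and (10) under two common assumptions that are not specified in the text: an L2 complexity penalty for $\Omega(\mathbf{w})$ and a zero-mean Gaussian prior for $P(\mathbf{w}\,|\,\alpha)$.

```python
import numpy as np

def objective(w, X, y, lam=0.1, alpha=1.0):
    """F(w) = L(w) + lam * Omega(w) - log P(w | alpha).

    L(w)     : mean squared error loss
    Omega(w) : L2 complexity penalty (assumed form)
    P(w|a)   : zero-mean Gaussian prior with precision alpha (assumed form)
    """
    loss = np.mean((X @ w - y) ** 2)
    penalty = lam * np.sum(w ** 2)
    neg_log_prior = 0.5 * alpha * np.sum(w ** 2)  # up to an additive constant
    return loss + penalty + neg_log_prior

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 50)

# Larger-magnitude parameter vectors are penalised even when they fit well
print(objective(np.zeros(3), X, y))
print(objective(np.array([1.0, -2.0, 0.5]), X, y))
```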
Despite the improved prediction accuracy, the output of uncertainty-proof models such as BRANN, bagging, boosting, and stacking are often difficult to interpret due to their complex ‘black box’ architecture (Lundberg & Lee 2017; Geng et al. 2022). ANN, for example, contains several layers of nodes and hidden neurons that can detect intricate patterns and relationships in data, but their architecture and the use of hidden layers (Figure 4) make it difficult to interpret how the network arrives at its final predictions. Lee et al. (2021)'s ANN achieved better prediction accuracy than various MLRE, but utilised unexplainable neurons in the hidden layer.
Figure 4: Architecture of the ANN model.

Apart from its ‘black box’ nature, complexity (Uzair & Jamil 2020), weight initialisation (Manish Agrawal et al. 2021), and use of activation functions (Uzair & Jamil 2020; Brownlee 2021) are some of the reasons why the hidden layer tends to be less explainable. According to Uzair & Jamil (2020), hidden layers are designed to perform non-linear transformations of the inputs entered into the network. These transformations can become increasingly complex as more hidden layers are added to the network.

The ‘circles’ in Figure 4 represent nodes. Thus, the model has n input nodes $x_1, x_2, \ldots, x_n$, denoted by the vector $\mathbf{x}$, a hidden layer with m nodes (m + 1 including the ‘bias’ node), and an output layer. The additional node with the value b is called the bias node, which is a scalar value. The ‘arrows’ (Figure 5) represent weights, which quantify the impact of a preceding node on the next node. Using Figure 5 as an example, the inputs $x_1, \ldots, x_n$ contribute weights $w_1, \ldots, w_n$ to the weighted sum received by each node in the hidden layer, which has a predefined activation function $\varphi$. The activation function defines whether the receiving node will be activated and how active it will be (Brownlee 2021).
Figure 5: Schematic of an artificial neuron with inputs (x1, x2, …, xn), weights (w1, w2, …, wn), bias (b), transfer function (Σ), activation function (φ) and output (y).

The weights ($w$) can be randomly assigned, then fine-tuned and calibrated through backpropagation (Ognjanovski 2019). In any case, the weights can be difficult to interpret and understand (Manish Agrawal et al. 2021). Furthermore, the neurons in the hidden layer take in a set of weighted inputs and produce an output through an activation function, whose choice can have a significant impact on the behaviour of the hidden layer. Some activation functions, such as the Rectified Linear Unit (ReLU) and Softmax, can be more difficult to interpret than others (Agarap 2018) and can be misleading (Ozbulak et al. 2018). Simos & Tsitouras (2021) proposed a modification to the commonly used non-linear activation function, the Hyperbolic Tangent (tanh), with the aim of reducing the computational complexity of neural networks (NNs).
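A single artificial neuron of Figure 5 can be sketched in a few lines; tanh is assumed as the activation function and the weights are hand-picked rather than learned.

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    """y = phi(sum_i w_i * x_i + b) -- one artificial neuron (Figure 5)."""
    z = np.dot(w, x) + b          # transfer function: weighted sum plus bias
    return activation(z)          # activation decides how 'active' the node is

x = np.array([0.2, 1.5, -0.7])    # inputs x1..x3
w = np.array([0.4, -0.1, 0.8])    # weights w1..w3 (normally learned by backpropagation)
print(neuron(x, w, b=0.05))
```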

Generalised Linear Mixed Models (GLMM) are an extension of linear mixed models (LMMs) and allow response variables from different distributions, such as binary responses. Alternatively, one could think of GLMM as an extension of generalised linear models (e.g., logistic regression) to include both fixed and random effects (hence mixed models) (University of California Los Angeles 2023). The general form of the model is shown in Equation (11):
$\mathbf{y} = X\boldsymbol{\beta} + Z\mathbf{u} + \boldsymbol{\varepsilon}$  (11)

where $\mathbf{y}$ is a column vector, the target variable; X is a matrix of the i predictor variables; $\boldsymbol{\beta}$ is a column vector of the fixed-effects regression coefficients; Z is the design matrix for the j random effects (the random complement to the fixed X); $\mathbf{u}$ is a vector of the random effects (the random complement to the fixed $\boldsymbol{\beta}$); and $\boldsymbol{\varepsilon}$ is a column vector of the residuals, the part of $\mathbf{y}$ that is not explained by the model $X\boldsymbol{\beta} + Z\mathbf{u}$.

The matrix dimensions are as follows: $\mathbf{y}$ is N × 1; X is N × i; $\boldsymbol{\beta}$ is i × 1; Z is N × j; $\mathbf{u}$ is j × 1; and $\boldsymbol{\varepsilon}$ is N × 1, where N is the number of observations.

From Equation (11), the target variable $\mathbf{y}$ is a linear combination of the fixed-effects part ($X\boldsymbol{\beta}$), the random-effects part ($Z\mathbf{u}$), and the residuals ($\boldsymbol{\varepsilon}$). The equation assumes a linear model with fixed and random effects, and the residuals are assumed to be independent and identically distributed with mean zero. The GLMM thus incorporates both deterministic ($X\boldsymbol{\beta}$) and stochastic ($Z\mathbf{u} + \boldsymbol{\varepsilon}$) elements (Penn State University n.d.).

To clarify further, $\boldsymbol{\varepsilon}$ is the residual or the random error component of the model. It captures the variability in the target variable $\mathbf{y}$ that is not explained by the fixed-effects predictor variables X and the random-effect variables Z. The residual introduces the stochastic or random element into the equation, accounting for the unexplained variability in the data (uncertainty), which can arise from measurement bias or unobserved variables. To improve the predictive performance and uncertainty quantification of the GLMM, an autoregressive (AR) component within a Bayesian framework (a stochastic process) is incorporated. This is achieved by using priors and posterior distributions, as shown in Equations (12) and (13).

Considering a time series-based gully pot accumulation model with an AR(p) component, where p represents the order of the AR process, the model can be written as:

$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t$  (12)

where $y_t$ represents the target variable at time t; $\phi_1, \ldots, \phi_p$ are the AR coefficients; and $\varepsilon_t$ is the residual term at time t, assumed to follow a specific distribution.
To incorporate a Bayesian perspective, we assign prior distributions to the parameters and estimate the posterior distributions using Bayesian inference. The prior distributions can be specified based on prior knowledge or assumed distributional assumptions. For example, a common choice is to assign a normal prior distribution to the AR coefficients as shown in Equation (13):
$\phi_i \sim \mathcal{N}(\mu_\phi, \sigma_\phi^{2})$  (13)

where $\mu_\phi$ and $\sigma_\phi^{2}$ are the mean and variance of the prior distribution for $\phi_i$.

Given the observed data $y_1, \ldots, y_T$, Bayesian inference techniques such as Gibbs sampling (Casella & George 1992) and the Metropolis-Hastings algorithm (Chib & Greenberg 1995), which are based on the Markov Chain Monte Carlo (MCMC) methodology, can be used to obtain posterior distributions for the parameters. These posterior distributions provide information about the uncertainty in the estimates. Thus, AR models a time series as a linear combination of its previous values. By acknowledging uncertainty about the model parameters, this approach enables probabilistic predictions about future values of the time series (Martin et al. 2021).

PyMC3 (Salvatier et al. 2016) is a probabilistic programming library that can be used to define the GLMM with a binary response variable and an AR component, and then sample from the posterior distribution using MCMC methods. For multiclass response variables, a multinomial logistic regression model with an AR component could be deployed (Chan 2023).
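A hedged PyMC3 sketch of the Bayesian AR(p) component in Equations (12) and (13) follows, using normal priors on the AR coefficients and a synthetic filling-level series; it illustrates the mechanics rather than reproducing any published model.

```python
import numpy as np
import pymc3 as pm

# Synthetic monthly solids-level series (illustrative only)
rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(0.5, 0.2, 36))  # filling trend with noise

p = 2  # order of the AR process
with pm.Model() as ar_model:
    # Equation (13): normal priors on the AR coefficients
    rho = pm.Normal("rho", mu=0.0, sigma=1.0, shape=p)
    sigma = pm.HalfNormal("sigma", sigma=1.0)
    # Equation (12): AR(p) likelihood over the observed series
    pm.AR("y_obs", rho=rho, sigma=sigma, observed=y)
    trace = pm.sample(1000, tune=1000, chains=2, progressbar=False)

# Posterior summaries quantify the uncertainty in the estimates
print(pm.summary(trace)["mean"])
```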

Post et al. (2015) combined the GLMM with an AR component from a Bayesian perspective. Their objective was to examine the impact of geometrical and catchment variables (cf. Post et al. 2015) on the filling rates of gully pots, based on monthly measurements of solid bed levels from 300 gully pots over one year. Their results provided insights into the effect of different designs on accumulation in gully pots, allowing for better optimisation of maintenance activities and improved gully pot design.

Post et al. (2015) favoured the Bayesian approach over the quasi-likelihood technique, as the latter may not accurately represent the true underlying distribution of the data, can be prone to inaccuracies, is sensitive to outliers and is not well-suited to modelling non-linear relationships between variables (Spiegelhalter et al. 2002). By utilising a combination of GLMM and AR from a Bayesian perspective, their model was able to effectively capture complex time series data that exhibited both temporal autocorrelation and dependence or clustering of observations.

However, Clark (2005) and Gelman & Hill (2006) suggested that the combination of GLMM and AR from a Bayesian perspective can lead to increased computational complexity and intensity. Also, the interpretation of model results can be challenging for researchers and stakeholders who are not familiar with Bayesian statistics.

Conceptual and deterministic models rely on physically based equations and linear models to describe SB over a catchment, SW and passage through gully pots, where the solids are either deposited or transferred to sewers. However, these models are limited by uncertainties arising from scope and applicability, precision, inflexibility, data limitations, bias, and sensitivity to outliers. Statistical models, which assume normality and linearity and often rely on linear regression, face similar limitations when dealing with complex relationships or non-linear patterns in data, as reported by Marchi & D'Agostino (2004), Chang et al. (2011), and Lee et al. (2021). Although hybrid models have been employed to handle uncertainties, they create new issues such as the risk of misapplication, computational complexity and intensity, and poor model interpretability and explainability. Therefore, the use and benefits of XAI tools are discussed in this section to address the limitations of these hybrid models. Table 2 presents a summary of crossovers between models and how they may be addressed.

Table 2: Crossovers between solids deposition models

Model type: Traditional (Deterministic | Conceptual | Deterministic) — Hybrid (BRANN | GLMM-AR) — Data-driven (Statistical)
Deposition phase: SW | SR | SB | SW | SR | SW
Explanatory variables: Rainfall intensity, kinetic energy of rainfall and characteristics of solids | Catchment area, solids accumulation rate, and ADWP | Surface runoff and contributing area (land use) | Rainfall intensity | Flow rate, PSG, diameter of gully pot, diameter and depth of solids, and kinematic viscosity | Rainfall amount and flow rate | ACCU and DISP (dependent on land use, ADWP, etc.) | Variables of morphology, rainfall and geology | Road type, depth of trap, contributing surface area, catchment slope, position of outlet pipe, and presence of water seal | Catchment area and rainfall attributes
Model constraint: Bias | Complex and non-linear variables | Inflexibility | Complex and non-linear variables | Explainability and transparency | Computational complexity and intensity | Scope and applicability
Suggested improvement: DT models, cross-validation, data balancing, and feature selection techniques | Ensemble learning | DT models, label encoding, and one hot encoding | Ensemble learning | mRMR algorithm, hybrid (PSO & FURIA) | Linear models, DT, feature importance analysis or partial dependence plots, XAI techniques | TL, hyperparameter tuning, ML models
Target variable: Mass of washed-off solids | Mass of retained solids/gully trapping efficiency | Mass of built-up solids | Debris flow volume | Filling rate of gully pots | Mass of washed-off solids
References: Sartor et al. (1974); Egodawatta et al. (2007) | Servat (1984); Bertrand-Krajewski et al. (1993) | Litwin & Donigian (1978); Bertrand-Krajewski et al. (1993) | Bujon (1988); Alley & Smith (1981) | Butler & Karunaratne (1995); Rietveld et al. (2020b) | Grottker (1990); Butler & Karunaratne (1995) | Sartor et al. (1974); Alley & Smith (1981) | Lee et al. (2021) | Post et al. (2015, 2016) | Driver & Troutman (1989)

The limitation of complex and non-linear variables

Deterministic and statistical models may not be applicable to all types of systems (e.g., Equation (1)) and may struggle with complex variables, for example the poorly understood impact of ADWP on SB. However, feature selection techniques have been used within climate studies to identify the relative contribution and significance of explanatory variables in forecasting. Warton et al. (2015) utilised a residual correlation matrix to examine how 65 alpine tree species (explanatory variables) responded to snowmelt (the target variable). They evaluated the degree of correlation between the tree species and identified their significance and relevance across 75 different sites. Moreover, by utilising the residual correlation matrix, they identified the environmental variables that were strongly correlated with the tree species data. This highlights the effectiveness of the approach in analysing complex community ecology data with multiple variables.

Haidar & Verma (2018) used a combination of genetic algorithm (GA) and particle swarm optimisation (PSO) algorithm (Kennedy & Eberhart 1995) to optimise climate features in rainfall forecasting. Their model outperformed three established standalone models while highlighting the effectiveness of hybrid models in selecting the most relevant climate variables and optimising the network parameters of a NN-based model.

Caraka et al. (2019) used the PSO algorithm which combines deterministic and heuristic techniques to identify the most relevant features for accurately predicting particulate matter 2.5 (PM2.5), making it a useful tool for feature selection.

Hu et al. (2018) utilised the minimum redundancy maximum relevance (mRMR) algorithm to identify the most significant features for local climate zone classification. The mRMR algorithm achieved high classification accuracy and outperformed other established feature selection techniques, such as principal component analysis (PCA) and correlation-based feature selection, owing to its stochastic feature selection process, which produces varied results across different algorithm runs. The utilisation of statistical measures to assess feature relevance and redundancy leads to the selection of a feature subset that maximises relevance while minimising redundancy (Mazzanti 2021).

In developing a flash flood susceptibility model, Bui et al. (2019) used the FURIA-GA feature selection technique, which combines the fuzzy rule-based FURIA feature selection method with the GA, to select the most informative features. The FURIA algorithm uses a decision tree (DT) to generate a set of fuzzy rules from the input data; subsequently, the GA is utilised to search for the optimal subset of features (Bera 2020).

In identifying the main variables of solids deposition in gully pots, Rietveld et al. (2020a) utilised regression trees (RTs). Their study revealed that RTs provided slightly more accurate feature prediction when compared with LMMs, due to their capability to describe relationships between variables, under varying conditions. Lee et al. (2021) utilised Pearson's correlation analysis to identify the four most significant variables out of 15 that affect debris flow volume. The four prominent variables were then used to train the model.

It is important to acknowledge that the effectiveness of a chosen feature selection technique can be influenced by numerous variables present in a complex feature system. These variables may include high correlation, overfitting, a large feature space leading to computational intensity, and imbalanced data (Cherrington et al. 2019). Therefore, it is crucial to determine an optimal feature selection method based on the characteristics of the data and the problem at hand. It is equally important to validate the chosen features to ensure that they generalise well to new data.
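As a simple illustration of scoring relevance against redundancy in the spirit of mRMR, the sketch below uses scikit-learn's mutual information estimator on synthetic data; the variables (ADWP, rainfall, traffic) and the deliberately redundant duplicate are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(4)
n = 300
adwp = rng.uniform(1, 30, n)            # antecedent dry weather period (d)
rain = rng.uniform(0, 50, n)            # rainfall amount (mm)
traffic = rng.uniform(100, 5000, n)     # traffic intensity (veh/d)
rain_dup = rain + rng.normal(0, 1, n)   # near-duplicate (redundant) feature
deposition = 0.03 * adwp + 0.01 * rain + 1e-5 * traffic + rng.normal(0, 0.05, n)

X = np.column_stack([adwp, rain, traffic, rain_dup])
names = ["ADWP", "rain", "traffic", "rain_dup"]

# Relevance: mutual information between each feature and the target
relevance = mutual_info_regression(X, deposition, random_state=0)
# Redundancy proxy: mean mutual information of each feature with the others
redundancy = np.array([mutual_info_regression(np.delete(X, i, axis=1), X[:, i],
                                              random_state=0).mean()
                       for i in range(X.shape[1])])
for nm, rel, red in zip(names, relevance, redundancy):
    print(f"{nm:9s} relevance={rel:.3f} redundancy={red:.3f}")
```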

Equation (2) underlines the problem of inaccurate estimation of solids transport and trapping efficiency due to the lack of consideration for turbulence and the assumption that solid particles are spherical. ML techniques are well-suited to handling various categories of shape and flow pattern (laminar, turbulent, steady, and unsteady) by converting the shapes and patterns into a numerical format that the algorithm can process. This process is known as label encoding (Table 3) (Scikit-learn Developers 2023). An example of label encoding is the assignment of numerical values to recognised shape categories before developing a model (Table 3). This is exemplified in the Geng et al. (2022) study on predicting litterfall in forests by categorising forest types. Label encoding was used to convert categorical variables such as forest type, vegetation type, and climate zone into numeric variables to predict litterfall production.

Table 3: The use of label encoding to illustrate solids shape, as a categorical explanatory variable in solids retention prediction

Solids shape category | Representative numerical value
Spherical | 1
Angular | 2
Flaky | 3
Rod-like | 4
Discoid | 5
Ovoid | 6
Irregular | 7

To address the issue of not accounting for turbulence in Equation (2), one possible solution is to use one hot encoding to represent fluid flow as a binary categorical variable rather than using the conceptualised turbulence correction factor K. One hot encoding assigns a value of either 0 or 1 to indicate laminar or turbulent flow, respectively. Gong & Chen (2022) argue that one hot encoding is preferable because it avoids a misleading ranking between categories. However, in cases where a categorical variable has a natural order, such as rating the level of risk posed by solids in a gully (e.g., low, medium and high), label encoding may be a more suitable approach, as sketched below.
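The choice between the two encodings can be sketched with scikit-learn; the category sets below mirror the flow-regime and risk examples above and are illustrative.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder

# Ordered category (risk level): ordinal/label encoding preserves the order
risk = np.array([["low"], ["medium"], ["high"], ["medium"]])
ord_enc = OrdinalEncoder(categories=[["low", "medium", "high"]])
print(ord_enc.fit_transform(risk).ravel())            # -> [0. 1. 2. 1.]

# Unordered category (flow regime): one hot encoding avoids an implied ranking
flow = np.array([["laminar"], ["turbulent"], ["laminar"]])
print(OneHotEncoder().fit_transform(flow).toarray())  # columns: laminar, turbulent
```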

To prevent algorithmic complexity, a simplified version of Equations (6) and (7) is used to represent the NPS model. However, this simplified approach disregards non-linear relationships and fluctuations in hydrometeorological variables, which can result in model uncertainty. As a result, the model may require frequent recalibration for each specific application. Whilst deterministic and statistical models may not be easily adjusted for complex fluctuations in hydrometeorological variables (Litwin & Donigian 1978) and changes in a contributing area (Deletic et al. 1997), ensemble learning techniques such as Adaptive Boosting (Freund & Schapire 1995), GBM (Friedman 2001), and Stochastic Gradient Boosting combine several base models to produce one optimal predictive model and can readily learn complex relationships, reducing the need for frequent recalibration. Furthermore, the performance of the trained model can be evaluated and validated on a separate test dataset, using evaluation metrics such as MSE and mean absolute error (MAE), along with resampling techniques like cross-validation (Refaeilzadeh et al. 2009), as sketched below.
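A minimal sketch of this ensemble-plus-validation workflow follows, training a GBM on synthetic wash-off data and scoring it with cross-validated MAE and MSE; all variables and the response function are synthetic assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n = 400
rainfall = rng.uniform(0, 60, n)                 # rainfall amount (mm)
runoff = 0.7 * rainfall + rng.normal(0, 2, n)    # surface runoff (mm)
adwp = rng.uniform(1, 30, n)                     # antecedent dry weather period (d)

# Non-linear synthetic wash-off response
washoff = 5 * (1 - np.exp(-0.05 * rainfall)) * np.log1p(adwp) \
          + rng.normal(0, 0.3, n)

X = np.column_stack([rainfall, runoff, adwp])
gbm = GradientBoostingRegressor(random_state=0)

# 5-fold cross-validated evaluation metrics
mae = -cross_val_score(gbm, X, washoff, cv=5, scoring="neg_mean_absolute_error")
mse = -cross_val_score(gbm, X, washoff, cv=5, scoring="neg_mean_squared_error")
print(f"MAE: {mae.mean():.3f}  MSE: {mse.mean():.3f}")
```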

In addition to the use of ensemble learning to resolve the recalibration issues identified in Equations (6) and (7) and the use of one hot encoding to deal with lack of consideration for turbulence in Equation (2), DT-based models such as RF (Breiman 2001) and RTs (Morgan & Sonquist 1963) can combine multiple trees for improved model performance and handle categorical variables without the need for one hot encoding (Gross 2020).

The limitation of scope and applicability

Transfer Learning (TL) (Bozinovski & Fulgosi 1976) is a technique that allows NNs to adapt pre-trained models to new tasks or datasets. By leveraging the knowledge learned from a previous task, the model can improve its performance on a different problem, thus increasing its applicability and widening its scope. However, it is important to note that TL alone does not automatically select the best variables, algorithms, and hyperparameters for a given problem, as discussed by Yogatama & Mann (2014). To address this, hyperparameter tuning (Feurer & Hutter 2019), which involves selecting the optimal set of hyperparameters for a given model and project by learning from historical training data (Brownlee 2019), is necessary. Therefore, combining TL with hyperparameter tuning is crucial in dealing with scope and applicability issues. For example, the regression model generated for Equation (8) could be fine-tuned for application in areas with higher annual rainfall. Furthermore, the use of TL ensures the continual use of the existing model, as sketched below.
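A generic TL sketch follows, freezing a 'pre-trained' backbone and fine-tuning only a new output head on target-domain data; the architecture, data and learning rate are hypothetical, and PyTorch is used purely for illustration.

```python
import torch
import torch.nn as nn

# A 'pre-trained' network (weights would normally be loaded from a source task)
backbone = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16), nn.ReLU())
head = nn.Linear(16, 1)                 # new output head for the target task
model = nn.Sequential(backbone, head)

# Freeze the backbone; only the new head is updated on the target dataset
for param in backbone.parameters():
    param.requires_grad = False

optimiser = torch.optim.Adam(head.parameters(), lr=1e-3)  # lr is a tunable hyperparameter
loss_fn = nn.MSELoss()

X = torch.randn(64, 8)                  # target-domain features (e.g., high-rainfall region)
y = torch.randn(64, 1)                  # target variable
for _ in range(100):                    # fine-tuning loop
    optimiser.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimiser.step()
print(loss.item())
```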

Subel et al. (2023) applied TL in sub-grid scale turbulence modelling by enhancing the capabilities of convolutional neural networks (CNNs), thus enabling them to extrapolate from one system to another. This was achieved by introducing a general framework that identifies the best re-training procedure for a given problem based on physics and NN theory. Hyperparameter tuning was then used to optimise the performance of the NN by searching over a specified hyperparameter space and finding the best layers to re-train. TL has also been used to improve the efficiency of distinct wastewater treatment processes. Pisa et al. (2023) used TL to develop a control system for wastewater treatment plants: data from a source plant were used to train a deep NN, and the network was then fine-tuned with data from the target plant. They evaluated the transfer suitability of the trained network by comparing its performance on the target plant with that of a network trained only on the target plant. Russo et al. (2023) used a combination of algorithms, including RF, support vector regression, and ANN, to predict sediment and nutrient first flush. Their framework was used to identify the most influential variables that contribute to sediment and nutrient pollution in any geographical region, thus reducing scope and applicability limitations.

The limitation of inflexibility

As revealed in Equation (3), over-reliance on fixed constants and processes makes it challenging for deterministic models to adapt to changes in processes or new input data. Nevertheless, models that combine deterministic and stochastic elements have been used to address the limitations of deterministic models. These hybrid models can be used to capture the relationship between the explanatory variables (varying discharge, solids geometry, rainfall characteristics) and the target variable (mass of solids passing through the gully pot). This approach need not rely on any predefined numerical or fitting coefficients and can learn the underlying patterns in the data to make accurate predictions, as demonstrated by Lee et al. (2021)'s debris flow volume model, whose NN outperformed various multiple linear regression models with fitting coefficients. Kim et al. (2022) proposed a novel hybrid model for water quality forecasting. Their methodology involved data decomposition, ML and error correction, which eliminates the reliance on fixed deterministic constants and identifies underlying patterns and trends in data. Furthermore, they built an error correction framework similar to Figure 6 by combining the variational mode decomposition (VMD) algorithm (Dragomiretskiy & Zosso 2013) and a Bidirectional Long Short-Term Memory (BiLSTM) NN (Schuster & Paliwal 1997), which in turn improved forecast accuracy by correcting errors in the data.
Figure 6: A VMD and BiLSTM-based error correction flowchart (modified from Kim et al. 2022).

As shown in various studies, the hybrid model's ability to handle real-time data and correct errors makes it more accurate than depending on fixed deterministic constants (Li et al. 2021; Peng et al. 2022).

The limitation of bias from the use of non-representative data, missing data, and outliers

Egodawatta et al. (2007) introduced Equation (5) as a modification to Sartor and Boyd's pollutant wash-off model (Equation (4)) to address the issue of biased and unreliable predictions, resulting from erroneous assumptions. This suggests that deterministic models may not always address bias by simply adding constraints to the model. However, when Equation (5) was subjected to a basic statistical evaluation using data from different sites, it became apparent that using non-representative build-up data could exacerbate bias in modelling.

Uncertainties with the modified model (Equation (5)) could be addressed by implementing advanced statistical and stochastic techniques that deal with outliers and high-dimensional or redundant data. These techniques can extract information from the data before training the model to effectively deal with bias. For example, in predicting nitrogen, phosphorus, and sediment mean concentrations in urban runoff, Behrouz et al. (2022) made use of RF, an algorithm known for its ability to handle noisy data and outliers. Lee et al. (2021) used cross-validation in the development of their debris flow volume model. Their study randomly partitioned the data associated with the four prominent variables into 10 subsets of approximately equal size, with a 7:3 ratio for training and validation datasets. They then trained their model on the training data and evaluated its performance on the validation data. This process was repeated 10 times, with each of the 10 subsets being used as the validation set once. The average performance of the model over the 10 iterations was then calculated to provide a more reliable estimate of its performance on unseen data. By using cross-validation, Lee et al. (2021) ensured that their model was valid and unbiased, and that it was not overfitting to a particular subset of data.

Most traditional statistical models such as linear regression, logistic regression, and analysis of variance (ANOVA) are known to be sensitive to outliers and biased towards certain groups within the variables. This implies that the choice of modelling technique can affect the accuracy and validity of the models. According to Maharana et al. (2022), the use of more representative and robust data preprocessing techniques can effectively address missing data, bias, and data quality issues in solid deposition modelling.

Hybrid models that incorporate data-driven techniques have been recognised to handle uncertainties where traditional models may struggle (Post et al. 2015; Lee et al. 2021). However, these models require intricate and resource-intensive computation (Figure 4) and may be unexplainable due to their black box nature, posing the risk of model misapplication. As suggested in Section 4.1 and Table 4, the use of algorithms such as PDP, mRMR, PCA, FURIA-GA and RT has demonstrated effectiveness in preventing model misapplication by selecting features during model development.

Table 4: Overview of methods for explaining ‘black box’ models, their corresponding algorithms and benefits

Method | Algorithms | Benefits
Simplification | L1 or L2 regularisation (Ng 2004); sigmoid function (Cramer 2003); modified tanh (Simos & Tsitouras 2021); smooth function approximation (Shurman 2016; Ohn & Kim 2019); weight sharing (Pham et al. 2018) | Reduces the complexity of the model and the number of hidden layers, simplifies the activation function, and prevents overfitting
Layer-wise explanation | Gradient-weighted Class Activation Mapping, Grad-CAM (Selvaraju et al. 2016); Layer-wise Relevance Propagation, LRP (Bach et al. 2015); integrated gradients (Hsu & Li 2023) | Analyses the output of each layer in a NN to gain insights into the behaviour of the network, thus identifying the importance of various layers
Model-agnostic interpretation | Local Interpretable Model-agnostic Explanations, LIME (Ribeiro et al. 2016); SHapley Additive exPlanations, SHAP (Lundberg & Lee 2017); Recursive Feature Elimination, RFE (Guyon et al. 2002); Principal Component Analysis, PCA (Abdi & Williams 2010); mutual information (Shannon 1948); partial dependence plot, PDP (Friedman 1991); permutation feature importance, PFI (Breiman 2001) | Visual feature-importance insights that explain the behaviour of complex models regardless of the model's architecture; identifies the variables that are most important for producing a given output and provides insights into correlations between variables and into the behaviour and transparency of the model
Tree-based explanation | Classification and RTs such as CART, ID3, C4.5, CHAID, MARS, RF, GBT (Loh 2008; Hannan & Anmala 2021) | Inherently interpretable ML models that can be used in conjunction with other XAI tools
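As a concrete example of the model-agnostic route in Table 4, the sketch below computes SHAP values for a tree ensemble trained on synthetic deposition data; the features and response are hypothetical.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
n = 500
X = np.column_stack([rng.uniform(1, 30, n),    # ADWP (d)
                     rng.uniform(0, 60, n),    # rainfall (mm)
                     rng.integers(0, 4, n)])   # land use category (label-encoded)
y = 0.05 * X[:, 0] + 0.02 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.1, n)

model = RandomForestRegressor(random_state=0).fit(X, y)

# TreeExplainer attributes each prediction to per-feature contributions
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:10])
print(np.round(shap_values, 3))  # one row of contributions per prediction
```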

It is imperative to understand why and when stakeholders need insights from the 'black box' models used to predict the behaviour of variables in solids deposition. These needs include informed stakeholder decision-making, directed planning of future data collection, data troubleshooting, informed feature extraction and anomaly detection, and the embedding of trust.

Data troubleshooting plays a crucial role because of the prevalence of 'dirty' data, potential errors in preprocessing code, and the risk of target leakage, which occurs when the training data contain information about the target that will not be available at prediction time. Leakage can severely degrade a model's real-world performance, as reflected in the robust outlier detection regime adopted by Post et al. (2016) while developing their hybrid model for solids deposition in gully pots. Understanding the patterns identified by models allows such errors to be found and resolved. Additionally, an understanding of model-based insights enables feature extraction, in which new features are created from raw data or from existing features. These insights become important when dealing with large datasets or when domain knowledge is limited. By selecting or designing features that align with domain knowledge, the resulting model becomes more transparent and easier to explain to non-experts. This is particularly important when the model's predictions affect critical decisions or must be justified to gain stakeholder trust. A lack of transparency in 'black box' models can hinder stakeholder decision-making and raise ethical concerns, including possible discriminatory outcomes that prevent specific groups from accessing opportunities. For example, a county that relies solely on data-driven systems to manage gully pot cleansing may disregard human contributions, potentially reducing the funds allocated to a gully jetting company responsible for routine and reactive cleansing of the county's gullies.
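As a hedged illustration of guarding against preprocessing leakage, the following sketch (synthetic data, not the gully inspection dataset) fits the scaler inside a cross-validated pipeline so that statistics from held-out folds never reach the model.

```python
# A minimal sketch of leakage-safe preprocessing: the scaler is refitted
# on each training fold, so test folds never inform the preprocessing.
# All data here are synthetic and illustrative.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                              # illustrative predictors
y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + rng.normal(0, 0.1, 200)

pipe = Pipeline([
    ('scale', StandardScaler()),   # fitted per training fold only
    ('model', Ridge(alpha=1.0)),
])
# cross_val_score refits the whole pipeline on each fold; scaling X
# before splitting would leak held-out statistics into training.
scores = cross_val_score(pipe, X, y, cv=5)
print(round(scores.mean(), 3))
```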

In the context of human decision-making, model insights hold significance as they can inform the decisions individuals make, sometimes mattering more than the predictions themselves. There are also growing concerns about the autonomy of ML systems, that is, their ability to take decisions and actions without input from human oversight, established deterministic theories, or conceptual thinking (Subías-Beltrán et al. 2022). These concerns underline the need for explainable models that allow humans to understand how they work and that provide insight into the decisions made. This is important where decisions based on ML models can have consequences such as discriminatory outcomes. Furthermore, insights from models can guide future data collection efforts, helping local councils determine which types of data are most valuable for solids deposition management and investment.

Table 4 summarises tools for improving the explainability of 'black box' models and how they are employed. To explain further, SHAP values provide a comprehensive method for rationalising a model's output: stakeholders can understand the influence of individual variables through the score assigned to each. For example, consider a scenario in which an unexplainable 'black box' algorithm is employed to build a solids deposition model based on randomly generated solids level inspection data (Table 5). In this scenario, SHAP can be used to explain the contribution of land use $i$, an independent variable, as shown in the following equation:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\left[f\big(x_{S \cup \{i\}}\big) - f\big(x_S\big)\right] \qquad (14)$$

where $\phi_i$ is the Shapley value for land use, $f$ is the black box model, $x$ is an input data point (a single row in the gully inspection data), $N$ is the set of all $n$ variables, and the sum over $S \subseteq N \setminus \{i\}$ iterates over all possible subsets and combinations of variables to ensure that interactions between individual variables are accounted for. If {land use, solids type} is one of the subsets under consideration, we can get the model output for this subset with, $f(x_{S \cup \{i\}})$, and without, $f(x_S)$, the variable of interest (i.e. land use). The difference between $f(x_{S \cup \{i\}})$ and $f(x_S)$ explains how land use contributed to the prediction in that subset.
Table 5 | A random example of solids level inspection data showing climatic and anthropogenic variables

| Road hierarchy | Solids type | Season | Rainfall intensity (mm/hour) | Dry period (days) | Land use | Solids level |
| --- | --- | --- | --- | --- | --- | --- |
| Service | Silt | Winter | 15.63 | | Residential | 75% |
| Lane | Leaves | Summer | 0.2 | | Agricultural | 50% |
| Service | Silt | Autumn | 9.3 | | Residential | 50% |
| Strategic | Leaves | Winter | 0.2 | | Residential | 50% |
| Minor | Silt | Spring | 15.63 | | Recreational | 100% |
For example, if the model output (solids level) with land use, $f(x_{S \cup \{i\}})$, is 75%-filled and without land use, $f(x_S)$, is 50%-filled, then land use contributes 25%, otherwise known as its marginal value. The same process is repeated for each possible subset, and the marginal contributions are weighted according to how many of the total number of variables ($n$) are in the subset.
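A minimal sketch of equation (14), assuming a toy linear 'black box' and a zero baseline as the (assumed) way to 'remove' a variable from the input, makes the subset enumeration and weighting concrete:

```python
# Exact Shapley values per equation (14) by enumerating all subsets S of
# the other variables. Model, data point, and baseline are illustrative.
from itertools import combinations
from math import factorial

def shapley_value(f, x, baseline, i, n):
    """Exact Shapley value of feature i for prediction f(x).

    Features absent from a subset are replaced by baseline values,
    a common (assumed) convention for 'removing' a variable.
    """
    others = [j for j in range(n) if j != i]
    phi = 0.0
    for size in range(len(others) + 1):
        for S in combinations(others, size):
            # weight = |S|! (n - |S| - 1)! / n!
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            x_with = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
            x_without = [x[j] if j in S else baseline[j] for j in range(n)]
            phi += weight * (f(x_with) - f(x_without))  # marginal contribution
    return phi

# Illustrative linear 'black box' over three variables:
f = lambda v: 2.0 * v[0] + 1.0 * v[1] - 0.5 * v[2]
x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
print([round(shapley_value(f, x, baseline, i, 3), 2) for i in range(3)])
# For a linear model each Shapley value recovers its term: [2.0, 2.0, -1.5]
```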

However, calculating all the combinations of subsets is computationally intensive: the number of subset combinations grows as the exponential term $2^n$, with $n$ representing the number of variables. For example, the gully inspection data in Table 5 have six independent variables and therefore $2^6 = 64$ possible subset combinations, which makes it computationally intense to obtain the average contribution of even one variable. According to Lundberg & Lee (2017), Kernel SHAP, an approximation technique that samples variable subsets and fits a linear regression to the samples, can be used to eliminate the need for this exhaustive computation. Other approximation techniques are Tree SHAP and Deep SHAP, which are used for tree-based and deep NN models, respectively. The SHAP summary plot (Figure 7) presents a concise and easily understandable overview of the model's feature importance (Lundberg & Lee 2017; SHAP 2018).
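For reference, a hedged sketch of Kernel SHAP using the open-source shap package follows; the model and six-variable data are illustrative stand-ins for the gully inspection data, not results from this study.

```python
# A minimal Kernel SHAP sketch (Lundberg & Lee 2017) with the `shap` package;
# data, coefficients, and model choice are assumptions for illustration.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))                 # six variables, as in Table 5
y = 1.5 * X[:, 0] - 0.8 * X[:, 3] + rng.normal(0, 0.1, 300)
model = RandomForestRegressor(random_state=0).fit(X, y)

background = shap.sample(X, 50)               # background data used to simulate 'missing' features
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(X[:5])    # samples coalitions, fits a weighted linear model
print(np.round(shap_values, 2))

# For tree ensembles, shap.TreeExplainer(model) computes attributions exactly
# and far faster; shap.summary_plot(shap_values, X) gives Figure 7-style output.
```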
Figure 7 | The SHAP summary plot enhances the explainability of 'black box' models (modified from SHAP 2018).

Geng et al. (2022) used SHAP values to demonstrate the importance and correlation of various explanatory variables in predicting litterfall production, a crucial solids build-up process. Similarly, Russo et al. (2023) used a combination of 'black box' algorithms, including RF, support vector regression, and ANN, to predict sediment and nutrient first flush. The study fed 76 potential predictive variables into the machine learning algorithms, and the SHAP algorithm was then used to rank the feature importance of the variables and to improve the interpretability and explainability of the 'black box' models.

Likewise, classification and regression trees are simple, interpretable models that visually represent the decision-making process and can explain how a 'black box' model arrived at a specific prediction. Rietveld et al. (2020b) used RTs to explain the significance of, and correlation between, solids build-up, wash-off, and retention predictors when predicting the solids accumulation rate in gully pots.
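A minimal sketch of such a tree-based explanation follows, with assumed predictor names standing in for the build-up and wash-off variables: a shallow regression tree is fitted and its decision rules printed.

```python
# Fit a shallow regression tree to synthetic accumulation-rate data and
# print its if/else rules; variable names and effects are illustrative only.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(7)
X = pd.DataFrame({
    'dry_period': rng.integers(0, 30, 400),          # days (build-up proxy)
    'rainfall_intensity': rng.gamma(2.0, 4.0, 400),  # mm/hour (wash-off proxy)
})
y = (0.03 * X['dry_period'] - 0.01 * X['rainfall_intensity']
     + rng.normal(0, 0.05, 400))

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
# The printed splits show which predictor dominates the accumulation rate.
```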

Following insights from this study, a workflow (Figure 8) is presented for the implementation of explainable hybrid models in the context of solids deposition modelling for gully pots. Stages 2–5 may involve a series of iterations to achieve a satisfactory model.
Figure 8 | A suggested workflow for deploying an explainable hybrid model that can effectively predict solids deposition in a gully pot.

Traditional models have been used to estimate the deposition of solids in gully pots, but they have limitations. Explainable hybrid models have been shown to lessen the effects of these limitations.

This study offers a promising approach to overcoming the limitations of traditional models in simulating complex systems such as the solids build-up, wash-off, and retention processes in gully pots. By integrating traditional and data-driven models, hybrids are produced that handle complex and non-linear variables, improve the scope and applicability of existing models, increase their flexibility, and reduce bias from non-representative data, missing data, and outliers. However, the resource-intensive computation requirements and lack of explainability of hybrid models can lead to misapplication and flawed decision-making. There is a need for resource-efficient and explainable hybrid models that allow stakeholders to understand how a model works and why it takes certain decisions. SHAP values, decision trees, and other eXplainable Artificial Intelligence (XAI) tools can enhance the interpretability and explainability of 'black box' models, enabling stakeholders to make informed decisions based on reliable insights. By adopting these XAI tools, we can mitigate the risks associated with hybrid models and ensure that they are transparent, ethical, and beneficial. As explainable hybrids evolve, they will become an increasingly valuable tool for addressing complex modelling challenges in solids deposition on road surfaces and in urban stormwater management.

Future works will utilise explainable hybrid architecture to improve the predictive accuracy of solids deposition using gully inspection data from multiple local authorities.

C.F.E. contributed to conceptualisation, methodology, and writing. A.C. contributed to writing, review, and supervision. H.B. contributed to review and supervision. E.E. and C.S. reviewed the article.

There was no external funding for this research.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Abdi H. & Williams L. J. 2010 Principal component analysis. WIREs Computational Statistics 2 (4), 433–459.

Agarap A. F. 2018 Deep learning using rectified linear units (ReLU). arXiv preprint arXiv:1803.08375.

Alley W. & Smith P. 1981 Estimation of accumulation parameters for urban runoff quality modeling. Water Resources Research 17 (6), 1657–1664.

Almutairi M., Stahl F. & Bramer M. 2021 Reg-rules: An explainable rule-based ensemble learner for classification. IEEE Access 9, 52015–52035.

Bach P. M., McCarthy D. T. & Deletic A. 2010 Redefining the stormwater first flush phenomenon. Water Research 44 (8), 2487–2498.

Bach S., Binder A., Montavon G., Klauschen F., Müller K. R. & Samek W. 2015 On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10 (7), e0130140.

Behrouz M., Yazdi M. & Sample D. 2022 Using random forest, a machine learning approach to predict nitrogen, phosphorus, and sediment event mean concentrations in urban runoff. Environmental Management 317, 115412.

Bera S. 2020 Feature Selection using Genetic Algorithm. Available from: https://medium.com/analytics-vidhya/feature-selection-using-genetic-algorithm-20078be41d16 (Accessed 10 April 2023).

Bertrand-Krajewski J., Briat P. & Scrivener O. 1993 Sewer sediment production and transport modelling: A literature review. Hydraulic Research 31 (4), 435–460.

Bozinovski S. & Fulgosi A. 1976 The influence of pattern similarity and transfer learning upon training of a base perceptron B2. In: Proceedings of Symposium Informatica 3, 121–126.

Breiman L. 2001 Random forests. Machine Learning 45 (1), 5–32.

British Standards Institution 2021 BS 5911-6:2021 Concrete Pipes and Ancillary Concrete Products. Road Gullies and Gully Cover Slabs. Specification. British Standards Institution, London.

Brownlee J. 2019 What is the Difference Between A Parameter and A Hyperparameter?

Brownlee J. 2021 How to Choose an Activation Function for Deep Learning.

Burden F. & Winkler D. 2008 Bayesian regularization of neural networks. In: Artificial Neural Networks: Methods and Applications. Humana Press, Sandown, pp. 23–42.

Butler D. & Karunaratne S. 1995 The suspended solids trap efficiency of the roadside gully pot. Water Research 29 (2), 719–729.

Butler D., Digman C., Makropoulos C. & Davies J. 2018 Urban Drainage, 4th edn. Taylor & Francis, CRC Press, Boca Raton.

Caraka R., Chen R., Toharudin T., Pardamean B., Yasin H. & Wu S. 2019 Prediction of status particulate matter 2.5 using state Markov chain stochastic process and hybrid VAR-NN-PSO. IEEE Access 7, 161654–161665.

Casella G. & George E. I. 1992 Explaining the Gibbs sampler. The American Statistician 46 (3), 167–174.

Chang C., Lin P. & Tsai C. 2011 Estimation of sediment volume of debris flow caused by extreme rainfall in Taiwan. Engineering Geology 123 (1–2), 83–90.

Cherrington M., Airehrour D., Lu J., Xu Q., Wade S. & Madanian S. 2019 Feature selection methods for linked data: Limitations, capabilities and potentials. In: Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, 2–5 December, Auckland, New Zealand.

Chib S. & Greenberg E. 1995 Understanding the Metropolis–Hastings algorithm. The American Statistician 49 (4), 327–335.

Collinson J. D., Mountney N. P. & Thompson D. B. 2006 Sedimentary Structures, 3rd edn. Terra Publishing, Harpenden, Hertfordshire.

Cramer J. S. 2003 The Origins of Logistic Regression. Available from: http://www.ssrn.com/abstract=360300 (Accessed 9 July 2023).

Deletic A., Maksimovic E. & Ivetic M. 1997 Modelling of storm wash-off of suspended solids from impervious surfaces. Hydraulic Research 35 (1), 99–118.

Department for Transport 2020 CD 526: Design Manual for Roads and Bridges, Version 3: Spacing of Road Gullies. DfT, London.

Dragomiretskiy K. & Zosso D. 2013 Variational mode decomposition. IEEE Transactions on Signal Processing 62 (3), 531–544.

Egodawatta P., Ziyath A. & Goonetilleke A. 2013 Characterising metal build-up on urban road surfaces. Environmental Pollution 176, 87–91.

Entwistle M. 2021 A New Approach to Risk Profiling Gullies.

Environmental Systems Research Institute n.d. InfoSWMM.

Fenner R. 2000 Approaches to sewer maintenance: A review. Journal of Urban Water 2, 343–346.

Feurer M. & Hutter F. 2019 Hyperparameter optimization. In: Automated Machine Learning: Methods, Systems, Challenges (Hutter F., Kotthoff L. & Vanschoren J., eds). Springer Nature, Cham, Switzerland, pp. 3–33.

Forty E. 1998 Performance of Gully Pots for Road Drainage, Report SR 508. HR Wallingford, Oxford, United Kingdom.

Freund Y. & Schapire R. 1995 A decision-theoretic generalization of on-line learning and an application to boosting. In: Computational Learning Theory: Second European Conference, EuroCOLT'95, Barcelona, Spain, 13–15 March.

Friedman J. 1991 Multivariate adaptive regression splines. The Annals of Statistics 19 (1), 1–67.

Friedman J. 2001 Greedy function approximation: A gradient boosting machine. The Annals of Statistics 29 (5), 1189–1232.

Gelman A. & Hill J. 2006 Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, New York.

Geng A., Tu Q., Chen J., Wang W. & Yang H. 2022 Improving litterfall production prediction in China under variable environmental conditions using machine learning algorithms. Environmental Management 306, 114515.

Gong J. & Chen T. 2022 Does configuration encoding matter in learning software performance? An empirical study on encoding schemes. In: Proceedings of the 19th International Conference on Mining Software Repositories, 23–24 May, Pittsburgh, USA.

Gross K. 2020 Tree-Based Models: How They Work (In Plain English!). Available from: https://blog.dataiku.com/tree-based-models-how-they-work-in-plain-english (Accessed 11 April 2023).

Grottker M. 1990 Pollutant removal by gully pots in different catchment areas. Science of the Total Environment 93, 515–522.

Grottker M. & Hurlebush R. 1987 Mitigation of storm water pollution by gully pots. In: Proceedings of the Fourth International Conference on Urban Storm Drainage, 31 August–4 September, Lausanne, Switzerland.

Guyon I., Weston J., Barnhill S. & Vapnik V. 2002 Gene selection for cancer classification using support vector machines. Machine Learning 46, 389–422.

Hsu C. Y. & Li W. 2023 Explainable GeoAI: Can saliency maps help interpret artificial intelligence's learning process? An empirical study on natural feature detection. Geographical Information Science 37 (5), 963–987.

Kennedy J. & Eberhart R. 1995 Particle swarm optimization. In: Proceedings of ICNN'95-International Conference on Neural Networks, 27 November–1 December, Perth, WA, Australia.

Kim J., Yu J., Kang C., Ryang G., Wei Y. & Wang X. 2022 A novel hybrid water quality forecast model based on real-time data decomposition and error correction. Process Safety and Environmental Protection 162, 553–565.

Kunin D., Bloom J., Goeva A. & Seed C. 2019 Loss Landscapes of Regularized Linear Autoencoders. Available from: https://arxiv.org/pdf/1901.08168.pdf (Accessed 8 April 2023).

Litwin Y. & Donigian A. Jr. 1978 Continuous simulation of nonpoint pollution. Water Pollution Control Federation 50 (10), 2348–2361.

Loh W. Y. 2008 Classification and regression tree methods. Encyclopedia of Statistics in Quality and Reliability 1, 315–323.

Lundberg S. & Lee S. 2017 A unified approach to interpreting model predictions. In: Proceedings of the 31st Annual Conference on Advances in Neural Information Processing Systems, California, United States of America, 4–9 December, pp. 1–10.

Maharana K., Mondal S. & Nemade B. 2022 A review: Data pre-processing and data augmentation techniques. Global Transitions Proceedings 3 (1), 91–99.

Manish Agrawal A., Tendle A., Sikka H. & Singh S. 2021 WeightScale: Interpreting Weight Change in Neural Networks. Available from: https://arxiv.org/abs/2107.07005 (Accessed 8 July 2023).

Marchi L. & D'Agostino V. 2004 Estimation of debris-flow magnitude in the Eastern Italian Alps. Earth Surface Processes and Landforms 29 (2), 207–220.

Martin O. A., Kumar R. & Lao J. 2021 Bayesian Modeling and Computation in Python. Chapman & Hall/CRC Press, Boca Ratón.

Mazzanti S. 2021 'MRMR' Explained Exactly How You Wished Someone Explained to You.

Morgan J. & Sonquist J. 1963 Problems in the analysis of survey data, and a proposal. The American Statistical Association 58 (302), 415–434.

Muthusamy M., Tait S., Schellart A., Beg M., Carvalho R. & de Lima J. 2018 Improving understanding of the underlying physical process of sediment wash-off from urban road surfaces. Hydrology 557, 426–433.

Ng A. Y. 2004 Feature Selection, L1 vs. L2 Regularization, and Rotational Invariance. Available from: https://dl.acm.org/doi/abs/10.1145/1015330.1015435 (Accessed 9 July 2023).

Nix S. 2022 Leaf Abscission and Senescence.

Obropta C. & Kardos J. 2007 Review of urban stormwater quality models: Deterministic, stochastic, and hybrid approaches. Journal of the American Water Resources Association (JAWRA) 43 (6), 1508–1523.

Ognjanovski G. 2019 Everything You Need to Know About Neural Networks and Backpropagation.

Ozbulak U., De Neve W. & Van Messem A. 2018 How the Softmax Output is Misleading for Evaluating the Strength of Adversarial Examples. Available from: https://arxiv.org/abs/1811.08577 (Accessed 8 July 2023).

Papananias M., Fletcher S., Longstaff A. P., Mengot A., Jonas K. & Forbes A. B. 2017 Modelling uncertainty associated with comparative coordinate measurement through analysis of variance techniques. In: Proceedings of the 17th International Conference of the European Society for Precision Engineering and Nanotechnology, 29 May–2 June, Hannover, Germany.

Peng X., Li C., Jia S., Zhou L., Wang B. & Che J. 2022 A short-term wind power prediction method based on deep learning and multistage ensemble algorithm. Wind Energy 25 (9), 1610–1625.

Penn State University n.d. Generalized Linear Mixed Models.

Pham H., Guan M., Zoph B., Le Q. & Dean J. 2018 Efficient Neural Architecture Search via Parameters Sharing. Available from: http://proceedings.mlr.press/v80/pham18a/pham18a.pdf (Accessed 9 July 2023).

Pisa I., Morell A., Vicario J. L. & Vilanova R. 2023 Transfer learning in wastewater treatment plants control: Measuring the transfer suitability. Process Control 124, 36–53.

Post J., Pothof I., Langeveld J. & Clemens F. 2015 Modelling progressive sediment accumulation in gully pots: A Bayesian approach. In: Proceedings of the 10th International Conference on Urban Drainage Modelling, Quebec, Canada, 20–23 September, pp. 59–61.

Post J., Pothof I., Dirksen J., Baars E., Langeveld J. & Clemens F. 2016 Monitoring and statistical modelling of sedimentation in gully pots. Water Research 88, 245–256.

Refaeilzadeh P., Tang L. & Liu H. 2009 Cross-validation. Encyclopedia of Database Systems 5, 532–538.

Ribeiro M. T., Singh S. & Guestrin C. 2016 'Why should I trust you?' Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 14–18 August, Virtual.

Rietveld M., Clemens F. & Langeveld J. 2020a Solids dynamics in gully pots. Urban Water Journal 17 (7), 669–680.

Rietveld M., Clemens F. & Langeveld J. 2020b Monitoring and statistical modelling of the solids accumulation rate in gully pots. Urban Water Journal 17 (6), 549–559.

Salvatier J., Wiecki T. V. & Fonnesbeck C. 2016 Probabilistic programming in Python using PyMC3. PeerJ Computer Science 2, e55.

Sartor J., Boyd G. & Agardy F. 1974 Water pollution aspects of street surface contaminants. Water Pollution Control Federation 46 (3), 458–467.

Schuster M. & Paliwal K. K. 1997 Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45 (11), 2673–2681.

Scikit-learn Developers 2023 Sklearn Preprocessing LabelEncoder.

Selvaraju R. R., Cogswell M., Das A., Vedantam R., Parikh D. & Batra D. 2016 Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization.

Servat E. 1984 Contribution à l'étude des matières en suspension du ruissellement pluvial à l'échelle d'un petit bassin versant urbain (Contribution to the Study of Suspended Matter in Stormwater Runoff at the Scale of a Small Urban Watershed). PhD Thesis, Université des Sciences et Techniques du Languedoc.

Shannon C. E. 1948 A mathematical theory of communication. The Bell System Technical Journal 27 (3), 379–423.

SHAP 2018 Welcome to the SHAP Documentation. Available from: https://shap-lrjball.readthedocs.io/en/latest/index.html (Accessed 8 April 2023).

Shi J., Zhu Y., Khan F. & Chen G. 2019 Application of Bayesian Regularization Artificial Neural Network in explosion risk analysis of fixed offshore platform. Loss Prevention in the Process Industries 57, 131–141.

Shurman J. 2016 Approximation by smooth functions. In: Calculus and Analysis in Euclidean Space (Shurman J., ed.). Springer, New York, pp. 347–373.

South Gloucestershire 2015 Highways Asset Management Framework 2015–2020. Available from: https://www.southglos.gov.uk/documents/Highways-Asset-Management-Framework2015-2020.pdf (Accessed 20 September 2022).

South Gloucestershire 2022 Drainage Data FOI Ref FIDP/017 (Accessed 25 May 2022).

Spiegelhalter D., Best N., Carlin B. & Van Der Linde A. 2002 Bayesian measures of model complexity and fit. The Royal Statistical Society: Series B (Statistical Methodology) 64 (4), 583–639.

Suárez J., Jiménez V., del Río H., Anta J., Jácome A., Torres D., Ures P. & Vieito S. 2013 Design of a sand filter for highway runoff in the north of Spain. Municipal Engineer 166 (2), 121–129.

Subel A., Guan Y., Chattopadhyay A. & Hassanzadeh P. 2023 Explaining the physics of transfer learning in data-driven turbulence modeling. PNAS Nexus 2 (3), pgad015.

Subías-Beltrán P., Pujol O. & Lecuona Ramírez I. D. 2022 The forgotten human autonomy in machine learning. In: CEUR Workshop Proceedings 3221, 13 June, Barcelona, Spain.

United States Environmental Protection Agency 2023 Storm Water Management Model (SWMM).

University of California, Los Angeles 2023 Introduction to Generalized Linear Mixed Models.

Uzair M. & Jamil N. 2020 Effects of hidden layers on the efficiency of neural networks. In: 23rd International Multi-Topic Conference (INMIC), 5–7 November, Bahawalpur, Pakistan.

Warton D., Blanchet F. G., O'Hara R. B., Ovaskainen O., Taskinen S., Walker S. C. & Hui F. K. 2015 So many variables: Joint modeling in community ecology. Trends in Ecology & Evolution 30 (12), 766–779.

Xiao Y., Luan B., Zhang T., Liang D. & Zhang C. 2022 Experimental study of sediment wash-off process over urban road and its dependence on particle size distribution. Water Science & Technology 86 (10), 2732–2748.

Yogatama D. & Mann G. 2014 Efficient transfer learning method for automatic hyperparameter tuning. In: Proceedings of the International Conference on Artificial Intelligence and Statistics, 22–25 April, Reykjavik, Iceland.

Yuan L., Sinshaw T. & Forshay K. J. 2020 Review of watershed-scale water quality and nonpoint source pollution models. Geosciences 10 (25), 1–36.

Zhao H., Jiang Q., Ma Y., Xie W., Li X. & Yin C. 2018 Influence of urban surface roughness on build-up and wash-off dynamics of road-deposited sediment. Environmental Pollution 243, 1226–1234.

Zhao Y., Meng X., Qi T., Li Y., Chen G., Yue D. & Qing F. 2022 AI-based rainfall prediction model for debris flows. Engineering Geology 296, 106456.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).