ABSTRACT
Current approaches to align sanitary surveys to environmental fecal waste indicators have demonstrated limited success, leaving stakeholders focused on fecal waste management with limited and costly tools. We analyzed data from the Maputo Sanitation Project using exploratory factor analysis (EFA) and structural equation modeling (SEM) to enhance the survey's correlation with Escherichia coli concentration data. EFA grouped related survey questions and sampling locations into distinct latent factors. SEM was then used to assess the relationship between the survey latent factors and the mean E. coli concentrations from grouped locations. The results suggested four survey question subgroups: latrine structure, latrine cleanliness, compound/household waste management, and community waste management. In addition, three sampling location subgroups were identified: high-traffic areas (HTAs), food activity locations, and bodily cleaning areas. The largest significant effect size identified suggested that for every 1-unit increase in community waste management, there was a decrease of 1.94 in log10 E. coli per gram of soil in HTAs (p = 0.03), a substantial improvement from the initial 0.50 decrease reported for an expert-weighted metric of all survey questions. These results underscore the importance of community-level waste management and demonstrate the use of a data-driven approach in enhancing environmental health assessments and planning interventions.
HIGHLIGHTS
Enhanced sanitary surveys using exploratory factor analysis to correlate survey data with environmental E. coli concentrations.
Identified significant latent factors impacting E. coli concentrations in high-traffic areas.
Demonstrated that improved community waste management practices are significantly associated with lower E. coli concentrations in high-traffic areas.
INTRODUCTION
Sanitary surveys are an important tool for local environmental health specialists in monitoring the physical systems used to control contaminants in the environment and reduce human exposure (Gallego-Ayala et al. 2018). Monitoring typically focuses on the pathways of exposure, primarily the fecal-oral transmission pathways captured in the ‘5-F diagram’ of floors, foods, fluids, fingers, and flies (Wagner & Lanoix 1958). These tools are often used at a community or regional scale and guide decisions on (1) adjustments to current water or sanitation systems or services or (2) future planning for water and sanitation services (Jenkins et al. 2014). Because environmental testing can be expensive, sanitary surveys and general water, sanitation, and hygiene (WASH) surveys embedded in national surveys (e.g., demographic and health surveys) have also been used to help evaluate sanitary conditions at the household or community level (Voth-Gaeddert et al. 2018a,b). Concepts from the source-pathway-receptor model of environmental health can further help guide decisions around control interventions (Mbae et al. 2024). Several attempts have been made to validate observational metrics with environmental testing, often using fecal indicator bacteria (FIB) as a proxy for overall contamination (Jenkins et al. 2014; Campos et al. 2015; Ercumen et al. 2017; Kelly et al. 2021), but with limited success. Possible reasons include observations of infrastructure not capturing true microbial contamination, the dynamism of environmental contamination, and the limited proxies used to measure it. Recently, Capone et al. (2019) conducted a robust evaluation of a sanitary survey adapted from the World Bank's Urban Sanitation Status Index, with corresponding soil and surface swab samples tested for Escherichia coli, among compounds (sets of 3–5 households) in Maputo, Mozambique.
A localized sanitation survey index (LSSI), generated from expert weights and responses to a sanitary survey, was correlated with E. coli concentrations in soil. However, a 10% increase in the LSSI corresponded to only a 0.05 log10 reduction in E. coli/gram of dry soil. The authors state, ‘Overall, the LSSI may be associated with fecal contamination in compound soil; however, the differences detected may not be meaningful in terms of public health hazards.’ Although this result is disappointing, a shift to an exploratory, data-driven approach could help increase this reduction and align the survey more closely with environmental testing results.
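To put this effect size in more familiar terms, a log10 reduction can be converted to a percentage reduction in concentration. The sketch below is illustrative arithmetic only: the function name is hypothetical, and scaling the reported association from a 10% increase to a full 1-unit increase assumes it is linear on the log10 scale.

```python
def percent_reduction(log10_reduction: float) -> float:
    """Convert a log10 reduction in concentration to a percentage reduction."""
    return (1.0 - 10.0 ** (-log10_reduction)) * 100.0


# 0.05 log10 reduction reported for a 10% increase in the LSSI.
lssi_per_10_percent = percent_reduction(0.05)

# Assumption: linear on the log10 scale, so a 1-unit (100%) increase
# in the LSSI corresponds to 10 * 0.05 = 0.50 log10.
lssi_per_unit = percent_reduction(0.05 * 10)

print(f"{lssi_per_10_percent:.1f}% reduction per 10% LSSI increase")
print(f"{lssi_per_unit:.1f}% reduction per 1-unit LSSI increase")
```

Expressed this way, a 10% improvement in the LSSI corresponds to roughly a one-tenth reduction in E. coli concentration, which is small relative to the sample-to-sample variability typical of soil measurements.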
To better align the sanitary survey with environmental testing data, it is important to understand (1) which survey questions are associated with which sampling locations and (2) which groups of questions and sampling locations are related to each other. Both the survey questions and the environmental sampling approaches are often proxies for underlying ‘latent’ factors that cannot be directly measured. For surveys, questions are often a proxy for infrastructure functionality or for use of and adherence to WASH recommendations, while for environmental testing, E. coli (or qPCR targets) is often used as a general proxy for microbial pathogen contamination or as an indicator of fecal waste contamination. To better understand which sets of questions and which sets of sampling locations represent the same underlying latent variables, exploratory factor analysis (EFA) can be used (Fabrigar & Wegener 2011). EFA provides a method to identify sets of variables that may all be affected by the same underlying latent factor. Structural equation modeling (SEM) can then be used to evaluate the magnitude and direction of the relationship between these underlying latent variables and observable variables (Kline 2011; Voth-Gaeddert et al. 2018c).
In this study, a secondary data analysis was conducted on data collected in the Maputo Sanitation Project to build on previous work from Capone et al. (2019). First, EFA was applied to explore the clustering of similar sanitary survey questions and similar environmental sampling locations in compounds. Second, the identified clusters of survey questions (and their estimated latent variable) were compared to mean levels of E. coli across identified clusters of sampling locations. The aim was to identify which subgroups of survey questions provided the largest effect size on concentrations of E. coli.
METHODS
Location
The Maputo Sanitation Project was conducted in urban, low-income communities of Maputo, Mozambique. Maputo has a complex sanitary infrastructure and service system as well as a diverse socio-economic landscape. It is a city characterized by a mix of formal and informal settlements, with variations in sanitation facilities ranging from modern sewage systems to basic latrines (Capone et al. 2019). The city's coastal location, coupled with its dense population, presents unique challenges for sanitation management, making it an ideal location for studying the impact of sanitation practices on E. coli contamination. The study area consisted of compounds, which were groups of 3–5 homes or nuclear families.
Sanitary survey and data collection
The secondary data were retrieved from the Center for Open Science's OSF archive (https://osf.io/d967q/); Capone et al. (2019) provide an extensive overview of the survey development and sample collection. Briefly, the sanitary survey was a questionnaire based on the World Bank's Urban Sanitation Status Index (Gallego-Ayala et al. 2018). It covered various aspects of sanitation, including the type of facility, its shared use, structural features, and waste management practices (see Capone et al. (2019) for further details). Trained field staff conducted the surveys through interviews and direct observations within the study area. The survey was a nested cross-sectional sub-study of a larger before-after sanitary control trial. A total of 80 compounds were surveyed, selected based on an evenly spaced geographic distribution. Survey responses were coded as ordinal variables from 0 to 1, with intermediate values divided evenly according to the number of ordinal levels. Capone et al. (2019) also created a weighted LSSI from the 20 survey questions, with weights provided by a panel of experts.
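The 0-to-1 ordinal coding and the weighted index can be sketched as follows. This is a minimal illustration, not the scoring code used by Capone et al. (2019): the function names, example responses, and weights are hypothetical, and the index is assumed here to be a normalized weighted sum of question scores.

```python
def ordinal_score(level: int, n_levels: int) -> float:
    """Map an ordinal level (0 .. n_levels - 1) onto evenly spaced values in [0, 1]."""
    if n_levels < 2 or not 0 <= level < n_levels:
        raise ValueError("need n_levels >= 2 and 0 <= level < n_levels")
    return level / (n_levels - 1)


def weighted_index(scores, weights):
    """Weighted mean of question scores (assumed form of an expert-weighted index)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Hypothetical responses: (observed level, number of ordinal levels) per question.
responses = [(1, 3), (0, 2), (3, 4)]
# Hypothetical expert weights, one per question.
weights = [2.0, 1.0, 1.0]

scores = [ordinal_score(level, n) for level, n in responses]
index = weighted_index(scores, weights)
print(scores, index)
```

A three-level question therefore takes the values 0, 0.5, and 1, while a four-level question takes 0, 1/3, 2/3, and 1, so all questions share a common scale before weighting.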
Environmental sampling included soil and surface swabs collected from pre-defined locations or compartments within each compound, such as entrances to compounds, central yard areas, areas for food preparation, and latrine entrances. These locations were identified by an adult member of the compound. Sampling and testing techniques followed previously established protocols for FIB (Campos et al. 2015). The final dataset generated by Capone et al. (2019) with complete data from both survey data and environmental sampling data included n = 75 compounds.
Data analysis
Both the survey data and the environmental testing results (i.e., E. coli data) were evaluated using EFA. EFA is a statistical method used for identifying underlying latent variables or factors that explain the observed correlations among a subset of measured variables (Fabrigar & Wegener 2011). EFA is similar to, yet distinct from, principal component analysis (PCA): EFA aims to identify underlying latent factors that explain observed correlations by modeling the variance shared among variables, while PCA transforms the variables into new uncorrelated principal components that capture as much of the total variance as possible.
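The distinction can be made concrete with a toy example. The sketch below is not the analysis run in this study: it uses a hypothetical three-variable correlation matrix generated from a single common factor, recovers the factor loadings with the closed-form solution available for a one-factor model with three indicators, and compares them with first-principal-component loadings obtained by power iteration.

```python
import math

# Hypothetical correlation matrix implied by one common factor with
# loadings 0.8, 0.7, and 0.6 (each off-diagonal is a product of two loadings).
R = [
    [1.00, 0.56, 0.48],
    [0.56, 1.00, 0.42],
    [0.48, 0.42, 1.00],
]


def one_factor_loadings(R):
    """Closed-form loadings for a one-factor model with three indicators.

    Uses only the off-diagonal correlations, i.e. the shared variance.
    """
    r12, r13, r23 = R[0][1], R[0][2], R[1][2]
    return [
        math.sqrt(r12 * r13 / r23),
        math.sqrt(r12 * r23 / r13),
        math.sqrt(r13 * r23 / r12),
    ]


def first_pc_loadings(R, iterations=200):
    """First principal-component loadings via power iteration.

    Uses the full matrix, including the diagonal of ones (total variance).
    """
    n = len(R)
    v = [1.0] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    return [math.sqrt(eigenvalue) * x for x in v]


efa = one_factor_loadings(R)
pca = first_pc_loadings(R)
print("factor loadings:   ", [round(x, 3) for x in efa])
print("component loadings:", [round(x, 3) for x in pca])
```

The component loadings come out larger than the factor loadings because the principal component also absorbs variance unique to each variable, whereas the factor model reproduces only the correlations among them.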
First, EFA was utilized to identify latent variables from (1) the sanitary survey data and (2) E. coli concentrations from compound compartments, grouping related sampling compartments and survey questions into distinct latent factors. The resulting rotated component matrix and individual factor loadings were evaluated and used to identify which subgroup the (1) survey question or (2) sampling compartment should belong to. Based on previous EFA studies and prior hypotheses, three- and four-group models were evaluated for the survey question EFA while two- and three-group models were evaluated for the sampling compartment EFA. Final models were selected based on eigenvalues, model fit indices, and previous literature. After selection, several variables in the survey question EFA did not have one clear high factor loading, and, in this situation, previous literature and expert judgement were used to place the variable into the most relevant group. Specifically, placement was based on the functional characteristics of the survey question and its alignment with identified EFA groups. As the goal of EFA is to inform downstream analysis of the latent factor groupings, expert judgement can aid this initial step when the signal-to-noise ratio is too low (Fabrigar et al. 1999; Costello & Osborne 2005; Brown 2015).
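The assignment step can be sketched as a simple rule applied to the rotated loading matrix. The loadings, question labels, and the 0.40 salience and 0.20 gap thresholds below are hypothetical choices for illustration, not values reported in this study; variables that fail the rule are flagged for the literature- and expert-based placement described above.

```python
def assign_to_factors(loadings, salient=0.40, min_gap=0.20):
    """Assign each variable to the factor with its largest absolute loading.

    Returns a dict of variable name -> factor index, or None when there is
    no single clear loading (too small, or too close to the runner-up).
    """
    assignments = {}
    for name, row in loadings.items():
        ranked = sorted(range(len(row)), key=lambda k: abs(row[k]), reverse=True)
        top = abs(row[ranked[0]])
        runner_up = abs(row[ranked[1]]) if len(ranked) > 1 else 0.0
        if top >= salient and top - runner_up >= min_gap:
            assignments[name] = ranked[0]
        else:
            assignments[name] = None  # ambiguous: place manually
    return assignments


# Hypothetical rotated loadings for three questions on four factors.
loadings = {
    "q_latrine_walls": [0.72, 0.10, 0.05, 0.02],
    "q_latrine_odor": [0.15, 0.65, 0.08, 0.11],
    "q_standing_water": [0.31, 0.28, 0.35, 0.30],
}

groups = assign_to_factors(loadings)
for name, factor in groups.items():
    print(name, "->", "needs manual placement" if factor is None else f"factor {factor}")
```

Making the rule explicit keeps the manual placements auditable: only the variables returned as `None` depend on judgement rather than on the loadings themselves.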
Second, SEM was used to evaluate the magnitude and direction of the associations between each of the subgroups of factors identified among the sanitary survey variables and E. coli concentrations from the subgroups of compartments. SEM is a multivariable statistical analysis technique that allows for the estimation of hypothesized causal relationships among observed and latent variables using a combination of linear regression and factor analysis (Kline 2011; Voth-Gaeddert & Oerther 2014). The arithmetic mean values of E. coli concentrations were estimated for each of the subgroups identified via EFA (given the evenly distributed geospatial sampling approach, no weighting beyond the original LSSI weighting was necessary). Each of these subgroups was individually compared to the latent variable estimated for each of the subgroups of survey questions identified in the EFA. SEM estimates p-values and effect sizes, which were used to identify the subgroup of survey questions that had the largest effect on the mean concentration of E. coli from a given subgroup of compartments. In R, the fa() function in the psych package (Revelle 2023) was used for EFA (nfactors = *, method = oblique, fm = ml) while the sem() function in the lavaan package (Rosseel et al. 2023) was used for SEM (estimator = ‘WLSMV’). All SEM models controlled for the presence of ducks, chickens, and sunlight as well as socio-economic status via a compound wealth score. Further details are provided in the Supplementary material and in Capone et al. (2019).
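The construction of the outcome variable can be sketched as below: for each compound, log10 E. coli values are averaged within each compartment subgroup. The compartment names, their assignment to subgroups, and the concentrations are hypothetical placeholders and do not reproduce the study's grouping; the sketch covers only the averaging step, not the SEM fit performed with lavaan.

```python
from statistics import mean

# Hypothetical mapping of sampling compartments to subgroups
# (illustrative only; not the grouping identified in this study).
SUBGROUPS = {
    "HTA": ["compound_entrance", "central_yard"],
    "FAL": ["food_preparation"],
    "BCAL": ["latrine_entrance"],
}


def subgroup_means(sample):
    """Arithmetic mean of log10 E. coli per subgroup for one compound.

    Missing compartments (absent or None) are skipped; a subgroup with
    no available samples is returned as None.
    """
    result = {}
    for group, compartments in SUBGROUPS.items():
        values = [sample[c] for c in compartments if sample.get(c) is not None]
        result[group] = mean(values) if values else None
    return result


# Hypothetical log10 E. coli values for one compound.
compound = {
    "compound_entrance": 2.5,
    "central_yard": 3.1,
    "food_preparation": 2.9,
    "latrine_entrance": None,  # sample not collected
}

means = subgroup_means(compound)
print(means)
```

Each subgroup mean then serves as a single observed outcome per compound, regressed on one survey latent variable at a time alongside the covariates listed above.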
RESULTS
EFA of E. coli concentrations across sampling compartments
Compartment | Mean E. coli concentrations |
---|---|
HTAs | 2.71 (SD: 0.69) |
FALs | 2.98 (SD: 0.90) |
BCALs | 3.11 (SD: 0.85) |
SD, standard deviation.
EFA of sanitary survey questions
Associations between sanitary survey question clusters and E. coli concentrations
DISCUSSION
The efficacy of a data-driven, exploratory approach in sanitary survey optimization
Our study demonstrates how a data-driven, exploratory approach can significantly optimize tools like sanitary surveys. By systematically analyzing E. coli concentrations across different compartments and correlating these with various sanitation practices and infrastructure conditions, we identified key factors potentially contributing to fecal contamination. This approach could allow for more targeted interventions, given validation of initial findings, focusing on the most impactful areas and practices. While research studies often use an expert or hypothesis-driven approach, reframing the problem as a ‘tool to be optimized’ allows for a potentially more efficient approach to developing an effective and data-driven sanitary survey. Such an approach can be important to refine sanitary surveys, ensuring they are more focused, efficient, and contextually relevant. This strategy of optimization aligns with the evolving needs of urban sanitation management, where varied and complex factors interplay to influence public health outcomes.
The role of community-level sanitation in household-level contamination
Our findings underscore the significance of the broader neighborhood or community context in influencing household-level contamination (Momberg et al. 2020). The association between community waste management practices and E. coli concentrations in HTAs highlights the interconnectedness of individual and community sanitation practices. This observation aligns with research indicating the importance of understanding community-level sanitation in interpreting individual health outcomes (Wolf et al. 2019) and its impact on child health (Fuller et al. 2016). Wolf et al. (2019) demonstrate why a sanitation or WASH intervention at an individual level may not have an effect on individual health outcomes due to community-level fecal waste contamination. In addition, Fuller et al. (2016) provide an excellent demonstration of how focusing on community-level fecal waste management issues can provide herd protection benefiting individuals. These studies emphasize that improvements in community sanitation infrastructure can have far-reaching effects on reducing household-level contamination risks. However, scrutiny must be given to identifying the critical components of community sanitation (e.g., waste piles, drainage) and identifying sustainable service models to alleviate these issues (Armitage & Rooseboom 2000). These studies and our results reemphasize the necessary shift back to the role of the sanitation service provider and other community stakeholders in improving fecal waste management practices, ultimately improving public health outcomes.
Key aspects of fecal waste management in low-resource communities
In the context of low-resource communities, the management of fecal waste presents unique challenges and opportunities. In addition to the need for a focus on community-level fecal waste, there may be options for households to reduce their individual exposure profile. Interventions such as improved flooring can significantly reduce soil-based pathogen transmission and exposure (EarthEnable 2023). Furthermore, the implementation of protective barriers or habits, like grass cover, shoe removal before home entry, foot washing or shoe cleaning before home entry, or proper drainage systems, can also mitigate transmission and contamination risks. These measures, coupled with enhanced community waste management practices, can potentially decrease transmission and incidence of fecal pathogen-related diseases. These interventions are particularly crucial in densely populated urban areas, where the risk of contamination is heightened due to the close proximity of living spaces and the often-inadequate sanitation infrastructure or services. Effective fecal waste management in these settings requires a holistic approach, incorporating both household-level improvements and broader community-based strategies.
Limitations of exploratory analysis and the need for confirmatory work
While the data-driven approach adopted in this study provides valuable insights, it is not without limitations. One key limitation is the inherent uncertainty in drawing definitive conclusions from exploratory analyses. These analyses are designed to generate hypotheses and identify potential associations rather than to confirm causal relationships. Therefore, there is a need for subsequent confirmatory studies to validate the findings and establish stronger causal links (Voth-Gaeddert et al. 2019). Such follow-up research could employ larger sample sizes and more granular environmental sampling to provide more definitive evidence. Furthermore, additional studies in varied geographical and cultural contexts would be beneficial to ascertain the generalizability of an optimized sanitary survey. Despite these limitations, our exploratory study provides a framework for optimizing sanitary surveys to aid low-resource communities and stakeholders in improving fecal waste management.
CONCLUSIONS
This study aimed to address the limitations of sanitary surveys in urban, low-resource settings by employing a data-driven approach to identifying key related components of surveys and environmental fecal contamination. Using EFA and SEM, the research identified key latent factors within survey data and their relationship to E. coli concentrations in specific sampling compartments. The exploratory results highlighted the potentially significant impact of community-level waste management practices on reducing fecal contamination in HTAs among compounds. This approach underscores the importance of targeted interventions and provides a framework for improving environmental health assessments and planning effective sanitation interventions in low-resource urban settings.
ACKNOWLEDGEMENTS
The author is grateful to the households, field teams, practitioners, and researchers involved in the original MapSan project.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.