Water treatment public–private partnership (PPP) projects are pivotal for sustainable water management but are often challenged by complex risk factors. Efficient risk management in these projects is crucial, yet traditional methodologies often fall short of addressing the dynamic and intricate nature of these risks. Addressing this gap, this comprehensive study introduces an advanced risk classification prediction model tailored for water treatment PPP projects, aimed at enhancing risk management capabilities. The proposed model encompasses an intricate evaluation of crucial risk areas: the natural and ecological environments, socio-economic factors, and engineering entities. It delves into the complex relationships between these risk elements and the overall risk profile of projects. Grounded in a sophisticated ensemble learning framework employing stacking, our model is further refined through a weighted voting mechanism, significantly elevating its predictive accuracy. Rigorous validation using data from the Jiujiang City water environment system project Phase I confirms the model's superiority over standard machine learning models. The development of this model marks a significant stride in risk classification for water treatment PPP projects, offering a powerful tool for enhancing risk management practices. Beyond accurately predicting project risks, this model also aids in developing effective government risk management strategies.

  • Pioneers data-driven risk management in water treatment PPP projects using machine learning.

  • Introduces an effective weighted voting mechanism for handling data irregularities in risk assessment.

  • Demonstrates superior performance of the WETPR-SVM model over conventional machine learning models.

The integrity of surface and groundwater bodies is governed by an intricate interplay of natural processes and anthropogenic activities. Accelerating demographic expansion, industrialization, and agricultural practices, in conjunction with potential alterations in the hydrological cycle attributed to climate change, have collectively exacerbated global water quality degradation (Li et al. 2021; Su et al. 2022). The imperative of water environment treatment emerges not only as a means to safeguard water resources but also as a critical element in maintaining ecological balance and fostering sustainable economic development. In addressing these challenges, water environment treatment initiatives, frequently operationalized through public–private partnerships (PPP), have risen to prominence. These partnerships offer distinct advantages, including mitigation of government fiscal burdens and enhancement of management efficacy, thus attracting increasing attention in the field of environmental governance.

In recent years, China has seen a consistent increase in investments in water environment treatment PPP projects. In practice, however, many of these projects have failed to achieve their objectives because of inadequate risk management (Wang et al. 2021; Su et al. 2022). On the one hand, numerous risk factors are involved in the construction and maintenance of these projects, including local climate, hydrology, geology, flora and fauna, and the socio-economic environment. These factors carry high levels of uncertainty and danger, creating significant challenges for project risk management (Lu et al. 2017). On the other hand, social capital participates in ecological protection and restoration primarily for profit. That is to say, private investors commit funds because government policy support, market prospects, and the economic benefits of ecological restoration promise a return. Governments, by contrast, place greater emphasis on the subsequent environmental management effects of projects in order to maximize social and public benefits, while private investors tend to pursue short-term profit maximization without considering long-term operations (Wang et al. 2019; Tang et al. 2021). This inherent conflict of interest in PPP projects often precipitates opportunistic behavior among private investors, manifesting as the provision of substandard products or services. The issue is particularly acute in water environment treatment, where such behavior can lead to inadequate remediation of polluted waterways. Opportunistic practices are more likely to emerge where government oversight of project risk management is deficient, posing significant threats to public safety and societal sustainability. The crux of these challenges lies in the public welfare orientation and intrinsic public attributes of water environment treatment PPP projects.
Regrettably, these critical aspects are frequently overlooked by decision-makers during the risk management process, with the government playing a pivotal role in both creating and engaging with these attributes. Current literature has scarcely delved into the government's role in managing water environment treatment PPP projects, particularly in relation to the intricate interplay among risk indicators (Liu & Xue 2018; Xue & Wang 2020). Given this context, there is an imperative need to develop comprehensive risk assessment and prediction models specifically for water environment treatment PPP projects. These models should duly consider the public welfare and public attributes and scrutinize risk factors from the vantage point of government management. Such an approach would provide a more robust framework for effective risk control and management in these critical projects.

Accurate risk forecasting is a crucial factor in determining the success or failure of PPP project risk management. General methods for risk prediction in PPP projects can be categorized as qualitative, such as literature reviews, case studies, and questionnaires (Li et al. 2022a); semi-quantitative, such as the analytical hierarchy process, game theory, and fuzzy comprehensive evaluation (Zhang et al. 2021); and quantitative, such as artificial neural networks and value for money analyses (Wang et al. 2021). Expert judgment data are among the most frequently employed data sources in PPP risk management. However, the subjectivity and ambiguity inherent in this data source pose significant challenges to the accuracy of risk analysis and assessment, and there is no unified standard for judgment (Owolabi et al. 2020). The complexity and specificity of water environment treatment PPP projects generate numerous and diverse data sources, requiring extensive monitoring, sampling, and analysis. The resulting multi-source, heterogeneous data pose significant challenges to project risk management. Traditional PPP risk assessment and prediction methods that rely on expert survey questionnaires are complex, lack credibility, and do not scale to large multi-source heterogeneous datasets, which makes them especially unsuitable for water environment treatment PPP projects. Machine learning, a potential solution to these problems, has been widely applied in the PPP domain (Zheng et al. 2021). For example, Owolabi et al. (2020) used machine learning models, including regression trees, support vector machines, and deep neural networks, to predict potential delays in PPP projects.
While a small number of researchers have examined construction project risk classification using machine learning techniques, the majority have relied on single machine learning algorithms, whose accuracy and generalizability across different datasets require improvement (Huang et al. 2022). Consequently, a number of researchers have used ensemble models to process multi-source heterogeneous data, yielding new insights (Chou & Lin 2013), but such models have not yet been applied to water environment treatment PPP projects.

Building on this review of existing methodologies and their constraints, our study proposes a new approach to risk classification and prediction. Central to our methodology is the recognition of the multi-source heterogeneous nature of project risk data. From this perspective, our research constructs a risk feature set, predominantly from a governmental standpoint, and investigates the influence of specific indicators on project risk. Employing an ensemble learning paradigm that integrates multiple classifiers, our approach is empirically validated through a case study of the Jiujiang City water environment treatment PPP project.

The overarching aim of this research is to augment existing engineering risk management frameworks by synergizing conventional expert-driven assessments with contemporary machine learning techniques. This culminates in the development of a dynamic and adaptable risk assessment model. The proposed methodology endeavors to not only enhance the precision of risk prediction but also to furnish a robust framework for governmental entities. The implications of this advancement are profound, significantly bolstering the quality of the supply of environmental public goods and services. Additionally, it markedly improves the capability and efficiency of water pollution mitigation efforts, thereby contributing substantially to the field of environmental management and public health.

Ensemble learning method based on Stacking

Ensemble learning aims to improve predictive performance by integrating multiple algorithms, or the same algorithm with varied parameters, into a single model. Three common techniques are boosting, bagging, and stacking (Wang et al. 2023). Among them, stacking-based ensemble learning merges prediction information from various models through self-organizing sampling or cross-validation, often resulting in superior performance compared to a single algorithm. As depicted in Figure 1, this study employs the stacking method to develop an ensemble learning model (Chung et al. 2023) and compares it with traditional single models.
Figure 1

Research roadmap.


Selection of base classifiers

The selection of precise and diverse machine learning classifiers is crucial for constructing the base classifier. A diverse set of classifiers enhances the ensemble's ability to capture various data aspects, while precision ensures reliable individual predictions. For this study, we selected six diverse algorithms known for their excellent classification performance: k-nearest neighbors (KNN), classification and regression tree (CART), linear discriminant analysis (LDA), Naive Bayes classifier (NB), support vector machine (SVM), and water environment treatment project risk support vector machine (WETPR-SVM) classifier.

The KNN classifier was selected for its ability to handle nonlinear data without requiring any prior knowledge of the data distribution. CART was chosen for its interpretability and its capability of handling both numerical and categorical data. LDA was preferred for its ability to maximize the separability among known categories. The NB classifier is renowned for its simplicity and efficiency, especially when dealing with high-dimensional datasets. The SVM is well regarded for its effectiveness in high-dimensional spaces and for its memory efficiency, since its decision function uses only a subset of the training points (Chou 2012). The WETPR-SVM classifier was chosen because it is tailored to the specific task of risk classification in water environment treatment projects.

Meta-classifier selection

We selected the logistic regression algorithm as the meta-classifier, a common choice in stacking ensemble models, for its strong interpretability and its ability to provide probabilities for outcomes, which aids in understanding the confidence level of the predictions.
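As an illustration of the stacking structure described above, the following is a minimal sketch using scikit-learn's `StackingClassifier`, with logistic regression as the meta-classifier. It rests on several assumptions: the data are synthetic (generated with `make_classification`, not the Jiujiang dataset), all hyperparameters are illustrative, and the WETPR-SVM classifier is omitted because its implementation is not specified in the text.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a 12-feature risk data set (assumption, not project data).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base classifiers named in the text; hyperparameters are illustrative only.
# Distance- and margin-based models are wrapped with a scaler.
base_classifiers = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("cart", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("lda", LinearDiscriminantAnalysis()),
    ("nb", GaussianNB()),
    ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=0))),
]

# Logistic regression combines the cross-validated outputs of the base models.
stack = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

test_accuracy = stack.score(X_test, y_test)
risk_probabilities = stack.predict_proba(X_test)  # per-class confidence
```

Because the meta-classifier is a logistic regression, `predict_proba` returns a probability for each risk class, which is the confidence information the text refers to.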

Data preparation and model training

Our research began by organizing the risk feature set data for water environment treatment PPP projects. This initial phase laid the foundation for the subsequent steps: data collection, data preprocessing, and statistical analysis. We then developed and validated machine learning models and integrated the multi-source data. This structured approach prepares the data for comprehensive analysis within our methodological framework.

Data collection methodology: The identification and refinement of risk factors pertinent to water environment treatment PPP projects were conducted through an extensive data collection process. This process harnessed information from a variety of sources, including the public data platform of the Chinese government, a network of diverse monitoring stations in Jiujiang City, and collaborations with joint bidding entities. The methodology employed advanced text analysis techniques for meticulous data extraction, focusing on key variables such as the scope of the project, financial parameters, and environmental impact assessments. These variables were meticulously selected for their integral relevance to the project's risk assessment and management. Furthermore, it is recognized that the risk landscape of water environment governance PPP projects is dynamic, evolving in response to temporal changes. To effectively capture this evolution, our approach incorporates a longitudinal data analysis framework. The dataset spans a considerable temporal range, thereby providing a comprehensive perspective on the shifting risk dynamics associated with water environment treatment PPP projects. This longitudinal approach is instrumental in yielding a nuanced understanding of the temporal evolution of risks, a critical aspect in the effective management and mitigation of such risks in PPP projects.

Data preprocessing: Prior to analysis, data cleaning procedures were employed to ensure accuracy. This included the verification of data integrity, handling of missing values, and normalization of financial figures for comparative analysis.
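The two preprocessing steps named above, handling missing values and normalizing figures for comparison, could look roughly like the sketch below. The text does not state which imputation or normalization rule was used, so column-mean imputation and min-max scaling are assumptions chosen for illustration, and the sample values are hypothetical.

```python
import numpy as np


def impute_column_means(data):
    """Replace NaN entries with the mean of the observed values in each column."""
    data = np.asarray(data, dtype=float)
    col_means = np.nanmean(data, axis=0)
    return np.where(np.isnan(data), col_means, data)


def min_max_normalize(data):
    """Rescale each column to the [0, 1] range."""
    data = np.asarray(data, dtype=float)
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (data - lo) / span


# Hypothetical records: column 0 is a financial figure, column 1 a monitored value.
raw = np.array([
    [100.0, 2.0],
    [np.nan, 4.0],
    [300.0, np.nan],
])
clean = min_max_normalize(impute_column_means(raw))
```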

Statistical analysis: A robust statistical analysis framework was applied. Initially, descriptive statistical methods provided an overview of the dataset, identifying key trends and patterns. Subsequently, inferential statistical techniques, including regression analysis, were used to explore the relationships between identified risk factors and project outcomes.

Machine learning model development and validation: For predictive risk assessment, a machine learning model was developed using Python's Scikit-learn library. This model was trained on a subset of the dataset, using a combination of supervised learning techniques. Model performance was evaluated through cross-validation, assessing its predictive accuracy and reliability. The model's effectiveness in risk prediction was further validated against known outcomes of similar water treatment projects.
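A cross-validated evaluation of the kind described here can be sketched with scikit-learn as follows. The number of folds, the choice of a single SVM as the model under test, and the synthetic data are all assumptions made for illustration; the text does not report these details.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the risk feature set (assumption, not project data).
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=6, random_state=1
)

# Scaling sits inside the pipeline so it is refit on each training fold,
# which avoids leaking information from the held-out fold.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Stratified folds keep the class ratio similar in every split.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
mean_accuracy = scores.mean()
```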

Integration of multi-source data: To enhance the model's accuracy, data from multiple sources were integrated, creating a comprehensive risk profile. This approach accounted for the multi-dimensional nature of the project, encompassing environmental, technical, financial, and socio-economic risk factors.

Ensemble learning mechanism

The final prediction results were obtained through an enhanced voting mechanism. The individual prediction models were fused via the ensemble learning mechanism into a comprehensive prediction model, which serves as the final risk grading prediction model for water environment treatment PPP projects. This model was assessed using standard machine learning evaluation metrics, and its performance was refined iteratively on the basis of the training results.

Model interpretability method

The interpretability of machine learning models remains limited compared with conventional generalized linear models, even though many of them can measure feature importance. The Shapley Additive Explanations (SHAP) method alleviates this limitation by interpreting features through each feature's contribution to the predicted outcome. These contributions, known as SHAP values, quantify each feature's effect on the prediction (Nordin et al. 2023). A larger absolute SHAP value signifies a more substantial contribution of the feature to the predicted value (Li et al. 2022c). In this investigation, we employ the SHAP explanation model to compute the contributions of various risk features in water environment treatment PPP projects, capitalizing on its computational performance and intuitive interpretation.

The SHAP value of a feature is computed as the weighted sum of that feature's marginal contributions over all possible subsets of the remaining features, as given in the following equation:
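$$\phi_j(val) = \sum_{S \subseteq \{1,\ldots,p\} \setminus \{j\}} \frac{|S|!\,(p-|S|-1)!}{p!}\,\bigl(val(S \cup \{j\}) - val(S)\bigr) \qquad (1)$$
where:
  • $S$ denotes a subset of features utilized in the model, with the stipulation that $j$ is excluded from the set $S$;

  • $p$ signifies the total count of features;

  • $val(S)$ represents the prediction derived from the feature values within set $S$;

  • $\phi_j$ delineates the contribution of feature $j$ to $val$.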
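To make Equation (1) concrete, the sketch below evaluates the standard Shapley formula exactly by enumerating every subset S that excludes feature j. The value function `toy_val` and the three feature names are hypothetical and serve only to demonstrate the arithmetic; practical SHAP tooling relies on approximations because exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_value(j, features, val):
    """Exact contribution of feature j: weighted marginal gains over all subsets."""
    p = len(features)
    others = [f for f in features if f != j]
    phi = 0.0
    for size in range(len(others) + 1):
        # Weight |S|! (p - |S| - 1)! / p! depends only on the subset size.
        weight = factorial(size) * factorial(p - size - 1) / factorial(p)
        for subset in combinations(others, size):
            s = frozenset(subset)
            phi += weight * (val(s | {j}) - val(s))
    return phi


def toy_val(s):
    """Hypothetical prediction for a feature subset, with one interaction term."""
    score = 0.0
    if "water_quality" in s:
        score += 0.4
    if "financial" in s:
        score += 0.2
    if "water_quality" in s and "financial" in s:
        score += 0.1  # interaction shared between the two features
    return score


features = ["water_quality", "financial", "noise"]
phi = {f: shapley_value(f, features, toy_val) for f in features}
```

In this toy case the interaction term of 0.1 is split equally between the two interacting features, the unused feature receives zero, and the contributions sum to the prediction for the full feature set.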

Improved voting mechanism

Ensemble learning, a technique that integrates multiple predictive models, often surpasses the accuracy and generalization capabilities of singular predictive models. This integration is typically achieved through various voting mechanisms. However, the realm of risk prediction within PPP projects for water environment treatment is marked by its high complexity. This complexity is further compounded by the frequent occurrence of missing and anomalous data values at certain intervals. Such data irregularities can result in some base learners demonstrating enhanced accuracy or particular suitability for addressing specific challenges. In these contexts, conventional weighted voting strategies may not optimally harness the distinct characteristics and variances inherent in each base learner. This limitation could potentially impinge on the overall efficacy and accuracy of the predictive model, underscoring the need for a more nuanced approach in ensemble learning methodologies within this domain. Therefore, this paper proposes a weighted voting scheme designed to better accommodate the uncertainty and complexity in risk prediction for water environment treatment PPP projects, as illustrated in Figure 2. Here $h_i$ represents the predicted outcome of the $i$-th model, with values of 1 (indicating stability) or −1 (indicating instability). $W$ denotes the integrated result of the $n$ predictive models, calculated as $W = \sum_{i=1}^{n} w_i h_i$. When $W > 0$, the project is deemed stable; when $W < 0$, the project is considered unstable; and when $W = 0$, the project requires verification through alternative means or is conservatively categorized as unstable. The weighting factor $w_i$ embodies the relative significance of each base learner during the voting process and can be determined from the training dataset. A viable method is to employ cross-validation to assess the performance of different base learners and assign weights accordingly. More specifically, the accuracy scores of the base learners can be used as their weighting factors.
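The voting rule can be sketched in a few lines of Python. The symbol names are assumptions made here: each base learner casts a vote h_i of 1 (stable) or −1 (unstable), its weight w_i is taken to be its cross-validated accuracy as suggested above, and W is the weighted sum of the votes. The accuracy figures in the example are hypothetical, and a tie (W = 0) is resolved conservatively as unstable.

```python
def weighted_vote(predictions, weights, tie_tol=1e-12):
    """Combine +1/-1 votes into W = sum(w_i * h_i) and map W to a label."""
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    if any(h not in (1, -1) for h in predictions):
        raise ValueError("each prediction must be 1 (stable) or -1 (unstable)")

    w_total = sum(w * h for w, h in zip(weights, predictions))
    if w_total > tie_tol:
        label = "stable"
    else:
        # Covers W < 0 and the tie case W = 0 (conservative choice).
        label = "unstable"
    return w_total, label


# Hypothetical accuracies for five base learners, used directly as weights.
accuracies = [0.95, 0.90, 0.60, 0.60, 0.60]
votes = [1, 1, -1, -1, -1]

w_total, label = weighted_vote(votes, accuracies)
tie_total, tie_label = weighted_vote([1, -1], [0.8, 0.8])
```

In this example three of the five learners vote unstable, yet the two most accurate learners outweigh them and the project is labeled stable, which is the behavior that distinguishes weighted voting from a simple majority.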
Figure 2

Schematic diagram of integrated learning voting mechanism.


Data collection and analysis

A significant portion of research concerning risks in PPP projects is centered on risk identification and classification, risk analysis and evaluation, alongside risk allocation and management strategies (Wang et al. 2018). Extensive research has been conducted over the past decade to investigate risk management issues in PPP projects, identifying various types of risks, such as financial, operational, political, and environmental risks (Xu et al. 2010). Water environment systems embody dynamic, complex, open systems with temporal, spatial, and volumetric variations. This complexity results in distinct techno-economic characteristics of water environment treatment PPP projects compared to purely commercial PPP projects, including strong quasi-public interest, high difficulty in integrating governance technologies, complex assessment of governance effects, and difficult project coordination and collaboration (An et al. 2018). The prevailing research challenge lies in identifying risk factors in water environment treatment PPP projects.

A comprehensive understanding of the risks entailed in water environment treatment PPP projects requires multi-dimensional data exploration, because the richness and depth of the data determine the precision and insightfulness of the resulting risk assessment framework. Using keywords or subject terms in both Chinese and English, such as ‘PPP’, ‘risk’, and ‘water environment treatment’, a combined search was conducted in databases such as China National Knowledge Infrastructure (CNKI), Institute for Scientific Information (ISI) Web of Science, and ScienceDirect to find relevant literature for risk factor analysis. This resulted in a preliminary list of risk factors for water environment treatment PPP projects, as detailed in Table 1. The existing literature mainly discusses risks caused by the government, risks caused by social capital, and risks generated by the external environment. Government-caused risks primarily stem from government involvement in project management, including tax adjustments (Liu & Xue 2018; Li et al. 2022b), government intervention and credit issues (Cui et al. 2019; Wang et al. 2021), and inadequacies in existing laws, regulations, and regulatory systems (Feng et al. 2022; Su & Cao 2022). Social capital-caused risks mainly arise from actual project construction and operation, such as completion risks (Feng et al. 2022; Su & Cao 2022), construction technology risks (Zhang et al. 2021), contract change risks (Li & Wang 2019; Feng et al. 2022; Su et al. 2022), delay risks (Li & Wang 2019; Li et al. 2022a, 2022b; Su & Cao 2022), cost overrun risks (El-Kholy & Akal 2021; Su & Cao 2022), insufficient project revenue risks (El-Kholy & Akal 2021; Su et al. 2022), dispute and infringement risks (Chou & Lin 2013; Wang et al. 2019; Fu et al. 2023), and social capital change risks (Wang et al. 2019; El-Kholy & Akal 2021).
External environment risks refer to risks directly or indirectly caused by the external environment, including environmental damage risks (An et al. 2018; Owolabi et al. 2020; Su & Cao 2022), geological condition risks (Cui et al. 2019; Feng et al. 2022), social stability risks (Wang et al. 2019; Li et al. 2021), public satisfaction (Li et al. 2020a, 2020b; Fu et al. 2023), inflation risks (Wang et al. 2018; Zhang et al. 2021), and force majeure (Li & Wang 2018; Wang et al. 2018).

Table 1

Preliminary list of risk factors for water environment treatment PPP projects

| Risk type | Specific risk indicators |
| --- | --- |
| Government-induced | Tax adjustment risk |
| | Government intervention and credit issues |
| | Inadequate legal and regulatory frameworks |
| Social capital-induced | Quality completion risk |
| | Construction technology risk |
| | Contract change risk |
| | Schedule delay risk |
| | Operational cost overrun risk |
| | Project revenue shortfall risk |
| | Dispute and infringement risk |
| | Social capital change risk |
| External environment-induced | Environmental damage risk |
| | Geological condition risk |
| | Social stability risk |
| | Public opinion risk (public satisfaction) |
| | Inflation risk |
| | Force majeure risk (political, natural conditions) |

While the collated comprehensive risk indicators offer valuable insights, there exist notable limitations in their applicability to this research. Primarily, these indicators predominantly stem from a project-centric analysis, lacking in differentiation among the risk profiles attributable to varied stakeholders involved in the project. Furthermore, these indicators heavily rely on qualitative data, which poses substantial challenges in terms of comprehensive and accurate acquisition in practical scenarios.

Additionally, the existing body of research specifically focusing on risks associated with water environment governance projects remains relatively scant. This gap has led to the preliminary list of risk indicators being more aligned with those typical of general PPP construction projects rather than being tailored to the unique nuances of water environment governance projects. Consequently, this initial compilation of risk indicators does not completely resonate with the specific context and requirements of the current study. This disparity underscores the necessity for a more targeted and nuanced approach in identifying and analyzing risk factors pertinent to water environment governance PPP projects.

Jiujiang City's unique environmental, geographical, and economic characteristics make it an ideal case study for water environment treatment PPP projects. Located along the Yangtze River and home to parts of Poyang Lake, Jiujiang (28°41′-30°05′N, 113°56′-116°54′E) faces specific ecological challenges and opportunities. Its role in China's Yangtze River Economic Belt as a green development city further emphasizes its significance in national environmental sustainability efforts. The Jiujiang City Water Environment Treatment PPP Project examined in this study was launched in 2018, with a total investment of 7.699 billion RMB. It has a designed sewage treatment capacity of 145,000 m³/day, a designed pipeline length of 188.3 km, a serviced urban area of 56.5 km², and a service population of 796,000. The project operation period is 20 years, including a 2–3-year construction period, and the project comprises six sub-projects.

To surmount the challenges posed by the qualitative nature of some data, data augmentation techniques were deployed (Mazher et al. 2018). This entailed leveraging expert insights to quantitatively represent qualitative data, thereby augmenting the dataset and enhancing the comprehensiveness of the analysis. Expert interviews were conducted based on the preliminary list of project risk factors (refer to Table 2 for expert information), with the aim of compensating for the scarcity of research on risks in water environment governance projects. The data were sourced from the public data platform of the Ministry of Ecology and Environment of China (https://www.mee.gov.cn/), the Jiujiang City water environment treatment project's official document (https://www.yeec.com.cn/hbjt/index/index.html) and the project team's negotiation memorandum. Utilizing publicly available data along with enterprise data supports the scientific nature of this study in terms of data accessibility and authenticity (Shrestha et al. 2018). Five years' worth of project risk factor information was compiled, and the specific contents of various risk categories were categorized. This compilation yielded a set of 12 risk data features for water environment treatment PPP projects, covering the natural environment, ecological environment, socio-economic, and project entity subsystems. The evaluation indicators are delineated in Table 3.

Table 2

Expert basic information table

| Basic information | Category | Sample size | % |
| --- | --- | --- | --- |
| Type of affiliated unit | Government agencies | | 20 |
| | Institutions of higher learning | | 20 |
| | Water environment management enterprises | | 35 |
| | General PPP project enterprises | | 15 |
| | Others | | 10 |
| Related project work or research experience | Within 1 year | | |
| | 1–3 years | | 30 |
| | 3–5 years | | 45 |
| | Over 5 years | | 20 |
| Degree of understanding of related projects | Very well understanding | 10 | 50 |
| | Better understanding | | 35 |
| | General understanding | | 15 |
| | Little understanding | | |
Table 3

Risk data feature table for water environment treatment PPP projects

| System name | Risk feature name | Risk feature-related evaluation indicator set |
| --- | --- | --- |
| Natural environment subsystem | Water environment | Hydro-sediment, water quality, water temperature, water level, sediment |
| | Acoustic environment | Noise |
| | Atmospheric environment | Dust, exhaust emissions, local climate |
| | Surface environment | Solid waste, soil nutrients, geology, soil erosion, soil salinization, soil marshification, landslides |
| Ecological environment subsystem | Terrestrial organisms | Terrestrial animal and plant growth risks |
| | Aquatic organisms | Safety risks of aquatic animals, aquatic plants, aquatic microorganisms |
| Socio-economic subsystem | Livelihood security | Public satisfaction, employment opportunities |
| | Local economic development | Regional industry, regional agriculture, urban planning, surrounding landscape, regional economic risk |
| Project entity subsystem | Legal risks | Dispute, breach, infringement risks, planning, standards, and contract change risks |
| | Operational risks | Construction risks caused by social capital, operation and maintenance management risks |
| | Financial risks | Interest rate change risk, revenue shortfall risk, social capital change risk |
| | Force majeure risks | Force majeure due to political and natural conditions |

Figure 3 delineates the evolution from a preliminary list to the finalized risk indicator feature set, highlighting the government's pivotal role in water environment treatment PPP projects. Key government objectives in these projects include aquatic ecosystem restoration, pollution mitigation, and the maximization of environmental, social, and economic benefits (Li et al. 2022a, 2022b, 2022c). The government's regulatory perspective necessitates a focus on the public interest and the intrinsic public attributes of these projects. Government departments, therefore, bear the critical responsibility of overseeing project operations and ecological restoration, a facet often underemphasized by other stakeholders, leading to various risk incidents (Li et al. 2020a, 2020b).
Figure 3

Relationship between existing system and government perspective system.


In this context, our study emphasizes the importance of continuous operation risks and natural ecological environment risks in the risk categories prioritized by the government in water environment treatment PPP projects. Discussing specific projects in this domain also enables the collection of subjective risk data, crucial for empirical model validation. Financial risks, representing budget overruns, funding gaps, or unexpected financial emergencies, often stem from market volatility or fluctuating interest rates, posing significant challenges to project financial stability (Li & Wang 2018; Akomea-Frimpong et al. 2021). Operational risks cover potential issues in technical and managerial aspects of project execution, where technical failures, managerial shortcomings, or operational lapses can lead to delays, cost increases, or compromised performance (Xiang et al. 2022). The inherent complexity of water treatment projects often heightens these risks, underscoring the need for effective operational strategies and skilled project management.

Political risks arise from changes in policy or government stability, with PPP projects being particularly vulnerable to such shifts. Changes in policies or government can alter project priorities, introduce regulatory challenges, or affect funding, thereby impacting project timelines and financial frameworks (Bao et al. 2022). Environmental risks, crucial due to the ecological nature of water treatment PPP projects, include adverse weather conditions, geological surprises, or new environmental regulations, which can lead to delays, increased compliance costs, or in extreme cases, project suspension (Ma et al. 2020).

In essence, the study categorizes risk indicators into subsystems: the natural environment and ecological environment subsystems primarily encompass the government's regulatory risks related to the natural ecology, while the socio-economic subsystem addresses the public impact of water environment treatment PPP projects. The project entity subsystem acknowledges the government's role in ensuring sustainable PPP project operation but excludes engineering risks not shared by the government, such as cost overrun risks and construction technology risks.

It is crucial to recognize that the nature of participating enterprises significantly influences risk assessment outcomes in PPP projects. In the realm of comprehensive water environment management in China, projects demand substantial investment and carry public characteristics. Consequently, state-owned enterprises with robust comprehensive strength predominate this sector. Meanwhile, private or foreign-funded enterprises often find their most viable strategy in forming alliances with state-owned entities, a trend reflected in the increasing mergers and acquisitions within the environmental protection industry. For the Jiujiang project, the involvement of the state-owned China Three Gorges Corporation aligns with the prevailing trend in the Chinese market for water environment treatment projects. State-owned enterprises typically bring stability and significant financing to such projects, though they also pose unique challenges, particularly in aligning with bureaucratic procedures and national policies. In contrast, private or foreign-invested enterprises introduce different dynamics, focusing more on cost efficiency and innovation, albeit with higher financial risks and market sensitivity. Our study incorporates parameters to reflect these diverse characteristics and risk profiles of the involved enterprises.

After determining the risk indicator system, the data categorization utilized the equal interval method, classifying 927 risk data entries into four risk levels: low, medium, higher, and high, predominantly consisting of low-level risks. This imbalance in the dataset steered us away from deep learning approaches, such as multilayer neural networks, due to their unsuitability in such contexts (Abdoli et al. 2023; Tsai & Chang 2023). Thus, a conventional machine learning strategy was employed in our research.
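
The equal interval method mentioned above can be sketched as follows. This is a minimal illustration, assuming each entry carries a single numeric risk score and that the four levels split the observed score range into equal-width intervals; the function name `equal_interval_levels` and the sample scores are hypothetical and are not taken from the study's data.

```python
from collections import Counter

LEVELS = ["low", "medium", "higher", "high"]


def equal_interval_levels(scores, labels=LEVELS):
    """Assign each score to one of len(labels) equal-width bins over [min, max]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: every score is identical, so use the first level.
        return [labels[0]] * len(scores)
    width = (hi - lo) / len(labels)
    result = []
    for s in scores:
        # The maximum score would land in bin len(labels); clamp it to the last bin.
        idx = min(int((s - lo) / width), len(labels) - 1)
        result.append(labels[idx])
    return result


# Hypothetical risk scores, skewed toward low values to mimic an imbalanced dataset.
scores = [0.0, 0.05, 0.1, 0.15, 0.2, 0.3, 0.45, 0.6, 0.8, 1.0]
levels = equal_interval_levels(scores)
counts = Counter(levels)
print(counts)
```

Counting the labels, as done with `Counter` here, is a quick way to see the kind of class imbalance the text describes before choosing a modelling strategy.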

To understand the relationships between various indicators and risk levels, we used scatter matrix plots to visualize the risk data. Among the 29 indicators analyzed, four were chosen for detailed examination due to their direct impact on the project's risk dynamics: water quality, local industrial economy, local climate, and soil erosion. These indicators were selected for their direct relevance to the project's environmental, socio-economic, and operational aspects. For example, water quality is a direct indicator of the project's environmental impact, the local industrial economy reflects socio-economic effects, local climate pertains to operational challenges, and soil erosion addresses long-term sustainability (Tabari et al. 2021; Chen et al. 2023; Li et al. 2023; Zhu et al. 2023).

These indicators were plotted as pairwise coordinates to show how the risk levels relate to the project risk attributes; Figure 4 presents the findings. Within each of the four risk levels, the points for the four indicators are relatively clustered, indicating a discernible correlation among the risk indicators as well as a significant correlation between these indicators and the risk level of water environment treatment PPP projects. Together with the targeted selection of indicators, this analysis gives a more robust understanding of the risk profile of water environment treatment PPP projects and supports the subsequent risk assessment.
Figure 4

Partial feature risk level correlation analysis.


These tailored strategies in data collection and processing not only uphold the integrity of the analysis but also significantly contribute toward achieving a nuanced and actionable risk assessment model. The ensuing subsections will delve into the methodologies deployed to dissect this data and unearth invaluable insights into risk management for water environment treatment PPP projects.

Risk feature contribution analysis

Understanding how individual feature indicators influence model output is crucial for interpreting and fine-tuning the model. As depicted in Figure 5, SHAP values are used to illustrate the impact of different risk indicators on the predicted risk levels of water environment treatment PPP projects. The analysis shows that three key indicators, namely water environment risk, operation and maintenance management risk, and local economic development risk, play a significant role in determining the predicted risk level. Notably, higher values of these indicators correspond to a higher risk level, a finding that agrees with practical observations (Li et al. 2022a).
Figure 5

SHAP values of risk features.


In the context of water environment treatment PPP projects, the primary goal from a governmental perspective is to enhance water environment governance for ecological restoration. Within this framework, water environment risk is identified as a pivotal factor affecting the project's overall risk, emphasizing the necessity of effective operation and maintenance management for achieving the project's aims. This underscores the criticality of operation and maintenance management risk in ensuring the sustainability of the project. Additionally, the promotion of local economic development, closely tied to the betterment of societal conditions, is a fundamental objective of these government-led projects. Consequently, this objective's lower relative contribution to the overall risk assessment mirrors the government's prioritization of goals in water environment treatment PPP projects, highlighting a balanced approach to risk evaluation and project management.

Dataset construction

Using the established risk feature set for water environment treatment PPP projects, we first addressed missing data values. A threshold determines whether a feature is retained: if the percentage of missing values for a feature exceeds the threshold, the feature is excluded. In this study, the threshold for deleting a feature is set at 80%. If the threshold is not exceeded, the KNN algorithm is used to locate the k nearest samples to the sample with the missing value, and the missing value is filled with the average value of their corresponding features.
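
The two-step treatment of missing values described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: missing entries are encoded as `None`, the distance between two samples is the root mean squared difference over the features both have observed, and the helper names (`drop_sparse_features`, `knn_impute`), the choice `k=2`, and the sample rows are hypothetical rather than the study's settings.

```python
import math


def drop_sparse_features(rows, threshold=0.8):
    """Drop columns whose share of missing values (None) exceeds the threshold."""
    n_rows, n_cols = len(rows), len(rows[0])
    keep = [j for j in range(n_cols)
            if sum(r[j] is None for r in rows) / n_rows <= threshold]
    return [[r[j] for j in keep] for r in rows], keep


def _distance(a, b):
    """Root mean squared difference over features observed in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=2):
    """Fill each missing value with the mean of that feature over the k nearest samples."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other samples that have feature j observed.
            donors = sorted((_distance(row, other), other[j])
                            for m, other in enumerate(rows)
                            if m != i and other[j] is not None)
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical data: the third feature is missing in 5 of 6 samples (83% > 80%).
rows = [
    [0.20, 0.40, None],
    [0.30, None, None],
    [0.80, 0.90, None],
    [0.25, 0.50, None],
    [0.70, 1.00, None],
    [0.90, 0.80, 0.60],
]
kept_rows, keep = drop_sparse_features(rows, threshold=0.8)
filled = knn_impute(kept_rows, k=2)
print(keep, filled[1])
```

In a scikit-learn workflow, `sklearn.impute.KNNImputer` offers a comparable imputation step; the hand-written version is used here only to make the averaging explicit.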

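To ensure model accuracy and eliminate the influence of dimensions, we standardized the original risk indicator data using a standardization algorithm (Wang et al. 2019). Equation (2) gives the formula, a min–max scaling consistent with the 0 to 1 range of the processed features, where $x_i$ represents the ith evaluation indicator of the nth risk feature, $x_{\min}$ and $x_{\max}$ are the minimum and maximum observed values of that indicator, and $x_i^{*}$ represents the data for the standardized risk evaluation indicator.
$$x_i^{*} = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \tag{2}$$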

After missing-value handling and dimensionless treatment, the feature values lie between 0 and 1. The dataset is divided into training and testing sets in a 7:3 ratio, and an SVM classifier is used to classify the data.
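
The scaling and splitting steps can be illustrated with a short sketch. It assumes a min-max transform, which is one standardization that yields values between 0 and 1, and a simple shuffled 70/30 split; the function names, the random seed, and the sample indicator values are hypothetical, and the classifier itself is not shown.

```python
import random


def min_max_scale(column):
    """Rescale a list of raw indicator values to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def split_70_30(samples, seed=0):
    """Shuffle the samples reproducibly and split them 7:3 into train and test."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * len(samples))
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]


# Hypothetical raw values for one indicator.
water_level = [12.0, 15.0, 18.0, 21.0]
scaled = min_max_scale(water_level)

# 927 sample identifiers, matching the number of risk data entries in the text.
train, test = split_70_30(list(range(927)))
print(scaled, len(train), len(test))
```

With 927 entries, a 7:3 split gives 649 training and 278 testing samples under this rounding rule. For an imbalanced dataset, a stratified split that preserves the class proportions in both parts may be preferable.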

Model construction and training

The efficacy of our stacking ensemble learning model is fundamentally linked to the precision and diversity of its base classifiers. This aligns with Chung et al. (2023)'s principle of ‘good but different’ in ensemble learning. Our initial phase involved experimenting with various machine learning classifiers in Python's Scikit-learn library, conducted on the Jupyter Notebook platform. This phase led to the independent development of six classifiers: KNN, CART, LDA, NB, SVM, and WETPR-SVM. Each classifier underwent rigorous training on the dataset, utilizing cross-validation, random search, and learning curves for optimal hyperparameter tuning.
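
How diverse base classifiers feed a stacking ensemble can be illustrated with scikit-learn, the library named above. The sketch below is an illustration under stated assumptions and not the authors' WETPR-SVM: the choice of KNN, CART, LDA, and NB as base learners with an RBF SVM as the meta-learner, the default hyperparameters, and the synthetic data from `make_classification` are all hypothetical, and the weighted voting step is not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the risk data: 29 features, 4 risk levels (hypothetical).
X, y = make_classification(n_samples=400, n_features=29, n_informative=10,
                           n_classes=4, random_state=0)

# Base learners chosen to be "good but different" (assumed configuration).
base_learners = [
    ("knn", KNeighborsClassifier()),
    ("cart", DecisionTreeClassifier(random_state=0)),
    ("lda", LinearDiscriminantAnalysis()),
    ("nb", GaussianNB()),
]

# The meta-learner is trained on out-of-fold predictions of the base learners.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=SVC(kernel="rbf"),
                           cv=5)

# 10-fold cross-validated accuracy of the whole stacked model.
scores = cross_val_score(stack, X, y, cv=10, scoring="accuracy")
print(f"mean accuracy = {scores.mean():.4f}, std = {scores.std():.4f}")
```

The inner `cv=5` keeps the meta-learner from seeing predictions made on data the base learners were fitted on, which is the usual safeguard against leakage in stacking.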

In-depth attention was given to the SVM classifier, tuned following Chou & Lin (2013). This process involved adjustments in SVM parameters, focusing on kernel functions – LinearSVM, radial basis function support vector machine (RBFSVM), and Sigmoid. The impact of these kernels on classification accuracy was profound, as depicted in Figure 6. Our analysis of Figure 6 reveals that the RBFSVM kernel function significantly outperformed others, achieving a test set accuracy of 0.9043, while LinearSVM and Sigmoid kernels recorded lower accuracies of 0.8191 and 0.8297, respectively.
Figure 6

Experimental results of kernel functions.

Figure 7

Box plot of accuracy results for each classifier.


The RBFSVM's effectiveness is attributed to its ability to project data into a higher-dimensional space, making complex, nonlinear relationships linearly separable. This characteristic is especially beneficial in water environment governance PPP projects, where risk factors often exhibit nonlinear interdependencies. The RBFSVM's aptitude for deciphering these complexities is a critical factor in its high classification accuracy.

Furthermore, the Gaussian kernel, also known as the RBF kernel, is well suited to nonlinear classification problems. It implicitly maps input data into a higher-dimensional feature space, capturing the complex, nonlinear patterns present in our dataset. This ability to make intricate risk factor interrelations in water environment treatment PPP projects linearly separable gives it a clear advantage over the linear kernel function.

As shown in Figure 7, integration of the Gaussian kernel in our model demonstrates a significant advancement in the classification of project risks, particularly for water environment governance PPP projects. Its ability to unravel the non-linearity and complexity inherent in these projects and map them into a higher-dimensional space was instrumental in achieving enhanced classification accuracy. This feature enabled a more refined and precise classification of project risks, substantiated by the RBFSVM's superior classification accuracy (0.9043) over LinearSVM (0.8191) and Sigmoid (0.8297).
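
A kernel comparison of this kind can be set up in a few lines of scikit-learn. The sketch below is only an illustration: it uses synthetic data from `make_classification` in place of the study's dataset, default SVM hyperparameters, and a hypothetical random seed, so its accuracies will not match the values reported above and no particular ranking of kernels is implied.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the risk data: 927 samples, 29 features, 4 classes.
X, y = make_classification(n_samples=927, n_features=29, n_informative=10,
                           n_classes=4, random_state=0)

# 7:3 split, stratified so each risk level appears in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

kernel_accuracy = {}
for kernel in ("linear", "rbf", "sigmoid"):
    # Scale features to [0, 1] inside the pipeline, then fit an SVM with this kernel.
    model = make_pipeline(MinMaxScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    kernel_accuracy[kernel] = model.score(X_test, y_test)

for kernel, acc in kernel_accuracy.items():
    print(f"{kernel:8s} test accuracy = {acc:.4f}")
```

Placing the scaler inside the pipeline means it is fitted on the training split only, so the test accuracy is not influenced by statistics of the held-out data.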

Model comparison and evaluation

This study's ensemble learning model ultimately performs a four-class classification task for water environment governance PPP project risk. Therefore, we use accuracy (Accuracy), macro-average precision (Macro_P), macro-average recall (Macro_R), and macro-average F1 score (Macro_F1) as the four indicators for evaluating model performance.

Accuracy is the ratio of the number of project risk samples correctly classified by the model to the total number of project risk samples, which reflects the overall classification accuracy of the model (Choubin et al. 2023). It is calculated as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{3}$$
The macro-average precision is the mean of the precision values across all classes. Precision is the ratio of the number of risks correctly classified into a particular category to the number of risks classified into that category. It is calculated as follows:
$$\mathrm{Macro\_P} = \frac{1}{n}\sum_{i=1}^{n} P_i, \qquad P_i = \frac{TP_i}{TP_i + FP_i} \tag{4}$$
The macro-average recall is the mean of the recall values across all classes. Recall is the proportion of risk samples correctly classified by the model for a particular project risk category relative to the total number of risk samples in that category. It is calculated as follows:
$$\mathrm{Macro\_R} = \frac{1}{n}\sum_{i=1}^{n} R_i, \qquad R_i = \frac{TP_i}{TP_i + FN_i} \tag{5}$$
To evaluate the performance of a classification model in practical applications, it is frequently necessary to consider the model's precision and recall together. As a result, the F1 score, which is the harmonic mean of the two, is used as an evaluation metric. The macro F1 score, which is the mean of the F1 scores for all classes, is calculated as follows:
$$\mathrm{Macro\_F1} = \frac{1}{n}\sum_{i=1}^{n} F1_i, \qquad F1_i = \frac{2 P_i R_i}{P_i + R_i} \tag{6}$$

In Equations (3)–(6), $TP$ represents the number of positive samples predicted as positive by the model; $FP$ represents the number of negative samples predicted as positive by the model; $FN$ represents the number of positive samples predicted as negative by the model; $TN$ represents the number of negative samples predicted as negative by the model; the subscript $i$ denotes the value computed for the ith category; $n$ represents the number of risk feature categories; and $P_i$, $R_i$, and $F1_i$ represent the precision, recall, and F1 scores of the model for the different categories, respectively.
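
The four metrics can be computed directly from true and predicted labels, as in the following sketch. It treats each risk level in turn as the positive class (one-vs-rest) to obtain per-class counts, and it computes accuracy in its multi-class form as the share of correctly classified samples, matching the verbal definition above. The function name `macro_metrics` and the eight example labels are hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    n = len(classes)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # class c predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # other class predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # class c predicted as other
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        precisions.append(p_c)
        recalls.append(r_c)
        f1s.append(f_c)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels for eight samples across the four risk levels.
y_true = ["low", "low", "low", "medium", "medium", "higher", "high", "high"]
y_pred = ["low", "low", "medium", "medium", "medium", "higher", "high", "higher"]

accuracy, macro_p, macro_r, macro_f1 = macro_metrics(y_true, y_pred)
print(f"Accuracy={accuracy:.4f} Macro_P={macro_p:.4f} "
      f"Macro_R={macro_r:.4f} Macro_F1={macro_f1:.4f}")
```

Because macro averaging weights every class equally, a rarely occurring risk level influences Macro_P, Macro_R, and Macro_F1 as much as the dominant low-risk class, which makes these metrics informative for imbalanced data.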

To evaluate the performance of the various classification algorithms, we trained multiple models on the dataset and assessed their effectiveness. LDA is the sole linear algorithm among them, while the rest are nonlinear. The steps are as follows: (1) partition the training set; (2) appraise the algorithm models using 10-fold cross-validation; (3) generate six models for predicting new data; and (4) compare their classification accuracy. As shown in Table 4, the WETPR-SVM model yields the highest Accuracy, Macro_P, Macro_R, and Macro_F1 scores, at 0.9025, 0.9055, 0.9026, and 0.9021, respectively. These results indicate that the WETPR-SVM model developed in this study surpasses the traditional single machine learning classification models in overall performance, classifying the risk of water environment governance PPP projects with higher accuracy and better generalizability. Thus, WETPR-SVM is selected as the optimal model for predicting the risk classification of water environment governance PPP projects.

Table 4

Performance evaluation of prediction models

| Classifier | Accuracy | Macro_P | Macro_R | Macro_F1 |
| --- | --- | --- | --- | --- |
| KNN | 0.8532 | 0.8561 | 0.8537 | 0.8530 |
| CART | 0.8251 | 0.8252 | 0.8249 | 0.8253 |
| LDA | 0.8469 | 0.8471 | 0.8470 | 0.8469 |
| NB | 0.8778 | 0.8781 | 0.8779 | 0.8775 |
| SVM | 0.8467 | 0.8465 | 0.8469 | 0.8462 |
| WETPR-SVM | 0.9025 | 0.9055 | 0.9026 | 0.9021 |

The box plot comparison in Figure 7 indicates that the WETPR-SVM model has the smallest prediction accuracy range, 0.07, suggesting high stability and consistent performance across diverse datasets. Conversely, the NB model exhibits the largest interquartile range of prediction accuracy, 0.2636, likely because it struggles with multi-source heterogeneous data, missing values, and class imbalance. The CART model has the lowest median prediction accuracy, 0.8251, possibly because decision trees are sensitive to noisy data and prone to overfitting. These findings underscore the superiority of the WETPR-SVM model in predicting the risk classification of water environment treatment PPP projects.
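
The spread statistics read off a box plot, namely the range, the interquartile range, and the median, can be computed from per-fold accuracies as sketched below. The fold accuracies here are invented for illustration and are not the study's results; the names `spread_summary`, `stable_model`, and `unstable_model` are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since plotting libraries differ in how they interpolate quartiles.

```python
from statistics import median, quantiles


def spread_summary(fold_accuracies):
    """Summarize the spread of cross-validation accuracies as a box plot would."""
    q1, _, q3 = quantiles(fold_accuracies, n=4, method="inclusive")
    return {
        "range": max(fold_accuracies) - min(fold_accuracies),
        "iqr": q3 - q1,
        "median": median(fold_accuracies),
    }


# Invented accuracies from 10 folds for two hypothetical classifiers.
folds = {
    "stable_model": [0.88, 0.89, 0.90, 0.90, 0.91, 0.91, 0.92, 0.92, 0.93, 0.94],
    "unstable_model": [0.70, 0.75, 0.80, 0.84, 0.86, 0.88, 0.90, 0.92, 0.95, 0.97],
}
summary = {name: spread_summary(acc) for name, acc in folds.items()}

for name, stats in summary.items():
    print(f"{name:15s} range={stats['range']:.3f} "
          f"iqr={stats['iqr']:.3f} median={stats['median']:.3f}")
```

A narrow range and a small interquartile range indicate that a model's accuracy changes little from fold to fold, which is the sense of stability used when comparing the classifiers.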

In summary, the ensemble learning-based approach posited in this study adeptly leverages various machine learning algorithms' strengths and addresses single algorithm limitations with multi-source heterogeneous data, missing values, and class imbalance issues, enhancing prediction accuracy and generalization capability.

Our study makes a contribution to risk management within PPP frameworks, with a particular focus on water environment treatment projects. We have delineated a comprehensive array of risk factors and demonstrated their profound impact on project success. Crucially, our analysis from a governmental perspective underlines the necessity of government oversight in ensuring the success and sustainability of PPP projects.

Central to our findings is the ensemble learning model based on the Stacking method, marking a considerable leap in risk prediction accuracy. This model's capacity to adeptly manage multi-source heterogeneous risk data facilitates a more refined risk classification and empowers stakeholders with informed decision-making capabilities. The empirical validation using data from the Jiujiang City project substantiates the WETPR-SVM model's superiority in risk prediction.

Our analysis revealed primary risk indicators such as water environment risk, operational risk, and local economic development risk. These indicators are instrumental in devising effective risk mitigation strategies. We found that prioritizing these risks enhances operational strategies, governance quality, and proactive local economic participation, ultimately fostering increased project success.

Furthermore, our research extends its implications beyond water treatment projects to encompass various PPP projects across different sectors and regions. The methodologies and insights gleaned are applicable to broader risk management strategies. This research bridges the gap between academic investigation and practical applications, contributing to sustainable development in environmental governance projects.

However, our findings also unearth several areas for future exploration. There is a critical need for further research to extend these methodologies to diverse PPP contexts, evaluating the scalability and adaptability of our model under varying environmental and economic conditions. Additionally, investigating the complex interplay of identified risk factors in various PPP scenarios could yield deeper insights, enhancing the efficacy of risk management strategies.

Our study focused on risk management in water environment treatment PPP projects and presents a comprehensive, data-driven model for risk assessment and prediction. The implementation of our ensemble learning model, specifically tailored for such projects, showcases the advanced application of machine learning techniques, significantly improving risk prediction precision and reliability. This approach is critical for proactive risk management, ensuring the success and sustainability of these projects.

The core findings of our study are as follows:

(1) The ensemble learning model demonstrates exceptional predictive accuracy, marking a significant advancement in integrating machine learning into PPP project risk management.

(2) The identification of key risk indicators – water environment, operational risk, and local economic development – is crucial for developing effective risk mitigation strategies and aligning stakeholder objectives.

(3) Our model's innovative weighted voting mechanism effectively overcomes the challenges of missing or abnormal data, enhancing robustness and real-world applicability.

(4) Empirical validation using the Jiujiang City project data underscores our model's practical relevance and effectiveness, setting a new benchmark in PPP project risk management practices.

Our study's industry contributions are manifold. By integrating advanced machine learning with the specific risk management requirements of water environment treatment PPP projects, we provide a pioneering, data-driven approach to navigate complex risk landscapes. The model enhances risk prediction accuracy and offers actionable insights for robust risk mitigation, potentially boosting the success and sustainability of PPP projects in water treatment, thus contributing to sustainable urban development goals.

Acknowledging our study as foundational, we intend to work with policymakers and practitioners to translate our research findings into actionable insights and policy recommendations, contributing to the legal and policy framework of water environment treatment PPP projects. Additionally, the methodological framework developed in this study, while applied to Jiujiang, is structured to accommodate adaptability to other similar regions. The risk assessment and prediction model is designed to be customized based on the distinct risk profiles, data availability, and governance structures of other regions, thereby facilitating its broader application. The versatility of the model is a stepping stone toward our future endeavors to further validate and augment the model's applicability across diverse regional settings, thereby contributing to the robustness and generalizability of the risk assessment framework for water environment treatment PPP projects.

This work was supported by the National Social Science Funds of China [grant number 17BGL156].

This paper does not contain any studies with human participants or animals performed by any of the authors.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Abdoli, M., Akbari, M. & Shahrabi, J. 2023 Bagging supervised autoencoder classifier for credit scoring. Expert Systems with Applications 213, 118991.
Akomea-Frimpong, I., Jin, X. H. & Osei-Kyei, R. 2021 A holistic review of research studies on financial risk management in public–private partnership projects. Engineering, Construction and Architectural Management 28 (9), 2549–2569.
An, X., Li, H., Wang, L., Wang, Z., Ding, J. & Cao, Y. 2018 Compensation mechanism for urban water environment treatment PPP project in China. Journal of Cleaner Production 201, 246–253.
Bao, F., Martek, I., Chen, C., Wu, Q. & Chan, A. P. C. 2022 Critical risks inherent to the transfer phase of public-private partnership water projects in China. Journal of Management in Engineering 38 (3), 1–10.
Chen, Y., Wang, Y., Leung, C., Hyeon, P. J. & Ding, X. L. 2023 Urban river restoration in Hong Kong: Assessment, impact, and improvement strategies. Sustainable Cities and Society 99, 104885.
Chou, J. & Lin, C. 2013 Predicting disputes in public-private partnership projects: Classification and ensemble models. Journal of Computing in Civil Engineering 27 (1), 51–60.
Choubin, B., Hosseini, F. S., Rahmati, O., Youshanloei, M. M. & Jalali, M. 2023 Mapping of salty aeolian dust-source potential areas: Ensemble model or benchmark models? The Science of the Total Environment 877, 163419.
Chung, D., Yun, J., Lee, J. & Jeon, Y. 2023 Predictive model of employee attrition based on stacking ensemble learning. Expert Systems with Applications 215, 119364.
Cui, C., Wang, J., Liu, Y. & Coffey, V. 2019 Relationships among value-for-money drivers of public-private partnership infrastructure projects. Journal of Infrastructure Systems 25 (2).
El-Kholy, A. M. & Akal, A. Y. 2021 Assessing and allocating the financial viability risk factors in public-private partnership wastewater treatment plant projects. Engineering, Construction and Architectural Management 28 (10), 3014–3040.
Fu, L., Sun, H., Fang, Y. & Xu, K. 2023 A systematic review of the public-private partnership literature published between 2012 and 2021. Journal of Civil Engineering and Management 29 (3), 238–252.
Huang, I., Chang, M. J. & Lin, G. F. 2022 An optimal integration of multiple machine learning techniques to real-time reservoir inflow forecasting. Stochastic Environmental Research and Risk Assessment 36 (6), 1541–1561.
Li, H., Lv, L., Zuo, J., Bartsch, K., Wang, L. & Xia, Q. 2020a Determinants of public satisfaction with an urban water environment treatment PPP project in Xuchang, China. Sustainable Cities and Society 60, 102244.
Li, H., Lv, L., Zuo, J., Su, L., Wang, L. & Yuan, C. 2020b Dynamic reputation incentive mechanism for urban water environment treatment PPP projects. Journal of Construction Engineering and Management 146 (8), 04020088.
Li, H., Wang, F., Zhang, C., Wang, L., An, X. & Dong, G. 2021 Sustainable supplier selection for water environment treatment public-private partnership projects. Journal of Cleaner Production 324, 129218.
Li, L., Qiao, J., Yu, G., Wang, L., Li, H., Liao, C. & Zhu, Z. 2022c Interpretable tree-based ensemble model for predicting beach water quality. Water Research 211, 118078.
Liu, J. & Xue, X. 2018 River management for local governments in China: From public to private. International Journal of Environmental Research and Public Health 15 (10), 2174.
Ma, H., Zeng, S., Lin, H. & Zeng, R. 2020 Impact of public sector on sustainability of public–private partnership projects. Journal of Construction Engineering and Management 146 (2).
Mazher, K., Chan, A., Zahoor, H., Khan, M. I. & Ameyaw, E. E. 2018 Fuzzy integral-based risk-assessment approach for public-private partnership infrastructure projects. Journal of Construction Engineering and Management 144 (12), 1–15.
Owolabi, H. A., Bilal, M., Oyedele, L. O., Alaka, H. A., Ajayi, S. O. & Akinade, O. O. 2020 Predicting completion risk in PPP projects using big data analytics. IEEE Transactions on Engineering Management 67 (2), 430–453.
Shrestha, A., Chan, T., Aibinu, A., Chen, C. & Martek, I. 2018 Risk allocation inefficiencies in Chinese PPP water projects. Journal of Construction Engineering and Management 144 (4), 04018013.
Su, L. & Cao, Y. 2022 Performance monitoring and evaluation of water environment treatment PPP projects with multi-source heterogeneous information. Frontiers in Environmental Science 10, 1024701.
Su, L., Cao, Y., Li, H. & Zhang, C. 2022 Water environment treatment PPP projects optimal payment mechanism based on multi-stage dynamic programming model. Engineering, Construction and Architectural Management 31 (2), 866–890.
Tabari, H., Hosseinzadehtalaei, P., Thiery, W. & Willems, P. 2021 Amplified drought and flood risk under future socioeconomic and climatic change. Earth's Future 9 (10), e2021EF002295.
Wang, Y., Cui, P. & Liu, J. 2018 Analysis of the risk-sharing ratio in PPP projects based on government minimum revenue guarantees. International Journal of Project Management 36 (6), 899–909.
Wang, Y., Shao, Z. & Tiong, R. L. K. 2021 Data-driven prediction of contract failure of public-private partnership projects. Journal of Construction Engineering and Management 147 (8), 04021089.
Wang, S., Zhu, J., Yin, Y., Wang, D., Cheng, T. C. E. & Wang, Y. 2023 Interpretable multi-modal stacking-based ensemble learning method for real estate appraisal. IEEE Transactions on Multimedia 25, 315–328.
Xiang, P., Zhang, Q., Jiang, Q. & Liu, Z. 2022 Operational risk allocation in urban rail transit public–private partnership projects. Frontiers in Environmental Science 10, 1195.
Xu, Y., Yeung, J. F. Y., Chan, A. P. C., Chan, D. W. M., Wang, S. Q. & Ke, Y. 2010 Developing a risk assessment model for PPP projects in China – a fuzzy synthetic evaluation approach. Automation in Construction 19 (7), 929–943.
Zhang, Y., He, N., Li, Y., Chen, Y., Wang, L. & Ran, Y. 2021 Risk assessment of water environment treatment PPP projects based on a cloud model. Discrete Dynamics in Nature and Society 2021, 1–15.
Zheng, X., Liu, Y., Jiang, J., Thomas, L. M. & Su, N. 2021 Predicting the litigation outcome of PPP project disputes between public authority and private partner using an ensemble model. Journal of Business Economics and Management 22 (2), 320–345.
Zhu, R., Yu, Y., Zhao, J., Liu, D., Cai, S., Feng, J. & Rodrigo-Comino, J. 2023 Evaluating the applicability of the water erosion prediction project (WEPP) model to runoff and soil loss of sandstone reliefs in the Loess Plateau, China. International Soil and Water Conservation Research 11 (2), 240–250.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).