Abstract
Water treatment public–private partnership (PPP) projects are pivotal for sustainable water management but are often challenged by complex risk factors. Efficient risk management in these projects is crucial, yet traditional methodologies often fall short of addressing the dynamic and intricate nature of these risks. Addressing this gap, this comprehensive study introduces an advanced risk classification prediction model tailored for water treatment PPP projects, aimed at enhancing risk management capabilities. The proposed model encompasses an intricate evaluation of crucial risk areas: the natural and ecological environments, socio-economic factors, and engineering entities. It delves into the complex relationships between these risk elements and the overall risk profile of projects. Grounded in a sophisticated ensemble learning framework employing stacking, our model is further refined through a weighted voting mechanism, significantly elevating its predictive accuracy. Rigorous validation using data from the Jiujiang City water environment system project Phase I confirms the model's superiority over standard machine learning models. The development of this model marks a significant stride in risk classification for water treatment PPP projects, offering a powerful tool for enhancing risk management practices. Beyond accurately predicting project risks, this model also aids in developing effective government risk management strategies.
HIGHLIGHTS
Pioneers data-driven risk management in water treatment PPP projects using machine learning.
Introduces an effective weighted voting mechanism for handling data irregularities in risk assessment.
Demonstrates superior performance of WETPR-SVM model over conventional machine learning models.
INTRODUCTION
The integrity of surface and groundwater bodies is governed by an intricate interplay of natural processes and anthropogenic activities. Accelerating demographic expansion, industrialization, and agricultural practices, in conjunction with potential alterations in the hydrological cycle attributed to climate change, have collectively exacerbated global water quality degradation (Li et al. 2021; Su et al. 2022). The imperative of water environment treatment emerges not only as a means to safeguard water resources but also as a critical element in maintaining ecological balance and fostering sustainable economic development. In addressing these challenges, water environment treatment initiatives, frequently operationalized through public–private partnerships (PPP), have risen to prominence. These partnerships offer distinct advantages, including mitigation of government fiscal burdens and enhancement of management efficacy, thus attracting increasing attention in the field of environmental governance.
In recent years, China has seen a consistent increase in investments in water environment treatment PPP projects. However, from a practical standpoint, many of these projects have failed to achieve their objectives due to inadequate risk management (Wang et al. 2021; Su et al. 2022). On the one hand, numerous risk factors are involved in the construction and maintenance of these projects, including local climate, hydrology, geology, flora and fauna, and the socio-economic environment. These factors present high levels of uncertainty and hazard, creating significant challenges for project risk management (Lu et al. 2017). On the other hand, social capital participates in ecological protection and restoration primarily for profit. That is, with the support of government policies, and in view of the market prospects and economic benefits of ecological restoration, investments are made to generate returns. Governments place greater emphasis on the subsequent environmental management effects of projects in order to maximize social and public benefits, whereas private investors tend to pursue short-term profit maximization without considering long-term operations (Wang et al. 2019; Tang et al. 2021). This inherent conflict of interest in PPP arrangements often precipitates opportunistic tendencies among private investors, manifesting as the provision of substandard products or services. The issue is particularly acute in water environment treatment, where such behaviors can lead to inadequate remediation of polluted waterways. These opportunistic practices are more likely to emerge where government oversight of project risk management is deficient, thereby posing significant threats to public safety and societal sustainability. The crux of these challenges lies in the public welfare orientation and intrinsic public attributes of water environment treatment PPP projects.
Regrettably, these critical aspects are frequently overlooked by decision-makers during the risk management process, with the government playing a pivotal role in both creating and engaging with these attributes. Current literature has scarcely delved into the government's role in managing water environment treatment PPP projects, particularly in relation to the intricate interplay among risk indicators (Liu & Xue 2018; Xue & Wang 2020). Given this context, there is an imperative need to develop comprehensive risk assessment and prediction models specifically for water environment treatment PPP projects. These models should duly consider the public welfare and public attributes and scrutinize risk factors from the vantage point of government management. Such an approach would provide a more robust framework for effective risk control and management in these critical projects.
Accurate risk forecasting is a crucial factor in determining the success or failure of PPP project risk management. General methods for risk prediction in PPP projects can be categorized as qualitative, such as literature reviews, case studies, and questionnaires (Li et al. 2022a); semi-quantitative, such as the analytical hierarchy process, game theory, and fuzzy comprehensive evaluation (Zhang et al. 2021); and quantitative, such as artificial neural networks and value for money analyses (Wang et al. 2021). Expert judgment is one of the most frequently employed data sources in PPP risk management. However, the subjectivity and ambiguity inherent in this data source, together with the absence of a unified standard for judgment, pose significant challenges to the accuracy of risk analysis and assessment (Owolabi et al. 2020). The complexity and specificity of water environment treatment PPP projects generate numerous and diverse data sources, requiring extensive monitoring, sampling, and analysis. The resulting multi-source, heterogeneous data pose significant challenges to project risk management. Traditional PPP risk assessment and prediction methods that rely on expert survey questionnaires are cumbersome, lack credibility, and do not scale to large multi-source heterogeneous datasets, which makes them especially unsuitable for water environment treatment PPP projects. Machine learning, a potential solution to these problems, has been widely applied in the PPP domain (Zheng et al. 2021). For example, Owolabi et al. (2020) used machine learning models including regression trees, support vector machines, and deep neural networks to predict potential delays in PPP projects.
While a small number of researchers have examined construction project risk classification using machine learning techniques, the majority have relied on single machine learning algorithms, whose accuracy and generalizability across different datasets require improvement (Huang et al. 2022). Consequently, a number of researchers have used ensemble models to process multi-source heterogeneous data, yielding new insights (Chou & Lin 2013), but such models have not yet been applied to water environment treatment PPP projects.
In scrutinizing the extant methodologies and their constraints, our study carves out a novel niche in the domain and introduces an avant-garde approach to risk classification and prediction. Central to our methodology is the recognition of the multi-source heterogeneous nature of project risk data. From this perspective, our research constructs a risk feature set, predominantly from a governmental standpoint, and investigates the influence of specific indicators on project risk. Employing an ensemble learning paradigm that integrates multiple classifiers, our approach is empirically validated through a case study of the Jiujiang City water environment treatment PPP project.
The overarching aim of this research is to augment existing engineering risk management frameworks by synergizing conventional expert-driven assessments with contemporary machine learning techniques. This culminates in the development of a dynamic and adaptable risk assessment model. The proposed methodology endeavors to not only enhance the precision of risk prediction but also to furnish a robust framework for governmental entities. The implications of this advancement are profound, significantly bolstering the quality of the supply of environmental public goods and services. Additionally, it markedly improves the capability and efficiency of water pollution mitigation efforts, thereby contributing substantially to the field of environmental management and public health.
METHODS
Ensemble learning method based on Stacking
Selection of base classifiers
The selection of precise and diverse machine learning classifiers is crucial for constructing the base classifier. A diverse set of classifiers enhances the ensemble's ability to capture various data aspects, while precision ensures reliable individual predictions. For this study, we selected six diverse algorithms known for their excellent classification performance: k-nearest neighbors (KNN), classification and regression tree (CART), linear discriminant analysis (LDA), Naive Bayes classifier (NB), support vector machine (SVM), and water environment treatment project risk support vector machine (WETPR-SVM) classifier.
The KNN classifier was selected for its ability to handle nonlinear data without requiring prior knowledge of the data distribution. CART was chosen for its interpretability and its capability of handling both numerical and categorical data. LDA was preferred for its ability to maximize the separability among known categories. The NB classifier is renowned for its simplicity and efficiency, especially when dealing with high-dimensional datasets. The SVM is well regarded for its effectiveness in high-dimensional spaces and for its memory efficiency, since its decision function uses only a subset of the training points (Chou 2012). The WETPR-SVM classifier was chosen because it is tailored to the specific task of risk classification in water environment treatment projects.
Meta-classifier selection
We selected the logistic regression algorithm as the meta-classifier, a common choice in stacking ensemble models, for its strong interpretability and its ability to provide probabilities for outcomes, which aids in understanding the confidence level of the predictions.
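The stacking arrangement described above can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic (generated with `make_classification` to mimic 12 features and four imbalanced risk levels), all hyperparameters are library defaults, and the WETPR-SVM classifier is omitted because its construction is not specified in this section.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the risk feature set (assumption: 12 features,
# 4 imbalanced risk levels); the real project data are not reproduced here.
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6, n_classes=4,
    n_clusters_per_class=1, weights=[0.55, 0.25, 0.12, 0.08], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Five of the six base classifiers named in the text (WETPR-SVM omitted).
# Distance- and margin-based models are wrapped with a scaler.
base_classifiers = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("cart", DecisionTreeClassifier(random_state=0)),
    ("lda", LinearDiscriminantAnalysis()),
    ("nb", GaussianNB()),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
]

# Logistic regression as the meta-classifier; it is trained on the
# out-of-fold predictions of the base classifiers (cv=5).
stack = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

proba = stack.predict_proba(X_test)   # per-class probabilities from the meta-classifier
accuracy = stack.score(X_test, y_test)
print(f"hold-out accuracy: {accuracy:.3f}")
```

Because the meta-classifier is a logistic regression, `predict_proba` yields a probability for each risk level, which is the property the text cites as the reason for choosing it.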
Data preparation and model training
Our research began by organizing the risk feature set data for water environment treatment PPP projects. This initial phase laid the foundation for the subsequent steps: data collection, data preprocessing, and statistical analysis. We then developed and validated machine learning models and integrated multi-source data. This structured approach prepares the data for comprehensive analysis within our methodological framework.
Data collection methodology: The identification and refinement of risk factors pertinent to water environment treatment PPP projects were conducted through an extensive data collection process. This process harnessed information from a variety of sources, including the public data platform of the Chinese government, a network of diverse monitoring stations in Jiujiang City, and collaborations with joint bidding entities. The methodology employed advanced text analysis techniques for meticulous data extraction, focusing on key variables such as the scope of the project, financial parameters, and environmental impact assessments. These variables were meticulously selected for their integral relevance to the project's risk assessment and management. Furthermore, it is recognized that the risk landscape of water environment governance PPP projects is dynamic, evolving in response to temporal changes. To effectively capture this evolution, our approach incorporates a longitudinal data analysis framework. The dataset spans a considerable temporal range, thereby providing a comprehensive perspective on the shifting risk dynamics associated with water environment treatment PPP projects. This longitudinal approach is instrumental in yielding a nuanced understanding of the temporal evolution of risks, a critical aspect in the effective management and mitigation of such risks in PPP projects.
Data preprocessing: Prior to analysis, data cleaning procedures were employed to ensure accuracy. This included the verification of data integrity, handling of missing values, and normalization of financial figures for comparative analysis.
Statistical analysis: A robust statistical analysis framework was applied. Initially, descriptive statistical methods provided an overview of the dataset, identifying key trends and patterns. Subsequently, inferential statistical techniques, including regression analysis, were used to explore the relationships between identified risk factors and project outcomes.
Machine learning model development and validation: For predictive risk assessment, a machine learning model was developed using Python's Scikit-learn library. This model was trained on a subset of the dataset, using a combination of supervised learning techniques. Model performance was evaluated through cross-validation, assessing its predictive accuracy and reliability. The model's effectiveness in risk prediction was further validated against known outcomes of similar water treatment projects.
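As a rough illustration of the cross-validation step, the sketch below evaluates a single scikit-learn classifier with stratified 5-fold cross-validation. The synthetic data, the choice of an SVM, the number of folds, and the balanced-accuracy metric are all assumptions made for the example; the text does not state which settings were actually used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced four-class data standing in for the risk dataset.
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6, n_classes=4,
    n_clusters_per_class=1, weights=[0.55, 0.25, 0.12, 0.08], random_state=0,
)

model = make_pipeline(StandardScaler(), SVC(random_state=0))

# Stratified folds keep the class proportions similar in every split,
# which matters when low-risk samples dominate the data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")

print("per-fold balanced accuracy:", scores.round(3))
print("mean:", scores.mean().round(3))
```

Balanced accuracy is used here rather than plain accuracy because a model that always predicts the majority class would otherwise look deceptively good on imbalanced data.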
Integration of multi-source data: To enhance the model's accuracy, data from multiple sources were integrated, creating a comprehensive risk profile. This approach accounted for the multi-dimensional nature of the project, encompassing environmental, technical, financial, and socio-economic risk factors.
Ensemble learning mechanism
The final prediction results were obtained via an enhanced voting mechanism. The individual prediction models were fused through the ensemble learning mechanism into a single comprehensive model, which serves as the final risk grading prediction model for water environment treatment PPP projects. This model was assessed using evaluation metrics commonly used in machine learning, and its performance was refined iteratively based on the training results.
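One common form of weighted voting is weighted soft voting, sketched below in plain Python. It is offered only as an illustration of the general idea: the weighting rule (weights proportional to each classifier's validation accuracy), the probability vectors, and the accuracy values are hypothetical, and the study's own enhanced voting rule may differ.

```python
LEVELS = ["low", "medium", "higher", "high"]

def weighted_vote(probabilities, weights):
    """Combine per-classifier class probabilities using normalized weights.

    probabilities: one probability vector per classifier (same class order).
    weights: one non-negative weight per classifier.
    Returns the index of the winning class and the combined probabilities.
    """
    total = sum(weights)
    n_classes = len(probabilities[0])
    combined = [0.0] * n_classes
    for probs, weight in zip(probabilities, weights):
        for k, p in enumerate(probs):
            combined[k] += (weight / total) * p
    best = max(range(n_classes), key=lambda k: combined[k])
    return best, combined

# Hypothetical outputs of three classifiers for a single sample.
probs = [
    [0.60, 0.25, 0.10, 0.05],  # classifier A
    [0.30, 0.40, 0.20, 0.10],  # classifier B
    [0.20, 0.50, 0.20, 0.10],  # classifier C
]
# Hypothetical validation accuracies used as voting weights.
val_accuracy = [0.70, 0.85, 0.90]

index, combined = weighted_vote(probs, val_accuracy)
print(LEVELS[index], [round(c, 3) for c in combined])
```

In this example classifier A alone would predict "low", but the two more accurate classifiers carry more weight, so the combined vote is "medium".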
Model interpretability method
Although machine learning models can measure feature importance, their interpretability remains limited compared with conventional generalized linear models. The SHapley Additive exPlanations (SHAP) method alleviates this limitation by interpreting a model through each feature's contribution to the predicted outcome. These contributions, known as SHAP values, quantify how much each feature shifts the prediction (Nordin et al. 2023). A SHAP value of larger magnitude signifies a more substantial contribution of the feature to the predicted value (Li et al. 2022c). In this investigation, we employ the SHAP explanation model to compute the contributions of the various risk features in water environment treatment PPP projects, capitalizing on its computational performance and intuitive interpretation.
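The SHAP value of feature $j$ is its Shapley value:

$$\phi_j(val) = \sum_{S \subseteq \{1,\ldots,p\} \setminus \{j\}} \frac{|S|!\,(p-|S|-1)!}{p!}\,\bigl(val(S \cup \{j\}) - val(S)\bigr)$$

where
$S$ denotes a subset of the features used in the model, with the stipulation that $j$ is excluded from $S$;
$p$ signifies the total number of features;
$val(S)$ represents the prediction derived from the feature values within set $S$;
$\phi_j$ denotes the contribution of feature $j$ to $val$.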
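To make the definition concrete, the sketch below computes exact Shapley values by enumerating every feature subset, weighting each marginal contribution val(S ∪ {j}) − val(S) by |S|!(p − |S| − 1)!/p!. The three feature names and the toy value function are hypothetical and chosen only so the arithmetic can be checked by hand; practical SHAP implementations rely on approximations because full enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, val):
    """Exact Shapley values by enumerating all subsets S that exclude j."""
    p = len(features)
    phi = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(p):  # |S| ranges over 0 .. p-1
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (val(s | {j}) - val(s))
        phi[j] = total
    return phi

# Hypothetical value function: additive effects plus one interaction term.
BASE = {"water_quality": 0.30, "soil_erosion": 0.10, "local_climate": 0.05}

def val(s):
    score = sum(BASE[f] for f in s)
    if "water_quality" in s and "soil_erosion" in s:
        score += 0.04  # interaction shared between the two features
    return score

phi = shapley_values(list(BASE), val)
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

The interaction term of 0.04 is split equally between the two interacting features, and the three values sum to val(all features) − val(∅) = 0.49, which is the efficiency property of Shapley values.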
Improved voting mechanism






EXPERIMENTS AND ANALYSIS
Data collection and analysis
A significant portion of research concerning risks in PPP projects is centered on risk identification and classification, risk analysis and evaluation, alongside risk allocation and management strategies (Wang et al. 2018). Extensive research has been conducted over the past decade to investigate risk management issues in PPP projects, identifying various types of risks, such as financial, operational, political, and environmental risks (Xu et al. 2010). Water environment systems embody dynamic, complex, open systems with temporal, spatial, and volumetric variations. This complexity results in distinct techno-economic characteristics of water environment treatment PPP projects compared to purely commercial PPP projects, including strong quasi-public interest, high difficulty in integrating governance technologies, complex assessment of governance effects, and difficult project coordination and collaboration (An et al. 2018). The prevailing research challenge lies in identifying risk factors in water environment treatment PPP projects.
A comprehensive understanding of the risks in water environment treatment PPP projects requires multi-dimensional data exploration. The richness and depth of the data fundamentally determine the precision and insightfulness of the resulting risk assessment framework. Using keywords or subject terms in both Chinese and English, such as ‘PPP’, ‘risk’, and ‘water environment treatment’, a combined search was conducted in databases such as China National Knowledge Infrastructure (CNKI), Institute for Scientific Information (ISI) Web of Science, and ScienceDirect to find relevant literature for risk factor analysis. This resulted in a preliminary list of risk factors for water environment treatment PPP projects, as detailed in Table 1. The existing literature mainly discusses risks caused by the government, risks caused by social capital, and risks generated by the external environment. Government-caused risks primarily stem from government involvement in project management, including tax adjustments (Liu & Xue 2018; Li et al. 2022b), government intervention and credit issues (Cui et al. 2019; Wang et al. 2021), and inadequacies in existing laws, regulations, and regulatory systems (Feng et al. 2022; Su & Cao 2022). Social capital-caused risks mainly arise from actual project construction and operation, such as completion risks (Feng et al. 2022; Su & Cao 2022), construction technology risks (Zhang et al. 2021), contract change risks (Li & Wang 2019; Feng et al. 2022; Su et al. 2022), delay risks (Li & Wang 2019; Li et al. 2022a, 2022b; Su & Cao 2022), cost overrun risks (El-Kholy & Akal 2021; Su & Cao 2022), insufficient project revenue risks (El-Kholy & Akal 2021; Su et al. 2022), dispute and infringement risks (Chou & Lin 2013; Wang et al. 2019; Fu et al. 2023), and social capital change risks (Wang et al. 2019; El-Kholy & Akal 2021).
External environment risks refer to risks directly or indirectly caused by the external environment, including environmental damage risks (An et al. 2018; Owolabi et al. 2020; Su & Cao 2022), geological condition risks (Cui et al. 2019; Feng et al. 2022), social stability risks (Wang et al. 2019; Li et al. 2021), public satisfaction (Li et al. 2020a, 2020b; Fu et al. 2023), inflation risks (Wang et al. 2018; Zhang et al. 2021), and force majeure (Li & Wang 2018; Wang et al. 2018).
Table 1. Preliminary list of risk factors for water environment treatment PPP projects
Risk type | Specific risk indicators |
---|---|
Government-induced | Tax adjustment risk |
Government-induced | Government intervention and credit issues |
Government-induced | Inadequate legal and regulatory frameworks |
Social capital-induced | Quality completion risk |
Social capital-induced | Construction technology risk |
Social capital-induced | Contract change risk |
Social capital-induced | Schedule delay risk |
Social capital-induced | Operational cost overrun risk |
Social capital-induced | Project revenue shortfall risk |
Social capital-induced | Dispute and infringement risk |
Social capital-induced | Social capital change risk |
External environment-induced | Environmental damage risk |
External environment-induced | Geological condition risk |
External environment-induced | Social stability risk |
External environment-induced | Public opinion risk (public satisfaction) |
External environment-induced | Inflation risk |
External environment-induced | Force majeure risk (political, natural conditions) |
While the collated comprehensive risk indicators offer valuable insights, there exist notable limitations in their applicability to this research. Primarily, these indicators predominantly stem from a project-centric analysis, lacking in differentiation among the risk profiles attributable to varied stakeholders involved in the project. Furthermore, these indicators heavily rely on qualitative data, which poses substantial challenges in terms of comprehensive and accurate acquisition in practical scenarios.
Additionally, the existing body of research specifically focusing on risks associated with water environment governance projects remains relatively scant. This gap has led to the preliminary list of risk indicators being more aligned with those typical of general PPP construction projects rather than being tailored to the unique nuances of water environment governance projects. Consequently, this initial compilation of risk indicators does not completely resonate with the specific context and requirements of the current study. This disparity underscores the necessity for a more targeted and nuanced approach in identifying and analyzing risk factors pertinent to water environment governance PPP projects.
Jiujiang City's unique environmental, geographical, and economic characteristics make it an ideal case study for water environment treatment PPP projects. Located along the Yangtze River and home to parts of Poyang Lake, Jiujiang (28°41′-30°05′N, 113°56′-116°54′E) faces specific ecological challenges and opportunities. Its role as a green development city in China's Yangtze River Economic Belt further emphasizes its significance in national environmental sustainability efforts. The Jiujiang City Water Environment Treatment PPP Project examined in this study was launched in 2018 with a total investment of 7.699 billion RMB. It has a designed sewage treatment capacity of 145,000 m³/day, a designed pipeline length of 188.3 km, an urban service area of 56.5 km², and a service population of 796,000. The project has a 20-year operation period, including a 2–3-year construction period, and comprises six sub-projects.
To surmount the challenges posed by the qualitative nature of some data, data augmentation techniques were deployed (Mazher et al. 2018). This entailed leveraging expert insights to quantitatively represent qualitative data, thereby augmenting the dataset and enhancing the comprehensiveness of the analysis. Expert interviews were conducted based on the preliminary list of project risk factors (refer to Table 2 for expert information), with the aim of compensating for the scarcity of research on risks in water environment governance projects. The data were sourced from the public data platform of the Ministry of Ecology and Environment of China (https://www.mee.gov.cn/), the Jiujiang City water environment treatment project's official document (https://www.yeec.com.cn/hbjt/index/index.html) and the project team's negotiation memorandum. Utilizing publicly available data along with enterprise data supports the scientific nature of this study in terms of data accessibility and authenticity (Shrestha et al. 2018). Five years' worth of project risk factor information was compiled, and the specific contents of various risk categories were categorized. This compilation yielded a set of 12 risk data features for water environment treatment PPP projects, covering the natural environment, ecological environment, socio-economic, and project entity subsystems. The evaluation indicators are delineated in Table 3.
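Table 2. Basic information on the interviewed experts
Basic information | Category | Sample size | % |
---|---|---|---|
Type of affiliated unit | Government agencies | 4 | 20 |
Type of affiliated unit | Institutions of higher learning | 4 | 20 |
Type of affiliated unit | Water environment management enterprises | 7 | 35 |
Type of affiliated unit | General PPP project enterprises | 3 | 15 |
Type of affiliated unit | Others | 2 | 10 |
Related project work or research experience | Within 1 year | 1 | 5 |
Related project work or research experience | 1–3 years | 6 | 30 |
Related project work or research experience | 3–5 years | 9 | 45 |
Related project work or research experience | Over 5 years | 4 | 20 |
Degree of understanding of related projects | Very good understanding | 10 | 50 |
Degree of understanding of related projects | Good understanding | 7 | 35 |
Degree of understanding of related projects | General understanding | 3 | 15 |
Degree of understanding of related projects | Little understanding | 0 | 0 |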
Table 3. Risk data feature table for water environment treatment PPP projects
System name | Risk feature name | Risk feature-related evaluation indicator set |
---|---|---|
Natural environment subsystem | Water environment | Hydro-sediment, water quality, water temperature, water level, sediment |
Natural environment subsystem | Acoustic environment | Noise |
Natural environment subsystem | Atmospheric environment | Dust, exhaust emissions, local climate |
Natural environment subsystem | Surface environment | Solid waste, soil nutrients, geology, soil erosion, soil salinization, soil marshification, landslides |
Ecological environment subsystem | Terrestrial organisms | Terrestrial animal and plant growth risks |
Ecological environment subsystem | Aquatic organisms | Safety risks of aquatic animals, aquatic plants, aquatic microorganisms |
Socio-economic subsystem | Livelihood security | Public satisfaction, employment opportunities |
Socio-economic subsystem | Local economic development | Regional industry, regional agriculture, urban planning, surrounding landscape, regional economic risk |
Project entity subsystem | Legal risks | Dispute, breach, infringement risks, planning, standards, and contract change risks |
Project entity subsystem | Operational risks | Construction risks caused by social capital, operation and maintenance management risks |
Project entity subsystem | Financial risks | Interest rate change risk, revenue shortfall risk, social capital change risk |
Project entity subsystem | Force majeure risks | Force majeure due to political and natural conditions |
Relationship between existing system and government perspective system.
In this context, our study emphasizes the importance of continuous operation risks and natural ecological environment risks in the risk categories prioritized by the government in water environment treatment PPP projects. Discussing specific projects in this domain also enables the collection of subjective risk data, crucial for empirical model validation. Financial risks, representing budget overruns, funding gaps, or unexpected financial emergencies, often stem from market volatility or fluctuating interest rates, posing significant challenges to project financial stability (Li & Wang 2018; Akomea-Frimpong et al. 2021). Operational risks cover potential issues in technical and managerial aspects of project execution, where technical failures, managerial shortcomings, or operational lapses can lead to delays, cost increases, or compromised performance (Xiang et al. 2022). The inherent complexity of water treatment projects often heightens these risks, underscoring the need for effective operational strategies and skilled project management.
Political risks arise from changes in policy or government stability, with PPP projects being particularly vulnerable to such shifts. Changes in policies or government can alter project priorities, introduce regulatory challenges, or affect funding, thereby impacting project timelines and financial frameworks (Bao et al. 2022). Environmental risks, crucial due to the ecological nature of water treatment PPP projects, include adverse weather conditions, geological surprises, or new environmental regulations, which can lead to delays, increased compliance costs, or in extreme cases, project suspension (Ma et al. 2020).
In essence, the study categorizes risk indicators into subsystems: the natural environment and ecological environment subsystems primarily encompass the government's regulatory risks related to the natural ecology, while the socio-economic subsystem addresses the public impact of water environment treatment PPP projects. The project entity subsystem acknowledges the government's role in ensuring sustainable PPP project operation but excludes engineering risks not shared by the government, such as cost overrun risks and construction technology risks.
It is crucial to recognize that the nature of participating enterprises significantly influences risk assessment outcomes in PPP projects. In the realm of comprehensive water environment management in China, projects demand substantial investment and carry public characteristics. Consequently, state-owned enterprises with robust comprehensive strength predominate this sector. Meanwhile, private or foreign-funded enterprises often find their most viable strategy in forming alliances with state-owned entities, a trend reflected in the increasing mergers and acquisitions within the environmental protection industry. For the Jiujiang project, the involvement of the state-owned China Three Gorges Corporation aligns with the prevailing trend in the Chinese market for water environment treatment projects. State-owned enterprises typically bring stability and significant financing to such projects, though they also pose unique challenges, particularly in aligning with bureaucratic procedures and national policies. In contrast, private or foreign-invested enterprises introduce different dynamics, focusing more on cost efficiency and innovation, albeit with higher financial risks and market sensitivity. Our study incorporates parameters to reflect these diverse characteristics and risk profiles of the involved enterprises.
After determining the risk indicator system, we categorized the data using the equal interval method, classifying 927 risk data entries into four risk levels: low, medium, higher, and high. The resulting dataset consists predominantly of low-level risks. This imbalance steered us away from deep learning approaches, such as multilayer neural networks, which are poorly suited to such datasets (Abdoli et al. 2023; Tsai & Chang 2023). We therefore adopted a conventional machine learning strategy.
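As an illustration of the equal interval method, the following minimal sketch divides the observed score range into four equal-width bins and assigns one risk level to each. The function name `equal_interval_levels`, the idea of a single composite risk score per entry, and the example scores are assumptions made for illustration; the study does not specify the quantity that was binned.

```python
from typing import List

# The four risk levels named in the study, ordered from lowest to highest.
RISK_LEVELS = ["low", "medium", "higher", "high"]


def equal_interval_levels(scores: List[float],
                          labels: List[str] = RISK_LEVELS) -> List[str]:
    """Assign each score to one of len(labels) equal-width bins over [min, max]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: every score is identical, so use the lowest level.
        return [labels[0]] * len(scores)
    width = (hi - lo) / len(labels)
    levels = []
    for s in scores:
        # int() floors the bin index; the maximum score is clamped into the top bin.
        idx = min(int((s - lo) / width), len(labels) - 1)
        levels.append(labels[idx])
    return levels


# Hypothetical composite risk scores, not data from the study.
scores = [0.0, 0.1, 0.3, 0.55, 0.8, 1.0]
levels = equal_interval_levels(scores)
print(levels)
```

Because the bin edges depend only on the minimum and maximum score, a skewed score distribution naturally produces the kind of class imbalance described above.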
To understand the relationships between various indicators and risk levels, we used scatter matrix plots to visualize the risk data. Among the 29 indicators analyzed, four were chosen for detailed examination due to their direct impact on the project's risk dynamics: water quality, local industrial economy, local climate, and soil erosion. These indicators were selected for their direct relevance to the project's environmental, socio-economic, and operational aspects. For example, water quality is a direct indicator of the project's environmental impact, the local industrial economy reflects socio-economic effects, local climate pertains to operational challenges, and soil erosion addresses long-term sustainability (Tabari et al. 2021; Chen et al. 2023; Li et al. 2023; Zhu et al. 2023).
These tailored strategies in data collection and processing not only uphold the integrity of the analysis but also significantly contribute toward achieving a nuanced and actionable risk assessment model. The ensuing subsections will delve into the methodologies deployed to dissect this data and unearth invaluable insights into risk management for water environment treatment PPP projects.
Risk feature contribution analysis
In the context of water environment treatment PPP projects, the primary goal from a governmental perspective is to enhance water environment governance for ecological restoration. Within this framework, water environment risk is identified as a pivotal factor affecting the project's overall risk, emphasizing the necessity of effective operation and maintenance management for achieving the project's aims. This underscores the criticality of operation and maintenance management risk in ensuring the sustainability of the project. Additionally, the promotion of local economic development, closely tied to the betterment of societal conditions, is a fundamental objective of these government-led projects. Consequently, this objective's lower relative contribution to the overall risk assessment mirrors the government's prioritization of goals in water environment treatment PPP projects, highlighting a balanced approach to risk evaluation and project management.
Dataset construction
Using the established risk feature set for water environment treatment PPP projects, we first addressed missing data values. The method begins by setting a threshold that determines whether a feature is retained. If the percentage of missing values for a feature exceeds this threshold, the feature is excluded. In this study, the threshold for deleting a feature is set at 80%. If the threshold is not exceeded, the KNN algorithm is used to locate the k-nearest samples to the sample with the missing value, and the missing value is filled with the average of their corresponding feature values.
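The two-stage treatment of missing values described above can be sketched as follows. This is a minimal sketch, assuming scikit-learn's `KNNImputer` as the KNN imputation routine; the helper name `drop_and_impute`, the value `k=2`, and the toy matrix are assumptions made for illustration and are not taken from the study.

```python
import numpy as np
from sklearn.impute import KNNImputer


def drop_and_impute(X, drop_threshold=0.8, k=2):
    """Drop features whose missing fraction exceeds drop_threshold, then
    fill the remaining gaps with the mean of the k nearest samples."""
    X = np.asarray(X, dtype=float)
    missing_frac = np.isnan(X).mean(axis=0)   # share of missing values per feature
    keep = missing_frac <= drop_threshold     # features that survive the 80% rule
    imputer = KNNImputer(n_neighbors=k, weights="uniform")
    X_filled = imputer.fit_transform(X[:, keep])
    return X_filled, keep


# Toy data (assumed): the last feature is entirely missing, and the first
# sample lacks a value for the third feature.
X_raw = np.array([
    [1.0, 2.0, np.nan, np.nan],
    [1.1, 2.1, 3.0, np.nan],
    [0.9, 1.9, 5.0, np.nan],
    [10.0, 10.0, 100.0, np.nan],
    [9.0, 11.0, 90.0, np.nan],
])
X_filled, keep = drop_and_impute(X_raw)
print(keep)
print(X_filled[0])
```

In this toy example the two samples closest to the first row are the second and third rows, so the gap is filled with the average of their third-feature values.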

After missing-value processing and dimensionless treatment (normalization), all feature values lie between 0 and 1. The dataset is divided into training and testing sets in a 7:3 ratio, and an SVM classifier is used to classify the data.
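A minimal sketch of this preparation step is given below, assuming min-max scaling as the dimensionless treatment and a synthetic four-class dataset generated with `make_classification` in place of the project data. The sample size, feature count, and default SVM settings are assumptions made for illustration. The scaler is fitted on the training split only, so that no information from the test split leaks into training.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the risk dataset (assumed, not the Jiujiang data).
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# 7:3 split, stratified so each risk level keeps its share in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Min-max scaling maps each training feature to [0, 1] before the SVM sees it.
model = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)

X_train_scaled = model.named_steps["minmaxscaler"].transform(X_train)
test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.4f}")
```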
Model construction and training
The efficacy of our stacking ensemble learning model is fundamentally linked to the precision and diversity of its base classifiers. This aligns with the principle of 'good but different' in ensemble learning described by Chung et al. (2023). Our initial phase involved experimenting with various machine learning classifiers in Python's Scikit-learn library, working in Jupyter Notebook. This phase led to the independent development of six classifiers: KNN, CART, LDA, NB, SVM, and WETPR-SVM. Each classifier was trained on the dataset, with cross-validation, random search, and learning curves used for hyperparameter tuning.
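To make the stacking idea concrete, the sketch below combines four diverse base learners with an SVM meta-learner using scikit-learn's `StackingClassifier`, and tunes one base learner by random search. It is an illustration under stated assumptions rather than the WETPR-SVM model itself: the choice of base learners, the RBF-kernel meta-learner, the search space, and the synthetic data are all assumptions, and the weighted voting step is not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic four-class data standing in for the risk dataset (assumed).
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Random search with 5-fold cross-validation over one KNN hyperparameter.
search = RandomizedSearchCV(
    KNeighborsClassifier(),
    param_distributions={"n_neighbors": list(range(1, 16))},
    n_iter=5, cv=5, random_state=0)
search.fit(X_train, y_train)

# "Good but different": base learners built on different modelling assumptions.
base_learners = [
    ("knn", search.best_estimator_),
    ("cart", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("lda", LinearDiscriminantAnalysis()),
    ("nb", GaussianNB()),
]

# The meta-learner is trained on out-of-fold predictions of the base learners.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=SVC(kernel="rbf"), cv=5)
stack.fit(X_train, y_train)
predictions = stack.predict(X_test)
print(f"stacked test accuracy: {stack.score(X_test, y_test):.4f}")
```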
The RBFSVM's effectiveness is attributed to its ability to project data into a higher-dimensional space, making complex, nonlinear relationships linearly separable. This characteristic is especially beneficial in water environment governance PPP projects, where risk factors often exhibit nonlinear interdependencies. The RBFSVM's aptitude for deciphering these complexities is a critical factor in its high classification accuracy.
Furthermore, the Gaussian kernel, also known as the RBF kernel, is well suited to nonlinear classification problems. It implicitly maps input data into a higher-dimensional feature space, capturing the complex, nonlinear patterns present in our dataset. This allows the intricate interrelations among risk factors in water environment treatment PPP projects to be separated linearly in that space, which a linear kernel function cannot achieve.
As shown in Figure 7, integrating the Gaussian kernel into our model markedly improves the classification of project risks for water environment governance PPP projects. Its ability to map the nonlinear, complex relationships inherent in these projects into a higher-dimensional space was instrumental in achieving higher classification accuracy. This is substantiated by the RBFSVM's superior classification accuracy (0.9043) over the LinearSVM (0.8191) and the sigmoid-kernel SVM (0.8297).
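The effect of the kernel choice can be illustrated on a small nonlinear problem. The sketch below trains SVMs with linear, sigmoid, and RBF kernels on two concentric rings generated with `make_circles`, a dataset that no straight line can separate. The dataset and all parameter values are assumptions chosen for illustration; the accuracies it prints are unrelated to the values reported for the project data.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: a classic example of classes that are not linearly separable.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

kernel_scores = {}
for kernel in ("linear", "sigmoid", "rbf"):
    clf = SVC(kernel=kernel, gamma="scale").fit(X_train, y_train)
    kernel_scores[kernel] = clf.score(X_test, y_test)  # held-out accuracy

for kernel, score in kernel_scores.items():
    print(f"{kernel:>8}: {score:.4f}")
```

On data of this shape the RBF kernel separates the rings, whereas the linear kernel is limited to a single straight boundary in the original two-dimensional space.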
Model comparison and evaluation
The ensemble learning model in this study performs a four-class classification of water environment governance PPP project risk. We therefore use four indicators to evaluate model performance: accuracy (Accuracy), macro-average precision (Macro_P), macro-average recall (Macro_R), and macro-average F1 score (Macro_F1).
In Equations (3)–(6), $TP$ represents the number of positive samples predicted as positive by the model; $FP$ represents the number of negative samples predicted as positive by the model; $FN$ represents the number of positive samples predicted as negative by the model; $TN$ represents the number of negative samples predicted as negative by the model; $n$ represents the number of risk feature categories; and $P_i$, $R_i$, and $F1_i$ represent the precision, recall, and F1 scores of the model for category $i$, respectively.
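The four indicators can be computed directly from true and predicted labels, as in the following minimal sketch. It uses the standard definitions, which are assumed to correspond to Equations (3)–(6): per-category precision, recall, and F1 are computed one-versus-rest and then averaged without weighting, and accuracy is the share of correctly classified samples. The function name `macro_metrics` and the example labels are hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Return Accuracy, Macro_P, Macro_R and Macro_F1 for a multi-class task."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        # One-versus-rest counts for category c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        precisions.append(p_c)
        recalls.append(r_c)
        f1s.append(f_c)
    n = len(labels)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "Accuracy": correct / len(y_true),
        "Macro_P": sum(precisions) / n,   # unweighted mean over categories
        "Macro_R": sum(recalls) / n,
        "Macro_F1": sum(f1s) / n,
    }


# Hypothetical labels for three categories; one sample of category 1 is misclassified.
metrics = macro_metrics([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
print(metrics)
```

Macro-averaging gives every risk level the same weight regardless of how many samples it contains, which matters when low-level risks dominate the data.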
To evaluate the performance of the classification algorithms, we trained each of them on the dataset and assessed their effectiveness. LDA is the only linear algorithm among them; the rest are nonlinear. The steps are as follows: (1) partition the training set; (2) evaluate the algorithm models using 10-fold cross-validation; (3) generate six distinct models for predicting new data; and (4) compare their classification accuracy. As shown in Table 4, the WETPR-SVM model yields the highest Accuracy, Macro_P, Macro_R, and Macro_F1 scores, which are 0.9025, 0.9055, 0.9026, and 0.9021, respectively. These results indicate that the WETPR-SVM model developed in this study outperforms the traditional single machine learning classification models on every indicator, classifying water environment governance PPP project risk with higher accuracy and better generalizability. WETPR-SVM is therefore selected as the optimal model for predicting the risk classification of water environment governance PPP projects.
Table 4. Performance evaluation of prediction models
Classifier | Accuracy | Macro_P | Macro_R | Macro_F1 |
---|---|---|---|---|
KNN | 0.8532 | 0.8561 | 0.8537 | 0.8530 |
CART | 0.8251 | 0.8252 | 0.8249 | 0.8253 |
LDA | 0.8469 | 0.8471 | 0.8470 | 0.8469 |
NB | 0.8778 | 0.8781 | 0.8779 | 0.8775 |
SVM | 0.8467 | 0.8465 | 0.8469 | 0.8462 |
WETPR-SVM | 0.9025 | 0.9055 | 0.9026 | 0.9021 |
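The comparison procedure described above can be sketched with scikit-learn's `cross_val_score`. The sketch covers only the five standard classifiers, since WETPR-SVM is the model developed in the study and is not reproduced here. The synthetic data, default hyperparameters, and variable names are assumptions, and the printed scores are unrelated to the values in the table.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic four-class data standing in for the risk dataset (assumed).
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Step 1: partition off the training set (7:3 split).
X_train, _, y_train, _ = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
    "NB": GaussianNB(),
    "SVM": SVC(kernel="rbf"),
}

# Step 2: evaluate every model with the same 10 stratified folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
    for name, model in models.items()
}

# Steps 3 and 4: compare mean fold accuracy across the candidate models.
for name, scores in cv_scores.items():
    print(f"{name:>5}: mean={scores.mean():.4f}  std={scores.std():.4f}")
best_model = max(cv_scores, key=lambda name: cv_scores[name].mean())
```

Reusing one `StratifiedKFold` object keeps the folds identical for every classifier, so differences in the scores reflect the models rather than the partitioning.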
The comparison in box plot 7 indicates that the WETPR-SVM model has the smallest range of prediction accuracy, 0.07, suggesting high stability and consistent performance across diverse datasets. Conversely, the NB model exhibits the largest interquartile range of prediction accuracy, 0.2636, likely because it struggles with multi-source heterogeneous data, missing values, and class imbalance. The CART model has the lowest median prediction accuracy, 0.8251, possibly because decision tree-based algorithms are sensitive to noisy data and prone to overfitting. These findings underscore the superiority of the WETPR-SVM model in predicting risk classifications for water environment treatment PPP projects.
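The spread statistics read from a box plot (range, interquartile range, and median) can be computed from a set of accuracy values as in the sketch below. The helper name `spread_summary` and the five accuracy values are hypothetical, and the quartiles use the inclusive method of Python's `statistics.quantiles`; other quartile conventions give slightly different values.

```python
import statistics


def spread_summary(accuracies):
    """Summarize the spread of accuracy values the way a box plot does."""
    data = sorted(accuracies)
    # Quartiles by linear interpolation, treating the data as the full population.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "range": data[-1] - data[0],  # distance between best and worst value
        "iqr": q3 - q1,               # height of the box
        "median": median,             # line inside the box
    }


# Hypothetical accuracies from repeated evaluations of one model.
summary = spread_summary([0.85, 0.88, 0.90, 0.92, 0.95])
print(summary)
```

A small range and interquartile range indicate that a model's accuracy changes little from one evaluation to the next, which is the sense of stability used in the comparison above.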
In summary, the ensemble learning-based approach proposed in this study leverages the strengths of various machine learning algorithms and addresses the limitations of single algorithms in handling multi-source heterogeneous data, missing values, and class imbalance, thereby enhancing prediction accuracy and generalization capability.
DISCUSSION
Our study makes a contribution to risk management within PPP frameworks, with a particular focus on water environment treatment projects. We have delineated a comprehensive array of risk factors and demonstrated their profound impact on project success. Crucially, our analysis from a governmental perspective underlines the necessity of government oversight in ensuring the success and sustainability of PPP projects.
Central to our findings is the ensemble learning model based on the Stacking method, marking a considerable leap in risk prediction accuracy. This model's capacity to adeptly manage multi-source heterogeneous risk data facilitates a more refined risk classification and empowers stakeholders with informed decision-making capabilities. The empirical validation using data from the Jiujiang City project substantiates the WETPR-SVM model's superiority in risk prediction.
Our analysis revealed primary risk indicators such as water environment risk, operational risk, and local economic development risk. These indicators are instrumental in devising effective risk mitigation strategies. We found that prioritizing these risks enhances operational strategies, governance quality, and proactive local economic participation, ultimately fostering increased project success.
Furthermore, the implications of our research may extend beyond water treatment projects to PPP projects in other sectors and regions, since the methodologies and insights are applicable to broader risk management strategies. This research bridges the gap between academic investigation and practical applications, contributing to sustainable development in environmental governance projects.
However, our findings also unearth several areas for future exploration. There is a critical need for further research to extend these methodologies to diverse PPP contexts, evaluating the scalability and adaptability of our model under varying environmental and economic conditions. Additionally, investigating the complex interplay of identified risk factors in various PPP scenarios could yield deeper insights, enhancing the efficacy of risk management strategies.
CONCLUSION
Our study focused on risk management in water environment treatment PPP projects and presents a comprehensive, data-driven model for risk assessment and prediction. The implementation of our ensemble learning model, specifically tailored for such projects, showcases the advanced application of machine learning techniques, significantly improving risk prediction precision and reliability. This approach is critical for proactive risk management, ensuring the success and sustainability of these projects.
The core findings of our study are as follows:
- (1) The ensemble learning model demonstrates exceptional predictive accuracy, marking a significant advancement in integrating machine learning into PPP project risk management.
- (2) The identification of key risk indicators – water environment, operational risk, and local economic development – is crucial for developing effective risk mitigation strategies and aligning stakeholder objectives.
- (3) Our model's innovative weighted voting mechanism effectively overcomes the challenges of missing or abnormal data, enhancing robustness and real-world applicability.
- (4) Empirical validation using the Jiujiang City project data underscores our model's practical relevance and effectiveness, setting a new benchmark in PPP project risk management practices.
Our study's industry contributions are manifold. By integrating advanced machine learning with the specific risk management requirements of water environment treatment PPP projects, we provide a pioneering, data-driven approach to navigate complex risk landscapes. The model enhances risk prediction accuracy and offers actionable insights for robust risk mitigation, potentially boosting the success and sustainability of PPP projects in water treatment, thus contributing to sustainable urban development goals.
Acknowledging our study as foundational, we intend to work with policymakers and practitioners to translate our research findings into actionable insights and policy recommendations, contributing to the legal and policy framework of water environment treatment PPP projects. Additionally, the methodological framework developed in this study, while applied to Jiujiang, is structured to accommodate adaptability to other similar regions. The risk assessment and prediction model is designed to be customized based on the distinct risk profiles, data availability, and governance structures of other regions, thereby facilitating its broader application. The versatility of the model is a stepping stone toward our future endeavors to further validate and augment the model's applicability across diverse regional settings, thereby contributing to the robustness and generalizability of the risk assessment framework for water environment treatment PPP projects.
FUNDING
This work was supported by the National Social Science Funds of China [grant number 17BGL156].
ETHICAL APPROVAL
This paper does not contain any studies with human participants or animals performed by any of the authors.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.