Abstract
Machine learning (ML) modelling can improve the infrastructure, operation, monitoring, maintenance, and management of wastewater treatment systems by enabling smart forecasting, tracking, and failure prediction. This study uses industry data to model the wastewater treatment process. Gradient-Boosted Decision Tree (GBDT) algorithms were used to predict wastewater plant parameters, and the Slime Mould Algorithm (SMA) was used for feature extraction together with other suitable tuning procedures. The GBDT approaches employed in this study are applied to predicting the influent and effluent Chemical Oxygen Demand (COD) of effluent treatment systems. GBDT-SMA employs artificial intelligence to model complex systems precisely. Several training and model testing techniques were used to determine the best topology for the neural network models and decision trees. The GBDT-SMA model performed best among all methods. With 500 data records, GBDT-SMA achieved an accuracy of 96.32%, outperforming the Artificial Neural Network (ANN), Convolutional Neural Network (CNN), Deep Convolutional Neural Network (DCNN), and K-neighbours RF models, which reached accuracies of 82.97, 87.45, 85.98, and 91.45%, respectively.
HIGHLIGHTS
The Slime Mould Algorithm was employed for feature extraction and other suitable tuning techniques.
The GBDT approaches employed in this study are applied to predicting the influent and effluent COD of effluent treatment systems.
To enable precise method modelling for complicated systems, GBDT-SMA leverages artificial intelligence.
INTRODUCTION
The rapid growth of technology is causing substantial changes in the way decisions are made in the water sector. Water supply and wastewater distribution system operations, monitoring, maintenance, and management issues are getting more numerous and complex in varied urban, industrial, and agricultural settings. These adjustments are a direct result of modern technology's growing influence in this industry.
Industrial sewage is one of the main causes of urban water pollution (Kirchem et al. 2020). Fortunately, continual advances in research and technology have led to the rapid development of sewage treatment components and wastewater treatment (WWT) procedures. Based on their characteristics, these treatments can be divided into three categories: physical, chemical, and microbiological techniques (Zhu et al. 2020). In the chemical method, chemical agents are added to the sewage to convert hazardous components into safe substances that meet disposal requirements. This method has been shown to reduce the water contamination caused by industrial sewage. Chemicals such as polyaluminium chloride (PAC), polyacrylamide (PAM), FeCl2, NaOH, and others are frequently used for this purpose. Once introduced, these agents react with the hazardous components in the sewage to form safe compounds.
The sewage is subjected to the coagulation process, which involves adding chemicals such as PAC and PAM (Hernández del Olmo et al. 2019). The proper dosage of coagulants, which is primarily governed by the water quality indices and the influent quantity, is crucial for achieving the required effluent water quality (Thürlimann et al. 2018). Determining the proper dosage can be difficult because coagulation chemistry is complex and is influenced by a wide range of variables, including turbidity (TUR), pH, electrical conductivity (CON), flow rate, total phosphorus, total nitrogen, ammonia nitrogen, chemical oxygen demand, and others. As a result, the majority of WWT facilities today use manual or automatic control techniques. However, manual control depends on the knowledge of skilled experts and is subject to human variability. Traditional control methods, such as the standard proportional integral derivative (PID) controller, adapt poorly to complex systems and fall short of the requirements for high performance (Rubio et al. 2016). The PID controller suffers from considerable time delays and poor parameter synchronization. Using excessive dosages results in waste and higher expenditure, while using insufficient dosages leaves the target effluent quality unachievable. Intelligent dosing algorithms are increasingly used to address these issues and regulate chemical addition effectively. These algorithms provide customizable control, lower agent usage costs, and higher-quality effluent water.
Clustering, a type of unsupervised algorithm intended to discover groups within a given dataset, offers a viable option for fault identification and prediction, particularly in pumping system applications. Olesen & Shaker (2020) provide a thorough analysis of the sophisticated preventative maintenance techniques used in pumping systems. The K-means clustering algorithm is used in a supervised setting by Uhlmann et al. (2018) to assess its efficacy in fault detection. Fuzzy clustering is used by Daher et al. (2020) to evaluate the severity of failures, while Cao et al. (2019) determined the Remaining Useful Life (RUL) of a distillation column using a combination of fuzzy C-means and ANFIS. Notably, clustering techniques have not yet been used in the context of wastewater treatment facilities.
Although these machine learning (ML) techniques have significantly increased the efficacy of wastewater systems, they have primarily been applied to certain operational features (weather forecasting, figuring out the makeup of particular particles, or calculating the number of sensors needed). Although the available measurements appear to be unrelated, there has not been enough use of ML to identify any possible relationships between them. Instead of being used especially for wastewater treatment, clustering algorithms have been used in a variety of diagnostic procedures with diverse purposes.
Gradient-Boosted Decision Trees (GBDTs) can classify and predict a wide range of problems. This approach improves learning by requiring fewer iterations to reach a workable solution and a less complex objective. Ahmadianfar et al. (2020) developed the Gradient Based Optimizer (GBO), which was inspired by Newton's method. It contains two operators: the Gradient Search Rule and the local escaping operator. GBO combines population-based search with gradient-based methods, and it is a powerful and efficient algorithm that corrects flaws of earlier optimizers. It employs a gradient-based technique to bypass infeasible regions in favour of feasible ones. GBO was tested on 28 mathematical test functions and six engineering problems. Unimodal, multimodal, and composition functions were used to investigate its exploitation, exploration, and local optima avoidance. According to the statistics, GBO was more successful than other optimizers in finding solutions. With GBO, entrapment in local optima and premature convergence are uncommon (Deb et al. 2021).
To overcome the drawbacks of the classic Slime Mould Algorithm (SMA) and improve the coordination between local exploitation and global exploration, an advanced SMA variant based on Cauchy mutation (CM) and crossover mutation was presented (Asha & Roberts 2022). After the initial search agents are created, the solution is updated in stages. First, the search agents are updated using SMA. The second stage employs CM to alter the SMA-based search agents. Finally, crossover mutation is used together with the best search agent from the previous generation.
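The two mutation operators mentioned above can be illustrated with a minimal sketch. It assumes a common textbook form of each operator: a Cauchy-distributed perturbation drawn by inverse transform sampling, and a uniform crossover with the best agent. The function names, the `scale` and `rate` parameters, and the clipping to bounds are illustrative assumptions, not details taken from Asha & Roberts (2022).

```python
import math
import random


def cauchy_mutation(position, lower, upper, scale=1.0, rng=random):
    """Perturb each coordinate with a Cauchy-distributed step (assumed form)."""
    mutated = []
    for x in position:
        u = rng.random()
        # Inverse-CDF sample from a standard Cauchy distribution.
        step = scale * math.tan(math.pi * (u - 0.5))
        # Heavy tails can produce very large steps, so clip to the bounds.
        mutated.append(min(upper, max(lower, x + step)))
    return mutated


def uniform_crossover(agent, best, rate=0.5, rng=random):
    """Copy each coordinate from the best agent with probability `rate`."""
    return [b if rng.random() < rate else a for a, b in zip(agent, best)]


if __name__ == "__main__":
    rng = random.Random(42)
    agent = [0.5, -1.2, 3.0]
    best = [0.1, 0.0, 0.2]
    print(cauchy_mutation(agent, -5.0, 5.0, rng=rng))
    print(uniform_crossover(agent, best, rng=rng))
```

The heavy tails of the Cauchy distribution occasionally produce long jumps, which is the usual motivation for using it to help a population escape local optima.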
The rest of the article is structured as follows. Section 2 reviews previous studies on wastewater treatment and water quality prediction and the solutions they proposed. Section 3 presents the proposed method. Section 4 reports the performance evaluation and discusses the implications of the findings. In the final section, we conclude and discuss our future research plans.
Research contributions
This study uses industry data to model the WWT process. GBDT methods were used to predict the wastewater plant's parameters.
We also extracted features using the SMA and applied other suitable tuning techniques. The resulting tool helps users take corrective action and complete the treatment process in line with the standards. It is a valuable technology because it can cope with the complex mechanisms and dynamic changes of WWT plants and overcome some shortcomings of traditional mathematical models for them.
The intention is to make the WWT process more predictable. The GBDT procedures employed in this work are applied to predicting the influent and effluent COD of effluent treatment systems.
GBDT-SMA uses artificial intelligence to model complicated systems with accuracy.
LITERATURE SURVEY
Khudair et al. (2018) evaluated the suitability of Baghdad City groundwater for human consumption using the Water Quality Index (WQI). The index was created by combining four water parameters: pH, chloride concentration, sulphate concentration, and dissolved solids content. Using IBM's SPSS Statistics 19, the authors were able to predict variations in the groundwater's WQI accurately. The Artificial Neural Network Modelling (ANNM) results showed high prediction efficiency, with sum of squares errors of 0.038 and 0.005 for the training and testing samples, respectively, and a coefficient of determination of 0.973.
According to Farhi et al. (2021), fuzzy multilevel control was beneficial in managing the aeration volume so that the reaction is as strong as possible. The fundamental advantage of adopting neural networks in sewage treatment is that they can closely approximate non-linear models. By exploiting the correlations between the water quality indicators, a neural network could calculate the optimal range for each indicator, which in turn allows the network to identify the system model.
According to Weng et al. (2021), rising sewage discharge has harmed the ecological environment and resulted in a tremendous waste of resources, interrupting people's everyday lives and making it more challenging to accomplish their work. Furthermore, in many parts of China, the recycling of sewage treatment must be finished before it can be considered adequate, with the output from sewage treatment systems having to achieve the reuse level. This is primarily due to the country's lack of freshwater resources.
Li et al. (2021), among many other authors, argue that more advanced methods, technologies, and preventative measures must be used to treat sewage so that the water can be cleaned and reused, which is needed to sustain China's long-term economic growth. Governments worldwide have prioritized better sewage treatment, stepping up their efforts and regulations.
According to Dou et al. (2019), the overall load on the sewage treatment process contributes to instability. The incoming water and sewage flow significantly affect the water quality as it enters the sewage treatment plant, and as a result, sludge builds up. Managing large sewage treatment systems with traditional methods is impractical, so more modern approaches are required.
Rahman et al. (2018) examined the impact of wastewater containing synthetic dye effluents on Red Amaranthus seedlings. The effluent from a Belkuch loom dyeing factory was sprayed on the seedlings in various amounts. The highest percentage of seeds germinated (98.3%) when treated with a 5% dilution. As effluent concentrations increased, germination decreased, and the documented toxicity was highest at the highest concentration.
According to Islam et al. (2015), who studied the effect of industrial effluents on leafy plants, Indian spinach had a 76% germination percentage when dyeing effluents were used and 79% when pharmaceutical effluents were used. Dyeing effluent treatment resulted in 28, 80, and 84% germination of kangkong, jute, and stem amaranthus, respectively, whereas pharmaceutical effluent treatment resulted in 85, 79, and 84%. The germination of crop seedlings was thus affected differently by the various industrial effluents. Compared to other industrial effluents, beverage effluents had only minor detrimental effects on the germination and seedling development of green vegetables, and this was evident in every leafy vegetable studied.
Ahmed et al. (2019) compared heavy metal bioaccumulation in vegetables during the wet and dry seasons on irrigated farmland near an industrial complex in Ghazipur, Bangladesh. During the wet season, less chromium, copper, zinc, cadmium, and arsenic bioaccumulated in red amaranthus, spinach, pumpkin, and bottle gourd. According to the researchers, the climate affects the amount of heavy metals that plants absorb.
Qurie et al. (2015) proposed removing chlorpyrifos from the environment using a micelle-clay complex and advanced treatment technology. In water purification and desalination, the two primary metabolites of chlorpyrifos are also eliminated. Batch adsorption was used to investigate diethyl triphosphoric acid and 3,5,6-trichloro-2-pyridinyl. Na-montmorillonite clay was combined with octadecyl trimethyl ammonium bromide to create the micelle-clay complex. After that, activated carbon was added. Pesticide levels were determined using a Waters 2695 HPLC. Langmuir and Freundlich isotherms were used to study adsorption. A contact time of 30–5 min is ideal for removing chlorpyrifos. After 180 min, 90% of 0.5 g of a micelle-clay complex containing 100 mg/L was removed. A sand and micelle clay filter column is more effective than chlorpyrifos at removing TPC. Using these two adsorbents to capture chlorpyrifos could help clean up the environment. The WWTP of Al-Quds University was also used for this investigation. Two processes were used: ultrafiltration with a hollow fibre unit (UF-HF) and a spiral wound unit (UF-SW), followed by an activated carbon column.
Ahamad et al. (2019) investigated surface water quality using regression analysis and neural network modelling. When the models were tested on data from two lakes at Tezpur University in Assam, India, they provided a reliable indication of water pollution. The two lakes showed a strong correlation, with R2 values between 0.69 and 0.82. ANNM of total solids and Biochemical Oxygen Demand (BOD) revealed a strong correlation between actual and predicted values for both variables. Table 1 summarizes the existing work.
Summary of wastewater treatment systems
Author | Title | Method | Advantage | Disadvantage |
---|---|---|---|---|
Weng et al. (2021) | Application of Artificial Neural Network Methods to Predict the Parakai Lake Water Quality Index | Artificial Neural Network (ANN) | Handling of Large and Multivariate Datasets, Feature Extraction and Selection | Data Requirements, Complexity |
Ahamad et al. (2019) | ANNs and ANN Regression Analysis for Surface Water Quality Prediction | Artificial Neural Network | Using time delay cells to deal with the dynamic nature of sample data | Biomass Pre-treatment, Regeneration and Reusability |
Sandu et al. (2017) | For best efficiency, the water flow arrangement at the wastewater treatment facility should include grit removal and filters. | Hybrid wastewater treatment system | Reducing the computation problems because the weights of the input and hidden layer need not be adjusted | Energy Consumption, Maintenance and Operation |
Shahnaz et al. (2020) | Acacia auriculiformis biomass was analyzed raw, acid-modified, and EDTA-complexed to eliminate hexavalent chromium. | Adsorption is increased by a variety of nanocellulose-related characteristics. | Solving the problem of low prediction accuracy | Disposal and Environmental Impact, Selectivity and Interference |
Zhu et al. (2021) | ML aids in selection of carbon-based materials for tetracycline and sulphamethoxazole adsorption. | Carbon-based materials adsorption | When subjected to varied environmental circumstances such as temperature, solution pH, and different adsorbent kinds, the created ML models outperformed classic isotherm models in terms of generalization | Data Requirements, Complexity |
Kaetzl et al. (2019) | On-farm wastewater treatment using biochar from local agro residues reduces pathogens from irrigation water for safer food production in developing countries | Biochar using wastewater treatment | The biochar filters outperformed, or were at least comparable to, the sand and rice husk filters | Energy Consumption, Maintenance and Operation |
Kaushal & Mahajan (2021) created a hydroponic bench-scale system employing ornamental evergreen plants to lower turbidity and COD. By removing pollutants, Lata and Siddharth (Lata 2021) offered a biological method for controlling water quality. They suggested collecting plant species that could be used to make biofuel briquettes instead of wood and other fuel. Nonetheless, the biggest barrier to the widespread adoption of these cutting-edge eco-friendly technologies is the considerable investment and maintenance expenses associated with them, considering that many processes have only been evaluated in the laboratory.
In addition to conventional waste, emerging pollutants such as industrial chemicals, herbicides, medicines, and personal care products are becoming increasingly prevalent in sewage (Ahmed et al. 2021). Koul et al. (2022) proposed that adsorption processes can effectively remove these emerging pollutants. Antibiotics must be eliminated because they risk eradicating the microbial species that exist in natural water bodies. Baladi et al. (2022) presented the photochemical destruction of antibiotics, such as penicillin G (PENG), as a green and effective advanced oxidation technology for cleaning up wastewater containing non-degradable antibiotics. Nanoparticles can also be employed to remove toxic substances from wastewater systems by trapping them. The effectiveness of Magnetic-MXene as a nanoparticle-based WWT system has been established. However, additional research is still needed to boost the efficacy of all comparable systems when scaled up (Hojjati-Najafabadi et al. 2022).
The problems with existing systems are as follows. Many WWT systems rely on ageing infrastructure that requires maintenance or replacement, which can be costly and time-consuming. Due to capacity restrictions, some WWT systems struggle to handle massive amounts of wastewater, resulting in overflows and spills. WWT systems can be energy-intensive, resulting in high operational expenses and greenhouse gas emissions. Owing to a lack of public awareness, many people do not appreciate the importance of WWT and may contribute to the problem by improperly disposing of domestic waste.
PROPOSED SYSTEM
Wastewater treatment
Wastewater treatment cleans contaminated water so that it can be reused for agriculture and human consumption. This treatment incorporates mechanical, physical, chemical, and biological procedures, engineering design, commercial expertise, and creativity. Water consumption in municipal, industrial, and residential settings generates significant effluent. pH, suspended particles, dissolved solids, turbidity, colour, and other factors contribute to wastewater's properties. Because each water source has a unique potential for pollution, it must be treated before it can be reused or returned to its original state. WWT technologies can be physical, chemical, or biological. Flotation, sedimentation, aeration, and filtration are all forms of physical treatment. Chemical treatment uses chlorine, ozone, and neutralization, and alum or iron (III) sulphate can be used as coagulants. Because of its adsorption capabilities, carbon is helpful in both chemical and physical processes. Biological treatment uses bacteria, other microbes, and biochemical processes to purify wastewater. The aerobic decomposition of water pollutants produces carbon dioxide. Anaerobic breakdown produces biogas, a mixture of methane and carbon dioxide that can be used as fuel. Manure, a by-product of anaerobic treatment, has agricultural applications.
The importance of WWT
Many water supplies are dangerous for human and animal consumption due to urbanization, the use of fossil fuels, and pollution from municipal, industrial, and domestic sources. Developing countries often lack comprehensive policies to prevent or penalize environmental deterioration, and factories regularly discharge untreated water into rivers and lakes. To protect marine life, these water sources must be cleaned or treated before use.
Reports on research of WWT methods
Several techniques can successfully eliminate heavy metals from wastewater, such as co-precipitation/adsorption, membrane filtration, precipitation, ion exchange, and adsorption. Adsorption has been studied extensively, although activated carbon is expensive. To clean wastewater from beverage industries, researchers compared the efficiency of commercial Calgon carbon (F-300) with that of activated carbon from coconut shells. Their findings show that coconut shell activated carbon is more effective at adsorbing organic molecules than commercial activated carbon. Previous research has employed coagulation/flocculation, membrane filtration, oxidation, and other techniques to remove COD from industrial wastewater, but these are expensive, time-consuming, and difficult. On average, 1.2 m3 of biogas is generated per m3 of wastewater.
Chemical Oxygen Demand
COD is primarily used in WWT systems to monitor overall organic pollution and evaluate the efficacy of the treatment procedure. It indicates the amount of organic matter in the wastewater that can be oxidized chemically. Operators can optimize treatment procedures and guarantee environmental compliance by monitoring COD levels. COD sensors continually monitor the UV absorbance of organic components in the water, as shown in Figure 1. The sensor is located at the WWT facility's outlet. An aperture in the process probe housing allows for using a cuvette. This is the location of the flash photometer. Through this aperture, the UV extinction of the medium surrounding the probe is measured. A wiper cleans the measurement window. A reference measurement is used to account for the effect of turbidity on UV absorption. Because dissolved organic molecules absorb UV light, UV absorption measurements can be used to calculate the total amount of dissolved organic compounds in water. We quantify the organic component using the spectral absorption coefficient (SAC) at 254 nm. Automatic baseline compensation and turbidity correction are performed using a 550 nm reference wavelength. This sensor is a turbidity-compensated multiple-beam absorption photometer. To account for turbidity, the SAC at 550 nm is subtracted from the SAC at 254 nm. NTPA001/2005 recognizes COD values.
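The turbidity compensation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the sensor's actual firmware: the SAC is taken as absorbance divided by optical path length, and the linear SAC-to-COD calibration (`slope`, `intercept`) is hypothetical and would have to be fitted against laboratory COD measurements for a given plant.

```python
import math


def turbidity_compensated_sac(abs_254, abs_550, path_length_m):
    """Return the turbidity-compensated SAC in 1/m.

    abs_254: absorbance at the 254 nm measurement wavelength
    abs_550: absorbance at the 550 nm reference wavelength (turbidity)
    path_length_m: optical path length in metres (assumed known)
    """
    if path_length_m <= 0:
        raise ValueError("path length must be positive")
    sac_254 = abs_254 / path_length_m
    sac_550 = abs_550 / path_length_m
    # Subtract the reference signal so that scattering by particles
    # is not counted as absorption by dissolved organics.
    return sac_254 - sac_550


def estimate_cod(sac, slope, intercept=0.0):
    """Map SAC to COD (mg/L) with a hypothetical site-specific linear fit."""
    return slope * sac + intercept


if __name__ == "__main__":
    sac = turbidity_compensated_sac(0.30, 0.05, 0.005)
    print(round(sac, 3), round(estimate_cod(sac, slope=2.0, intercept=5.0), 3))
```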
Gradient-Boosted Decision Tree
In contrast to many other approaches, the GBDT methodology solves problems by optimizing in function space. This makes the technique more adaptive, scalable, and robust to non-linear complexity than linear models, and its hierarchical tree structure lets it model non-linear decision boundaries. At each stage, a GBDT implementation builds a new base learner that is maximally correlated with the negative gradient of the loss function, and following this negative gradient drives the model towards convergence. Gradient boosting combines three elements: a loss function to be minimized, a weak learner that makes predictions, and an additive model that adds weak learners to reduce the loss. The loss function depends on the problem: MSE is widely used in regression and logarithmic loss in classification. Each iteration of the boosting approach improves the model by focusing on the residual loss. Decision trees are introduced one at a time as weak learners to reduce the loss function, while the trees already in the model remain unchanged. Gradient descent is used to minimize the loss as trees are added. A SoftMax loss function was also usable with the dataset, and gradient descent guaranteed model convergence. The learning rate defines the update step size to avoid overfitting, and the number of trees, M, is given by the maximum number of training iterations. The partition at each tree node is chosen to minimize the loss. The following describes how the GBDT model works.
T represents the number of leaves on a tree. The loss function L() evaluates how well the model fits the training data, while a separate term describes the model's complexity and penalizes overly complex models.
To find a solution for a specific sample Z, GBDT employs D additive functions, as shown in Eq. (5).
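For orientation, the additive prediction and a regularized objective of the kind described above are commonly written as follows. This is the standard form used by regularized gradient-boosting implementations, given here as an assumption about the intended formulation; the symbols \gamma, \lambda, and w (the penalty per leaf, the leaf-weight penalty, and the vector of leaf weights) do not appear in the text and are introduced only for illustration.

```latex
\hat{y} = \sum_{d=1}^{D} f_d(Z)

\mathrm{Obj} = \sum_{i=1}^{n} L\bigl(y_i, \hat{y}_i\bigr) + \sum_{d=1}^{D} \Omega(f_d),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda\,\lVert w \rVert^{2}
```

Here each f_d is one regression tree, T is the number of leaves of that tree, and the second sum penalizes model complexity.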
In the GBDT model, each tree predicts the pseudo-residuals of the trees before it, provided that the loss function is differentiable. The pseudo-residuals are the negative gradient of the user-specified loss function. As predictions are acquired, the loss is reduced by training each successive model. A gradient-boosting model with too many trees can over-fit, and one with too few trees can under-fit. Both problems are avoidable.
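The residual-fitting loop described above can be made concrete with a small, dependency-free sketch. It assumes squared-error loss (whose negative gradient is simply the residual), a single input feature, and depth-1 trees (stumps) as weak learners. It is meant to illustrate the mechanism, not to reproduce the GBDT-SMA model of this study, and all function names are illustrative.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all inputs identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_gbdt(xs, ys, n_trees=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss; returns a predict function."""
    base = sum(ys) / len(ys)           # initial constant model
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        # Pseudo-residuals: negative gradient of 0.5 * (y - pred)^2.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)             # earlier trees are never modified
        preds = [p + learning_rate * tree(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)


def mse(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


if __name__ == "__main__":
    xs = list(range(10))
    ys = [x * x for x in xs]
    print(mse(fit_gbdt(xs, ys, n_trees=0), xs, ys))
    print(mse(fit_gbdt(xs, ys, n_trees=50), xs, ys))
```

The learning rate shrinks each tree's contribution, which is why many small steps are needed and why the number of trees has to be tuned together with it.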
The SMA
The SMA (Kaushal & Mahajan 2021) is a novel population-based metaheuristic inspired by the rhythmic behaviours observed in natural slime moulds. The SMA distinguishes itself by having a positive–negative feedback loop that facilitates optimal feeding pathways. It, like slime moulds, modifies its search path dynamically based on the quality of food discovered.
The SMA simulates three essential phenomena: grabble, wrap, and approach. While aggressively seeking food sources, the grabble phenomenon ensures that the search agents avoid clashing with one another, which resembles the behaviour of slime moulds as they navigate their environment. The wrap phenomenon mimics the velocity matching observed in slime moulds, allowing the SMA to match its pace with the slime moulds' movements. This synchronization makes it easier to explore and exploit the search space. Finally, the approach phenomenon captures slime moulds' learning process as they migrate towards the feeding centre. The SMA uses comparable adaptive learning techniques, which improve its capacity to converge on optimal solutions. Overall, the SMA derives its success from the distinct properties and behaviours of slime moulds, resulting in a potent metaheuristic capable of solving difficult optimization problems effectively.
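As an illustration of how such a population-based search proceeds, the following is a minimal sketch based on the commonly published SMA update rules (fitness-ranked weights, an oscillation range that shrinks over time, and a small random restart probability). It is an assumption about the general form of the algorithm, not the exact procedure used in this study; the function name, the parameter defaults, and the restart probability `z` are illustrative.

```python
import math
import random


def slime_mould_minimize(objective, dim, lower, upper,
                         pop_size=20, max_iter=100, z=0.03, seed=0):
    """Minimize `objective` over [lower, upper]^dim with a basic SMA sketch."""
    rng = random.Random(seed)
    eps = 1e-12
    pop = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(pop_size)]
    best_pos, best_fit = None, float("inf")

    for t in range(1, max_iter + 1):
        fits = [objective(x) for x in pop]
        order = sorted(range(pop_size), key=lambda i: fits[i])
        bf, wf = fits[order[0]], fits[order[-1]]
        if bf < best_fit:
            best_fit, best_pos = bf, pop[order[0]][:]

        # Fitness-ranked weights: the better half is pulled more strongly.
        weights = [None] * pop_size
        for rank, i in enumerate(order):
            ratio = (fits[i] - bf) / (wf - bf + eps)
            sign = 1.0 if rank < pop_size // 2 else -1.0
            weights[i] = [1.0 + sign * rng.random() * math.log10(ratio + 1.0)
                          for _ in range(dim)]

        a = math.atanh(1.0 - t / max_iter)  # oscillation range, shrinks to 0
        b = 1.0 - t / max_iter              # contraction range, shrinks to 0
        new_pop = []
        for i in range(pop_size):
            if rng.random() < z:            # occasional random restart
                x = [rng.uniform(lower, upper) for _ in range(dim)]
            else:
                p = math.tanh(abs(fits[i] - best_fit))
                x = []
                for j in range(dim):
                    if rng.random() < p:    # move relative to the best position
                        xa, xb = rng.choice(pop), rng.choice(pop)
                        x.append(best_pos[j] + rng.uniform(-a, a)
                                 * (weights[i][j] * xa[j] - xb[j]))
                    else:                   # contract the current position
                        x.append(rng.uniform(-b, b) * pop[i][j])
            new_pop.append([min(upper, max(lower, v)) for v in x])
        pop = new_pop

    for x in pop:                           # score the final population
        f = objective(x)
        if f < best_fit:
            best_fit, best_pos = f, x[:]
    return best_pos, best_fit


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    print(slime_mould_minimize(sphere, dim=3, lower=-10.0, upper=10.0))
```

In a feature-selection or tuning setting, `objective` would be replaced by a validation error computed for the candidate feature subset or hyperparameters encoded in each position vector.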






In Algorithm I, the SMA's process is described as follows.
RESULTS AND DISCUSSION
The performance of the WWT model is covered in this section. The performance measurements are accuracy, precision, recall, F-score, training and testing validation, RMSE, and receiver operating characteristic (ROC) analysis. The proposed approach was evaluated against Deep Convolutional Neural Networks (DCNNs) (Zhang & Gu 2022), K-neighbours Random Forest (Zhang et al. 2018), Convolutional Neural Networks (CNNs) (Su et al. 2022), and Artificial Neural Networks (ANNs) (Abyaneh 2018).
Performance metrics
The most effective way to evaluate the performance of the WWT model is to use a confusion matrix. It is typical to derive classification performance indicators from the confusion matrix. The seven indicators considered were accuracy, precision, recall, F-score, training and testing validation, RMSE, and ROC analysis. The value of these metrics is determined by the outcome of a binary classification, which might be positive or negative. False positives (FP) and false negatives (FN) indicate that samples were misclassified, whereas true positives (TP) and true negatives (TN) indicate that samples were correctly classified. The efficiency metrics are explained in the paragraphs that follow.
Although the RMSE can take any value between 0 and infinity, lower RMSE values are desirable. For the ROC analysis, the quantity of interest is the area that lies under the ROC curve.
The model's effectiveness is assessed using the Area Under the Receiver Operating Characteristic Curve (AUROC). ROC curves serve as a practical illustration of the trade-off between the true positive rate and the false positive rate.
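The metrics above follow directly from the confusion-matrix counts. The sketch below uses the standard definitions; the AUROC is computed with the rank-based (Mann-Whitney) formulation, which equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. The function names and the example counts are illustrative and are not taken from the experiments.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F-score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


def rmse(actual, predicted):
    """Root mean squared error; 0 means a perfect fit."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(classification_metrics(tp=90, fp=10, fn=20, tn=80))
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Because the F-score is the harmonic mean of precision and recall, it always lies between those two values.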
Precision analysis
Precision analysis of the GBDT-SMA approach using existing systems
Number of data from dataset | ANN | CNN | DCNN | K-neighbours RF | GBDT-SMA |
---|---|---|---|---|---|
100 | 72.89 | 80.32 | 75.55 | 85.17 | 90.21 |
200 | 72.91 | 83.23 | 78.14 | 86.21 | 93.34 |
300 | 71.45 | 82.56 | 77.12 | 85.32 | 91.22 |
400 | 73.21 | 83.44 | 78.21 | 88.19 | 93.65 |
500 | 74.11 | 82.19 | 76.21 | 89.32 | 94.11 |
Recall analysis
Recall analysis for the GBDT-SMA technique with existing systems
Number of data from dataset | ANN | CNN | DCNN | K-neighbours Random Forest | GBDT-SMA |
---|---|---|---|---|---|
100 | 67.34 | 77.43 | 72.98 | 82.45 | 87.14 |
200 | 66.13 | 76.14 | 71.18 | 84.19 | 86.34 |
300 | 68.32 | 79.56 | 74.55 | 84.23 | 89.16 |
400 | 71.43 | 80.23 | 75.66 | 83.17 | 90.45 |
500 | 72.34 | 81.45 | 74.19 | 86.44 | 91.67 |
F-score analysis
Figure 4 and Table 4 compare the F-scores of GBDT-SMA with those of the existing methods. The comparison shows that GBDT-SMA achieves the highest F-score. For instance, with 100 data samples, the GBDT-SMA model achieved an F-score of 94.78%, while the ANN, CNN, DCNN, and K-neighbours RF models achieved F-scores of 78.67, 86.14, 82.56, and 89.45%, respectively. The GBDT-SMA model also achieved the highest F-score at every data size considered. With 500 data samples, the F-score of GBDT-SMA was 98.45%, while the ANN, CNN, DCNN, and K-neighbours RF models achieved F-scores of 82.33, 89.13, 85.63, and 93.67%, respectively.
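The size of the gap between GBDT-SMA and the strongest competing model can be read off the reported values with simple arithmetic. The sketch below transcribes the F-scores from Table 4; the dictionary layout and the helper name `margin_over_best_baseline` are assumptions made for illustration only.

```python
# F-scores (%) transcribed from Table 4, keyed by number of data samples.
f_scores = {
    100: {"ANN": 78.67, "CNN": 86.14, "DCNN": 82.56,
          "K-neighbours RF": 89.45, "GBDT-SMA": 94.78},
    200: {"ANN": 80.56, "CNN": 88.45, "DCNN": 81.78,
          "K-neighbours RF": 90.15, "GBDT-SMA": 96.56},
    300: {"ANN": 82.45, "CNN": 87.15, "DCNN": 83.45,
          "K-neighbours RF": 92.24, "GBDT-SMA": 95.67},
    400: {"ANN": 81.67, "CNN": 87.34, "DCNN": 84.17,
          "K-neighbours RF": 91.56, "GBDT-SMA": 96.14},
    500: {"ANN": 82.33, "CNN": 89.13, "DCNN": 85.63,
          "K-neighbours RF": 93.67, "GBDT-SMA": 98.45},
}


def margin_over_best_baseline(row, proposed="GBDT-SMA"):
    """Return the best competing model and the proposed model's lead over it."""
    baselines = {name: value for name, value in row.items() if name != proposed}
    best_name = max(baselines, key=baselines.get)
    return best_name, round(row[proposed] - baselines[best_name], 2)


margins = {n: margin_over_best_baseline(row) for n, row in f_scores.items()}
for n, (best, gap) in margins.items():
    print(f"{n} samples: best baseline {best}, GBDT-SMA lead {gap} points")
```

On these values, K-neighbours RF is the closest competitor at every data size, and the lead of GBDT-SMA ranges from 3.43 to 6.41 percentage points.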
Analysis of the F-score for the GBDT-SMA technique with existing systems
Number of data from dataset | ANN | CNN | DCNN | K-neighbours Random Forest | GBDT-SMA |
---|---|---|---|---|---|
100 | 78.67 | 86.14 | 82.56 | 89.45 | 94.78 |
200 | 80.56 | 88.45 | 81.78 | 90.15 | 96.56 |
300 | 82.45 | 87.15 | 83.45 | 92.24 | 95.67 |
400 | 81.67 | 87.34 | 84.17 | 91.56 | 96.14 |
500 | 82.33 | 89.13 | 85.63 | 93.67 | 98.45 |
RMSE analysis
RMSE analysis for the GBDT-SMA method with existing systems
Number of data from dataset | ANN | CNN | DCNN | K-neighbours Random Forest | GBDT-SMA |
---|---|---|---|---|---|
100 | 30.56 | 25.78 | 35.67 | 40.56 | 21.98 |
200 | 31.67 | 26.87 | 36.98 | 41.78 | 22.67 |
300 | 32.45 | 27.45 | 37.23 | 42.33 | 22.18 |
400 | 33.78 | 28.19 | 38.11 | 43.56 | 23.56 |
500 | 34.18 | 29.78 | 39.89 | 44.81 | 24.76 |
Accuracy analysis
Analysis of GBDT-SMA accuracy with existing systems
Number of data from dataset | ANN | CNN | DCNN | K-neighbours Random Forest | GBDT-SMA |
---|---|---|---|---|---|
100 | 80.34 | 87.56 | 83.56 | 89.16 | 92.56 |
200 | 81.56 | 88.15 | 85.19 | 90.56 | 94.76 |
300 | 81.45 | 87.34 | 84.16 | 89.54 | 96.55 |
400 | 82.78 | 86.15 | 84.66 | 92.56 | 97.16 |
500 | 82.97 | 87.45 | 85.98 | 91.45 | 96.32 |
Analysis of the accuracy of the GBDT-SMA technique with existing systems.
Execution time analysis
Analysis of the GBDT-SMA technique for execution time with existing systems
Number of data from dataset | ANN | CNN | DCNN | K-neighbours Random Forest | GBDT-SMA |
---|---|---|---|---|---|
100 | 156.890 | 187.764 | 132.674 | 106.768 | 89.678 |
200 | 164.304 | 185.650 | 143.841 | 112.876 | 90.638 |
300 | 168.453 | 187.541 | 138.653 | 110.540 | 93.237 |
400 | 170.612 | 188.714 | 140.803 | 134.197 | 90.045 |
500 | 173.094 | 189.486 | 142.543 | 136.249 | 91.478 |
Training and testing validation analyses
Epochs | Training validation | Testing validation |
---|---|---|
0 | 0.069 | 0.065 |
10 | 0.067 | 0.060 |
20 | 0.059 | 0.056 |
30 | 0.055 | 0.051 |
40 | 0.049 | 0.047 |
50 | 0.046 | 0.042 |
60 | 0.039 | 0.037 |
70 | 0.036 | 0.031 |
Analysis of the GBDT-SMA technique for execution time with existing systems.
ROC curve analysis
CONCLUSION
The current work aims to improve the effectiveness of the GBDT algorithm in modelling industrial wastewater treatment facilities by incorporating the SMA technique. The study's findings show that ML algorithms can successfully predict wastewater treatment parameters. The results also show that more data, particularly data connected to alert situations, are required to provide more exact and complete forecasts. Including such data would broaden the distribution of this class, generating useful insights and enhancing forecast accuracy. GBDT algorithms were applied gradually to predict wastewater plant parameters, and the SMA was applied for feature extraction and other tuning procedures. When this method was compared with existing models such as ANN, CNN, DCNN, and K-neighbours Random Forest, the proposed model came out on top with an accuracy of 96.32%. In the future, the infrastructure, operations, monitoring, maintenance, and management of this system will be optimized. GBDT-SMA can also be utilized to identify and maximize the recovery of valuable resources from wastewater, such as nutrients, metals, and energy. This supports the concept of a circular economy by minimizing waste generation and promoting the reuse and recycling of resources.
The goal of future efforts is to improve existing approaches by compiling a large database containing all of the data used in the model. Such a rich dataset is expected to improve forecast accuracy considerably. Furthermore, future work will emphasize advanced algorithms that allow the system to recognize patterns and support better decision-making in areas such as network resilience, energy efficiency, wastewater reduction, and cost reduction. The ongoing refinement of this technique is expected to yield considerable advances in the infrastructure, operations, monitoring, maintenance, and management of water distribution systems.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.