One way to improve the infrastructure, operation, monitoring, maintenance, and management of wastewater treatment systems is to use machine learning models to build smart forecasting, tracking, and failure prediction systems. This study uses industry data to model the wastewater treatment process. Gradient-Boosted Decision Tree (GBDT) algorithms were used to predict wastewater plant parameters step by step. In addition, the Slime Mould Algorithm (SMA) was used for feature extraction and for tuning. The GBDT approaches employed in this study are applied to the prediction of input and effluent Chemical Oxygen Demand (COD) in effluent treatment systems. GBDT-SMA employs artificial intelligence to provide precise modelling of complex systems. Several training and testing techniques were used to determine the best topology for the neural network models and decision trees. The GBDT-SMA model performed best among all methods compared. With 500 data samples, GBDT-SMA achieved an accuracy of 96.32%, outperforming the Artificial Neural Network (ANN), Convolutional Neural Network (CNN), Deep Convolutional Neural Network (DCNN), and K-neighbours Random Forest (RF) models, which reached accuracies of 82.97, 87.45, 85.98, and 91.45%, respectively.

  • The Slime Mould Algorithm was employed for feature extraction and other suitable tuning techniques.

  • The GBDT approaches employed in this study are applicable to predicting input and effluent COD for effluent treatment systems.

  • To enable precise method modelling for complicated systems, GBDT-SMA leverages artificial intelligence.

The rapid growth of technology is substantially changing the way decisions are made in the water sector. The operation, monitoring, maintenance, and management issues of water supply and wastewater systems are becoming more numerous and complex in varied urban, industrial, and agricultural settings. These changes are a direct result of modern technology's growing influence in this industry.

Industrial sewage is one of the main causes of urban water pollution (Kirchem et al. 2020). Fortunately, the continual advancement of research and technology has led to the rapid development of sewage treatment components and wastewater treatment (WWT) procedures. Based on their characteristics, these treatments can be divided into three categories: physical, chemical, and microbiological techniques (Zhu et al. 2020). In the chemical method, chemical agents are added to the sewage to convert hazardous components into safe substances that meet disposal requirements. This method has been shown to be successful in reducing water contamination caused by industrial sewage. For this purpose, chemicals such as polyaluminium chloride (PAC), polyacrylamide (PAM), FeCl2, NaOH, and others are frequently used. Once introduced, these agents react with the hazardous components in the sewage to form safe compounds.

The sewage is subjected to a coagulation process, which involves the addition of chemicals such as PAC and PAM (Hernández del Olmo et al. 2019). The proper dosage of coagulants, which is primarily determined by the water quality indices and the influent quantity, is crucial for achieving the required effluent water quality (Thürlimann et al. 2018). Determining the proper dosage can be difficult because coagulation chemistry is complex and is influenced by a wide range of variables, including turbidity (TUR), pH, electrical conductivity (CON), flow rate, total phosphorus, total nitrogen, ammonia nitrogen, chemical oxygen demand, and others. As a result, the majority of WWT facilities today use manual or automatic control techniques. However, manual control depends on the knowledge of skilled experts and is subject to human variability. Traditional control methods, such as the standard proportional integral derivative (PID) controller, adapt poorly to complex systems and fall short of the requirements for high performance (Rubio et al. 2016). The PID controller experiences considerable time delays and poor parameter synchronization. Insufficient dosages leave the target effluent quality unachievable, while excessive dosages result in waste and higher costs. Intelligent dosage algorithms are increasingly used to address these issues and to regulate the added agents effectively. These algorithms provide customizable control, lower agent costs, and higher-quality effluent water.

Clustering, a type of unsupervised algorithm intended to discover groups within a given dataset, offers a viable option for fault identification and prediction, particularly in pumping system applications. Olesen & Shaker (2020) provide a thorough analysis of the sophisticated preventative maintenance techniques used in pumping systems. Uhlmann et al. (2018) use the K-means clustering algorithm in a supervised setting to assess its efficacy in fault detection. Daher et al. (2020) use fuzzy clustering to evaluate the severity of failures, while Cao et al. (2019) determined the Remaining Useful Life (RUL) of a distillation column using a combination of fuzzy C-means and ANFIS. Notably, clustering techniques have not yet been used in the context of waste treatment facilities.

Although these machine learning (ML) techniques have significantly increased the efficacy of wastewater systems, they have primarily been applied to particular operational tasks (weather forecasting, determining the composition of particular particles, or calculating the number of sensors needed). The available measurements appear to be unrelated, yet ML has not been used enough to identify possible relationships between them. Clustering algorithms have been used in a variety of diagnostic procedures with diverse purposes, rather than specifically for wastewater treatment.

Gradient-Boosted Decision Trees (GBDTs) can be applied to a wide range of classification and prediction problems. This approach improves learning by requiring fewer iterations to reach a workable solution and a less complex objective. Ahmadianfar et al. (2020) developed the Gradient-Based Optimizer (GBO), which was inspired by Newton's method. It contains two main operators, the gradient search rule and the local escaping operator, and combines population-based search with gradient-based (GB) search. GBO is a powerful and efficient algorithm that addresses the flaws of earlier optimizers: it uses the GB technique to steer the search away from infeasible regions towards feasible ones, while also relying on population-based strategies. GBO was tested on 28 mathematical test functions and six engineering problems. Unimodal, multimodal, and composition functions were used to investigate its exploitation, exploration, and local optima avoidance. According to the results, GBO was more successful than other optimizers in finding solutions. With GBO, entrapment in local optima and premature convergence are uncommon (Deb et al. 2021).

To overcome the drawbacks of the classic Slime Mould Algorithm (SMA) and improve the balance between local exploitation and global exploration, an advanced SMA variant based on Cauchy mutation (CM) and crossover mutation was presented (Asha & Roberts 2022). After the initial search agents are created, the solution is updated in stages. First, the search agents are updated using the SMA. The second stage employs CM to alter the SMA-based search agents. Finally, an optimal search agent from the previous generation is identified using crossover mutation.

The rest of the article is structured as follows. Section 2 reviews previous studies and the solutions they proposed. Section 3 describes the proposed method. Section 4 presents the performance evaluation. The final section concludes and discusses our future research plans.

Research contributions

  • This strategy models the WWT process using industry data. GBDT methods were used to predict the wastewater plant's characteristics step by step.

  • We also extracted features using the SMA and other suitable tuning techniques. This tool helps users take corrective action and complete the procedure in accordance with the standards. It is a valuable technology because it can handle the complex mechanism of WWT plants and the dynamic changes in some of their components, and it gets around some shortcomings of traditional mathematical models.

  • The intention is to make the WWT model more predictable. The GBDT procedures employed in this work apply to the prediction of input and effluent COD for effluent treatment methods.

  • GBDT-SMA uses artificial intelligence to model complicated systems with accuracy.

Khudair et al. (2018) evaluated the suitability of Baghdad City groundwater for human consumption using the Water Quality Index (WQI). The index was created by combining four water parameters: pH, chloride concentration, sulphate concentration, and dissolved solids content. Using IBM SPSS Statistics 19, they were able to predict variations in the groundwater's WQI accurately. The Artificial Neural Network Modelling (ANNM) showed a high level of prediction efficiency, with sum of squares errors of 0.038 and 0.005 for the training and testing samples, respectively. Additionally, the coefficient of determination was 0.973.

According to Farhi et al. (2021), fuzzy multilevel control was beneficial in managing the aeration volume so that the reaction proceeds as strongly as possible. The fundamental advantage of adopting such systems in sewage treatment is that neural networks can closely approximate non-linear models. By exploiting the correlation between the water quality indicators, a neural network could calculate the optimal range for each indicator and thereby identify the system model.

According to Weng et al. (2021), rising sewage discharge has harmed the ecological environment and resulted in a tremendous waste of resources, interrupting people's everyday lives and making it more challenging to accomplish their work. Furthermore, in many parts of China, the recycling of sewage treatment must be finished before it can be considered adequate, with the output from sewage treatment systems having to achieve the reuse level. This is primarily due to the country's lack of freshwater resources.

Many authors, including Li et al. (2021), argue that more advanced methods, technologies, and preventative measures must be used to treat sewage, so that water can be cleansed and reused and China's long-term economic growth can be maintained. Governments worldwide have prioritized better sewage treatment, strengthening their efforts and regulations.

According to Dou et al. (2019), the overall impact load of the sewage treatment process contributes to instability. The incoming water and sewage flow significantly affect the water quality at the inlet of the sewage treatment plant, and as a result sludge builds up. Managing large sewage treatment systems with old-fashioned methods is impractical, so modern approaches are required.

Rahman et al. (2018) examined the impact of wastewater containing synthetic dye effluents on Red Amaranthus seedlings. The effluent from a Belkuch loom dyeing factory was sprayed on the seedlings in various amounts to improve their quality. The highest percentage of seeds germinated (98.3%) when treated with a 5% dilution. As effluent concentrations increased, germination decreased. The level of documented toxicity is highest at this dose.

Islam et al. (2015) studied the effect of industrial effluents on leafy plants. Indian spinach had a 76% germination percentage when dyeing effluents were used and a 79% germination percentage when pharmaceutical effluents were used. Dyeing effluent treatment resulted in 28, 80, and 84% germination of kangkong, jute, and stem amaranthus, respectively, whereas pharmaceutical effluent treatment resulted in 85, 79, and 84% germination. The ability of crop seedlings to germinate was thus influenced by the type of industrial sewage. Compared to other industrial effluents, beverage effluents had only minor detrimental effects on the germination and seedling development of green vegetables, and this was evident in every leafy vegetable studied.

Ahmed et al. (2019) compared heavy metal bioaccumulation in vegetables during the wet and dry seasons on irrigated farmland near an industrial complex in Ghazipur, Bangladesh. During the wet season, chromium, copper, zinc, cadmium, and arsenic bioaccumulated less in red amaranthus, spinach, pumpkin, and bottle gourd. According to the researchers, the climate affects the amount of heavy metals that plants absorb.

Qurie et al. (2015) proposed removing chlorpyrifos from the environment using a micelle-clay combination and cutting-edge treatment technology. Water purification and desalination, the two primary metabolites of chlorpyrifos are also eliminated. Batch adsorption was used to investigate diethyl triphosphoric acid and 3,5,6-trichloro-2-pyridinyl. Na-montmorillonite and Octadecyl trimethyl ammonium bromide had to be combined with clay to create a micelle. After that, activated carbon was added. Pesticide levels were determined using a Waters 2695 HPLC. Langmuir and Freundlich's isotherms were used to study adsorption kinetics. A contact time of 30–5 min is ideal for removing chlorpyrifos. After 180 min, 90% of 0.5 g of a micelle-clay complex containing 100 mg/L was removed. When it was built, the facility. A sand and micelle clay filter column is more effective than chlorpyrifos at removing TPC. Using these two adsorbents to capture chlorpyrifos could help clean up the environment. The WWTP of Al-Quds University was used for this investigation as well. Two processes were used: ultrafiltration with a hollow fibre unit (UF-HF) and a spiral wound unit (UF-SW), followed by an activated carbon column.

Ahamad et al. (2019) investigated surface water quality using regression analysis and neural network modelling. When they tested their models on data from two lakes at Tezpur University in Assam, India, they found that the models provided a reliable indication of water pollution. The two lakes showed a strong correlation, with R2 values between 0.69 and 0.82. ANNM of total solids and Biochemical Oxygen Demand (BOD) revealed a strong correlation between actual and predicted values for both variables. Table 1 summarizes the existing works.

Table 1

Summary of wastewater treatment systems

| Author | Title | Method | Advantage | Disadvantage |
| --- | --- | --- | --- | --- |
| Weng et al. (2021) | Application of Artificial Neural Network Methods to Predict the Parakai Lake Water Quality Index | Artificial Neural Network (ANN) | Handling of Large and Multivariate Datasets, Feature Extraction and Selection | Data Requirements, Complexity |
| Ahamad et al. (2019) | ANNs and ANN Regression Analysis for Surface Water Quality Prediction | Artificial Neural Network | Using time delay cells to deal with the dynamic nature of sample data | Biomass Pre-treatment, Regeneration and Reusability |
| Sandu et al. (2017) | For best efficiency, the water flow arrangement at the wastewater treatment facility should include grit removal and filters. | Hybrid wastewater treatment system | Reducing the computation problems because the weights of the input and hidden layer need not be adjusted | Energy Consumption, Maintenance and Operation |
| Shahnaz et al. (2020) | Acacia auriculiformis biomass was analyzed raw, acid-modified, and EDTA-complexed to eliminate hexavalent chromium. | Adsorption is increased by a variety of nanocellulose-related characteristics. | Solving the problem of low prediction accuracy | Disposal and Environmental Impact, Selectivity and Interference |
| Zhu et al. (2021) | ML aids in selection of carbon-based materials for tetracycline and sulphamethoxazole adsorption. | Carbon-based materials adsorption | When subjected to varied environmental circumstances such as temperature, solution pH, and different adsorbent kinds, the created ML models outperformed classic isotherm models in terms of generalization | Data Requirements, Complexity |
| Kaetzl et al. (2019) | On-farm wastewater treatment using biochar from local agro residues reduces pathogens from irrigation water for safer food production in developing countries | Biochar using wastewater treatment | The biochar filters outperformed, or were at least comparable to, the sand and rice husk filters | Energy Consumption, Maintenance and Operation |

Kaushal & Mahajan (2021) created a hydroponic bench-scale system employing ornamental evergreen plants to lower turbidity and COD. By removing pollutants, Lata and Siddharth (Lata 2021) offered a biological method for controlling water quality. They suggested collecting plant species that could be used to make biofuel briquettes instead of wood and other fuel. Nonetheless, the biggest barrier to the widespread adoption of these cutting-edge eco-friendly technologies is the considerable investment and maintenance expenses associated with them, considering that many processes have only been evaluated in the laboratory.

According to Ahmed et al. (2021), in addition to conventional waste, new pollutants such as industrial chemicals, herbicides, medicines, and personal care products are becoming increasingly prevalent in sewage. Koul et al. (2022) proposed that adsorption processes can effectively remove these emerging pollutants. Antibiotics must be eliminated because they risk eradicating the microbial species that currently exist in natural water bodies. Baladi et al. (2022) presented the photochemical destruction of antibiotics, such as penicillin G (PENG), as a green and effective advanced oxidation technology for cleaning up wastewater containing non-degradable antibiotics. Nanoparticles can also be employed to remove toxic substances from wastewater systems by trapping them. The effectiveness of Magnetic-MXene as a nanoparticle-based WWT system has been established. However, additional research is still needed to boost the efficacy of all comparable systems when scaled up (Hojjati-Najafabadi et al. 2022).

The problems with existing systems are as follows. Many WWT systems rely on ageing infrastructure that requires maintenance or replacement, which can be costly and time-consuming. Due to capacity restrictions, some WWT systems struggle to handle large volumes of wastewater, resulting in overflows and spills. WWT systems can be energy-intensive, resulting in high operational expenses and greenhouse gas emissions. Finally, because of a lack of public awareness, many people do not appreciate the importance of WWT and may contribute to the problem by disposing of domestic waste improperly.

In this section, wastewater plant characteristics are predicted step by step using GBDT algorithms. We also extracted features using the SMA and other suitable tuning techniques. This tool helps users take corrective action and complete the procedure in accordance with the standards. It is a valuable technology because it can handle the complex mechanism of WWT plants and the dynamic changes in some of their components, and it gets around some shortcomings of traditional mathematical models. The intention is to make the WWT model more predictable. The GBDT methodologies employed in this study can be used to forecast the COD in the input and effluent of effluent treatment operations. The GBDT-SMA leverages artificial intelligence to represent complex systems accurately. Three back-propagation GBDT-SMA versions were developed to prevent the accumulation of suspended particles, COD, and mixed liquid solids in an external water treatment tank. The GBDT-SMA method's block diagram is shown in Figure 1.
Figure 1

Block diagram of wastewater using the GBDT-SMA method.

Wastewater treatment

Water treatment cleans contaminated water so that it can be reused for agriculture and human consumption. It combines mechanical, physical, chemical, and biological procedures with engineering design, commercial expertise, and creativity. Water consumption in municipal, industrial, and residential settings generates significant volumes of effluent. pH, suspended particles, dissolved solids, turbidity, colour, and other factors determine the properties of wastewater. Because each water source has its own pollution profile, the water must be treated before it can be reused or returned to its source. WWT technologies are physical, chemical, or biological. Flotation, sedimentation, aeration, and filtration are forms of physical treatment. Chlorine, ozone, and neutralization are used in chemical treatment, and alum or iron (III) sulphate can be used as coagulants. Because of its adsorption capabilities, carbon is helpful in both chemical and physical processes. Biological treatment uses bacteria, other microbes, and biochemical processes to purify wastewater. The aerobic decomposition of water pollutants produces carbon dioxide. Anaerobic decomposition produces biogas, a mixture of methane and carbon dioxide, which can be used as fuel. Manure, a product of anaerobic treatment, has agricultural applications.

The importance of WWT

Many water supplies are dangerous for human and animal consumption due to urbanization, the use of fossil fuels, and pollution from municipal, industrial, and domestic sources. Developing countries often lack comprehensive policies to prevent or penalize environmental deterioration, and factories regularly discharge untreated water into rivers and lakes. To protect aquatic life, these water sources must be cleaned or treated before use.

Reports on research of WWT methods

Several techniques can successfully eliminate heavy metals from wastewater, such as co-precipitation/adsorption, membrane filtration, precipitation, ion exchange, and adsorption. Adsorption has been studied extensively, although activated carbon is expensive. To clean wastewater from beverage industries, researchers compared the efficiency of commercial Calgon carbon (F-300) with that of activated carbon made from coconut shells. Their findings show that coconut shell activated carbon is more effective at adsorbing organic molecules than commercial activated carbon. Previous research has employed coagulation/flocculation, membrane filtration, oxidation, and other techniques to remove COD from industrial wastewater, but these are expensive, time-consuming, and difficult. On average, 1.2 m3 of biogas is generated per m3 of wastewater.

Chemical Oxygen Demand

COD is primarily used in WWT systems to monitor overall organic pollution and evaluate the efficacy of the treatment procedure. It indicates how much of the organic matter in the wastewater can be oxidized chemically. By monitoring COD levels, operators can optimize treatment procedures and ensure environmental compliance. COD sensors continually monitor the UV absorbance of organic components in deionized water, as shown in Figure 1. The sensor is located at the WWT facility's outlet. An aperture in the process probe housing acts as the cuvette and is where the flash photometer is located; the UV extinction of the medium surrounding the immersed probe is measured across this aperture. A wiper cleans the measurement window. A reference measurement is used to account for the effect of turbidity on UV absorption. Because dissolved organic molecules absorb UV light, UV absorption measurements can be used to calculate the total amount of dissolved organic compounds in water. The organic component is quantified using the spectral absorption coefficient (SAC) at 254 nm. Automatic baseline compensation and turbidity correction are performed using a 550 nm reference wavelength. The sensor is thus a turbidity-compensated multiple-beam absorption photometer: the SAC at 550 nm is subtracted from the SAC at 254 nm to account for turbidity. NTPA001/2005 recognizes COD values.
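The turbidity compensation described above can be illustrated with a minimal sketch. It assumes the usual definition of the spectral absorption coefficient as decadic absorbance divided by optical path length; the function names, the example readings, and the linear SAC-to-COD calibration (`slope`, `intercept`) are hypothetical and would have to be fitted per site against laboratory COD measurements.

```python
import math


def absorbance(incident: float, transmitted: float) -> float:
    """Decadic absorbance A = log10(I0 / I) from two light intensities."""
    if incident <= 0 or transmitted <= 0:
        raise ValueError("intensities must be positive")
    return math.log10(incident / transmitted)


def sac(absorbance_value: float, path_length_m: float) -> float:
    """Spectral absorption coefficient in 1/m (absorbance per metre of path)."""
    if path_length_m <= 0:
        raise ValueError("path length must be positive")
    return absorbance_value / path_length_m


def compensated_sac254(a254: float, a550: float, path_length_m: float) -> float:
    """SAC at 254 nm minus SAC at 550 nm (the turbidity reference)."""
    return sac(a254, path_length_m) - sac(a550, path_length_m)


def estimate_cod(sac254_corrected: float, slope: float, intercept: float = 0.0) -> float:
    """Hypothetical linear calibration from corrected SAC254 to COD (mg/L).

    slope and intercept are assumptions: they are site-specific and are not
    given in the source.
    """
    return slope * sac254_corrected + intercept


if __name__ == "__main__":
    # Illustrative readings only: 5 mm optical path, made-up absorbances.
    corrected = compensated_sac254(a254=0.30, a550=0.05, path_length_m=0.005)
    print("corrected SAC254 (1/m):", corrected)
    print("estimated COD (mg/L):", estimate_cod(corrected, slope=2.0, intercept=5.0))
```

Subtracting the 550 nm signal works because dissolved organics absorb very little at that wavelength, so what is measured there is attributed mostly to scattering by suspended particles.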

Gradient-Boosted Decision Tree

In contrast to many other approaches, the GBDT methodology solves problems by optimizing in function space. This makes the technique more adaptive, more scalable, and better able to handle non-linear complexity than linear models. GBDT models non-linear decision boundaries through hierarchical (tree-structured) relationships. GBDT implementations build each base learner so that it is maximally correlated with the negative gradient of the loss function, and following the negative gradient is what drives the model towards convergence. Gradient boosting combines three elements: a loss function to be optimized, weak learners that make predictions, and an additive model that adds weak learners to reduce the loss. The loss function depends on the problem; MSE is widely used in regression and logarithmic loss in classification. Each iteration of the boosting approach improves the model by focusing on the residual loss. Decision trees are introduced progressively as weak learners to reduce the loss function, while the trees already in the model remain unchanged. Gradient descent is used to minimize the loss when trees are added. In this work, the SoftMax loss function was used, and gradient descent guaranteed model convergence. The learning rate defines the update step size and helps avoid overfitting, and the number of trees, M (written D in the steps below), is set by the maximum number of training iterations. The following describes how the GBDT model works.

Step 1: Initialize the model with a constant value.
(1)
Step 2: For each iteration from 1 to D:
Step 2.1: Use Equation (2) to calculate the tree weights and step size that give the maximum gain.
(2)

Here T represents the number of leaves on a tree. The loss function L() evaluates model performance on the training data, and a further term, referred to as 'fitness', describes the model's complexity and penalizes it.

Step 2.2: Update the model as follows.
(3)
Step 3: Return the result obtained by applying the D additive functions.
(4)

To find a solution for a specific sample Z, GBDT employs D additive functions, as shown in Eq. (4).
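For orientation, the steps above follow the pattern of the standard gradient-boosting formulation. The block below is a sketch of that textbook form with a tree-complexity penalty, not a recovery of the authors' Equations (1) to (4), which may differ in detail. The symbols $n$ (number of samples), $c$ (initial constant), $g_{i,d}$ (negative gradient for sample $i$ at iteration $d$), $\eta$ (learning rate), $\gamma$ and $\lambda$ (penalty coefficients), and $w$ (vector of leaf weights) are notation assumed here; $L$, $T$, and $D$ are as in the text.

```latex
\[
\begin{aligned}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c)
  && \text{(Step 1: initial constant)} \\
g_{i,d} &= -\left[ \frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} \right]_{F = F_{d-1}}
  && \text{(negative gradient, i.e. pseudo-residual)} \\
f_d &= \arg\min_{f} \sum_{i=1}^{n} \bigl(g_{i,d} - f(x_i)\bigr)^2
       + \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2
  && \text{(Step 2.1: fit a tree with a complexity penalty)} \\
F_d(x) &= F_{d-1}(x) + \eta\, f_d(x), \qquad d = 1, \dots, D
  && \text{(Step 2.2: update)} \\
\hat{y}(x) &= F_D(x) = F_0(x) + \eta \sum_{d=1}^{D} f_d(x)
  && \text{(Step 3: output)}
\end{aligned}
\]
```

With squared-error loss $L = \tfrac{1}{2}(y - F)^2$, the negative gradient $g_{i,d}$ reduces to the ordinary residual $y_i - F_{d-1}(x_i)$, which is why each tree is often described as fitting the residuals of the trees before it.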

In the GBDT model, each tree predicts the pseudo-residuals of the trees before it, provided that the loss function is differentiable. The pseudo-residuals are the negative gradient of the user-specified loss function. As predictions are acquired, the loss is reduced by training each successive model. A gradient-boosting model with too many trees can over-fit, and one with too few trees can under-fit. Both problems are avoidable.
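The residual-fitting loop described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the model used in the study: it uses squared-error loss (so the pseudo-residuals are ordinary residuals), one-split regression stumps on a single input feature, and made-up data. The names `fit_stump`, `fit_gbdt`, and `mse` are hypothetical.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Least-squares regression stump (a one-split tree) on 1-D inputs."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_val, right_val = mean(left), mean(right)
        sse = (sum((r - left_val) ** 2 for r in left)
               + sum((r - right_val) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_val, right_val)
    _, threshold, left_val, right_val = best
    return lambda x: left_val if x <= threshold else right_val


def fit_gbdt(xs, ys, n_trees=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss and stump weak learners."""
    f0 = mean(ys)                      # initial constant prediction
    trees = []
    preds = [f0] * len(xs)
    for _ in range(n_trees):
        # For squared error the negative gradient is the plain residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)             # earlier trees are never modified
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]

    def predict(x):
        return f0 + learning_rate * sum(t(x) for t in trees)

    return predict


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return mean((y - model(x)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]   # toy feature values (not plant data)
    ys = [x * x for x in xs]             # toy non-linear target
    for n in (5, 50, 200):
        print(n, "trees -> training MSE", mse(fit_gbdt(xs, ys, n_trees=n), xs, ys))
```

The training error keeps falling as trees are added, which is exactly why the number of trees and the learning rate have to be chosen on held-out data rather than on the training set.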

The SMA

The SMA (Kaushal & Mahajan 2021) is a novel population-based metaheuristic inspired by the rhythmic behaviours observed in natural slime moulds. The SMA distinguishes itself by having a positive–negative feedback loop that facilitates optimal feeding pathways. It, like slime moulds, modifies its search path dynamically based on the quality of food discovered.

The SMA simulates three essential phenomena: grabble, wrap, and approach. The grabble phenomenon guarantees that the algorithm avoids clashing with slime mould while it aggressively seeks food sources, resembling the way slime moulds navigate their environment. The wrap phenomenon mimics the velocity matching observed in slime moulds, allowing the SMA to match its pace with the slime moulds' movements. This synchronization makes it easier to explore and exploit the search space. Finally, the approach phenomenon captures slime moulds' learning process as they migrate towards the feeding centre. The SMA uses comparable adaptive learning techniques, which improve its capacity to converge on optimal solutions. Overall, the SMA derives its success from the distinct properties and behaviours of slime moulds, resulting in a potent metaheuristic capable of solving difficult optimization problems effectively.

The SMA technique begins with a population that is generated at random within its upper and lower boundaries, where 'dim' is the problem dimension and 'N' is the population size (i.e., the number of slime moulds). Next, an objective function is used to evaluate the population. In each subsequent iteration, the grabble, wrap, and approach phenomena update the population. Additionally, several parameters control the SMA's progression, including the fitness weight of the slime mould, which can speed up convergence and prevent entrapment in local optima. The vibration parameter ensures the accuracy of each slime mould during early exploration or later exploitation. The SMA's detailed step-by-step procedure, which includes approaching, wrapping, and grabbling food, can be described as follows.
(5)
where
(6)
where va and vd stands for the best fitness, is adjusted to reflect slime mould predicted fitness. The Index signifies the sequence of fitness values sorted (ascends in the minimum value problem), L and U represent the lower and upper boundaries of the search range and illustrate the random importance in [0,1], and represents the best fitness gotten in all iterations. ranks the first half of the population. The vibration parameters balance exploration and exploitation and .
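For reference, the position update and fitness weight of the SMA are commonly written as below. This is a sketch of the widely published form of the algorithm; the symbol names are assumptions and need not match the authors' Equations (5) and (6). Here $X_b$ is the best position found so far, $X_A$ and $X_B$ are two randomly chosen individuals, $S(i)$ is the fitness of individual $i$, $DF$ is the best fitness over all iterations, $bF$ and $wF$ are the best and worst fitness in the current iteration, $r$ and $rand$ are random numbers in $[0,1]$, $z$ is a small restart probability, $vb$ and $vc$ are the vibration parameters, and $LB$ and $UB$ correspond to the lower and upper boundaries L and U in the text.

```latex
\[
X(t+1) =
\begin{cases}
rand \cdot (UB - LB) + LB, & rand < z \\
X_b(t) + vb \cdot \bigl( W \cdot X_A(t) - X_B(t) \bigr), & r < p \\
vc \cdot X(t), & r \ge p
\end{cases}
\qquad p = \tanh \lvert S(i) - DF \rvert
\]
\[
W\bigl(\mathit{SmellIndex}(i)\bigr) =
\begin{cases}
1 + r \cdot \log\!\left( \dfrac{bF - S(i)}{bF - wF} + 1 \right), & \text{first half of the ranked population} \\[2ex]
1 - r \cdot \log\!\left( \dfrac{bF - S(i)}{bF - wF} + 1 \right), & \text{otherwise}
\end{cases}
\qquad \mathit{SmellIndex} = \operatorname{sort}(S)
\]
```

In this form $vb$ is drawn from $[-a, a]$ with $a = \operatorname{arctanh}(1 - t/t_{\max})$, and $vc$ shrinks linearly towards zero, so the search moves from exploration to exploitation as the iteration count $t$ approaches $t_{\max}$.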

In Algorithm I, the SMA's process is described as follows.
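A minimal Python sketch of the SMA loop is given here as an illustration of the procedure. It follows the commonly published form of the algorithm rather than the authors' Algorithm I, and it minimizes a generic objective instead of performing feature extraction; the function name `sma_minimize`, the default parameter values, and the toy sphere objective are all assumptions.

```python
import math
import random


def sma_minimize(objective, dim, lower, upper,
                 pop_size=20, max_iter=100, z=0.03, seed=0):
    """Minimise `objective` over [lower, upper]^dim with a basic SMA loop."""
    rng = random.Random(seed)

    def clamp(v):
        return min(max(v, lower), upper)

    # Random initial population inside the search bounds.
    pop = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(pop_size)]
    best_pos, best_fit, history = None, math.inf, []

    for t in range(1, max_iter + 1):
        fits = [objective(x) for x in pop]
        order = sorted(range(pop_size), key=fits.__getitem__)  # ascending
        b_f, w_f = fits[order[0]], fits[order[-1]]
        if b_f < best_fit:
            best_fit, best_pos = b_f, pop[order[0]][:]
        history.append(best_fit)

        a = math.atanh(1 - t / max_iter)   # range of vibration parameter vb
        b = 1 - t / max_iter               # range of vibration parameter vc
        new_pop = []
        for rank, i in enumerate(order):
            if rng.random() < z:           # occasional random restart
                new_pop.append([rng.uniform(lower, upper) for _ in range(dim)])
                continue
            p = math.tanh(abs(fits[i] - best_fit))
            scaled = (fits[i] - b_f) / (w_f - b_f + 1e-12)   # in [0, 1)
            sign = 1 if rank < pop_size // 2 else -1         # better half: +
            x_new = []
            for j in range(dim):
                w = 1 + sign * rng.random() * math.log10(scaled + 1)
                if rng.random() < p:       # approach the best food source
                    xa, xb = rng.choice(pop), rng.choice(pop)
                    step = rng.uniform(-a, a) * (w * xa[j] - xb[j])
                    x_new.append(clamp(best_pos[j] + step))
                else:                      # contract the current position
                    x_new.append(clamp(rng.uniform(-b, b) * pop[i][j]))
            new_pop.append(x_new)
        pop = new_pop

    # Evaluate the final population once more before returning.
    for x in pop:
        f = objective(x)
        if f < best_fit:
            best_fit, best_pos = f, x[:]
    return best_pos, best_fit, history


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)   # toy objective, minimum at 0
    pos, fit, _ = sma_minimize(sphere, dim=3, lower=-5.0, upper=5.0)
    print("best position:", pos)
    print("best fitness:", fit)
```

For feature extraction or tuning, the objective would instead score a candidate feature subset or parameter vector, for example by the validation error of a GBDT trained with it; that coupling is not shown here.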

This section covers the performance of the proposed WWT model. The performance measurements are accuracy, precision, recall, F-score, training and testing validation, RMSE, and receiver operating characteristic (ROC) analysis. The proposed approach was evaluated against Deep Convolutional Neural Networks (DCNNs) (Zhang & Gu 2022), K-neighbour Random Forest (Zhang et al. 2018), Convolutional Neural Networks (CNNs) (Su et al. 2022), and Artificial Neural Networks (ANNs) (Abyaneh 2018).

Performance metrics

The most effective way to evaluate WWT performance is to derive indicators from a confusion matrix. Seven performance indicators were considered: accuracy, precision, recall, F-score, training and testing validation, RMSE, and ROC analysis. The confusion-matrix-based metrics are determined by the outcome of a binary classification, which may be positive or negative. True positives (TP) and true negatives (TN) are samples that were classified correctly, whereas false positives (FP) and false negatives (FN) are samples that were misclassified. The individual metrics are explained in the paragraphs that follow.

Accuracy: Accuracy is the ratio of the sum of true positives and true negatives to the total number of samples. This ratio shows how closely the predicted values match the actual values.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$
Recall: Recall, also known as sensitivity or the true positive rate, measures a classifier's ability to identify positive samples reliably. Equation (8) provides the definition.
$$\text{Recall} = \frac{TP}{TP + FN} \tag{8}$$
Precision (Prec): Precision measures the proportion of samples classified as positive that are actually positive. Equation (9) provides the definition.
$$\text{Prec} = \frac{TP}{TP + FP} \tag{9}$$
F-Measure (F): The F-measure is the harmonic mean of precision and recall. It captures the trade-off between the two, since increasing recall often comes at the expense of precision. Equation (10) quantifies this.
$$F = \frac{2 \times \text{Prec} \times \text{Recall}}{\text{Prec} + \text{Recall}} \tag{10}$$
Root mean squared error (RMSE): The RMSE is the square root of the mean squared error (MSE) and can be expressed as follows:
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{11}$$
where $n$ is the number of samples, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value.
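The metric definitions above can be sketched in a few lines of Python. This is a minimal illustration assuming the standard confusion-matrix definitions; the function names and the counts in the usage lines are hypothetical and are not taken from the study's data.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision and F-measure from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }


def rmse(actual, predicted):
    """Root mean squared error between two equally long, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=45, tn=40, fp=5, fn=10)
error = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```

With these hypothetical counts, the accuracy is 85/100 = 0.85, the precision is 45/50 = 0.90, and the recall is 45/55, which is about 0.818.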

Although the RMSE can take any value between 0 and infinity, lower RMSE values are desirable.

The model's effectiveness is also assessed using the Area Under the Receiver Operating Characteristic Curve (AUROC). ROC curves provide a practical illustration of the trade-off between the true positive rate and the false positive rate.
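As a minimal sketch of how an AUROC value can be obtained, the function below uses the pairwise interpretation of the metric: the AUROC equals the fraction of positive and negative sample pairs in which the positive sample receives the higher score, with ties counted as one half. The function name `auroc` and the labels and scores in the usage lines are hypothetical and serve only as an illustration.

```python
def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0   # positive ranked above negative
            elif sp == sn:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores, for illustration only.
example_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this example, three of the four positive and negative pairs are ranked correctly, so the AUROC is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.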

Precision analysis

Figure 2 and Table 2 compare the precision achieved by the GBDT-SMA methodology with that of other existing methods. GBDT-SMA attains the highest precision at every dataset size. For example, when tested on 100 data points, GBDT-SMA achieves a precision of 90.21%, which is greater than that of ANN (72.89%), CNN (80.32%), DCNN (75.55%), and K-neighbours RF (85.17%). Similarly, when tested on 500 data points, GBDT-SMA achieves a precision of 94.11%, which is higher than that of ANN (74.11%), CNN (82.19%), DCNN (76.21%), and K-neighbours RF (89.32%).
Table 2. Precision analysis (%) of the GBDT-SMA approach using existing systems

| Number of data points | ANN | CNN | DCNN | K-neighbours RF | GBDT-SMA |
|---|---|---|---|---|---|
| 100 | 72.89 | 80.32 | 75.55 | 85.17 | 90.21 |
| 200 | 72.91 | 83.23 | 78.14 | 86.21 | 93.34 |
| 300 | 71.45 | 82.56 | 77.12 | 85.32 | 91.22 |
| 400 | 73.21 | 83.44 | 78.21 | 88.19 | 93.65 |
| 500 | 74.11 | 82.19 | 76.21 | 89.32 | 94.11 |

Figure 2. Precision analysis of the GBDT-SMA approach using existing systems.

Recall analysis

Figure 3 and Table 3 compare the recall of the GBDT-SMA methodology with that of other existing methods. GBDT-SMA attains the highest recall at every dataset size. For example, the recall value for GBDT-SMA is 87.14% for 100 data points, while the recall values for the ANN, CNN, DCNN, and K-neighbours RF models are 67.34, 77.43, 72.98, and 82.45%, respectively. Similarly, the recall value for GBDT-SMA is 91.67% for 500 data points, whereas the recall values for the ANN, CNN, DCNN, and K-neighbours RF models are 72.34, 81.45, 74.19, and 86.44%, respectively.
Table 3. Recall analysis (%) for the GBDT-SMA technique with existing systems

| Number of data points | ANN | CNN | DCNN | K-neighbours Random Forest | GBDT-SMA |
|---|---|---|---|---|---|
| 100 | 67.34 | 77.43 | 72.98 | 82.45 | 87.14 |
| 200 | 66.13 | 76.14 | 71.18 | 84.19 | 86.34 |
| 300 | 68.32 | 79.56 | 74.55 | 84.23 | 89.16 |
| 400 | 71.43 | 80.23 | 75.66 | 83.17 | 90.45 |
| 500 | 72.34 | 81.45 | 74.19 | 86.44 | 91.67 |

Figure 3. Recall analysis of the GBDT-SMA approach using existing systems.
Figure 4. F-score analysis for the GBDT-SMA technique with existing systems.

F-score analysis

Figure 4 and Table 4 compare the F-scores of GBDT-SMA and other existing methods. GBDT-SMA attains the highest F-score at every dataset size. For instance, with 100 data points, the GBDT-SMA model achieved an F-score of 94.78%, while the ANN, CNN, DCNN, and K-neighbours RF models achieved F-scores of 78.67, 86.14, 82.56, and 89.45%, respectively. Furthermore, with 500 data points, the F-score of GBDT-SMA was 98.45%, while the ANN, CNN, DCNN, and K-neighbours RF models achieved F-scores of 82.33, 89.13, 85.63, and 93.67%, respectively.

Table 4. Analysis of the F-score (%) for the GBDT-SMA technique with existing systems

| Number of data points | ANN | CNN | DCNN | K-neighbours Random Forest | GBDT-SMA |
|---|---|---|---|---|---|
| 100 | 78.67 | 86.14 | 82.56 | 89.45 | 94.78 |
| 200 | 80.56 | 88.45 | 81.78 | 90.15 | 96.56 |
| 300 | 82.45 | 87.15 | 83.45 | 92.24 | 95.67 |
| 400 | 81.67 | 87.34 | 84.17 | 91.56 | 96.14 |
| 500 | 82.33 | 89.13 | 85.63 | 93.67 | 98.45 |

RMSE analysis

Figure 5 and Table 5 compare the RMSE of GBDT-SMA with that of other approaches. GBDT-SMA yields the lowest RMSE at every dataset size. For example, GBDT-SMA achieved an RMSE of 21.98% when using 100 data points, which is lower than the RMSE values obtained by the ANN, CNN, DCNN, and K-neighbours RF models (30.56, 25.78, 35.67, and 40.56%, respectively). Similarly, with 500 data points, GBDT-SMA had an RMSE of 24.76%, whereas the ANN, CNN, DCNN, and K-neighbours RF models had larger RMSE values of 34.18, 29.78, 39.89, and 44.81%, respectively.
Table 5. RMSE analysis (%) for the GBDT-SMA method with existing systems

| Number of data points | ANN | CNN | DCNN | K-neighbours Random Forest | GBDT-SMA |
|---|---|---|---|---|---|
| 100 | 30.56 | 25.78 | 35.67 | 40.56 | 21.98 |
| 200 | 31.67 | 26.87 | 36.98 | 41.78 | 22.67 |
| 300 | 32.45 | 27.45 | 37.23 | 42.33 | 22.18 |
| 400 | 33.78 | 28.19 | 38.11 | 43.56 | 23.56 |
| 500 | 34.18 | 29.78 | 39.89 | 44.81 | 24.76 |

Figure 5. RMSE analysis for the GBDT-SMA method with existing systems.

Accuracy analysis

Figure 6 and Table 6 compare the accuracy of GBDT-SMA with that of other approaches. GBDT-SMA attains the highest accuracy at every dataset size. For example, with 100 data points, GBDT-SMA achieved an accuracy of 92.56%, outperforming the ANN, CNN, DCNN, and K-neighbours RF models, which attained accuracies of 80.34, 87.56, 83.56, and 89.16%, respectively. Furthermore, with 500 data points, GBDT-SMA achieved an accuracy of 96.32%, outperforming the ANN, CNN, DCNN, and K-neighbours RF models, which achieved accuracies of 82.97, 87.45, 85.98, and 91.45%, respectively.
Table 6. Analysis of GBDT-SMA accuracy (%) with existing systems

| Number of data points | ANN | CNN | DCNN | K-neighbours Random Forest | GBDT-SMA |
|---|---|---|---|---|---|
| 100 | 80.34 | 87.56 | 83.56 | 89.16 | 92.56 |
| 200 | 81.56 | 88.15 | 85.19 | 90.56 | 94.76 |
| 300 | 81.45 | 87.34 | 84.16 | 89.54 | 96.55 |
| 400 | 82.78 | 86.15 | 84.66 | 92.56 | 97.16 |
| 500 | 82.97 | 87.45 | 85.98 | 91.45 | 96.32 |

Figure 6. Analysis of the accuracy of the GBDT-SMA technique with existing systems.

Execution time analysis

Figure 7 and Table 7 compare the execution time of GBDT-SMA with that of other approaches. GBDT-SMA has the shortest execution time at every dataset size. For example, GBDT-SMA achieved an execution time of 89.678 ms when using 100 data points, which is lower than the execution times of the ANN, CNN, DCNN, and K-neighbours RF models (156.890, 187.764, 132.674, and 106.768 ms, respectively). Similarly, with 500 data points, GBDT-SMA had an execution time of 91.478 ms, whereas the ANN, CNN, DCNN, and K-neighbours RF models had longer execution times of 173.094, 189.486, 142.543, and 136.249 ms, respectively.
Table 7. Analysis of the GBDT-SMA technique for execution time (ms) with existing systems

| Number of data points | ANN | CNN | DCNN | K-neighbours Random Forest | GBDT-SMA |
|---|---|---|---|---|---|
| 100 | 156.890 | 187.764 | 132.674 | 106.768 | 89.678 |
| 200 | 164.304 | 185.650 | 143.841 | 112.876 | 90.638 |
| 300 | 168.453 | 187.541 | 138.653 | 110.540 | 93.237 |
| 400 | 170.612 | 188.714 | 140.803 | 134.197 | 90.045 |
| 500 | 173.094 | 189.486 | 142.543 | 136.249 | 91.478 |
Table 8. Training and testing validation analyses

| Epochs | Training validation | Testing validation |
|---|---|---|
| | 0.069 | 0.065 |
| 10 | 0.067 | 0.060 |
| 20 | 0.059 | 0.056 |
| 30 | 0.055 | 0.051 |
| 40 | 0.049 | 0.047 |
| 50 | 0.046 | 0.042 |
| 60 | 0.039 | 0.037 |
| 70 | 0.036 | 0.031 |
Figure 7. Analysis of the GBDT-SMA technique for execution time with existing systems.

Figure 8 shows the ROC analysis for the proposed model. Table 8 and Figure 9 describe the training and testing validation of the proposed GBDT-SMA technique over different numbers of epochs. At 10 epochs, the training and testing losses are 0.067 and 0.060, respectively. Similarly, at 70 epochs, the training and testing losses are 0.036 and 0.031, respectively.

Figure 8. ROC curve for the GBDT-SMA technique with existing systems.

Figure 9. Training and testing validations for the GBDT-SMA method.

The current work aims to improve the effectiveness of the GBDT algorithm for modelling industrial wastewater treatment facilities by incorporating the SMA technique. The study's findings show that ML algorithms can successfully predict wastewater treatment parameters. The results also show that more data, particularly data connected to alert situations, are required to provide more exact and complete forecasts. Including such data would expand the distribution of this class, generating useful insights and enhancing forecast accuracy. GBDT algorithms were used to predict wastewater plant parameters, and the SMA was applied for feature extraction and other tuning procedures. When compared with existing models such as ANN, CNN, DCNN, and K-neighbours Random Forest, the proposed model came out on top with an accuracy of 96.32%. This system's infrastructure, operations, monitoring, maintenance, and management will be optimized in the future. GBDT-SMA can also be utilized to identify and maximize the recovery of valuable resources from wastewater, such as nutrients, metals, and energy. This supports the concept of a circular economy by minimizing waste generation and promoting the reuse and recycling of resources.

The goal of future efforts is to improve existing approaches by compiling a large database containing all of the data used in the model. This rich dataset will considerably improve forecast accuracy. Furthermore, future work will emphasize the use of advanced algorithms that allow the system to recognize patterns and aid better decision-making in areas such as network resilience, energy efficiency, wastewater reduction, cost reduction, and others. The ongoing refining of this technique will result in considerable advances in water distribution system infrastructure, operations, monitoring, maintenance, and management.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Abyaneh, H. Z. 2018 Evaluation of multivariate linear regression and artificial neural networks in prediction of water quality parameters. Journal of Environmental Health Science and Engineering 12 (1), 2.

Ahamad, K. U., Raj, P., Barbhuiya, N. H. & Deep, A. 2019 Surface water quality modeling by regression analysis and artificial neural network. Advances in Waste Management, pp. 215–230. Springer, Singapore. doi: 10.1007/978-981-13-0215-2_15.

Ahmadianfar, I., Bozorg-Haddad, O. & Chu, X. 2020 Gradient-based optimizer: a new metaheuristic optimization algorithm. Information Sciences 540, 131–159.

Ahmed, S., Mofijur, M., Nuzhat, S., Chowdhury, A. T., Rafa, N., Uddin, M. A., Inayat, A., Mahlia, T., Ong, H. C. & Chia, W. Y. 2021 Recent developments in physical, biological, chemical, and hybrid treatment techniques for removing emerging contaminants from wastewater. Journal of Hazardous Materials 416, 125912.

Asha, M. R. & Roberts, M. K. 2022 Artificial humming bird with data science enabled stability prediction model for smart grids. Sustainable Computing: Informatics and Systems 36. https://doi.org/10.1016/j.suscom.2022.100821.

Cao, Q., Samet, A., Zanni-Merk, C., de Bertrand de Beuvron, F. & Reich, C. 2019 An ontology-based approach for failure classification in predictive maintenance using fuzzy C-means and SWRL rules. Procedia Computer Science 159, 630–639.

Daher, A., Hoblos, G., Khalil, M. & Chetouani, Y. 2020 New prognosis approach for preventive and predictive maintenance – application to a distillation column. Chemical Engineering Research and Design 153, 162–174.

Deb, S., Abdelminaam, D. S., Said, M. & Houssein, E. H. 2021 Recent methodology-based gradient-based optimizer for economic load dispatch problem. IEEE Access 9, 44322–44338.

Farhi, N., Kohen, E., Mamane, H. & Shavitt, Y. 2021 Prediction of wastewater treatment quality using LSTM neural network. Environmental Technology & Innovation 23 (2), 101632. https://doi.org/10.1016/j.eti.2021.101632.

Hernández del Olmo, F., Gaudioso, E., Duro, N. & Dormido, R. 2019 Machine learning weather soft-sensor for advanced control of waste water treatment plants. Sensors 19, 3139.

Hojjati-Najafabadi, A., Mansoorianfar, M., Liang, T., Shahin, K., Wen, Y., Bahrami, A., Karaman, C., Zare, N., Karimi-Maleh, H. & Vasseghian, Y. 2022 Magnetic-MXene-based nanocomposites for water and wastewater treatment: a review. Journal of Water Process Engineering 47, 102696.

Islam, M. M., Hossain, M. M., Zakaria, M., Rahman, G. K. M. M., Naznin, A. & Munira, S. 2015 Effect of industrial effluents on germination of summer leafy vegetables. International Research Journal of Earth Sciences 3, 16–23.

Kaetzl, K., Lübken, M., Uzun, G., Gehring, T., Nettmann, E., Stenchly, K. & Wichern, M. 2019 On-farm wastewater treatment using biochar from local agroresidues reduces pathogens from irrigation water for safer food production in developing countries. Science of the Total Environment 682, 601–610.

Khudair, B. H., Jasim, M. M. & Alsaqqar, A. S. 2018 Artificial neural network model for the prediction of groundwater quality. Civil Engineering Journal 4 (12), 2959–2970.

Kirchem, D., Lynch, M. A., Bertsch, V. & Casey, E. 2020 Modelling demand response with process models and energy systems models: potential applications for wastewater treatment within the energy water nexus. Applied Energy 260 (2020), 114321. https://doi.org/10.1016/j.apenergy.2019.114321.

Koul, B., Poonia, A. K., Singh, R. & Kajla, S. 2022 Strategies to cope with the emerging waste water contaminants through adsorption regimes. In: Development in Wastewater Treatment Research and Processes (M. Shah, S. Rodriguez-Couto & J. Biswas, eds.). Elsevier, Amsterdam, The Netherlands, pp. 61–106.

Li, W., Wang, X. & Feng, Q. 2021 Final prediction of product quality in batch process based on bidirectional neural network algorithm. IOP Conference Series: Earth and Environmental Science 692 (3), Article ID 032091.

Qurie, M., Khamis, M., Ayyad, I., Scrano, L., Lelario, F. & Bufo, A. 2015 Removal of chlorpyrifos using micelle–clay complex and advanced treatment technology. Desalination and Water Treatment 57 (33), 15687–15696.

Rahman, M., Rayhan, M. Y. H., Chowdhury, M. A. H., Mohiuddin, K. M. & Chowdhury, M. A. K. 2018 Phytotoxic effect of synthetic dye effluents on seed germination and early growth of red amaranthus. Fundamental and Applied Agriculture 3 (2), 480–490.

Rubio, J., Hernández, J., Ávila, F., Stein, J. & Meléndez, A. 2016 Sistema sensor para el monitoreo ambiental basado en redes Neuronales. Ingeniería, Investigación y Tecnología 17 (2), 211–222.

Sandu, M., Bode, F., Danca, P. & Voicu, I. 2017 Water flow structure optimization between the screenings and grit removals in a wastewater plant. In ENERGY and ENVIRONMENT (CIEM), 2017 International Conference on. IEEE, pp. 101–104.

Su, B., Lin, Y., Wang, J., Quan, X., Chang, Z. & Rui, C. 2022 Deep learning target detection system for sewage treatment. Computational Intelligence and Neuroscience 2022, 11.

Thürlimann, C., Dürrenmatt, D. & Villez, K. 2018 Soft-sensing with qualitative trend analysis for waste water treatment plant control. Control Engineering Practice 70, 121–133.

Uhlmann, E., Pontes, R. P., Geisert, C. & Hohwieler, E. 2018 Cluster identification of sensor data for predictive maintenance in a selective laser melting machine tool. Procedia Manufacturing 24, 60–65.

Weng, G., Pei, C., Ren, J., Ye, C., Cui, Q., Qing, H., Liu, Y. & Guan, X. 2021 Photovoltaic output prediction of regional energy internet based on LSTM algorithm. Journal of Physics: Conference Series 1732, 012083, 1–9.

Zhang, X. & Gu, Y. 2022 Research on wastewater treatment monitoring algorithms based on deep convolutional neural networks. Wireless Communications and Mobile Computing 2022, 11. Article ID 1767295.

Zhang, S., Li, X., Zong, M., Zhu, X. & Wang, R. 2018 Efficient K-NN classification with different numbers of nearest neighbors. IEEE Transactions on Neural Networks and Learning Systems 29, 1774–1785.

Zhu, X., Wan, Z., Tsang, D. C. W., He, M., Hou, D., Su, Z. & Shang, J. 2021 Machine learning for the selection of carbon-based materials for tetracycline and sulfamethoxazole adsorption. Chemical Engineering Journal 406, 126782. https://doi.org/10.1016/j.cej.2020.126782.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).