ABSTRACT
Treating distillery wastewater is a globally recognized priority because of its significant environmental impact and the regulatory requirements it must meet. This paper reviews the literature on wastewater treatment, focusing on the application of machine learning (ML) algorithms, which can analyze large amounts of data and identify complex patterns. A bibliometric analysis is performed using the Scopus, ScienceDirect, and Web of Science databases. ML has become increasingly attractive in engineering because it can improve predictions of process output variables. In chemistry and engineering, it is used to support computational chemistry, plan materials synthesis, and model contaminant remediation processes. The aim of this review is to critically evaluate the application of ML models in wastewater treatment, drawing insights from existing studies and exploring their potential application to distillery wastewater. The study compares the reported models, offers recommendations, and proposes future research directions for distillery wastewater treatment using ML approaches.
HIGHLIGHTS
This study offers a thorough investigation of machine learning (ML) in the treatment of wastewater, together with bibliometric data that highlight trends, gaps, and areas for future research, particularly with implications for distillery wastewater.
Wastewater studies highlight the need to strengthen the application of ML models in remediating distillery wastewater.
INTRODUCTION
Ensuring the proper disposal of wastes released by diverse industries is a significant global concern, and the discharge of industrial wastewater is a major contributor to environmental pollution. Water pollution has become a widespread problem, with adverse effects on both the environment and human health (Li et al. 2011; Bijekar et al. 2022). Distillery wastewater is the industrial wastewater generated by the production of alcoholic beverages (Mohana et al. 2009). It contains various pollutants and has a high chemical oxygen demand (COD) of 80,000 to 160,000 mg/L and a high biochemical oxygen demand (BOD) of 35,000 to 50,000 mg/L, reflecting the large amount of oxygen required to break down its organic matter. It also has a low pH, a dark brown color, and an unpleasant odor (Mikucka & Zielińska 2020; Nikhar et al. 2023). Treating distillery wastewater is therefore crucial for reducing its environmental impact and ensuring safe disposal. Effective management keeps operations running smoothly and sustainably while ensuring that the discharged effluent meets regulatory standards.
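The COD and BOD ranges above can be combined into a BOD/COD ratio, a widely used rough indicator of biodegradability in which lower values generally point to a larger share of slowly degradable or recalcitrant organic matter. The short Python sketch below only performs that arithmetic. Pairing the extremes of the two ranges is an illustrative assumption, since the ranges were not reported for matched samples, so the result is an envelope rather than a measured ratio; the function name is hypothetical.

```python
# Reported ranges for raw distillery wastewater (from the text above), in mg/L.
COD_RANGE = (80_000, 160_000)
BOD_RANGE = (35_000, 50_000)


def bod_cod_ratio(bod_mg_l: float, cod_mg_l: float) -> float:
    """Return the BOD/COD ratio, a rough biodegradability indicator."""
    if cod_mg_l <= 0:
        raise ValueError("COD must be positive")
    return bod_mg_l / cod_mg_l


# Pairing the extremes of two independently reported ranges (an assumption)
# gives an envelope, not the ratio of any real sample.
lowest = bod_cod_ratio(BOD_RANGE[0], COD_RANGE[1])   # 35,000 / 160,000
highest = bod_cod_ratio(BOD_RANGE[1], COD_RANGE[0])  # 50,000 / 80,000
print(f"BOD/COD envelope: {lowest:.3f} to {highest:.3f}")
```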
Distillery stillage can be treated using anaerobic, aerobic, and physico-chemical methods. Anaerobic digestion converts organic matter into biogas, a renewable energy source. Valorization recovers valuable compounds from distillery wastewater, which is a significant source of polysaccharides, volatile fatty acids, and natural antioxidants such as polyphenols and other bioactive chemicals, promoting sustainability and revenue generation in industries such as pharmaceuticals, cosmetics, and food (Mikucka & Zielińska 2020). The high concentrations of organic pollutants, colorants, and toxic substances in distillery effluent present considerable remediation challenges. Biological methods, such as anaerobic digestion or activated sludge, are frequently applied, but they are constrained by the incomplete removal of recalcitrant compounds, sensitivity to operational conditions, and long treatment times. Chemical treatments, such as coagulation or advanced oxidation processes, can be effective, but they frequently incur high operational costs, require chemical reagents, and produce secondary pollutants. Physical methods such as membrane filtration are efficient but energy-intensive and prone to membrane fouling. In contrast, adsorption offers numerous benefits: it is cost-effective, simple to operate, and capable of removing a variety of organic and inorganic contaminants, including toxic compounds and color (Dubey et al. 2005). Adsorption is also considered environmentally friendly because of its low energy requirements and the potential use of biodegradable organic adsorbents. However, challenges remain, such as the need for proper disposal or regeneration of spent adsorbent, competition among adsorbates, and the influence of pH, temperature, and salinity on adsorption capacity (Rashid et al. 2021).
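Adsorption performance is usually reported through two batch mass-balance quantities, which also appear as ML targets in Table 3: the equilibrium adsorption capacity q_e = (C0 - Ce) x V / m, and the percentage removal (C0 - Ce) / C0 x 100, where C0 and Ce are the initial and equilibrium concentrations, V is the solution volume, and m is the adsorbent mass. The minimal Python sketch below computes both; the function names and the batch numbers are hypothetical and serve only to show the arithmetic.

```python
def adsorption_capacity(c0_mg_l: float, ce_mg_l: float, volume_l: float, mass_g: float) -> float:
    """Equilibrium adsorption capacity q_e = (C0 - Ce) * V / m, in mg/g."""
    if mass_g <= 0:
        raise ValueError("adsorbent mass must be positive")
    return (c0_mg_l - ce_mg_l) * volume_l / mass_g


def removal_efficiency(c0_mg_l: float, ce_mg_l: float) -> float:
    """Percentage removal = (C0 - Ce) / C0 * 100."""
    if c0_mg_l <= 0:
        raise ValueError("initial concentration must be positive")
    return (c0_mg_l - ce_mg_l) / c0_mg_l * 100.0


# Hypothetical batch test: 0.1 L of solution at 200 mg/L contacted with 0.5 g
# of adsorbent, leaving 50 mg/L at equilibrium.
q_e = adsorption_capacity(200.0, 50.0, 0.1, 0.5)  # (150 * 0.1) / 0.5 = 30 mg/g
removal = removal_efficiency(200.0, 50.0)          # 150 / 200 * 100 = 75 %
print(f"q_e = {q_e:.1f} mg/g, removal = {removal:.1f} %")
```

Note that the two quantities can move in opposite directions: adding more adsorbent usually raises percentage removal while lowering q_e, which is one reason studies in Table 3 model them as separate targets.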
One strategy for mitigating secondary waste is to use biodegradable organic adsorbents, such as biomass, which can be recycled as soil conditioners or fertilizers (Sayara et al. 2020), whereas certain inorganic adsorbents can be regenerated and reused, reducing the negative environmental impact. Despite the potential for secondary waste generation, adsorption is an extremely effective and adaptable technique for treating distillery wastewater, especially for removing complex organic compounds and heavy metals (Ratna et al. 2021). Advances in adsorbent regeneration methods can further reduce waste production. These factors make adsorption a feasible and sustainable method for wastewater treatment, particularly within a circular waste management system that emphasizes resource recovery (Mohana et al. 2009; Chowdhary et al. 2018; Nikhar et al. 2023). Conventional methods such as chemical coagulation, membrane filtration, and biological treatment often suffer from high operational costs, high energy consumption, incomplete pollutant removal, and the generation of additional waste or byproducts. Combining adsorption with machine learning (ML) techniques enables process optimization, which can reduce costs, enhance pollutant removal efficiency, and minimize waste generation relative to these traditional methods.
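One common way to read "adsorption combined with ML enables process optimization" is: fit a data-driven surrogate model to batch results, then search that surrogate for promising operating conditions. The Python sketch below illustrates this under stated assumptions: the response function, the two variables (pH and dose), their ranges, and the noise level are all hypothetical, and inverse-distance-weighted k-nearest-neighbour regression is used only because it needs no third-party libraries. Any model from Table 2 could play the surrogate role, and a predicted optimum would still need experimental confirmation.

```python
import math
import random

# Hypothetical response surface used ONLY to generate illustrative "batch
# experiments"; it does not describe any real adsorbent or effluent.
def simulated_removal(ph, dose_g_l):
    return 90.0 - 4.0 * (ph - 6.0) ** 2 - 10.0 * (dose_g_l - 2.0) ** 2

random.seed(0)
experiments = []  # list of ((pH, dose in g/L), removal in %)
for _ in range(40):
    ph = random.uniform(3.0, 9.0)
    dose = random.uniform(0.5, 4.0)
    experiments.append(((ph, dose), simulated_removal(ph, dose) + random.gauss(0.0, 1.0)))

# Sampled ranges, used to min-max scale inputs so pH and dose weigh comparably.
RANGES = ((3.0, 9.0), (0.5, 4.0))

def scale(x):
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(x, RANGES))

def knn_predict(x, data, k=5):
    """Inverse-distance-weighted k-nearest-neighbour regression (the surrogate)."""
    xs = scale(x)
    nearest = sorted(data, key=lambda row: math.dist(xs, scale(row[0])))[:k]
    weights = [1.0 / (math.dist(xs, scale(point)) + 1e-9) for point, _ in nearest]
    return sum(w * target for w, (_, target) in zip(weights, nearest)) / sum(weights)

# Grid search over candidate operating conditions inside the sampled range only,
# since the surrogate should not be trusted for extrapolation.
candidates = [(3.0 + 0.25 * i, 0.5 + 0.25 * j) for i in range(25) for j in range(15)]
best = max(candidates, key=lambda c: knn_predict(c, experiments))
print("best (pH, dose):", best, "predicted removal (%):", round(knn_predict(best, experiments), 1))
```

Restricting the grid to the sampled ranges is a deliberate design choice: data-driven surrogates interpolate reasonably but can extrapolate badly.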
ML has become a prominent tool for optimizing complex processes in wastewater treatment. In contrast to conventional approaches, which typically depend on fixed parameters and limited datasets, ML algorithms can examine extensive datasets, uncover patterns, forecast results, and improve treatment methodologies. In wastewater treatment, ML models, including artificial neural networks (ANN) (Paranjape 2023), decision trees (Chauhan et al. 2023), and ensemble methods such as random forests (Bahrami et al. 2024), have been used effectively to forecast treatment efficiency, optimize operational parameters, and minimize energy consumption. By learning from data reported in previous studies, ML can offer valuable insights into the dynamics of pollutant removal (Aryafar et al. 2019; Adelodun et al. 2020), identify essential operational variables (Nair et al. 2018; Gadekar & Ahammed 2019), and recommend optimal conditions for treatment processes such as adsorption (Witek-Krowiak et al. 2014; Dehghani et al. 2021). Although ML has shown significant potential for optimizing treatment processes across a variety of wastewater categories, its application to distillery wastewater remains a critical research gap. This review addresses this gap by analyzing the potential of ML models in wastewater treatment and examining studies that have applied ML to processes such as adsorption for other wastewater types, given the paucity of studies on distillery wastewater. Critical aspects, such as the application of ML to adsorption-based wastewater treatment, can be explored through bibliometric analysis, which helps identify highly cited publications and popular research themes and provides insight into how research trends have progressed.
The bibliometric approach also enables the evaluation of international collaboration among researchers and institutions, promoting a collaborative environment to tackle wastewater treatment challenges. Furthermore, it outlines areas that lack research attention, emphasizing potential research gaps, and assesses the effects of policies and regulations (Lundberg 2023; Nan et al. 2023). Bibliometric analysis aids environmental engineering researchers in enhancing research strategies, making informed investment decisions, and setting achievable goals, ultimately contributing to sustainable wastewater treatment approaches (Ellegaard 2018). Table 1 presents the overview of review methodologies and their comparison with bibliometric analysis.
Review methodology | Description | Comparison with bibliometric analysis | References |
---|---|---|---|
Narrative reviews | Offer a thorough, qualitative analysis of a subject based on the authors' interpretation | May lack objectivity when evaluating growth and impact; bibliometric analysis provides quantifiable indicators of trends and output | Ogbezode et al. (2023); Singh et al. (2024) |
Scoping reviews | Identify the literature on a subject and uncover key concepts and gaps without applying fixed inclusion criteria | Provide a general picture but may not give a quantitative understanding of research impact; bibliometric analysis identifies trends | Lam et al. (2015); Price et al. (2024) |
Systematic reviews | Thoroughly collect the literature on specific theories or research questions | Focus closely on individual studies and may overlook broader patterns; bibliometric analysis offers an overview of the entire research landscape | Dhote et al. (2021); Ilyas et al. (2021); Singh et al. (2023a, 2023b) |
Critical reviews | Critically analyze the literature, highlighting strengths and weaknesses and summarizing results | Offer subjective perspectives; bibliometric analysis contributes objective, data-driven insights into research output and collaboration | Al-Tohamy et al. (2022); Alvi et al. (2023) |
Rapid reviews | Provide timely evidence on a topic, sometimes using less rigorous methodologies | May prioritize speed over depth; bibliometric analysis enables a thorough examination of the existing literature | Matos & Roebeling (2022) |
This review evaluates the use of ML models in wastewater treatment, explores their potential for distillery wastewater, and provides insights, comparisons, and recommendations for future research. To this end, the following objectives are set: (i) to conduct a comprehensive review of the existing literature on ML applications in wastewater treatment using the Scopus, ScienceDirect, and Web of Science databases; (ii) to evaluate and compare the performance of different ML models employed in wastewater treatment, notwithstanding the lack of studies specific to distillery wastewater; (iii) to infer how findings from the treatment of other wastewater types can be applied to distillery wastewater treatment; and (iv) to identify prospective research opportunities for incorporating ML models into distillery wastewater treatment, based on current trends and identified gaps.
Wastewater treatment is crucial for resource recycling and needs to be sustainable (Chowdhari et al. 2022; Mohnish & Parag 2024). ML and artificial intelligence are key strategies in various fields, including health, environment, energy, education, pollution control, and disaster management. These models learn from input and output data to make predictions and adjustments, improving efficiency and sustainability when dealing with complex problems (Singh et al. 2023a, 2023b). The scarcity of studies that specifically apply ML to distillery wastewater represents an opportunity for future research. By comparing the results of studies on other wastewater types, this review identifies critical areas in which ML can help address the distinctive treatment challenges of distillery wastewater. A bibliometric study was selected to thoroughly outline the current state of research on ML in wastewater treatment. This methodology identifies trends in the literature, including the countries at the leading edge of research, the most frequently cited publications, and emerging research domains. A quantitative analysis of the current literature provides a data-driven perspective on the field, which is crucial for identifying gaps and directions for future study. This bibliometric analysis covers ML applications in wastewater treatment using adsorption up to 2024.
BIBLIOMETRIC METHOD
Bibliometric analysis is a method used to explore and analyze scientific data in various fields. It helps researchers understand the evolution of specific fields and identify emerging areas. Its application in research is relatively new and remains underdeveloped in some areas. This section examines the methodology and processes involved in conducting research using bibliometric analysis.
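Because the same article is often indexed in more than one of Scopus, ScienceDirect, and Web of Science, records from the three databases are usually merged and de-duplicated before time-series counts are made. The Python sketch below shows one simple approach under stated assumptions: the record fields (`doi`, `title`, `year`) and the records themselves are hypothetical, and real exports from each database use their own field names and would need mapping first.

```python
from collections import Counter

# Hypothetical exported records (the DOIs use the example prefix 10.1000).
scopus = [
    {"doi": "10.1000/abc.1", "title": "ML for dye adsorption", "year": 2021},
    {"doi": "10.1000/abc.2", "title": "ANN for COD removal", "year": 2022},
]
sciencedirect = [
    {"doi": "10.1000/ABC.1", "title": "ML for Dye Adsorption", "year": 2021},
    {"doi": "10.1000/abc.3", "title": "RF models of biochar sorption", "year": 2023},
]
web_of_science = [
    {"doi": "", "title": "Boosting models for arsenic uptake", "year": 2023},
    {"doi": "10.1000/abc.2", "title": "ANN for COD removal", "year": 2022},
]


def record_key(rec):
    """Prefer a normalised DOI (DOIs are case-insensitive); fall back to the title."""
    doi = rec.get("doi", "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title", " ".join(rec["title"].lower().split()))


def merge_unique(*sources):
    seen = {}
    for source in sources:
        for rec in source:
            seen.setdefault(record_key(rec), rec)  # keep the first occurrence
    return list(seen.values())


records = merge_unique(scopus, sciencedirect, web_of_science)
per_year = Counter(rec["year"] for rec in records)
for year in sorted(per_year):
    print(year, per_year[year])
```

A known limitation of this sketch is that a record carrying a DOI in one database and none in another is not matched; fuzzy title matching is a common refinement.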
Search criteria
Bibliometric analysis
Time-series analysis
Keyword analysis
The red cluster focuses on artificial intelligence, water pollutants, and pollutant removal. The blue cluster discusses wastewater, wastewater management, and adsorption mechanisms. The green cluster focuses on ML models and adsorption capacities for collaborative planning and decision-making.
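Keyword clusters such as those described above are typically derived from co-occurrence counts, that is, how often two keywords appear in the same article, before a mapping tool groups them. The Python sketch below shows only the counting step on hypothetical keyword lists; the clustering itself, and whichever tool was used in this study, is not reproduced.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword lists, one list per article.
articles = [
    ["machine learning", "adsorption", "wastewater"],
    ["artificial intelligence", "pollutant removal", "adsorption"],
    ["machine learning", "adsorption capacity", "adsorption"],
    ["wastewater", "wastewater management", "Adsorption"],
]


def normalise(keyword):
    """Lower-case and collapse whitespace so spelling variants merge."""
    return " ".join(keyword.lower().split())


occurrences = Counter()
co_occurrences = Counter()
for keywords in articles:
    unique = sorted({normalise(k) for k in keywords})
    occurrences.update(unique)
    # Each unordered keyword pair is counted at most once per article.
    co_occurrences.update(combinations(unique, 2))

print(occurrences.most_common(3))
print(co_occurrences.most_common(2))
```

These pair counts form the weighted edges of the keyword network that clustering algorithms then partition.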
Country-wise analysis
Journal distribution
APPLICATIONS OF ML
The ANN is the most extensively used of the ML models applied in wastewater treatment. ANNs are loosely inspired by the human brain: just as biological neurons use past signals to shape their present response, an ANN learns from past data and responds in the form of predictions (Ismail et al. 2019; Wang et al. 2022).
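To make "learning from past data" concrete, the Python sketch below trains a minimal one-hidden-layer network (sigmoid hidden units, linear output) by backpropagation with stochastic gradient descent. It is a teaching sketch under stated assumptions, not any cited study's model: the six scaled (pH, dose) samples, the network size, the learning rate, and the epoch count are all hypothetical, and practical work would use an established library with proper train/test splitting.

```python
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, already min-max scaled samples: (pH, dose) -> removal fraction.
data = [((0.1, 0.2), 0.30), ((0.5, 0.5), 0.80), ((0.9, 0.8), 0.45),
        ((0.4, 0.7), 0.75), ((0.6, 0.3), 0.70), ((0.2, 0.9), 0.40)]

N_HIDDEN = 4
w1 = [[random.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(N_HIDDEN)]
b1 = [0.0] * N_HIDDEN
w2 = [random.uniform(-1.0, 1.0) for _ in range(N_HIDDEN)]
b2 = 0.0

def forward(x):
    """One hidden sigmoid layer followed by a linear output neuron."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return hidden, sum(w * h for w, h in zip(w2, hidden)) + b2

def mse():
    return sum((forward(x)[1] - y) ** 2 for x, y in data) / len(data)

initial_loss = mse()
LEARNING_RATE = 0.1
for _ in range(2000):
    for x, y in data:
        hidden, out = forward(x)
        err = out - y  # gradient of 0.5 * err**2 with respect to the output
        for j in range(N_HIDDEN):
            # Backpropagate through the sigmoid, whose derivative is h * (1 - h);
            # w2[j] is read before it is updated.
            grad_hidden = err * w2[j] * hidden[j] * (1.0 - hidden[j])
            w2[j] -= LEARNING_RATE * err * hidden[j]
            for i in range(2):
                w1[j][i] -= LEARNING_RATE * grad_hidden * x[i]
            b1[j] -= LEARNING_RATE * grad_hidden
        b2 -= LEARNING_RATE * err

print(f"training MSE before: {initial_loss:.4f}, after: {mse():.4f}")
```

Only the training error is reported here; with six points, a low training error says nothing about generalization.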
Adsorption, coagulation/flocculation, membrane processes, and advanced oxidation processes are among the conventional methods for removing pollutants based on their chemical, biological, and physical properties (Jadhav & Mahajan 2013; Aloulou et al. 2021; Yadav et al. 2021). Among the principal physico-chemical processes to which ML has been applied for contaminant removal, this review focuses solely on adsorption. The properties of the ML models considered are summarized in Table 2.
ML algorithm | Advantages | Limitations | References |
---|---|---|---|
ANN | | | Dhumal (2024); Jadhav et al. (2023); Jawad et al. (2021); Patel & Jha (2015); Singh et al. (2023a, 2023b); Singh et al. (2019) |
Support vector machines (SVM) | | | El Alaoui El Fels et al. (2023); Kamyab-Talesh et al. (2019); Zhang et al. (2023) |
Decision trees (DT) | | | Jalal & Ezzedine (2020); Kaparthi & Bumblauskas (2020); Kumar et al. (2013); Solano Meza et al. (2019) |
Random forest (RF) | | | Ghaedi et al. (2014); Rodriguez-Galiano et al. (2014); Takarina et al. (2024) |
Regression models | | | Hosseinzadeh et al. (2020); Kausar et al. (2020); Marchionni et al. (2014); Noor et al. (2023) |
Genetic algorithms | | | Halim et al. (2015); Kazadi Mbamba & Batstone (2023); Piuleac et al. (2013) |
ML is a powerful tool for evaluating the technical performance of wastewater treatment plants (WWTPs), providing users with greater flexibility and capabilities in cost reduction, time analysis, effluent quality control, energy optimization, monitoring, and fault detection.
Despite increasing interest in using ML for wastewater treatment, research on its use in distillery wastewater treatment is limited; comparisons are therefore drawn with studies on other types of wastewater. Antwi et al. (2017) used a backpropagation ANN model that predicted biogas and methane production with higher accuracy, reliability, and efficiency than a multiple nonlinear regression model, because it explained more of the variation in the data and captured the complex nonlinear relationships between input and output parameters. Sharafati et al. (2020) demonstrated how ML models can simulate wastewater discharge characteristics, save operating costs, and lessen environmental effects. Wang et al. (2021) addressed time lags in treatment processes to support better management, proposing an ML framework that uses RF and deep neural network (DNN) models to capture the complex interaction between operational parameters and effluent quality. Bagheri et al. (2019) highlighted the advantages of hybrid ML models in wastewater treatment, including faster results and lower expertise requirements for parameter tuning, and pointed to the potential for integrating various artificial intelligence (AI)/ML techniques, such as clustering analysis and image recognition, for effective monitoring and control. Torregrossa et al. (2018) applied ML models to energy cost modeling in WWTPs in North-West Europe, demonstrating superior technical performance compared with traditional approaches and emphasizing the need to open-source methodologies for wider accessibility. Filipe et al. (2019) used ML modeling to predict wastewater intake rates and optimize pump operations; the approach demonstrated adaptability, continuous learning, and predictive capability, resulting in fewer tank overflows, energy savings, and significant cost reductions.
Table 3 summarizes the use of ML models in wastewater engineering, where input parameters such as pH, temperature, COD, and pollutant concentrations are used to predict effluent quality, treatment efficiency, and adsorption capacity.
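ML models such as those in Table 3 are often judged against a regression baseline (for example, the multiple nonlinear regression in Antwi et al. (2017) or the MLR in Ekinci et al. (2023)). The Python sketch below fits a multiple *linear* regression, which is a simpler baseline than the nonlinear regression used by Antwi et al., by solving the normal equations with Gaussian elimination so that no third-party library is needed. The input variables, the batch conditions, and the coefficients used to generate the noise-free targets are hypothetical; noise-free data are used only so that the fit can be checked exactly.

```python
def fit_mlr(rows, targets):
    """Ordinary least squares for y = b0 + b1*x1 + ... via the normal equations."""
    design = [[1.0, *r] for r in rows]  # prepend an intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    aug = [row + [v] for row, v in zip(xtx, xty)]  # augmented matrix [X^T X | X^T y]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (aug[i][p] - sum(aug[i][j] * coef[j] for j in range(i + 1, p))) / aug[i][i]
    return coef

# Hypothetical batch conditions: (pH, temperature in degrees C, dose in g/L).
X = [(4.0, 25.0, 1.0), (5.0, 30.0, 1.5), (6.0, 25.0, 2.0), (7.0, 35.0, 1.0),
     (5.5, 40.0, 2.5), (6.5, 30.0, 3.0), (4.5, 35.0, 2.0), (7.5, 40.0, 1.5)]
# Noise-free targets from an assumed linear relation, so the fit can be checked.
TRUE_COEF = (2.0, 1.5, 0.2, -3.0)  # intercept, pH, temperature, dose
y = [TRUE_COEF[0] + sum(b * v for b, v in zip(TRUE_COEF[1:], r)) for r in X]

coef = fit_mlr(X, y)
print("fitted coefficients:", [round(c, 3) for c in coef])
```

With real, noisy measurements the same function returns least-squares estimates; the normal equations can become ill-conditioned when inputs are strongly correlated, in which case a QR-based solver is the safer choice.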
Sr. No. | Model name | Influencing factors (inputs) | Target factors (outputs) | Type of wastewater and its treatment | Dataset | Model performance |
---|---|---|---|---|---|---|
1 | RF, SVR, *SHAP (Ullah et al. 2023) | Adsorbent properties: carbon, oxygen, iron load. Adsorption conditions: solution pH, temperature | Selenium removal capacity | Wastewater contaminated with selenium (biosorption technique) | 40 samples | R2: RF = 0.98, SVR = 0.98, SHAP = 0.95; root mean square error (RMSE): RF = 0.35, SVR = 0.14, SHAP = 0.23 |
2 | RF, GBT, ANN (Zhu et al. 2021) | CBM properties: (O + N)/C molar ratio, point of zero charge, H/C molar ratio, total carbon content, O/C molar ratio, ash content, Brunauer–Emmett–Teller (BET) surface area. Adsorption conditions: pH, temperature, ratio of initial TC or sulfamethoxazole (SMX) concentration to CBM dosage | Adsorption capacity of CBMs | Pharmaceutical wastewater (adsorption treatment) | Data on 111 different materials | RF model: TC (R2 = 0.894), SMX (R2 = 0.909) |
3 | RF, DT, and GB (Moosavi et al. 2021) | Agro-waste characteristics: surface area, pore volume, pH, particle size. Adsorption conditions: dye type, initial dye concentration, pH | Adsorption capacity | Industrial wastewater (adsorption using agricultural waste) | 350 | Model accuracy: RF = 0.92, DT = 0.83, GB = 0.84 |
4 | RF and ANN (De Miranda Ramos Soares et al. 2020) | Salinity, contact time, initial dye concentration, rotation, temperature, adsorbent dosage, pH | Final dye concentration, adsorption capacity, removal rate | Adsorption of methylene blue dye onto orange bagasse | 606 | R2: RF = 0.9318, ANN = 0.9257 |
5 | RF and ANN (Zhu, Wang & Ok 2019) | Biochar characteristics: O/C molar ratio, surface area, total carbon content (mass %), cation exchange capacity, pH in water, ash content, particle size, (O + N)/C molar ratio, H/C molar ratio. Adsorption conditions: ratio of initial heavy metal concentration to biochar, temperature, pH. Heavy metal properties: charge number, electronegativity, ionic radius | Adsorption capacity of biochar for heavy metals | Industrial wastewater containing heavy metals (adsorption process) | 353 | R2: ANN = 0.948, RF = 0.973 |
6 | RF, SVM, and ANN (Ismail et al. 2023) | Temperature, ZIF-60 dose, initial lead concentration | Percentage removal of lead | Lead-contaminated wastewater (adsorption process) | 26 | R: RF = 0.9632, ANN = 0.9494, SVM = 0.8907; RMSE: RF = 9.13, ANN = 9.79, SVM = 14.04 |
7 | RSA and ANN (Agarwal et al. 2023) | Adsorbent dose, contact time, pH, temperature | COD and color removal efficiency | Dyeing industry effluent (adsorption using wheat straw activated carbon) | 30 | ANN (COD): R2 = 0.83, RMSE = 1.899; ANN (color): R2 = 0.817, RMSE = 1.873; RSA (COD): R2 = 0.938, RMSE = 1.097; RSA (color): R2 = 0.94, RMSE = 0.926 |
8 | SVR and ANN (Ahmad Aftab et al. 2023) | Congo red concentration, adsorbent dosage, solution pH, temperature, time | Adsorption capacity | Industrial wastewater (adsorption process) | 60 | R2: ANN = 0.9884, SVR = 0.9816; RMSE: ANN = 0.5395, SVR = 0.4635 |
9 | XGBoost, gradient boosting decision tree (GBDT), light gradient boosting machine (LightGBM), and RF (Abdi & Mazloom 2022) | Temperature, adsorbent surface area, initial arsenic concentration, adsorbent dosage, solution pH, contact time, presence of anions | Adsorptive removal of As(V) | - | 280 | LightGBM: R2 = 0.9958, RMSE = 2.0688; XGBoost: R2 = 0.9879, RMSE = 2.8081; GBDT: R2 = 0.9812, RMSE = 2.9137; RF: R2 = 0.9799, RMSE = 3.3845 |
10 | LSSVM (Salahshoori et al. 2024) | Pore size, surface area, pH, initial pollutant concentration, adsorbent dosage, contact time | Photocatalytic efficiency | - | 374 | R2 = 0.991, RMSE = 3.42 |
11 | RF, gradient boosting regression tree (GBRT), XGBoost, ANN (Xiong et al. 2023) | Surface area, adsorbent oxidation state, polarizability, adsorbent and solution pH, electronegativity, initial pollutant concentration | Adsorption capacity | Wastewater containing arsenic (As) as a pollutant | 1,508 | R2: RF = 0.90, GBRT = 0.92, XGBoost = 0.93, ANN = 0.92; RMSE: RF = 0.27, GBRT = 0.23, XGBoost = 0.22, ANN = 0.24 |
12 | CatBoost, XGBoost, EN, KRR, MLR, LightGBM (Ekinci et al. 2023) | Flow rate, adsorbent dosage, total nitrogen, removed total suspended solids (TSS), total phosphorus, COD, BOD | Sludge production | Municipal wastewater | 75,920 | R2: MLR = 0.94, EN = 0.94, CatBoost = 0.89, KRR = 0.94, XGBoost = 0.83, LightGBM = 0.89 |
13 | RF (Zhu et al. 2022) | Temperature, pH, initial pollutant concentration, adsorbent dosage | Adsorption capacity | Pharmaceutical wastewater | 175 | R2 = 0.921, RMSE = 0.155 |
14 | EN, DT, GB (Hu et al. 2022) | Pollutant type, initial pollutant concentration | Final pollutant concentration, adsorption capacity | Wastewater containing heavy metals as pollutants | 18 | R2: EN = 0.976, DT = 0.996, GB = 0.998; RMSE: RF = 0.168, DT = 0.134, GB = 0.103 |
15 | ABR, GB, and RF (Sharafati et al. 2020) | pH, BOD5, total dissolved solids (TDS), COD, total phosphorus (TP), TSS, total nitrogen (TN) | TDS, BOD5, COD | Industrial wastewater | - | Correlation coefficient: ABR = 0.962, gradient boosting regression (GBR) = 0.959, random forest regression (RFR) = 0.957 |
16 | RF, DNN (Wang et al. 2021) | Temperature, pH, TSS, flow rate, phosphate, total solids, dissolved oxygen | Effluent TSS, phosphate | Municipal wastewater | 105,763 | R2: RF = 0.934, DNN = 0.935 |
Note. SHAP: SHapley Additive exPlanations; SVR: support vector regression; GBT: gradient boosting trees; CBMs: carbon-based materials; TC: tetracycline; C: carbon; H: hydrogen; O: oxygen; N: nitrogen; GB: gradient boosting; XGBoost: eXtreme Gradient Boosting; LightGBM: Light Gradient Boosting Machine; LSSVM: least squares support vector machine; MLR: multiple linear regression; EN: elastic net; CatBoost: categorical boosting regression; KRR: kernel ridge regression; ABR: AdaBoost regression.
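Table 3 mixes three goodness-of-fit measures: the coefficient of determination (R2), the correlation coefficient (R), and RMSE. The Python sketch below computes all three from observed and predicted values using their standard definitions; the four data points are hypothetical and chosen so the arithmetic can be checked by hand.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the target variable."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical observed and predicted adsorption capacities (mg/g).
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
r = pearson_r(observed, predicted)
print(f"R2 = {r_squared(observed, predicted):.3f}, R = {r:.3f}, "
      f"R^2 from R = {r * r:.3f}, RMSE = {rmse(observed, predicted):.3f}")
```

For these values R2 is 0.948 while the squared correlation is about 0.951: the two coincide only for least-squares fits evaluated on their own training data, so an R reported in one study is not directly comparable with an R2 in another. Likewise, RMSE carries the units and scale of each study's target, so RMSE values for different targets in Table 3 should not be compared with one another.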
CHALLENGES AND INNOVATIONS
ML faces multiple challenges when applied to wastewater treatment, including interpretability, reproducibility, data requirements, physical significance, and transparency. The complexity of many ML models makes the rationale behind their predictions difficult to grasp (Dunnington et al. 2021). Reproducibility is crucial for model reliability, but variations in input data or environmental conditions can introduce uncertainty into model performance (Lowe et al. 2022). The diverse characteristics of different wastewater types, including distillery wastewater, make it harder to obtain the high-quality data needed for effective ML training (Zhong et al. 2021). Additionally, some ML models may lack a direct connection to the physical processes in WWTPs, which hinders their practical adoption (Sundui et al. 2021). Transparency and fairness are also vital for the successful application of ML in wastewater treatment, given the potential environmental and public health implications of model decisions (Lowe et al. 2022).
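One widely used, model-agnostic way to probe which inputs a trained model relies on is permutation feature importance. It is shown here only as an illustration of the interpretability problem and is not the method of any study cited above. The sketch below assumes a hypothetical toy model and invented data (influent COD in mg/L and pH); the fixed random seed also illustrates a basic reproducibility practice.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error used as the scoring metric."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, metric=mse, n_repeats=10, seed=0):
    """Mean increase in error when one feature column is randomly shuffled.

    Shuffling breaks the link between a feature and the target while keeping
    its distribution; a large increase means the model relies on that feature.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible importances
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only column j replaced by the shuffled values.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(metric(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical model and data, for illustration only:
# the toy model depends on influent COD and ignores pH entirely.
def toy_model(row):
    influent_cod, _ph = row
    return 0.1 * influent_cod

X = [[80000, 3.8], [95000, 4.1], [110000, 4.5], [125000, 3.9], [140000, 4.3], [160000, 4.0]]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))
```

Because the toy model ignores pH, the importance reported for pH is exactly zero, while shuffling COD raises the error. With real models, correlated inputs can share or mask importance, so such scores are a starting point for interpretation rather than a causal explanation.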
Machine learning algorithms (MLAs) for wastewater treatment also face practical challenges, such as a lack of reliable data, the selection of appropriate input and output variables, and adaptation to different operating conditions. These challenges are particularly pronounced in distillery wastewater treatment, where the effluent has a complex composition, a high COD, and variable pH. Determining the relevant variables, such as pH, temperature, and nutrient concentrations, is complicated by the variability of effluent characteristics over time and the presence of interfering substances. Model validation and generalization are also difficult because of the dynamic nature of distillery wastewater, which can lead to overfitting or poor generalization to unseen data. Addressing these issues requires careful feature selection and continuous monitoring of key variables to improve model accuracy and robustness (Sundui et al. 2021).
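For effluent data that change over time, one common safeguard against over-optimistic validation is a rolling-origin (walk-forward) split, in which a model is always tested on observations that come after its training window. This technique is swapped in here as an illustration and is not taken from Sundui et al. (2021). The function name, the naive mean forecaster, and the COD series below are hypothetical.

```python
def rolling_origin_splits(n_samples, initial_train, horizon):
    """Yield (train_idx, test_idx) pairs in time order.

    Each split trains on all observations before the cut-off and tests on the
    next `horizon` observations, so the model is never scored on data that
    precede its training window (no look-ahead leakage).
    """
    if initial_train < 1 or horizon < 1:
        raise ValueError("initial_train and horizon must be positive")
    start = initial_train
    while start + horizon <= n_samples:
        yield list(range(start)), list(range(start, start + horizon))
        start += horizon

# Hypothetical daily COD readings (mg/L), for illustration only.
cod = [5200, 5350, 5100, 5600, 5900, 6100, 5800, 6300, 6500, 6400]
for train_idx, test_idx in rolling_origin_splits(len(cod), initial_train=4, horizon=3):
    forecast = sum(cod[i] for i in train_idx) / len(train_idx)  # naive mean baseline
    mae = sum(abs(cod[i] - forecast) for i in test_idx) / len(test_idx)
    print(f"train t=0..{train_idx[-1]}, test t={test_idx[0]}..{test_idx[-1]}, MAE={mae:.0f}")
```

A randomly shuffled split would let the model see later observations during training, which can hide exactly the kind of drift in effluent characteristics described above.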
This section outlines strategies for enhancing the effectiveness of ML in wastewater treatment, with particular implications for distillery wastewater. These strategies include interpretable ML models (Dunnington et al. 2021), hybrid models (Singh et al. 2023a, 2023b), transfer learning from related domains (Oliveira et al. 2021), data augmentation through synthetic data generation (Zhong et al. 2021), ensemble methods that combine multiple ML models (Sharafati et al. 2020), online learning for adaptive, real-time responsiveness (Sundui et al. 2021), feature engineering to extract relevant information from the data (Moosavi et al. 2021), uncertainty quantification for informed decision-making (Nguyen et al. 2022), and edge computing for deploying lightweight ML models directly on-site (Bourechak et al. 2023). Collaborative research and open data sharing are also emphasized as ways to accelerate advances in wastewater treatment, including distillery wastewater treatment (Sundui et al. 2021; Zhong et al. 2021).
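Online learning, one of the strategies listed above, can be illustrated with a linear model that takes one stochastic-gradient step per incoming sample instead of being retrained from scratch. This is a minimal sketch under stated assumptions, not the approach of Sundui et al. (2021): the class name is hypothetical, the features are assumed to be already standardized, and the "true" relation in the simulated stream is invented purely to show that the updates converge.

```python
import random

class OnlineLinearRegressor:
    """Linear model y = w.x + b, updated with one SGD step per incoming sample."""

    def __init__(self, n_features, learning_rate=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, y):
        """Gradient step on 0.5 * (prediction - y)**2 for a single new sample."""
        error = self.predict(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error
        return error

# Hypothetical stream of standardized sensor readings; the relation
# y = 2*x1 - x2 + 0.5 is invented for illustration only.
rng = random.Random(42)
model = OnlineLinearRegressor(n_features=2)
for _ in range(5000):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    model.update(x, 2 * x[0] - x[1] + 0.5)
print([round(wi, 3) for wi in model.w], round(model.b, 3))
```

Standardizing inputs matters here: raw values such as influent COD in the tens of thousands of mg/L would make a fixed learning rate like this one diverge.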
WWTPs generate sludge, a byproduct whose management is a crucial part of environmental protection. Sludge from distillery wastewater treatment, in particular, is rich in organic compounds that can significantly increase oxygen demand in aquatic systems, contributing to higher COD and BOD levels. Its handling is therefore critical, as improper disposal or treatment can contaminate the environment and further strain water resources. Shao et al. (2023) applied MLAs such as RF and XGBoost and reported superior predictive accuracy for wastewater treatment. A novel ML-based approach has also been proposed for predicting effluent quality in WWTPs by analyzing the relationships between influent and effluent pollution variables (Jafar et al. 2022). ML approaches have further been explored for the integrated operation of biological wastewater treatment systems that combine nutrient recovery and algal biomass production from municipal wastewater. However, the unique composition of distillery wastewater calls for further exploration of ML models tailored specifically to these systems, and defining the system inputs and outputs is essential for applying such models effectively (Sundui et al. 2021).
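The idea of predicting effluent quality from influent variables, starting from explicitly defined inputs and outputs, can be sketched with the simplest possible model: a one-variable least-squares line. This is a stand-in for illustration only and is not the ML approach of Jafar et al. (2022); the variable names and plant records are hypothetical.

```python
def fit_simple_ols(x, y):
    """Closed-form least-squares line y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the input variable must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Explicit input/output definition (hypothetical variable names).
INPUT, OUTPUT = "influent_cod_mg_per_l", "effluent_cod_mg_per_l"

# Hypothetical paired plant records, for illustration only.
records = [
    {INPUT: 82000, OUTPUT: 9100},
    {INPUT: 97000, OUTPUT: 10400},
    {INPUT: 118000, OUTPUT: 12800},
    {INPUT: 135000, OUTPUT: 14100},
    {INPUT: 151000, OUTPUT: 16300},
]
slope, intercept = fit_simple_ols([r[INPUT] for r in records], [r[OUTPUT] for r in records])
print(f"predicted effluent COD at 120,000 mg/L influent: {slope * 120000 + intercept:.0f} mg/L")
```

In practice, RF, XGBoost, or neural networks would replace the straight line and several influent variables would serve as inputs, but fixing the input and output definitions first, as done with `INPUT` and `OUTPUT` here, stays the same.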
In summary, ML is increasingly transforming wastewater treatment, and it has emerging potential to address the challenges posed by distillery wastewater through accurate predictions and process optimization. These advances lay the groundwork for a cleaner and more sustainable future.
RESULTS AND DISCUSSION
This study employs bibliometric analysis to comprehensively examine the scientific literature on ML applications in wastewater treatment, with a specific focus on the adsorption process. A search of the Scopus database identified 62 publications, offering insights into the trends and characteristics of this domain. Mathematical and statistical analysis of these publications, including keyword exploration with the VOSviewer software, revealed distinct clusters related to methods, parameters, networks, and thermodynamics, providing a holistic view of the research landscape. The keyword analysis for ‘Machine Learning’, ‘Wastewater’, and ‘Adsorption’ (refer to Figure 3) identified several key clusters that represent the focal points of current research. For example, the ‘Machine Learning’ cluster contains terms such as ‘decision trees’, ‘random forest’, and ‘prediction models’, demonstrating the importance of ML approaches in wastewater treatment. The ‘Adsorption’ cluster emphasizes terms such as ‘charcoal’, ‘biochar’, and ‘adsorbent materials’, indicating the ongoing emphasis on adsorption techniques for wastewater recovery. The ‘Wastewater’ cluster includes terms such as ‘pollutant removal’, ‘wastewater treatment’, and ‘heavy metals’, highlighting the wide-ranging use of adsorption in this field. Together, these results point to important research directions and to the growing use of ML models to improve the effectiveness of adsorption methods in wastewater treatment.
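Keyword maps of the kind produced with VOSviewer start from counts of how often keywords occur, and co-occur, across publications. The following is a minimal sketch of that counting step only; VOSviewer's own normalization and clustering are not reproduced, and the function name and keyword records are hypothetical rather than taken from the 62 publications analyzed here.

```python
from collections import Counter
from itertools import combinations

def keyword_cooccurrence(records):
    """Count keyword occurrences and pairwise co-occurrences per publication."""
    occurrences = Counter()
    pairs = Counter()
    for keywords in records:
        # De-duplicate and normalize case so each keyword counts once per record.
        unique = sorted({k.strip().lower() for k in keywords})
        occurrences.update(unique)
        # Sorted order gives exactly one key per unordered keyword pair.
        pairs.update(combinations(unique, 2))
    return occurrences, pairs

# Hypothetical author-keyword lists, one per publication, for illustration only.
records = [
    ["machine learning", "adsorption", "biochar"],
    ["Machine Learning", "random forest", "adsorption"],
    ["adsorption", "heavy metals", "biochar"],
]
occurrences, pairs = keyword_cooccurrence(records)
print(occurrences.most_common(3))
print(pairs.most_common(3))
```

Pairs with high co-occurrence counts become strong links in the map, which is what allows groups such as the ‘Machine Learning’ and ‘Adsorption’ clusters to emerge.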
Additionally, examination of the global collaboration landscape identifies the leading nations in ML research for wastewater treatment, including China, the United States, India, and Australia. However, distillery wastewater receives little attention in these collaborations, a research gap that presents opportunities for enhanced international cooperation in this area. The distribution of research across journals also underscores the role of outlets such as the Journal of Molecular Liquids, Science of the Total Environment, and Journal of Hazardous Materials in disseminating work in this field.
The following are future directions for advancing sustainable wastewater management through the utilization of ML techniques:
Develop ML models specifically tailored to distillery wastewater treatment, addressing the high COD, fluctuating pH, and unique effluent characteristics.
Advance hybrid ML models to optimize wastewater treatment processes, ensuring efficient contaminant removal and resource recovery.
Explore transfer learning techniques to adapt ML models to diverse wastewater treatment scenarios, promoting adaptability and scalability in sustainable wastewater management practices.
Implement ensemble approaches to enhance prediction accuracy and reliability, facilitating informed decision-making for resource allocation and process optimization.
Embrace online learning algorithms to continuously update ML models with real-time data, enabling proactive monitoring and response to changing environmental conditions.
Prioritize feature engineering to extract actionable insights from wastewater data, supporting targeted interventions for sustainable water reuse and discharge management.
Foster interdisciplinary collaboration to integrate knowledge from fields such as environmental engineering, chemistry, and biotechnology while emphasizing the potential benefits of distillery wastewater valorization.
Encourage collaboration between the distillery industry and researchers to accelerate advances in ML applications for distillery wastewater treatment.
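The ensemble direction listed above, together with the reliability it is meant to provide, can be illustrated with bootstrap aggregation (bagging): several simple models are fitted on resamples of the data, their predictions are averaged, and their spread gives a rough indication of uncertainty. This is a generic sketch, not a method from the reviewed studies; the helper names and the data are hypothetical, and the base learner is deliberately a straight line.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares slope and intercept for one input variable."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def bagged_predict(x, y, x_new, n_models=50, seed=0):
    """Bagging: average linear fits trained on bootstrap resamples.

    Returns the ensemble mean prediction at x_new and the standard deviation
    across members, a rough indicator of prediction uncertainty.
    """
    if n_models < 2 or len(set(x)) < 2:
        raise ValueError("need n_models >= 2 and at least two distinct x values")
    rng = random.Random(seed)
    n = len(x)
    predictions = []
    while len(predictions) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:  # a line needs two distinct x values; redraw
            continue
        slope, intercept = fit_line(xs, [y[i] for i in idx])
        predictions.append(slope * x_new + intercept)
    return statistics.mean(predictions), statistics.stdev(predictions)

# Hypothetical noisy observations, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.9, 5.2, 6.8, 9.1, 11.2, 12.7, 15.3, 16.8]
mean_pred, spread = bagged_predict(x, y, x_new=9)
print(f"ensemble prediction at x=9: {mean_pred:.2f} ± {spread:.2f}")
```

The bootstrap spread reflects only sampling variability of the training data; it does not capture errors from a mis-specified model, so it should be read as a lower bound on real-world uncertainty.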
CONCLUSION AND RECOMMENDATIONS
This study examines the role of ML in wastewater treatment, focusing on adsorption methods and using a bibliometric analysis of the literature up to 2024. The analysis reveals a critical research gap: the bibliometric search of the Scopus, ScienceDirect, and Web of Science databases found no studies that apply ML specifically to distillery wastewater, highlighting a significant opportunity for future research. The analysis also indicates that ML models such as RF and neural networks have performed well in modeling the treatment of various types of wastewater, which can offer valuable strategies for distillery wastewater treatment.
The study achieves its objectives by reviewing existing literature on ML applications in wastewater treatment, evaluating the performance of different ML models, and inferring how findings from other wastewater types can guide distillery wastewater treatment. Strengths of this work include systematically identifying research trends and gaps and presenting prospective research opportunities for incorporating ML in distillery wastewater treatment. However, the absence of direct studies on this specific application underscores the need for targeted research. Addressing these gaps can lead to more efficient, sustainable wastewater treatment methods.
By highlighting the need to improve data availability, ensure model transparency, and foster interdisciplinary collaboration, this research provides a foundation for integrating ML approaches into distillery wastewater management. These advances can promote both environmental protection and resource recovery, encouraging sustainable practices in this underexplored area.
ETHICAL APPROVAL
This study did not involve human participants, human material, or human data; therefore, no ethical approval was required.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.