Abstract
This study used bibliometrics to identify literature in the Web of Science Core Collection on the application of artificial intelligence in wastewater treatment from 2011 to 2022, in order to summarize achievements and capture scientific and technological progress. The number of published papers is rising and has increased sharply since 2018, with China contributing the most, followed by the US, Iran and India. The University of Tehran has the largest number of papers, WATER is the journal with the most publications, and Nasr M has the largest number of articles. The collaborative network has developed mainly through cooperation between European countries, China and the US. Remote sensing in developing countries needs to be further integrated with water quality monitoring programs. Notably, the artificial neural network has been a research hotspot in recent years. Keyword clustering analysis shows that 'machine learning' and 'deep learning' are hot keywords that have emerged since 2019. The use of neural networks to predict the treatment effectiveness of difficult-to-degrade wastewater is a future research trend. The rapid advancement of deep learning provides the opportunity to build automated pipeline defect detection systems through image recognition.
HIGHLIGHTS
Using bibliometrics to explore the application of artificial intelligence in wastewater treatment.
The combination of deep learning has been developed.
Remote sensing technology has become a research hotspot.
Machine learning or deep learning modeling is widely used to build data-driven predictive models.
INTRODUCTION
Water is the most crucial resource for organisms' survival. In the past decades, the demand for and abuse of water resources have put enormous pressure on the water supply. Moreover, many countries are facing serious water pollution issues. Water is indispensable to contemporary industry. However, in the course of development, many companies use water inefficiently and sometimes even pollute it, for example, by discharging wastewater improperly (Elsey-Quirk et al. 2022). Wastewater treatment (WWT) is an effective way to address water scarcity and water quality deterioration (Abdeldayem et al. 2022). Further treatment or polishing of treated wastewater can provide high-quality water, such as drinking water (Altowayti et al. 2022). Sustainable sources of clean water are therefore essential for wastewater treatment plants (WWTPs), as well as for many other natural and industrial organizations that depend on water availability (Faherty 2021). In addition to meeting consumer demand and providing the necessary quality-of-life upgrades to infrastructure, WWTPs must cope with complex regulatory measures to meet rising quality criteria (Lowe et al. 2022).
Artificial intelligence (AI) studies how computers can mimic certain human thought processes and intelligent behaviors, thus enabling computers to achieve higher-level applications. After training, AI systems can perform intelligent control and optimization of various processes to enhance the product conformity rate. AI is used extensively in sectors such as industrial control (Seo et al. 2021), optimal design (Matheri et al. 2022), fault diagnosis (Fuente & Represa 1997) and intelligent detection (Chen & Chen 2020). Machine learning (ML) is at the heart of AI and includes almost all of the most influential methods in AI, such as deep learning (DL), random forests (RF) and support-vector machine (SVM) algorithms. With state-of-the-art advances in computational algorithms and modern computing power, ML has made significant contributions to the development and application of various types of AI. Meanwhile, ML models have demonstrated their capabilities in optimizing, modeling and automating water supply and WWT applications (Yaqub et al. 2020; Zhu & Piotrowski 2020). By providing computer assistance for complex problems in WWT that involve chemical, physical and biological processes, AI can optimize water-based applications and reduce capital expenditure (Torregrossa et al. 2018; Zhang et al. 2021). Given the progress of AI, it is necessary to probe research hotspots in order to support investigation and progress in the domain of WWT. Bibliometrics uses mathematical, statistical and other quantitative methods to measure the size, authorship and lexicon of a body of literature and to study the distribution of information within it. A bibliometric survey can summarize the historical research results of AI in WWT, support hotspot analysis and predict future research trends (Xu et al. 2022).
A bibliometric survey allows for a summary of historical research results in WWT. Carmona & Abejón (2023) used the Scopus database as a source to examine the scientific literature on the removal of heavy metals from wastewater using electrodialysis, membrane distillation and forward osmosis. Ola et al. (2023) studied the progress of research on chitosan-based composite photocatalyst systems for WWT from a bibliometric perspective and developed new and efficient methods to synthesize chitosan from fish by-products (waste) with high adsorption efficiency for the removal of various types of pollutants. Bibliometric analysis can also be used to conduct a hotspot analysis and predict future research trends. Zhang et al. (2023) used CiteSpace to systematically evaluate the structure, trends, research hotspots and frontiers of greenhouse gas emissions from WWTPs from a bibliometric perspective. Kamilya et al. (2022) explored the technical progress and future scope of research trends on nutrient recovery from waste streams. However, despite the rapid global development of information technology, no article has yet systematically analyzed the application of AI in the field of WWT from a bibliometric perspective.
In this paper, bibliometric analysis is applied to papers retrieved from the Web of Science Core Collection (WOS) from October 2012 to October 2022 that match the keywords ‘artificial intelligence’ and ‘wastewater treatment’. Countries and institutions are analyzed after the records have been filtered and downloaded. Collaborations, publications and citations are visualized by means of maps and charts. The subject knowledge base is examined by extracting author information (number of institutions, h-index, etc.) and cited literature data. Finally, the cited references are extracted, and key words are used to identify topical and current research trends.
MATERIALS AND METHODS
Data collection
Data were gathered from WOS. To eliminate the bias caused by database updates and to ensure the integrity of the search, the main search terms used were ‘wastewater’, ‘effluent’, ‘treatment’, ‘handover’, ‘machine learning’ and ‘artificial intelligence’.
Zhu et al. (2021) pointed out that synonyms, such as ‘wastewaters’ and ‘sewage’, would be found among the retrieved keywords during data collection (Table S1). Only English-language publications were included, and publication types such as editorials, letters and case reports were excluded. Details such as titles, authors, (author-defined) keywords, abstracts and references were extracted from the eligible documents and exported in TXT format. Owing to the nature of CiteSpace, articles that appeared to be ineligible were not dropped during data collection, because removing them may disrupt potential data linkages in the text and reduce the accuracy/sensitivity of the analysis results (Li & Chen 2016).
Data analysis
CiteSpace 6.1.R6 (Drexel University, Philadelphia, PA, USA) was used for citation visualization and analysis. For analysis in CiteSpace, data were saved as ‘down_xxx’ files and stored and loaded according to the program's specific data format requirements. In this study, the slice length was always set to 1. VOSviewer (Leiden University, Leiden, Netherlands) is adept at generating maps from bibliographic and text data and at visualizing the relationships between documents, so that key documents in the subject area can be located quickly. Country and institution data were visualized using CiteSpace and VOSviewer. The Bibliometrix package was loaded in RStudio (version 1.4.1717) to analyze the co-occurrence of keywords, and RStudio was also used to visualize temporal trends in keywords.
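The keyword trend analysis described above can be illustrated with a small, self-contained sketch. It is not the Bibliometrix workflow used in the study; it only shows the underlying idea of merging synonyms and counting keywords per year. The synonym map, the function names and the three toy records are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical synonym map; the study merges terms such as
# 'wastewaters' and 'sewage', but the exact mapping is assumed here.
SYNONYMS = {"wastewaters": "wastewater", "sewage": "wastewater"}


def normalize(keyword):
    """Lower-case a keyword and map known synonyms to one canonical term."""
    key = keyword.strip().lower()
    return SYNONYMS.get(key, key)


def yearly_keyword_counts(records):
    """Map each year to a Counter of keywords, counting a keyword once per record."""
    trend = defaultdict(Counter)
    for year, keywords in records:
        trend[year].update({normalize(k) for k in keywords})
    return trend


# Toy (year, author keywords) records standing in for exported database entries.
records = [
    (2018, ["Wastewater", "artificial neural network"]),
    (2019, ["sewage", "Machine Learning"]),
    (2019, ["wastewaters", "machine learning", "deep learning"]),
]
trend = yearly_keyword_counts(records)
print(trend[2019]["machine learning"])
```

Counting each keyword once per record, rather than once per occurrence, keeps a single paper from inflating a term's yearly frequency.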
RESULTS
Article distributions in various years
Countries, institutions and regions
The publications come from 82 countries, and the 10 countries with the highest number of publications are reported here. China has the greatest number (n = 210), accounting for 28.07% of the total number of published articles, followed by the US (n = 130, 17.38%), Iran (n = 77, 10.29%) and India (n = 70, 9.37%). Of these, the US (0.33) and China (0.31) had the highest centrality, much higher than the other countries. This indicates that both China and the US have made substantial contributions and have considerable influence in this research area (Table 1). The top 10 countries with the most published articles were cited a total of 12,081 times (excluding self-citations). China is the most cited country (3,174, 26.27%). On average, however, Iran and the United Kingdom have the most citations per article, although neither has a high centrality. This indicates that there is no strong link between average article citations and centrality.
No. | Countries/regions | Number of publications | Number of citations | Average article citations | Centrality |
---|---|---|---|---|---|
1 | China | 210 | 3,174 | 15.11 | 0.31 |
2 | United States | 130 | 1,521 | 11.70 | 0.33 |
3 | Iran | 77 | 2,011 | 26.12 | 0.15 |
4 | India | 70 | 1,180 | 16.86 | 0.20 |
5 | South Korea | 52 | 534 | 10.27 | 0.02 |
6 | Australia | 40 | 630 | 15.75 | 0.04 |
7 | Canada | 40 | 594 | 14.85 | 0.01 |
8 | Saudi Arabia | 37 | 293 | 7.92 | 0.12 |
9 | United Kingdom | 36 | 771 | 21.42 | 0.07 |
10 | Spain | 29 | 431 | 14.86 | 0.20 |
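The publication shares and per-article averages in Table 1 follow from simple ratios. The sketch below reproduces them for the top three countries, assuming a corpus of 748 records. That total is not stated in the text and is inferred here from 210 papers being 28.07% of the total; the function name `summarize` is illustrative.

```python
# Publication and citation counts copied from Table 1 (top three rows).
rows = {
    "China": (210, 3174),
    "United States": (130, 1521),
    "Iran": (77, 2011),
}

# Assumed corpus size, inferred from 210 papers being 28.07% of the total.
TOTAL_PAPERS = 748


def summarize(papers, citations, total=TOTAL_PAPERS):
    """Return (share of all papers in %, average citations per paper)."""
    share = 100 * papers / total
    average = citations / papers
    return round(share, 2), round(average, 2)


for country, (papers, citations) in rows.items():
    share, average = summarize(papers, citations)
    print(f"{country}: {share}% of papers, {average} citations per paper")
```

Ranking countries by the second value rather than by total citations changes the ordering, which is why total output and average impact are reported separately.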
In total, the records encompassed 1,242 organizations. The top 10 organizations are shown in Table 2. The University of Tehran (Iran) has the largest number of publications, with 19 publications, accounting for 2.5% of the total, and the highest centrality of 0.08, followed by Duy Tan University (18 publications, Vietnam) and the Chinese Academy of Sciences (16 publications, China). Duy Tan University (Vietnam) was the most frequently cited, with a total of 743 citations (6.18%), noticeably more than any other institution. Of the top 10 organizations publishing articles, 4 are from China, 1 is from the US and the remaining 5 are from developing Asia.
No. | Institutions | Number of publications | Number of citations | Centrality |
---|---|---|---|---|
1 | Univ Tehran (Iran) | 19 | 560 | 0.08 |
2 | Duy Tan Univ (Vietnam) | 18 | 743 | 0.06 |
3 | Chinese Acad Sci (China) | 16 | 218 | 0.05 |
4 | Tsinghua Univ (China) | 14 | 184 | 0.07 |
5 | Univ Tabriz (Iran) | 13 | 249 | 0.01 |
6 | AREEO (Iran) | 12 | 514 | 0.04 |
7 | Harbin Inst Technol (China) | 10 | 139 | 0 |
8 | Tongji Univ (China) | 10 | 58 | 0 |
9 | King Khalid Univ (Saudi Arabia) | 10 | 28 | 0 |
10 | Univ Illinois (US) | 10 | 60 | 0 |
Journals
Among the top 10 journals from 2012 to 2022, the top three by number of publications were WATER (n = 31), WATER RESEARCH (n = 24) and SCIENCE OF THE TOTAL ENVIRONMENT (n = 23). Of these three, SCIENCE OF THE TOTAL ENVIRONMENT had the most citations (n = 812), followed by WATER (n = 495) and WATER RESEARCH (n = 278). Of the top 10 journals, only WATER SCIENCE AND TECHNOLOGY had a centrality greater than 0.1, which suggests that this journal has a certain level of authority in this field (Table 3).
No. | Journals | Number of publications | Number of citations | Centrality |
---|---|---|---|---|
1 | WATER | 31 | 495 | 0.02 |
2 | WATER RESEARCH | 24 | 278 | 0.05 |
3 | SCIENCE OF THE TOTAL ENVIRONMENT | 23 | 812 | 0.06 |
4 | SUSTAINABILITY | 20 | 117 | 0.01 |
5 | WATER SCIENCE AND TECHNOLOGY | 18 | 205 | 0.11 |
6 | JOURNAL OF CLEANER PRODUCTION | 17 | 241 | 0.01 |
7 | JOURNAL OF ENVIRONMENTAL MANAGEMENT | 16 | 194 | 0.02 |
8 | ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH | 13 | 82 | 0.02 |
9 | JOURNAL OF WATER PROCESS ENGINEERING | 13 | 67 | 0.01 |
10 | PROCESS SAFETY AND ENVIRONMENTAL PROTECTION | 13 | 284 | 0.02 |
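The centrality values in Tables 1 to 3 are assumed here to be normalized betweenness centrality, the measure CiteSpace commonly reports, in which a node scores highly when many shortest paths between other nodes pass through it. The sketch below implements Brandes' algorithm for a small undirected, unweighted network. The four-node chain is a hypothetical example, not data from this study.

```python
from collections import deque


def betweenness(graph):
    """Normalized betweenness centrality for an undirected, unweighted graph
    given as {node: set_of_neighbours} (Brandes' algorithm)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        parents = {v: [] for v in graph}
        paths = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    parents[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in parents[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    n = len(graph)
    # Every unordered pair is visited twice, so dividing by (n-1)(n-2)
    # scales the result to the range [0, 1].
    scale = (n - 1) * (n - 2)
    return {v: s / scale for v, s in score.items()} if scale > 0 else score


# Hypothetical collaboration chain A - B - C - D.
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
centrality = betweenness(network)
print(centrality)
```

In this toy chain the two inner nodes act as bridges and the end nodes score zero, which mirrors how a journal or country with few publications can still hold a high centrality.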
The number of journals has risen significantly since 2018, and the number of publications can be expected to keep growing (Figure S1). Of these journals, SUSTAINABILITY has stood out in recent years. From 2013 to 2019, the annual number of articles it published increased from fewer than 300 to over 7,000. In 2013, the value was 0.861, equivalent to 80% of the impact factor (1.077). In 2019, the value (1.711) was equivalent to just 66.4% of the impact factor (2.576). Since 2019, WATER has been the journal with the most publications in the field, and it has been able to capture research hotspots quickly.
Authors of the top 10 publications
A total of 3,295 authors published articles in this area. The top 10 authors, together with their countries, affiliations and other information, are shown in Table 4. The two authors with the largest number of articles are Nasr M and Pourghasemi HR, who each published seven articles (0.94%). The most cited author is Choubin B (n = 267, 2.21%). However, only three authors have an h-index above 5; Pourghasemi HR has the highest h-index, at 7.
No. | Authors | Countries | Affiliations | No. of publications | Number of citations | H-index |
---|---|---|---|---|---|---|
1 | Nasr M | Egypt | Alexander Academy | 7 | 90 | 6 |
2 | Pourghasemi HR | Iran | Shiraz University | 7 | 66 | 7 |
3 | Choubin B | Iran | University of Tehran | 5 | 267 | 4 |
4 | Harrou F | Saudi Arabia | King Abdullah University of Science and Technology (KAUST) | 5 | 119 | 5 |
5 | Jang A | South Korea | Sungkyunkwan University (SKKU) | 5 | 28 | 3 |
6 | Mahmoud AS | US | Virginia Tech | 5 | 38 | 3 |
7 | Qiao JF | China | Beijing University of Technology | 5 | 152 | 5 |
8 | Ly QV | South Korea | Sejong University | 5 | 25 | 3 |
9 | Sun Y | US | Utah State University | 5 | 119 | 6 |
10 | Wang GM | China | Chinese Academy of Medical Sciences | 5 | 152 | 5 |
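The h-index column in Table 4 can be read with the standard definition in mind: an author has index h when h of their papers each have at least h citations. The sketch below computes it from a list of per-paper citation counts. The counts are hypothetical, and it is an assumption that the table's values were computed only over papers in the retrieved dataset.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical per-paper citation counts for an author with seven papers.
example = [30, 22, 15, 9, 8, 6, 0]
print(h_index(example))
```

Because the index can never exceed the number of papers, an author with seven papers in the dataset has an h-index of at most 7, whatever their total citations.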
A total of 37 publications have been published since 2018. Among them, the most cited article concerned the assessment of environmental efficiency and technologies for phytoremediation and activated carbon treatment of wastewater (Ansari et al. 2018). Given sufficient data, the authors established an ANN-based predictive model to estimate the cost of removing chlorothalonil pesticides with activated carbon and of producing biohydrogen by fermentation of Pistia stratiotes. Pourghasemi HR's publications are related to agricultural engineering geology, environmental science and ecology (Alalm & Nasr 2018; Mthethwa et al. 2018). He has published a total of 26 publications. He was awarded the titles of ‘Highly Cited Interdisciplinary Researcher’ and ‘Highly Cited Interdisciplinary Scientist’ in 2019 and 2020, consecutively. Choubin B's publications focus on the use of ML modeling for flood risk and sensitivity assessment (Taromideh et al. 2022; Choubin et al. 2023), using ensemble learning models to explore the sensitivity of the model to groundwater salinity/hardness. He was the first to analyze the performance of 10 advanced ML techniques, including ANN, boosted regression trees (BRT), classification and regression trees (CART) and generalized linear models, to illustrate the central importance of ML modeling for landslide susceptibility and assessment variables in GIS and R open-source software (Pourghasemi & Rahmati 2018). Subsequently, landslide sensitivity modeling, adaptive neuro-fuzzy inference systems (Youssef et al. 2022) and a new set of meta-heuristics for flood sensitivity mapping applied to ML methods were developed (Chen et al. 2018). In 37 articles published by Choubin et al. (2020), ML was used to construct models for many areas of prediction, including climate prediction, flood sensitivity prediction (Choubin et al. 2023), groundwater hardness sensitivity and other topics (Mosavi et al. 2021).
Co-cited top 10 articles
Co-citation analysis can be used not only to show researchers' interests and reveal developments in the structure of science but also to identify the research frontier and support domain analysis. Table 5 shows the top 10 most cited papers from 2012 to 2022. The most frequently cited paper is ‘Random Forests’ (n = 37), published in the journal MACHINE LEARNING in 2001. RF is an ensemble learning method that handles both classification and regression problems well, and it is highly sought after for its simplicity. Three of the 10 co-cited articles are reviews that examine and discuss the use of data-driven soft sensing technologies in biological wastewater treatment plants. These contributions focus on the state of the art of soft sensing technologies and on the specific tasks and potential they may face in practical applications. Further, there are four articles on ML, two of which introduce the RF and SVM algorithms, and another of which optimizes the algorithms. There is also one paper that constructs an adaptive network-based fuzzy inference system (ANFIS) with promising applications in automatic control and signal processing.
No. | Titles | Authors | Sources | Year | Total citations |
---|---|---|---|---|---|
1 | Random Forests | Breiman L | Machine Learning | 2001 | 37 |
2 | Prediction of effluent concentration in a wastewater treatment plant using machine learning models | Guo H | Journal of Environmental Sciences | 2015 | 37 |
3 | Data-derived soft-sensors for biological wastewater treatment plants: An overview | Haimi H | Environmental Modelling & Software | 2013 | 28 |
4 | Data-driven performance analyses of wastewater treatment plants: A review | Newhart KB | Water Research | 2019 | 27 |
5 | ANFIS: adaptive-network-based fuzzy inference system | Jang JSR | IEEE Transactions on Systems, Man, and Cybernetics | 1993 | 26 |
6 | Transforming data into knowledge for improved wastewater treatment operation: A critical review of techniques | Corominas L | Environmental Modelling & Software | 2018 | 25 |
7 | Prediction of wastewater treatment plant performance using artificial neural networks | Hamed MM | Environmental Modelling & Software | 2004 | 25 |
8 | Support-Vector Networks | Cortes C | Machine Learning | 1995 | 21 |
9 | Greedy function approximation: A gradient boosting machine | Friedman JH | The Annals of Statistics | 2001 | 21 |
10 | Use of artificial neural network black-box modeling for the prediction of wastewater treatment plants performance | Mjalli FS | Journal of Environmental Management | 2007 | 21 |
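Two references are co-cited when they appear together in the reference list of the same citing paper, and the co-citation strength of a pair is the number of citing papers in which this happens. A minimal sketch of that count follows. The three citing papers are hypothetical, and the labels only borrow entries from Table 5 for readability.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for each unordered pair of references, how many citing
    papers list both of them."""
    pairs = Counter()
    for refs in reference_lists:
        # Sorting gives every pair one canonical (alphabetical) order.
        pairs.update(combinations(sorted(set(refs)), 2))
    return pairs


# Hypothetical reference lists of three citing papers.
citing_papers = [
    ["Breiman 2001", "Guo 2015", "Haimi 2013"],
    ["Breiman 2001", "Guo 2015"],
    ["Haimi 2013", "Newhart 2019", "Guo 2015"],
]
pairs = cocitation_counts(citing_papers)
print(pairs.most_common(2))
```

Pairs with high counts form the edges of a co-citation network, on which clustering and centrality measures can then be computed.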
Of the 25 most cited articles (Figure S2), 11 centered on the use of AI technologies to improve plant performance by reducing operational costs and enhancing effluent quality. Eight of these 11 articles are dedicated to building models for predicting wastewater concentrations based on different ML methods such as ANN (Guo et al. 2015; Kim et al. 2016), KNN (Chen & Schmidt 2016; Nadiri et al. 2018), FNN (Han et al. 2018; Hernández-del-Olmo et al. 2019) and DL (Bagheri et al. 2015; Naghibi et al. 2016).
Torregrossa et al. (2018) proposed a new ML-based approach that uses energy prices as model parameters and evaluated its relevance for the first time. A high-performance energy cost model was generated for WWTPs and applied to a database of 317 WWTPs in northwestern Europe. Bagheri et al. (2015) used mixed liquor volatile suspended solids, pH, dissolved oxygen, temperature, total suspended solids, chemical oxygen demand and total nitrogen (TN) as inputs to a neural network, and developed a hybrid artificial neural network-genetic algorithm model to accurately determine the sludge volume index for better prediction of sludge bulking in WWTPs. Another paper developed an ML soft sensor that can predict unobservable measurements from existing data, based on ML and models of past influent states at WWTPs. This soft sensor can predict weather conditions while the operator is monitoring WWTPs (Hernández-del-Olmo et al. 2019).
Key words
Key words capture the core ideas, research topics and research methods of the literature and can be used to explore the hotspots and emerging trends in this research field. The 17 key words with the strongest citation bursts were extracted from titles, abstracts, author keywords and Keywords Plus (Table S1). An analysis of the burst years of the key words in the figure shows that the burst key words from 2013 to 2019 are mainly artificial neural network (2013–2019), optimization (2014–2019) and system (2014–2019).
Tümer & Edebalı (2015) used 4 months of daily records from the Konya WWTP to simulate the plant using multiple linear regression and artificial neural networks (ANNs) with different architectures in SPSS and MATLAB. Treatment efficiency was determined from the input values of pH, temperature, chemical oxygen demand (COD), total suspended solids (TSS) and biochemical oxygen demand (BOD), and the ANN proved more satisfactory than the multiple linear regression model. Ozkan et al. (2009) used 3 years of daily input data from the Kayseri WWTP to estimate its output BOD concentration by training a multi-layer ANN model. The Levenberg–Marquardt algorithm was used to train ANN structures with five inputs and two hidden layers, and the best structure was identified. With this structure, the mean square error (MSE) was 0.45, the mean absolute error was 0.445 and R² was 0.915, indicating good performance.
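The three goodness-of-fit measures quoted above can be computed directly from observed and predicted values. The sketch below uses the common definitions of MSE, mean absolute error and the coefficient of determination. Whether the cited study computed R² this way or as a squared correlation coefficient is not stated, so that choice is an assumption, and the effluent values are hypothetical.

```python
def regression_metrics(observed, predicted):
    """Return (MSE, MAE, R^2) for paired observed and predicted values."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # Coefficient of determination: share of variance explained by the model.
    r2 = 1 - ss_res / ss_tot
    return mse, mae, r2


# Hypothetical observed and model-predicted effluent BOD values (mg/L).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
mse, mae, r2 = regression_metrics(observed, predicted)
print(mse, mae, r2)
```

MSE and mean absolute error carry the units of the target variable (squared, in the case of MSE), so they are only comparable between models fitted to the same plant and parameter, whereas R² is dimensionless.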
Skoczko et al. (2017) studied the Bystre WWTP near Giżycko for 2 years, using the chemical parameters of the sewage and the volume of sewage flowing into the facility as input variables to build an artificial neural network model that approximates the concentrations and values of the basic quality parameters of the treated sewage. The concentrations of TN and total phosphorus (TP) were found to have the greatest influence on the variables and to best reflect changes in total suspended solids (TSS) in the treated sewage.
The key words that have appeared since 2019 are predictive control (2019–2021), adaptive regression spline (2019–2021) and classification (2020–2021). These terms clearly belong to the professional vocabulary of ML. In response to increasing concern about eutrophication, Ly et al. (2022) examined and compared six ML algorithms, ranging from shallow to DL architectures, and introduced potential applications of ML for predicting sewage quality. The models were developed to detect TP at the outlet, providing a reliable method for predicting effluent quality.
SVM, which is grounded in statistical learning theory, is suitable for classification in small sample spaces. Fang et al. (2019) proposed an advanced feature fusion algorithm for geometric features and environmental pollution factors based on a multi-kernel SVM, which improved on the DL algorithm in the analysis of sewage membrane penetration. DL requires a large amount of computation and many training samples, and it performs unsatisfactorily when only a small sample is available.
ML approaches can also diagnose faults in the WWT process. Xu et al. (2018) built an ensemble classifier within the Bagging framework, using a weighted extreme learning machine as the base classifier. This improved Bagging-based WWT fault diagnosis modeling method was proposed to increase the accuracy of fault class recognition.
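The Bagging idea behind such an ensemble can be sketched in a few lines: each base model is trained on a bootstrap resample of the data, and the ensemble predicts by majority vote. The sketch below is not the method of Xu et al. (2018). It swaps in a simple nearest-centroid classifier as the base learner in place of a weighted extreme learning machine, and the two-feature readings and their labels are hypothetical.

```python
import random
from collections import Counter


def train_centroid(samples):
    """Base learner: store the mean feature vector of each class."""
    sums, counts = {}, Counter()
    for features, label in samples:
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(model, features):
    """Predict the class whose centroid is closest to `features`."""
    def sq_dist(centre):
        return sum((a - b) ** 2 for a, b in zip(features, centre))
    return min(model, key=lambda label: sq_dist(model[label]))


def bagging_fit(samples, n_models=15, seed=0):
    """Train `n_models` base learners, each on a bootstrap resample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train_centroid(bootstrap))
    return models


def bagging_predict(models, features):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_centroid(m, features) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature process readings labelled "normal" or "fault".
data = [
    ([1.0, 1.1], "normal"), ([0.9, 1.0], "normal"), ([1.2, 0.8], "normal"),
    ([3.0, 3.2], "fault"), ([2.9, 3.1], "fault"), ([3.3, 2.8], "fault"),
]
models = bagging_fit(data)
print(bagging_predict(models, [3.1, 3.0]))
```

Fault classes are usually rare in plant data, which is one motivation for weighting the base learner as in the cited work; this unweighted sketch ignores class imbalance.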
Dual-map overlay
DISCUSSION
Reasons for the contribution differences in various countries
In terms of overall publication quantity, the top three countries are China, the US and Iran. Collecting training data is labor-intensive and time-consuming. As the most populous country, China has an advantage in the field of big data, which is crucial for the development of ML applications (Sommerville et al. 2021). China started late in AI and lags behind some developed countries at the foundational level. However, thanks to vigorous support from national policies, the technology has developed to the point where China is not far behind, or is even slightly ahead of, other countries in basic algorithms. As for Iran, it is one of the few countries in the Middle East with a complete industrial system and a food self-sufficiency rate of 90%. The development of agricultural and industrial production in Iran requires a large amount of water resources. These objective conditions lead Iran to attach great importance to research in the field of water. The US has a strong economy and strong technology, and it also places an emphasis on education associated with robotics, which contributes to its importance in this research area.
Water pollution occurs all over the world, but its degree and nature differ. In general, water resources in Europe are in good condition, even though Europe is highly industrialized. Taking Germany as an example, the commonly used domestic WWT processes include aeration, partial biological filtration, anaerobic sludge treatment and many other methods (Rios-Miguel et al. 2023). The technology is relatively mature and the industry is highly developed. At the same time, European countries are more environmentally conscious. As early as the 1980s, 23 European countries set up the nonprofit European Water Pollution Control Association, which focuses solely on the management and improvement of the water environment and on ensuring the continued healthy development of the WWT industry. Germany, France, Switzerland and Italy have also developed their own standard procedures for the design and construction approval of WWTPs. Compared with other regions, Europe therefore has both good water resources and comparatively advanced WWT technology. The US has the largest number of WWTPs in the world. It began to introduce automatic control of WWTPs in the 1970s, and most of its WWTPs are now automatically controlled and monitored. China has a much larger population than the US, although its land area is only slightly larger. There are also developing countries with high population densities, such as India and Vietnam, which makes it even more urgent to improve WWT in these countries. In the 1970s, most developed countries had generally reached the secondary treatment level of WWT. In an effort to further reduce operating, investment and construction costs, developed countries began to modify traditional process flows and to research new technologies. Among these, WWT and utilization technology has received a high degree of attention.
China's WWT is undergoing a profound transformation: from small scale, low level, limited variety and a severe failure to meet demand, to considerable scale and level, significantly improved variety and quality, and an initial ability to meet the requirements of national economic development. WWT technology has progressively come to play an important role across the WWTP sector. The mathematical models involved cover hydraulic processes, pollutant removal, oxygen transfer, organic carbon and linearization. Whichever model is being built, and whether ML or DL is used, the intelligence of a project is determined directly by the researchers' mathematical modeling skills.
AI is making significant progress in WWT. Most of the researchers in this area are from China, the US and Iran. The collaborative network has been developed mainly through cooperation between European countries, China and the US. Some of the top-ranked institutions do not have high centrality. Greater collaboration between institutions can improve the quality of their publications and thus greatly benefit this research area. Modeling is considered a powerful tool in the field of WWT. Activated sludge models and anaerobic digestion models have been widely used since the beginning of the 21st century (Chen et al. 2022). Since 2012, AI in WWT has mostly taken the form of ANN algorithms, mainly for predicting wastewater parameters, simulating WWT processes and sensors (Sommerville et al. 2021). Although ANN is powerful, it cannot replace mechanistic discharge models. In the last 2 years, ML has been used to develop a standardized and applicable system of parameter measurement methods for the relevant mechanistic models. Control methods applicable to different process control parameters in different WWT processes have been established, and a combination of ANN control and model-based control has eventually been found. This has greatly compensated for the shortcomings of ANN (El-Din et al. 2004). As of 2019, ML can be used to solve complex problems involving a large number of nonlinear processes or combinatorial spaces and can address typical problems in natural and engineered water systems (Hwang & Tu 2021). It can be predicted that using neural network ML algorithms, together with isotherm, kinetic and thermodynamic mechanistic models, to assess pollutant removal in synthetic or real industrial wastewater will become a research hotspot (Da Silva Medeiros et al. 2021).
Research frontiers and hotspots
ML can make aggregate predictions of water resources and has advantages over traditional statistical tools. Nelson et al. (2021) developed an ML framework to calculate monthly-resolution isotope time series from available climate and location data. The model's predictions are available for the past 70 years (1950–2019) at any location in Europe and can be used to determine the source history of water or to understand the hydrological or meteorological processes that determine these values. This addresses the limitation that existing precipitation isotope models are often less accurate for examining features such as long-term trends or interannual variability.
Remote sensing as a driver of water management decisions in developing countries needs to be further integrated with water quality monitoring programs. Rahman et al. (2021) used a hybrid combination of locally weighted linear regression (LWLR), random subspace (RSS), reduced error pruning tree (REP Tree), random forest (RF) and M5P model tree algorithms to assess the probability of multiple types of floods across the country. Historical flood data (1988–2020), remotely sensed imagery (MODIS, Landsat 5–8 and Sentinel-1) and topographic, hydrogeological and environmental datasets were used to train and validate the proposed algorithms. Arias-Rodriguez et al. (2021) combined field measurements from the Mexican National Water Quality Monitoring System (RNMCA) (2013–2019) with Landsat-8 OLI, Sentinel-3 OLCI and Sentinel-2 MSI data to train an extreme learning machine (ELM), support vector regression (SVR) and linear regression (LR) to estimate chlorophyll a (Chl-a), turbidity, total suspended matter (TSM) and Secchi disk depth (SDD, a measure of transparency). The study assessed how well the available sensors can complement the often limited in situ measurements in such programs and help build models that support monitoring tasks.
Water-related data collection and model interpretation: rich and complex field water data have been collected for many years, including flow rates, temperatures, DO concentrations, turbidity and chlorine levels. Guo et al. (2022) used spatial autocorrelation, the environmental Kuznets curve (EKC) and the logarithmic mean Divisia index model to study the spatial characteristics and driving factors of industrial wastewater discharge. As these data are often incomplete or weakly correlated, it is hard to capture the time-varying or nonlinear behavior of dynamic water systems using traditional statistics-based models. Chys et al. (2017) proposed a novel, robust correlation model for real-time monitoring and control of trace organic pollutant (TrOC) removal by ozonation, based on UVA and fluorescence surrogates and taking kinetic information into account. ML models are better able to cope with rapidly changing conditions. Moghadam et al. (2021) chose Fanno Creek (Oregon, USA) as a case study, in which daily values of water temperature, specific conductivity, stream discharge, pH and DO concentration were used as input variables for a DL approach based on a recurrent neural network (RNN) algorithm. Ji et al. (2017) applied a promising AI model, SVM, to predict the DO concentration in a hypoxic river in southeastern China. Four calibration models, namely multiple linear regression, back propagation neural network, general regression neural network and SVM, were established, and their prediction accuracy was systematically investigated and compared. The results indicated that the SVM model can effectively predict water quality, especially for highly impaired and anoxic river systems.
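The logarithmic mean Divisia index (LMDI) approach mentioned above can be sketched in a few lines. The example below shows the generic additive LMDI decomposition for a quantity expressed as a product of factors; it is not the model of Guo et al. (2022). The three-factor identity (discharge = population × GDP per capita × discharge intensity) and all numbers are hypothetical and chosen only for illustration.

```python
import math

def log_mean(a, b):
    """Logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a."""
    if math.isclose(a, b):
        return a
    return (a - b) / (math.log(a) - math.log(b))

def lmdi_additive(factors_0, factors_t):
    """Split the change in V = product of factors into one additive effect per factor."""
    v0 = math.prod(factors_0.values())
    vt = math.prod(factors_t.values())
    weight = log_mean(vt, v0)
    return {name: weight * math.log(factors_t[name] / factors_0[name])
            for name in factors_0}

# Hypothetical base-year and end-year factor values (arbitrary units).
base = {"population": 100.0, "gdp_per_capita": 2.0, "intensity": 5.0}
end = {"population": 110.0, "gdp_per_capita": 3.0, "intensity": 3.0}

effects = lmdi_additive(base, end)
total_change = math.prod(end.values()) - math.prod(base.values())
```

A useful property of this decomposition is that the individual effects sum exactly to the total change in discharge, so no unexplained residual term is left over. In this made-up case, growth in population and GDP per capita pushes discharge up while the falling intensity pulls it down.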
Most WWTPs use activated sludge and other processes to remove pollutants (organic carbon, nitrogen, phosphorus). Tens of thousands of different microbial species may be present in each wastewater system. Deterministic models based on the biokinetics of the activated sludge process are not particularly practical because of the complexity of the biological reactions and the variability of the treatment plant. ML techniques can predict sludge bulking in WWTPs with much higher accuracy and without the burden of calibration. Su et al. (2022) studied neural network and convolutional neural network (CNN) algorithms in DL and then built a target detection system. When the treated wastewater was examined, a comparison showed that the CNN-based target detection system identified treated wastewater more stably than the traditional system based on the neural network algorithm. Zhao et al. (2022) developed an ML model to predict pH, total nitrogen (TN), total organic carbon (TOC) and total phosphorus (TP) for WWTPs. The results show that gradient-boosted decision trees (mean test R of 0.85–0.96) can accurately predict these wastewater characteristics in both single- and multi-objective models. Li et al. (2022) proposed a new hybrid modeling approach to predict N2O emissions by integrating a first-principles model with DL techniques. The results show that the hybrid model is more accurate in modeling N2O emissions from WWTPs than either mechanistic or pure DL models. Furthermore, because it couples DL with a mechanistic model, the hybrid model has lower data requirements and is therefore more widely applicable than a pure DL model.
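The general idea behind such hybrid models, a mechanistic prediction corrected by a data-driven term, can be shown with a deliberately small sketch. This is not the model of Li et al. (2022): a one-variable least-squares fit stands in for the DL component, the first-order removal equation and its rate constant are assumed, and the plant data are synthetic.

```python
import math

def mechanistic(c_in, hrt, k=0.3):
    """First-order removal: effluent = influent * exp(-k * HRT); k is an assumed rate."""
    return c_in * math.exp(-k * hrt)

def fit_residual(samples):
    """Fit residual = a + b * temp by ordinary least squares (stand-in for a DL model)."""
    temps = [s["temp"] for s in samples]
    resid = [s["measured"] - mechanistic(s["c_in"], s["hrt"]) for s in samples]
    t_mean = sum(temps) / len(temps)
    r_mean = sum(resid) / len(resid)
    slope = (sum((t - t_mean) * (r - r_mean) for t, r in zip(temps, resid))
             / sum((t - t_mean) ** 2 for t in temps))
    return r_mean - slope * t_mean, slope

def hybrid(sample, a, b):
    """Mechanistic prediction plus the learned residual correction."""
    return mechanistic(sample["c_in"], sample["hrt"]) + a + b * sample["temp"]

def mean_abs_error(samples, predict):
    return sum(abs(s["measured"] - predict(s)) for s in samples) / len(samples)

# Synthetic, hypothetical data: the "measured" effluent deviates from the
# mechanistic model by a temperature effect that the mechanistic model ignores.
samples = []
for temp, c_in, hrt in [(10, 40.0, 4.0), (15, 55.0, 5.0), (20, 50.0, 6.0),
                        (25, 45.0, 5.0), (30, 60.0, 4.0)]:
    measured = mechanistic(c_in, hrt) + 2.0 - 0.1 * temp
    samples.append({"temp": temp, "c_in": c_in, "hrt": hrt, "measured": measured})

a, b = fit_residual(samples)
mae_mechanistic = mean_abs_error(samples, lambda s: mechanistic(s["c_in"], s["hrt"]))
mae_hybrid = mean_abs_error(samples, lambda s: hybrid(s, a, b))
```

Because the data-driven part only has to learn what the mechanistic model misses, it can be much simpler, and needs far less data, than a model that must learn the whole process from scratch.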
The rapid advancement of DL, especially in the last 5 years, provides an excellent opportunity to build automated pipeline defect detection systems through image recognition. Accumulation of fat, oil and grease (FOG) in the sumps of wastewater pumping stations is a common cause of failure for these facilities. Moreno-Rodenas et al. (2021) presented a low-cost, camera-based automated system for observing FOG layer dynamics in wastewater pumping stations at high frequency over extended time windows. Optical imagery is processed by a DL computer vision routine that describes FOG layer dynamics and various hydraulic processes in the pump sump. When the system is deployed at remote locations, compressed processed datasets can be transmitted (edge AI computing), which could be very useful to the hydro-ecological monitoring community. Pham et al. (2022) proposed and implemented a novel DL approach that outperformed random forest and SVM methods for monitoring various ecosystems in estuaries. By integrating spatial and context paths into a novel bilateral segmentation network (BiSeNet), processing speed and accuracy were improved more than 10-fold relative to ordinary neural networks.
CONCLUSIONS
An objective, systematic and comprehensive bibliometric analysis was conducted of WWT research related to ML, DL and ANN, all of which belong to AI. Between 2012 and 2022, the number of papers showed an upward trend, with a significant increase after 2018. China (n = 210, 28.07%) contributed the most publications. Spain, Turkey and Italy were the first countries to start research on using AI in WWT in 2019. Duy Tan University (Vietnam) is the most cited university (n = 743), and the University of Tehran (Iran) has published the most papers (n = 19). Nasr M and Pourghasemi HR published the most articles (n = 7). Among these, the most cited articles focus on the use of AI and metal insulator metal (MIM) in improving factory performance, as well as on the construction of predictive models of wastewater concentration using ML- and ANN-based methods. WATER is the journal with the most publications.
Major research trends at this stage include combining ML or DL modeling with remote sensing techniques to predict flooding or groundwater sensitivity in specific areas, implementing controls and building data-driven predictive models. Neural networks are used to predict the effectiveness of treatment for difficult-to-degrade wastewater, to build models for different pollutants and to select dominant algal species with high nutrient value and effectiveness.
FUNDING
This work is funded by Foundation of Shanghai Science and Technology Commission (20ZR1449700), National Natural Science Foundation of China (51678353, 52070127), Chunhui Program of Ministry of Education of China (HZKY20220059), Cooperation project between China and Central and Eastern European Countries (2022192), Cooperation project from Shanghai Chimbusco Marine Bunker Co., Ltd (C80ZH236014), and Open Fund of Anhui International Joint Research Center for Nano Carbon-based Materials and Environmental Health (NCMEH2022Y02).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.