Abstract
Machine learning (ML), a branch of artificial intelligence (AI), has been increasingly used in environmental engineering because of its ability to analyze complex nonlinear problems (such as those connected with water quality management) through a data-driven approach. This study provides an overview of different ML algorithms applied for monitoring and predicting river water quality. Different parameters can be monitored or predicted, such as dissolved oxygen (DO), biological and chemical oxygen demand (BOD and COD), turbidity, the concentration of different ions (such as Mg2+ and Ca2+), the concentration of heavy metals or other pollutants, pH, temperature, and many more. Although many algorithms have been investigated for the prediction of river water quality, only several are commonly used in engineering practice. These are mostly so-called supervised learning algorithms, such as artificial neural network (ANN), support vector machine (SVM), random forest (RF), decision tree (DT), and deep learning (DL). To further enhance prediction power, novel hybrid algorithms can be used. However, the quality of prediction depends not only on the applied algorithm but also on the availability of the previously mentioned water quality parameters, their selection, and the combination of input data used to train the ML model.
HIGHLIGHTS
Classification, prediction, and anomaly detection algorithms were reviewed.
Hydrometeorology data can be used to compensate for missing parameter data.
Algorithms can struggle with generalization aspects important for real applications.
Covering critical sampling points and periods could enhance prediction accuracy.
Hybrid models could overcome the limitations of single models.
INTRODUCTION
In recent decades, the surface water quality (WQ) of rivers has been negatively impacted by pollutants and wastes (Khullar & Singh 2021). Deteriorated WQ may have serious negative consequences for humans, aquatic life, and the environment in general. Moreover, climate change puts additional pressure on surface WQ by reducing WQ during low-flow seasons and increasing river water temperature over the year. The required quality of surface water is defined by the framework directive on water and the law on water. In accordance with the Directive (2000/60/EC), the main goal is the sustainable management of all water systems, which involves determining the impacts and pressures on water bodies because these are the main causes of pollution. The law on water includes decrees that define the issue in more detail. The most significant are the decree on limit values of emission of polluting substances in water and the decree on limit values of priority and priority hazardous substances that pollute surface waters. These decrees specify the limit values of the main parameters that define the required WQ (‘Sl. glasnik RS’, no. 50/2012, 24/2014). To determine WQ, different data collection techniques can be applied, such as sampling and analysis in the field, laboratory analyses, and monitoring sensors that operate in real time. High-quality sensors can be quite expensive, need regular maintenance to function optimally, and require calibration to ensure accurate WQ data. With the development of technology, research is moving toward optimized ways of managing WQ (Ahmed et al. 2019a, 2019b; Park et al. 2020). This research intends to analyze the existing challenges and find opportunities for the application of machine learning (ML) algorithms in river WQ management. It covers several aspects of ML application within the following issues: (a) WQ estimation and prediction (Wagle et al. 2020; Khullar & Singh 2021); (b) WQ classification (Abuzir & Abuzir 2022); and (c) WQ anomaly detection (Russo et al. 2021). Exhaustive investigation of these issues enables the development of ML algorithms as a tool for the decision-making process in the river basin, with the ultimate goal of improving the river's health and maintaining wildlife.
WATER QUALITY
The term ‘water quality’ can be defined in terms of chemical, physical, and biological water indicators (Antanasijević et al. 2013). Different parameters can affect and serve as indicators of WQ, which can be expressed through the water quality index (WQI) and water quality classes (WQCs). Among the most frequently monitored parameters are chemical (DO, COD, BOD, total dissolved solids (TDS), nitrates, pH, etc.), physical (water temperature (WT), turbidity, electrical conductivity (EC), solids, etc.), and biological (chlorophyll-a) parameters. Precise definitions of several of them and their contribution to WQ have been published by Khullar & Singh (2022) and Syeed et al. (2023).
Water quality index and water quality classes
Based on the WQI, the WQC of each water body can be established. An example of such a classification is provided in Table 1 (Ahmed et al. 2019a, 2019b).
| WQI rate | Classification |
|---|---|
| 0–25 | Very bad |
| 25–50 | Bad |
| 50–70 | Medium |
| 70–90 | Good |
| 90–100 | Excellent |
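As a minimal sketch of how the ranges in Table 1 can be applied, the following Python function maps a WQI value to its class. The function name `classify_wqi` and the boundary handling are assumptions made for illustration: Table 1 lists shared boundaries (for example, 25 appears in both 0–25 and 25–50), so here each boundary value is assigned to the higher class.

```python
# Upper bounds (exclusive) and labels taken from Table 1.
# Assumption: a boundary value such as 25 belongs to the higher class.
WQI_CLASSES = [
    (25.0, "Very bad"),
    (50.0, "Bad"),
    (70.0, "Medium"),
    (90.0, "Good"),
]


def classify_wqi(wqi: float) -> str:
    """Return the Table 1 class for a WQI value in the range 0-100."""
    if not 0.0 <= wqi <= 100.0:
        raise ValueError(f"WQI must be between 0 and 100, got {wqi}")
    for upper, label in WQI_CLASSES:
        if wqi < upper:
            return label
    return "Excellent"  # remaining range, 90-100


if __name__ == "__main__":
    for value in (12.5, 25.0, 68.0, 90.0):
        print(value, classify_wqi(value))
```

A different boundary convention (assigning shared values to the lower class) would only require changing the comparison from `<` to `<=`.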
ML ALGORITHMS IN ENVIRONMENTAL ENGINEERING AND WATER QUALITY MANAGEMENT
ML is widely applied in environmental science (ES) and engineering (EE), thanks to its high precision, flexible customization, and ability to resolve complex data patterns (Maganathan et al. 2020). According to Zhong et al. (2021), 5,855 publications on ML application in EE were produced from 1990 to 2020, in the fields of water (47.63%), air (27.32%), soil (21.02%), and sediment (4.02%). Zhu et al. (2022) describe four general applications of ML in ES and EE: (1) making predictions, (2) identifying feature importance, (3) detecting anomalies, and (4) discovering new materials or chemicals. Applications (1) and (3) mostly rely on supervised (regression or classification) learning (SupVL) and, to a lesser extent, on unsupervised (clustering) learning (unSupVL). Applications (2) and (4) can be implemented through SupVL, for example using linear discriminant analysis (LDA), a classification technique, for (2). SupVL is predominantly applied to EE issues such as the prediction of particulate matter (PM2.5), water resource availability, and the modeling of biochemical wastewater treatment systems (Zhong et al. 2021). Over the past few decades, different ML models have been developed to solve various water engineering management problems (Syeed et al. 2023). Surface WQ profiling is a high priority, especially in developing countries. According to Zhu et al. (2022), ML algorithms applied in the WQ evaluation of surface river waters include bootstrapped wavelet neural network (BWNN), ANN, autoregressive integrated moving average (ARIMA), bootstrapped artificial neural network (BANN), long short-term memory (LSTM), polynomial neural network (PNN), cascade correlation neural network (CCNN), deep neural network (DNN), support vector regression (SVR), RF, SVM, and convolutional neural network (CNN). The significance of ML application in river research is evident in the number of publications, which increased from 310 in 2000 to 3,444 in 2020. Until the 2000s, SupVL application was dominant, but after the 2000s it gradually equalized with unSupVL. Trend analysis also showed that unSupVL and SupVL dominated the field of river research (1990–2020), while NN and DL have gained more attention, featuring in 15–21% of the total publications over the last two decades (Ho & Goethals 2022). ML models frequently used for WQ classification, WQ estimation and prediction, and anomaly detection as parts of river WQ management are the tree-structured algorithms DT and RF, SVM, ANN, and LSTM as a type of ANN.
Tree-structured algorithms
A decision tree consists of decision nodes and leaf nodes. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches. Each node represents a feature in a category to be classified, and each subset defines a value that can be taken by the node (Abuzir & Abuzir 2022).
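The node structure described above can be sketched in a few lines of Python. This is an illustrative, hand-built tree, not a trained model: the class names `DecisionNode` and `Leaf`, the chosen features, and the thresholds (DO and BOD in mg/L) are hypothetical and serve only to show how decision nodes branch and how leaf nodes return an output.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf node: holds the output of the decisions and has no branches."""
    label: str


@dataclass
class DecisionNode:
    """Decision node: tests one feature against a threshold and branches."""
    feature: str
    threshold: float
    below: Union["DecisionNode", Leaf]        # taken if value < threshold
    at_or_above: Union["DecisionNode", Leaf]  # taken otherwise


def predict(node: Union[DecisionNode, Leaf], sample: Dict[str, float]) -> str:
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(node, DecisionNode):
        if sample[node.feature] < node.threshold:
            node = node.below
        else:
            node = node.at_or_above
    return node.label


# Hypothetical features and thresholds, for illustration only.
tree = DecisionNode(
    feature="DO", threshold=5.0,
    below=Leaf("poor"),
    at_or_above=DecisionNode(
        feature="BOD", threshold=3.0,
        below=Leaf("good"),
        at_or_above=Leaf("medium"),
    ),
)

if __name__ == "__main__":
    print(predict(tree, {"DO": 7.2, "BOD": 1.5}))
```

In practice the features and thresholds are learned from data rather than written by hand; RF then combines many such trees.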
Support vector machine algorithms
Long short-term memory (LSTM)
RESULTS AND DISCUSSION: APPLICATION OF ML ALGORITHMS IN RIVER WATER QUALITY MANAGEMENT
Water quality classification
Anthropogenic activities in urban and rural areas are the most common causes of deteriorated water quality (Nazir et al. 2016); hence, WQ assessment and WQI estimation are vital for preserving human and environmental health (Wang et al. 2017; Zhu et al. 2022; Syeed et al. 2023). The WQ parameters most often used in the selected articles are DO, BOD, nitrate, pH, EC (Bui et al. 2020; Sillberg et al. 2021; Al-Adhaileh & Alsaade 2021; Hassan et al. 2021), and, to a lesser extent, COD, total solids (TS), phosphate (Bui et al. 2020), turbidity (Bui et al. 2020; Sillberg et al. 2021; Abuzir & Abuzir 2022), fecal coliform (FC) (Bui et al. 2020; Al-Adhaileh & Alsaade 2021), total coliform (TC) (Hassan et al. 2021), total coliform bacteria (TCB), salinity, TDS, suspended solids (SS) (Sillberg et al. 2021), total organic carbon (TOC) (Abuzir & Abuzir 2022), and ammonia nitrogen (AN) (Shamsuddin et al. 2022). Table 2 lists the ML algorithms applied for this assessment in the reviewed papers.
| Domain of application | Type of algorithm | References |
|---|---|---|
| Water quality classification (determination of water quality index (WQI) and water classes) | Neural network (NN) (artificial neural network (ANN), feedforward neural network (FFNN)) | Hassan et al. (2021), Al-Adhaileh & Alsaade (2021), and Shamsuddin et al. (2022) |
| | Random forest (RF) | Bui et al. (2020) and Hassan et al. (2021) |
| | Multinomial logistic regression (MLR) | Hassan et al. (2021) |
| | Support vector machine (SVM) | Hassan et al. (2021) and Shamsuddin et al. (2022) |
| | Bagged tree model (BTM) | Hassan et al. (2021) |
| | Decision tree (DT) (M5P) | Bui et al. (2020) and Shamsuddin et al. (2022) |
| | K-nearest neighbor (KNN) | Al-Adhaileh & Alsaade (2021) |
| | Random tree (RT) | Bui et al. (2020) |
| | Reduced error pruning tree (REPT) | Bui et al. (2020) |
| | Hybrid models: 12 hybrid algorithms as combinations of standalones with bagging (BA), CV parameter selection (CVPS), and randomizable filtered classification (RFC); attribute-realization (AR) and SVM (AR-SVM); adaptive neuro-fuzzy inference system (ANFIS) | Bui et al. (2020), Sillberg et al. (2021), and Al-Adhaileh & Alsaade (2021) |
Hassan et al. (2021) utilized the ML algorithms from Table 2 and developed a software application that used MLR to predict WQ in India in real time for three classes (good, poor, and unsuitable for drinking). The RF model was used to handle missing data. The performance of the MLR, RF, BTM, NN, and SVM classification models was 99.83, 98.99, 98.99, 98.65, and 96.98%, respectively. The highest variable importance values for nitrate, pH, EC, DO, TC, and BOD were obtained with NN (19.67), BTM (36.805), BTM (81.494), BTM (147.558), BTM (105.166), and BTM (130.173), respectively.
Similarly, Shamsuddin et al. (2022) utilized ANN, DT, and SVM for multiclass classification of river WQ in the Langat River Basin, but reported opposite results regarding the best classification model. SVM outperformed ANN, despite the efficiency of ANN in handling big datasets and predicting WQI; the applicability of SVM to small datasets was further improved by the kernel function. The most frequent WQCs were III and II, defined as water supply/fisheries. ANN, DT, and SVM were preferred, respectively, for the ability to model nonlinear and complex relationships between input and output variables, for being an easy and widely used classification technique, and for modeling nonlinear relationships between input variables. All three models achieved more than 85% performance, with macro accuracy and precision values of 96.35 and 91.97% for SVM, 95.62 and 92.06% for ANN, and 94.71 and 89.22% for DT.
Sillberg et al. (2021) applied an integrated AR-SVM approach, along with 11 water quality parameters (WQPs), to classify the WQ of the Chao Phraya River. Linear regression proved to be the most suitable function for WQ classification, with six WQPs and accuracy and precision of 0.94 and 0.84, respectively. Water classes varied; however, the poor WQ class III prevailed. The main WQPs in WQC and their confidence values were NH3-N (0.80), TCB (0.79), FCB (0.78), BOD (0.76), DO (0.69), and Sal (0.64). A smaller number of significant variables aided the AR-SVM model by minimizing some limitations. AR-SVM gave the same results as the traditional WQI calculation for 15 of the 16 datasets (93.75%), confirming good correspondence between the two. With three to six WQPs applied, the AR-SVM model proved a potent approach for classifying river WQ, with an accuracy of 0.86–0.95.
Al-Adhaileh & Alsaade (2021) utilized ANFIS for WQI prediction and KNN and FFNN for WQC of different water bodies across India. The WQI determined by ANFIS showed high efficiency and accuracy, with a regression coefficient of 96.17%. The FFNN model showed superior robustness in classifying WQC, with accuracy and precision of 100 and 99.961%, while KNN reached 80.63 and 82.50%. Compared with Hassan et al. (2021) and Shamsuddin et al. (2022), the ANN-type models ANFIS and FFNN showed the best performance for the prediction of WQI. ANFIS confirmed its ability to monitor drinking and contaminated water with high accuracy. The determined WQ was classified as poor; hence, the proposed method was considered helpful in water treatment and management.
To determine the monthly WQI of a river in Iran, Bui et al. (2020) applied four standalone and 12 hybrid data-mining models, presented in Table 2. The main WQPs in WQC were FC and TS. Different WQP combinations led to different levels of performance for each model. The rank of the algorithms based on prediction power (best to worst) was BA-RT, BA-RF, BA-M5P, CVPS-RF, RF, RFC-RF, BA-REPT, M5P, CVPS-M5P, RFC-REPT, RFC-M5P, REPT, RT, RFC-RT, CVPS-RT, and CVPS-REPT. All 16 validated algorithms performed well, but BA-RT had the highest power (R2 = 0.941) and CVPS-REPT the lowest (R2 = 0.853) in predicting WQI. Hybrid tree-based models (especially those based on bagging) were more robust and flexible than standalone models. Among the standalone models, RF outperformed M5P, REPT, and RT. Nearly all algorithms overestimated WQI values; RT, BA-RT, and CVPS-REPT did not. Even though the hybrid BA-RT outperformed the other models, it did not predict extreme WQI values accurately.
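The accuracy and precision figures quoted in this subsection come from the reviewed papers, which may compute them in slightly different ways. As a hedged illustration only, the sketch below uses the standard definitions of overall accuracy and macro-averaged precision for a multiclass problem. The function names and the class labels in the example are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_precision(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class precision TP / (TP + FP)."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        # True labels of all samples that were predicted as class c.
        predicted_as_c = [t for t, p in zip(y_true, y_pred) if p == c]
        if predicted_as_c:
            hits = sum(t == c for t in predicted_as_c)
            per_class.append(hits / len(predicted_as_c))
        else:
            per_class.append(0.0)  # class never predicted
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Hypothetical water quality classes for six samples.
    true_classes = ["II", "II", "III", "III", "III", "IV"]
    pred_classes = ["II", "III", "III", "III", "III", "IV"]
    print(accuracy(true_classes, pred_classes))
    print(macro_precision(true_classes, pred_classes))
```

Because the macro average weights every class equally, a rarely occurring class influences the score as much as the prevailing one, which matters when one WQC dominates the dataset.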
Water quality prediction
Monitoring and prediction of WQ are very important as they can enhance water management, including WQ preparation and regulation, the quality and development of irrigation strategies, the efficiency of aquaculture, drinking water preparation, and strategies for preventing water contamination (Al-Adhaileh & Alsaade 2021; Khullar & Singh 2022). Table 3 summarizes several ML algorithms that can be used for prediction purposes. ANN, DNN, and SVM models were used more frequently than other models, possibly because of the advantages they offer (Krishnan et al. 2022). Among the most commonly used parameters for surface WQ prediction are DO, WT, pH, SS, nitrates (NOx), TDS, EC, turbidity, BOD, and COD. Other parameters, such as FC, chlorides, sulfates, and organic and inorganic pollutants, were used occasionally, depending on the availability of data and the type and locality of the river (Syeed et al. 2023). These WQ indicators are interconnected, since one parameter can affect the value of another. Hence, it is important to evaluate each parameter's significance and their mutual correlations (Zhu et al. 2022). For instance, many authors have recognized DO as one of the WQ indicators of greatest global concern, and it has strong correlations with certain input parameters such as pH, temperature, and NOx concentration (Zhu et al. 2022). Such correlations can affect the prediction accuracy of the applied ML model and should therefore be thoroughly investigated. Besides predicting the concentration or level of certain parameters, or general WQ based on several parameters, other predictions can be made, such as monthly runoff prediction (Samantaray et al. 2022) or water level prediction (Baek et al. 2020).
A common approach to modeling future water status is to collect large amounts of data from previously published articles or public services/monitoring stations and to build databases that serve as input for ML model development. However, data are sometimes missing. The problem of incomplete data can be overcome by adding another type of data, such as hydrological data (which are more often available), to model development (Zhi et al. 2021) or by using additional data post-processing tools such as the Multivariate Bayesian Uncertainty Processor (MBUP) (Zhou 2020). Developing more than one model is beneficial, as it allows the models to be compared for the investigated purpose and the most appropriate one (highest accuracy, lowest error, etc.) to be chosen. Specific problems can occur when predicting concentrations of certain components in agricultural drainage river basins, because of self-purification mechanisms and nonpoint source transport of pollutants. The ML algorithms ANN and SVM succeeded in predicting total nitrogen and total phosphorus concentrations in rivers in China. However, SVM showed better generalization, as it avoided overtraining and optimized fewer parameters based on the structural risk minimization principle. To optimize the model parameters, genetic algorithms and trial-and-error analysis were used (Liu & Lu 2014). Because single models face different limitations, they can be outperformed by hybrid models. For instance, although some authors report good prediction ability of the deep learning LSTM model (Liu et al. 2019), Khullar & Singh (2022) reported that single CNN and LSTM models are often highly complex and have low prediction accuracy, which can be overcome by an improved Bi-LSTM model that proved adaptable to various WQ samples from different sources.
Hybrid models also proved efficient for short-term WQ prediction, for example when an advanced data denoising technique, complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), was integrated with extreme gradient boosting (XGBoost) and RF to predict six WQ indicators (Lu & Ma 2020).
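Since the text above recommends investigating correlations between WQ parameters before model development, the following sketch shows one simple way to do so with the Pearson correlation coefficient. It is an illustration under stated assumptions: the function name `pearson_r` and the water temperature and DO values are hypothetical and are not taken from the reviewed studies.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical paired measurements: water temperature (deg C), DO (mg/L).
    water_temp = [8.0, 12.0, 16.0, 20.0, 24.0]
    dissolved_oxygen = [11.8, 10.8, 9.9, 9.1, 8.4]
    print(pearson_r(water_temp, dissolved_oxygen))
```

A coefficient close to -1 or +1 indicates a strong linear relationship. Such a pair of inputs carries partly redundant information, which can guide the selection of input parameters.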
| Domain of application | Type of algorithm | References |
|---|---|---|
| Water quality prediction and estimation | ANN and their variations (backpropagation neural network (BPNN), general regression neural network (GRNN), recurrent neural network (RNN)); deep neural networks (DNN), their variations (convolutional neural network (CNN), long short-term memory (LSTM)), and combinations (CNN-LSTM) | Antanasijević et al. (2013), Liu & Lu (2014), Haghiabi et al. (2018), Liu et al. (2019), Baek et al. (2020), Bilali & Taleb (2020), and Khullar & Singh (2022) |
| | Group method of data handling (GMDH) | Haghiabi et al. (2018) |
| | SVM | Liu & Lu (2014) and Haghiabi et al. (2018) |
| | Extra tree regression (ETR) | Asadollah et al. (2021) |
| | Support vector regression (SVR) | Bilali & Taleb (2020), Asadollah et al. (2021), and Khullar & Singh (2022) |
| | Decision tree regression (DTR) | Asadollah et al. (2021) |
| | Decision tree (DT)-based hybrid models: CEEMDAN-RF and CEEMDAN-XGBoost | Lu & Ma (2020) |
| | DNN-based hybrid models: Bi-LSTM model (DLBL-WQA) | Khullar & Singh (2022) |
Anomaly detection
The process of identifying unexpected problems in water supply data, such as missing values, unusual patterns, or inconsistent specifications, is called anomaly detection. Anomaly detection applies ML models that either require calibration against a labeled dataset (SupVL models) or do not (unSupVL models). Given that SupVL models require large datasets, unSupVL models can be used as an alternative (Russo et al. 2021). Table 4 presents several ML algorithms used for anomaly detection in the reviewed articles.
| Domain of application | Type of algorithm | References |
|---|---|---|
| Water quality anomaly detection | Logistic regression | Muharemi et al. (2019) |
| | SVM | Muharemi et al. (2019) |
| | LSTM | Muharemi et al. (2019) and Miau & Hung (2020) |
| | ANN | Muharemi et al. (2019) and Miau & Hung (2020) |
| | DNN | Muharemi et al. (2019) |
| | RNN | Muharemi et al. (2019) |
| | LDA (linear discriminant analysis) | Muharemi et al. (2019) |
| | CNN with an extreme learning machine (ELM) (CNN-ELM) | Miau & Hung (2020) |
| | Seq2seq | Miau & Hung (2020) |
| | Conv-GRU (CNN and GRU model) | Miau & Hung (2020) |
| | BAR (Bayesian autoregressive) model and IF (isolation forest) algorithm | Liu et al. (2020) |
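To illustrate the label-free (unSupVL-style) route mentioned above, the sketch below flags unusual readings with a modified z-score based on the median and the median absolute deviation (MAD). This simple statistical rule is a stand-in chosen for brevity; it is not one of the algorithms listed in Table 4. The function name `mad_anomalies`, the threshold of 3.5 (a commonly used default), and the turbidity readings are assumptions for illustration.

```python
import statistics
from typing import List, Sequence


def mad_anomalies(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return the indices whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    abs_dev = [abs(v - median) for v in values]
    mad = statistics.median(abs_dev)
    if mad == 0:
        return []  # no spread: the score is undefined, so flag nothing
    # The factor 0.6745 rescales the MAD so that the score is comparable
    # to a standard z-score for normally distributed data.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


if __name__ == "__main__":
    # Hypothetical turbidity readings with one suspicious spike.
    turbidity = [4.1, 4.3, 4.0, 4.2, 4.4, 19.7, 4.1, 4.3]
    print(mad_anomalies(turbidity))
```

No labeled anomalies are needed for this rule, which is the practical advantage of unsupervised approaches when labeled datasets are small. The supervised models in Table 4, in contrast, learn the anomaly pattern from labeled examples.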
Muharemi et al. (2019) attempted to check whether ML models give more accurate results than logistic regression and which model performs best for WQ data. To identify anomalies in WQ data, the authors applied the ML algorithms SVM, ANN, DNN, RNN, LSTM, and LDA and compared all of them with the logistic regression algorithm used for data classification. The results were ranked by F1 value (used as a measure of accuracy): the SVM model performed best with 0.9891, followed by DNN (0.9485), LSTM (0.9023), RNN (0.8345), logistic regression (0.6027), ANN (0.5768), and LDA (0.0820). All models were vulnerable to unbalanced datasets and gave worse results on them, although SVM, logistic regression, and ANN were less vulnerable. This points to a clear weakness of ML models when unbalanced WQ data are used. The results also revealed the ability of logistic regression to explain the relationship between one dependent variable and one or more independent variables, and the ability of SVM to accurately predict data series for nonlinear and nonstationary underlying systems. NN algorithms simulate the structure of the human brain with a network of many interconnected neurons (ANN); they are effective and reliable models with many hidden layers (DNN); and they can use previous time series information through recurrent loops in which the output state is fed back to the input state of the cell (RNN). LSTM offers long-term benefits for making accurate predictions by learning useful information while forgetting useless information, whereas LDA gives excellent results for independent measurements, although classical recognition techniques pose a problem for this model.
Miau & Hung (2020) compared ANN, CNN, LSTM, Seq2seq, and Conv-GRU models for water level prediction in the Danshui River Basin in Taiwan. The performance, expressed through root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), was as follows: Conv-GRU (RMSE = 0.774, MAE = 0.567, MAPE = 30.684), LSTM (RMSE = 1.032, MAE = 0.620, MAPE = 31.035), and CNN (RMSE = 1.144, MAE = 0.745, MAPE = 37.154). The Conv-GRU model had the smallest error between actual and predicted values and gave the best river level predictions; LSTM and CNN had slightly higher errors, but smaller than those of ANN and Seq2seq. CNN achieved good results since it could pick out local trends and recognize the same patterns repeating in different places. Only when integrated did the CNN and GRU model outperform the other four models, acting as a time series modeler that provides an early indication of anomalous behavior. Seq2seq provided sequence-by-sequence forecasting based on multi-step time series forecasting, while LSTM and ANN confirmed the abilities reported by Muharemi et al. (2019).
Liu et al. (2020) performed anomaly detection on data from the Potomac River in West Virginia, USA, by integrating the BAR model and the IF algorithm. The evaluation used the error indicators RMSE, MAE, and mean square error (MSE), while turbidity (TURB), specific conductivity (SC), and DO were used as quality parameters. The error indicator values were RMSE (TURB = 0.1694, SC = 0.0831, DO = 0.0332), MAE (TURB = 0.1086, SC = 0.0453, DO = 0.0282), and MSE (TURB = 0.0287, SC = 0.0069, DO = 0.0011). Both models showed excellent results in anomaly detection. The integrated model detected WQ anomalies accurately and showed the ability to provide effective early warning for emergency operations.
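The error indicators used in these studies follow standard definitions, which can be sketched as below. The function names and the sample values are hypothetical; MAPE is expressed in percent and, as written here, assumes that no actual value is zero.

```python
import math
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (actual values nonzero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


if __name__ == "__main__":
    # Hypothetical observed and predicted values.
    observed = [2.0, 4.0, 5.0, 8.0]
    forecast = [2.5, 3.5, 5.0, 7.0]
    print(mse(observed, forecast), rmse(observed, forecast))
    print(mae(observed, forecast), mape(observed, forecast))
```

Because RMSE is the square root of MSE, the two indicators can be cross-checked against each other. The values quoted above for the Potomac River data satisfy this relation, for example 0.1694 squared is approximately 0.0287 for turbidity.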
FUTURE PERSPECTIVE
The application of ML in surface/river water management offers many opportunities, but each of them faces challenges. Although many authors have already recognized the importance of comparative analysis and included several ML models to determine the most suitable one, prediction accuracy also depends on the input parameters. Hence, careful selection of available WQPs is of key importance. Several ML algorithms struggle with generalization, which is important for real applications in different areas. Other variables (hydrological, morphological, geological, etc.) should be included in model development, and a model presented in one study should be assessed for other rivers with diverse climates and hydrology (Asadollah et al. 2021). Quantifying the uncertainty of a regression model caused by missing input data is highly challenging. To compensate for missing WQ parameter data, hydrometeorology data can be used (Zhi et al. 2021). To achieve higher prediction accuracy, future studies should be strategically planned. Besides the choice of ML algorithms, their comparison, and parameter selection, this planning should cover critical sampling points and sampling periods in which larger oscillations of input parameter concentrations are expected (Zhi et al. 2021). Hybrid ML models have generally been an attractive solution, as they can overcome the limitations of single models and achieve higher performance and accuracy (Khullar & Singh 2021). This trend has been observed, to a certain extent, in all three application domains. Accordingly, researchers should consider this aspect in future studies.
CONCLUSION
ML has relatively recently found its purpose within EE, including river water management. Various ML algorithms have proved applicable for monitoring, classifying, and predicting river WQ and for detecting anomalies. Among the most common algorithms that proved efficient in the reviewed research were DT; ANN, DNN, and SVM; and DNN-based algorithms for classification, prediction, and anomaly detection purposes, respectively. The limitations of single models were found to be overcome by a hybrid approach. The application of AI for these purposes is beneficial from economic, ecological, and strategic aspects. However, the full potential and real application of these systems are yet to be investigated and implemented.
ACKNOWLEDGEMENTS
This research was supported by the Science Fund of the Republic of Serbia, grant number 6707, REmote WAter quality monitoRing anD IntelliGence – REWARDING; by the European Union's Horizon Europe Marie Sklodowska-Curie Actions (MSCA) under grant agreement project number 101086387 – REMARKABLE; and by the Ministry of Science, Technological Development and Innovation through project no. 451-03-47/2023-01/200156 ‘Innovative scientific and artistic research from the FTS (activity) domain’.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories (https://scholar.google.com).
CONFLICT OF INTEREST
The authors declare there is no conflict.