Abstract
Improving the performance of machine learning (ML) algorithms is essential for accurately estimating water quality parameters (WQPs). This study applies, for the first time, a novel hybrid framework, the adaptive neuro-fuzzy inference system–discrete wavelet transform–gradient-based optimization (ANFIS–DWT–GBO), to the estimation of electrical conductivity (EC) and total dissolved solids (TDS). Before estimating WQPs, the performance of the ANFIS–DWT–GBO is evaluated on several benchmark data sets. In addition, three comparison algorithms (ANFIS, ANFIS–DWT, and ANFIS–GBO) are used to demonstrate the strength of the novel framework. The principal component analysis (PCA) method determines the best input combination for EC and TDS estimation. The results show that the ANFIS–DWT–GBO is highly competitive with the other algorithms in both benchmark data set modeling and WQP estimation. This result is attributed to the simultaneous use of DWT and an optimization algorithm in the proposed framework: DWT preprocesses the WQP data before they are passed to the algorithms, and GBO tunes the ANFIS parameters. The highest accuracy in estimating EC and TDS is obtained at the Mollasani and Gotvand stations, respectively, where the correlation coefficient (R) is 0.99 and 0.98.
HIGHLIGHTS
Introducing a novel hybrid framework based on the ANFIS, discrete wavelet transform, and optimization algorithm, namely the ANFIS–DWT–GBO.
Proving the performance of the ANFIS–DWT–GBO by using several benchmark data sets.
Employment of the ANFIS–DWT–GBO for estimating water quality parameters (WQPs).
The proposed framework has the potential to analyze other engineering problems.
INTRODUCTION
Rivers are among the most important water resources, and their quality is of great concern (Kumar et al. 2020). River pollution has recently increased for various reasons and can cause many problems for humans and the environment. River water pollution is therefore a vital issue (Antanasijević et al. 2020), and the accurate assessment of water quality parameters (WQPs) is an important challenge. One way to estimate WQPs is repeated sampling and testing, which is prone to error. Physical and mathematical methods have further disadvantages: they require initial and boundary conditions and are costly (Imani et al. 2021). In recent years, the development of machine learning (ML) algorithms has addressed many of the problems of traditional methods (Dawood et al. 2021). Their advantages include high accuracy and speed and no need for initial and boundary conditions (Shah et al. 2021). Modeling and analysis with artificial intelligence is therefore one of the most effective ways to investigate river pollution problems (Abba et al. 2020).
In recent years, researchers have used several classical algorithms to estimate and predict WQPs (Olyaie et al. 2017). Bilali et al. (2021) employed different ML algorithms for WQP prediction and found that stochastic gradient descent (SGD) and the artificial neural network (ANN) were more accurate than the other algorithms. A comparison of random forest (RF), support vector regression (SVR), and neural networks (NNs) for WQP estimation (Guo et al. 2020) showed that the NN performed better. Deep learning methods have also predicted water quality with high accuracy (Khullar & Singh 2022). In a comparison of long short-term memory (LSTM), multi-linear regression (MLR), and ANN, the ANN and MLR gave better outcomes (Kouadri et al. 2022).
In recent years, researchers have sought to increase the accuracy of ML algorithms (Morshed-Bozorgdel et al. 2022). One way to improve the performance of simulation algorithms is to use optimization algorithms to determine their important parameters, producing new hybrid algorithms (Farzin et al. 2022). New optimization algorithms have recently been introduced to find the best solutions to various problems (Abualigah et al. 2021), and hybrids built on them can increase accuracy when modeling and estimating various parameters. Because the important parameters of simulation algorithms are otherwise tuned by trial and error, optimization algorithms are well suited to this task.
The shuffled frog leaping algorithm (SFLA) and genetic algorithm (GA) have been used to train SVR to model and predict WQPs (Mahmoudi et al. 2016). Kadkhodazadeh & Farzin (2021) used the gradient-based optimization (GBO) algorithm to train the least-squares support vector machine (LSSVM), and the resulting LSSVM–GBO performed better than classical algorithms. Song et al. (2021a, 2021b) developed a hybrid algorithm, synchrosqueezed wavelet transform (WT)–improved sparrow search algorithm (ISSA)–LSTM, that predicted water quality with high accuracy. Hybrid algorithms based on one-dimensional residual convolutional neural networks (1-DRCNNs) and bi-directional gated recurrent units (BiGRU) performed better than previous algorithms (Yan et al. 2021). Banadkooki et al. (2020) used new hybrid ML algorithms to predict total dissolved solids (TDS). Tizro et al. (2021) applied ANN, the adaptive neuro-fuzzy inference system (ANFIS), and ANFIS-subtractive clustering to estimate TDS in the Zayandehrood river. Wu et al. (2022) predicted dissolved oxygen (DO) by combining extreme gradient boosting (XGBoost), ISSA, and LSTM. Kadkhodazadeh & Farzin (2022) introduced a hybrid algorithm integrating the arithmetic optimization algorithm (AOA) with LSSVM and applied it to estimating WQPs. Ranking by the technique for order of preference by similarity to ideal solution (TOPSIS) showed that LSSVM–AOA performed better than classical and other hybrid algorithms, and the Monte Carlo method (MCM) showed that it has less uncertainty.
Another way to increase the accuracy of ML algorithms is to preprocess the input data with the WT, a tool for signal processing and time-series analysis. WT reduces data overlap and the effect of noise on ML models. WT methods have recently been widely used because of their simple implementation and noise cancellation (Alizadeh et al. 2017). The method decomposes a signal and removes its noisy components while preserving the important features of the time series (Yu et al. 2020). WT has been widely used in estimating WQPs and in studies of water resources management and hydrology (Parmar et al. 2019; Bayatvarkeshi et al. 2020; Jamei et al. 2020).
To further improve the accuracy of estimating WQPs, we propose a novel hybrid framework. According to a review of studies, there is no study on estimating WQPs using the ANFIS, discrete wavelet transform (DWT), and GBO. Simultaneous use of the DWT and GBO in improving the accuracy of simulation algorithms can lead to significant results. The DWT and GBO can increase estimation accuracy by analyzing WQPs and adjusting ANFIS parameters. Therefore, the present study used the ANFIS–DWT–GBO to estimate EC and TDS in the Karun river. First, to prove the better performance of the ANFIS–DWT–GBO, its performance is evaluated with three benchmark data sets. The ANFIS–DWT–GBO results are compared with the ANFIS, ANFIS–DWT, and ANFIS–GBO. The principal component analysis (PCA) method is then used to determine the important EC and TDS estimation inputs. In the next step, based on the best input combination, the ability of the ANFIS, ANFIS–DWT, ANFIS–GBO, and ANFIS–DWT–GBO to estimate EC and TDS will be evaluated. Finally, a superior hybrid framework is proposed to improve the performance of the ANFIS.
MATERIALS AND METHODS
Adaptive neuro-fuzzy inference system
Discrete wavelet transform

Gradient-based optimization
The GBO algorithm, which combines population-based and gradient-based methods, was first introduced by Ahmadianfar et al. (2020). GBO uses Newton's method to move toward better positions in the search space and relies on two main operators: the gradient search rule (GSR) and the local escaping operator (LEO). The structure of GBO is summarized in the following sections; for more details, see Ahmadianfar et al. (2020).
Gradient search rule






Local escaping operator







Novel hybrid framework (ANFIS–DWT–GBO)
Input data are processed using DWT and divided into training (70%) and testing (30%) data.
The initial parameters of the GBO and ANFIS (mean and standard deviation (SD) of Gaussian MFs) as decision variables are randomly determined.
The GBO is used to train the ANFIS.
In this phase, the stop criterion is checked. The solution has been obtained if the stop criterion is satisfied; otherwise, the simulation process repeats.
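The four steps above can be sketched in plain Python. This is only an illustrative outline under stated assumptions, not the authors' implementation: the Haar wavelet, the order-preserving split, and the random-perturbation search standing in for GBO are all assumptions, and the function names (`haar_dwt`, `split_70_30`, `gaussian_mf`, `train`) are hypothetical.

```python
import math
import random

def haar_dwt(signal):
    """One-level Haar DWT: return (approximation, detail) coefficients."""
    values = list(signal)
    if len(values) % 2:                # pad odd-length input with its last sample
        values.append(values[-1])
    root2 = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / root2 for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / root2 for i in range(0, len(values), 2)]
    return approx, detail

def split_70_30(rows):
    """Step 1: first 70% of the rows for training, the rest for testing."""
    cut = round(0.7 * len(rows))
    return rows[:cut], rows[cut:]

def gaussian_mf(x, mean, sd):
    """Gaussian membership function; mean and sd are the decision variables."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2)

def train(loss, dim, pop_size=20, max_iter=200, tol=1e-8, seed=0):
    """Steps 2-4 with a placeholder optimizer (random perturbation, NOT GBO)."""
    rng = random.Random(seed)
    # Step 2: random initial candidates for the decision variables.
    population = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    best = min(population, key=loss)
    for _ in range(max_iter):          # Step 4: stop on the iteration budget ...
        if loss(best) < tol:           # ... or once the loss tolerance is met
            break
        # Step 3: stand-in update; GBO would apply its GSR and LEO operators here.
        candidate = [b + rng.gauss(0.0, 0.1) for b in best]
        if loss(candidate) < loss(best):
            best = candidate
    return best
```

In a real pipeline, `loss` would be the ANFIS training error as a function of the membership function means and SDs, and the placeholder update would be replaced by the GBO operators described by Ahmadianfar et al. (2020).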
Principal component analysis

Benchmark data set
To fully evaluate and validate the performance of the ANFIS–DWT–GBO, several benchmark data sets, including Housing, LSVT, and Servo, were used. The features of the benchmark data sets are listed in Table 1. Benchmark data sets are a good tool for evaluating the accuracy of new algorithms and comparing them with other algorithms (Henríquez & Ruz 2017). Each benchmark data set was obtained from several experiments or studies. For more information about benchmark data sets, please see https://archive.ics.uci.edu/ml/datasets.php.
Details of benchmark data sets
| | Housing | LSVT | Servo |
|---|---|---|---|
| Train data | 354 | 88 | 117 |
| Test data | 152 | 38 | 50 |
| Attributes | 13 | 309 | 4 |
| Attribute characteristics | Real | Real | Categorical, Integer |
| Data set characteristics | Multivariate | Multivariate | Multivariate |
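The train and test counts in the table are consistent with the 70%/30% split stated for the framework. A small check, using only the counts from the table (the helper name `train_share` is hypothetical):

```python
# Train/test counts copied from the benchmark table.
splits = {"Housing": (354, 152), "LSVT": (88, 38), "Servo": (117, 50)}

def train_share(train, test):
    """Fraction of all samples that falls in the training set."""
    return train / (train + test)

shares = {name: train_share(*counts) for name, counts in splits.items()}
for name, share in shares.items():
    print(f"{name}: training share = {share:.3f}")
```

Each share comes out within half a percentage point of 0.70.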
Evaluation criteria
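The results tables report the mean absolute error (MAE), the relative root mean square error (RRMSE), and the correlation coefficient (R). A minimal sketch of how such criteria can be computed is given below. The RRMSE normalization used here (RMSE divided by the mean of the observations) is one common convention and is an assumption; the study's own definition may differ. The function names are hypothetical.

```python
import math

def mae(observed, estimated):
    """Mean absolute error."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)

def rrmse(observed, estimated):
    """Relative RMSE. Assumption: RMSE normalised by the observed mean."""
    mse = sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed)
    return math.sqrt(mse) / (sum(observed) / len(observed))

def pearson_r(observed, estimated):
    """Pearson correlation coefficient between observed and estimated values."""
    mean_o = sum(observed) / len(observed)
    mean_e = sum(estimated) / len(estimated)
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_o * var_e)

# Made-up illustrative values, not data from the study.
obs = [1.0, 2.0, 3.0, 4.0]
est = [1.5, 2.5, 3.5, 4.5]
print(mae(obs, est), rrmse(obs, est), pearson_r(obs, est))
```

A constant offset, as in the illustrative values, leaves R at 1 while MAE and RRMSE are non-zero, which is why the three criteria are read together.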



Case study and data sources



Statistical specifications of input and output data
Columns Q through pH are inputs; EC and TDS are outputs.

| Station | Statistic | Q (m3/s) | […] (mg/l) | […] (mg/l) | SAR | […] (mg/l) | Sum.A (mg/l) | Sum.C (mg/l) | Mg2+ (mg/l) | Na+ (mg/l) | […] (mg/l) | pH | EC (μmho/cm) | TDS (mg/l) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Armand | CS | 2.07 | 0.28 | 0.13 | 1.65 | 4.56 | 0.27 | 1.73 | 1.40 | 2.77 | 1.15 | −5.51 | 0.21 | 0.20 |
| | CV | 0.66 | 0.33 | 0.32 | 0.05 | 0.39 | 0.30 | 0.09 | 0.33 | 1.04 | 0.27 | 0.40 | 0.22 | 0.22 |
| | SD | 147.77 | 0.76 | 0.53 | 0.11 | 1.35 | 1.63 | 0.47 | 0.45 | 15.16 | 0.28 | 3.22 | 119.96 | 78.38 |
| | Average | 221.51 | 2.30 | 1.61 | 2.00 | 3.65 | 5.41 | 0.47 | 1.33 | 13.14 | 1.15 | 7.97 | 534.97 | 347.36 |
| Ahvaz | CS | 2.52 | 1.86 | 1.20 | 1.88 | −0.91 | 1.11 | 1.88 | 0.74 | 0.73 | 1.20 | −8.00 | 1.08 | 1.01 |
| | CV | 0.92 | 0.26 | 0.27 | 0.04 | 0.22 | 0.21 | 0.18 | 0.24 | 0.34 | 0.27 | 0.35 | 0.38 | 0.40 |
| | SD | 691.85 | 1.31 | 1.95 | 0.33 | 0.61 | 2.67 | 2.44 | 0.47 | 2.84 | 1.05 | 2.83 | 556.33 | 364.78 |
| | Average | 746.43 | 4.85 | 7.06 | 7.00 | 2.69 | 12.57 | 13.00 | 1.95 | 8.16 | 3.81 | 7.98 | 1,430.26 | 902.13 |
| Gotvand | CS | 3.43 | 0.90 | 1.22 | 1.68 | −0.23 | 0.69 | 0.45 | 5.19 | 1.55 | 1.67 | −8.66 | 1.66 | 1.55 |
| | CV | 0.94 | 0.26 | 0.29 | 0.06 | 0.25 | 0.18 | 0.16 | 0.25 | 0.30 | 0.24 | 0.39 | 0.37 | 0.38 |
| | SD | 464.71 | 0.78 | 1.33 | 0.25 | 0.64 | 1.58 | 1.35 | 0.26 | 1.59 | 0.38 | 3.14 | 355.15 | 224.91 |
| | Average | 493.65 | 2.93 | 4.61 | 6.54 | 2.56 | 8.43 | 7.98 | 1.13 | 5.15 | 1.61 | 7.98 | 941.53 | 586.76 |
| Mollasani | CS | 2.74 | 1.46 | 2.05 | 1.97 | −0.34 | 1.38 | 1.36 | 1.11 | 1.96 | 1.18 | −0.46 | 1.44 | 1.16 |
| | CV | 0.97 | 0.31 | 0.51 | 0.45 | 0.15 | 0.40 | 0.41 | 0.41 | 0.57 | 0.50 | 0.34 | 0.39 | 0.39 |
| | SD | 648.65 | 1.35 | 2.86 | 0.68 | 0.43 | 2.40 | 2.50 | 0.93 | 3.91 | 1.91 | 2.27 | 529.90 | 336.98 |
| | Average | 665.14 | 4.35 | 6.77 | 3.64 | 2.83 | 13.37 | 13.35 | 2.28 | 6.78 | 3.76 | 7.91 | 1,341.65 | 844.94 |
Correlation coefficient results between input and output parameters in different stations.
Steps to present a novel hybrid framework and WQPs estimation
This study offers a novel hybrid framework for estimating WQPs. The following steps are performed to prove the performance of the proposed framework and to estimate WQPs:
The first step consisted of two parts:
Part 1. The performance of the ANFIS and ANFIS–GBO algorithms was evaluated on three benchmark data sets (Housing, LSVT, and Servo) without data preprocessing.
Part 2. The benchmark data sets were preprocessed by DWT and then passed to the ANFIS and ANFIS–GBO algorithms; that is, the ANFIS–DWT and ANFIS–DWT–GBO algorithms were tested on the benchmark data sets.
The PCA method selected the best input combination of WQPs for the different algorithms at the four stations.
After the best input combination of WQPs was determined, EC and TDS were estimated by the novel framework and the other algorithms, following the same two parts as in the first step.
Flowchart to prove the performance of a novel hybrid framework and estimate WQPs.
RESULTS
Benchmark data sets modeling
Evaluation criteria in benchmark data sets modeling
| Data set | Algorithm | Train MAE | Train RRMSE | Train R | Test MAE | Test RRMSE | Test R |
|---|---|---|---|---|---|---|---|
| Housing | ANFIS | 1.61 | 0.24 | 0.96 | 10.07 | 2.35 | −0.06 |
| | ANFIS–DWT | 1.70 | 0.33 | 0.96 | 8.47 | 1.54 | 0.36 |
| | ANFIS–GBO | 3.37 | 0.00 | 0.84 | 4.17 | 0.36 | 0.63 |
| | ANFIS–DWT–GBO | 3.13 | 0.03 | 0.85 | 3.17 | 0.06 | 0.68 |
| LSVT | ANFIS | 0.04 | 0.00 | 1.00 | 1,754 | 11.25 | −0.46 |
| | ANFIS–DWT | 0.00 | 0.00 | 1.00 | 2,748 | 4.14 | 0.11 |
| | ANFIS–GBO | 304.68 | 0.01 | 0.96 | 260.14 | 0.03 | 0.94 |
| | ANFIS–DWT–GBO | 65.36 | 0.00 | 1.00 | 135.86 | 0.02 | 0.98 |
| Servo | ANFIS | 0.20 | 0.18 | 0.98 | 0.58 | 1.63 | 0.55 |
| | ANFIS–DWT | 0.01 | 0.01 | 0.99 | 1.41 | 1.30 | 0.56 |
| | ANFIS–GBO | 0.38 | 0.00 | 0.90 | 0.55 | 0.03 | 0.90 |
| | ANFIS–DWT–GBO | 0.32 | 0.00 | 0.94 | 0.42 | 0.02 | 0.95 |
Comparison of the performance of algorithms in modeling benchmark data sets in the testing period.
Comparison of the performance of the worst algorithm (ANFIS) and the best algorithm (ANFIS–DWT–GBO) in benchmark data sets modeling.
Input selection by PCA
Tables 4 and 5 show the results of the PCA method. At the Armand station, the first three PCs accounted for >76% of the input data variance: 37.35% for the first principal component, 26.20% for the second, and 12.58% for the third. The first component included Ca2+ and SAR as the most important parameters. PC2 included Sum.A, Sum.C, Mg2+, and Na+, and the third component included Q, […], […]. At the Ahvaz station, the first component accounted for 63.92% of the total variance and included Ca2+, SAR, Sum.C, and Mg2+ as the most important parameters. The second component, which included […], Sum.A, and […], accounted for 11.65% of the total variance in both periods. The third component, which included Q, […], and pH, accounted for 10.87% of the total variance in both periods. At the Gotvand station, PC1 (54.67% of the variance) was contributed mainly by Ca2+, SAR, and Sum.C. The second component (14.02% of the variance) included Sum.A and […], and the third component (11.13% of the variance) included Q, […], and pH. At the Mollasani station, the first three PCs accounted for 82.30% of the input data variance: 60.98% for the first principal component, 11.73% for the second, and 9.59% for the third. The first component included Ca2+, SAR, Sum.C, and Mg2+ as the most important parameters. The second component included Sum.A, Na+, and […], and the third component included Q, […], […], and pH.
Loadings of 11 WQPs on the first three PCs
| Parameter | Armand PC1 | Armand PC2 | Armand PC3 | Ahvaz PC1 | Ahvaz PC2 | Ahvaz PC3 | Gotvand PC1 | Gotvand PC2 | Gotvand PC3 | Mollasani PC1 | Mollasani PC2 | Mollasani PC3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Q | −0.23 | 0.35 | 0.48 | −0.18 | 0.34 | 0.38 | −0.10 | 0.35 | 0.41 | −0.15 | 0.35 | 0.38 |
| Ca2+ | 0.74 | 0.11 | 0.04 | 0.57 | −0.19 | −0.01 | 0.70 | 0.23 | 0.01 | 0.45 | 0.06 | 0.01 |
| […] | 0.01 | −0.29 | −0.40 | −0.02 | −0.42 | −0.02 | −0.01 | −0.25 | 0.46 | 0.01 | −0.11 | −0.34 |
| SAR | 0.54 | −0.16 | 0.10 | 0.42 | −0.07 | 0.04 | 0.52 | −0.33 | 0.02 | −0.35 | 0.18 | 0.00 |
| […] | 0.00 | −0.40 | 0.74 | −0.02 | −0.20 | 0.68 | 0.00 | −0.01 | −0.02 | −0.01 | −0.14 | 0.62 |
| Sum.A | 0.05 | −0.43 | −0.16 | 0.02 | 0.43 | −0.05 | −0.02 | 0.31 | −0.14 | 0.03 | 0.35 | −0.08 |
| Sum.C | 0.31 | −0.44 | 0.15 | 0.66 | −0.15 | 0.07 | −0.41 | 0.02 | −0.03 | −0.64 | 0.03 | −0.06 |
| Mg2+ | −0.12 | −0.24 | −0.02 | 0.52 | 0.13 | 0.04 | 0.14 | −0.01 | 0.02 | 0.49 | 0.16 | 0.07 |
| Na+ | 0.03 | −0.11 | −0.04 | −0.08 | 0.21 | 0.02 | −0.01 | −0.23 | 0.03 | −0.00 | 0.38 | 0.03 |
| […] | 0.03 | 0.03 | −0.09 | −0.02 | −0.44 | −0.02 | 0.01 | 0.71 | 0.00 | 0.02 | 0.71 | −0.17 |
| pH | 0.00 | 0.01 | −0.01 | 0.00 | 0.00 | 0.62 | 0.00 | 0.02 | 0.77 | 0.00 | −0.09 | −0.56 |
Eigenvalues and percentage of variance in different stations
| Station | PC | Eigenvalue | Variance (%) | Cumulative eigenvalue | Cumulative variance (%) |
|---|---|---|---|---|---|
| Armand | PC1 | 4.10 | 37.35 | 4.10 | 37.35 |
| | PC2 | 2.88 | 26.20 | 6.98 | 63.55 |
| | PC3 | 1.38 | 12.58 | 8.36 | 76.13 |
| Ahvaz | PC1 | 7.03 | 63.92 | 7.03 | 63.92 |
| | PC2 | 1.28 | 11.65 | 8.31 | 75.57 |
| | PC3 | 1.19 | 10.87 | 9.50 | 86.44 |
| Gotvand | PC1 | 6.01 | 54.67 | 6.01 | 54.67 |
| | PC2 | 1.54 | 14.02 | 7.55 | 68.69 |
| | PC3 | 1.22 | 11.13 | 8.77 | 79.82 |
| Mollasani | PC1 | 6.70 | 60.98 | 6.70 | 60.98 |
| | PC2 | 1.29 | 11.73 | 7.99 | 72.71 |
| | PC3 | 1.05 | 9.59 | 9.04 | 82.30 |
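The variance percentages in the eigenvalue table can be approximately reproduced from the eigenvalues. The sketch below assumes that PCA was run on the 11 standardized inputs (a correlation-matrix PCA), so that the eigenvalues sum to 11; this is an inference from the table, not something the text states. The function names are hypothetical.

```python
N_INPUTS = 11  # number of WQP inputs in the loadings table

# Eigenvalues of the first three PCs, copied from the eigenvalue table.
eigenvalues = {
    "Armand": [4.10, 2.88, 1.38],
    "Ahvaz": [7.03, 1.28, 1.19],
    "Gotvand": [6.01, 1.54, 1.22],
    "Mollasani": [6.70, 1.29, 1.05],
}

def variance_percent(eigs, n_vars=N_INPUTS):
    """Share of total variance per PC, assuming total variance equals n_vars."""
    return [100.0 * e / n_vars for e in eigs]

def cumulative(values):
    """Running sum of a list of values."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out

for station, eigs in eigenvalues.items():
    pct = variance_percent(eigs)
    cum = cumulative(pct)
    print(station, [round(p, 2) for p in pct], round(cum[-1], 2))
```

The recomputed shares agree with the tabulated ones to within about 0.1 percentage points; the small differences are what would be expected from the eigenvalues being rounded to two decimals.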
EC and TDS estimating
After evaluating the proposed framework on the benchmark data sets, we estimate the EC and TDS parameters. As in the benchmark modeling at the beginning of this section, WQPs were first estimated with ANFIS and ANFIS–GBO without preprocessing the input data. The same algorithms were then run on data processed by DWT to test the accuracy of the proposed framework. Table 6 presents the results of the ML algorithms for EC estimation at the four stations. The MAE, RRMSE, and R of the GBO-trained models (ANFIS–GBO and ANFIS–DWT–GBO) differed clearly from those of ANFIS and ANFIS–DWT. Judged by RRMSE and R in the testing period, ANFIS and ANFIS–DWT were the weakest algorithms, ANFIS–GBO was more accurate than both, and ANFIS–DWT–GBO performed best overall. The exception is test MAE at Gotvand and Mollasani, where ANFIS–DWT (42.35 and 55.56) was lower than ANFIS–DWT–GBO. For ANFIS–DWT–GBO in the testing period, MAE, RRMSE, and R were 22.41, 0.09, and 0.96 at Armand; 73.39, 0.04, and 0.99 at Ahvaz; 61.70, 0.04, and 0.98 at Gotvand; and 66.07, 0.03, and 0.99 at Mollasani.
Results of algorithms for estimating EC in different stations
| Station | Algorithm | Train MAE | Train RRMSE | Train R | Test MAE | Test RRMSE | Test R |
|---|---|---|---|---|---|---|---|
| Armand | ANFIS | 19.37 | 0.23 | 0.97 | 26.07 | 0.30 | 0.95 |
| | ANFIS–DWT | 8.78 | 0.12 | 0.99 | 23.17 | 0.27 | 0.96 |
| | ANFIS–GBO | 23.15 | 0.00 | 0.95 | 22.74 | 0.10 | 0.96 |
| | ANFIS–DWT–GBO | 21.17 | 0.00 | 0.96 | 22.41 | 0.09 | 0.96 |
| Ahvaz | ANFIS | 43.83 | 0.15 | 0.98 | 98.58 | 0.21 | 0.96 |
| | ANFIS–DWT | 33.14 | 0.09 | 0.99 | 81.63 | 0.25 | 0.97 |
| | ANFIS–GBO | 44.72 | 0.00 | 0.98 | 78.21 | 0.04 | 0.98 |
| | ANFIS–DWT–GBO | 44.91 | 0.00 | 0.98 | 73.39 | 0.04 | 0.99 |
| Gotvand | ANFIS | 30.59 | 0.14 | 0.98 | 75.16 | 0.28 | 0.96 |
| | ANFIS–DWT | 17.48 | 0.06 | 0.99 | 42.35 | 0.22 | 0.97 |
| | ANFIS–GBO | 32.46 | 0.00 | 0.98 | 65.58 | 0.09 | 0.98 |
| | ANFIS–DWT–GBO | 40.62 | 0.01 | 0.98 | 61.70 | 0.04 | 0.98 |
| Mollasani | ANFIS | 40.20 | 0.12 | 0.99 | 91.27 | 0.22 | 0.97 |
| | ANFIS–DWT | 28.95 | 0.08 | 0.99 | 55.56 | 0.15 | 0.98 |
| | ANFIS–GBO | 43.74 | 0.00 | 0.99 | 68.25 | 0.04 | 0.98 |
| | ANFIS–DWT–GBO | 43.62 | 0.00 | 0.99 | 66.07 | 0.03 | 0.99 |
According to Table 7, ANFIS–DWT–GBO gave the best TDS results at the four stations in terms of RRMSE and R, although at Gotvand its test MAE (44.48) was higher than that of ANFIS–DWT (34.58) and ANFIS–GBO (41.62). In the testing period, its MAE, RRMSE, and R were 14.68, 0.06, and 0.97 at Armand; 55.90, 0.02, and 0.98 at Ahvaz; 44.48, 0.00, and 0.98 at Gotvand; and 50.97, 0.00, and 0.98 at Mollasani.
Results of algorithms for estimating TDS in different stations
| Station | Algorithm | Train MAE | Train RRMSE | Train R | Test MAE | Test RRMSE | Test R |
|---|---|---|---|---|---|---|---|
| Armand | ANFIS | 13.43 | 0.25 | 0.96 | 34.38 | 0.63 | 0.84 |
| | ANFIS–DWT | 7.69 | 0.14 | 0.98 | 15.78 | 0.35 | 0.93 |
| | ANFIS–GBO | 20.22 | 0.02 | 0.98 | 18.55 | 0.17 | 0.96 |
| | ANFIS–DWT–GBO | 15.23 | 0.00 | 0.99 | 14.68 | 0.06 | 0.97 |
| Ahvaz | ANFIS | 42.55 | 0.22 | 0.97 | 98.76 | 0.75 | 0.81 |
| | ANFIS–DWT | 30.61 | 0.11 | 0.99 | 78.50 | 0.45 | 0.90 |
| | ANFIS–GBO | 43.69 | 0.00 | 0.97 | 61.05 | 0.04 | 0.97 |
| | ANFIS–DWT–GBO | 43.89 | 0.00 | 0.97 | 55.90 | 0.02 | 0.98 |
| Gotvand | ANFIS | 32.14 | 0.24 | 0.96 | 85.90 | 1.06 | 0.79 |
| | ANFIS–DWT | 24.04 | 0.14 | 0.98 | 34.58 | 0.21 | 0.97 |
| | ANFIS–GBO | 34.46 | 0.00 | 0.96 | 41.62 | 0.03 | 0.97 |
| | ANFIS–DWT–GBO | 33.85 | 0.00 | 0.96 | 44.48 | 0.00 | 0.98 |
| Mollasani | ANFIS | 42.28 | 0.18 | 0.98 | 71.10 | 0.29 | 0.95 |
| | ANFIS–DWT | 26.33 | 0.10 | 0.99 | 57.48 | 0.22 | 0.97 |
| | ANFIS–GBO | 45.63 | 0.00 | 0.97 | 60.03 | 0.05 | 0.97 |
| | ANFIS–DWT–GBO | 44.82 | 0.00 | 0.97 | 50.97 | 0.00 | 0.98 |
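To put the TDS table in relative terms, the sketch below computes the percentage reduction in test MAE of ANFIS–DWT–GBO with respect to the standalone ANFIS at each station. The values are copied from the table; the helper name `pct_reduction` is hypothetical, and the choice of MAE as the comparison metric is only for illustration.

```python
# Test-period MAE for TDS, copied from the table (ANFIS vs ANFIS-DWT-GBO).
tds_test_mae = {
    "Armand": {"ANFIS": 34.38, "ANFIS-DWT-GBO": 14.68},
    "Ahvaz": {"ANFIS": 98.76, "ANFIS-DWT-GBO": 55.90},
    "Gotvand": {"ANFIS": 85.90, "ANFIS-DWT-GBO": 44.48},
    "Mollasani": {"ANFIS": 71.10, "ANFIS-DWT-GBO": 50.97},
}

def pct_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers the error relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

reductions = {
    station: pct_reduction(v["ANFIS"], v["ANFIS-DWT-GBO"])
    for station, v in tds_test_mae.items()
}
for station, value in reductions.items():
    print(f"{station}: test MAE reduced by {value:.1f}%")
```

The reductions range from roughly 28% at Mollasani to roughly 57% at Armand.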
Evaluation criteria of different algorithms for estimating EC and TDS in different stations.



Performance of the novel framework (ANFIS–DWT–GBO) for estimating EC in different stations.
Performance of the novel framework (ANFIS–DWT–GBO) for estimating TDS in different stations.
DISCUSSION
Across the three benchmark data sets and the estimation of EC and TDS in the Karun river, the algorithms ranked by accuracy from first to fourth as ANFIS–DWT–GBO, ANFIS–GBO, ANFIS–DWT, and ANFIS. The improved accuracy of ANFIS–DWT–GBO can be explained by two factors: GBO determines the important ANFIS parameters optimally, and DWT preprocesses the input data. Used together, DWT and the optimization algorithm significantly increased the accuracy of ANFIS. By finding the optimal ANFIS parameter values, GBO allowed the various parameters to be estimated with high accuracy. The superiority of ANFIS–DWT and ANFIS–GBO over ANFIS likewise showed that DWT and the optimization algorithm each improve modeling accuracy. GBO has operators that find the membership function parameters more accurately than the standalone ANFIS. Its positive effect on ANFIS accuracy stems from its use of Newton's method to reach better positions in the search space, while the GSR and LEO operators keep GBO from becoming trapped in local optima and help it find the optimal values accurately and quickly. DWT, in turn, helps extract hidden information from the input data by providing frequency and time information simultaneously.
The superiority of hybrid algorithms over classical algorithms, and the improvement of simulation algorithms by optimization algorithms, have been shown in other studies (Azad et al. 2018; Milan et al. 2021; Song et al. 2021a, 2021b; Wu et al. 2022). The combined effect of DWT and optimization algorithms on the accuracy of simulation algorithms has also been reported by Montaseri et al. (2018) and Anaraki et al. (2021). Compared with the results of Kadkhodazadeh & Farzin (2021), ANFIS–DWT–GBO performed better than LSSVM–GBO, which could be due to DWT and to the use of ANFIS instead of LSSVM. Overall, the results indicate that ANFIS–DWT–GBO achieves high accuracy and outperforms the competing algorithms.
CONCLUSION
The present study introduced a novel hybrid framework based on the ANFIS, the DWT, and an optimization algorithm. The ANFIS–DWT–GBO was compared with the ANFIS, ANFIS–DWT, and ANFIS–GBO on three benchmark data sets (Housing, LSVT, and Servo). The PCA method determined the best input combination for estimating EC and TDS at each station. After its performance was evaluated on the benchmarks, the ANFIS–DWT–GBO was used to estimate EC and TDS at four stations on the Karun river: Armand, Ahvaz, Gotvand, and Mollasani. The main results are as follows:
1. The ANFIS–DWT–GBO had the highest accuracy on the three benchmark data sets. In the testing period, MAE, RRMSE, and R were 3.17, 0.06, and 0.68 for Housing; 135.86, 0.02, and 0.98 for LSVT; and 0.42, 0.02, and 0.95 for Servo.
2. According to the PCA results, the first three components explained the largest share of the variance at all stations (76.13% to 86.44%).
3. Based on the evaluation criteria, the ANFIS–DWT–GBO had the best performance among the algorithms in estimating EC and TDS at the four stations. The highest accuracy was obtained at Mollasani for EC and at Gotvand for TDS: MAE, RRMSE, and R were 66.07, 0.03, and 0.99 for EC at Mollasani and 44.48, 0.00, and 0.98 for TDS at Gotvand.
In this paper, the advantages of the GBO optimization algorithm (high estimation accuracy, fast convergence, and easy implementation) and of DWT were used to improve the performance of the ANFIS. Given the results of this study and the advantages of the proposed framework, the next step is to extend it to other engineering problems. The ANFIS–DWT–GBO also has high potential for modeling and predicting various water resources parameters. The study thus met its goal of presenting a new hybrid algorithm.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.