## Abstract

Water quality plays a crucial role in the management of water resources. Water quality indexes (WQIs) are frequently used to assess water quality for drinking purposes. A WQI can be determined by chemical analysis, which might not, however, be viable over long periods for all rivers at country scale. Thus, in this investigation, two neural-based soft computing techniques – an artificial neural network (ANN) and a generalized regression neural network (GRNN) – and one hybrid soft computing technique – an adaptive neuro-fuzzy inference system (ANFIS) with four membership functions – were used to predict WQIs in the Khorramabad, Biranshahr and Alashtar sub-watersheds in Iran. Ten distinct physiochemical parameters were used as input variables and the WQI as output. A correlation plot and a pairs plot were used to ascertain the relation between the input and output variables. The soft computing techniques were compared using six fitness criteria: Nash-Sutcliffe efficiency (NSE), mean absolute error (MAE), Legates-McCabe Index (LMI), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of correlation (CC). Results indicated that ANN predicted WQI better than GRNN and ANFIS did. Among the different membership functions of ANFIS, ANFIS_trimf performed far better than the others. Thus, it was concluded that ANN is a viable tool for the prediction of a WQI.

## HIGHLIGHTS

Water quality index values were predicted by using three soft computing techniques.

Ten distinct physiochemical parameters were utilized in modelling.

The soft computing techniques were compared using six fitness criteria: NSE, MAE, LMI, RMSE, MAPE, CC.

## INTRODUCTION

Industries, agriculture, and people pollute water resources through a variety of activities (Katyal 2011). In some watersheds, pollution has exceeded the permissible limit. There is global concern as water quality has degraded almost everywhere (Adriaenssens *et al.* 2004; Azad *et al.* 2018). Quality water is fundamental for sustainable living, so the abatement of pollution and the protection of water resources are necessary, which requires an assessment of water quality (Witek & Jarosiewicz 2009; Reza & Singh 2010; Tiri *et al.* 2018). Water quality can be determined by chemical, physical and biological analyses (Abbasi & Abbasi 2012; Ewaid & Abed 2017; Medeiros *et al.* 2017; Tiri *et al.* 2018). A Water Quality Index (WQI) is one of the commonly used methods for the assessment of water quality (Medeiros *et al.* 2017; Tiri *et al.* 2018). A WQI groups parameters into a single value that can be used for grading water quality and hence for classifying the health of water systems, such as rivers (Hasan *et al.* 2015; Ewaid & Abed 2017).

WQIs were proposed by Horton (1965) and Brown *et al.* (1970), and various methods have since been developed for calculating them (Debels *et al.* 2005; Tsegaye *et al.* 2006; Saeedi *et al.* 2010). Using a WQI, Kannel *et al.* (2007) analyzed seasonal and spatial changes of the Bagmati River. Debels *et al.* (2005) calculated a WQI using nine physicochemical parameters in the Chillán River. Yidana & Yidana (2010) combined GIS with a multivariate statistical method to calculate a WQI.

Soft computing techniques have been devised to address the non-stationarity and non-linearity of water quality. These techniques are attractive because they model water quality directly and quickly (Gaya *et al.* 2020; Karim & Kamsani 2020; Yasin & Karim 2020; Hmoud Al-Adhaileh & Waselallah Alsaade 2021) and can greatly reduce errors and computation time (Bhagat *et al.* 2019). The M5P model tree, adaptive neuro-fuzzy inference system (ANFIS), support vector regression (SVR), random forest (RF), Gaussian process (GP), and artificial neural network (ANN) are among the most frequently used soft computing techniques (Barzegar *et al.* 2016). Chen & Zheng (2008) used soft computing techniques to predict water quality and observed that ANN was the best model, giving the most appropriate results. Singh *et al.* (2009) implemented an ANN model of water quality to compute dissolved oxygen and biochemical oxygen demand. Zheng *et al.* (2010) applied particle swarm optimization with ANN to predict the water quality of sewage effluent. Gao *et al.* (2015) applied a special type of neural network, a back propagation neural network combined with particle swarm optimization, to the prediction of water quality. Nourani *et al.* (2013) employed ANN for computing a WQI and found that it outperformed other conventional methods. Emamgholizadeh *et al.* (2014) used ANN and ANFIS to estimate a WQI in the Karoom watershed and found that ANN predicted better than ANFIS. Different combinations of soft computing techniques have also been employed for the estimation of WQIs (Yaseen *et al.* 2018).

The reliability of soft computing techniques for WQIs has been amply demonstrated (Bui *et al.* 2020; Gaya *et al.* 2020; Najafzadeh & Lottfi-Dashbalagh 2020; Tung & Yaseen 2020; Riahi-Madvar *et al.* 2021). Since analyzing the water quality of all rivers at frequent intervals on a countrywide scale may not be feasible over a substantial period of time (De LR Wagener *et al.* 2019), the prospect of modeling water quality easily and with fewer parameters motivates the use of soft computing techniques. Although a large body of literature has predicted water quality using soft computing techniques, no study has predicted the water quality of three sub-watersheds with a combination of ANFIS, ANN and GRNN (generalized regression neural network), which indicates the significance and novelty of this work. Laboratory analysis of water quality is a costly and time-consuming process that requires the collection, transportation and testing of samples. In this regard, the study presents a real-time system to evaluate an alternative approach based on soft computing techniques for predicting water quality. The objectives of this study are:

- i. to develop a neuro-fuzzy based model, an ANFIS, for the prediction of a WQI for the Khorramabad, Biranshahr and Alashtar sub-watersheds in Iran;
- ii. to validate the output of ANN, GRNN and ANFIS; and
- iii. to compare the performances of the soft computing techniques using model fitness criteria.

The paper is organized as follows. In the second section, a description of the soft computing techniques and study area is given. Details of the methodology, data description, and model fitness criteria are also given. Results and discussion summarizing the performances of soft computing techniques are presented in the third section, followed by conclusions and references.

## MATERIALS AND METHODS

### Soft computing techniques

#### Artificial neural network (ANN)

(… *et al.* 2019; Sihag *et al.* 2020; Singh 2020). ANN has a brain-like architecture and neuron system. It contains a single input layer, a single target layer and one or multiple hidden layers. Every layer has a certain number of nodes, and the weighted relation between these layers depicts the node relationship. The input layer, which has the same number of nodes as the number of input parameters, delivers data to the network but does not assist in processing. The last processing unit is the target layer. Whenever the input layer receives input information that moves through the linkages among the nodes, the values are multiplied by the associated weights and added together to give the final target ($Z_d$) of the unit:

$$Z_d = \sum_{c} A_{cd} B_c$$

where $A_{cd}$ = weight of interconnection from unit c to d, $B_c$ = input value at the input layer, and $Z_d$ = target obtained by the activation function to produce a target for unit d. Haykin (1999) has given a complete discussion of ANN. The main advantage of this method is that it learns automatically and produces an output which is not limited to the input provided. Also, its working is not affected by loss of data, as it stores the input in its own networks instead of a database.
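The weighted sum described above can be sketched in a few lines of NumPy. This is only an illustration, not the Weka implementation used in the study: the function name `layer_forward`, the example numbers, and the sigmoid activation are all assumptions, since the text does not name a specific activation function.

```python
import numpy as np

def layer_forward(B, A):
    """Weighted sum Z_d = sum_c A[c, d] * B[c], then an assumed sigmoid activation.

    B: input values, shape (n_inputs,)
    A: interconnection weights from unit c to unit d, shape (n_inputs, n_units)
    """
    Z = B @ A                              # one weighted sum per receiving unit d
    activated = 1.0 / (1.0 + np.exp(-Z))   # sigmoid is an assumption, not from the paper
    return Z, activated

# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
B = np.array([1.0, 2.0, 3.0])
A = np.array([[0.1, 0.4],
              [0.2, 0.5],
              [0.3, 0.6]])
Z, activated = layer_forward(B, A)
print(Z, activated)
```

Each column of `A` holds the weights feeding one receiving unit, so a whole layer is evaluated with a single matrix product.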

#### Generalized regression neural network (GRNN)

Specht (1991) was the first to introduce GRNN, which uses a normalized radial basis function (RBF) network with a single hidden unit centered at each training example. The RBF, also known as the kernel function, is a probability density function of the kind used in neural networks, Gaussian processes, and support vector machines. The hidden-to-output weights are the target values, so the output is a weighted average of the target values of the training cases near the given input case. The widths of the RBF units are the only weights that need to be determined. A GRNN structure has only four layers. The input values are in the first layer, the pattern units are in the second layer, the outputs of this layer are passed to the summation units in the third layer, and the output units are in the final layer. The first layer is fully connected to the second, pattern layer, where each unit represents a training pattern and its output is a measure of the distance of the input from the stored pattern. The optimal value of the user-defined parameter known as spread (s) is determined experimentally. For more information about GRNN, readers are referred to Specht (1991) and Wasserman (1993). The advantage of GRNN is that it handles noise in the input easily and uses single-pass learning, so no backpropagation is needed.
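The weighted-average behaviour described above can be illustrated with a short NumPy sketch. It assumes a Gaussian kernel of the form exp(−d²/(2s²)); the function name `grnn_predict` and the toy data are hypothetical, and MATLAB's own definition of spread scales the kernel slightly differently, so this is not a reproduction of the study's model.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, spread=0.2):
    """Kernel-weighted average of training targets (single pass, no backpropagation).

    Pattern layer: one Gaussian unit per training example.
    Summation/output layers: weighted sum of targets divided by the sum of weights.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Squared Euclidean distance between every query and every stored pattern.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * spread ** 2))      # assumed Gaussian kernel
    return (w @ y_train) / w.sum(axis=1)       # breaks down if all weights underflow to 0

# Hypothetical one-dimensional example.
X_train = [[0.0], [1.0], [2.0]]
y_train = [10.0, 20.0, 30.0]
pred = grnn_predict(X_train, y_train, [[1.0], [0.5]])
print(pred)
```

A small spread makes the prediction follow the nearest stored pattern closely, while a large spread smooths the output toward the overall mean of the targets, which is why the spread has to be tuned experimentally.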

#### Adaptive neuro-fuzzy inference system (ANFIS)

The configuration of ANFIS is displayed in Figure 1. There are five layers to this system (Sihag *et al.* 2019) and details of these five layers are as follows:

- 1. The membership degree is measured in the first layer. Every node in this layer produces a membership degree, using the membership functions of the fuzzy sets, where $x$ and $y$ are the inputs, $A_i$ and $B_i$ are the linguistic labels, and $\mu_{A_i}$ and $\mu_{B_i}$ are the degrees of membership for $A_i$ and $B_i$, respectively.
- 2.
- 3.
- 4.
- 5.

(… *et al.* 2019; Ebtehaj *et al.* 2019; Gholami *et al.* 2019). Thus, the current study uses this membership function, with $c_i$ and $\sigma_i$ as its parameters. The ANFIS model has the advantage of combining both numerical and linguistic knowledge.
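The parameters $c_i$ and $\sigma_i$ suggest a Gaussian membership function, although the exact expression is not recoverable from the text. The sketch below therefore assumes the common form exp(−(x − c)²/(2σ²)) and, for comparison, the usual triangular function with feet `a`, `c` and peak `b`. The function names and sample values are illustrative, not taken from the study.

```python
import numpy as np

def gaussian_mf(x, c, sigma):
    """Assumed Gaussian membership: 1 at the centre c, decaying with width sigma."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def triangular_mf(x, a, b, c):
    """Triangular membership: 0 outside [a, c], rising to 1 at the peak b."""
    x = np.asarray(x, dtype=float)
    rising = (x - a) / (b - a)
    falling = (c - x) / (c - b)
    return np.maximum(np.minimum(rising, falling), 0.0)

# Hypothetical inputs.
g = gaussian_mf([5.0, 6.0], c=5.0, sigma=1.0)
t = triangular_mf([0.0, 1.0, 2.0, 3.0, 4.0], a=1.0, b=2.0, c=4.0)
print(g, t)
```

Both functions map a crisp input to a degree of membership between 0 and 1, which is what the first ANFIS layer produces for each linguistic label.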

### Area of the research work

The research was carried out using flow and water quality data measured in a watershed consisting of three sub-watersheds, Khorramabad, Biranshahr, and Alashtar, in Lorestan province, Iran, located between 48°03′10″E and 48°59′07″E and between 33°11′47″N and 34°03′27″N, with an area of 3,562.1 km^{2}. The elevation of the catchment area varies from 1,158 to 3,646 m above sea level. The observations were recorded from September 2014 to August 2017. Most of the rainfall occurs from November to May. The average rainfall is 442 mm, 484 mm, and 556 mm for the Khorramabad, Biranshahr, and Alashtar sub-watersheds, respectively. Figure 2 shows the details of the study area; the black dots represent the sampling sites for the three watersheds.

### Methodology and data descriptions

Three types of parameters – biological, physical, and chemical – were used to analyze the water quality by which the WQI was determined (Dogan *et al.* 2009). These parameters were total dissolved solids (TDS), sodium (Na), sulfate (SO_{4}), electrical conductivity (EC), calcium (Ca), the potential of hydrogen (pH), bicarbonate ions (HCO_{3}), chlorides (Cl), magnesium (Mg), and potassium (K). Using these parameters, the WQI was calculated as follows:

- 1. A weight ($z_i$) was assigned to each parameter on a scale of one to five according to its significance for drinking suitability and human health. The $z_i$ values are given in Table 1 (Yidana *et al.* 2010; Varol & Davraz 2014; Vasant *et al.* 2016; Şener *et al.* 2017; Wagh *et al.* 2017).
- 2. The relative weight ($Z_i$) was determined for each of the parameters. The $Z_i$ values are tabulated in Table 1 and are calculated as:

  $$Z_i = \frac{z_i}{\sum_{j=1}^{n} z_j}$$

  where $n$ is the number of parameters.
- 3.
- 4.
- 5.
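As a check on the relative-weight step, the short sketch below recomputes the relative weights from the assigned weights listed in Table 1. Only the first two steps are covered, because the remaining steps are not recoverable from the text; the dictionary and variable names are illustrative.

```python
# Assigned weights z_i, copied from Table 1.
weights = {
    "TDS": 5, "EC": 2, "pH": 4, "HCO3": 3, "Cl": 4,
    "SO4": 4, "Ca": 3, "Mg": 3, "Na": 2, "K": 2,
}

total = sum(weights.values())                          # sum of all z_i
relative = {name: z / total for name, z in weights.items()}  # Z_i = z_i / sum(z)

for name, Z in relative.items():
    print(f"{name}: {Z}")
```

The weights sum to 32, so a parameter with weight 5 (TDS) receives a relative weight of 5/32, and the relative weights sum to 1, in agreement with the totals row of Table 1.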

A total of 124 observations, recorded from September 2014 to August 2017, were used. The dataset consists of 10 input variables (pH, Na, Mg, SO_{4}, Ca, TDS, K, Cl, HCO_{3} and EC) and one output variable, WQI. The pairs plot of all variables is presented in Figure 3, which shows the interrelation of the variables with each other and also reveals the outliers, which were few. Seventy percent of the dataset was used for training the soft computing techniques and 30 percent for testing them. The characteristics of the water quality parameters for the sub-watersheds are tabulated in Table 2. The characteristics of the three watersheds are similar, except for some values that were higher in the Khorramabad watershed. Figure 4 shows the flow chart as well as the architecture of the soft computing techniques used in the investigation, namely the implementation of ANN, GRNN and ANFIS in the study area.
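A 70/30 partition of 124 observations can be sketched as follows. The text does not say whether the split was random or chronological, nor how the fraction was rounded, so the random permutation, the seed, and the rounding to 87 training rows are all assumptions made for illustration.

```python
import numpy as np

n_obs = 124                           # number of observations reported in the text
rng = np.random.default_rng(42)       # arbitrary seed, for reproducibility only
idx = rng.permutation(n_obs)          # assumed random shuffle of row indices

n_train = round(0.7 * n_obs)          # 70 percent for training (rounded)
train_idx = idx[:n_train]
test_idx = idx[n_train:]              # remaining 30 percent for testing

print(len(train_idx), len(test_idx))
```

Indexing the input matrix and the WQI vector with `train_idx` and `test_idx` keeps each input row paired with its own WQI value in both subsets.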

| Parameter | Weight (z_{i}) | Relative weight (Z_{i}) | Standard values from WHO |
|---|---|---|---|
| TDS | 5 | 0.15625 | 500 |
| EC | 2 | 0.0625 | 500 |
| pH | 4 | 0.125 | 6.5–8.5 |
| HCO_{3} | 3 | 0.09375 | 500 |
| Cl | 4 | 0.125 | 250 |
| SO_{4} | 4 | 0.125 | 250 |
| Ca | 3 | 0.09375 | 75 |
| Mg | 3 | 0.09375 | 50 |
| Na | 2 | 0.0625 | 200 |
| K | 2 | 0.0625 | 10 |
| Total | 32 | 1 | |


All values are in mg/L except pH (on a scale) and EC (μS/cm).

| Watershed | Statistic | TDS | EC | pH | HCO_{3} | Cl | SO_{4} | Ca | Mg | Na | K | WQI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Biranshahr watershed | Mean | 269.21 | 419.91 | 7.94 | 102.15 | 16.08 | 18.14 | 55.50 | 14.17 | 5.81 | 1.10 | 31.81 |
| | Max | 484.0 | 745.0 | 8.39 | 128.14 | 92.17 | 55.71 | 76.15 | 26.74 | 45.98 | 3.91 | 49.57 |
| | Min | 170.0 | 275.0 | 7.05 | 73.22 | 6.03 | 4.32 | 26.05 | 2.43 | 0.69 | 0.00 | 25.42 |
| | Standard deviation | 48.74 | 75.15 | 0.31 | 12.81 | 12.72 | 11.08 | 10.35 | 6.04 | 6.69 | 0.85 | 4.20 |
| Khorramabad watershed | Mean | 413.24 | 651.55 | 7.46 | 140.93 | 52.04 | 22.79 | 70.28 | 26.89 | 16.95 | 5.46 | 44.17 |
| | Max | 573.00 | 977.0 | 8.53 | 201.37 | 116.99 | 77.33 | 110.2 | 69.28 | 64.37 | 44.1 | 64.24 |
| | Min | 247.00 | 386.0 | 6.52 | 82.38 | 24.82 | 0.00 | 24.05 | 10.94 | 3.45 | 0.78 | 33.00 |
| | Standard deviation | 82.55 | 132.49 | 0.55 | 27.40 | 16.51 | 22.95 | 15.12 | 10.42 | 13.89 | 6.50 | 7.05 |
| Alashtar watershed | Mean | 279.9 | 436.5 | 7.70 | 107.4 | 16.28 | 15.88 | 54.61 | 16.52 | 5.74 | 2.24 | 32.59 |
| | Max | 417.0 | 634.0 | 8.36 | 149.5 | 24.82 | 83.57 | 76.15 | 48.62 | 22.53 | 7.82 | 47.15 |
| | Min | 150.0 | 241.0 | 6.47 | 56.4 | 3.55 | 2.40 | 24.05 | 3.65 | 0.69 | 0.00 | 23.65 |
| | Standard deviation | 64.2 | 101.9 | 0.49 | 23.6 | 4.81 | 14.13 | 15.46 | 8.21 | 4.87 | 2.28 | 4.98 |


### Correlation plot

This investigation used the 10 input variables to estimate the output variable in the three watersheds. Correlation coefficients were obtained to evaluate the correlation between the output and input variables. Figure 5 shows the correlation plot of the training and testing datasets. For the training dataset, the WQI showed a very strong correlation with TDS and EC (0.97 and 0.97), a good correlation with HCO_{3}, Cl and Mg (0.8, 0.86 and 0.77), an average correlation with SO_{4}, Ca and Na (0.45, 0.58 and 0.54), and a poor correlation with pH (−0.18). For the testing dataset, the WQI again showed a very strong correlation with TDS and EC (0.97 and 0.97), a good correlation with HCO_{3}, Cl and Mg (0.84, 0.72 and 0.87), an average correlation with Ca, Na, K and pH (0.44, 0.47, 0.57 and −0.41), and a poor correlation with SO_{4} (0.22).
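Correlations of this kind can be obtained with `numpy.corrcoef`. The sketch below computes the Pearson correlation of each input column with the output; the helper name `correlation_with_output` and the toy values are hypothetical and are not the study's measurements.

```python
import numpy as np

def correlation_with_output(X, y, names):
    """Pearson correlation of each input column of X with the output y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return {name: float(np.corrcoef(X[:, j], y)[0, 1])
            for j, name in enumerate(names)}

# Hypothetical toy values: column 0 plays the role of TDS, column 1 of pH.
X = np.array([[200.0, 8.2],
              [300.0, 7.9],
              [400.0, 7.7],
              [500.0, 7.2]])
y = np.array([25.0, 33.0, 41.0, 49.0])   # stand-in for WQI
r = correlation_with_output(X, y, ["TDS", "pH"])
print(r)
```

In this toy example the first column is an exact linear function of the output, so its coefficient is 1, while the second column decreases as the output increases and therefore has a negative coefficient.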

### Model fitness criteria (MFC)

The performances of ANFIS, GRNN, and ANN were compared using model fitness criteria (MFC). These MFC were defined as:

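- Nash-Sutcliffe Efficiency: Nash & Sutcliffe (1970) introduced the Nash-Sutcliffe efficiency (NSE), which was used to evaluate the performance of the soft computing models. The NSE ranges from −∞ to 1. An NSE of 1 indicates a perfect model. An efficiency of 0 means the model is as accurate as the mean of the experimental values, and a negative value indicates that the mean of the experimental values is a better predictor than the model (Wilcox *et al.* 1990; Legates & McCabe 1999). NSE can be computed as:

  $$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{F} (D_i - E_i)^2}{\sum_{i=1}^{F} (D_i - \bar{D})^2}$$

- Legates-McCabe Index: The Legates-McCabe Index (LMI) was introduced by Legates & McCabe (1999). The LMI has an upper limit of 1 and becomes negative when the mean of the experimental values predicts better than the model. A lower value indicates a poorer result and a higher value a better one. The LMI can be calculated as:

  $$\mathrm{LMI} = 1 - \frac{\sum_{i=1}^{F} |D_i - E_i|}{\sum_{i=1}^{F} |D_i - \bar{D}|}$$

- Coefficient of Correlation: The coefficient of correlation (CC) shows the interrelation between observed and predicted values. The CC ranges from −1 to +1. It can be calculated as:

  $$\mathrm{CC} = \frac{F\sum_{i=1}^{F} D_i E_i - \sum_{i=1}^{F} D_i \sum_{i=1}^{F} E_i}{\sqrt{F\sum_{i=1}^{F} D_i^2 - \left(\sum_{i=1}^{F} D_i\right)^2}\,\sqrt{F\sum_{i=1}^{F} E_i^2 - \left(\sum_{i=1}^{F} E_i\right)^2}}$$

where $D$ represents the experimentally obtained values, $E$ denotes the values obtained from the soft computing models, $\bar{D}$ denotes the mean of the experimentally observed values, and $F$ is the number of observations in the dataset.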
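The six criteria named in the abstract can be computed together as in the sketch below. NSE, LMI and CC follow the descriptions above; RMSE, MAE and MAPE use their standard textbook definitions, which is an assumption because their definitions are not given in this section. The function name `fitness_criteria` and the sample values are illustrative.

```python
import numpy as np

def fitness_criteria(observed, predicted):
    """Return NSE, LMI, CC, RMSE, MAE and MAPE (percent) for paired values."""
    d = np.asarray(observed, dtype=float)    # D: experimentally obtained values
    e = np.asarray(predicted, dtype=float)   # E: model predictions
    resid = d - e
    dev = d - d.mean()                       # deviation from the observed mean
    return {
        "NSE": 1.0 - np.sum(resid ** 2) / np.sum(dev ** 2),
        "LMI": 1.0 - np.sum(np.abs(resid)) / np.sum(np.abs(dev)),
        "CC": np.corrcoef(d, e)[0, 1],
        "RMSE": np.sqrt(np.mean(resid ** 2)),
        "MAE": np.mean(np.abs(resid)),
        "MAPE": 100.0 * np.mean(np.abs(resid / d)),   # undefined if any observed value is 0
    }

# Hypothetical observed and predicted WQI values.
m = fitness_criteria([30.0, 40.0, 50.0], [32.0, 39.0, 51.0])
print(m)
```

Because NSE and LMI compare the model against the observed mean, a model worse than that baseline yields negative values for both, which is the situation reported for ANFIS_trapmf on the testing dataset.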

## RESULTS AND DISCUSSION

Three soft computing techniques, namely ANN, GRNN, and ANFIS with four membership functions (ANFIS_trimf, ANFIS_trapmf, ANFIS_gbellmf, and ANFIS_gaussianmf), were used in this study. Tuning soft computing techniques is a trial-and-error process. ANFIS and GRNN were implemented in MATLAB, while ANN was implemented in Weka (3.9). The soft computing techniques have tuning parameters, and model accuracy can increase or decrease as these parameters are changed. Therefore, to improve accuracy, the optimal parameter values were determined by trial and error, and modeling was done thereafter. The optimal values of the hyperparameters were as follows:

- 1. For ANN: momentum = 0.2, learning rate = 0.1, hidden layers = 1, neurons per hidden layer = 8, iterations = 1,500.
- 2. For GRNN: spread = 0.2.
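To show how the ANN hyperparameters listed above enter a training loop, the following NumPy sketch trains a network with one hidden layer of eight neurons by gradient descent with momentum. It is not a reproduction of Weka's multilayer perceptron: the sigmoid hidden layer, linear output, full-batch updates, weight initialization, function name `train_mlp`, and synthetic data are all assumptions.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.1, momentum=0.2, epochs=1500, seed=0):
    """One-hidden-layer network trained by full-batch gradient descent with momentum."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(0.0, 0.5, size=(p, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=hidden)
    b2 = 0.0
    vel = [np.zeros_like(W1), np.zeros_like(b1), np.zeros_like(W2), 0.0]
    losses = []
    for _ in range(epochs):
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))   # sigmoid hidden layer (assumed)
        err = H @ W2 + b2 - y                      # linear output minus target
        losses.append(float(np.mean(err ** 2)))
        # Gradients of 0.5 * mean squared error.
        dH = np.outer(err, W2) * H * (1.0 - H)
        grads = [X.T @ dH / n, dH.mean(axis=0), H.T @ err / n, err.mean()]
        # Momentum update: keep a fraction of the previous step, add the new gradient step.
        vel = [momentum * v - lr * g for v, g in zip(vel, grads)]
        W1, b1, W2, b2 = W1 + vel[0], b1 + vel[1], W2 + vel[2], b2 + vel[3]
    return (W1, b1, W2, b2), losses

# Synthetic stand-in data with the same shape as the study's dataset (124 rows, 10 inputs).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(124, 10))
y = X @ np.full(10, 0.1)
params, losses = train_mlp(X, y)
print(losses[0], losses[-1])
```

The learning rate scales each gradient step and the momentum term carries over part of the previous step; changing either alters how quickly the mean squared error falls over the 1,500 iterations, which is why such values are tuned by trial and error.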

Based on the optimal values of the hyperparameters, the soft computing techniques yielded the MFC results for the prediction of WQI that are summarized in Table 3. Table 3 suggests that all soft computing techniques produced good results for the training dataset, while ANN was the best for the testing dataset, with values of CC, LMI, NSE, RMSE, MAPE and MAE equal to 0.9999, 0.9810, 0.9996, 0.1324, 0.2853, and 0.0987, respectively. The worst soft computing technique was ANFIS with the trapmf membership function, with values of CC, LMI, NSE, RMSE, MAPE, and MAE of 0.3974, −0.3788, −2.6871, 12.6774, 20.5235, and 7.1767, respectively. Thus, the MFC values showed that ANN was the most accurate technique for the prediction of WQI, ahead of GRNN and ANFIS with four different membership functions.

| Dataset | MFC | ANN | GRNN | ANFIS_trimf | ANFIS_trapmf | ANFIS_gbellmf | ANFIS_gaussianmf |
|---|---|---|---|---|---|---|---|
| Training | CC | 0.99996 | 0.99990 | 0.9999999 | 0.9964 | 0.999991 | 0.999999 |
| | LMI | 0.98403 | 0.98400 | 0.9995241 | 0.9556 | 0.997197 | 0.998800 |
| | NSE | 0.99976 | 0.99960 | 0.9999997 | 0.9928 | 0.999982 | 0.999997 |
| | RMSE | 0.08530 | 00.8962 | 0.0021884 | 0.2856 | 0.012357 | 0.006735 |
| | MAPE | 0.29108 | 0.30010 | 0.0081858 | 0.9336 | 0.055585 | 0.022178 |
| | MAE | 0.09498 | 0.09899 | 0.0028298 | 0.2641 | 0.016666 | 0.007138 |
| Testing | CC | 0.9999 | 0.9733 | 0.9793 | 0.3974 | 0.7986 | 0.9291 |
| | LMI | 0.9810 | 0.7856 | 0.8900 | −0.3788 | 0.6598 | 0.8104 |
| | NSE | 0.9996 | 0.9452 | 0.9523 | −2.6871 | 0.5687 | 0.8533 |
| | RMSE | 0.1324 | 1.5457 | 1.4414 | 12.6774 | 4.3361 | 2.5285 |
| | MAPE | 0.2853 | 3.0443 | 1.4092 | 20.5235 | 4.3998 | 2.4634 |
| | MAE | 0.0987 | 1.1161 | 0.5725 | 7.1767 | 1.7707 | 0.9871 |


The performance of the soft computing techniques in predicting WQI is plotted in Figure 6. For the training dataset, all the soft computing techniques worked well, as Figure 6 clearly shows: only a few points are outliers from the best agreement zone, and the predictions followed the same trend as the actual WQI, except for some outliers. In the testing dataset, however, ANN gave the best results: all of the ANN points were in the best agreement zone and followed the same trend as the actual line. The plots of ANFIS_trapmf in the graphs and the line diagram did not follow the trend of the actual WQI. Thus, Figure 6 indicates that ANN performed best in the prediction of the WQI.

The Taylor diagram (Taylor 2001) is one of the most widely used modern methods for summarizing how closely a pattern or set of patterns matches observations. It quantifies the match in terms of correlation, standard deviation, and root mean square error, and is useful for evaluating multiple aspects of complex models. In this investigation, Taylor diagrams were constructed in the ‘R’ language using the ‘plotrix’ package, and Figure 7 shows the Taylor diagram for the different soft computing techniques in the prediction of WQI. It gives the same results as the other plots. In training, all the soft computing techniques worked well in the prediction of WQI but, in testing, ANN was the most accurate. ANFIS_trapmf was the least accurate, with a correlation of 0.42 (Figure 7). Thus, the Taylor diagram (Figure 7) also indicated that ANN was the most reliable soft computing technique for the prediction of WQI.

The violin cum box plot is a combination of the violin plot and the box plot. In the violin cum box plot, the box plot is drawn inside the violin plot, giving the benefits of both plots. Like the violin plot and the box plot, the violin cum box plot has multiple layers and marks the mean (horizontal bar in a box plot). The plot was created in the ‘R’ language environment using the ‘dplyr’ package. Figure 8 shows the violin cum box plots of the soft computing techniques for the prediction of WQI for the training and testing datasets. For the training part, the plots were approximately symmetrical with that of the actual values, except for ANFIS_trapmf. For the testing part, the plots of the different soft computing techniques differed, with only the ANN plot being symmetrical with the actual one. Even the mean bar in the ANN plot was the same as for the actual WQI. Thus, the violin cum box plot (Figure 8) also supported the conclusion that ANN had an edge over GRNN and ANFIS in the prediction of WQI.

All three soft computing techniques predicted WQI. For ANN, however, more points lay in the agreement zone (Figure 6), the standard deviation point approximately overlapped the point of the actual data (Figure 7), the multiple layers and mean bar matched those of the actual data (Figure 8), and the MFC values were the most accurate (Table 3). Its reliability and accuracy were thus higher than those of the other two models. Therefore, this method can be used to predict WQI in the study area.

## CONCLUSIONS

Water quality determination entails a number of parameters that have to be measured for calculating a WQI. The use of soft computing techniques in the prediction of WQI can reduce the time required and simplify the process. Three soft computing techniques, ANN, GRNN, and ANFIS with four membership functions, were used for the prediction of WQI in the Khorramabad, Biranshahr, and Alashtar sub-watersheds in Iran. The effectiveness of these soft computing techniques was determined using a scatter (*x*–*y*) plot, Taylor diagram, violin cum box plot, and model fitness criteria, which showed that ANN produced a good agreement between predicted and actual values, with LMI and RMSE equal to 0.9810 and 0.1324 for the testing dataset. The effectiveness of ANN was also suggested by the Taylor diagram, violin cum box plot, and scatter plot. The mean bar and multiple layers of the violin cum box plot and the standard deviation of the Taylor diagram were the same as for the actual values. Among the membership functions of ANFIS, ANFIS_trimf predicted WQI more accurately than the others, while ANFIS_trapmf gave the worst results. Results indicated that, with the sampling and measurement of several hydro-chemical parameters and the use of soft computing techniques, WQI can be predicted with high accuracy. Because the construction, maintenance and monitoring of water quality measurement stations are expensive, they cannot be justified economically. In such a case, soft computing techniques can be used to assess water quality.

## CONFLICT OF INTEREST

The authors have no conflict of interest.

## DATA AVAILABILITY STATEMENT

All relevant data are available from an online repository or repositories. See: https://drive.google.com/file/d/15nLuKd2gyI_usQjhzpVooxg9tEyAoc-m/view?usp=sharing.