Water quality plays a crucial role in the management of water resources. Water quality indices (WQIs) are frequently used to assess water quality for drinking purposes. A WQI can be determined by chemical analysis, which, however, might not be viable over long periods for all rivers at the country scale. Thus, in this investigation, two neural-based soft computing techniques – an artificial neural network (ANN) and a generalized regression neural network (GRNN) – and one hybrid soft computing technique – an adaptive neuro-fuzzy inference system (ANFIS) with four membership functions – were used to predict WQIs in the Khorramabad, Biranshahr and Alashtar sub-watersheds in Iran. Ten distinct physicochemical parameters were used as input variables and the WQI as the output. A correlation plot and a pairs plot were used to ascertain the relationships between the input and output variables. The soft computing techniques were compared using six fitness criteria: Nash-Sutcliffe efficiency (NSE), mean absolute error (MAE), Legates-McCabe Index (LMI), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of correlation (CC). Results indicated that ANN predicted the WQI better than GRNN and ANFIS did. Among the different membership functions of ANFIS, ANFIS_trimf performed far better than the others. Thus, it was concluded that ANN is a viable tool for the prediction of a WQI.

  • Water quality index values were predicted by using three soft computing techniques.

  • Ten distinct physicochemical parameters were utilized in modelling.

  • The soft computing techniques were compared using six fitness criteria: NSE, MAE, LMI, RMSE, MAPE, CC.

Industries, agriculture, and people pollute water resources through a variety of activities (Katyal 2011). In some watersheds, pollution has exceeded the permissible limit. There is global concern as water quality has degraded almost everywhere (Adriaenssens et al. 2004; Azad et al. 2018). Good-quality water is fundamental for sustainable living, so the abatement of pollution and the protection of water resources are necessary, and both require an assessment of water quality (Witek & Jarosiewicz 2009; Reza & Singh 2010; Tiri et al. 2018). Water quality can be determined by chemical, physical and biological analyses (Abbasi & Abbasi 2012; Ewaid & Abed 2017; Medeiros et al. 2017; Tiri et al. 2018). A Water Quality Index (WQI) is one of the commonly used methods for the assessment of water quality (Medeiros et al. 2017; Tiri et al. 2018). A WQI groups several parameters into a single value that can be used to grade water quality and hence to classify the health of water systems, such as rivers (Hasan et al. 2015; Ewaid & Abed 2017).

WQIs were proposed by Horton (1965) and Brown et al. (1970), and various methods have since been developed for calculating them (Debels et al. 2005; Tsegaye et al. 2006; Saeedi et al. 2010). Using a WQI, Kannel et al. (2007) analyzed seasonal and spatial changes of the Bagmati River. Debels et al. (2005) calculated a WQI using 9 physicochemical parameters in the Chillán River. Yidana & Yidana (2010) combined GIS with a multivariate statistical method to calculate a WQI.

Soft computing techniques have been devised to address the non-stationarity and non-linearity of water quality. These techniques are attractive because they model water quality directly and quickly (Gaya et al. 2020; Karim & Kamsani 2020; Yasin & Karim 2020; Hmoud Al-Adhaileh & Waselallah Alsaade 2021), and can reduce both errors and computation time (Bhagat et al. 2019). The M5P model tree, adaptive neuro-fuzzy inference system (ANFIS), support vector machine (SVM), random forest (RF), Gaussian process (GP), and artificial neural network (ANN) are among the most frequently used soft computing techniques (Barzegar et al. 2016). Chen & Zhang (2008) used soft computing techniques in the prediction of water quality and observed that ANN was the best model and gave the most appropriate result. Singh et al. (2009) implemented an ANN model in the modelling of water quality to compute dissolved oxygen and biochemical oxygen demand. Particle swarm optimization with ANN was applied to predict the water quality of sewage effluent by Zheng et al. (2010). Gao et al. (2015) applied a special type of neural network, a back propagation neural network combined with particle swarm optimization, in the prediction of water quality. Nourani et al. (2013) employed ANN for computing a WQI and found that it outperformed other conventional methods. Emamgholizadeh et al. (2014) used ANN and ANFIS to estimate a WQI in the Karoon watershed and found that ANN predicted better than ANFIS. Different combinations of soft computing techniques have also been employed for the estimation of WQIs (Yaseen et al. 2018).

The reliability of soft computing techniques for WQIs has been amply demonstrated (Bui et al. 2020; Gaya et al. 2020; Najafzadeh & Lottfi-Dashbalagh 2020; Tung & Yaseen 2020; Riahi-Madvar et al. 2021). Since analysis of the water quality of all rivers might not be possible at frequent intervals on a countrywide scale for a substantial period of time (De LR Wagener et al. 2019), the ability to model water quality easily and with fewer parameters motivates the use of soft computing techniques. Although a large body of literature has predicted water quality using soft computing techniques, no study has predicted the water quality of three sub-watersheds with a combination of ANFIS, ANN and GRNN (generalized regression neural network), which indicates the significance and novelty of this work. The analysis of water quality in a laboratory is a costly and time-consuming process that requires the collection, transportation and testing of samples. In this regard, the study presents a real-time system to evaluate an alternative approach based on soft computing techniques for predicting water quality. The objectives of this study are:

  • (i) to develop a neuro-fuzzy based model, an ANFIS, for the prediction of a WQI for the Khorramabad, Biranshahr and Alashtar sub-watersheds in Iran;

  • (ii) to validate the output of ANN, GRNN and ANFIS; and

  • (iii) to compare the performances of the soft computing techniques using model fitness criteria.

The paper is organized as follows. In the second section, a description of the soft computing techniques and study area is given. Details of the methodology, data description, and model fitness criteria are also given. Results and discussion summarizing the performances of soft computing techniques are presented in the third section, followed by conclusions and references.

Soft computing techniques

Artificial neural network (ANN)

ANN, a concept inspired by the human brain, is a widely used prediction technique (Sepahvand et al. 2019; Sihag et al. 2020; Singh 2020). It has a brain-like architecture of interconnected neurons and contains a single input layer, a single target layer and one or more hidden layers. Every layer has a certain number of nodes, and the weighted connections between the layers define the relationships between the nodes. The input layer, which has as many nodes as there are input parameters, delivers data to the network but does not take part in processing. The last processing unit is the target layer. When input information moves through the links between the nodes, the values are multiplied by the associated weights and summed to obtain the target (Zd) for the unit:
$$Z_d = \sum_{c} A_{cd} B_c \tag{1}$$
where Acd = weight of interconnection from unit c to d, Bc = input value at the input layer, and Zd = target obtained by the activation function to produce a target for unit d. Haykin (1999) has given a complete discussion of ANN. The main advantage of this method is that it learns automatically and produces an output which is not limited to the input provided. Also, its working is not affected by loss of data as it stores the input in its own networks instead of a database.
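
The weighted-sum step in Equation (1) can be illustrated with a minimal Python sketch. The sigmoid activation, the omission of bias terms, the function names and the toy weights are all illustrative assumptions; they are not the configuration used in this study.

```python
import math

def weighted_sum(inputs, weights_to_unit):
    """Z_d = sum over c of A_cd * B_c for a single unit d."""
    return sum(a * b for a, b in zip(weights_to_unit, inputs))

def sigmoid(z):
    """Logistic activation (an assumed choice for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, output_weights):
    """One hidden layer followed by a single linear output unit (no biases)."""
    hidden = [sigmoid(weighted_sum(inputs, w)) for w in hidden_weights]
    return weighted_sum(hidden, output_weights)

# Toy example: 3 inputs and 2 hidden nodes; all numbers are hypothetical.
B = [0.5, -1.0, 2.0]
A_hidden = [[0.1, 0.2, 0.3], [-0.4, 0.0, 0.25]]
A_out = [1.5, -0.5]
prediction = forward(B, A_hidden, A_out)
```

In a trained network the weights would be fitted to the data rather than set by hand.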

Generalized regression neural network (GRNN)

Specht (1991) introduced the GRNN, which uses a normalized radial basis function (RBF) network with a single hidden unit centred at each training example. The RBF, also known as the kernel function, is typically a probability density function such as the Gaussian; kernel functions of this kind are also used in other neural networks, Gaussian processes, and support vector machines. The hidden-to-output weights are the target values, so the output is a weighted average of the target values of the training cases close to the given input case. The widths of the RBF units are the only parameters that need to be determined. A GRNN has four layers. The input values are in the first layer, the pattern units are in the second layer, the outputs of the pattern layer are passed to the summation units in the third layer, and the output units are in the final layer. The first layer is fully connected to the second (pattern) layer, where each unit represents a training pattern and its output is a measure of the distance of the input from the stored pattern. The optimal value of the user-defined parameter known as the spread (s) is determined experimentally. For more information about GRNN, readers are referred to Specht (1991) and Wasserman (1993). The advantages of GRNN are that it handles noise in the input easily and uses single-pass learning, so no backpropagation is needed.
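
The weighted-average behaviour described above can be sketched in a few lines of Python. This is a minimal illustration that assumes a Gaussian kernel and treats the spread as its standard deviation; the function name and the toy data are hypothetical, and the scaling of the spread in the MATLAB implementation may differ.

```python
import math

def grnn_predict(query, train_inputs, train_targets, spread):
    """Kernel-weighted average of the training targets."""
    weights = []
    for pattern in train_inputs:
        # Pattern layer: squared distance of the query from a stored pattern.
        sq_dist = sum((q - p) ** 2 for q, p in zip(query, pattern))
        weights.append(math.exp(-sq_dist / (2.0 * spread ** 2)))
    total = sum(weights)
    if total == 0.0:
        # All kernels underflowed; fall back to the plain mean of the targets.
        return sum(train_targets) / len(train_targets)
    # Summation and output layers: normalized weighted sum of the targets.
    return sum(w * t for w, t in zip(weights, train_targets)) / total

# Toy data (hypothetical): three stored patterns with two inputs each.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
y = [10.0, 20.0, 30.0]
estimate = grnn_predict([0.9, 0.1], X, y, spread=0.2)
```

A small spread makes the prediction follow the nearest training case closely, while a large spread pulls it towards the mean of all targets, which is why the spread has to be tuned experimentally.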

Adaptive neuro-fuzzy inference system (ANFIS)

The configuration of ANFIS is displayed in Figure 1. There are five layers to this system (Sihag et al. 2019) and details of these five layers are as follows:

  • 1.
    The membership degree is computed in the first layer. Every node in this layer produces a membership degree using the membership functions of the fuzzy sets:
    $$O_{1i} = \mu_{A_i}(x) \tag{2}$$
    $$O_{1i} = \mu_{B_i}(y) \tag{3}$$
    where x and y are the inputs, Ai and Bi are the linguistic labels, and μAi and μBi are the degrees of the membership functions for Ai and Bi, respectively.
  • 2.
    Based on the calculated membership degrees, the output of the second layer (the firing strengths) is derived. To produce the firing strength, the membership degrees from the previous layer are multiplied together:
    $$O_{2i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y) \tag{4}$$
    where O2i is the output of this layer, called the firing strength.
  • 3.
    In this layer, the firing strengths are normalized. The fixed nodes evaluate the relative contribution of each firing strength:
    $$O_{3i} = \bar{w}_i = \frac{w_i}{\sum_{i} w_i} \tag{5}$$
  • 4.
    In this layer, the contribution of the ith rule to the final output is evaluated using the consequent parameters:
    $$O_{4i} = \bar{w}_i f_i = \bar{w}_i \left(p_i x + q_i y + r_i\right) \tag{6}$$
    where pi, qi, and ri are the consequent parameters.
  • 5.
    To obtain the total output, this layer sums all incoming signals:
    $$O_{5} = \sum_{i} \bar{w}_i f_i \tag{7}$$

Figure 1

Details of the ANFIS model.

It has been shown that a Gaussian membership function can lead to accurate outputs (Azimi et al. 2019; Ebtehaj et al. 2019; Gholami et al. 2019). Thus, the current study uses this membership function:
$$\mu_{A_i}(x) = \exp\left[-\frac{(x - c_i)^2}{2\sigma_i^2}\right] \tag{8}$$
where ci and σi are the parameters for the membership function. The ANFIS model has the advantage of having both numerical and linguistic knowledge.
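
The five layers can be traced with a minimal Python sketch of a first-order Sugeno-type ANFIS forward pass with two inputs and Gaussian membership functions. The two-rule structure, the function names and all parameter values are illustrative assumptions; in practice the premise and consequent parameters are learned from data.

```python
import math

def gaussian_mf(x, c, sigma):
    """Gaussian membership degree with centre c and width sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def anfis_forward(x, y, premise, consequent):
    """premise: [((cA, sA), (cB, sB)), ...] per rule; consequent: [(p, q, r), ...]."""
    # Layers 1 and 2: membership degrees multiplied into firing strengths.
    w = [gaussian_mf(x, cA, sA) * gaussian_mf(y, cB, sB)
         for (cA, sA), (cB, sB) in premise]
    # Layer 3: normalize the firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weight each rule's linear consequent p*x + q*y + r.
    rule_outputs = [wb * (p * x + q * y + r)
                    for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: sum the rule contributions.
    return sum(rule_outputs)

# Hypothetical two-rule system.
premise = [((0.0, 1.0), (0.0, 1.0)), ((1.0, 1.0), (1.0, 1.0))]
consequent = [(1.0, 0.5, 0.0), (2.0, -1.0, 0.3)]
output = anfis_forward(0.4, 0.7, premise, consequent)
```

When every rule shares the same consequent parameters, the normalized weights cancel and the output reduces to the single linear function, which is a convenient sanity check.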

Area of the research work

The study used flow and water quality data measured in a watershed consisting of three sub-watersheds, Khorramabad, Biranshahr, and Alashtar, in Lorestan province, Iran, located between 48°03′10″E and 48°59′07″E, and between 33°11′47″N and 34°03′27″N, with an area of 3,562.1 km². The elevation of the catchment area varies from 1,158 to 3,646 m above sea level. The observations were recorded from September 2014 to August 2017. Most of the rainfall occurs from November to May. The average rainfall is 442 mm, 484 mm, and 556 mm for the Khorramabad, Biranshahr, and Alashtar sub-watersheds, respectively. Figure 2 shows the details of the study area; the black dots represent the sampling sites in the three sub-watersheds.

Figure 2

Sites of the research work.


Methodology and data descriptions

Three types of parameters – biological, physical, and chemical – were used to analyze the water quality by which the WQI was determined (Dogan et al. 2009). These parameters were total dissolved solids (TDS), sodium (Na), sulfate (SO4), electrical conductivity (EC), calcium (Ca), the potential of hydrogen (pH), bicarbonate ions (HCO3), chlorides (Cl), magnesium (Mg), and potassium (K). Using these parameters, the WQI was calculated as follows:

  • 1.
    A weight (zi) was assigned to each parameter on a scale of one to five according to its significance for drinking suitability and human health. The zi values are given in Table 1 (Yidana et al. 2010; Varol & Davraz 2014; Vasant et al. 2016; Şener et al. 2017; Wagh et al. 2017).
  • 2.
    The relative weight (Zi) was determined for each of the parameters. The Zi values are tabulated in Table 1 and were calculated as:
    $$Z_i = \frac{z_i}{\sum_{i=1}^{n} z_i} \tag{9}$$
  • 3.
    A quality rating (si) was calculated for each of the parameters as:
    $$s_i = \frac{\mathrm{Con}_i}{\mathrm{Std}_i} \times 100 \tag{10}$$
  • 4.
    The sub-index level number (SILi) was calculated as:
    $$\mathrm{SIL}_i = Z_i \times s_i \tag{11}$$
  • 5.
    The water quality index (WQI) was determined as:
    $$\mathrm{WQI} = \sum_{i=1}^{n} \mathrm{SIL}_i \tag{12}$$
    where Coni is the concentration of the ith parameter in mg/L, Stdi is the standard value of the ith parameter as per WHO, and n is the number of parameters.
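
The steps above can be sketched in Python as a weighted sum of quality ratings. The sketch uses the relative weights and WHO standards listed in Table 1, but two points are assumptions: the row with relative weight 0.0625 and standard 10 is taken to be K, and the upper limit (8.5) of the pH range is used as the pH standard. The sample values are hypothetical, and the sketch is not claimed to reproduce the WQI values reported in Table 2.

```python
# Relative weights Z_i and WHO standard values as listed in Table 1.
# Assumptions: the 0.0625 / 10 row is taken to be K, and the upper limit
# (8.5) of the pH range 6.5-8.5 is used as the pH standard.
RELATIVE_WEIGHT = {
    "TDS": 0.15625, "EC": 0.0625, "pH": 0.125, "HCO3": 0.09375, "Cl": 0.125,
    "SO4": 0.125, "Ca": 0.09375, "Mg": 0.09375, "Na": 0.0625, "K": 0.0625,
}
STANDARD = {
    "TDS": 500.0, "EC": 500.0, "pH": 8.5, "HCO3": 500.0, "Cl": 250.0,
    "SO4": 250.0, "Ca": 75.0, "Mg": 50.0, "Na": 200.0, "K": 10.0,
}

def quality_rating(concentration, standard):
    """s_i = (Con_i / Std_i) * 100."""
    return concentration / standard * 100.0

def water_quality_index(sample):
    """WQI = sum of SIL_i, where SIL_i = Z_i * s_i."""
    return sum(
        RELATIVE_WEIGHT[name] * quality_rating(sample[name], STANDARD[name])
        for name in RELATIVE_WEIGHT
    )

# Hypothetical sample (not taken from the study's dataset).
sample = {
    "TDS": 300.0, "EC": 450.0, "pH": 7.8, "HCO3": 110.0, "Cl": 20.0,
    "SO4": 18.0, "Ca": 55.0, "Mg": 15.0, "Na": 6.0, "K": 1.5,
}
wqi = water_quality_index(sample)
```

Because the relative weights sum to one, a sample in which every parameter sits exactly at its standard value yields a WQI of 100 under these assumptions.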

A total of 124 observations, recorded from September 2014 to August 2017, were used. The dataset consists of 10 input variables (pH, Na, Mg, SO4, Ca, TDS, K, Cl, HCO3 and EC) and 1 output variable (WQI). The pairs plot of all variables is shown in Figure 3, which displays the interrelation of the variables with each other and also reveals the outliers, which were few. Seventy percent of the dataset was used for training the soft computing techniques and 30 percent was used for testing them. The characteristics of the water quality parameters for the sub-watersheds are tabulated in Table 2. The characteristics of the three sub-watersheds are similar, except for some values that were higher in the Khorramabad sub-watershed. Figure 4 shows the flow chart of the investigation as well as the architecture of the soft computing techniques (ANN, GRNN and ANFIS) implemented in the study area.
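
A 70/30 partition of 124 observations can be sketched as follows. The source does not state whether the split was random or chronological, nor how the fraction was rounded, so the shuffled split, the seed and the resulting 87/37 counts are assumptions made only for illustration.

```python
import random

def split_dataset(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it at the training fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Placeholder records: indices standing in for the 124 observations.
observations = list(range(124))
train, test = split_dataset(observations)
```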

Figure 3

Pairs plot of the physicochemical parameters.

Figure 4

Flow chart of the investigation.

Table 1

Details of weight and relative weight of parameters

| Parameter | Weight (zi) | Relative weight (Zi) | Standard value from WHO |
| --- | --- | --- | --- |
| TDS | 5 | 0.15625 | 500 |
| EC | 2 | 0.0625 | 500 |
| pH | 4 | 0.125 | 6.5–8.5 |
| HCO3 | 3 | 0.09375 | 500 |
| Cl | 4 | 0.125 | 250 |
| SO4 | 4 | 0.125 | 250 |
| Ca | 3 | 0.09375 | 75 |
| Mg | 3 | 0.09375 | 50 |
| Na | 2 | 0.0625 | 200 |
| K | 2 | 0.0625 | 10 |
| Total | 32 | | |

All values are in mg/L except pH (on a scale) and EC (μS/cm).

Table 2

Characteristics of water quality parameters. All values are in mg/L, except pH (on a scale), EC (μS/cm) and WQI

| Watershed | Statistic | TDS | EC | pH | HCO3 | Cl | SO4 | Ca | Mg | Na | K | WQI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Biranshahr | Mean | 269.21 | 419.91 | 7.94 | 102.15 | 16.08 | 18.14 | 55.50 | 14.17 | 5.81 | 1.10 | 31.81 |
| Biranshahr | Max | 484.0 | 745.0 | 8.39 | 128.14 | 92.17 | 55.71 | 76.15 | 26.74 | 45.98 | 3.91 | 49.57 |
| Biranshahr | Min | 170.0 | 275.0 | 7.05 | 73.22 | 6.03 | 4.32 | 26.05 | 2.43 | 0.69 | 0.00 | 25.42 |
| Biranshahr | Standard deviation | 48.74 | 75.15 | 0.31 | 12.81 | 12.72 | 11.08 | 10.35 | 6.04 | 6.69 | 0.85 | 4.20 |
| Khorramabad | Mean | 413.24 | 651.55 | 7.46 | 140.93 | 52.04 | 22.79 | 70.28 | 26.89 | 16.95 | 5.46 | 44.17 |
| Khorramabad | Max | 573.00 | 977.0 | 8.53 | 201.37 | 116.99 | 77.33 | 110.2 | 69.28 | 64.37 | 44.1 | 64.24 |
| Khorramabad | Min | 247.00 | 386.0 | 6.52 | 82.38 | 24.82 | 0.00 | 24.05 | 10.94 | 3.45 | 0.78 | 33.00 |
| Khorramabad | Standard deviation | 82.55 | 132.49 | 0.55 | 27.40 | 16.51 | 22.95 | 15.12 | 10.42 | 13.89 | 6.50 | 7.05 |
| Alashtar | Mean | 279.9 | 436.5 | 7.70 | 107.4 | 16.28 | 15.88 | 54.61 | 16.52 | 5.74 | 2.24 | 32.59 |
| Alashtar | Max | 417.0 | 634.0 | 8.36 | 149.5 | 24.82 | 83.57 | 76.15 | 48.62 | 22.53 | 7.82 | 47.15 |
| Alashtar | Min | 150.0 | 241.0 | 6.47 | 56.4 | 3.55 | 2.40 | 24.05 | 3.65 | 0.69 | 0.00 | 23.65 |
| Alashtar | Standard deviation | 64.2 | 101.9 | 0.49 | 23.6 | 4.81 | 14.13 | 15.46 | 8.21 | 4.87 | 2.28 | 4.98 |

Correlation plot

This investigation used the 10 input variables to estimate the output variable in the three sub-watersheds. Correlation coefficients were computed to evaluate the correlation between the output and input variables. Figure 5 shows the correlation plots of the training and testing datasets. For the training dataset, the WQI showed a very strong correlation with TDS and EC (0.97 and 0.97), a strong correlation with HCO3, Cl and Mg (0.8, 0.86 and 0.77), a moderate correlation with SO4, Ca and Na (0.45, 0.58 and 0.54), and a poor correlation with pH (−0.18). For the testing dataset, the WQI again showed a very strong correlation with TDS and EC (0.97 and 0.97), a strong correlation with HCO3, Cl and Mg (0.84, 0.72 and 0.87), a moderate correlation with Ca, Na, K and pH (0.44, 0.47, 0.57 and −0.41), and a poor correlation with SO4 (0.22).

Figure 5

Correlations plot of (a) Training and (b) Testing dataset. Please refer to the online version of this paper to see this figure in colour: http://dx.doi.org/10.2166/ws.2021.157.


Model fitness criteria (MFC)

The performances of ANFIS, GRNN, and ANN were compared using model fitness criteria (MFC). These MFC were defined as:

  • Nash-Sutcliffe Efficiency: Nash & Sutcliffe (1970) introduced the Nash-Sutcliffe efficiency (NSE), which was used to evaluate the performance of the soft computing models. NSE ranges from −∞ to 1. An NSE of 1 indicates a perfect match between the model and the observations, an NSE of 0 means that the model is only as accurate as the mean of the observed values, and a negative NSE indicates that the observed mean is a better predictor than the model (Wilcox et al. 1990; Legates & McCabe 1999). NSE can be computed as:
    $$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{F} (D_i - E_i)^2}{\sum_{i=1}^{F} (D_i - \bar{D})^2} \tag{13}$$
  • Mean Absolute Error: The mean absolute error (MAE) is the mean of the absolute differences between the observed and predicted values. MAE is non-negative, and a value of 0 indicates a perfect prediction. The formula for MAE is:
    $$\mathrm{MAE} = \frac{1}{F} \sum_{i=1}^{F} \left| D_i - E_i \right| \tag{14}$$
  • Legates-McCabe Index: The Legates-McCabe Index (LMI) was introduced by Legates & McCabe (1999). Its values range from −∞ to 1. A value close to 1 indicates a good result, and lower values indicate poorer results. The LMI can be calculated as:
    $$\mathrm{LMI} = 1 - \frac{\sum_{i=1}^{F} \left| D_i - E_i \right|}{\sum_{i=1}^{F} \left| D_i - \bar{D} \right|} \tag{15}$$
  • Root Mean Square Error: The root mean square error (RMSE) is one of the most widely used measures of model fitness. As the name indicates, it is the square root of the mean square error. A value of zero indicates a perfect prediction, and larger values indicate larger errors. The equation of RMSE is:
    $$\mathrm{RMSE} = \sqrt{\frac{1}{F} \sum_{i=1}^{F} (D_i - E_i)^2} \tag{16}$$
  • Mean Absolute Percentage Error: The mean absolute percentage error (MAPE) is also known as the mean absolute percentage deviation. It can be formulated as:
    $$\mathrm{MAPE} = \frac{100}{F} \sum_{i=1}^{F} \left| \frac{D_i - E_i}{D_i} \right| \tag{17}$$
  • Coefficient of Correlation: The coefficient of correlation (CC) shows the interrelation between the observed and predicted values. The range of CC is from –1 to +1. It can be calculated as:
    $$\mathrm{CC} = \frac{F \sum D E - \sum D \sum E}{\sqrt{\left[F \sum D^2 - \left(\sum D\right)^2\right]\left[F \sum E^2 - \left(\sum E\right)^2\right]}} \tag{18}$$
    where D represents the experimentally obtained values, E denotes the values obtained from the soft computing models, $\bar{D}$ denotes the mean of the experimentally observed values, and F is the number of observations in the dataset.
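
The six criteria can be written as short Python functions. This is a sketch based on the standard definitions of these measures, with MAPE expressed as a percentage; the function names and the toy values are hypothetical and do not come from the study's dataset.

```python
import math

def nse(observed, predicted):
    mean_obs = sum(observed) / len(observed)
    num = sum((d - e) ** 2 for d, e in zip(observed, predicted))
    den = sum((d - mean_obs) ** 2 for d in observed)
    return 1.0 - num / den

def mae(observed, predicted):
    return sum(abs(d - e) for d, e in zip(observed, predicted)) / len(observed)

def lmi(observed, predicted):
    mean_obs = sum(observed) / len(observed)
    num = sum(abs(d - e) for d, e in zip(observed, predicted))
    den = sum(abs(d - mean_obs) for d in observed)
    return 1.0 - num / den

def rmse(observed, predicted):
    return math.sqrt(
        sum((d - e) ** 2 for d, e in zip(observed, predicted)) / len(observed)
    )

def mape(observed, predicted):
    # Expressed as a percentage; observed values must be non-zero.
    return 100.0 / len(observed) * sum(
        abs((d - e) / d) for d, e in zip(observed, predicted)
    )

def cc(observed, predicted):
    # Pearson correlation; undefined if either series is constant.
    f = len(observed)
    s_d, s_e = sum(observed), sum(predicted)
    s_de = sum(d * e for d, e in zip(observed, predicted))
    s_dd = sum(d * d for d in observed)
    s_ee = sum(e * e for e in predicted)
    return (f * s_de - s_d * s_e) / math.sqrt(
        (f * s_dd - s_d ** 2) * (f * s_ee - s_e ** 2)
    )

# Toy values (hypothetical WQI observations and predictions).
observed = [30.0, 35.0, 40.0, 45.0]
predicted = [31.0, 34.0, 41.0, 44.0]
scores = {fn.__name__: fn(observed, predicted)
          for fn in (nse, mae, lmi, rmse, mape, cc)}
```

Predicting the observed mean for every sample gives NSE = 0 and LMI = 0, which makes the benchmark role of the mean in both criteria explicit.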

Three soft computing techniques, namely, ANN, GRNN, and ANFIS with four membership functions (ANFIS_trimf, ANFIS_trapmf, ANFIS_gbellmf, and ANFIS_gaussianmf), were used in this study. Tuning soft computing techniques is a trial and error process. ANFIS and GRNN were run in MATLAB, while ANN was run in Weka (3.9). The soft computing techniques have tunable parameters, and the model accuracy can increase or decrease when these parameters are changed. Therefore, to improve accuracy, the optimal parameter values were determined by trial and error, and modeling was done thereafter. The optimal values of the hyperparameters were as follows:

  • 1.
    For ANN: momentum = 0.2, learning rate = 0.1, hidden layers = 1, neurons per hidden layer = 8, iterations = 1,500.
  • 2.
    For GRNN: spread = 0.2.
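
To show what the learning rate and momentum control, the following Python sketch applies a generic textbook gradient descent update with momentum to a one-dimensional toy loss. The update form, the function name and the quadratic loss are assumptions for illustration; the exact update used inside Weka may differ in detail.

```python
def train_with_momentum(gradient, w0, learning_rate=0.1, momentum=0.2,
                        iterations=1500):
    """Repeat: step = -learning_rate * gradient(w) + momentum * previous step."""
    w, step = w0, 0.0
    for _ in range(iterations):
        step = -learning_rate * gradient(w) + momentum * step
        w += step
    return w

# Toy loss (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_final = train_with_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

The learning rate scales each gradient step, and the momentum term carries over a fraction of the previous step, which smooths the trajectory of the weights.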

Based on the optimal values of the hyperparameters, the MFC were computed for each soft computing technique for the prediction of WQI, and the results are summarized in Table 3. Table 3 suggests that all soft computing techniques produced good results for the training dataset, while ANN was the best for the testing dataset, with values of CC, LMI, NSE, RMSE, MAPE and MAE equal to 0.9999, 0.9810, 0.9996, 0.1324, 0.2853, and 0.0987, respectively. The worst soft computing technique was ANFIS with the trapmf membership function, with values of CC, LMI, NSE, RMSE, MAPE, and MAE of 0.3974, −0.3788, −2.6871, 12.6774, 20.5235, and 7.1767, respectively. Thus, the MFC values showed that ANN was the most accurate technique for the prediction of WQI, ahead of GRNN and ANFIS with four different membership functions.

Table 3

Detail of MFC for training and testing datasets

| Dataset | MFC | ANN | GRNN | ANFIS_trimf | ANFIS_trapmf | ANFIS_gbellmf | ANFIS_gaussianmf |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Training | CC | 0.99996 | 0.99990 | 0.9999999 | 0.9964 | 0.999991 | 0.999999 |
| Training | LMI | 0.98403 | 0.98400 | 0.9995241 | 0.9556 | 0.997197 | 0.998800 |
| Training | NSE | 0.99976 | 0.99960 | 0.9999997 | 0.9928 | 0.999982 | 0.999997 |
| Training | RMSE | 0.08530 | 00.8962 | 0.0021884 | 0.2856 | 0.012357 | 0.006735 |
| Training | MAPE | 0.29108 | 0.30010 | 0.0081858 | 0.9336 | 0.055585 | 0.022178 |
| Training | MAE | 0.09498 | 0.09899 | 0.0028298 | 0.2641 | 0.016666 | 0.007138 |
| Testing | CC | 0.9999 | 0.9733 | 0.9793 | 0.3974 | 0.7986 | 0.9291 |
| Testing | LMI | 0.9810 | 0.7856 | 0.8900 | −0.3788 | 0.6598 | 0.8104 |
| Testing | NSE | 0.9996 | 0.9452 | 0.9523 | −2.6871 | 0.5687 | 0.8533 |
| Testing | RMSE | 0.1324 | 1.5457 | 1.4414 | 12.6774 | 4.3361 | 2.5285 |
| Testing | MAPE | 0.2853 | 3.0443 | 1.4092 | 20.5235 | 4.3998 | 2.4634 |
| Testing | MAE | 0.0987 | 1.1161 | 0.5725 | 7.1767 | 1.7707 | 0.9871 |

The performance of the soft computing techniques in predicting WQI is plotted in Figure 6. For the training dataset, all the soft computing techniques worked well, as Figure 6 shows that only a few points lie outside the best agreement zone. The predictions also followed the same trends as the actual WQI, except for some outliers. In the testing dataset, however, ANN gave the best results: all of the ANN points were in the best agreement zone and followed the same trend as the actual line. The plots of ANFIS_trapmf in the scatter plots and the line diagram did not follow the trends of the actual WQI. Thus, Figure 6 indicates that ANN performed best in the prediction of the WQI.

Figure 6

The performance of the WQI with soft computing techniques. Please refer to the online version of this paper to see this figure in colour: http://dx.doi.org/10.2166/ws.2021.157.


The Taylor diagram (Taylor 2001) is a widely used method that summarizes how closely a pattern or set of patterns matches observations. It quantifies the agreement in terms of correlation, standard deviation, and root mean square error, which makes it useful for evaluating several aspects of complex models at once. In this investigation, Taylor diagrams were constructed in the ‘R’ language using the ‘plotrix’ package, and Figure 7 shows the Taylor diagram for the different soft computing techniques for the prediction of WQI. It gives the same results as the other plots: in training, all the soft computing techniques worked well in the prediction of WQI but, in testing, ANN was the most accurate. ANFIS_trapmf was the least accurate, with a correlation of 0.42 (Figure 7). Thus, the Taylor diagram (Figure 7) also indicated that ANN was the most reliable soft computing technique for the prediction of WQI.

Figure 7

Taylor diagram for soft computing techniques.
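
The three statistics summarized on a Taylor diagram are linked by a law-of-cosines relation (Taylor 2001): the squared centred root mean square difference equals the sum of the two variances minus twice the product of the standard deviations and the correlation. The Python sketch below computes the statistics with population standard deviations; the function name and the toy values are hypothetical, and the sketch does not reproduce the ‘plotrix’ plotting itself.

```python
import math

def taylor_statistics(observed, predicted):
    """Return (sd_obs, sd_pred, correlation, centred RMS difference)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    dev_o = [o - mean_o for o in observed]
    dev_p = [p - mean_p for p in predicted]
    sd_o = math.sqrt(sum(d * d for d in dev_o) / n)
    sd_p = math.sqrt(sum(d * d for d in dev_p) / n)
    corr = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * sd_o * sd_p)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return sd_o, sd_p, corr, crmsd

# Toy values (hypothetical).
observed = [30.0, 35.0, 40.0, 45.0, 38.0]
predicted = [31.5, 33.0, 41.0, 47.0, 36.0]
sd_o, sd_p, corr, crmsd = taylor_statistics(observed, predicted)

# Law-of-cosines relation that underlies the geometry of the diagram.
identity_rhs = sd_o ** 2 + sd_p ** 2 - 2.0 * sd_o * sd_p * corr
```

This relation is what allows one point per model to encode all three statistics at once.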


The violin cum box plot is a combination of the violin plot and the box plot. In the violin cum box plot, the box plot is drawn inside the violin plot, giving the benefits of both plots. Like the violin plot and the box plot, the violin cum box plot has multiple layers and marks the mean (the horizontal bar in the box plot). The plot was created in the ‘R’ language environment using the ‘dplyr’ package. Figure 8 shows the violin cum box plot for the soft computing techniques for the prediction of WQI for the training and testing datasets. For the training part, the plots of all techniques except ANFIS_trapmf closely matched the plot of the actual values. For the testing part, the plots of the different soft computing techniques differed from one another, and only the ANN plot matched the actual one. Even the mean bar in the ANN plot was the same as for the actual WQI. Thus, the violin cum box plot (Figure 8) also supported the conclusion that ANN had an edge over GRNN and ANFIS in the prediction of WQI.

Figure 8

Violin cum box plot for soft computing techniques, where trimf = ANFIS_trimf, trapmf = ANFIS_trapmf, gbellmf = ANFIS_gbellmf, and gaussianmf = ANFIS_gaussianmf.


All three soft computing techniques predicted WQI. For ANN, however, more points fell within the agreement zone (Figure 6), the point on the Taylor diagram approximately overlapped the point for the actual data (Figure 7), the layers and mean bar of the violin cum box plot matched those of the actual WQI (Figure 8), and the MFC values were the most accurate (Table 3). Its reliability and accuracy were therefore higher than those of the other two techniques, and it can be used to predict WQI in the study area.

Water quality determination entails a number of parameters that have to be measured for calculating a WQI. The use of soft computing techniques in the prediction of WQI can reduce the time required and simplify the process. Three soft computing techniques, ANN, GRNN, and ANFIS with four membership functions, were used for the prediction of WQI in the Khorramabad, Biranshahr, and Alashtar sub-watersheds in Iran. The effectiveness of these soft computing techniques was assessed using scatter (xy) plots, the Taylor diagram, the violin cum box plot, and model fitness criteria, which showed that ANN produced good agreement between predicted and actual values, with LMI and RMSE equal to 0.9810 and 0.1324 for the testing dataset. The effectiveness of ANN was also supported by the Taylor diagram, violin cum box plot, and scatter plot. For ANN, the mean bar and multiple layers of the violin cum box plot and the standard deviation in the Taylor diagram matched those of the actual values. Among the membership functions of ANFIS, ANFIS_trimf predicted WQI more accurately than the others, while ANFIS_trapmf gave the worst results. The results indicated that, with the sampling and measurement of several hydro-chemical parameters and the use of soft computing techniques, WQI can be predicted with high accuracy. Because water quality measurement stations are expensive to construct, maintain and monitor, they cannot always be justified economically. In such cases, soft computing techniques can be used to assess water quality.

The authors have no conflict of interest.

All relevant data are available from an online repository or repositories. See: https://drive.google.com/file/d/15nLuKd2gyI_usQjhzpVooxg9tEyAoc-m/view?usp=sharing.

Abbasi
T.
&
Abbasi
S. A.
2012
Water Quality Indices
.
Elsevier Science
,
Burlington, MA, USA
.
Adriaenssens
V.
,
De Baets
B.
,
Goethals
P. L.
&
De Pauw
N.
2004
Fuzzy rule-based models for decision support in ecosystem management
.
Science of the Total Environment
319
(
1–3
),
1
12
.
Azad
A.
,
Karami
H.
,
Farzin
S.
,
Saeedian
A.
,
Kashi
H.
&
Sayyahi
F.
2018
Prediction of water quality parameters using ANFIS optimized by intelligence algorithms (case study: Gorganrood River)
.
KSCE Journal of Civil Engineering
22
(
7
),
2206
2213
.
Azimi
H.
,
Bonakdari
H.
,
Ebtehaj
I.
,
Shabanlou
S.
,
Talesh
S. H. A.
&
Jamali
A.
2019
A pareto design of evolutionary hybrid optimization of ANFIS model in prediction abutment scour depth
.
Sādhanā
44
(
7
),
169
.
Barzegar
R.
,
Adamowski
J.
&
Moghaddam
A. A.
2016
Application of wavelet-artificial intelligence hybrid models for water quality prediction: a case study in Aji-Chay River, Iran
.
Stochastic Environmental Research and Risk Assessment
30
(
7
),
1797
1819
.
Brown
R. M.
,
McClelland
N. I.
,
Deininger
R. A.
&
Tozer
R. G.
1970
A Water Quality Index- Do We Dare
.
Bui
D. T.
,
Khosravi
K.
,
Tiefenbacher
J.
,
Nguyen
H.
&
Kazakis
N.
2020
Improving prediction of water quality indices using novel hybrid machine-learning algorithms
.
Science of the Total Environment
721
,
137612
.
Chen
D. Y.
&
Zhang
X. Z.
2008
Application of variable structure neural network in prediction of future water quality parameters
.
Science and Technology Engineering
22
,
1577
1579
(In Chinese)
.
Debels
P.
,
Figueroa
R.
,
Urrutia
R.
,
Barra
R.
&
Niell
X.
2005
Evaluation of water quality in the Chillán River (Central Chile) using physicochemical parameters and a modified water quality index
.
Environmental Monitoring and Assessment
110
(
1–3
),
301
322
.
De LR Wagener
A.
,
Falcão
A. P.
,
Farias
C. O.
,
Molina
F. F.
,
da Silva Carreira
R.
,
Mauad
C.
&
Massone
C. G.
2019
Distribution and source apportionment of hydrocarbons in sediments of oil-producing continental margin: a fuzzy logic approach
.
Environmental Science and Pollution Research
26
(
17
),
17032
17044
.
Dogan
E.
,
Sengorur
B.
&
Koklu
R.
2009
Modeling biological oxygen demand of the Melen River in Turkey using an artificial neural network technique
.
Journal of Environmental Management
90
(
2
),
1229
1235
.
Ebtehaj I., Bonakdari H. & Es-haghi M. S. 2019 Design of a hybrid ANFIS–PSO model to estimate sediment transport in open channels. Iranian Journal of Science and Technology, Transactions of Civil Engineering 43 (4), 851–857.
Emamgholizadeh S., Kashi H., Marofpoor I. & Zalaghi E. 2014 Prediction of water quality parameters of Karoon River (Iran) by artificial intelligence-based models. International Journal of Environmental Science and Technology 11 (3), 645–656.
Ewaid S. H. & Abed S. A. 2017 Water quality index for Al-Gharraf river, southern Iraq. The Egyptian Journal of Aquatic Research 43 (2), 117–122.
Gao F., Feng M. Q. & Teng S. F. 2015 On the way for forecasting the water quality by BP neural network based on the PSO. Journal of Safety and Environment 15, 338–341.
Gaya M. S., Abba S. I., Abdu A. M., Tukur A. I., Saleh M. A., Esmaili P. & Wahab N. A. 2020 Estimation of water quality index using artificial intelligence approaches and multi-linear regression. International Journal of Artificial Intelligence (ISSN 2252-8938).
Gholami A., Bonakdari H., Ebtehaj I., Talesh S. H. A., Khodashenas S. R. & Jamali A. 2019 Analyzing bank profile shape of alluvial stable channels using robust optimization and evolutionary ANFIS methods. Applied Water Science 9 (3), 40.
Hasan H. H., Jamil N. R. & Aini N. 2015 Water quality index and sediment loading analysis in Pelus River, Perak, Malaysia. Procedia Environmental Sciences 30, 133–138.
Haykin S. 1999 Neural Networks: A Comprehensive Foundation (Second edition). Prentice Hall, Upper Saddle River, NJ, USA.
Hmoud Al-Adhaileh M. & Waselallah Alsaade F. 2021 Modelling and prediction of water quality by using artificial intelligence. Sustainability 13 (8), 4259.
Horton R. K. 1965 An index number system for rating water quality. Journal of Water Pollution Control Federation 37 (3), 300–306.
Kannel P. R., Lee S., Lee Y. S., Kanel S. R. & Khan S. P. 2007 Application of water quality indices and dissolved oxygen as indicators for river water classification and urban impact assessment. Environmental Monitoring and Assessment 132 (1–3), 93–110.
Karim S. A. A. & Kamsani N. F. 2020 Water Quality Index Using Fuzzy Regression. In: Water Quality Index Prediction Using Multiple Linear Fuzzy Regression Model (S. A. Abdul Karim & N. Fatonah Kamsani, eds). Springer, Singapore, pp. 37–53.
Katyal D. 2011 Water quality indices used for surface water vulnerability assessment. International Journal of Environmental Sciences 2 (1), 154–173.
Medeiros A. C., Faial K. R. F., Faial K. D. C. F., da Silva Lopes I. D., de Oliveira Lima M., Guimarães R. M. & Mendonça N. M. 2017 Quality index of the surface water of Amazonian rivers in industrial areas in Pará, Brazil. Marine Pollution Bulletin 123 (1–2), 156–164.
Najafzadeh M. & Lottfi-Dashbalagh M. 2020 Application of optimized neuro-fuzzy models for estimation of water quality index in natural rivers. Amirkabir Journal of Civil Engineering 53 (8), 17–20.
Nourani V., Khanghah T. R. & Sayyadi M. 2013 Application of the Artificial Neural Network to monitor the quality of treated water. International Journal of Management & Information Technology 3 (1), 39–45.
Reza R. & Singh G. 2010 Heavy metal contamination and its indexing approach for river water. International Journal of Environmental Science & Technology 7 (4), 785–792.
Riahi-Madvar H., Dehghani M., Memarzadeh R. & Gharabaghi B. 2021 Short to long-term forecasting of river flows by heuristic optimization algorithms hybridized with ANFIS. Water Resources Management 35 (4), 1149–1166.
Saeedi M., Abessi O., Sharifi F. & Meraji H. 2010 Development of groundwater quality index. Environmental Monitoring and Assessment 163 (1–4), 327–335.
Şener Ş., Şener E. & Davraz A. 2017 Evaluation of water quality using water quality index (WQI) method and GIS in Aksu River (SW-Turkey). Science of the Total Environment 584, 131–144.
Sepahvand A., Singh B., Sihag P., Nazari Samani A., Ahmadi H. & Fiz Nia S. 2019 Assessment of the various soft computing techniques to predict sodium absorption ratio (SAR). ISH Journal of Hydraulic Engineering. DOI: 10.1080/09715010.2019.1595185.
Sihag P., Tiwari N. K. & Ranjan S. 2019 Prediction of unsaturated hydraulic conductivity using adaptive neuro-fuzzy inference system (ANFIS). ISH Journal of Hydraulic Engineering 25 (2), 132–142.
Sihag P., Singh B., Sepah Vand A. & Mehdipour V. 2020 Modeling the infiltration process with soft computing techniques. ISH Journal of Hydraulic Engineering 26 (2), 138–152.
Singh K. P., Basant A., Malik A. & Jain G. 2009 Artificial neural network modeling of the river water quality – a case study. Ecological Modelling 220 (6), 888–895.
Specht D. F. 1991 A general regression neural network. IEEE Transactions on Neural Networks 2 (6), 568–576.
Taylor K. E. 2001 Summarizing multiple aspects of model performance in a single diagram. Journal of Geophysical Research: Atmospheres 106 (D7), 7183–7192.
Tiri A., Belkhiri L. & Mouni L. 2018 Evaluation of surface water quality for drinking purposes using fuzzy inference system. Groundwater for Sustainable Development 6, 235–244.
Tsegaye T., Sheppard D., Islam K. R., Tadesse W., Atalay A. & Marzen L. 2006 Development of chemical index as a measure of in-stream water quality in response to land-use and land cover changes. Water, Air, and Soil Pollution 174 (1–4), 161–179.
Vasant W., Dipak P., Aniket M., Ranjitsinh P., Shrikant M., Nitin D., Manesh A. & Abhay V. 2016 GIS and statistical approach to assess the groundwater quality of Nanded Tehsil, (MS) India. In: Proceedings of First International Conference on Information and Communication Technology for Intelligent Systems: Volume 1. Springer, Cham, pp. 409–417.
Wagh V. M., Panaskar D. B., Muley A. A. & Mukate S. V. 2017 Groundwater suitability evaluation by CCME WQI model for Kadava river basin, Nashik, Maharashtra, India. Modeling Earth Systems and Environment 3 (2), 557–565.
Wasserman P. D. 1993 Advanced Methods in Neural Computing. John Wiley & Sons, Inc., New York, NY.
Wilcox B. P., Rawls W. J., Brakensiek D. L. & Wight J. R. 1990 Predicting runoff from rangeland catchments: a comparison of two models. Water Resources Research 26 (10), 2401–2410.
Witek Z. & Jarosiewicz A. 2009 Long-term changes in nutrient status of river water. Polish Journal of Environmental Studies 18, 6.
Yaseen Z. M., Deo R. C., Hilal A., Abd A. M., Bueno L. C., Salcedo-Sanz S. & Nehdi M. L. 2018 Predicting compressive strength of lightweight foamed concrete using extreme learning machine model. Advances in Engineering Software 115, 112–125.
Yasin M. I. & Karim S. A. A. 2020 A New Fuzzy Weighted Multivariate Regression to Predict Water Quality Index at Perak Rivers. In: Optimization Based Model Using Fuzzy and Other Statistical Techniques Towards Environmental Sustainability (S. A. Abdul Karim, E. Abdul Kadir & A. Haza Nasution, eds). Springer, Singapore, pp. 1–27.
Yidana S. M. & Yidana A. 2010 Assessing water quality using water quality index and multivariate analysis. Environmental Earth Sciences 59 (7), 1461–1473.
Yidana S. M., Banoeng-Yakubo B. & Akabzaa T. M. 2010 Analysis of groundwater quality using multivariate and spatial analyses in the Keta basin, Ghana. Journal of African Earth Sciences 58 (2), 220–234.
Zheng G. Y., Luo F. & Chen W. B. 2010 Quality prediction of waste water treatment based on Immune Particle Swarm Neural Networks. Microprocessors 31, 75–77.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).