Water Quality Index (WQI) is a unique and effective rating technique for assessing the quality of water. Nevertheless, most of the indices are not applicable to all water types as these are dependent on core physico-chemical water parameters that can make them biased and sensitive towards specific attributes including: (i) time, location and frequency for data sampling; (ii) number, variety and weights allocation of parameters. Therefore, there is a need to evaluate these indices to eliminate uncertainties that make them unpredictable and which may lead to manipulation of the water quality classes. The present study calculated five WQIs for two temporal periods: (i) June to December 2019 obtained in real time (using the Internet of Things (IoT) nodes) at inlet and outlet streams of Rawal Dam; (ii) 2012–2019 obtained from the Rawal Dam Water Filtration Plant, collected through GIS-based grab sampling. The computed WQIs categorized the collected datasets as ‘Very Poor’, primarily owing to the uneven distribution of the water samples that has led to class imbalance in the data. Additionally, this study investigates the classification of water quality using machine learning algorithms namely: Decision Tree (DT), k-Nearest Neighbor (KNN), Logistic Regression (LogR), Multilayer Perceptron (MLP) and Naive Bayes (NB); based on the parameters including: pH, dissolved oxygen, conductivity, turbidity, fecal coliform and temperature. The classification results showed that the DT algorithm outperformed other models with a classification accuracy of 99%. Although WQI is a popular method used to assess the water quality, there is a need to address the uncertainties and biases introduced by the limitations of data acquisition (such as specific location/area, type and number of parameters or water type) leading to class imbalance. 
This can be achieved by developing a more refined index that considers other factors, such as topographical and hydrological parameters with spatio-temporal variations, combined with machine learning techniques, so that water quality can be estimated effectively for all regions.

  • Evaluated five WQI based on six physico-chemical parameters to analyze their sensitivity toward selected location, type and frequency for data sampling.

  • Computed WQIs categorized the dataset as ‘Very Poor’ because of the uneven distribution of water samples leading to class imbalance.

  • Five ML models were evaluated, of which the Decision Tree achieved the highest classification accuracy (99%).

  • A refined index should also consider topographical and hydrological parameters.

Water is a prime resource that is vital for nature and forms the chief constituent of the ecosystem. It is utilized in agriculture, forestry, livestock production, industry and other creative activities, and it is the primary need for industrial, agricultural and growing household affairs (Hamzaoui-Azaza et al. 2011; Pazand & Hezarkhani 2012). The quality of water can be defined by its physical, biological and chemical characteristics. Water deterioration occurs due to growing populations or urbanization, anthropogenic activities and the tremendous increase in industrialization (Carpenter et al. 1998; Neal et al. 1998; Singh et al. 2004). Human health is being affected by waterborne diseases that have caused 5–10 million deaths worldwide (Chung & Yoo 2015). Good water quality plays a significant role in maintaining a stable civilization on Earth; therefore, water needs to be monitored and managed properly.

Physico-chemical and biological parameters are mostly used to monitor the quality of water that should fall under set standards and guidelines. The occurrence of these parameters beyond the defined limit can be harmful for human health. To express water quality in some standard form researchers have come up with several water quality indices, which are the most effective tools used to describe the quality of water (Couillard & Lefebvre 1985; House & Newsome 1989; Secunda et al. 1998; Nives 1999; Jonnalagadda & Mhere 2001). Water Quality Index (WQI) is a mathematical tool that represents the water quality class by categorizing different water parameters into a standard numerical value that lies between 0 and 100. WQI classifies water quality typically in five classes or categories ranging from excellent to worst and summarizes the complex water quality data for the general public (Nives 1999).

Over many years, different water agencies have proposed several water quality indices but there was no major breakthrough until 1970. A two-phased approach was used to calculate such indices in which at first the raw water quality parameters are converted into a sub-index (SI) value and then further accumulated to a WQI value (Ott 1978). Scaled on a rate of 0–100, the WQI has five or six classes accordingly (Pesce & Wunderlin 2000; Sargaonkar & Deshpande 2003; Štambuk-Giljanović 2003; Tsegaye et al. 2006; Bouza-Deaño et al. 2008). A higher value yields a better WQI class, whereas lower WQI values correspond to a low or inferior class (Landwehr 1979; Brown et al. 1972; Lohani & Todino 1984; Dinius 1987). This classification has helped many studies to determine the quality of water (Bhatt & Pathak 1992; Kumar & Shukla 2002; Mulani et al. 2009; Khanna et al. 2013); it may also help to analyze the trend of water quality over a period of time and can identify how environmental impact and anthropogenic activities have affected the water quality for drinking or other water consumption.

The important water quality indices used worldwide include: the Weighted Arithmetic WQI Method (WAWQI) (Brown et al. 1972), Minimum Operator Index (MOI) (Smith 1990), National Sanitation Foundation WQI (NSF-WQI) (McClelland 1974), Canadian Council of Ministers of the Environment WQI (CCME-WQI) (Canadian Council of Ministers of the Environment 2001) and Oregon WQI (OWQI) (Cude 2001). Over many years, water quality has been assessed in different regions using various techniques. Mostly, water samples are collected through GIS-based grab sampling, which is labor intensive and therefore consumes many resources. In 2006 (Lumb et al. 2006), CCME-WQI was computed for the years 1960–2002 for the Mackenzie River. The parameters used for the assessment included suspended solids, turbidity (Tur), trace metals and true color. The study revealed that the high presence of trace metals had deteriorated the water quality. In 2006 (Davies 2006), CCME-WQI was calculated after collecting samples from six sites in central North America for the years 1969–2002, based on 17 parameters including chloride, total phosphorus, nitrate, ammonia, fecal coliform (FC) and dissolved oxygen (DO). This study revealed that CCME-WQI is not an ideal tool for water quality assessment, as it is affected by the number of measurements, parameters and samples collected over a certain time duration.

In 2008, one paper (Qadir et al. 2008) assessed the Chenab river (Pakistan), which was monitored from September 2004 to April 2006. Samples from seven sites were analyzed on a seasonal basis for 24 physico-chemical parameters, such as pH, DO, conductivity and total dissolved solids (TDS), and were assessed using statistical techniques including hierarchical agglomerative cluster analysis and principal component analysis.

In 2009, one group (Kumar & Dua 2009) described the water quality of River Ravi (India), which was monitored from January 2003 to December 2005. The water samples were collected using a standard Century Water Analysis kit. The quality was assessed using the CCME-WQI method based on eight physico-chemical parameters including hardness, calcium, pH, TDS, DO, magnesium, alkalinity and conductivity. Their analysis shows that the water quality was directly proportional to the value of DO in water. In 2009, one study (Ramakrishnaiah et al. 2009) assessed the water quality for the pre-monsoon period after collecting samples from 269 locations in Tumkur Taluk (India) during February 2006. A weighted WQI method was used along with the Indian BIS standard to calculate the WQI based on 17 parameters: pH, conductivity, TDS, hardness, calcium, magnesium, sodium, bicarbonate, phosphate, nitrate, carbonate, chloride, sulphate, fluoride, potassium, iron and manganese. It was concluded that the water quality was excellent for this region due to the high values of iron, nitrate and chloride etc. In addition, the analysis showed a high correlation between magnesium and chloride. In 2009, one study (Rajankar et al. 2009) described 22 sites in the Nagpur region (India), which were monitored in 2005 and 2006 in the post-monsoon, summer and winter seasons and showed poor water quality for the region. The water samples were collected through grab sampling from tube and dug wells. Nine parameters (pH, temperature (T), TDS etc.) were used to calculate the WQI using the standard Q-value (NSF method). In another study in 2009 (De Rosemond et al. 2009), water quality data were extracted from 71 mining facilities in Canada for the year 2004. The National Environmental Effects Monitoring Office (NEEMO), Canada collected the physico-chemical parameters, including DO, iron, lead, mercury, pH etc., four times annually and assessed the application of numeric water quality objectives, including the freshwater aquatic life water quality guidelines (WQG) and Region-Specific Objectives (RSO), for the CCME-WQI. Their analysis showed that WQG is an effective tool but has limited use in observing spatial changes, whereas RSO is a much better option, as spatial changes can be easily evaluated.

In 2010 (Khan 2010), the water supply of Attock city was examined based on WQI after collecting 30 ground water samples from 30 sampling stations. Six physico-chemical parameters were extracted from the samples namely pH, TDS, DO, conductivity, sulphates and nitrates. The high concentration of nitrate ions made the quality of water unsuitable for human and animal consumption. In another study in 2010 (Vasanthavigar et al. 2010), 148 samples were examined that were collected from January to May 2008 from the Tamilnadu (India) region in the pre- and post-monsoon seasons. WQI was computed based on the Indian BIS standard combined with a weighted method using following parameters: calcium, magnesium, sodium, potassium, chlorine, pH, sulphate, conductivity and TDS. The analysis showed that the increase in anthropogenic activities during the post-monsoon season had led to an increase in magnesium, potassium and sulphates causing water quality deterioration. In 2010 (Das et al. 2010), WQI and Urbanization Index (UI) was calculated based on the All-India Public Health Engineering standard and also computed with the help of software. This WQI examined how rain water quality was affected by any constructional activities in six sampling stations in Kolkata (India). There were 36 samples collected in buckets and parameters such as TDS, pH, chloride, hardness, Tur, magnesium, calcium and T are used. According to the study, both WQI and UI were inversely proportional to one another, but their conclusion is based on very few samples.

In 2011 (Puri et al. 2011), water quality of Nagpur city (India) was studied for the months January to December 2008. The data were extracted from four permanent stations on a monthly basis and collected in sterilized glass bottles and afterwards assessed using NSF-WQI. The parameters monitored included: conductivity, chlorine, TDS, DO, hardness, biological oxygen demand (BOD), pH and FC. Results revealed that the lake water was unsafe for drinking and showed a medium to poor quality rating for all seasons except for during the monsoon. In 2011, one group (Sharma & Kansal 2011), analyzed the Yamuna river (India) using the CCME-WQI method. In the pre-monsoon and post-monsoon seasons, parameters such as pH, BOD, DO, FC and ammonia were extracted from four sites for the years 2000 to 2009. The river was highly polluted and belonged to the poor water quality class. In 2012 (Chowdhury et al. 2012), 34 water stations along the Faridpur-Barishal road in Bangladesh are monitored using data from March to April 2011. The water samples were collected through grab sampling. The six parameters including pH, TDS, DO, BOD, conductivity and T are used to assess the water quality based on the NSF-WQI and WAWQI methods. The results showed that BOD and DO were the most important parameters in determining the quality of water. In another study in 2012 (Srebotnjak et al. 2012), GEMS/Water program and the European Environment Agency (EEA) provided data to produce a composite index named Environmental Pollution Index (EPI). The data included 100 countries with 2 million samples of different lakes, rivers and reservoirs. The hot-deck imputation was applied for replacing missing values and has improved the results of the WQI. Their proposed index claimed to immediately identify the issues and problem affecting the quality of water like low presence of nitrogen or total phosphorus parameters.

In 2014 (Selvam et al. 2014), CCME-WQI was used to evaluate the water resources in Tuticorin (India). Fourteen physico-chemical parameters, such as pH, conductivity, hardness, TDS, sulphate and phosphate, were examined in 72 samples collected from wells. The water quality was marked as fair in the pre-monsoon period and good in the post-monsoon period. Another study (Liou et al. 2004) proposed a generalized WQI for Taiwan. The water samples were collected for the years 1994–2000 from 205 monitoring stations (21 main rivers). They used 12 parameters namely: DO, ammonia, nitrogen, Tur, FC, suspended solids, T, pH, cadmium, zinc, lead, copper and chromium. The proposed new index highlighted the areas that were previously missed by the existing index and were affected by industrial activities in the Keya River. In research carried out in 2014 (Nazeer et al. 2014), data for the Soan river (Pakistan) were examined through the CCME-WQI method in the pre-monsoon (April to May) and post-monsoon (September to October) seasons of 2008. Eighteen samples were collected through grab sampling, and parameters such as pH, T, DO, TDS and conductivity were measured. The results were better in the pre-monsoon period, but the water was deemed unsuitable for human and animal consumption due to the high presence of nickel, lead and cadmium.

In 2016 (Ewaid 2017), Al-Gharraf river's quality was analyzed on a monthly basis from 10 stations for the year 2015. The NSF-WQI and Heavy Metal Pollution Index (HPI) were calculated based on 13 parameters namely; BOD, DO, Tur, nitrates, phosphates, T, pH and four heavy metals. The river quality is determined to be poor after analyzing the HPI due to the anthropogenic activities in the environment such as soil erosion, sewage discharge and other industrial activities. In 2018 (Bhatti et al. 2018), 29 water samples were collected from Nagarparkar, Pakistan using grab sampling. The quality was assessed using WAWQI based on 18 physico-chemical parameters including; pH, conductivity, TDS, Tur, alkalinity, hardness, chloride, DO, sulphate, calcium, magnesium, iron, cadmium, nickel, copper, manganese, arsenic and fluoride. The study revealed that only 35% of the parameters were within the defined WHO limits, while the remaining 65% were beyond the defined range for good water quality. In another study in 2018 (Wu et al. 2018), 96 sites in Lake Taihu were examined using samples from September 2014 to January 2016 on a seasonal basis. The Pesce & Wunderlin (2000) WQI and WQI min methods were used to assess the quality based on 13 physico-chemical parameters. The quality was high in autumn but the lowest in the winter season.

In 2019 (Golbaz et al. 2019), a unique swimming pool WQI (SPWQI) was developed for monitoring the quality of swimming pools based on 13 physico-chemical and biological parameters. The SPWQI is a modified version of the WAWQI method. This index helped in managing and treating the water quality. In another study in 2019 (Gupta et al. 2019), artificial neural networks (ANN) were used to develop a universal WQI based on the WHO parameters. The study revealed that ANN based on cascade forward architecture was successful for predicting the WQI using five physico-chemical parameters such as Tur, pH, conductivity, DO and FC. However, the limitation of ANN method was that it can vary with the change in parameters and therefore needs to be further worked on to get the desired results. In 2019 (Abbasnia et al. 2019), the quality of 654 dug wells in Sistan and Baluchistan, Iran was studied using the WAWQI method. Overall, the drinking quality of the dug wells was categorized as excellent and good.

In 2020 (Karunanidhi et al. 2020), 61 samples covering eight physico-chemical parameters, such as calcium, sodium, sulphate and fluoride, were collected through grab sampling from the Shanmuganadhi River basin, India and assessed using the WAWQI method. The WQI results showed that 52% of the samples were unfit for water consumption whereas 48% were classified as good. The researchers suggested that, to reduce the fluoride concentration, the groundwater should be treated or recharged using artificial methods before being used as drinking water. In another study in 2020 (Chabuk et al. 2020), the water of the Tigris River, Iraq was assessed for the wet and dry seasons in 2016 by collecting 12 parameters from 11 locations and applying the WAWQI method and GIS software. The results showed that the parameter concentrations were higher in the dry season than in the wet season, except for potassium, conductivity, TDS and bicarbonate. The computed WQI showed that the river had poor water quality due to human activities surrounding the river. The study also revealed that the application of WQI was only effective after the water was treated, due to the high parameter concentrations that could be present in raw water. In 2020 (Ustaoğlu et al. 2020), the quality of the Turnasuyu Basin, Turkey was evaluated using the WAWQI method and data for the period February 2017 to January 2018. The water quality of the basin was deemed good for public use. The physico-chemical parameters observed for the year did not exceed the permissible WHO limits. However, anthropogenic activities may impact the quality downstream of the basin. In 2020 (Seifi et al. 2020), the researchers altered the WAWQI index by introducing a Monte-Carlo simulation for weight allocation. The quality of the Kerman aquifer, Iran was observed by collecting 1189 samples during dry and wet seasons. The water quality of the aquifer was considered to be poor based on the computed WQI. 
The findings revealed that the Monte-Carlo method was useful for the WQI evaluation.

The review of past research has shown that mostly the CCME-WQI and the WAWQI methods were used to evaluate the quality of water. The most common parameters used to build these indices comprise the physical, chemical and biological water parameters. However, there was a certain amount of uncertainty present with the application of these water quality indices in that they are unpredictable in complex environmental situations (Silvert 2000). These indices are mostly biased, as they use a limited number of parameters and are developed for a specific place. These uncertainties were associated with the development and evaluation of the WQI. For instance, the quality of water may vary between two certain points of a lake at a specific time of the day. The physico-chemical properties can change from dawn to dusk in a single day because of the dynamic nature of the water bodies (Khan & Abbasi 1998). Therefore, there are some reasons why most of these indices fail to accurately classify water quality: (1) firstly, the sensitivity to the type of predefined parameters used for development of each standard, (2) the incorporation of a limited set of variables or parameters, and (3) the weight allocation to each parameter. The high concentration of a single parameter can increase the WQI value, which can manipulate the class or category of water quality. Therefore, no index has been universally accepted and there is a need to evaluate and perform a comparative analysis of these indices that can eliminate the uncertainties and biases in these standards.

In this study, five water quality indices were applied on the two datasets; (1) Dataset 1: the water samples collected using IoT sensors from selected locations at Rawal Dam and (2) Dataset 2: the data provided by the Rawal Dam Water Filtration Plant using GIS-based grab sampling. In addition, classification was performed on the two indices (WAWQI and OWQI) calculated by applying five machine learning algorithms that include Naive Bayes (NB), Multilayer Perceptron (MLP), Logistic Regression (LogR), k-Nearest Neighbor (KNN) and Decision Tree (DT).

The study area was the Rawal Dam (Ali et al. 2013), which is located in the capital city of Islamabad within an isolated section of the Margalla Hills National Park at a longitude of 73°7′E, latitude of 33°42′N and altitude of 1,800 m. The dam has a height of 133.5 ft, a depth of 102 ft and a surface area of 8.8 km². The dam is 213 m long and 33.5 m high. The catchment area is 106.25 square miles with three zones namely Kurang, Nurpur village and Shahdara. Kurang river is the outlet stream of the dam. Four major streams and 43 small streams run off into the Rawal Dam lake. However, during the rainy season, polluted runoff water, local spring discharges and untreated sewage water occasionally enter the Kurang River and the streams. The reservoir has a maximum capacity of 58,581,810 m³. The citizens of both Rawalpindi and Islamabad receive about 22 million gallons per day of water from this dam. The lake has 15 different fish species including Rahu, Doula, Tilapia, Thaila, Carp fish and Mori. Bhara Kahu, Bani Gala, Malpur and Noorpur Shahan are located in the catchment area of the Rawal Dam and are highly populated. With the development of housing colonies in the catchment area of Rawal Lake, the quality of water is deteriorating due to solid waste disposal and untreated sewage in the tributaries. Another pollution factor is the disposal of poultry waste, as the catchment area contains over 360 poultry sheds. Tourist attractions such as Murree Hills and Chattar Park are also located in the catchment area, adding another source of water pollution.

The dam was selected as the study site because we had easy access to the data, as the dam supplies water to the cities of Rawalpindi and Islamabad. Obtaining data from government bodies usually involves many administrative hindrances, which makes the data invaluable. Moreover, the Rawal Dam is a rain-fed area that is interesting to explore due to the change in climatic factors.

Sample analysis

Figure 1 shows the streams and the experimental points selected for data collection. The parameters observed for the lake include ‘T’, ‘Tur’, ‘pH’, ‘DO’, ‘conductivity’ and ‘FC’. For Dataset 1, the T of the lake varied from 29 °C in June to 18.75 °C in December. Tur varied from 30 NTU in June to 429 NTU in December. pH varied from 1.74 in June to 6.23 in December. DO varied from 1.95 mg/L in June to 1.46 mg/L in December. Conductivity varied from 795.65 μS/cm in June to 30,803 μS/cm in December. For Dataset 2, the T of the lake varied from 23 °C in 2012 to 24 °C in 2019. Tur varied from 18 NTU in 2012 to 22 NTU in 2019. pH varied from 7.19 in 2012 to 7.25 in 2019. DO varied from 3.5 mg/L in 2012 to 6.6 mg/L in 2019. Conductivity varied from 736 μS/cm in 2012 to 520 μS/cm in 2019. FC varied from 170 colonies/100 mL in 2012 to 140 colonies/100 mL in 2019.

Figure 1

Location of the water sampling sites in the study area.


Dataset collection

Dataset 1

The data collected from Rawal Dam from June to December of the year 2019 were named Dataset 1. The data were collected in real time using the Internet of Things (IoT) sensors (See Figure 1) that are deployed over identified stations at inlet and outlet streams of Rawal Lake. The data were transmitted to the local server using GSM technology for further preprocessing and analysis.

Five parameters were recorded namely: ‘T’, ‘Tur’, ‘pH’, ‘DO’ and ‘conductivity’. The dataset had 5672 instances. Figure 2(a) shows the change in concentrations of the parameters collected over time. The initial version of the dataset can be seen on Kaggle (https://www.kaggle.com/mahmedphdcs17seecs/rawal-dam-water-quality-dataset-2019).

Figure 2

Variation in the concentrations of parameters for Datasets 1 and 2. (a) June to December 2019 (Dataset 1), (b) 2012–2019 (Dataset 2).


Dataset 2

Dataset 2 contains 1114 samples collected from years 2013 to 2018 through GIS-based grab sampling. This dataset was made up of six parameters namely ‘T’, ‘Tur’, ‘pH’, ‘DO’, ‘conductivity’ and ‘FC’. The dataset was provided by the Rawal Dam Water Filtration Plant. Figure 2(b) shows the variation of the parameters over the years in Dataset 2.

Preprocessing of dataset

The samples in both the datasets needed to be converted to a WQI value that can categorize them as best or worst, depending on the index method used. Table 1 displays the WQI values and their respective classifications. Overall, the parameters in Dataset 1 do not show a positive correlation except for ‘conductivity with Tur’ and ‘conductivity with pH’ as seen in Figure 3. The parameters in Dataset 2 do not show any positive correlations as seen in Figure 3.

Table 1

Classification of WQI values for five indices are represented as Excellent (E), Good (G), Fair (F), Poor (P), Very Poor (VP), Unfit for Drinking (U), Medium (M), Bad (B), Very Bad (VB), Marginal (Ml), Eminently suitable for all uses (ES), Suitable for all uses (S), Main use may be compromised (C), Unsuitable for several uses (Uns), Totally unsuitable for many uses (TU) (Brown et al. 1972; Smith 1990; McClelland 1974; Canadian Council of Ministers of the Environment 2001; Cude 2001)

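| Index | No. of parameters | WQI value | Rating class | Class no. |
|---|---|---|---|---|
| WAWQI | 10 | 0–25 | E | |
| | | 25–50 | G | |
| | | 51–75 | F | |
| | | 76–100 | P | |
| | | 101–150 | VP | |
| | | Above 150 | U | |
| NSF-WQI | | 90–100 | E | |
| | | 70–90 | G | |
| | | 50–70 | M | |
| | | 25–50 | B | |
| | | 0–25 | VB | |
| CCME-WQI | Up to 47 | 95.0–100.0 | E | |
| | | 80.0–94.9 | G | |
| | | 65.0–79.9 | F | |
| | | 45.0–64.9 | Ml | |
| | | 0.0–44.9 | P | |
| OWQI | | 90–100 | E | |
| | | 85–89 | G | |
| | | 80–84 | F | |
| | | 60–79 | P | |
| | | less than 60 | VP | |
| MOI | | 80–100 | ES | |
| | | 60–79 | S | |
| | | 40–59 | C | |
| | | 20–39 | Uns | |
| | | 0–19 | TU | |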
Figure 3

Correlation among the water quality parameters in Dataset 1 and Dataset 2. Yellow and green colors represent a high correlation whereas blue represents low correlation.


For applying different indices on the datasets, the SI or quality rating (q) was calculated based on the values of the physico-chemical parameters. The formulae of these indices are mentioned in detail below.

WAWQI method

In 1972, the WQI was calculated using a weighted arithmetic method (Brown et al. 1972). WAWQI is calculated with the equation mentioned in Equation (1):
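$$\mathrm{WAWQI} = \frac{\sum q_n w_n}{\sum w_n} \qquad (1)$$
where the sums are taken over all parameters and
  • n = the number of parameters, here the value is 5 (for Dataset 1) and 6 (for Dataset 2),

  • qn = quality rating of the nth parameter given in Equation (2),

  • wn = unit weight of the nth parameter given in Equation (3).
$$q_n = 100 \times \frac{V_n - V_{id}}{S_n - V_{id}} \qquad (2)$$
where,
  • Sn = standard value of nth water quality parameter,

  • Vn = observed value of nth water quality parameter,

  • Vid = ideal value of nth water quality parameter.
$$w_n = \frac{k}{S_n} \qquad (3)$$
where,
  • k = proportionality constant given in Equation (4).
$$k = \frac{1}{\sum \left( 1 / S_n \right)} \qquad (4)$$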

Table 2 shows the values computed by applying the formulae given in Equations (1)–(4).

Table 2

Water quality parameters and their corresponding values calculated using WAWQI

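| Parameters | Sn | Ideal value (Vid) | k | Unit weight (wn) |
|---|---|---|---|---|
| Tur | | | 2.3801 | 0.476023801 |
| pH | 8.5 | | 2.3801 | 0.280014001 |
| DO | 15 | 14.6 | 2.3801 | 0.1586746 |
| Conductivity | 400 | | 2.3801 | 0.005950298 |
| T | 30 | | 2.3801 | 0.0793373 |
| FC | 0.99 | | 0.699 | 0.7062 |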
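
The weight calculation behind Table 2 can be sketched in a few lines of Python. This is a minimal illustration of Equations (1)–(4), not the code used in the study. The function names are hypothetical, and the standard value of 5 for Tur is an assumption inferred from the relation wn = k / Sn and the tabulated k and wn (2.3801 / 0.476023801); the remaining standard values are taken from Table 2.

```python
# Standard values S_n. Tur = 5.0 is an assumption inferred from
# w_n = k / S_n using the k and w_n listed in Table 2; the other
# values are taken from Table 2 directly.
STANDARDS_DATASET_1 = {
    "Tur": 5.0,
    "pH": 8.5,
    "DO": 15.0,
    "Conductivity": 400.0,
    "T": 30.0,
}
# Dataset 2 additionally includes FC (S_n = 0.99 in Table 2).
STANDARDS_DATASET_2 = {**STANDARDS_DATASET_1, "FC": 0.99}


def proportionality_constant(standards):
    """Equation (4): k = 1 / sum(1 / S_n)."""
    return 1.0 / sum(1.0 / s for s in standards.values())


def unit_weights(standards):
    """Equation (3): w_n = k / S_n for every parameter."""
    k = proportionality_constant(standards)
    return {name: k / s for name, s in standards.items()}


def quality_rating(observed, standard, ideal):
    """Equation (2): q_n = 100 * (V_n - V_id) / (S_n - V_id)."""
    return 100.0 * (observed - ideal) / (standard - ideal)


def wawqi(ratings, weights):
    """Equation (1): weighted arithmetic mean of the quality ratings."""
    total_weight = sum(weights.values())
    return sum(ratings[p] * weights[p] for p in weights) / total_weight


k1 = proportionality_constant(STANDARDS_DATASET_1)
w1 = unit_weights(STANDARDS_DATASET_1)
k2 = proportionality_constant(STANDARDS_DATASET_2)

print(round(k1, 4))         # close to the k = 2.3801 listed in Table 2
print(round(w1["Tur"], 6))  # close to the Tur unit weight in Table 2
print(round(k2, 3))         # close to the k = 0.699 listed for FC
```

Because wn = k / Sn, a parameter with a small standard value receives a large weight, so a single parameter such as Tur or FC can dominate the resulting index.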

CCME-WQI method

The CCME-WQI (Canadian Council of Ministers of the Environment 2001) is an objective-based index that is adaptable to site specificity. CCME-WQI ranges from 0 to 100, where 100 refers to excellent quality of water. CCME is based on three terms: (1) scope F1, (2) frequency F2 and (3) amplitude F3. F1 is the percentage of parameters that do not meet the water quality guidelines, F2 is the percentage of individual tests that do not meet the guidelines, and F3 reflects the amount by which failed measurements deviate from the guidelines. The combined term is divided by 1.732 so that the index stays within the range 0 to 100. CCME-WQI was calculated using Equation (5). F1, F2 and F3 are given in Equations (6)–(8) and displayed in Table 3:
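$$\mathrm{CCME\text{-}WQI} = 100 - \frac{\sqrt{F_1^2 + F_2^2 + F_3^2}}{1.732} \qquad (5)$$
where,
$$F_1 = \frac{\text{number of failed parameters}}{\text{total number of parameters}} \times 100 \qquad (6)$$
$$F_2 = \frac{\text{number of failed tests}}{\text{total number of tests}} \times 100 \qquad (7)$$
$$F_3 = \frac{nse}{0.01\,nse + 0.01} \qquad (8)$$
where,
  • nse = normalized sum of excursions given in Equation (9):
$$nse = \frac{\sum_{i} \text{excursion}_i}{\text{total number of tests}} \qquad (9)$$
where the excursion is given in Equations (10) and (11).
When the test value must not exceed the objective:
$$\text{excursion}_i = \frac{\text{failed test value}_i}{\text{objective}_j} - 1 \qquad (10)$$
When the test value must not be less than the objective:
$$\text{excursion}_i = \frac{\text{objective}_j}{\text{failed test value}_i} - 1 \qquad (11)$$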
Table 3

Water quality parameters and their corresponding values calculated using CCME-WQI for Dataset 1 and Dataset 2

| Data | Scope (F1) | Frequency (F2) | Normalized sum of excursions (NSE) | Amplitude (F3) | CCME value | CCME-WQI rating |
|---|---|---|---|---|---|---|
| Dataset 1 | 80 | 40.13 | 15.745 | 94.03 | 25.05 | Poor (0–44.9) (see Table 1) |
| Dataset 2 | 66.67 | 45.18 | 18.21 | 94.79 | 28.18 | Poor (0–44.9) (see Table 1) |
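
As a rough check on Table 3, the amplitude and the final index can be recomputed from the tabulated F1, F2 and NSE values. The sketch below assumes the standard CCME formulation of Equations (5), (8), (10) and (11); the function names are hypothetical and this is not the code used in the study.

```python
import math


def excursion(test_value, objective, must_not_exceed=True):
    """Equations (10) and (11): relative deviation of a failed test."""
    if must_not_exceed:
        return test_value / objective - 1.0
    return objective / test_value - 1.0


def amplitude(nse):
    """Equation (8): F3 = nse / (0.01 * nse + 0.01)."""
    return nse / (0.01 * nse + 0.01)


def ccme_wqi(f1, f2, f3):
    """Equation (5): 100 - sqrt(F1^2 + F2^2 + F3^2) / 1.732."""
    return 100.0 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732


# F1, F2 and NSE as listed in Table 3.
table_3 = {
    "Dataset 1": {"f1": 80.0, "f2": 40.13, "nse": 15.745},
    "Dataset 2": {"f1": 66.67, "f2": 45.18, "nse": 18.21},
}

results = {}
for name, row in table_3.items():
    f3 = amplitude(row["nse"])
    results[name] = (f3, ccme_wqi(row["f1"], row["f2"], f3))
    print(name, round(f3, 2), round(results[name][1], 2))
```

Both datasets land below 44.9, which corresponds to the lowest CCME-WQI class in Table 1. Since F3 saturates near 100 once NSE is large, the amplitude term contributes most to the low scores here.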

NSF-WQI method

In 1970, 94 experts in water quality came up with a uniform method to measure and report the water quality based on nine physical, chemical and biological parameters namely; DO, FC, pH, BOD, nitrates, phosphates, T, Tur and total solids (TS) (McClelland 1974). The mathematical equation to calculate NSF-WQI is described in Equation (12):
$$\mathrm{NSF\text{-}WQI} = \sum w_n q_n \qquad (12)$$
where,
  • wn = unit weight of the nth parameter,

  • qn = quality rating of the nth parameter.

Table 4 shows the weightages updated accordingly due to the number of parameters used for the current study. The weightages are updated with respect to the ratio of the presently used NSF weightages.

Table 4

Water quality parameters and their corresponding weights calculated using NSF-WQI.

| Parameters | NSF-WQI weightages | New weightages (Dataset 1) | New weightages (Dataset 2) |
|---|---|---|---|
| DO | 0.17 | 0.34 | 0.34 |
| pH | 0.11 | 0.22 | 0.16 |
| T | 0.1 | 0.20 | 0.1 |
| Tur | 0.08 | 0.16 | 0.08 |
| Conductivity | – | 0.08 | |
| FC | 0.16 | – | 0.32 |
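
The weighted sum in Equation (12) can be illustrated with the updated weightages from Table 4. In the sketch below the weights come from Table 4 (the unlabeled 0.1 / 0.20 / 0.1 row is assumed to be T), while the quality ratings in `example_ratings` are purely hypothetical placeholders, since the NSF rating curves are not reproduced here.

```python
# Updated weightages from Table 4. The row without a parameter name
# is assumed to be temperature (T).
NSF_WEIGHTS_DATASET_1 = {
    "DO": 0.34, "pH": 0.22, "T": 0.20, "Tur": 0.16, "Conductivity": 0.08,
}
NSF_WEIGHTS_DATASET_2 = {
    "DO": 0.34, "pH": 0.16, "T": 0.10, "Tur": 0.08, "FC": 0.32,
}


def nsf_wqi(ratings, weights):
    """Equation (12): sum of w_n * q_n over all weighted parameters."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing quality ratings for: {sorted(missing)}")
    return sum(weights[p] * ratings[p] for p in weights)


# Hypothetical quality ratings (0-100), for illustration only.
example_ratings = {"DO": 60, "pH": 90, "T": 80, "Tur": 50, "Conductivity": 70}

example_index = nsf_wqi(example_ratings, NSF_WEIGHTS_DATASET_1)
print(round(example_index, 1))
```

Both sets of updated weightages sum to 1, so the index stays on the same 0–100 scale as the individual quality ratings.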

OWQI method

OWQI was developed in 1970 and is based on the following parameters: DO, total phosphorus, T, pH, TS, FC, BOD, and ammonia and nitrate nitrogen. OWQI was originally created for evaluating the streams of Oregon for general recreational uses. The equation for OWQI can be seen in Equation (13):
(13)
Here,
  • n = number of parameters,

  • SIi = SI is the sub-index for the nth parameter given in Table 5.

Table 5

SI calculation for T, pH, DO and FC in OWQI

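| Parameter | Condition | Sub-index calculation |
|---|---|---|
| T | T ≤ 11 °C | SI_T = 100 |
| T | 11 °C < T ≤ 29 °C | SI_T = 76.54 + 4.172*T − 0.1623*T^2 − 2.0557E−3*T^3 |
| T | 29 °C < T | SI_T = 10 |
| DO | DO ≤ 3.3 mg/L | SI_DO = 10 |
| DO | 3.3 mg/L < DO < 10.5 mg/L | SI_DO = −80.29 + 31.88*DO − 1.401*DO^2 |
| DO | 10.5 mg/L ≤ DO | SI_DO = 100 |
| pH | pH < 4 or 11 < pH | SI_pH = 10 |
| pH | 4 ≤ pH < 7 | SI_pH = 2.628*exp(pH*0.5200) |
| pH | 7 ≤ pH ≤ 8 | SI_pH = 100 |
| pH | 8 < pH ≤ 11 | SI_pH = 100*exp((pH − 8)*−0.5188) |
| FC | FC ≤ 50/100 mL | SI_FC = 98 |
| FC | 50/100 mL < FC ≤ 1600/100 mL | SI_FC = 98*exp((FC − 50)*−9.9178E−4) |
| FC | 1600/100 mL < FC | SI_FC = 10 |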

OWQI was calculated for both datasets with the conductivity and Tur parameters excluded, as OWQI does not define an SI range for these parameters. Table 5 shows the SI formulae for T, DO, pH and FC. Table 6 shows the top 20 samples with SI calculations for Datasets 1 and 2.
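The sub-index formulae and their aggregation can be sketched in code. The version below is an illustration rather than the authors' implementation: it assumes the unweighted harmonic square mean form for OWQI, and the signs of the polynomial and exponential terms follow the published OWQI sub-index formulae as far as they can be recovered here. The function names are hypothetical.

```python
import math

def si_temperature(t: float) -> float:
    """Temperature sub-index (T in degrees Celsius)."""
    if t <= 11:
        return 100.0
    if t <= 29:
        return 76.54 + 4.172 * t - 0.1623 * t ** 2 - 2.0557e-3 * t ** 3
    return 10.0

def si_do(do: float) -> float:
    """Dissolved oxygen sub-index (DO in mg/L)."""
    if do <= 3.3:
        return 10.0
    if do < 10.5:
        return -80.29 + 31.88 * do - 1.401 * do ** 2
    return 100.0

def si_ph(ph: float) -> float:
    """pH sub-index."""
    if ph < 4 or ph > 11:
        return 10.0
    if ph < 7:
        return 2.628 * math.exp(ph * 0.5200)
    if ph <= 8:
        return 100.0
    return 100.0 * math.exp((ph - 8) * -0.5188)

def si_fc(fc: float) -> float:
    """Fecal coliform sub-index (FC in counts per 100 mL)."""
    if fc <= 50:
        return 98.0
    if fc <= 1600:
        return 98.0 * math.exp((fc - 50) * -9.9178e-4)
    return 10.0

def owqi(sub_indices: list) -> float:
    """Unweighted harmonic square mean of the sub-indices."""
    n = len(sub_indices)
    return math.sqrt(n / sum(1.0 / si ** 2 for si in sub_indices))

# First Dataset 1 sample of Table 6: T = 29.67, DO = 9.31, pH = 2.31
subs = [si_temperature(29.67), si_do(9.31), si_ph(2.31)]
print([round(s, 3) for s in subs], round(owqi(subs), 3))
```

For the Dataset 1 samples checked, this sketch agrees with the tabulated sub-indices and OWQI values to within 0.01. The harmonic square mean lets the lowest sub-index dominate the result, which is why a single sub-index of 10 pulls the overall value down so strongly.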

Table 6

Top 20 samples of water quality parameters and their corresponding SI values calculated using OWQI for Datasets 1 and 2

Dataset 1

| T | SI_T | DO | SI_DO | pH | SI_pH | OWQI value |
|---|---|---|---|---|---|---|
| 29.67 | 10 | 9.31 | 95.079 | 2.31 | 10 | 12.213 |
| 29.67 | 10 | 9.31 | 95.079 | 2.31 | 10 | 12.213 |
| 29.48 | 10 | 10.1 | 98.781 | 2.35 | 10 | 12.216 |
| 29.48 | 10 | 10.1 | 98.781 | 2.35 | 10 | 12.216 |
| 29.67 | 10 | 9.05 | 93.478 | 2.42 | 10 | 12.212 |
| 29.67 | 10 | 9.05 | 93.478 | 2.42 | 10 | 12.212 |
| 29.67 | 10 | 9.24 | 94.667 | 2.35 | 10 | 12.213 |
| 29.67 | 10 | 9.24 | 94.667 | 2.35 | 10 | 12.213 |
| 29.48 | 10 | 9.39 | 95.534 | 1.74 | 10 | 12.214 |
| 29.48 | 10 | 9.39 | 95.534 | 1.74 | 10 | 12.214 |
| 29.48 | 10 | 7.67 | 81.810 | 6.91 | 95.528 | 17.1 |
| 29.48 | 10 | 7.67 | 81.810 | 6.91 | 95.528 | 17.1 |
| 29.48 | 10 | 8.62 | 90.415 | 6.64 | 83.015 | 17.093 |
| 29.48 | 10 | 8.62 | 90.415 | 6.64 | 83.015 | 17.093 |
| 29.39 | 10 | 9.62 | 96.740 | 5.58 | 47.838 | 16.867 |
| 29.39 | 10 | 9.62 | 96.740 | 5.58 | 47.838 | 16.867 |
| 29.48 | 10 | 9.36 | 95.365 | 6.34 | 71.024 | 17.059 |
| 29.48 | 10 | 9.36 | 95.365 | 6.34 | 71.024 | 17.059 |
| 29.48 | 10 | 9.31 | 95.079 | 6.64 | 83.015 | 17.103 |
| 29.48 | 10 | 9.31 | 95.079 | 6.64 | 83.015 | 17.103 |

Dataset 2

| T | SI_T | DO | SI_DO | pH | SI_pH | FC | SI_FC | OWQI value |
|---|---|---|---|---|---|---|---|---|
| 23 | −1158.62 | 3.5 | 14.13 | 7.19 | 100.00 | 170 | 87.00 | 27.62 |
| 16 | −317.47 | 3.8 | 20.62 | 7.45 | 100.00 | 53 | 97.71 | 39.48 |
| 16 | −317.47 | 2.5 | 10.00 | 7.99 | 100.00 | 63 | 96.74 | 19.79 |
| 16 | −317.47 | 2.3 | 10.00 | 8.05 | 97.44 | 55 | 97.52 | 19.78 |
| 10 | −0.32 | 4.6 | 36.71 | 8.18 | 91.08 | 57 | 97.32 | 0.63 |
| 10 | −0.32 | 4.2 | 28.89 | 8.41 | 80.84 | 57 | 97.32 | 0.63 |
| 12 | −73.62 | 3.6 | 16.32 | 8.6 | 73.25 | 40 | 98.00 | 30.75 |
| 13 | −121.51 | 3.1 | 10.00 | 7.99 | 100.00 | 70 | 96.08 | 19.73 |
| 12 | −73.62 | 3.6 | 16.32 | 8.3 | 85.59 | 45 | 98.00 | 30.94 |
| 12 | −73.62 | 3.9 | 22.73 | 8.5 | 77.15 | 55 | 97.52 | 40.89 |
| 14 | −177.70 | 3.4 | 11.91 | 8.18 | 91.08 | 28 | 98.00 | 23.39 |
| 13 | −121.51 | 2.4 | 10.00 | 8.34 | 83.83 | 46 | 98.00 | 19.69 |
| 14 | −177.70 | 2.4 | 10.00 | 8.4 | 81.26 | 44 | 98.00 | 19.72 |
| 14 | −177.70 | | 10.00 | 8.3 | 85.59 | 60 | 97.03 | 19.73 |
| 14 | −177.70 | | 10.00 | 8.44 | 79.59 | 50 | 98.00 | 19.71 |
| 12 | −73.62 | 2.4 | 10.00 | 8.44 | 79.59 | 40 | 98.00 | 19.57 |
| 21 | −855.26 | 3.6 | 16.32 | 8.13 | 93.48 | 60 | 97.03 | 31.72 |
| 13 | −121.51 | 3.6 | 16.32 | 8.8 | 66.03 | 45 | 98.00 | 31.03 |
| 14 | −177.70 | 3.6 | 16.32 | 8.42 | 80.42 | 53 | 97.71 | 31.45 |
| 13 | −121.51 | 5.5 | 52.67 | 8.4 | 81.26 | 95 | 93.72 | 75.95 |

MOI method

The MOI is based on the premise that different applications require water of different quality, so the parameters used to evaluate quality may vary with the intended use, such as general purpose, bathing, water supply or fish spawning. The sub-indices were calculated using the rating curve graphs given in Cude (2001). Equation (14) gives the MOI, in which the minimum SI among the parameters is taken as the WQI value:
$$\text{MOI} = \min(SI_1, SI_2, \ldots, SI_n) \tag{14}$$

Here,

  • $n$ = number of parameters,

  • $SI_i$ = sub-index of the ith parameter.

To apply the index on the current samples, parameters including T, pH, DO, FC and Tur were chosen, as this method does not have a rating curve SI for the conductivity parameter. Table 7 shows the top 20 samples with SI calculations for Datasets 1 and 2.
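The minimum operator is simple enough to state directly in code. The sketch below is illustrative only (the function name `moi` is an assumption) and uses the sub-indices of the first Dataset 1 sample in Table 7.

```python
def moi(sub_indices: dict) -> tuple:
    """Return the limiting parameter and its sub-index, i.e. the minimum SI."""
    limiting = min(sub_indices, key=sub_indices.get)
    return limiting, sub_indices[limiting]

# Sub-indices of the first Dataset 1 sample in Table 7
first_sample = {"T": 92, "DO": 105, "pH": 23, "Tur": 940}
print(moi(first_sample))  # ('pH', 23)
```

Because only the worst sub-index counts, a single poorly rated parameter determines the class regardless of how the remaining parameters score.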

Table 7

Top 20 samples of water quality parameters and their corresponding SI values calculated using MOI for Datasets 1 and 2

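Dataset 1

| T | SI_T | DO | SI_DO | pH | SI_pH | Tur | SI_Tur | MOI value |
|---|---|---|---|---|---|---|---|---|
| 29.67 | 92 | 9.31 | 105 | 2.31 | 23 | 201.35 | 940 | 23 |
| 29.67 | 92 | 9.31 | 105 | 2.31 | 23 | 201.35 | 940 | 23 |
| 29.48 | 92 | 10.1 | 114 | 2.35 | 23 | 126.68 | 591 | 23 |
| 29.48 | 92 | 10.1 | 114 | 2.35 | 23 | 126.68 | 591 | 23 |
| 29.67 | 92 | 9.05 | 102 | 2.42 | 24 | 107.67 | 502 | 24 |
| 29.67 | 92 | 9.05 | 102 | 2.42 | 24 | 107.67 | 502 | 24 |
| 29.67 | 92 | 9.24 | 104 | 2.35 | 23 | 238.68 | 1114 | 23 |
| 29.67 | 92 | 9.24 | 104 | 2.35 | 23 | 238.68 | 1114 | 23 |
| 29.48 | 92 | 9.39 | 106 | 1.74 | 17 | 204.74 | 955 | 17 |
| 29.48 | 92 | 9.39 | 106 | 1.74 | 17 | 204.74 | 955 | 17 |
| 29.48 | 92 | 7.67 | 87 | 6.91 | 67 | 33 | 154 | 67 |
| 29.48 | 92 | 7.67 | 87 | 6.91 | 67 | 33 | 154 | 67 |
| 29.48 | 92 | 8.62 | 97 | 6.64 | 65 | 34.35 | 160 | 65 |
| 29.48 | 92 | 8.62 | 97 | 6.64 | 65 | 34.35 | 160 | 65 |
| 29.39 | 91 | 9.62 | 109 | 5.58 | 54 | 45.89 | 214 | 54 |
| 29.39 | 91 | 9.62 | 109 | 5.58 | 54 | 45.89 | 214 | 54 |
| 29.48 | 92 | 9.36 | 106 | 6.34 | 62 | 48.61 | 227 | 62 |
| 29.48 | 92 | 9.36 | 106 | 6.34 | 62 | 48.61 | 227 | 62 |
| 29.48 | 92 | 9.31 | 105 | 6.64 | 65 | 52 | 243 | 65 |
| 29.48 | 92 | 9.31 | 105 | 6.64 | 65 | 52 | 243 | 65 |

Dataset 2

| T | SI_T | DO | SI_DO | pH | SI_pH | Tur | SI_Tur | FC | SI_FC | MOI value |
|---|---|---|---|---|---|---|---|---|---|---|
| 23 | 71.45 | 3.5 | 39.55 | 7.19 | 70.21 | 18 | 84.00 | 170 | 1.53 | 1.53 |
| 16 | 49.70 | 3.8 | 42.94 | 7.45 | 72.75 | 42.15 | 196.70 | 53 | 0.48 | 0.48 |
| 16 | 49.70 | 2.5 | 28.25 | 7.99 | 78.02 | 46.7 | 217.93 | 63 | 0.57 | 0.57 |
| 16 | 49.70 | 2.3 | 25.99 | 8.05 | 78.61 | 47.15 | 220.03 | 55 | 0.50 | 0.50 |
| 10 | 31.06 | 4.6 | 51.97 | 8.18 | 79.88 | 22 | 102.67 | 57 | 0.51 | 0.51 |
| 10 | 31.06 | 4.2 | 47.45 | 8.41 | 82.12 | 28 | 130.67 | 57 | 0.51 | 0.51 |
| 12 | 37.28 | 3.6 | 40.68 | 8.6 | 83.98 | 34.8 | 162.40 | 40 | 0.36 | 0.36 |
| 13 | 40.38 | 3.1 | 35.03 | 7.99 | 78.02 | 30.2 | 140.93 | 70 | 0.63 | 0.63 |
| 12 | 37.28 | 3.6 | 40.68 | 8.3 | 81.05 | 32.7 | 152.60 | 45 | 0.41 | 0.41 |
| 12 | 37.28 | 3.9 | 44.06 | 8.5 | 83.00 | 34.7 | 161.93 | 55 | 0.50 | 0.50 |
| 14 | 43.49 | 3.4 | 38.42 | 8.18 | 79.88 | 50.2 | 234.27 | 28 | 0.25 | 0.25 |
| 13 | 40.38 | 2.4 | 27.12 | 8.34 | 81.44 | 49.3 | 230.07 | 46 | 0.41 | 0.41 |
| 14 | 43.49 | 2.4 | 27.12 | 8.4 | 82.02 | 65 | 303.33 | 44 | 0.40 | 0.40 |
| 14 | 43.49 | | 33.90 | 8.3 | 81.05 | 56 | 261.33 | 60 | 0.54 | 0.54 |
| 14 | 43.49 | | 33.90 | 8.44 | 82.41 | 60 | 280.00 | 50 | 0.45 | 0.45 |
| 12 | 37.28 | 2.4 | 27.12 | 8.44 | 82.41 | 66 | 308.00 | 40 | 0.36 | 0.36 |
| 21 | 65.23 | 3.6 | 40.68 | 8.13 | 79.39 | 13 | 60.67 | 60 | 0.54 | 0.54 |
| 13 | 40.38 | 3.6 | 40.68 | 8.8 | 85.93 | 34 | 158.67 | 45 | 0.41 | 0.41 |
| 14 | 43.49 | 3.6 | 40.68 | 8.42 | 82.22 | 30.35 | 141.63 | 53 | 0.48 | 0.48 |
| 13 | 40.38 | 5.5 | 62.14 | 8.4 | 82.02 | 330 | 1540.00 | 95 | 0.86 | 0.86 |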

Machine learning techniques

Five machine learning algorithms were applied in this study. They were selected because they are easy to use and interpret and are computationally efficient. These algorithms have been widely tested on small to medium-sized datasets and have been shown to give satisfactory performance in minimal time (Ashari et al. 2013).

k-Nearest Neighbors

k-Nearest Neighbor classifiers were introduced by Fix and Hodges in 1951 (Fix & Hodges 1952). KNN (Dasarathy 1991) is an instance-based learner that classifies an instance by taking a majority vote of its neighbors. An instance-based learner (Aha et al. 1991) classifies an instance by comparing it with a collection of pre-classified samples. The distance function determines how similar two instances are, and the classification function specifies how these similarities yield a final classification for a new instance: the class that is dominant among the k nearest neighbors is selected. During classification, an unknown instance or tuple is classified by searching the pattern space for the k training tuples that are closest to it. Closeness is defined by a distance metric such as the Euclidean distance. The Euclidean distance between two points $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_n)$ is given in Equation (15):
$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{15}$$
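The voting procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the study: the (pH, DO) training samples and their labels are hypothetical, and the function names are assumptions.

```python
import math
from collections import Counter

def euclidean(x, y) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train, query, k: int = 3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical (pH, DO) samples with illustrative labels
train = [
    ((7.2, 8.5), "Good"),
    ((7.0, 9.0), "Good"),
    ((7.4, 8.0), "Good"),
    ((2.3, 3.0), "Poor"),
    ((2.5, 2.8), "Poor"),
    ((3.0, 3.5), "Poor"),
]
print(knn_predict(train, (7.1, 8.7), k=3))  # Good
```

In practice the features are usually rescaled first, since a parameter with a large numeric range (such as turbidity) would otherwise dominate the distance.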

Logistic Regression

LogR (Cox & Snell 1989) is a popular statistics-based classification technique. It is similar to linear regression but is suited to binary dependent attributes (such as 1 or 0, yes or no) rather than continuous ones. The response function ensures that the predicted value, which is the probability of an event, falls between zero and one. Maximum likelihood (ML) is the method most commonly chosen for parameter estimation in LogR (King & Zeng 2001). The logistic function is given in Equation (16), where $z$ denotes the linear combination of the input attributes:
$$f(z) = \frac{1}{1 + e^{-z}} \tag{16}$$
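A short sketch shows how the logistic function turns a linear combination of attributes into a probability. The coefficients below are hypothetical and are not fitted to the study's data. The function is written in a numerically stable form so that very large or very small inputs do not overflow.

```python
import math

def logistic(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, bias: float) -> float:
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)

# Hypothetical coefficients for (pH, DO); not estimated from the study's data
weights, bias = [0.8, 0.5], -9.0
print(round(predict_proba([7.2, 8.5], weights, bias), 3))
```

Whatever the coefficient values, the output always lies between zero and one, which is the property the response function is meant to guarantee.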

Naive Bayes

NB is a simple probabilistic classifier based on Bayes' theorem (Friedman et al. 1997). It is computationally very efficient and assumes that the features are independent. It evaluates the relationship between the class and each feature for every instance and calculates the conditional probability of this relationship (Domingos & Pazzani 1997). Let D be the training set of tuples with their associated labels. Each tuple is represented by a set of n attributes X = {x1, x2, x3, ···, xn}, and C is a particular class of an instance or data sample X. Suppose H is the hypothesis that a tuple X belongs to class C. In classification, the probability P(C|X) that the hypothesis holds for an observed tuple X is determined. This is the posterior probability that an instance X belongs to class C, given that the attribute description is known. The prior probability P(C), in contrast, is the relative frequency with which class C occurs in the dataset. The probability that an instance X belongs to a class C can be computed with the Bayes formula given in Equation (17):
$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)} \tag{17}$$
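The posterior calculation can be illustrated with a small discrete example. Under the independence assumption, P(X|C) is the product of the per-attribute likelihoods. All priors, likelihoods and attribute names below are hypothetical values chosen for illustration, not estimates from the study's datasets.

```python
def naive_bayes_posterior(priors: dict, likelihoods: dict, observation) -> dict:
    """Posterior P(C | X) for each class under the feature-independence assumption.

    priors:      {class: P(C)}
    likelihoods: {class: [ {value: P(x_i = value | C)}, ... ]}, one dict per attribute
    observation: sequence of observed attribute values, one per attribute
    """
    joint = {}
    for cls, prior in priors.items():
        p = prior
        for i, value in enumerate(observation):
            p *= likelihoods[cls][i][value]  # independence: multiply likelihoods
        joint[cls] = p
    evidence = sum(joint.values())  # P(X), the normalizing constant
    return {cls: p / evidence for cls, p in joint.items()}

# Hypothetical model with two attributes: turbidity level and pH band
priors = {"Good": 0.3, "Poor": 0.7}
likelihoods = {
    "Good": [{"low": 0.8, "high": 0.2}, {"neutral": 0.9, "acidic": 0.1}],
    "Poor": [{"low": 0.3, "high": 0.7}, {"neutral": 0.4, "acidic": 0.6}],
}
posterior = naive_bayes_posterior(priors, likelihoods, ("high", "acidic"))
print({cls: round(p, 3) for cls, p in posterior.items()})
```

The predicted class is the one with the largest posterior. With an imbalanced prior such as the one assumed here, the majority class starts with an advantage before any attribute is observed.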

Multilayer Perceptron

An MLP is a type of artificial neural network (ANN) (Fausett 2006) with a multilayer architecture in which neurons are connected to each other by links called synapses. Each link has a synaptic weight. The neurons are arranged in the layers of the network and work in parallel. The first layer is the input layer. Its nodes simply hold the unprocessed information that enters the network and perform no computation. Next come the hidden layers, of which a network can have none, one or many. Hidden layers are responsible for increasing the performance of the network. The last layer is the output layer, which performs the calculations that give the output of the whole network. The behavior of the output layer depends on the activity of the hidden layers.
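The flow of information through the layers can be sketched as a forward pass. This is a minimal illustration under stated assumptions: the sigmoid activation, the layer sizes and all weights are hypothetical, since the network configuration used in the study is not given here.

```python
import math

def sigmoid(z: float) -> float:
    """Sigmoid activation, evaluated without overflow (assumed activation)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def layer(inputs, weights, biases):
    """One fully connected layer: each neuron sums its weighted inputs plus a bias."""
    return [
        sigmoid(b + sum(w * x for w, x in zip(row, inputs)))
        for row, b in zip(weights, biases)
    ]

def mlp_forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer passes values through unchanged; hidden and output layers compute."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron
hidden_w = [[0.5, -0.2], [0.1, 0.4]]
hidden_b = [0.0, -0.1]
out_w = [[1.2, -0.7]]
out_b = [0.05]
print(mlp_forward([7.2, 8.5], hidden_w, hidden_b, out_w, out_b))
```

Training consists of adjusting the synaptic weights so that the output matches the labelled classes. That step is omitted here.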

Decision Tree

DT is a hierarchical structure of decisions and their outcomes. It is used to identify the path to a specific goal: an instance is classified into one of a set of predefined classes. DT is very popular because of its simplicity. It is made up of nodes and edges. The node with no incoming edges is called the ‘root’. A node with outgoing edges is called an ‘internal node’. All other nodes are called ‘leaves’. DT splits each internal node according to the value of a single attribute (Maimon & Rokach 2014).
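The structure described above can be made concrete with a small hand-built tree. The attributes, thresholds and labels below are hypothetical and are not the tree learned in the study. The sketch only shows how an instance travels from the root through single-attribute tests to a leaf.

```python
# Internal nodes test one attribute against a threshold; leaves hold a class label.
tree = {
    "attribute": "pH", "threshold": 4.0,           # root node
    "left": {"label": "Poor"},                     # leaf reached when pH < 4.0
    "right": {                                     # internal node
        "attribute": "Tur", "threshold": 50.0,
        "left": {"label": "Good"},                 # leaf reached when Tur < 50.0
        "right": {"label": "Poor"},                # leaf reached otherwise
    },
}

def classify(node: dict, sample: dict) -> str:
    """Follow single-attribute tests from the root until a leaf is reached."""
    while "label" not in node:
        branch = "left" if sample[node["attribute"]] < node["threshold"] else "right"
        node = node[branch]
    return node["label"]

print(classify(tree, {"pH": 7.1, "Tur": 12.0}))  # Good
```

A learning algorithm builds such a tree automatically by choosing, at each internal node, the attribute and threshold that best separate the classes in the training data.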

This section first discusses the results of applying the five indices to the two datasets and then analyzes in detail the results of classifying the water quality with machine learning algorithms.

Analysis of water quality indices

All indices generally showed that Dataset 1 has ‘Poor’, ‘Unsuitable’ or ‘Unfit for Drinking’ water quality status. Figure 4 shows the water quality indices calculated using the five methods for Dataset 1, with the count of samples in each water quality class displayed for five months of 2019. The WAWQI places most October samples in the ‘Excellent’ class, whereas the quality is ‘Unfit’ in November and December. The CCME-WQI for Dataset 1 lies in the ‘Poor’ category for all five months. For NSF-WQI, the water quality varies from ‘Poor’ to ‘Excellent’ in December, while in the other months it remains ‘Poor’. OWQI classified all the water samples in Dataset 1 as ‘Very Poor’, an outcome broadly in line with the other indices. However, some parameters had to be excluded to calculate these indices. For example, ‘conductivity’ and ‘Tur’ were ignored when calculating the OWQI, as it has no SI range for these parameters (Table 5). Using only a reduced set of parameters may affect the information obtained from this index. For MOI, the samples were mostly categorized as ‘Totally Unsuitable’ for October to December.

Figure 4

WQI of Dataset 1 using WAWQI, CCME-WQI, NSF-WQI, OWQI, MOI methods.


The water quality for Dataset 2 can be seen in Figure 5. Here, the WAWQI categorized the samples as ‘Unfit’ for all the years. The problem identified with the WAWQI is that its values are strongly affected by the addition of the ‘FC’ parameter, which causes the samples to be classified as ‘Unfit for Drinking’. The CCME-WQI for Dataset 2 lies in the ‘Poor’ category for all the years. The drawback of this index is that all the samples are used to compute a single CCME value, which is then assigned to every sample and yields a single CCME-WQI class. NSF-WQI for Dataset 2 mostly falls in the ‘Unclassified’ class, which was assigned to samples that do not fall within the rating ranges defined by the index. This makes the NSF-WQI inapplicable to some types of water samples. OWQI categorized most of the water samples in Dataset 2 as ‘Very Poor’. Similarly, MOI categorized the samples as ‘Totally Unsuitable’ for all the years.

Figure 5

WQI of Dataset 2 using WAWQI, CCME-WQI, NSF-WQI, OWQI, MOI methods.


Figure 6(a) compares the indices month by month for Dataset 1. The X-axis represents the months, whereas the classes ‘Excellent’, ‘Good’, etc. are assigned numerical values on the Y-axis so that the five indices can be compared and month-wise changes observed. The indices mostly lie in the 4–5 range, which represents the ‘Poor’ or ‘Unfit’ class, giving a classification of poor water quality for Rawal Lake. Moreover, the results computed for Dataset 1 with NSF-WQI and WAWQI are the same throughout June to December. Similarly, CCME-WQI and OWQI show the same classification for the water quality of Rawal Lake. For Dataset 2, the water quality belonged to either class 4 or class 5, which represent the ‘Poor’ or ‘Unfit’ category, as seen in Figure 6(b). Some samples are unclassified because they did not fall within the range specified by the respective WQI; these are represented by a value of 10 in Figure 6(b). Here, CCME-WQI, MOI and WAWQI assigned a similar classification to Rawal Lake throughout the years.

Figure 6

(a) Comparison of indices over the months June 2019 to December 2019 (Dataset 1), where the Y-axis represents the class 0–5 for each index and the X-axis represents the months. (b) Comparison of indices over the years 2012–2019 (Dataset 2), where the Y-axis represents the class 0–5 for each index and the X-axis represents the years.


The indices were computed with a limited set of parameters obtained from Rawal Dam Lake. The main constraint of this study is that these indices were developed for certain selected parameters, and the absence of some major parameters can affect the outcomes. The indices can be unpredictable, and every index has its own limitations. The WAWQI is very sensitive to its parameters, as a single parameter with a high concentration value can change the index classification. Similarly, the NSF-WQI loses important information during data processing, as the classification depends on the weights assigned to each parameter. Moreover, this index works well only if the parameters involved are independent of each other. It requires all nine parameters, and excluding any one may affect its performance. The calculation of CCME-WQI is a subjective process, as it involves the combination of three factors and more mathematical calculations than the other indices. In addition, none of these indices is fully generic or applicable to all water types. Most indices are created for a specific location and may not be applicable to other sites, including the CCME-WQI, which was developed for the province of British Columbia, Canada, and OWQI, which was developed for the state of Oregon.

Numerous WQIs have been developed for classifying water quality, but they have limited global applicability. The validity of these indices depends on the data handling process, as valuable information might be lost. Consider NSF-WQI, in which eight of the nine parameters may have satisfactory values while pH has the value zero. The water body would still be assigned a high class, which would be invalid, as such a low pH could not support aquatic life. The ability of these indices to handle missing values, outliers and other anomalies is still unknown. The removal or exclusion of certain parameters may influence the outcome of the indices. However, these indices may prove effective under specific conditions for the water body of interest.

Furthermore, the data collected through IoT nodes have their own limitations, as the nodes are deployed at the edges of the inlet and outlet streams and there is no access to the center and other parts of the dam for data collection. Because of this constraint, the dataset suffered from class imbalance, as the water at the edges and near the bank of the dam has high Tur. As a result, the majority of the data samples belong to the same category or class, leading to a lack of variation in the samples gathered. From this it can be inferred that factors such as location, time and data collection frequency may have a noticeable impact on the computation of water quality. Similarly, the five WQIs give a water quality classification of ‘Poor’ or ‘Very Poor’ for Rawal Dam, which suggests that the collected datasets may suffer from class imbalance. The high number of distinct values for parameters led to a skewed distribution, which has introduced this imbalance.

Classification of water quality using machine learning

The datasets used for classification are Dataset 1 and Dataset 2, with WAWQI and OWQI as the class variables respectively. Five machine learning classifiers were applied to both datasets: DT, MLP, LogR, NB and KNN. NB and DT are less time consuming and gave satisfactory performance compared with MLP, which is known for its flexibility and high classification accuracy (Su & Zhang 2006; Yang et al. 2015; Hemalatha & Rani 2017).

Datasets 1 and 2 were labelled with the six classes computed using the WAWQI and OWQI methods respectively. However, the data were imbalanced, as the samples were mostly classified as either Bad or Very Bad class. Both datasets were split in a 60:20:20 ratio into training, validation and test sets. This is among the most commonly used ratios for reducing the possibility of ‘overfitting’ (Hawkins 2004) that may occur due to class imbalance in the datasets. Generally, a 20% validation set provides a reasonable number of samples for estimating which model is best for classification. Similarly, testing the model on the 20% of unseen data assesses its generalization ability (Ng 2017). The performance metrics used were Accuracy (Acc), Recall (Rec), Precision (Pre) and F1-Score (F1-Sc). Equations (18)–(21) give the formulae for these metrics, where True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) are counts over the actual and predicted classes:
$$\text{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{18}$$
$$\text{Rec} = \frac{TP}{TP + FN} \tag{19}$$
$$\text{Pre} = \frac{TP}{TP + FP} \tag{20}$$
$$\text{F1-Sc} = \frac{2 \times \text{Pre} \times \text{Rec}}{\text{Pre} + \text{Rec}} \tag{21}$$
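For a multi-class problem such as this one, the four counts are usually obtained per class from the confusion matrix by treating that class as positive and all others as negative. The sketch below illustrates this one-vs-rest convention. It is an assumption about how the classwise figures could be derived, and the three-class confusion matrix is hypothetical.

```python
def per_class_metrics(confusion, labels):
    """Per-class Acc, Rec, Pre and F1-Sc from a square confusion matrix.

    confusion[i][j] = number of samples of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    results = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # actual k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp   # predicted k, actually otherwise
        tn = total - tp - fn - fp
        pre = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
        results[label] = {"Acc": (tp + tn) / total, "Rec": rec, "Pre": pre, "F1-Sc": f1}
    return results

def overall_accuracy(confusion) -> float:
    """Fraction of all samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total

# Hypothetical confusion matrix for three classes (rows: actual, columns: predicted)
labels = ["E", "G", "F"]
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 9],
]
metrics = per_class_metrics(confusion, labels)
print(round(overall_accuracy(confusion), 3), metrics["E"])
```

On imbalanced data, overall accuracy can be high even when minority classes are poorly predicted, which is why the classwise precision, recall and F1-score are reported alongside it.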

For classification of the water quality status, the WAWQI classes Excellent (E), Good (G), Fair (F), Poor (P), Very Poor (VP) and Unfit for Drinking (U) were used for Dataset 1. The dataset has five parameters: ‘T’, ‘Tur’, ‘pH’, ‘DO’ and ‘conductivity’. DT achieved an accuracy of 99.6% on the test set of Dataset 1, while KNN and NB performed well with 95% and 90% accuracy respectively. Table 8 shows the evaluation results. Figure 7 displays the confusion matrices of MLP, KNN, DT, NB and LogR, which show the actual and predicted samples for all six classes. With DT, 365/367 samples of class ‘Excellent’ were correctly classified, whereas two were misclassified as class ‘Good’. Similarly, 19/19 class ‘Good’ samples were correctly classified. For the ‘Fair’ and ‘Poor’ classes, one sample was misclassified as class ‘Excellent’.

Table 8

Classwise Pre, Rec, F1-Sc on test set (Dataset 1)

Classifier; Pre (E, G, F, P, VP, U); Rec (E, G, F, P, VP, U); F1-Sc (E, G, F, P, VP, U)
DT 99 90 100 100 100 100 99 100 95 97 100 100 99 95 98 99 100 100 
MLP 98 77 66 100 79 83 
KNN 95 50 71 59 68 98 100 26 24 47 74 99 97 34 36 52 71 99 
NB 88 78 95 99 81 97 93 79 96 
LogR 87 89 100 100 93 94 
Figure 7

Confusion matrices of DT, LogR, KNN, MLP, NB on Dataset 1.


For Dataset 2, the OWQI classes Excellent (E), Good (G), Fair (F), Poor (P), Very Poor (VP) and Unclassified (Un) were used. The models were trained for supervised classification of OWQI classes on Dataset 2, which has six parameters: ‘T’, ‘Tur’, ‘pH’, ‘DO’, ‘FC’ and ‘conductivity’. MLP, KNN, NB, LogR and DT were applied to predict the OWQI classes. DT gave the best performance, with 96% accuracy on the test set, followed by KNN, NB and LogR, which also performed well. The evaluation results are displayed in Table 9. Figure 8 displays the confusion matrices for MLP, KNN, DT, NB and LogR, which show the predicted and actual samples for all six classes. DT correctly classified 17/20 samples of the ‘Excellent’ class; the other three were misclassified as class ‘Good’. Similarly, 5/5 ‘Good’ class samples were correctly predicted. For the ‘Fair’ class, one sample was misclassified as ‘Poor’ and three samples were misclassified as ‘Good’. For the ‘Poor’ class, one sample was misclassified as ‘Fair’, and for the ‘Unclassified’ class, one sample was misclassified as ‘Very Poor’. All 176/176 samples of the ‘Very Poor’ class were correctly predicted. The evaluation results for both datasets are displayed in Table 10.

Table 9

Classwise Pre, Rec, F1-Sc on test set (Dataset 2)

Classifier; Pre (E, G, F, P, VP, U); Rec (E, G, F, P, VP, U); F1-Sc (E, G, F, P, VP, U)
DT 100 45 86 89 99 100 85 100 60 89 100 67 92 62 71 89 100 80 
NB 100 33 100 89 99 30 85 100 20 89 96 100 92 50 33 89 98 46 
LogR 83 10 43 97 67 100 20 33 99 67 91 13 38 98 67 
KNN 48 33 14 89 65 10 11 90 55 15 12 90 
MLP 76 84 80 
Table 10

Acc on test Set (Datasets 1 and 2)

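| Classifier | Dataset 1 Acc | Dataset 1 Avg Pre | Dataset 1 Avg Rec | Dataset 1 Avg F1-Sc | Dataset 2 Acc | Dataset 2 Avg Pre | Dataset 2 Avg Rec | Dataset 2 Avg F1-Sc |
|---|---|---|---|---|---|---|---|---|
| DT | 99.65 | 100 | 100 | 100 | 95.96 | 97 | 96 | 96 |
| KNN | 95.15 | 93 | 94 | 93 | 91.5 | 76 | 78 | 77 |
| NB | 90.4 | 86 | 90 | 88 | 91.5 | 97 | 91 | 92 |
| LogR | 88 | 78 | 88 | 83 | 89.68 | 87 | 90 | 88 |
| MLP | 76.8 | 71 | 77 | 72 | 66 | 60 | 66 | 63 |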
Figure 8

Confusion matrices of DT, LogR, KNN, MLP and NB (Dataset 2).


Water quality classification using machine learning algorithms can prove to be a more effective and reliable method than the WQI, as the WQI uses only the current instance data to perform various mathematical calculations, whereas machine learning algorithms consider historical data or previous trends for water quality classification. Nevertheless, the WQI calculation can help to label the respective dataset for classification using machine learning algorithms.

Therefore, the findings revealed that applying the WQIs to water samples exposes the unpredictable nature of each index, as each comes with its own limitations: (1) sensitivity to a high concentration of a specific parameter, (2) dependence on the weights assigned to each parameter, and (3) applicability only to certain locations or water types. Moreover, classification of water quality may be more effective if other environmental factors (Pu et al. 2016, 2019; Pu 2019; Pu et al. 2020) are considered along with the classic water quality parameters already used to compute WQIs. Using such factors may eliminate the uncertainties and biases introduced by the selection of the location, the type and number of parameters, and the weights assigned to these parameters. Therefore, a more advanced version of the water pollution index should be developed that takes as input topographical and hydrological parameters, including slope and lineament density, and environmental parameters such as vegetation resistance, velocity distribution and the impact of natural irregular channels on the dam. These parameters, when combined with machine learning techniques, could prove to be a more effective way to predict the water quality of any location.

In this study, the quality of Rawal Dam Lake's water was studied by applying five widely used water quality indices to two distinct datasets, one collected in real time using IoT sensors and the other through GIS-based grab sampling. The findings have shown that the indices may be affected by how the data are collected, the number and type of parameters, the time and frequency of measurement, and the weights allocated to each parameter by the respective index, which can increase the WQI value and bias the class allocation. The limitations of this study include: (1) the uneven distribution of the types of water samples in the datasets and (2) the use of only six physico-chemical parameters, leading to water quality being categorized as either ‘Poor’ or ‘Very Poor’. Moreover, this paper used the Rawal Dam data to examine whether machine learning could be used instead of WQI to determine the class of water quality. The analysis showed that the DT algorithm had the highest accuracy, 99.6%, and could be regarded as suitable for classifying the water quality of the dataset used. However, the indices have their own limitations and are mostly developed for a specific location/area or water type, making them generally less applicable to all water types and locations. Therefore, these water quality indices need to be updated to eliminate the uncertainties and biases introduced by the selection of the location, the type and number of parameters, and the weights assigned to these parameters. To this end, there is a need to develop more advanced and enhanced water pollution indices based on other parameters, such as topographical, environmental and hydrological parameters including slope, lineament density, land use/land cover and rainfall, combined with machine learning techniques that can effectively contribute to estimating the quality of water for all regions.

We would like to thank the Rawal Dam Water Filtration Plant for providing the data (referred to as Dataset 2) and for fulfilling other requirements for research purposes.

All relevant data are included in the paper or its Supplementary Information.

Abbasnia A., Yousefi N., Mahvi A. H., Nabizadeh R., Radfard M., Yousefi M. & Alimohammadi M. 2019 Evaluation of groundwater quality using water quality index and its suitability for assessing water for drinking and irrigation purposes: case study of Sistan and Baluchistan province (Iran). Human and Ecological Risk Assessment: An International Journal 25 (4), 988–1005.
Aha D. W., Kibler D. & Albert M. K. 1991 Instance-based learning algorithms. Machine Learning 6 (1), 37–66.
Ali M., Qamar A. M. & Ali B. 2013 Data analysis, discharge classifications, and predictions of hydrological parameters for the management of Rawal dam in Pakistan. In: 2013 12th International Conference on Machine Learning and Applications. IEEE, Vol. 1, pp. 382–385.
Ashari A., Paryudi I. & Tjoa A. M. 2013 Performance comparison between naïve Bayes, decision tree and k-nearest neighbor in searching alternative design in an energy simulation tool. International Journal of Advanced Computer Science and Applications (IJACSA) 4 (11), 33–39.
Bhatt S. & Pathak J. 1992 Assessment of water quality and aspects of pollution in a stretch of River Gomti (Kumaun: Lesser Himalaya). Journal of Environmental Biology 13 (2), 113–126.
Bhatti N., Siyal A. & Qureshi A. 2018 Groundwater quality assessment using water quality index: a case study of Nagarparkar, Sindh, Pakistan. Sindh University Research Journal-SURJ (Science Series) 50 (2), 227–234.
Bouza-Deaño R., Ternero-Rodríguez M. & Fernández-Espinosa A. 2008 Trend study and assessment of surface water quality in the Ebro river (Spain). Journal of Hydrology 361 (3–4), 227–239.
Brown R. M., McClelland N. I., Deininger R. A. & O'Connor M. F. 1972 A water quality index – crashing the psychological barrier. In: Indicators of Environmental Quality. Springer, Boston, MA, pp. 173–182.
Canadian Council of Ministers of the Environment 2001 Canadian Water Quality Guidelines for the Protection of Aquatic Life: CCME Water Quality Index 1.0, User's Manual.
Carpenter S. R., Cole J. J., Essington T. E., Hodgson J. R., Houser J. N., Kitchell J. F. & Pace M. L. 1998 Evaluating alternative explanations in ecosystem experiments. Ecosystems 1 (4), 335–344.
Chabuk A., Al-Madhlom Q., Al-Maliki A., Al-Ansari N., Hussain H. M. & Laue J. 2020 Water quality assessment along Tigris river (Iraq) using water quality index (WQI) and GIS software. Arabian Journal of Geosciences 13 (14), 1–23.
Chowdhury R. M., Muntasir S. Y. & Hossain M. M. 2012 Water quality index of water bodies along Faridpur-Barisal road in Bangladesh. Global Engineering Technical Review 2 (3), 1–8.
Chung W.-Y. & Yoo J.-H. 2015 Remote water quality monitoring in wide area. Sensors and Actuators B: Chemical 217, 51–57.
Couillard D. & Lefebvre Y. 1985 Analysis of water-quality indices. Journal of Environmental Management (United States) 21 (2), 161–179.
Cox D. R. & Snell E. J. 1989 Analysis of Binary Data, Vol. 32. CRC Press, USA.
Cude C. G. 2001 Oregon water quality index: a tool for evaluating water quality management effectiveness. JAWRA Journal of the American Water Resources Association 37 (1), 125–137.
Das S., Majumder M., Roy D. & Mazumdar A. 2010 Determination of urbanization impact on rain water quality with the help of water quality index and urbanization index. In: Impact of Climate Change on Natural Resource Management. Springer, Dordrecht, pp. 131–142.
Dasarathy B. V. 1991 Nearest neighbor (NN) norms: NN pattern classification techniques. In: IEEE Computer Society Tutorial.
De Rosemond S., Duro D. C. & Dubé M. 2009 Comparative analysis of regional water quality in Canada using the water quality index. Environmental Monitoring and Assessment 156 (1–4), 223.
Dinius S. 1987 Design of an index of water quality. JAWRA Journal of the American Water Resources Association 23 (5), 833–843.
Domingos P. & Pazzani M. 1997 On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29 (2), 103–130.
Fausett L. V. 2006 Fundamentals of Neural Networks: Architectures, Algorithms and Applications. Pearson Education India.
Fix E. & Hodges J. L. Jr. 1952 Discriminatory Analysis-Nonparametric Discrimination: Small Sample Performance. Technical Report. University of California, Berkeley.
Friedman N., Geiger D. & Goldszmidt M. 1997 Bayesian network classifiers. Machine Learning 29 (2), 131–163.
Golbaz S., Nabizadeh R., Zarinkolah S., Mahvi A. H., Alimohammadi M. & Yousefi M. 2019 An innovative swimming pool water quality index (SPWQI) to monitor and evaluate the pools: design and compilation of computational model. Environmental Monitoring and Assessment 191 (7), 448.
Gupta R., Singh A. & Singhal A. 2019 Application of ANN for water quality index. International Journal of Machine Learning and Computing 9 (5), 688–693.
Hamzaoui-Azaza F., Ketata M., Bouhlila R., Gueddari M. & Riberio L. 2011 Hydrogeochemical characteristics and assessment of drinking water quality in Zeuss–Koutine aquifer, southeastern Tunisia. Environmental Monitoring and Assessment 174 (1–4), 283–298.
Hawkins D. M. 2004 The problem of overfitting. Journal of Chemical Information and Computer Sciences 44 (1), 1–12.
Hemalatha K. & Rani K. U. 2017 Advancements in multi-layer perceptron training to improve classification accuracy. International Journal on Recent and Innovation Trends in Computing and Communication 5 (6), 353–357.
House M. & Newsome D. 1989 Water quality indices for the management of surface water quality. In: Urban Discharges and Receiving Water Quality Impacts. Elsevier, Pergamon, pp. 159–173.
Jonnalagadda S. & Mhere G. 2001 Water quality of the Odzi river in the eastern highlands of Zimbabwe. Water Research 35 (10), 2371–2376.
Karunanidhi D., Aravinthasamy P., Subramani T. & Muthusankar G. 2020 Revealing drinking water quality issues and possible health risks based on water quality index (WQI) method in the Shanmuganadhi river basin of south India. Environmental Geochemistry and Health 43 (2), 1–18.
Khan H. Q. 2010 Water quality index for municipal water supply of Attock city, Punjab, Pakistan. In: Survival and Sustainability. Springer, Berlin, Heidelberg, pp. 1255–1262.
Khan F. I. & Abbasi S. 1998 Multivariate hazard identification and ranking system. Process Safety Progress 17 (3), 157–170.
Khanna D., Bhutiani R., Tyagi B., Tyagi P. K. & Ruhela M. 2013 Determination of water quality index for the evaluation of surface water quality for drinking purpose. International Journal of Science and Engineering 1 (1), 9–14.
King G. & Zeng L. 2001 Explaining rare events in international relations. International Organization 55 (3), 693–715.
Kumar A. & Dua A. 2009 Water quality index for assessment of water quality of River Ravi at Madhopur (India). Global Journal of Environmental Sciences 8 (1), 49–57.
Kumar A. & Shukla M. 2002 Water quality index (WQI) of river Sai at Raibareilly city U.P. Journal of Ecophysiology and Occupational Health 2, 163–172.
Landwehr J. M. 1979 A statistical view of a class of water quality indices. Water Resources Research 15 (2), 460–468.
Liou S.-M., Lo S.-L. & Wang S.-H. 2004 A generalized water quality index for Taiwan. Environmental Monitoring and Assessment 96 (1–3), 35–52.
Lohani B. N. & Todino G. 1984 Water quality index for Chao Phraya River. Journal of Environmental Engineering 110 (6), 1163–1176.
Lumb A., Halliwell D. & Sharma T. 2006 Application of CCME water quality index to monitor water quality: a case study of the Mackenzie river basin, Canada. Environmental Monitoring and Assessment 113 (1–3), 411–429.
Maimon O. Z. & Rokach L. 2014 Data Mining with Decision Trees: Theory and Applications, Vol. 81. World Scientific.
McClelland N. I. 1974 Water Quality Index Application in the Kansas River Basin, Vol. 74. US Environmental Protection Agency-Region VII.
Mulani S. K., Mule M. & Patil S. 2009 Studies on water quality and zooplankton community of the Panchganga river in Kolhapur city. Journal of Environmental Biology 30 (3), 455.
Neal C., House W. A., Jarvie H. P. & Eatherall A. 1998 The significance of dissolved carbon dioxide in major lowland rivers entering the North Sea. Science of the Total Environment 210, 187–203.
Ng A. 2017 Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization. Coursera. Available from: https://www.coursera.org/learn/deep-neural-network (accessed 7 March 2021).
Nives S.-G. 1999 Water quality evaluation by index in Dalmatia. Water Research 33 (16), 3423–3440.
Ott W. R. 1978 Environmental Indices: Theory and Practice.
Pu J. H., Huang Y., Shao S. & Hussain K. 2016 Three-Gorges Dam fine sediment pollutant transport: turbulence SPH model simulation of multi-fluid flows. https://bradscholars.brad.ac.uk/handle/10454/8340.
Pu J. H., Hussain A., Guo Y.-k., Vardakastanis N., Hanmaiahgari P. R. & Lam D. 2019 Submerged flexible vegetation impact on open channel flow velocity distribution: an analytical modelling study on drag and friction. Water Science and Engineering 12 (2), 121–128.
Puri P., Yenkie M., Sangal S., Gandhare N., Sarote G. & Dhanorkar D. 2011 Surface water (lakes) quality assessment in Nagpur city (India) based on water quality index (WQI). Rasayan Journal of Chemistry 4 (1), 43–48.
Qadir A., Malik R. N. & Husain S. Z. 2008 Spatio-temporal variations in water quality of Nullah Aik-tributary of the river Chenab, Pakistan. Environmental Monitoring and Assessment 140 (1–3), 43–59.
Rajankar P., Gulhane S., Tambekar D., Ramteke D. & Wate S. 2009 Water quality assessment of groundwater resources in Nagpur region (India) based on WQI. Journal of Chemistry 6 (3), 905–908.
Ramakrishnaiah C., Sadashivaiah C. & Ranganna G. 2009 Assessment of water quality index for the groundwater in Tumkur taluk, Karnataka state, India. Journal of Chemistry 6 (2), 523–530.
Selvam S., Manimaran G., Sivasubramanian P., Balasubramanian N. & Seshunarayana T. 2014 GIS-based evaluation of water quality index of groundwater resources around Tuticorin coastal city, south India. Environmental Earth Sciences 71 (6), 2847–2867.
Silvert W. 2000 Fuzzy indices of environmental conditions. Ecological Modelling 130 (1–3), 111–119.
Srebotnjak T., Carr G., de Sherbinin A. & Rickwood C. 2012 A global water quality index and hot-deck imputation of missing data. Ecological Indicators 17, 108–119.
Štambuk-Giljanović N. 2003 Characteristics of water resources in Dalmatia according to established standards for drinking water. Journal of Water Supply: Research and Technology – AQUA 52 (4), 307–318.
Su J. & Zhang H. 2006 A fast decision tree learning algorithm. In: AAAI, Vol. 6, pp. 500–505.
Tsegaye T., Sheppard D., Islam K., Tadesse W., Atalay A. & Marzen L. 2006 Development of chemical index as a measure of in-stream water quality in response to land-use and land cover changes. Water, Air, and Soil Pollution 174 (1–4), 161–179.
Vasanthavigar M., Srinivasamoorthy K., Vijayaragavan K., Ganthi R. R., Chidambaram S., Anandhan P., Manivannan R. & Vasudevan S. 2010 Application of water quality index for groundwater quality assessment: Thirumanimuttar sub-basin, Tamilnadu, India. Environmental Monitoring and Assessment 171 (1–4), 595–609.
Wu Z., Wang X., Chen Y., Cai Y. & Deng J. 2018 Assessing river water quality using water quality index in Lake Taihu Basin, China. Science of the Total Environment 612, 914–922.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).