The successful prediction of the stream or river water quality is gaining the attention of various governmental agencies, and pollution control boards worldwide due to its useful applications in determining watershed health, biodiversity, ecology, and suitability of potable water needs of the river basin. The physically based computational water quality models would require large spatial and temporal information databases of climatic, hydrologic, and environmental variables and solutions of nonlinear, partial differential equations at each grid point in a river basin. These models suffer from estimability, convergence, stability, approximation, dispersion, and consistency issues. In such a problematic modeling scenario, an artificial neural network (ANN) modeling of 22 stream water quality parameters (SWQPs) is performed from easily measurable data of precipitation, temperature, and novel land use parameters obtained from Geographic Information System (GIS) analysis for the Godavari River Basin, India. The ANN models are compared with the more traditional, statistical linear, and nonlinear regression models for accuracy and performance statistics. This study obtains regression coefficients of 0.93, 0.78, 0.83, and 0.74 for electrical conductivity, dissolved oxygen, biochemical oxygen demand, and nitrate in testing using feedforward ANNs compared with a maximum of 0.45 using linear and nonlinear regressions. Principal component analysis (PCA) is performed to reduce the input data dimension. The subsequent modeling using radial basis function and ANNs is found to improve the overall regression coefficients slightly for the chosen four water quality parameters (WQPs). A closed form equation for electrical conductivity has been derived from MATLAB simulations. The successful modeling results indicate the effectiveness and potential of ANNs over the statistical regression approaches for estimating the highly nonlinear problem of stream water quality distributions.

  • A GIS, ANN-based causal WQ model is developed for a non-Karst watershed.

  • Novel land use factors are developed for the model.

  • PCA-based ANN models are found to be superior compared with others.

  • An equation for conductivity is developed from MATLAB simulations.

  • Land use parameters are also important along with climate parameters for water quality model development.

Graphical Abstract


Freshwater accounts for only about 3% of the water on Earth, and protecting it is essential for the continued survival of human and other biotic ecosystems (NASA n.d.). Even though natural streams are the primary water source, they are the most affected by anthropogenic activities such as rapid urbanization, agriculture, and industry (Srivastava et al. 2017). Preventing unregulated waste deposition from industry and regularly monitoring and inspecting streams will enhance water quality (Shah & Joshi 2017). Currently, stream water quality is measured at only a few locations in any river basin. The desirable scenario would be to have stream water quality variables at a significant number of locations, which would require considerably more resources and continuous monitoring by various governmental agencies. Cost, time, and site accessibility are also significant impediments to conventional water quality testing for monitoring purposes.

Moreover, in-situ data will give point analysis rather than the overall water quality assessment for the entire river basin. In this regard, Remote Sensing and GIS tools could benefit continuous monitoring (Ritchie et al. 2003; Caballero et al. 2018). Several factors interact with natural chemical, biological, hydrological, and meteorological cycles to affect river water quality in a catchment. Many other factors, such as stream meandering, mixing processes in different dimensions, evaporation, and turbulence in the stream, affect the water quality of streams (Anmala et al. 2015). Various models have been developed to quantify the SWQPs on a location basis or using information specific to the measurement locations. Still, no comprehensive model has been developed due to the difficulties of accurately describing the watershed's physical, climatic, urban, and morphologic variations (Burigato Costa et al. 2019). A solution to this problem could be to predict the stream water quality from easily definable parameters from the source area. The parameters could be the nature of the watershed or morphological data, hydroclimatological variables, land use data, spectral reflectance characteristics, and specific information on point sources.

Many studies have been performed on large river basins worldwide to assess water quality. The findings have consistently shown that water quality has declined (Giri 2021). The significant parameters considered to influence river water quality are the physical parameters such as temperature (T), turbidity (Turb), total suspended solids (TSS), and total dissolved solids (TDS); chemical parameters including dissolved oxygen (DO), salinity, pH, nitrate nitrogen (NO₃-N), electrical conductivity, sulfates (SO₄²⁻), phosphate (PO₄³⁻), total hardness (TH), total alkalinity (TA), calcium (Ca), magnesium (Mg), biochemical oxygen demand (BOD), and chemical oxygen demand (COD); and microbiological parameters such as fecal coliform (FC) and total coliform (TC) (Mandal et al. 2010). The seasonal variation of physicochemical parameters, namely T, pH, DO, free CO2, COD, BOD, carbonate, bicarbonate, TA, hardness, turbidity, Ca, Mg, sodium (Na), potassium (K), , , chloride (Cl), , electrical conductivity (EC), TDS, and TSS of the Ganga River Basin was investigated (Joshi et al. 2009). It was found that the values of pH, EC, TDS, TSS, turbidity, and sodium exceeded the prescribed limits compared with the year before.

The WQPs of the major river basins, including Ganga, Yamuna, and Cauvery in India, were studied (Mandal et al. 2010; Solaraj et al. 2010; Dwivedi et al. 2018). It has been reported that a high amount of BOD due to the polluted sewage disposal from urban centers is the primary cause of the deterioration of water quality (Gajendran et al. 2010). Kumar et al. (2016) have chosen 13 main river basins based on their catchment size exceeding 20,000 km2 and obtained the primary data from the Central Water Commission (CWC), Central Pollution Control Board (CPCB), and other scientific studies. They have concluded that the water quality of India's main rivers is unfit for human use due to high bacterial counts and anthropogenic inputs into the river systems.

Many studies have developed linkage models by incorporating watershed metrics, land use land cover (LULC) classifications, and meteorological data to create GIS and ANN-based water quality prediction models (Girija et al. 2007; Song et al. 2010; Anmala et al. 2015). Furthermore, in order to establish effective inputs to linkage models (GIS-ANN), researchers have applied statistical and dimension reduction approaches such as PCA to focus on major cause and effect variables and discovered that land use with climatic parameters provides better predictions than land use alone (Anmala & Venkateshwarlu 2019; Venkateshwarlu et al. 2020). Arslan (2013) studied the spatiomultivariate statistical analysis in the form of spatially weighted PCA for the Akarya River Basin, Turkey water quality dataset and suggested the incorporation of spatial structures, and patterns of water quality variables information into the PCA model. Fathi et al. (2018) investigated the state of water quality of the Beheshtabad River, in Iran. They have used PCA and clustering multivariate statistical techniques to examine temperature, phosphate, turbidity, dissolved oxygen, biochemical oxygen demand, EC, total solids, and pH to reduce data dimensionality and identify major cluster pollution zones. They have concluded that agricultural fertilizers, upstream wastewater runoff, and fish farms are the major variables influencing the water quality of the Beheshtabad River. Yang et al. (2020) mapped the spatial and temporal distribution of surface water quality variables in the Xin'anjiang River, Huangshan, China. Nine hundred and sixty water samples were collected on a monthly basis and were tested for 22 water quality indicators. The inverse distance weighted (IDW) method was used to interpolate the PCA comprehensive score to identify emerging pollution stations. 
In the current study, GIS, PCA, and ANNs are used in a causal modeling framework considering both climate and land use parameters for the predictions of WQPs.

Effective continuous monitoring techniques are needed for modeling to address the above problems. Girija et al. (2007) developed a four-layer GIS model to study the effect of LULC on water quality in the Brahmaputra River sub-basin. BOD, DO, and total phosphorus were found to be the sensitive parameters because of the strong nutrient inflow from farm fields. The nutrient intake has risen further, causing DO depletion in the region. Multivariate statistical techniques and feedforward, back-propagation algorithms were used to develop the water quality index (WQI), and the resulting indices were compared using R2 and RMSE values (Sinha & Das 2015). Furthermore, ANNs and multiple linear regression (MLR) techniques were used to test physicochemical parameters of water quality, and satisfactory results were found using ANN models (Kadam et al. 2019). Anmala et al. (2015), Anmala & Venkateshwarlu (2019), and Venkateshwarlu et al. (2020) investigated the impact of LULC and hydrological characteristics to model and predict SWQPs of the Upper Green River basin, Kentucky, USA. Statistical regression methods and GIS- and ANN-based models were applied, and a substantial causal relationship between these characteristics and the SWQPs was discovered. The ANN models predicted the SWQPs more accurately than the nonlinear regression models. PCA and canonical correlation analysis (CCA) were also performed to find the reduced set of causal variables in the stream water quality modeling (Venkateshwarlu et al. 2020). The morphometric characteristics of the Siddheswari river basin were studied by Sutradhar (2020), who found that the basin underwent significant soil degradation due to poor management practices.

Due to its simplicity, researchers use statistical modeling more commonly than physically based models to assess stream water quality at a watershed scale. However, the real-world scenario is different, and the statistical methods give over-simplistic results at the watershed scale. The ANN models best describe the nonlinear relationship between response variables and predictors compared with more straightforward statistical regression methods. The issues of nonlinearity, multicollinearity, heteroscedasticity, and model prediction accuracy are best handled using artificial intelligence based and machine learning models such as ANNs, Regression Trees, Random Forests, and ensemble methods (KC et al. 2019). Although the ANNs, their hybrid variants, and other machine learning methods are successfully employed for stream water quality predictions (Bayram et al. 2012; Heddam 2014; Ravansalar & Rajaee 2015; Najafzadeh & Ghaemi 2019), the current study extends the causal modeling framework of Anmala et al. (2015) (and Anmala & Venkateshwarlu 2019; Venkateshwarlu et al. 2020) for non-Karst watershed containing stream network using statistical methods and ANNs for deeper understanding of the problem. The specific objectives of the study are to (i) study the influence of climate and land use parameters on WQP estimation using a causal modeling framework, (ii) determine land use factors of the Godavari River Basin using GIS analysis, (iii) study the suitability of statistical and ANN models in water quality modeling for non-Karst watershed stream networks, and (iv) investigate PCA in reducing the input data dimension for effective causal modeling of stream water quality.

The study area of the current investigation is the Godavari River Basin, one of the largest river basins in peninsular India, also called the Dakshina Ganga. After the Ganges Basin, the Godavari Basin is the country's second-largest basin, covering roughly 9.5% of the country's total land area. Overall, the basin covers a drainage area of 312,812 km² bounded between the longitudes of 70°24′ and 83°4′ E and the latitudes of 16°19′ and 22°34′ N. The river originates in the Sahyadris, at an altitude of 1,067 m above mean sea level near Triambakeshwar in the Nashik district of Maharashtra, and flows across the Deccan region from the western side toward the Eastern Ghats. The river flows through several states and forms interstate boundaries between Telangana and Maharashtra and between Telangana and Chhattisgarh. The basin covers major geographic areas in Maharashtra (48.7%), Telangana (19.87%), Andhra Pradesh (3.53%), Chhattisgarh (10.69%), Madhya Pradesh (10.17%), Odisha (5.67%), and Karnataka (1.41%), and a lesser area in Pondicherry (Ministry of Jal Shakti n.d.). The basin receives most of its rainfall during the southwest monsoon season (July–September), with annual rainfall ranging from 881 to 1,395 mm and averaging 1,110 mm. The yearly maximum temperature ranges from 31 to 33.5 °C. The western portion of the basin is much hotter than the central, northern, and eastern regions.

A watershed is the area that drains water from the highest ridge line toward the lowest point, called the pour point. To delineate this drainage system, digital elevation models (DEMs) were processed using the hydrological tools in the QGIS processing toolbox. The SRTM (Shuttle Radar Topography Mission) DEM dataset with a resolution of 30 m was utilized for watershed delineation (USGS n.d.). The combined DEM covering the study area was used as the input for delineating the watershed, following the methodology depicted in the flow chart given in Figure 1. In other words, the DEM is downloaded and pre-processed, and the FILL operation is applied to it. The drainage directions and the flow accumulation are then extracted, the streams are derived, and the pour point is located. Finally, the result is clipped and converted into a watershed shapefile.
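The flow direction step can be illustrated with a minimal sketch. It assumes the widely used D8 scheme (each cell drains to its steepest-descent neighbour) on an already filled DEM; it is not the QGIS implementation, and the tiny `dem` grid and the function names are hypothetical.

```python
import math

# Offsets (row, col) to the eight neighbours considered by the D8 scheme.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def d8_flow_direction(dem):
    """For every cell, return the (row, col) of its steepest-descent neighbour.

    Cells with no lower neighbour (pits or outlets) map to None.
    Cell size is assumed to be 1 unit in both directions.
    """
    rows, cols = len(dem), len(dem[0])
    direction = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Diagonal neighbours are sqrt(2) cell widths away.
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best_slope = slope
                        direction[r][c] = (nr, nc)
    return direction

def trace_to_outlet(direction, start):
    """Follow flow directions downstream from start until a cell with no outflow."""
    path = [start]
    while direction[path[-1][0]][path[-1][1]] is not None:
        path.append(direction[path[-1][0]][path[-1][1]])
    return path

# Hypothetical 3 x 3 filled DEM (elevations in metres).
dem = [
    [9.0, 8.0, 7.0],
    [8.0, 6.0, 5.0],
    [7.0, 5.0, 1.0],
]
flow = d8_flow_direction(dem)
pour_point = trace_to_outlet(flow, (0, 0))[-1]
```

In this toy grid every flow path ends at the lowest corner cell, which plays the role of the pour point.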
Figure 1

The watershed delineation flowchart.


The monthly water quality data available from January 2019 to April 2021 were collected from the Telangana State Pollution Control Board (TSPCB) website (Telangana State Pollution Control Board n.d.). The geotagged water sampling locations were imported into QGIS software for correlation with the geospatial data. The rainfall and temperature data were collected from the Indian Meteorological Department (IMD) (Indian Meteorological Department n.d.). The rainfall data were available from 1901 to 2020 in the NetCDF format with 0.25° × 0.25° grid resolution, and the temperature data were available from 1951 to 2020 with 1° × 1° grid resolution. The NetCDF files were processed in a loop using the Python console of the QGIS software, and the extracted values were saved in the Excel format. For 2021, the rainfall and temperature data were downloaded from the NASA website (POWER Data Access Viewer n.d.) by specifying the latitude and longitude of each sampling station. The monthly water quality data available at 13 sampling stations in the Telangana state of the Godavari River Basin are considered for the present study. These data were available consistently for the 22 parameters that are considered in the present study. In developing a causal model for the stream water quality problem, Anmala et al. (2015) successfully used two-day cumulative precipitation, temperature, and urban, forest, and agricultural land use factors as cause parameters. In developing a similar causal model for the Godavari River Basin, the independent or cause parameters considered are climate parameters, namely mean daily precipitation (P), maximum temperature (Tmax), and minimum temperature (Tmin), and land use parameters, namely the urban land use factor (UL), forest land use factor (FL), agricultural land use factor (AL), grass land use factor (GL), and shrub land use factor (SL), at each of the sampling stations. The dependent WQPs are DO, pH, EC, etc., 22 in total, as shown in the various result tables.
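Extracting a station value from a gridded climate product amounts to finding the grid cell nearest to the station. The following is a minimal sketch of that lookup only, not the script used in the study: the grid origin and size, the station coordinates, the rainfall values, and the `-999.0` missing-value flag are all hypothetical, and a real workflow would read the axes from the NetCDF file.

```python
def grid_axis(start, step, count):
    """Build a regularly spaced coordinate axis (degrees)."""
    return [start + i * step for i in range(count)]

def nearest_index(axis_values, target):
    """Index of the axis value closest to the target coordinate."""
    return min(range(len(axis_values)), key=lambda i: abs(axis_values[i] - target))

def mean_of_valid(values, missing=-999.0):
    """Average the values, skipping entries equal to the missing-value flag."""
    valid = [v for v in values if v != missing]
    return sum(valid) / len(valid) if valid else None

# Hypothetical 0.25-degree grid and station location.
lats = grid_axis(6.5, 0.25, 129)
lons = grid_axis(66.5, 0.25, 135)
station_lat, station_lon = 18.80, 79.45

row = nearest_index(lats, station_lat)
col = nearest_index(lons, station_lon)

# Hypothetical daily rainfall (mm) already read for grid cell (row, col).
daily_rain_mm = [0.0, 12.5, -999.0, 3.5]
mean_daily_p = mean_of_valid(daily_rain_mm)
```

The same lookup applies to the 1° × 1° temperature grid by changing the step passed to `grid_axis`.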

GIS land use analysis for geospatial data

To better characterize the influences of the urban, agricultural, and industrial discharges on stream water quality, land use factors are developed by classifying the watershed area into five main classes: Urban, Forest, Agricultural, Grass, and Shrub. To classify these factors, GlobeLand30 LULC (global land cover mapping at 30 m resolution) data were downloaded from the www.globallandcover.com site. Sub-basins were delineated for the sample stations after processing the LULC data into the required basin shape. The upstream area of each station was delineated, and the rasters were converted to vector files. These basin vectors were then overlaid on the processed LULC layer, and a clip operation was performed to restrict the LULC data to the delineated area of each sample station. The field calculator was then used to compute the area of each polygon, and the areas of polygons belonging to the same class (Urban, Shrub, Grass, Forest, and Agricultural) were summed. In the same way, all the selected stations with consistent data (12 of them) were delineated and merged in QGIS software and clipped for the corresponding sampling stations to extract the land use factors for the analysis. In short, the upper drainage portion of each sampling station is delineated, the LULC layer is added and clipped to the required shapefile, similar fields are identified, and the area of each class is calculated. From these data, the LULC factors are calculated as follows:
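$$\mathrm{UL} = \frac{A_{U}}{A_{T}} \tag{1}$$
$$\mathrm{FL} = \frac{A_{F}}{A_{T}} \tag{2}$$
$$\mathrm{AL} = \frac{A_{A}}{A_{T}} \tag{3}$$
$$\mathrm{GL} = \frac{A_{G}}{A_{T}} \tag{4}$$
$$\mathrm{SL} = \frac{A_{S}}{A_{T}} \tag{5}$$
where $A_{U}$, $A_{F}$, $A_{A}$, $A_{G}$, and $A_{S}$ are the urban, forest, agricultural, grass, and shrub areas within the delineated sub-basin of a sampling station, and $A_{T}$ is the total area of that sub-basin.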
These land use factors are used as inputs to the model along with the climate parameters mean daily precipitation, and maximum and minimum temperatures. The detailed GIS land use analysis procedure to obtain land use factors is shown in Figure 2.
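As a minimal sketch of the factor calculation, the snippet below assumes that each land use factor is the class area divided by the total sub-basin area. The function name, the class areas, and the total area are hypothetical and only illustrate the arithmetic.

```python
def land_use_factors(class_areas_km2, total_area_km2=None):
    """Return each class area as a fraction of the sub-basin area.

    If no total area is given, the sum of the supplied class areas is used,
    which assumes the listed classes cover the whole sub-basin.
    """
    total = sum(class_areas_km2.values()) if total_area_km2 is None else total_area_km2
    if total <= 0:
        raise ValueError("total area must be positive")
    return {name: area / total for name, area in class_areas_km2.items()}

# Hypothetical clipped class areas (km^2) for one sampling station.
areas = {"UL": 120.0, "FL": 900.0, "AL": 2400.0, "GL": 300.0, "SL": 280.0}
factors = land_use_factors(areas)
```

With these made-up areas, the agricultural factor is 2400/4000 = 0.6 and the five factors sum to one; passing a larger `total_area_km2` would account for classes outside the five (for example, water bodies).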
Figure 2

GIS land use analysis to obtain land use factors.

The land use map of the Godavari Watershed is shown in Figure 3.
Figure 3

Land use map of Godavari River Basin.


Linear and nonlinear regression

Single-variable and multivariate linear regression models are developed first for each of the WQPs, treating the mean daily precipitation, the maximum and minimum temperatures, and the urban, forest, agricultural, grass, and shrub land use factors as independent variables. Nonlinear regression models are then developed for each WQP using the SPSS model expressions given in Table 1, where A1, A2, etc., are the model constants or undetermined coefficients specific to each expression. The nonlinear regression models are developed separately for the climate parameters, the land use parameters, and the combined climate and land use parameters.

Table 1

The nonlinear regression models using SPSS

| Nonlinear model number | Nonlinear model name | Nonlinear model expression |
|---|---|---|
| 1 | Asymptotic Regression | A1 + A2·exp(A3·x) |
| 2 | Asymptotic Regression | A1 − (A2·(A3^x)) |
| 3 | Density | (A1 + A2·x)^(−1/A3) |
| 4 | Gauss | A1·(1 − A3·exp(−A2·x^2)) |
| 5 | Gompertz | A1·exp(−A2·exp(−A3·x)) |
| 6 | Johnson-Schumacher | A1·exp(−A2/(x + A3)) |
| 7 | Log-Modified | (A1 + A3·x)^A2 |
| 8 | Log-Logistic | A1 − ln(1 + A2·exp(−A3·x)) |
| 9 | Metcherlich Law of Diminishing Returns | A1 + A2·exp(−A3·x) |
| 10 | Michaelis Menten | A1·x/(x + A2) |
| 11 | Morgan-Mercer-Florin | (A1·A2 + A3·x^A4)/(A2 + x^A4) |
| 12 | Peal-Reed | A1/(1 + A2·exp(−(A3·x + A4·x^2 + A5·x^3))) |
| 13 | Ratio of Cubics | (A1 + A2·x + A3·x^2 + A4·x^3)/(A5·x^3) |
| 14 | Ratio of Quadratics | (A1 + A2·x + A3·x^2)/(A4·x^2) |
| 15 | Richards | A1/((1 + A3·exp(−A2·x))^(1/A4)) |
| 16 | Verhulst | A1/(1 + A3·exp(−A2·x)) |
| 17 | Von Bertalanffy | (A1^(1 − A4) − A2·exp(−A3·x))^(1/(1 − A4)) |
| 18 | Weibull | A1 − A2·exp(−A3·x^A4) |
| 19 | Yield Density | (A1 + A2·x + A3·x^2)^(−1) |
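To make the Table 1 expressions concrete, the sketch below writes four of them (model numbers 1, 10, 11, and 18) as plain Python functions of a single predictor x. The function names and the coefficient values in the example calls are hypothetical; coefficient estimation itself (done in SPSS in the study) is not shown.

```python
import math

def asymptotic_regression(x, a1, a2, a3):
    """Model 1: A1 + A2*exp(A3*x)."""
    return a1 + a2 * math.exp(a3 * x)

def michaelis_menten(x, a1, a2):
    """Model 10: A1*x / (x + A2)."""
    return a1 * x / (x + a2)

def morgan_mercer_florin(x, a1, a2, a3, a4):
    """Model 11: (A1*A2 + A3*x^A4) / (A2 + x^A4)."""
    return (a1 * a2 + a3 * x ** a4) / (a2 + x ** a4)

def weibull(x, a1, a2, a3, a4):
    """Model 18: A1 - A2*exp(-A3*x^A4)."""
    return a1 - a2 * math.exp(-a3 * x ** a4)

# Hypothetical coefficients, used only to show how the expressions evaluate.
example_values = {
    "asymptotic": asymptotic_regression(0.0, 1.0, 2.0, 0.5),
    "michaelis_menten": michaelis_menten(2.0, 10.0, 2.0),
    "mmf": morgan_mercer_florin(0.0, 3.0, 4.0, 5.0, 2.0),
    "weibull": weibull(0.0, 5.0, 2.0, 1.0, 1.0),
}
```

Each function takes the predictor first and the undetermined coefficients after it, so the same signature could be handed to a nonlinear least-squares routine.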

ANN modeling

An ANN consists of an input layer, one or more hidden layers, and an output layer. Each layer consists of a certain number of nodes (neurons). In the current study, only one hidden layer is used. Each node in the input layer is connected to all the nodes in the hidden layer, and each of the hidden nodes is connected to the single node in the output layer. Every connection carries a weight that is determined during the neural network's training process. The hidden and output layers also have a bias node, which lets the network represent relationships in regions away from the origin. The weights are initialized to random values and are updated iteratively until they converge during training. The neural network is trained using a robust backpropagation learning algorithm for the weight updates. The details of neural network modeling can be found in Rumelhart et al. (1986) and Hecht-Nielsen (1988). Separate neural networks are used for the different SWQPs, limiting the output to one SWQP at a time. The number of input nodes is fixed at eight: mean daily precipitation (P), maximum temperature (Tmax), minimum temperature (Tmin), urban land use factor (UL), forest land use factor (FL), agricultural land use factor (AL), grass land use factor (GL), and shrub land use factor (SL) at each of the sampling stations. The outputs of the neural network are evaluated for accuracy in terms of the regression coefficient (R2), root mean square error (RMSE), mean absolute error (MAE), mean bias error (MBE), index of agreement (D), and Nash–Sutcliffe coefficient of efficiency (NSE). Their definitions are given below:
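$$R^2 = \left[\frac{\sum_{i=1}^{n}(O_i-\bar{O})(P_i-\bar{P})}{\sqrt{\sum_{i=1}^{n}(O_i-\bar{O})^2}\,\sqrt{\sum_{i=1}^{n}(P_i-\bar{P})^2}}\right]^2 \tag{6}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(O_i-P_i)^2} \tag{7}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|O_i-P_i\right| \tag{8}$$
$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}(P_i-O_i) \tag{9}$$
$$D = 1-\frac{\sum_{i=1}^{n}(O_i-P_i)^2}{\sum_{i=1}^{n}\left(\left|P_i-\bar{O}\right|+\left|O_i-\bar{O}\right|\right)^2} \tag{10}$$
$$\mathrm{NSE} = 1-\frac{\sum_{i=1}^{n}(O_i-P_i)^2}{\sum_{i=1}^{n}(O_i-\bar{O})^2} \tag{11}$$
where $O_i$ refers to observations and $P_i$ refers to the model's predictions, and n stands for the number of samples. $\bar{O}$ and $\bar{P}$ refer to the observed and predicted mean values, respectively. The symbol $P_i$ here denotes a prediction and is distinct from the precipitation input P.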
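The network structure and the six accuracy measures can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the study's MATLAB implementation: the tanh hidden activation, the linear output node, the weights, and the sample values are hypothetical, and the metrics follow the standard textbook forms (R2 as the squared Pearson correlation, MBE as mean of prediction minus observation, D as Willmott's index of agreement).

```python
import math

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Single hidden layer (tanh, assumed) feeding one linear output node."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

def metrics(obs, pred):
    """Return R2, RMSE, MAE, MBE, D, and NSE for paired observations and predictions."""
    n = len(obs)
    o_bar = sum(obs) / n
    p_bar = sum(pred) / n
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_p = sum((p - p_bar) ** 2 for p in pred)
    potential = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    return {
        "R2": cov ** 2 / (var_o * var_p),
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(o - p) for o, p in zip(obs, pred)) / n,
        "MBE": sum(p - o for o, p in zip(obs, pred)) / n,
        "D": 1.0 - sse / potential,
        "NSE": 1.0 - sse / var_o,
    }

# Hypothetical network: 8 inputs (P, Tmax, Tmin, UL, FL, AL, GL, SL), 3 hidden nodes.
x = [2.1, 33.0, 21.5, 0.03, 0.22, 0.60, 0.08, 0.07]
w_hidden = [[0.01 * (j + 1) for _ in range(8)] for j in range(3)]
b_hidden = [0.0, 0.1, -0.1]
w_out = [0.5, -0.2, 0.3]
prediction = forward(x, w_hidden, b_hidden, w_out, b_out=0.4)

# Hypothetical observed and predicted values for one water quality parameter.
scores = metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

The toy example has predictions that track the observations perfectly in shape but with a constant offset of one unit, which shows why R2 alone is not sufficient: R2 equals 1 while NSE drops to 0.2 and MBE reports the bias.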

Principal component analysis

PCA is a dimensionality reduction method that finds uncorrelated variables, known as ‘principal components’, which explain the maximum variance of the data while minimizing the information loss (Jolliffe & Cadima 2016). It is a useful statistical technique for analyzing large datasets because it makes them easier to interpret. In other words, it finds effective linear combinations of the input parameters when there is a large number of parameters to choose from. It essentially reduces the dataset to an eigenvalue/eigenvector problem and has been reinvented under different names in several fields of science and engineering, most recently in data science.
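The eigenvalue/eigenvector view of PCA can be sketched without any library. The snippet below is an illustration only: it uses power iteration (a swapped-in simple technique, not necessarily what the study's software does) to find the first principal component of the covariance matrix, does not standardize the inputs, and runs on hypothetical two-variable data.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observation rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

def first_principal_component(cov, iterations=200):
    """Leading eigenvalue and unit eigenvector of cov via power iteration.

    Assumes the starting vector is not orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
    eigenvalue = sum(v[a] * cov_v[a] for a in range(d))
    return eigenvalue, v

# Hypothetical observations of two correlated input variables.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
cov = covariance_matrix(data)
eigenvalue, component = first_principal_component(cov)
explained_ratio = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

Here `explained_ratio` is the share of total variance captured by the first component; when two inputs are nearly collinear, as in this toy data, one component carries almost all of the variance, which is the rationale for reducing the input dimension.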

Statistical linear and nonlinear regression using SPSS

The correlation matrix of input and output variables is shown in Table 2. There are only five correlations (using Pearson correlation coefficients) which are more than or equal to 0.4 or less than or equal to −0.4 between the input and output variables. Electrical conductivity, sulfate, and TDS show correlations of 0.41, 0.42, and 0.40 with agricultural land use, sodium percentage shows a correlation of 0.45 with precipitation, and sulfate shows a correlation of −0.40 with forest land use. The linear regression results using SPSS with climate parameters, land use parameters, and all parameters are shown in Table 3.
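For reference, the Pearson coefficient and the ±0.4 screening used in this paragraph can be sketched as follows. The function names are hypothetical, and the four sample entries are a subset of values copied from Table 2, not a recomputation from the raw data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def appreciable(correlations, threshold=0.4):
    """Keep only the pairs whose absolute correlation reaches the threshold."""
    return {pair: r for pair, r in correlations.items() if abs(r) >= threshold}

# A few (water quality parameter, input) correlations taken from Table 2.
sample = {
    ("Electrical conductivity", "AL"): 0.41,
    ("DO", "GL"): 0.32,
    ("Sulphate", "FL"): -0.40,
    ("Sodium %", "P"): 0.45,
}
strong_pairs = appreciable(sample)
```

Applied to this subset, the filter retains every pair except DO with the grass land use factor.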

Table 2

Correlation matrix between input and output variables

| Water quality parameter (↓) / Climate and land use parameter (→) | P | Tmax | Tmin | AL | FL | GL | SL | UL |
|---|---|---|---|---|---|---|---|---|
| DO (mg/L) | 0.12 | −0.05 | −0.02 | −0.19 | 0.14 | 0.32 | 0.30 | 0.08 |
| pH | 0.08 | 0.06 | 0.04 | 0.06 | −0.09 | 0.05 | −0.09 | 0.16 |
| Electrical conductivity (mS/cm) | 0.03 | 0.02 | 0.01 | 0.41 | −0.38 | −0.39 | −0.34 | −0.03 |
| BOD (mg/L) | −0.01 | −0.09 | −0.05 | 0.11 | −0.06 | −0.25 | −0.17 | −0.12 |
| COD (mg/L) | −0.06 | −0.20 | −0.11 | 0.25 | −0.21 | −0.34 | −0.23 | −0.07 |
| Nitrate | −0.08 | −0.22 | −0.28 | 0.13 | −0.10 | −0.18 | −0.11 | −0.06 |
| Total Coliform (MPN/100 ml) | 0.07 | 0.10 | 0.14 | 0.12 | −0.12 | −0.11 | −0.10 | 0.04 |
| Turbidity (NTU) | 0.08 | −0.03 | 0.09 | −0.06 | 0.10 | −0.07 | −0.17 | −0.07 |
| Total Alk. (mg/L) | −0.24 | 0.21 | −0.06 | 0.39 | −0.37 | −0.33 | −0.41 | 0.06 |
| Chloride (mg/L) | 0.11 | −0.05 | −0.02 | 0.35 | −0.33 | −0.33 | −0.24 | −0.06 |
| Hardness (mg/L) | −0.25 | 0.18 | −0.08 | 0.36 | −0.34 | −0.34 | −0.40 | 0.03 |
| Calcium (mg/L) | −0.25 | 0.17 | −0.06 | 0.36 | −0.33 | −0.36 | −0.43 | 0.00 |
| Magnesium (mg/L) | −0.15 | 0.15 | −0.06 | 0.35 | −0.33 | −0.32 | −0.34 | −0.01 |
| Sulphate (mg/L) | 0.23 | −0.11 | 0.13 | 0.42 | −0.40 | −0.35 | −0.31 | −0.06 |
| Sodium (mg/L) | 0.16 | −0.04 | 0.09 | 0.34 | −0.32 | −0.31 | −0.22 | −0.05 |
| TDS (mg/L) | 0.02 | 0.01 | −0.01 | 0.40 | −0.38 | −0.38 | −0.34 | −0.03 |
| TSS (mg/L) | 0.19 | −0.27 | 0.02 | −0.10 | 0.10 | 0.07 | 0.09 | 0.04 |
| Total Phosphate (mg/L) | 0.04 | −0.10 | 0.00 | 0.15 | −0.11 | −0.24 | −0.13 | −0.10 |
| Potassium (mg/L) | 0.19 | −0.07 | 0.08 | 0.15 | −0.11 | −0.23 | −0.17 | −0.09 |
| Fluoride (mg/L) | −0.01 | 0.25 | 0.18 | 0.00 | 0.00 | −0.02 | −0.12 | 0.05 |
| Sodium % | 0.45 | −0.21 | 0.22 | 0.20 | −0.21 | −0.12 | −0.11 | 0.06 |
| SAR | 0.27 | −0.22 | 0.05 | 0.34 | −0.33 | −0.29 | −0.23 | 0.00 |

The bold values indicate appreciable correlation between water quality parameters and climate parameters, land use factors.

Table 3

Linear regression results using SPSS

| S. No. | Parameter | R2 with P, Tmax, Tmin | R2 with AL, FL, GL, SL, UL | R2 with P, Tmax, Tmin, AL, FL, GL, SL, UL |
|---|---|---|---|---|
| 1 | DO (mg/L) | 0.03 | 0.18 | 0.21 |
| 2 | pH | 0.03 | 0.12 | 0.15 |
| 3 | Electrical conductivity (mS/cm) | 0.00 | 0.27 | 0.28 |
| 4 | BOD (mg/L) | 0.01 | 0.19 | 0.19 |
| 5 | COD (mg/L) | 0.09 | 0.30 | 0.36 |
| 6 | Nitrate | 0.08 | 0.09 | 0.16 |
| 7 | Total Coliform (MPN/100 ml) | 0.02 | 0.04 | 0.08 |
| 8 | Turbidity (NTU) | 0.02 | 0.08 | 0.10 |
| 9 | Total Alk. (mg/L) | 0.10 | 0.27 | 0.37 |
| 10 | Chloride (mg/L) | 0.13 | 0.22 | 0.24 |
| 11 | Hardness (mg/L) | 0.09 | 0.24 | 0.34 |
| 12 | Calcium (mg/L) | 0.08 | 0.25 | 0.33 |
| 13 | Magnesium (mg/L) | 0.06 | 0.19 | 0.26 |
| 14 | Sulphate (mg/L) | 0.07 | 0.21 | 0.29 |
| 15 | Sodium % | 0.25 | 0.06 | 0.32 |
| 16 | TDS (mg/L) | 0.00 | 0.27 | 0.28 |
| 17 | TSS (mg/L) | 0.14 | 0.03 | 0.17 |
| 18 | Total Phosphate (mg/L) | 0.02 | 0.16 | 0.17 |
| 19 | Potassium (mg/L) | 0.03 | 0.12 | 0.16 |
| 20 | Fluoride (mg/L) | 0.08 | 0.05 | 0.11 |
| 21 | Sodium (mg/L) | 0.03 | 0.18 | 0.22 |
| 22 | SAR | 0.10 | 0.19 | 0.28 |

The maximum R2 (the regression coefficient, which is the square of the correlation coefficient) of 0.37 was obtained for total alkalinity with all the parameters as independent variables. Nonlinear regressions were then performed for all the models shown in Table 1 and for all the WQPs. For most nonlinear regression models, the optimization process did not converge, so the coefficients could not be estimated. The regression coefficients could be estimated only for nonlinear model numbers 1 (Asymptotic Regression), 18 (Weibull), 11 (Morgan-Mercer-Florin), and 10 (Michaelis Menten). These are mostly lower than the linear regression coefficients and are shown in Tables 4–6.
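The statement that R2 is the square of the correlation coefficient holds for a single-predictor least-squares line, which the following minimal sketch illustrates. The function names and the four data points are hypothetical; the study's regressions were run in SPSS with up to eight predictors.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(x, y):
    """R2 of the fitted line: 1 - (residual sum of squares / total sum of squares)."""
    intercept, slope = fit_line(x, y)
    y_bar = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - y_bar) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical predictor and response values.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
intercept, slope = fit_line(x, y)
r2 = r_squared(x, y)
```

For these points the Pearson correlation is 0.8 and the fitted line gives an R2 of 0.64, the square of that correlation.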

Table 4

The nonlinear regression results with climate parameters

Inputs: P, Tmax, Tmin

| Parameter | R2, model 1 | R2, model 11 | R2, model 18 | R2, model 10 |
|---|---|---|---|---|
| DO (mg/L) | 0.00 | 0.00 | 0.06 | 0.08 |
| pH | 0.01 | 0.01 | na | 0.05 |
| Electrical conductivity (mS/cm) | 0.01 | 0.01 | 0.05 | 0.05 |
| BOD (mg/L) | 0.01 | 0.01 | na | 0.04 |
| COD (mg/L) | 0.04 | 0.03 | na | 0.12 |
| Nitrate | 0.09 | 0.10 | 0.04 | 0.02 |
| Total Coliform (MPN/100 ml) | 0.05 | 0.06 | 0.02 | 0.05 |
| Turbidity (NTU) | 0.02 | 0.02 | 0.00 | 0.02 |
| Total Alk. (mg/L) | 0.00 | 0.02 | 0.02 | 0.04 |
| Chloride (mg/L) | 0.00 | 0.00 | 0.04 | 0.05 |
| Hardness (mg/L) | 0.00 | 0.04 | 0.04 | 0.04 |
| Calcium (mg/L) | 0.04 | 0.04 | na | 0.04 |
| Magnesium (mg/L) | na | 0.01 | 0.04 | 0.05 |
| Sulphate (mg/L) | 0.05 | 0.04 | 0.03 | 0.03 |
| Sodium % | 0.03 | 0.12 | 0.13 | 0.11 |
| TDS (mg/L) | 0.00 | 0.02 | 0.05 | 0.05 |
| TSS (mg/L) | 0.00 | 0.01 | 0.01 | 0.02 |
| Total Phosphate (mg/L) | 0.00 | 0.00 | 0.06 | 0.06 |
| Potassium (mg/L) | 0.01 | 0.02 | 0.03 | 0.03 |
| Fluoride (mg/L) | 0.05 | 0.08 | 0.02 | 0.03 |
| Sodium | 0.03 | 0.02 | na | 0.04 |
| SAR | 0.03 | 0.02 | 0.03 | 0.04 |

Table 5

The nonlinear regression results with land use parameters

ParameterAL, FL, GL, SL, UL
R2 (of nonlinear model number)
1111810
DO (mg/L) 0.07 0.14 0.02 0.17 
pH 0.00 0.06 0.06 0.41 
Electrical conductivity (mS/cm) 0.06 0.06 na 0.41 
BOD (mg/L) 0.04 0.11 na 0.05 
COD (mg/L) 0.07 0.16 0.00 0.12 
Nitrate 0.04 0.05 na 0.16 
Total Coliform (MPN/100 ml) 0.01 0.00 0.00 0.13 
Turbidity (NTU) 0.01 0.08 0.02 0.06 
Total Alk. (mg/L) 0.07 0.02 0.17 0.35 
Chloride (mg/L) 0.06 0.02 0.11 0.12 
Hardness (mg/L) 0.19 0.01 0.09 0.35 
Calcium (mg/L) 0.17 0.00 0.10 0.34 
Magnesium (mg/L) 0.07 0.02 0.09 0.25 
Sulphate (mg/L) 0.08 0.02 na 0.21 
Sodium % (mg/L) 0.01 0.01 0.04 0.22 
TDS (mg/L) 0.09 0.02 0.16 0.24 
TSS (mg/L) 0.01 0.02 0.01 0.27 
Total Phosphate (mg/L) 0.04 0.05 0.08 0.12 
Potassium (mg/L) 0.04 0.07 0.07 0.08 
Fluoride (mg/L) 0.00 0.01 na 0.17 
Sodium 0.05 0.01 0.10 0.12 
SAR 0.04 0.01 0.02 0.16 
Table 6

The nonlinear regression results with all the parameters

ParameterP, Tmax, Tmin, AL, FL, GL, SL, UL
R2 (of nonlinear model number)
1111810
DO (mg/L) 0.05 0.04 0.06 0.24 
pH 0.01 0.06 0.07 0.45 
Electrical conductivity (mS/cm) 0.06 0.01 0.06 0.45 
BOD (mg/L) 0.04 0.08 na 0.09 
COD (mg/L) 0.09 0.11 0.12 0.22 
Nitrate 0.08 0.12 0.11 0.17 
Total Coliform (MPN/100 ml) 0.02 0.06 0.07 0.18 
Turbidity (NTU) 0.00 0.01 0.01 0.06 
Total Alk. (mg/L) 0.16 0.10 0.04 0.38 
Chloride (mg/L) 0.01 0.02 0.05 0.16 
Hardness (mg/L) 0.17 0.12 0.12 0.37 
Calcium (mg/L) 0.07 0.14 0.08 0.37 
Magnesium (mg/L) 0.14 0.00 0.04 0.29 
Sulphate (mg/L) 0.21 0.11 na 0.23 
Sodium % (mg/L) 0.08 0.16 0.16 0.24 
TDS (mg/L) 0.17 0.02 0.06 0.28 
TSS (mg/L) 0.02 0.00 0.27 0.27 
Total Phosphate (mg/L) 0.03 0.02 0.13 0.17 
Potassium (mg/L) 0.01 0.01 0.04 0.10 
Fluoride (mg/L) 0.01 0.04 0.07 0.19 
Sodium 0.11 0.05 0.08 0.15 
SAR 0.12 0.06 0.09 0.19 

The regression coefficients (R2) of the nonlinear models are given in Table 4 for the climate parameters, in Table 5 for the land use parameters, and in Table 6 for all the parameters combined.

SPSS multilayer perceptron model simulations

The SPSS simulations use a multilayer perceptron model with one input layer, one hidden layer, and one output layer. Each network has eight input nodes and a single output node, and the number of hidden nodes is varied from 1 to 50; the architectures reported in Table 7 use three to eight hidden nodes. The scaled conjugate gradient algorithm is used in most runs to optimize the weights. The data are divided into 70% training and 30% testing datasets. The overall regression coefficients (R2 values), RMSE, MAE, MBE, D, and NSE are given in Table 7. These preliminary SPSS simulations serve primarily to study the suitability of neural networks for stream water quality modeling problems. The fine tuning of the model architecture, the choice of training algorithm, and the testing and validation are explored using MATLAB simulations, as explained in the next section.
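The metrics listed for Table 7 can be computed from paired observed and predicted values. The following Python sketch is illustrative only. It assumes commonly used definitions: MBE as the mean of predicted minus observed, D as Willmott's original index of agreement, NSE as the Nash–Sutcliffe efficiency, and R2 as the squared Pearson correlation. These may differ in detail from the definitions behind Table 7. The function name `regression_metrics` and the sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE, MBE, R, R2, NSE and D for paired samples.

    Assumed definitions (not confirmed by the source):
    MBE = mean(predicted - observed), D = Willmott's original index of agreement.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)  # sum of squared errors

    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    r = cov / math.sqrt(ss_obs * ss_pred)  # Pearson correlation coefficient

    # Denominator of Willmott's index: potential error around the observed mean
    d_den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                for o, p in zip(observed, predicted))

    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MBE": sum(errors) / n,
        "R": r,
        "R2": r * r,
        "NSE": 1.0 - sse / ss_obs,
        "D": 1.0 - sse / d_den,
    }


# Hypothetical observed and predicted values, for illustration only
metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5])
```

Note that R2 (squared correlation) and NSE coincide only when the predictions are unbiased and correctly scaled, which is why Table 7 can list slightly different values for the two.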

Table 7

The regression metrics of output parameters using SPSS simulations

S. No. Parameter RMSE Overall R2 D MAE MBE NSE Network architecture
1 DO (mg/L) 0.50 0.52 0.71 0.40 0.04 0.51 8-5-1
2 pH 0.33 0.35 0.21 0.24 0.01 0.21 8-5-1
3 Electrical conductivity (mS/cm) 133.07 0.79 0.92 91.43 −2.76 0.78 8-5-1
4 BOD (mg/L) 0.94 0.64 0.82 0.50 −0.02 0.63 8-5-1
5 COD (mg/L) 6.63 0.83 0.95 5.12 −0.188 0.83 8-8-1
6 Nitrate 0.87 0.47 0.78 0.60 0.04 0.47 8-5-1
7 Total Coliform (MPN/100 ml) 22.66 0.29 0.45 14.93 0.56 0.29 8-3-1
8 Turbidity (NTU) 7.38 0.15 −0.34 3.75 −0.02 0.15 8-6-1
9 Total Alk. (mg/L) 32.05 0.73 0.92 22.76 −3.52 0.72 8-5-1
10 Chloride (mg/L) 25.76 0.66 0.87 15.82 −0.04 0.66 8-5-1
11 Hardness (mg/L) 42.62 0.66 0.88 30.07 −1.30 0.65 8-5-1
12 Calcium (mg/L) 9.995 0.66 0.88 7.58 −0.808 0.65 8-5-1
13 Magnesium (mg/L) 6.42 0.48 0.78 4.801 0.32 0.48 8-8-1
14 Sulphate (mg/L) 13.34 0.66 0.89 9.44 −0.189 0.66 8-8-1
15 Sodium (mg/L) 17.09 0.78 0.94 11.70 0.80 0.78 8-8-1
16 TDS (mg/L) 84.88 0.76 0.929 59.35 5.76 0.76 8-8-1
17 TSS (mg/L) 12.26 0.56 0.78 7.29 −0.79 0.56 8-7-1
18 Total Phosphate (mg/L) 0.27 0.67 0.88 0.16 0.01 0.67 8-5-1
19 Potassium (mg/L) 1.61 0.64 0.85 1.04 −0.12 0.61 8-6-1
20 Fluoride (mg/L) 0.27 0.10 0.3 0.20 0.01 0.10 8-5-1
21 Sodium % 7.99 0.35 0.55 6.15 −0.41 0.35 8-4-1
22 SAR 0.44 0.72 0.91 0.31 0.00 0.71 8-5-1

MATLAB simulations using the Levenberg–Marquardt algorithm, RBFNN

The MATLAB simulations use feedforward neural networks with one input layer, one hidden layer, and one output layer. The number of input nodes is fixed at 8 and the number of output nodes at 1. The number of hidden nodes was varied from 1 to 17, and the best results were obtained with 17 hidden nodes. This optimum coincides with the upper end of the range of 0 to (2n + 1) hidden nodes given by the Kolmogorov mapping theorem (Hecht-Nielsen 1988), where n is the number of inputs; for n = 8 the upper limit is 17. The Levenberg–Marquardt algorithm, which blends gradient descent with the Gauss–Newton method, is used to update the weights. This algorithm gave better results than Bayesian Regularization, Scaled Conjugate Gradient, and many other algorithms in the MATLAB R2020 toolbox. The data of each WQP are divided into training, validation, and testing sets containing 70, 15, and 15% of the data, respectively. The correlation coefficients (R values) for training, validation, testing, and the overall dataset are given in Table 8 for all of the parameters. The MATLAB environment reports model performance as correlation coefficients (R) rather than regression coefficients (R2). The neural network predictions of EC are shown in Figure 4.
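The bookkeeping in this setup can be sketched in a few lines of Python. This is a minimal illustration, not the MATLAB procedure itself: the function names, the random seed, and the sample count of 200 are hypothetical, and the random split only mimics the 70/15/15 division described above. The last line squares an R value from Table 8 so that it can be set against the R2 values reported for the SPSS runs.

```python
import random


def max_hidden_nodes(n_inputs):
    """Upper limit 2n + 1 on hidden nodes for n inputs, as quoted in the text."""
    return 2 * n_inputs + 1


def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly split sample indices into training, validation and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    testing = indices[n_train + n_val:]  # the remainder goes to testing
    return train, validation, testing


hidden_limit = max_hidden_nodes(8)  # eight climate and land use inputs
train, validation, testing = split_indices(200)  # 200 is a hypothetical sample count

# Overall R for EC in Table 8, squared for comparison with R2 values
ec_overall_r2 = 0.94 ** 2
```

Squaring matters when the two sets of results are compared: an overall R of 0.94 corresponds to an R2 of roughly 0.88, not 0.94.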
Table 8

The correlation coefficient (R) of output parameters using the Levenberg–Marquardt algorithm

S. No. Parameter Training Validation Testing Overall Network architecture
1 DO (mg/L) 0.76 0.72 0.78 0.75 8-17-1
2 pH 0.67 0.54 0.51 0.62 8-17-1
3 Electrical conductivity (mS/cm) 0.94 0.95 0.93 0.94 8-17-1
4 BOD (mg/L) 0.82 0.89 0.83 0.83 8-17-1
5 COD (mg/L) 0.93 0.94 0.90 0.92 8-17-1
6 Nitrate 0.72 0.72 0.74 0.72 8-17-1
7 Total Coliform (MPN/100 ml) 0.70 0.76 0.61 0.70 8-17-1
8 Turbidity (NTU) 0.49 0.50 0.45 0.48 8-17-1
9 Total Alk. (mg/L) 0.91 0.82 0.90 0.90 8-17-1
10 Chloride (mg/L) 0.90 0.88 0.82 0.89 8-17-1
11 Hardness (mg/L) 0.94 0.84 0.80 0.90 8-17-1
12 Calcium (mg/L) 0.91 0.80 0.83 0.87 8-17-1
13 Magnesium (mg/L) 0.80 0.77 0.80 0.80 8-17-1
14 Sulphate (mg/L) 0.89 0.71 0.71 0.85 8-17-1
15 Sodium (mg/L) 0.90 0.94 0.91 0.92 8-17-1
16 TDS (mg/L) 0.92 0.95 0.96 0.92 8-17-1
17 TSS (mg/L) 0.87 0.85 0.73 0.81 8-17-1
18 Total Phosphate (mg/L) 0.84 0.70 0.63 0.75 8-17-1
19 Potassium (mg/L) 0.77 0.60 0.63 0.71 8-17-1
20 Fluoride (mg/L) 0.57 0.41 0.44 0.54 8-17-1
21 Sodium % 0.74 0.65 0.66 0.71 8-17-1
22 SAR 0.90 0.75 0.84 0.86 8-17-1
Figure 4

ANN simulation of electrical conductivity.

For EC, we obtain a correlation coefficient (R) of 0.94 in training, 0.95 in validation, 0.93 in testing, and an overall value of 0.94. This result is obtained for a network architecture of 8-17-1 and the Levenberg–Marquardt training algorithm. The neural network predictions of dissolved oxygen are shown in Figure 5 for the same network parameters and training algorithm.
Figure 5

ANN simulation of dissolved oxygen (DO).

For DO, we obtain correlation coefficient values of 0.76, 0.72, 0.78, and 0.75 in training, validation, testing, and overall, respectively. The neural network predictions of BOD are shown in Figure 6 for the same input parameters. In this case, we obtain correlation coefficient values of 0.82, 0.89, 0.83, and 0.83 in training, validation, testing, and overall, respectively.
Figure 6

ANN simulation of biochemical oxygen demand (BOD).

Similarly, the neural network predictions of nitrate yield correlation coefficients of 0.72, 0.72, 0.74, and 0.72 in training, validation, testing, and overall, respectively. These results are shown in Figure 7.
Figure 7

ANN simulation of nitrate (NO3-N).

PCA is then performed in SPSS on the input dataset of eight climate and land use parameters. As shown in Table 9, three of the eight components, labelled there with the input parameters P, Tmax, and Tmin, have eigenvalues greater than 1.0. The scree plot of the eigenvalues is shown in Figure 8.
Table 9

Total variance explained using SPSS

Component; Initial eigenvalues (Total, % of Variance, Cumulative %); Extraction sums of squared loadings (Total, % of Variance, Cumulative %); Rotation sums of squared loadings (Total, % of Variance, Cumulative %)
1 (P) 3.42 42.73 42.73 3.42 42.73 42.73 3.40 42.56 42.56
2 (Tmax) 1.61 20.13 62.86 1.61 20.13 62.86 1.62 20.25 62.81
3 (Tmin) 1.28 15.93 78.79 1.28 15.93 78.79 1.28 15.99 78.79
4 (AL) 0.93 11.67 90.46 – – – – – –
5 (FL) 0.61 7.67 98.13 – – – – – –
6 (GL) 0.12 1.44 99.58 – – – – – –
7 (SL) 0.03 0.42 99.99 – – – – – –
8 (UL) 0.00 0.01 100.0 – – – – – –

Extraction Method: PCA.
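The two selection rules applied to Table 9, eigenvalues greater than 1.0 and a cumulative variance of about 98%, can be checked directly from the listed eigenvalues. The Python sketch below is illustrative: the helper names are hypothetical, and because it uses the eigenvalues rounded to two decimals, the percentages it yields differ slightly from the SPSS figures in the table.

```python
# Initial eigenvalues as listed in Table 9 (rounded to two decimals)
eigenvalues = [3.42, 1.61, 1.28, 0.93, 0.61, 0.12, 0.03, 0.00]


def kaiser_count(values, threshold=1.0):
    """Number of components whose eigenvalue exceeds the threshold."""
    return sum(1 for v in values if v > threshold)


def cumulative_variance(values):
    """Cumulative share of total variance after each component."""
    total = sum(values)
    running = 0.0
    shares = []
    for v in values:
        running += v
        shares.append(running / total)
    return shares


def components_for_share(values, target):
    """Smallest number of leading components reaching the target share."""
    for count, share in enumerate(cumulative_variance(values), start=1):
        if share >= target:
            return count
    return len(values)


n_kaiser = kaiser_count(eigenvalues)                 # eigenvalue > 1.0 rule
n_for_98 = components_for_share(eigenvalues, 0.98)   # 98% cumulative variance rule
```

The two rules give different answers for this dataset, three components against five, which is the choice the subsequent RBFNN experiments explore.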

Figure 8

The eigenvalues of the input dataset of eight parameters.

These three are the climate parameters only: mean daily precipitation, maximum temperature, and minimum temperature. Radial Basis Function neural network (RBFNN) predictions of EC, DO, BOD, and nitrate using only these three inputs were performed in MATLAB and yielded correlation coefficients below 0.50. Table 9 shows, however, that the first five input parameters (mean daily precipitation, maximum and minimum temperatures, agricultural land use factor, and forest land use factor) together explain about 98% of the cumulative variance. RBFNN simulations were therefore performed for EC, DO, BOD, and nitrate using these five inputs, giving the metrics shown in Table 10 for the entire dataset. The overall R values of the four parameters are higher than those of the simple feedforward neural networks in Table 8, which use all eight inputs.

These results are better than the preliminary ANN results of Anmala et al. (2015) and as good as those of Venkateswarlu et al. (2020), both of which were obtained for the Upper Green River water quality data. In the current study and in Anmala et al. (2015), a separate ANN is developed for each WQP, whereas Venkateswarlu et al. (2020) use composite neural networks that predict multiple output parameters simultaneously. Anmala & Turuganti (2021) make fast, real-time predictions of the WQPs of the Upper Green River Watershed using extreme learning machine (ELM) networks. While the PCA results of Venkateswarlu et al. (2020) indicated that the climate parameters alone (precipitation and temperature) were effective, the current PCA results indicate that the climate parameters and two land use factors, agricultural and forest, are effective for accurate water quality prediction.
The study of Venkateswarlu et al. (2020) was performed on a Karst watershed, the Upper Green River Basin, Kentucky, USA, whereas the current study deals with the non-Karst watershed of the Godavari River Basin. All of these studies, including the current one, developed ANNs for WQP prediction in a causal modeling framework. Together, the results outline the importance, potential, and applicability of ANNs for highly nonlinear stream water quality problems in Karst and non-Karst river basins.

Many other ANN studies predict one WQP from the remaining available WQPs, from its own time history, or simply from correlations between WQPs, rather than from a causal modeling framework of climate and land use parameters as in the current study. Di Nunno et al. (2022) accurately predict the nitrate concentrations in the Susquehanna River and the Raccoon River, USA (R2 = 0.77 and 0.94) using recurrent neural networks and time-series models with exogenous inputs such as water discharge, water temperature, dissolved oxygen, and specific conductance. Rajwade et al. (2021) predict BOD from 15 different combinations of available physical, chemical, and biological water quality parameters for the Gola River, Uttarakhand, India, and obtain a maximum R2 value of 0.997 using ANNs compared with a maximum of 0.861 using multiple linear regression. Ravansalar & Rajaee (2015) obtained an R2 value of 0.949 using wavelet-based ANNs against a value of 0.381 using ANNs in the prediction of electrical conductivity for the Asi River, Turkey. Alam et al. (2021) used Weighted Regression on Time, Discharge and Seasons (WRTDS) to analyze long-term water quality trends, especially those of BOD, DO, and nitrate-nitrite (NN), and found that wastewater treatment plants (WWTPs), combined sewage outflows (CSOs), and agricultural runoff contributed to increased pollution levels in the White River at Muncie, IN, USA.
However, the current study is limited to the temporal prediction of WQPs from climate and land use parameters and does not consider the influences of discharge and seasons separately. The influence of discharge is considered implicitly through precipitation, the climate parameter that is one of the model inputs. The importance of N pressure from agriculture on surface water quality (D'Haene et al. 2022) is reflected in the model inputs selected by PCA, which include the agricultural land use factor, and the current model can be further used in planning nitrogen mitigation measures to achieve a good surface water quality status. The five effective inputs identified above are used again in feedforward neural networks to improve the predictions and to obtain a functional form for water quality variables such as electrical conductivity, as discussed in the next section.

Table 10

The regression metrics of four output parameters using MATLAB-RBF for the entire dataset

S. No. Parameter RMSE R Overall R2 D MAE MBE NSE Network architecture
1 Electrical conductivity (mS/cm) 0.004 0.999 0.998 0.003 0.998 5-255-1
2 DO (mg/L) 0.023 0.984 0.970 0.992 0.008 0.970 5-255-1
3 BOD (mg/L) 0.075 0.998 0.995 0.999 0.003 0.995 5-255-1
4 Nitrate (mg/L) 0.16 0.972 0.945 0.985 0.113 0.945 5-255-1
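The 5-255-1 architecture in Table 10 has a far larger hidden layer than the feedforward networks. As a rough illustration of how such a network can behave, the Python sketch below fits an exact-interpolation RBF network with one Gaussian unit per training sample. This is an assumption made for illustration: the text does not state how the 255 hidden units were chosen, the code is not MATLAB's implementation, and the data, the `spread` value, and all function names are hypothetical.

```python
import math
import random


def gaussian_kernel(a, b, spread):
    """Gaussian radial basis function of the distance between points a and b."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * spread ** 2))


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_exact_rbf(inputs, targets, spread=1.0):
    """One hidden unit per sample: solve for weights that reproduce the targets."""
    kernel = [[gaussian_kernel(a, b, spread) for b in inputs] for a in inputs]
    return solve_linear_system(kernel, targets)


def predict_rbf(centres, weights, x, spread=1.0):
    """Weighted sum of Gaussian units centred on the training samples."""
    return sum(w * gaussian_kernel(x, c, spread) for w, c in zip(weights, centres))


# Hypothetical normalized data: 20 samples with five inputs each
rng = random.Random(0)
train_x = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(20)]
train_y = [rng.uniform(-1.0, 1.0) for _ in range(20)]

weights = fit_exact_rbf(train_x, train_y)
train_fit = [predict_rbf(train_x, weights, x) for x in train_x]
max_train_error = max(abs(f - y) for f, y in zip(train_fit, train_y))
```

In this construction the network reproduces its training targets almost exactly, even for random targets. If the RBFNNs in the study were built in a similar way, metrics computed on the entire dataset would mainly reflect goodness of fit, and a held-out test set would be needed to judge predictive skill.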

Equation of electrical conductivity from MATLAB simulations

The mathematical equation as suggested by Haykin (1999) and Goh et al. (2005) involving the input variables and the output variable as per the ANN model is written as:
$$Y_n = f_o\left(b_o + \sum_{k=1}^{h} w_k\, f_h\left(b_{hk} + \sum_{i=1}^{m} w_{ik} X_i\right)\right)$$
(12)

In these simulations, the input and output parameters are normalized to the range [−1, 1] for the tan-sigmoid transfer function. In the above expression, $Y_n$ is the normalized output variable, $f_o$ is the transfer function of the output layer, $b_o$ is the output layer bias, $w_k$ is the connection weight between the kth hidden neuron and the single output neuron, $f_h$ is the transfer function of the hidden layer, $b_{hk}$ is the bias of the kth hidden neuron, $w_{ik}$ is the connection weight between the ith input variable and the kth hidden neuron, and $X_i$ is the normalized ith input variable. Here $h$ denotes the number of hidden neurons and $m$ the number of input variables.
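The structure described for Equation (12), a bias plus a weighted sum of hidden-layer outputs passed through an output transfer function, translates directly into a short function. The Python sketch below is a minimal illustration under stated assumptions: both transfer functions default to the hyperbolic tangent (the tan-sigmoid named in the text), and the function name, argument names, and the tiny two-input example network are hypothetical rather than the trained weights of the study.

```python
import math


def ann_output(inputs, hidden_weights, hidden_biases, output_weights, output_bias,
               hidden_fn=math.tanh, output_fn=math.tanh):
    """Evaluate a single-hidden-layer network for one normalized input vector.

    hidden_weights[k][i] is the weight from input i to hidden neuron k.
    """
    hidden_outputs = []
    for weights_k, bias_k in zip(hidden_weights, hidden_biases):
        # Bias of hidden neuron k plus its weighted sum of inputs
        activation = bias_k + sum(w * x for w, x in zip(weights_k, inputs))
        hidden_outputs.append(hidden_fn(activation))
    # Output bias plus the weighted sum of hidden-layer outputs
    total = output_bias + sum(w * h for w, h in zip(output_weights, hidden_outputs))
    return output_fn(total)


# Hypothetical 2-2-1 network, used only to exercise the function
example_output = ann_output(
    inputs=[0.5, -0.5],
    hidden_weights=[[1.0, 1.0], [1.0, -1.0]],
    hidden_biases=[0.0, 0.0],
    output_weights=[2.0, 1.0],
    output_bias=0.1,
)
```

Passing `output_fn=lambda v: v` would give a linear output layer instead; which of the two the trained network uses is not recoverable from the text here.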

Using the five effective input parameters identified by PCA, a feedforward neural network with a 5-11-1 architecture was trained in MATLAB, and its weights and biases were substituted into Equation (12) to develop the model equation for the prediction of electrical conductivity. The following node-level equations, one for each of the 11 hidden neurons, lead to the electrical conductivity prediction from the five effective input parameters.
formula
(13)
formula
(14)
formula
(15)
formula
(16)
formula
(17)
formula
(18)
formula
(19)
formula
(20)
formula
(21)
formula
(22)
formula
(23)
In the above equations, the coefficients multiplying the independent input parameters are the weights between input nodes and hidden nodes, and the constants are the hidden node biases.
formula
(24)
In the above equation, the coefficients multiplying the hyperbolic tangent terms are the weights between hidden nodes and the output node, and the constant is the output node bias.
formula
(25)
Equation (25) can be de-normalized to get the actual electrical conductivity value as follows:
formula
(26)

Equation (26), developed for the prediction of electrical conductivity, should be applied only within the range of the data on which the feedforward neural network was trained. For the dataset considered in the present study, the maximum and minimum electrical conductivity values are 2,822.0 and 67.0 mS/cm, respectively. The correlation coefficients (R) for training, validation, testing, and the overall dataset, obtained with feedforward neural networks using the five effective input parameters (identified by PCA), are shown in Table 11. For EC, DO, and nitrate, the overall R values are slightly higher than those in Table 8, whereas the overall R for BOD is slightly lower (0.81 against 0.83). Similar functional forms can be developed individually for all the WQPs using the effective input parameters from PCA.
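The scaling between measured values and the [−1, 1] range used by the network can be sketched as follows. This Python example assumes a linear min–max mapping, which is a common choice but is not confirmed by the text as the exact form of Equation (26). The function names are hypothetical; the limits are the minimum and maximum electrical conductivity values quoted above.

```python
EC_MIN = 67.0     # minimum electrical conductivity in the dataset
EC_MAX = 2822.0   # maximum electrical conductivity in the dataset


def normalize(value, lower, upper):
    """Map a value from [lower, upper] linearly onto [-1, 1]."""
    return 2.0 * (value - lower) / (upper - lower) - 1.0


def denormalize(scaled, lower, upper):
    """Map a value from [-1, 1] linearly back onto [lower, upper]."""
    return 0.5 * (scaled + 1.0) * (upper - lower) + lower


def within_training_range(value, lower, upper):
    """True when the value lies inside the range seen during training."""
    return lower <= value <= upper


# A normalized network output of 0.0 corresponds to the middle of the range
ec_mid = denormalize(0.0, EC_MIN, EC_MAX)

# Round trip for a hypothetical measured value
ec_round_trip = denormalize(normalize(1000.0, EC_MIN, EC_MAX), EC_MIN, EC_MAX)
```

A check such as `within_training_range` reflects the caution stated above: predictions for conditions outside the training range are extrapolations and should be treated with care.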

Table 11

The correlation coefficients (R) of FFNN simulation results of electrical conductivity, DO, BOD and nitrate using five effective inputs in MATLAB environment

S. No. Parameter Training Validation Testing All Network architecture
1 Electrical conductivity (mS/cm) 0.94 0.94 0.96 0.95 5-11-1
2 DO (mg/L) 0.76 0.76 0.74 0.76 5-11-1
3 BOD (mg/L) 0.83 0.83 0.85 0.81 5-11-1
4 Nitrate (mg/L) 0.75 0.79 0.78 0.75 5-11-1

Stream water quality modeling is a highly complex nonlinear problem owing to basin characteristics, atmospheric influences, and the turbulent nature of fluid flow. Linear regression models were therefore unable to give accurate results in the absence of linear correlations between the cause and effect variables. Statistical nonlinear regression models would likely require more information about the physics of the problem for better results, which could also mean adding new terms to the models, and it is difficult to establish a stopping criterion for the inclusion of such terms. This difficulty is alleviated in neural network models by the flexibility in choosing the network architecture, especially the number of hidden neurons in the hidden layer. This makes ANN models better suited than nonlinear regression models to highly nonlinear problems. ANN models also require relatively little information and can work with easily measurable data. During training, the backpropagation algorithm adjusts the weights connecting the input, hidden, and output layers of the feedforward network to obtain the lowest error and the highest correlation value.

In this study, the potential and applicability of statistical linear and nonlinear regression models and ANNs are investigated for modeling stream water quality in a stream network of a non-Karst watershed. The study investigates the use of climate and land use parameters to predict the SWQPs in a causal modeling framework with the help of statistical and neural network models, and it extends the methodology of Anmala et al. (2015) to the Godavari River Watershed in the Telangana region of India. The use of ANNs, of PCA to reduce the input data dimension, and of RBFNNs is justified by the satisfactory results obtained for the highly nonlinear problem of stream water quality prediction in a causal modeling framework. The following conclusions can be drawn from the above results. The predictions of stream water quality using ANNs with climate and land use factors as inputs are superior to those of the statistical linear and nonlinear regression approaches for all the WQPs. This is evident in the regression coefficients obtained (using the Pearson correlation coefficient) for the statistical linear, nonlinear, and neural network models. The PCA revealed that only five of the eight input parameters are effective in modeling stream water quality using neural networks. With PCA, the inclusion of the land use factors along with the climate parameters proved successful in predicting the WQPs of the non-Karst watershed in the current study. A concise equation for electrical conductivity prediction is developed using only the five effective input parameters. The procedure developed for the predictive equation of electrical conductivity can be used to develop similar equations individually for all the WQPs of riverine watersheds for efficient environmental monitoring and assessment.

All relevant data are available from an online repository or repositories. SRTM 30 m DEM data: https://earthexplorer.usgs.gov/. Water quality data: https://tspcb.cgg.gov.in/Pages/Envdata.aspx. Rainfall and temperature data: https://cdsp.imdpune.gov.in/. Rainfall and temperature data for the year 2021: https://power.larc.nasa.gov/data-access-viewer/. Global land cover mapping data at 30 m resolution: http://globallandcover.com/

The authors declare there is no conflict.

Alam
M. S.
,
Han
B.
,
Gregg
A.
&
Pichtel
J.
2021
Nitrate and biochemical oxygen demand change in a typical Midwest stream in the past two decades
.
H2Open Journal
3
(
1
),
519
537
.
doi:10.2166/h2oj.2020.054
.
Anmala
J.
&
Venkateshwarlu
T.
2019
Statistical assessment and neural network modeling of stream water quality observations of Green River watershed, KY, USA
.
Water Science and Technology: Water Supply
19
(
6
),
1831
1840
.
https://doi.org/10.2166/ws.2019.058
.
Anmala
J.
,
Meier
O. W.
,
Meier
A. J.
&
Grubbs
S.
2015
GIS and artificial neural network-based water quality model for a stream network in the Upper Green River Basin, Kentucky, USA
.
Journal of Environmental Engineering
141
(
5
),
04014082
.
https://doi.org/10.1061/(asce)ee.1943-7870.0000801
.
Anmala, J. & Turuganti, V. 2021 Comparison of the performance of decision tree (DT) algorithms and extreme learning machine (ELM) model in the prediction of water quality of the Upper Green River watershed. Water Environment Research 93 (11), 2360–2373
.
Arslan
O.
2013
Spatially weighted principal component analysis (PCA) method for water quality analysis
.
Water Resources
40
(
3
),
315
324
.
https://doi.org/10.1134/S0097807813030111
.
Bayram
A.
,
Kankal
M.
&
Önsoy
H.
2012
Estimation of suspended sediment concentration from turbidity measurements using artificial neural networks
.
Environmental Monitoring and Assessment
184
(
7
),
4355
4365
.
https://doi.org/10.1007/s10661-011-2269-2
.
Burigato Costa
C. M. d. S.
,
da Silva Marques
L.
,
Almeida
A. K.
,
Leite
I. R.
&
de Almeida
I. K.
2019
Applicability of water quality models around the world – a review
.
Environmental Science and Pollution Research
26
(
36
),
36141
36162
.
https://doi.org/10.1007/s11356-019-06637-2
.
Caballero
I.
,
Steinmetz
F.
&
Navarro
G.
2018
Evaluation of the first year of operational Sentinel-2A data for retrieval of suspended solids in medium- to high-turbidity waters
.
Remote Sensing
10
(
7
).
https://doi.org/10.3390/rs10070982
.
D'Haene
K. D.
,
De Waele
J.
,
De Neve
S.
&
Hofman
G.
2022
Spatial distribution of the relationship between nitrate residues in soil and surface water quality revealed through attenuation factors
.
Agriculture, Ecosystems and Environment
330
,
107889
.
https://doi.org/10.1016/j.agee.2022.107889
.
Di Nunno
F.
,
Race
M.
&
Franata
F.
2022
A nonlinear autoregressive exogenous (NARX) model to predict nitrate concentration in rivers
.
Environmental Science and Pollution Research
29
,
40623
40642
.
https://doi.org/10.1007/s11356-021-18221-8
.
Dwivedi
S.
,
Mishra
S.
&
Tripathi
R. D.
2018
Ganga water pollution: a potential health threat to inhabitants of Ganga basin
.
Environment International
117
(
5
),
327
338
.
https://doi.org/10.1016/j.envint.2018.05.015
.
Fathi
E.
,
Zamani-Ahmadmahmoodi
R.
&
Zare-Bidaki
R
. (
2018
)
Water quality evaluation using water quality index and multivariate methods, Beheshtabad River, Iran
.
Applied Water Science
8
(
7
),
https://doi.org/10.1007/s13201-018-0859-7
.
Gajendran
C.
,
Thamarai
P.
&
Basker
R.
2010
Water quality evaluation for Nambiyar River Basin, Tamil Nadu, India by using geo-statistical analysis
.
Asian Journal of Microbiology, Biotechnology and Environmental Sciences
12
(
3
),
555
560
.
Girija
T. R.
,
Mahanta
C.
&
Chandramouli
V.
2007
Water quality assessment of an untreated effluent impacted urban stream: the Bharalu tributary of the Brahmaputra River, India
.
Environmental Monitoring and Assessment
130
(
1–3
),
221
236
.
https://doi.org/10.1007/s10661-006-9391-6
.
Goh
A. T. C.
,
Kulhawy
F. H.
&
Chua
C. G.
2005
Bayesian neural network analysis of undrained side resistance of drilled shafts
.
Journal of Geotechnical and Geoenvironmental Engineering
131
(
1
),
84
93
.
https://doi.org/10.1061/(asce)1090-0241(2005)131:1(84)
.
Haykin
S.
1999
Neural Networks: A Comprehensive Foundation
.
Pearson Education Inc.
,
Delhi
.
Hecht-Nielsen
R.
1988
Applications of counterpropagation networks
.
Neural Networks
1
(
2
),
131
139
.
https://doi.org/10.1016/0893-6080(88)90015-9
.
Heddam
S.
2014
Modeling hourly dissolved oxygen concentration (DO) using two different adaptive neuro-fuzzy inference systems (ANFIS): a comparative study
.
Environmental Monitoring and Assessment
186
(
1
),
597
619
.
https://doi.org/10.1007/s10661-013-3402-1
.
Indian Meteorological Department
n.d.
Climate Data Service Portal
.
Jolliffe
I. T.
&
Cadima
J.
2016
Principal component analysis: a review and recent developments
.
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
374
(
2065
),
20150202
.
Joshi
D. M.
,
Kumar
A.
&
Agrawal
N.
2009
Studies on physicochemical parameters to assess the water quality of river Ganga for drinking purpose in Haridwar district
.
Rasayan Journal of Chemistry
2
(
1
),
195
203
.
Kadam
A. K.
,
Wagh
V. M.
,
Muley
A. A.
,
Umrikar
B. N.
&
Sankhua
R. N.
2019
Prediction of water quality index using artificial neural network and multiple linear regression modelling approach in Shivganga River basin, India
.
Modeling Earth Systems and Environment
5
(
3
),
951
962
.
https://doi.org/10.1007/s40808-019-00581-3
.
KC
A.
,
Chalise
A.
,
Parajuli
D.
,
Dhital
N.
,
Shrestha
S.
&
Kandel
T.
2019
Surface water quality assessment using remote sensing, GIS and artificial intelligence
.
Technical Journal
1
(
1
),
113
122
.
https://doi.org/10.3126/tj.v1i1.27709
.
Kumar
B.
,
Singh
U. K.
,
Mukerjee
I.
&
Daas
B. M.
2016
Water quality status of Indian major Rivers with reference to agriculture and drinking purposes
. In:
Geoanthropogenic Environment: An Appraisal
, 1st edn (V. A. V. Raman, ed.).
A.K. Publishers
, Delhi, pp.
65
81
.
Mandal
P.
,
Upadhyay
R.
&
Hasan
A.
2010
Seasonal and spatial variation of Yamuna River water quality in Delhi, India
.
Environmental Monitoring and Assessment
170
(
1–4
),
661
670
.
https://doi.org/10.1007/s10661-009-1265-2
.
Ministry of Jal Shakti
n.d.
Godavari River Management Board
.
Najafzadeh
M.
&
Ghaemi
A.
2019
Prediction of the five-day biochemical oxygen demand and chemical oxygen demand in natural streams using machine learning methods
.
Environmental Monitoring and Assessment
191
(
6
),
1
21
.
https://doi.org/10.1007/s10661-019-7446-8
.
POWER Data Access Viewer
n.d.
POWER Single Point
.
Rajwade
Y. A.
,
Adamala
S.
,
Kumar
Y.
&
Kumar
S.
2021
Application of artificial neural networks for BOD and COD modelling in Gola River, Uttarakhand, India
.
Arabian Journal of Geosciences
14
,
1326
.
https://doi.org/10.1007/s12517-021-07637-8
.
Ravansalar
M.
&
Rajaee
T.
2015
Evaluation of wavelet performance via an ANN-based electrical conductivity prediction model
.
Environmental Monitoring and Assessment
187
(
6
).
https://doi.org/10.1007/s10661-015-4590-7
.
Ritchie J. C., Zimba P. V. & Everitt J. H. 2003 Remote sensing techniques to assess water quality. Photogrammetric Engineering and Remote Sensing 69 (6), 695–704. https://doi.org/10.14358/PERS.69.6.695.
Rumelhart D. E., Hinton G. E. & Williams R. J. 1986 Learning representations by back-propagating errors. Nature 323 (6088), 533–536. https://doi.org/10.1038/323533a0.
Shah K. A. & Joshi G. S. 2017 Evaluation of water quality index for River Sabarmati, Gujarat, India. Applied Water Science 7 (3), 1349–1358. https://doi.org/10.1007/s13201-015-0318-7.
Sinha K. & Das S. P. 2015 Assessment of water quality index using cluster analysis and artificial neural network modeling: a case study of the Hooghly River basin, West Bengal, India. Desalination and Water Treatment 54 (1), 28–36. https://doi.org/10.1080/19443994.2014.880379.
Solaraj G., Dhanakumar S., Rutharvel Murthy K. & Mohanraj R. 2010 Water quality in select regions of Cauvery Delta River basin, southern India, with emphasis on monsoonal variation. Environmental Monitoring and Assessment 166 (1–4), 435–444. https://doi.org/10.1007/s10661-009-1013-7.
Song S., Zheng X. & Li F. 2010 Surface water quality forecasting based on ANN and GIS for the Chanzhi Reservoir, China. In: International Conference on Information Science and Engineering, pp. 4094–4097. https://doi.org/10.1109/icise.2010.5689328.
Srivastava P., Sreekrishnan T. R. & Nema A. K. 2017 Human health risk assessment and PAHs in a stretch of river Ganges near Kanpur. Environmental Monitoring and Assessment 189 (9). https://doi.org/10.1007/s10661-017-6146-5.
Sutradhar H. 2020 Assessment of drainage morphometry and watersheds prioritization of Siddheswari River Basin, Eastern India. Journal of the Indian Society of Remote Sensing 48 (4), 627–644. https://doi.org/10.1007/s12524-020-01108-5.
Telangana State Pollution Control Board n.d. Environmental Data.
United States Geological Survey (USGS) n.d. Earth Explorer. https://earthexplorer.usgs.gov/.
Venkateswarlu T., Anmala J. & Dharwa M. 2020 PCA, CCA, and ANN modeling of climate and land-use effects on stream water quality of Karst watershed in Upper Green River, Kentucky. Journal of Hydrologic Engineering 25 (6), 05020008. https://doi.org/10.1061/(asce)he.1943-5584.0001921.
Yang W., Zhao Y., Wang D., Wu H., Lin A. & He L. 2020 Using principal components analysis and IDW interpolation to determine spatial and temporal changes of surface water quality of Xin'Anjiang river in Huangshan, China. International Journal of Environmental Research and Public Health 17 (8), 1–14. https://doi.org/10.3390/ijerph17082942.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).