## Abstract

The successful prediction of the stream or river water quality is gaining the attention of various governmental agencies, and pollution control boards worldwide due to its useful applications in determining watershed health, biodiversity, ecology, and suitability of potable water needs of the river basin. The physically based computational water quality models would require large spatial and temporal information databases of climatic, hydrologic, and environmental variables and solutions of nonlinear, partial differential equations at each grid point in a river basin. These models suffer from estimability, convergence, stability, approximation, dispersion, and consistency issues. In such a problematic modeling scenario, an artificial neural network (ANN) modeling of 22 stream water quality parameters (SWQPs) is performed from easily measurable data of precipitation, temperature, and novel land use parameters obtained from Geographic Information System (GIS) analysis for the Godavari River Basin, India. The ANN models are compared with the more traditional, statistical linear, and nonlinear regression models for accuracy and performance statistics. This study obtains regression coefficients of 0.93, 0.78, 0.83, and 0.74 for electrical conductivity, dissolved oxygen, biochemical oxygen demand, and nitrate in testing using feedforward ANNs compared with a maximum of 0.45 using linear and nonlinear regressions. Principal component analysis (PCA) is performed to reduce the input data dimension. The subsequent modeling using radial basis function and ANNs is found to improve the overall regression coefficients slightly for the chosen four water quality parameters (WQPs). A closed form equation for electrical conductivity has been derived from MATLAB simulations. The successful modeling results indicate the effectiveness and potential of ANNs over the statistical regression approaches for estimating the highly nonlinear problem of stream water quality distributions.

## HIGHLIGHTS

A GIS, ANN-based causal WQ model is developed for a non-Karst watershed.

Novel land use factors are developed for the model.

PCA-based ANN models are found to be superior compared with others.

An equation for conductivity is developed from MATLAB simulations.

Land use parameters are also important along with climate parameters for water quality model development.

## INTRODUCTION

Freshwater availability on Earth is limited to 3%, and it is essential to protect human and other biotic ecosystems to ensure continued survival (NASA n.d.). Even though natural streams are the primary water source, they are most affected by many anthropogenic activities, such as rapid urbanization, agriculture, and industrial activities (Srivastava *et al.* 2017). Preventing unregulated waste deposition from industry and regularly monitoring and inspecting streams will enhance water quality (Shah & Joshi 2017). Currently, stream water quality is measured only at a few locations for any river basin. The desirable scenario would be to have stream water quality variables at a significant number of locations, requiring a lot more resources and continuous monitoring from various governmental agencies. Cost, time, and accessibility of the site are also significant impediments to conventional water quality testing methods for monitoring purposes.

Moreover, *in-situ* data will give point analysis rather than the overall water quality assessment for the entire river basin. In this regard, Remote Sensing and GIS tools could benefit continuous monitoring (Ritchie *et al.* 2003; Caballero *et al.* 2018). Several factors interact with natural chemical, biological, hydrological, and meteorological cycles to affect river water quality in a catchment. Many other factors, such as stream meandering, mixing processes in different dimensions, evaporation, and turbulence in the stream, affect the water quality of streams (Anmala *et al.* 2015). Various models have been developed to quantify the SWQPs on a location basis or using information specific to the measurement locations. Still, no comprehensive model has been developed due to the difficulties of accurately describing the watershed's physical, climatic, urban, and morphologic variations (Burigato Costa *et al.* 2019). A solution to this problem could be to predict the stream water quality from easily definable parameters from the source area. The parameters could be the nature of the watershed or morphological data, hydroclimatological variables, land use data, spectral reflectance characteristics, and specific information on point sources.

Many studies have been performed on large river basins worldwide to assess water quality. The findings have consistently shown that water quality has declined (Giri 2021). The significant parameters considered to influence river water quality are physical parameters such as temperature (T), turbidity (Turb), total suspended solids (TSS), and total dissolved solids (TDS); chemical parameters including dissolved oxygen (DO), salinity, pH, nitrate nitrogen (NO_{3}-N), electrical conductivity, sulfates (SO_{4}^{2−}), phosphate (PO_{4}^{3−}), total hardness (TH), total alkalinity (TA), calcium (Ca), magnesium (Mg), biochemical oxygen demand (BOD), and chemical oxygen demand (COD); and microbiological parameters such as fecal coliform (FC) and total coliform (TC) (Mandal *et al.* 2010). The seasonal variation of physicochemical parameters, namely T, pH, DO, free CO_{2}, COD, BOD, carbonate, bicarbonate, TA, hardness, turbidity, Ca, Mg, sodium (Na), potassium (K), , , chloride (Cl^{−}), , electrical conductivity (EC), TDS, and TSS of the Ganga River Basin were investigated (Joshi *et al.* 2009). It was found that pH, EC, TDS, TSS, turbidity, and sodium exceeded the prescribed limits compared with the year before.

The WQPs of the major river basins, including Ganga, Yamuna, and Cauvery in India, were studied (Mandal *et al.* 2010; Solaraj *et al.* 2010; Dwivedi *et al.* 2018). It has been reported that a high amount of BOD due to the polluted sewage disposal from urban centers is the primary cause of the deterioration of water quality (Gajendran *et al.* 2010). Kumar *et al.* (2016) have chosen 13 main river basins based on their catchment size exceeding 20,000 km^{2} and obtained the primary data from the Central Water Commission (CWC), Central Pollution Control Board (CPCB), and other scientific studies. They have concluded that the water quality of India's main rivers is unfit for human use due to high bacterial counts and anthropogenic inputs into the river systems.

Many studies have developed linkage models by incorporating watershed metrics, land use land cover (LULC) classifications, and meteorological data to create GIS and ANN-based water quality prediction models (Girija *et al.* 2007; Song *et al.* 2010; Anmala *et al.* 2015). Furthermore, in order to establish effective inputs to linkage models (GIS-ANN), researchers have applied statistical and dimension reduction approaches such as PCA to focus on major cause and effect variables and discovered that land use with climatic parameters provides better predictions than land use alone (Anmala & Venkateshwarlu 2019; Venkateshwarlu *et al.* 2020). Arslan (2013) studied the spatiomultivariate statistical analysis in the form of spatially weighted PCA for the Akarya River Basin, Turkey water quality dataset and suggested the incorporation of spatial structures, and patterns of water quality variables information into the PCA model. Fathi *et al.* (2018) investigated the state of water quality of the Beheshtabad River, in Iran. They have used PCA and clustering multivariate statistical techniques to examine temperature, phosphate, turbidity, dissolved oxygen, biochemical oxygen demand, EC, total solids, and pH to reduce data dimensionality and identify major cluster pollution zones. They have concluded that agricultural fertilizers, upstream wastewater runoff, and fish farms are the major variables influencing the water quality of the Beheshtabad River. Yang *et al.* (2020) mapped the spatial and temporal distribution of surface water quality variables in the Xin'anjiang River, Huangshan, China. Nine hundred and sixty water samples were collected on a monthly basis and were tested for 22 water quality indicators. The inverse distance weighted (IDW) method was used to interpolate the PCA comprehensive score to identify emerging pollution stations. 
In the current study, GIS, PCA, and ANNs are used in a causal modeling framework considering both climate and land use parameters for the predictions of WQPs.

Effective continuous monitoring techniques are needed for modeling to address the above problems. Girija *et al.* (2007) developed a four-layer GIS model to study the effect of LULC on water quality in the Brahmaputra River sub-basin. BOD, DO, and total phosphorus were found to be the sensitive parameters because of the strong nutrient inflow from farm fields. The nutrient intake has risen further, causing DO depletion in the region. Multivariate statistical techniques and feedforward back-propagation algorithms were used to develop the water quality index (WQI), and the resulting indices were compared using their R^{2} and RMSE values (Sinha & Das 2015). Furthermore, ANNs and multiple linear regression (MLR) techniques were used to test physicochemical parameters of water quality, and satisfactory results have been found using ANN models (Kadam *et al.* 2019). Anmala *et al.* (2015), Anmala & Venkateshwarlu (2019), and Venkateshwarlu *et al.* (2020) investigated the impact of LULC and hydrological characteristics to model and predict SWQPs of the Upper Green River basin, Kentucky, USA. They applied statistical regression methods and GIS- and ANN-based models and discovered a substantial causal relationship between the land use and hydrological inputs and the SWQPs. The ANN models predicted the SWQPs more accurately than the nonlinear regression models. PCA and canonical correlation analysis (CCA) were also performed to find the reduced set of causal variables in the stream water quality modeling (Venkateshwarlu *et al.* 2020). The morphometric characteristics of the Siddheswari river basin were studied by Sutradhar (2020), who found that the basin underwent significant soil degradation due to poor management practices.

Due to its simplicity, researchers use statistical modeling more commonly than physically based models to assess stream water quality at a watershed scale. However, the real-world scenario is different, and the statistical methods give over-simplistic results at the watershed scale. The ANN models best describe the nonlinear relationship between response variables and predictors compared with more straightforward statistical regression methods. The issues of nonlinearity, multicollinearity, heteroscedasticity, and model prediction accuracy are best handled using artificial intelligence based and machine learning models such as ANNs, Regression Trees, Random Forests, and ensemble methods (KC *et al.* 2019). Although the ANNs, their hybrid variants, and other machine learning methods are successfully employed for stream water quality predictions (Bayram *et al.* 2012; Heddam 2014; Ravansalar & Rajaee 2015; Najafzadeh & Ghaemi 2019), the current study extends the causal modeling framework of Anmala *et al.* (2015) (and Anmala & Venkateshwarlu 2019; Venkateshwarlu *et al.* 2020) for non-Karst watershed containing stream network using statistical methods and ANNs for deeper understanding of the problem. The specific objectives of the study are to (i) study the influence of climate and land use parameters on WQP estimation using a causal modeling framework, (ii) determine land use factors of the Godavari River Basin using GIS analysis, (iii) study the suitability of statistical and ANN models in water quality modeling for non-Karst watershed stream networks, and (iv) investigate PCA in reducing the input data dimension for effective causal modeling of stream water quality.

## STUDY AREA AND DATA

The study area of the current investigation is the Godavari River Basin, one of the largest river basins in peninsular India, also called Dakshina Ganga. After the Ganges Basin, the Godavari Basin is the country's second-largest basin, covering roughly 9.5% of the country's total land area. Overall, the basin drains an area of 312,812 km^{2} bounded between the longitudes of 73°24′ and 83°4′ E and the latitudes of 16°19′ and 22°34′ N. The river originates in the Sahyadris at an altitude of 1,067 m above mean sea level near Triambakeshwar in the Nashik district of Maharashtra and flows across the Deccan region from the western side toward the Eastern Ghats. The river flows through several states and forms the interstate boundaries between Telangana and Maharashtra and between Telangana and Chhattisgarh. The basin covers major geographic areas in Maharashtra (48.7%), Telangana (19.87%), Andhra Pradesh (3.53%), Chhattisgarh (10.69%), Madhya Pradesh (10.17%), Odisha (5.67%), and Karnataka (1.41%), and a smaller area in Pondicherry (Ministry of Jal Shakti n.d.). The basin receives most of its rainfall during the southwest monsoon season (July–September), with annual rainfall ranging from a minimum of 881 mm to a maximum of 1,395 mm and averaging 1,110 mm. The yearly maximum temperature ranges from 31 to 33.5 °C. The western portion of the basin is much hotter than the central, northern, and eastern regions.

The monthly water quality data available from January 2019 to April 2021 were collected from the Telangana State Pollution Control Board (TSPCB) website (Telangana State Pollution Control Board n.d.). The geotagged water sampling locations were imported into QGIS software for correlation with geospatial data. The rainfall and temperature data were collected from the Indian Meteorological Department (IMD) (Indian Meteorological Department n.d.). The rainfall data were available from 1901 to 2020 in the NetCDF format with 0.25° × 0.25° grid resolution, and temperature data were available from 1951 to 2020 with 1° × 1° grid resolution. The NetCDF files were processed in the Python console of the QGIS software: all files were run in a loop and the extracted values were exported in the Excel format. The rainfall and temperature data for 2021 were collected from the NASA website (POWER Data Access Viewer n.d.) by specifying the latitude and longitude of each sampling station. The monthly water quality data available at 13 sampling stations in the Telangana state part of the Godavari River Basin are considered for the present study. These data were available consistently for the 22 parameters considered in the present study. In developing a causal model for the stream water quality problem, Anmala *et al.* (2015) successfully used two-day cumulative precipitation, temperature, and urban, forest, and agricultural land use factors as cause parameters. In developing a similar causal model for the Godavari River Basin, the independent or cause parameters considered are climate parameters, namely mean daily precipitation (P), maximum temperature (T_{max}), and minimum temperature (T_{min}), and land use parameters, namely the urban land use factor (UL), forest land use factor (FL), agricultural land use factor (AL), grass land use factor (GL), and shrub land use factor (SL) at each of the sampling stations. The dependent WQPs are DO, pH, EC, etc., up to 22 of them, as shown in the various result tables.
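Extracting a gridded climate value at a sampling station comes down to a nearest-cell lookup on a regular latitude/longitude grid. The following is a minimal, library-free sketch of that lookup, not the QGIS console script used in the study. The grid values, grid origin, station names, and coordinates are all hypothetical and serve only to illustrate the indexing.

```python
def nearest_grid_index(coord, origin, step, size):
    """Index of the grid node closest to coord on a regular axis."""
    idx = round((coord - origin) / step)
    return max(0, min(size - 1, idx))  # clamp to the grid extent


def sample_station(grid, lat, lon, lat0, lon0, step):
    """Value of the cell nearest to (lat, lon); grid rows follow latitude."""
    i = nearest_grid_index(lat, lat0, step, len(grid))
    j = nearest_grid_index(lon, lon0, step, len(grid[0]))
    return grid[i][j]


# Hypothetical 3 x 3 rainfall grid (mm) at 0.25 degree spacing,
# with its first node at 18.0 N, 79.0 E (assumed values).
rain = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]

# Illustrative station coordinates (latitude, longitude), not the TSPCB stations.
stations = {"S1": (18.24, 79.26), "S2": (18.52, 79.02)}

values = {
    name: sample_station(rain, lat, lon, 18.0, 79.0, 0.25)
    for name, (lat, lon) in stations.items()
}
```

In practice the same loop would run once per monthly file, appending one row per station and date before export.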

## METHODOLOGY

### GIS land use analysis for geospatial data

### Linear and nonlinear regression

Single-variable and multivariate linear regression models are developed first for each of the WQPs by treating the mean daily precipitation, maximum and minimum temperatures, and the urban, forest, agricultural, grass land, and shrub land use factors as independent variables. Nonlinear regression models are then developed for each WQP using the SPSS model expressions given in Table 1, where A1, A2, etc., are the model constants or undetermined coefficients specific to each expression. The nonlinear regression models are developed separately for the climate parameters, the land use parameters, and the combined climate and land use parameters.

Table 1. Nonlinear regression model expressions (SPSS)

Nonlinear model number | Nonlinear model name | Nonlinear model expression |
---|---|---|
1 | Asymptotic Regression | A1 + A2·exp(A3·x) |
2 | Asymptotic Regression | A1 − (A2·(A3^x)) |
3 | Density | (A1 + A2·x)^(−1/A3) |
4 | Gauss | A1·(1 − A3·exp(−A2·x^2)) |
5 | Gompertz | A1·exp(−A2·exp(−A3·x)) |
6 | Johnson-Schumacher | A1·exp(−A2/(x + A3)) |
7 | Log-Modified | (A1 + A3·x)^A2 |
8 | Log-Logistic | A1 − ln(1 + A2·exp(−A3·x)) |
9 | Metcherlich Law of Diminishing Returns | A1 + A2·exp(−A3·x) |
10 | Michaelis Menten | A1·x/(x + A2) |
11 | Morgan-Mercer-Florin | (A1·A2 + A3·x^A4)/(A2 + x^A4) |
12 | Peal-Reed | A1/(1 + A2·exp(−(A3·x + A4·x^2 + A5·x^3))) |
13 | Ratio of Cubics | (A1 + A2·x + A3·x^2 + A4·x^3)/(A5·x^3) |
14 | Ratio of Quadratics | (A1 + A2·x + A3·x^2)/(A4·x^2) |
15 | Richards | A1/((1 + A3·exp(−A2·x))^(1/A4)) |
16 | Verhulst | A1/(1 + A3·exp(−A2·x)) |
17 | Von Bertalanffy | (A1^(1 − A4) − A2·exp(−A3·x))^(1/(1 − A4)) |
18 | Weibull | A1 − A2·exp(−A3·x^A4) |
19 | Yield Density | (A1 + A2·x + A3·x^2)^(−1) |
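To make the tabulated expressions concrete, the sketch below writes four of them (model numbers 1, 10, 11, and 18) as plain Python functions and scores a candidate coefficient set with R^{2}. It is an illustration only: the coefficient values and observations are made up, no optimizer is included, and R^{2} is computed as one minus the ratio of residual to total sum of squares, which is an assumed convention rather than something stated in the text.

```python
import math


def asymptotic(x, a1, a2, a3):
    """Model 1: A1 + A2*exp(A3*x)."""
    return a1 + a2 * math.exp(a3 * x)


def michaelis_menten(x, a1, a2):
    """Model 10: A1*x / (x + A2)."""
    return a1 * x / (x + a2)


def morgan_mercer_florin(x, a1, a2, a3, a4):
    """Model 11: (A1*A2 + A3*x^A4) / (A2 + x^A4)."""
    return (a1 * a2 + a3 * x ** a4) / (a2 + x ** a4)


def weibull(x, a1, a2, a3, a4):
    """Model 18: A1 - A2*exp(-A3*x^A4)."""
    return a1 - a2 * math.exp(-a3 * x ** a4)


def r_squared(observed, predicted):
    """1 - SS_res / SS_tot (assumed convention for this sketch)."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical predictor values and observations, for illustration only.
xs = [1.0, 2.0, 4.0, 8.0]
obs = [3.2, 5.1, 6.5, 8.1]

# Candidate Michaelis Menten coefficients (assumed, not fitted).
pred = [michaelis_menten(x, 10.0, 2.0) for x in xs]
r2 = r_squared(obs, pred)
```

A fitting routine would search over the coefficients to maximize this score; the convergence failures reported for most models arise in that search step, which is not reproduced here.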

### ANN modeling

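*et al.* (1986) and Hecht-Nielsen (1988). Separate neural networks are used for the different SWQPs, limiting the output node to one SWQP at a time. The input layer is fixed at eight nodes: mean daily precipitation (P), maximum temperature (T_{max}), minimum temperature (T_{min}), urban land use factor (UL), forest land use factor (FL), agricultural land use factor (AL), grass land use factor (GL), and shrub land use factor (SL) at each of the sampling stations. The outputs of the neural network are evaluated for accuracy in terms of the regression coefficient (R^{2}), root mean square error (RMSE), mean absolute error (MAE), mean bias error (MBE), index of agreement (D), and Nash–Sutcliffe coefficient of efficiency (NSE). Their definitions, in their standard forms, are given below:

$$R^{2} = \frac{\left[\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(M_i - \bar{M}\right)\right]^{2}}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^{2}\,\sum_{i=1}^{n}\left(M_i - \bar{M}\right)^{2}}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - M_i\right)^{2}}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|O_i - M_i\right|$$

$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(M_i - O_i\right)$$

$$D = 1 - \frac{\sum_{i=1}^{n}\left(O_i - M_i\right)^{2}}{\sum_{i=1}^{n}\left(\left|M_i - \bar{O}\right| + \left|O_i - \bar{O}\right|\right)^{2}}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - M_i\right)^{2}}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^{2}}$$

where $O_i$ refers to observations, $M_i$ refers to the model's predictions, and *n* stands for the number of samples. $\bar{O}$ and $\bar{M}$ refer to the observed and predicted mean values, respectively.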
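The six performance statistics named above can be computed together from paired observed and predicted values. The sketch below assumes their standard textbook forms: R^{2} as the squared Pearson correlation, MBE as the mean of predicted minus observed, D as Willmott's index of agreement, and NSE as one minus the ratio of squared error to observed variance. The sample values are made up to show the effect of a constant bias.

```python
import math


def performance_metrics(obs, pred):
    """Return R2, RMSE, MAE, MBE, D and NSE for paired samples."""
    n = len(obs)
    o_bar = sum(obs) / n
    p_bar = sum(pred) / n
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sxy = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    sxx = sum((o - o_bar) ** 2 for o in obs)
    syy = sum((p - p_bar) ** 2 for p in pred)
    # Denominator of the index of agreement (potential error).
    potential = sum(
        (abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred)
    )
    return {
        "R2": sxy ** 2 / (sxx * syy),
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(o - p) for o, p in zip(obs, pred)) / n,
        "MBE": sum(p - o for o, p in zip(obs, pred)) / n,
        "D": 1.0 - sse / potential,
        "NSE": 1.0 - sse / sxx,
    }


# Hypothetical example: predictions that are exactly 1 unit too high.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
stats = performance_metrics(observed, predicted)
```

With a constant offset, R^{2} stays at 1 because the correlation is perfect, while NSE and D drop; this is one reason to report several statistics rather than R^{2} alone.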

### Principal component analysis

The PCA is a dimensionality reduction method that finds the uncorrelated variables known as ‘principal components’ minimizing the information loss and explaining the maximum variance of the data (Jolliffe & Cadima 2016). It is a very useful statistical technique to analyze large datasets making the interpretation easier. In other words, it finds the effective linear combinations of input parameters when there is a large number of parameters to choose from. It essentially reduces the dataset to an eigenvalue/eigenvector problem and has been reinvented with different names in several fields of science and engineering including data science most recently.
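The eigenvalue/eigenvector view of PCA can be illustrated without any numerical library. The sketch below centers a small data matrix, builds its covariance matrix, and finds the first principal component by power iteration. It is an assumption-laden illustration: the data are made up, only the leading component is extracted, and the inputs are centered but not standardized, whereas a real analysis of mixed-unit inputs would usually standardize them first.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
        for a in range(m)
    ]


def first_principal_component(cov, iterations=200):
    """Leading eigenvalue and unit eigenvector by power iteration."""
    m = len(cov)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)
    )
    return eigenvalue, v


# Hypothetical two-variable data lying exactly on the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
cov = covariance_matrix(rows)
eigenvalue, component = first_principal_component(cov)

# Share of total variance carried by the first component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

Because the two variables here are perfectly collinear, a single component explains all of the variance, which is the idealized case of the dimension reduction described above.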

## RESULTS AND DISCUSSION

### Statistical linear and nonlinear regression using SPSS

The correlation matrix of input and output variables is shown in Table 2. Only eight Pearson correlation coefficients between the input and output variables have a magnitude of 0.40 or more. Electrical conductivity, sulfate, and TDS show correlations of 0.41, 0.42, and 0.40 with agricultural land use; sodium percentage shows a correlation of 0.45 with precipitation; sulfate shows a correlation of −0.40 with forest land use; and total alkalinity, hardness, and calcium show correlations of −0.41, −0.40, and −0.43 with shrub land use. The linear regression results using SPSS with climate parameters, land use parameters, and all parameters are shown in Table 3.

Table 2. Pearson correlation coefficients between water quality parameters (rows) and climate and land use parameters (columns)

Water quality parameter | P | T_{max} | T_{min} | AL | FL | GL | SL | UL |
---|---|---|---|---|---|---|---|---|
DO (mg/L) | 0.12 | −0.05 | −0.02 | −0.19 | 0.14 | 0.32 | 0.30 | 0.08 |
pH | 0.08 | 0.06 | 0.04 | 0.06 | −0.09 | 0.05 | −0.09 | 0.16 |
Electrical conductivity (mS/cm) | 0.03 | 0.02 | 0.01 | **0.41** | −0.38 | −0.39 | −0.34 | −0.03 |
BOD (mg/L) | −0.01 | −0.09 | −0.05 | 0.11 | −0.06 | −0.25 | −0.17 | −0.12 |
COD (mg/L) | −0.06 | −0.20 | −0.11 | 0.25 | −0.21 | −0.34 | −0.23 | −0.07 |
Nitrate | −0.08 | −0.22 | −0.28 | 0.13 | −0.10 | −0.18 | −0.11 | −0.06 |
Total Coliform (MPN/100 ml) | 0.07 | 0.10 | 0.14 | 0.12 | −0.12 | −0.11 | −0.10 | 0.04 |
Turbidity (NTU) | 0.08 | −0.03 | 0.09 | −0.06 | 0.10 | −0.07 | −0.17 | −0.07 |
Total Alk. (mg/L) | −0.24 | 0.21 | −0.06 | 0.39 | −0.37 | −0.33 | **−0.41** | 0.06 |
Chloride (mg/L) | 0.11 | −0.05 | −0.02 | 0.35 | −0.33 | −0.33 | −0.24 | −0.06 |
Hardness (mg/L) | −0.25 | 0.18 | −0.08 | 0.36 | −0.34 | −0.34 | **−0.40** | 0.03 |
Calcium (mg/L) | −0.25 | 0.17 | −0.06 | 0.36 | −0.33 | −0.36 | **−0.43** | 0.00 |
Magnesium (mg/L) | −0.15 | 0.15 | −0.06 | 0.35 | −0.33 | −0.32 | −0.34 | −0.01 |
Sulphate (mg/L) | 0.23 | −0.11 | 0.13 | **0.42** | **−0.40** | −0.35 | −0.31 | −0.06 |
Sodium (mg/L) | 0.16 | −0.04 | 0.09 | 0.34 | −0.32 | −0.31 | −0.22 | −0.05 |
TDS (mg/L) | 0.02 | 0.01 | −0.01 | **0.40** | −0.38 | −0.38 | −0.34 | −0.03 |
TSS (mg/L) | 0.19 | −0.27 | 0.02 | −0.10 | 0.10 | 0.07 | 0.09 | 0.04 |
Total Phosphate (mg/L) | 0.04 | −0.10 | 0.00 | 0.15 | −0.11 | −0.24 | −0.13 | −0.10 |
Potassium (mg/L) | 0.19 | −0.07 | 0.08 | 0.15 | −0.11 | −0.23 | −0.17 | −0.09 |
Fluoride (mg/L) | −0.01 | 0.25 | 0.18 | 0.00 | 0.00 | −0.02 | −0.12 | 0.05 |
Sodium % | **0.45** | −0.21 | 0.22 | 0.20 | −0.21 | −0.12 | −0.11 | 0.06 |
SAR | 0.27 | −0.22 | 0.05 | 0.34 | −0.33 | −0.29 | −0.23 | 0.00 |

The bold values indicate appreciable correlation between water quality parameters and climate parameters, land use factors.
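The screening step behind such a correlation table, computing Pearson's r for every input/output pair and flagging those with a magnitude of at least 0.4, can be sketched as follows. The series below are made-up numbers chosen for easy checking, not the Godavari data, and the dictionary keys simply borrow the paper's abbreviations for readability.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def appreciable_pairs(inputs, outputs, threshold=0.4):
    """List (output, input, r) for every pair with |r| >= threshold."""
    hits = []
    for out_name, y in outputs.items():
        for in_name, x in inputs.items():
            r = pearson(x, y)
            if abs(r) >= threshold:
                hits.append((out_name, in_name, round(r, 2)))
    return hits


# Hypothetical series: one land use factor, one climate variable, one WQP.
inputs = {
    "AL": [0.1, 0.2, 0.3, 0.4, 0.5],
    "P": [5.0, 1.0, 4.0, 2.0, 3.0],
}
outputs = {"EC": [1.0, 2.0, 3.0, 4.0, 5.0]}

hits = appreciable_pairs(inputs, outputs)
```

Here only the AL/EC pair passes the threshold; the P/EC pair has r = −0.3 and is screened out.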

Table 3. R^{2} of linear regression (SPSS) for three sets of independent variables

S. No. | Parameter | P, T_{max}, T_{min} | AL, FL, GL, SL, UL | P, T_{max}, T_{min}, AL, FL, GL, SL, UL |
---|---|---|---|---|
1 | DO (mg/L) | 0.03 | 0.18 | 0.21 |
2 | pH | 0.03 | 0.12 | 0.15 |
3 | Electrical conductivity (mS/cm) | 0.00 | 0.27 | 0.28 |
4 | BOD (mg/L) | 0.01 | 0.19 | 0.19 |
5 | COD (mg/L) | 0.09 | 0.30 | 0.36 |
6 | Nitrate | 0.08 | 0.09 | 0.16 |
7 | Total Coliform (MPN/100 ml) | 0.02 | 0.04 | 0.08 |
8 | Turbidity (NTU) | 0.02 | 0.08 | 0.10 |
9 | Total Alk. (mg/L) | 0.10 | 0.27 | 0.37 |
10 | Chloride (mg/L) | 0.13 | 0.22 | 0.24 |
11 | Hardness (mg/L) | 0.09 | 0.24 | 0.34 |
12 | Calcium (mg/L) | 0.08 | 0.25 | 0.33 |
13 | Magnesium (mg/L) | 0.06 | 0.19 | 0.26 |
14 | Sulphate (mg/L) | 0.07 | 0.21 | 0.29 |
15 | Sodium % | 0.25 | 0.06 | 0.32 |
16 | TDS (mg/L) | 0.00 | 0.27 | 0.28 |
17 | TSS (mg/L) | 0.14 | 0.03 | 0.17 |
18 | Total Phosphate (mg/L) | 0.02 | 0.16 | 0.17 |
19 | Potassium (mg/L) | 0.03 | 0.12 | 0.16 |
20 | Fluoride (mg/L) | 0.08 | 0.05 | 0.11 |
21 | Sodium (mg/L) | 0.03 | 0.18 | 0.22 |
22 | SAR | 0.10 | 0.19 | 0.28 |
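A multivariate linear regression of the kind summarized in this table can be reproduced in outline with ordinary least squares. The sketch below solves the normal equations by Gaussian elimination and reports R^{2}. It is a stand-in for the SPSS procedure rather than a reproduction of it, and the two predictors and the response are synthetic values built from known coefficients so that the result can be checked by eye.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * t for r, t in zip(rows, y)) for a in range(k)]
    return solve(xtx, xty)


def r_squared(X, y, coef):
    """Coefficient of determination of the fitted linear model."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data: y = 1 + 2*x1 + 3*x2 exactly (hypothetical predictors).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [9.0, 8.0, 19.0, 18.0, 26.0]

coef = fit_linear(X, y)
r2 = r_squared(X, y, coef)
```

With noise-free synthetic data the fit recovers the generating coefficients and R^{2} is essentially 1; the low values in the table indicate how far the real relationships are from linear.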

The maximum R^{2} (the regression coefficient, i.e., the square of the correlation coefficient) of 0.37 was obtained for total alkalinity with all the parameters as independent variables. Nonlinear regressions were then performed with all the models shown in Table 1 for all the WQPs. For most nonlinear regression models, the optimization process did not converge, so the model coefficients could not be estimated. The regression coefficients could be estimated only for nonlinear model numbers 1 (Asymptotic Regression), 10 (Michaelis Menten), 11 (Morgan-Mercer-Florin), and 18 (Weibull). These are mostly lower than the linear regression coefficients and are shown in Tables 4–6.

Table 4. R^{2} of nonlinear regression models (by nonlinear model number) with climate parameters (P, T_{max}, T_{min}) as independent variables

Parameter | Model 1 | Model 11 | Model 18 | Model 10 |
---|---|---|---|---|
DO (mg/L) | 0.00 | 0.00 | 0.06 | 0.08 |
pH | 0.01 | 0.01 | na | 0.05 |
Electrical conductivity (mS/cm) | 0.01 | 0.01 | 0.05 | 0.05 |
BOD (mg/L) | 0.01 | 0.01 | na | 0.04 |
COD (mg/L) | 0.04 | 0.03 | na | 0.12 |
Nitrate | 0.09 | 0.10 | 0.04 | 0.02 |
Total Coliform (MPN/100 ml) | 0.05 | 0.06 | 0.02 | 0.05 |
Turbidity (NTU) | 0.02 | 0.02 | 0.00 | 0.02 |
Total Alk. (mg/L) | 0.00 | 0.02 | 0.02 | 0.04 |
Chloride (mg/L) | 0.00 | 0.00 | 0.04 | 0.05 |
Hardness (mg/L) | 0.00 | 0.04 | 0.04 | 0.04 |
Calcium (mg/L) | 0.04 | 0.04 | na | 0.04 |
Magnesium (mg/L) | na | 0.01 | 0.04 | 0.05 |
Sulphate (mg/L) | 0.05 | 0.04 | 0.03 | 0.03 |
Sodium % | 0.03 | 0.12 | 0.13 | 0.11 |
TDS (mg/L) | 0.00 | 0.02 | 0.05 | 0.05 |
TSS (mg/L) | 0.00 | 0.01 | 0.01 | 0.02 |
Total Phosphate (mg/L) | 0.00 | 0.00 | 0.06 | 0.06 |
Potassium (mg/L) | 0.01 | 0.02 | 0.03 | 0.03 |
Fluoride (mg/L) | 0.05 | 0.08 | 0.02 | 0.03 |
Sodium (mg/L) | 0.03 | 0.02 | na | 0.04 |
SAR | 0.03 | 0.02 | 0.03 | 0.04 |

Table 5. R^{2} of nonlinear regression models (by nonlinear model number) with land use parameters (AL, FL, GL, SL, UL) as independent variables

Parameter | Model 1 | Model 11 | Model 18 | Model 10 |
---|---|---|---|---|
DO (mg/L) | 0.07 | 0.14 | 0.02 | 0.17 |
pH | 0.00 | 0.06 | 0.06 | 0.41 |
Electrical conductivity (mS/cm) | 0.06 | 0.06 | na | 0.41 |
BOD (mg/L) | 0.04 | 0.11 | na | 0.05 |
COD (mg/L) | 0.07 | 0.16 | 0.00 | 0.12 |
Nitrate | 0.04 | 0.05 | na | 0.16 |
Total Coliform (MPN/100 ml) | 0.01 | 0.00 | 0.00 | 0.13 |
Turbidity (NTU) | 0.01 | 0.08 | 0.02 | 0.06 |
Total Alk. (mg/L) | 0.07 | 0.02 | 0.17 | 0.35 |
Chloride (mg/L) | 0.06 | 0.02 | 0.11 | 0.12 |
Hardness (mg/L) | 0.19 | 0.01 | 0.09 | 0.35 |
Calcium (mg/L) | 0.17 | 0.00 | 0.10 | 0.34 |
Magnesium (mg/L) | 0.07 | 0.02 | 0.09 | 0.25 |
Sulphate (mg/L) | 0.08 | 0.02 | na | 0.21 |
Sodium % | 0.01 | 0.01 | 0.04 | 0.22 |
TDS (mg/L) | 0.09 | 0.02 | 0.16 | 0.24 |
TSS (mg/L) | 0.01 | 0.02 | 0.01 | 0.27 |
Total Phosphate (mg/L) | 0.04 | 0.05 | 0.08 | 0.12 |
Potassium (mg/L) | 0.04 | 0.07 | 0.07 | 0.08 |
Fluoride (mg/L) | 0.00 | 0.01 | na | 0.17 |
Sodium (mg/L) | 0.05 | 0.01 | 0.10 | 0.12 |
SAR | 0.04 | 0.01 | 0.02 | 0.16 |

Parameter . | P, T_{max}, T_{min}, AL, FL, GL, SL, UL . | |||
---|---|---|---|---|

R^{2} (of nonlinear model number). | ||||

1 . | 11 . | 18 . | 10 . | |

DO (mg/L) | 0.05 | 0.04 | 0.06 | 0.24 |

pH | 0.01 | 0.06 | 0.07 | 0.45 |

Electrical conductivity (mS/cm) | 0.06 | 0.01 | 0.06 | 0.45 |

BOD (mg/L) | 0.04 | 0.08 | na | 0.09 |

COD (mg/L) | 0.09 | 0.11 | 0.12 | 0.22 |

Nitrate | 0.08 | 0.12 | 0.11 | 0.17 |

Total Coliform (MPN/100 ml) | 0.02 | 0.06 | 0.07 | 0.18 |

Turbidity (NTU) | 0.00 | 0.01 | 0.01 | 0.06 |

Total Alk. (mg/L) | 0.16 | 0.10 | 0.04 | 0.38 |

Chloride (mg/L) | 0.01 | 0.02 | 0.05 | 0.16 |

Hardness (mg/L) | 0.17 | 0.12 | 0.12 | 0.37 |

Calcium (mg/L) | 0.07 | 0.14 | 0.08 | 0.37 |

Magnesium (mg/L) | 0.14 | 0.00 | 0.04 | 0.29 |

Sulphate (mg/L) | 0.21 | 0.11 | na | 0.23 |

Sodium % (mg/L) | 0.08 | 0.16 | 0.16 | 0.24 |

TDS (mg/L) | 0.17 | 0.02 | 0.06 | 0.28 |

TSS (mg/L) | 0.02 | 0.00 | 0.27 | 0.27 |

Total Phosphate (mg/L) | 0.03 | 0.02 | 0.13 | 0.17 |

Potassium (mg/L) | 0.01 | 0.01 | 0.04 | 0.10 |

Fluoride (mg/L) | 0.01 | 0.04 | 0.07 | 0.19 |

Sodium | 0.11 | 0.05 | 0.08 | 0.15 |

SAR | 0.12 | 0.06 | 0.09 | 0.19 |

### SPSS multilayer perceptron model simulations

The SPSS simulations use a multilayer perceptron with one input layer, one hidden layer, and one output layer. The network has eight input nodes and a single output node, and the number of hidden nodes is varied from 1 to 50. The scaled conjugate gradient algorithm is used in most cases to optimize the weights. The data are divided into 70% training and 30% testing datasets. The overall regression coefficients (R^{2} values), RMSE, MAE, MBE, D, and NSE are given in Table 7. These preliminary SPSS simulations mainly test the suitability of neural networks for stream water quality modeling. The model architecture, the choice of training algorithm, and the testing and validation steps are fine-tuned in the MATLAB simulations described in the next section.
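The performance statistics listed above can be computed from paired observed and predicted values. The following is a minimal sketch, assuming that R^{2} is the squared Pearson correlation, that D is Willmott's index of agreement, and that MBE is taken as predicted minus observed; the function name `goodness_of_fit` and the sample numbers are illustrative and do not come from the study.

```python
import numpy as np

def goodness_of_fit(observed, predicted):
    """Return common error statistics for one water quality parameter."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = p - o
    o_mean = o.mean()
    sse = np.sum(err ** 2)
    # Squared Pearson correlation (assumed meaning of R^2 here).
    r = np.corrcoef(o, p)[0, 1]
    # Denominator of Willmott's index of agreement (assumed definition of D).
    d_den = np.sum((np.abs(p - o_mean) + np.abs(o - o_mean)) ** 2)
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MBE": float(np.mean(err)),  # assumed sign: predicted minus observed
        "R2": float(r ** 2),
        "D": float(1.0 - sse / d_den),
        "NSE": float(1.0 - sse / np.sum((o - o_mean) ** 2)),
    }

# Illustrative values only, not taken from the study's dataset.
obs = [6.1, 6.8, 7.4, 5.9, 7.0]
pred = [6.3, 6.6, 7.1, 6.2, 7.2]
stats = goodness_of_fit(obs, pred)
print({name: round(value, 3) for name, value in stats.items()})
```

A perfect prediction gives RMSE = 0 and NSE = D = 1, while a model that does no better than the observed mean gives NSE = 0.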

S. No. | Parameter | RMSE | Overall R^{2} | D | MAE | MBE | NSE | Network architecture |
---|---|---|---|---|---|---|---|---|

1 | DO (mg/L) | 0.50 | 0.52 | 0.71 | 0.40 | 0.04 | 0.51 | 8-5-1 |

2 | pH | 0.33 | 0.35 | 0.21 | 0.24 | 0.01 | 0.21 | 8-5-1 |

3 | Electrical conductivity (mS/cm) | 133.07 | 0.79 | 0.92 | 91.43 | −2.76 | 0.78 | 8-5-1 |

4 | BOD (mg/L) | 0.94 | 0.64 | 0.82 | 0.50 | −0.02 | 0.63 | 8-5-1 |

5 | COD (mg/L) | 6.63 | 0.83 | 0.95 | 5.12 | −0.188 | 0.83 | 8-8-1 |

6 | Nitrate | 0.87 | 0.47 | 0.78 | 0.60 | 0.04 | 0.47 | 8-5-1 |

7 | Total Coliform (MPN/100 ml) | 22.66 | 0.29 | 0.45 | 14.93 | 0.56 | 0.29 | 8-3-1 |

8 | Turbidity (NTU) | 7.38 | 0.15 | −0.34 | 3.75 | −0.02 | 0.15 | 8-6-1 |

9 | Total Alk. (mg/L) | 32.05 | 0.73 | 0.92 | 22.76 | −3.52 | 0.72 | 8-5-1 |

10 | Chloride (mg/L) | 25.76 | 0.66 | 0.87 | 15.82 | −0.04 | 0.66 | 8-5-1 |

11 | Hardness (mg/L) | 42.62 | 0.66 | 0.88 | 30.07 | −1.30 | 0.65 | 8-5-1 |

12 | Calcium (mg/L) | 9.995 | 0.66 | 0.88 | 7.58 | −0.808 | 0.65 | 8-5-1 |

13 | Magnesium (mg/L) | 6.42 | 0.48 | 0.78 | 4.801 | 0.32 | 0.48 | 8-8-1 |

14 | Sulphate (mg/L) | 13.34 | 0.66 | 0.89 | 9.44 | −0.189 | 0.66 | 8-8-1 |

15 | Sodium (mg/L) | 17.09 | 0.78 | 0.94 | 11.70 | 0.80 | 0.78 | 8-8-1 |

16 | TDS (mg/L) | 84.88 | 0.76 | 0.929 | 59.35 | 5.76 | 0.76 | 8-8-1 |

17 | TSS (mg/L) | 12.26 | 0.56 | 0.78 | 7.29 | −0.79 | 0.56 | 8-7-1 |

18 | Total Phosphate (mg/L) | 0.27 | 0.67 | 0.88 | 0.16 | 0.01 | 0.67 | 8-5-1 |

19 | Potassium (mg/L) | 1.61 | 0.64 | 0.85 | 1.04 | −0.12 | 0.61 | 8-6-1 |

20 | Fluoride (mg/L) | 0.27 | 0.10 | 0.3 | 0.20 | 0.01 | 0.10 | 8-5-1 |

21 | Sodium % | 7.99 | 0.35 | 0.55 | 6.15 | −0.41 | 0.35 | 8-4-1 |

22 | SAR | 0.44 | 0.72 | 0.91 | 0.31 | 0.00 | 0.71 | 8-5-1 |

### MATLAB simulations using the Levenberg–Marquardt algorithm, RBFNN

The hidden layer of the feedforward network has (2*n* + 1) nodes, where *n* is the number of input nodes, as given by the Kolmogorov mapping theorem (Hecht-Nielsen 1988); with eight inputs this yields the 8-17-1 architecture listed in Table 8. The Levenberg–Marquardt algorithm, which blends gradient descent with the Gauss–Newton method, is used to update the weights. This algorithm gave better results than Bayesian Regularization, Scaled Conjugate Gradient, and many other algorithms in the MATLAB R2020 toolbox. The data of each WQP are divided into training, validation, and testing sets comprising 70, 15, and 15% of the data, respectively. The correlation coefficients (R values) for training, validation, testing, and the overall dataset are given in Table 8 for all parameters. MATLAB reports model performance as correlation coefficients (R) rather than regression coefficients (R^{2}). The neural network predictions of EC are shown in Figure 4.
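The network sizing and data partition described above can be sketched in a few lines. This is an illustration only: the helper names `hidden_nodes` and `split_indices`, the random seed, and the sample count of 200 are assumptions, and the actual partition in the study is made by the MATLAB toolbox. The last loop shows how an R value from Table 8 converts to R^{2} for comparison with Table 7.

```python
import numpy as np

def hidden_nodes(n_inputs):
    """Hidden-layer size from the (2n + 1) rule quoted above."""
    return 2 * n_inputs + 1

def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition sample indices into training, validation, and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    # Whatever remains after training and validation goes to testing.
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

train, val, test = split_indices(200)  # 200 is a hypothetical sample count
print(hidden_nodes(8), len(train), len(val), len(test))

# Squaring a correlation coefficient R gives the corresponding R^2,
# shown here for the EC and DO testing values in Table 8.
for r in (0.93, 0.78):
    print(r, round(r ** 2, 2))
```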

S. No. . | Parameter . | Training . | Validation . | Testing . | Overall . | Network architecture . |
---|---|---|---|---|---|---|

1 | DO (mg/L) | 0.76 | 0.72 | 0.78 | 0.75 | 8-17-1 |

2 | pH | 0.67 | 0.54 | 0.51 | 0.62 | 8-17-1 |

3 | Electrical conductivity (mS/cm) | 0.94 | 0.95 | 0.93 | 0.94 | 8-17-1 |

4 | BOD (mg/L) | 0.82 | 0.89 | 0.83 | 0.83 | 8-17-1 |

5 | COD (mg/L) | 0.93 | 0.94 | 0.90 | 0.92 | 8-17-1 |

6 | Nitrate | 0.72 | 0.72 | 0.74 | 0.72 | 8-17-1 |

7 | Total Coliform (MPN/100 ml) | 0.70 | 0.76 | 0.61 | 0.70 | 8-17-1 |

8 | Turbidity (NTU) | 0.49 | 0.50 | 0.45 | 0.48 | 8-17-1 |

9 | Total Alk. (mg/L) | 0.91 | 0.82 | 0.90 | 0.90 | 8-17-1 |

10 | Chloride (mg/L) | 0.90 | 0.88 | 0.82 | 0.89 | 8-17-1 |

11 | Hardness (mg/L) | 0.94 | 0.84 | 0.80 | 0.90 | 8-17-1 |

12 | Calcium (mg/L) | 0.91 | 0.80 | 0.83 | 0.87 | 8-17-1 |

13 | Magnesium (mg/L) | 0.80 | 0.77 | 0.80 | 0.80 | 8-17-1 |

14 | Sulphate (mg/L) | 0.89 | 0.71 | 0.71 | 0.85 | 8-17-1 |

15 | Sodium (mg/L) | 0.90 | 0.94 | 0.91 | 0.92 | 8-17-1 |

16 | TDS (mg/L) | 0.92 | 0.95 | 0.96 | 0.92 | 8-17-1 |

17 | TSS (mg/L) | 0.87 | 0.85 | 0.73 | 0.81 | 8-17-1 |

18 | Total Phosphate (mg/L) | 0.84 | 0.70 | 0.63 | 0.75 | 8-17-1 |

19 | Potassium (mg/L) | 0.77 | 0.60 | 0.63 | 0.71 | 8-17-1 |

20 | Fluoride (mg/L) | 0.57 | 0.41 | 0.44 | 0.54 | 8-17-1 |

21 | Sodium % | 0.74 | 0.65 | 0.66 | 0.71 | 8-17-1 |

22 | SAR | 0.90 | 0.75 | 0.84 | 0.86 | 8-17-1 |

Component | Initial eigenvalues: Total | Initial eigenvalues: % of variance | Initial eigenvalues: Cumulative % | Extraction sums of squared loadings: Total | Extraction sums of squared loadings: % of variance | Extraction sums of squared loadings: Cumulative % | Rotation sums of squared loadings: Total | Rotation sums of squared loadings: % of variance | Rotation sums of squared loadings: Cumulative % |
---|---|---|---|---|---|---|---|---|---|

1 (P) | 3.42 | 42.73 | 42.73 | 3.42 | 42.73 | 42.73 | 3.40 | 42.56 | 42.56 |

2 (T_{max}) | 1.61 | 20.13 | 62.86 | 1.61 | 20.13 | 62.86 | 1.62 | 20.25 | 62.81 |

3 (T_{min}) | 1.28 | 15.93 | 78.79 | 1.28 | 15.93 | 78.79 | 1.28 | 15.99 | 78.79 |

4 (AL) | 0.93 | 11.67 | 90.46 | – | – | – | – | – | – |

5 (FL) | 0.61 | 7.67 | 98.13 | – | – | – | – | – | – |

6 (GL) | 0.12 | 1.44 | 99.58 | – | – | – | – | – | – |

7 (SL) | 0.03 | 0.42 | 99.99 | – | – | – | – | – | – |

8 (UL) | 0.00 | 0.01 | 100.0 | – | – | – | – | – | – |

Extraction Method: PCA.

The three components extracted in Table 9 are the climate parameters alone: mean daily precipitation, maximum temperature, and minimum temperature. Radial basis function neural network (RBFNN) predictions of EC, DO, BOD, and nitrate made in MATLAB with only these three inputs yielded correlation coefficients below 0.50. Table 9 shows that the first five input parameters, namely mean daily precipitation, maximum and minimum temperatures, the agricultural land use factor, and the forest land use factor, together explain about 98% of the cumulative variance. RBFNN simulations were therefore performed for EC, DO, BOD, and nitrate using these five inputs, giving the correlation coefficients shown in Table 10 for the entire dataset. The overall R values of the four parameters (0.972–0.999) are higher than those of the simple feedforward neural networks with all eight inputs in Table 8 (0.72–0.94).

These results are better than the preliminary ANN results of Anmala *et al.* (2015) and as good as those of Venkateshwarlu *et al.* (2020), both of which were obtained for the Upper Green River water quality data. In the current study and in Anmala *et al.* (2015), a separate ANN is explored for each WQP, whereas Venkateshwarlu *et al.* (2020) explore composite neural networks that predict multiple output parameters simultaneously. Anmala & Turuganti (2021) make fast, real-time predictions of WQPs for the Upper Green River Watershed using extreme learning machine (ELM) networks. While the PCA results of Venkateshwarlu *et al.* (2020) indicated the effectiveness of climate parameters alone (precipitation and temperature), the current PCA results indicate that climate together with two land use factors, agricultural and forest, is effective for accurate water quality prediction. The study of Venkateshwarlu *et al.* (2020) was performed on a Karst watershed, the Upper Green River Basin, Kentucky, USA, while the current study deals with the non-Karst watershed of the Godavari River Basin. All of these studies, including the current one, have developed ANNs for WQP predictions in a causal modeling framework. The results outline the importance, potential, and applicability of ANNs for highly nonlinear stream water quality problems in Karst and non-Karst river basins.

Many other ANN studies predict one WQP from the remaining available WQPs, from its own time history, or simply from correlations between WQPs, rather than from a causal modeling framework of climate and land use parameters as in the current study. Di Nunno *et al.* (2022) predicted the nitrate concentrations in the Susquehanna River and the Raccoon River, USA, accurately (R^{2} = 0.77 and 0.94) using recurrent neural networks and time-series models with exogenous inputs such as water discharge, water temperature, dissolved oxygen, and specific conductance. Rajwade *et al.* (2021) predicted BOD from 15 different combinations of available physical, chemical, and biological water quality parameters for the Gola River, Uttarakhand, India, and obtained a maximum R^{2} value of 0.997 using ANNs compared with a maximum of 0.861 using multiple linear regression. Ravansalar & Rajaee (2015) obtained an R^{2} value of 0.949 using wavelet-based ANNs against a value of 0.381 using ANNs in the prediction of electrical conductivity for the Asi River, Turkey. Alam *et al.* (2021) used Weighted Regressions on Time, Discharge, and Season (WRTDS) to analyze long-term water quality trends, especially those of BOD, DO, and nitrate-nitrite (NN), and found that wastewater treatment plants (WWTPs), combined sewer overflows (CSOs), and agricultural runoff contributed to increased pollution levels in the White River at Muncie, IN, USA.

The current study, however, is limited to the temporal prediction of WQPs from climate and land use parameters and has not considered the influences of discharge and seasons separately. The influence of discharge is considered implicitly through precipitation, the climate parameter that is one of the model inputs. The importance of N pressure from agriculture on surface water quality (D'Haene *et al.* 2022) is reflected in the model inputs selected by PCA, and the current model can be further used to support nitrogen mitigation measures for achieving a good surface water quality status. The five effective inputs stated above are used again in feedforward neural networks to improve the predictions and to obtain a functional form for water quality variables such as electrical conductivity, as discussed in the next section.
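The variance percentages in Table 9 follow directly from the eigenvalues of the correlation matrix of the eight inputs. The sketch below recomputes them from the rounded eigenvalues in the table; it assumes that the three extracted components were chosen with the eigenvalue-greater-than-one (Kaiser) rule, which the text does not state explicitly. The helper `correlation_eigenvalues` is a hypothetical name showing how the eigenvalues would be obtained from raw input data.

```python
import numpy as np

def correlation_eigenvalues(data):
    """Eigenvalues of the correlation matrix of a (samples x variables) array, largest first."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]

# Initial eigenvalues as rounded in Table 9 (eight standardized inputs).
eigenvalues = np.array([3.42, 1.61, 1.28, 0.93, 0.61, 0.12, 0.03, 0.00])

explained = 100.0 * eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained)

# Kaiser rule (assumed extraction criterion): keep components with eigenvalue > 1.
n_kaiser = int(np.sum(eigenvalues > 1.0))
# Smallest number of components whose cumulative variance reaches 98%.
n_98 = int(np.argmax(cumulative >= 98.0) + 1)

for k, (share, total) in enumerate(zip(explained, cumulative), start=1):
    print(f"component {k}: {share:6.2f}%  cumulative {total:6.2f}%")
print("eigenvalue > 1:", n_kaiser, "| components needed for 98%:", n_98)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, each percentage is simply the eigenvalue divided by eight; small differences from Table 9 come from rounding of the tabulated eigenvalues.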

S. No. | Parameter | RMSE | R | Overall R^{2} | D | MAE | MBE | NSE | Network architecture |
---|---|---|---|---|---|---|---|---|---|

1 | Electrical conductivity (mS/cm) | 0.004 | 0.999 | 0.998 | 1 | 0.003 | 0 | 0.998 | 5-255-1 |

2 | DO (mg/L) | 0.023 | 0.984 | 0.970 | 0.992 | 0.008 | 0 | 0.970 | 5-255-1 |

3 | BOD (mg/L) | 0.075 | 0.998 | 0.995 | 0.999 | 0.003 | 0 | 0.995 | 5-255-1 |

4 | Nitrate (mg/L) | 0.16 | 0.972 | 0.945 | 0.985 | 0.113 | 0 | 0.945 | 5-255-1 |
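A radial basis function network of the kind summarized in Table 10 can be sketched as a layer of Gaussian units followed by a linear output. The example below is a simplified illustration and not the MATLAB routine used in the study: the function names, the spread value, and the synthetic data are all assumptions, and placing one center on every training sample is only one possible way of choosing the hidden nodes.

```python
import numpy as np

def rbf_design(X, centers, spread):
    """Gaussian activations of every sample (rows) at every center (columns)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * spread ** 2))

def fit_rbf(X, y, centers, spread):
    """Solve for the linear output weights (last entry is the bias) by least squares."""
    phi = np.hstack([rbf_design(X, centers, spread), np.ones((len(X), 1))])
    weights, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return weights

def predict_rbf(X, centers, spread, weights):
    """Evaluate the fitted network at new inputs."""
    return rbf_design(X, centers, spread) @ weights[:-1] + weights[-1]

# Synthetic stand-in for five normalized inputs and one output (not study data).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 5))
y = np.sin(X.sum(axis=1))

w = fit_rbf(X, y, centers=X, spread=1.0)
fitted = predict_rbf(X, X, 1.0, w)
print("in-sample R:", round(float(np.corrcoef(y, fitted)[0, 1]), 3))
```

When the number of centers approaches the number of samples, such a network can reproduce the fitted data almost exactly, so statistics computed on the data used for fitting are best read alongside results on held-out samples.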

### Equation of electrical conductivity from MATLAB simulations

*et al.* (2005) involving the input variables and the output variable as per the ANN model is written as:

$$Y_{n} = f_{o}\left\{ b_{o} + \sum_{k=1}^{h} \left[ w_{k} \, f_{h}\left( b_{hk} + \sum_{i=1}^{m} w_{ik} X_{ni} \right) \right] \right\}$$

In these simulations, the input and output parameters are normalized to the range [−1, 1] for the tan-sigmoid transfer function. In the above expression, Y_{n} is the normalized output variable, f_{o} is the transfer function of the output layer, b_{o} is the output layer bias, w_{k} is the connection weight between the *k*th hidden neuron and the single output neuron, f_{h} is the transfer function of the hidden layer, b_{hk} is the bias of the *k*th hidden neuron, w_{ik} is the connection weight between the *i*th input variable and the *k*th hidden neuron, X_{ni} is the normalized *i*th input variable, h is the number of hidden neurons, and m is the number of input variables.

Equation (26), developed for the prediction of electrical conductivity, should be applied only within the range of the data on which the feedforward neural network was trained. For the dataset considered in the present study, the maximum and minimum electrical conductivity values are 2,822.0 and 67.0 mS/cm, respectively. The correlation coefficients (R) for training, validation, testing, and the overall dataset obtained by feedforward neural networks with the five effective input parameters (identified using PCA) are shown in Table 11. These results are comparable to or slightly better than those in Table 8, even though only five inputs are used; only the overall R for BOD is marginally lower (0.81 against 0.83). Similar functional forms can be developed individually for all the WQPs using the effective input parameters from PCA.
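A closed-form network equation of this type can be evaluated outside MATLAB once the weights and biases are exported. The sketch below assumes a tan-sigmoid hidden layer, a linear output layer by default, and linear scaling of inputs and output to [−1, 1]. The fitted weights are not reproduced in this section, so the weights, biases, and input ranges in the usage lines are random or hypothetical placeholders; only the EC limits of 67.0 and 2,822.0 come from the text.

```python
import numpy as np

EC_MIN, EC_MAX = 67.0, 2822.0  # range of the training data quoted above

def normalize(x, x_min, x_max):
    """Map raw values linearly onto [-1, 1]."""
    return 2.0 * (np.asarray(x, dtype=float) - x_min) / (x_max - x_min) - 1.0

def denormalize(y_n, y_min, y_max):
    """Map a normalized value in [-1, 1] back to physical units."""
    return (y_n + 1.0) * (y_max - y_min) / 2.0 + y_min

def predict_ec(x_raw, x_min, x_max, w_in, b_hidden, w_out, b_out, f_out=lambda z: z):
    """Evaluate a one-hidden-layer network in closed form.

    w_in has shape (hidden, inputs); b_hidden and w_out have shape (hidden,);
    b_out is a scalar. f_out is the output transfer function (identity assumed).
    """
    x_n = normalize(x_raw, x_min, x_max)
    hidden = np.tanh(b_hidden + w_in @ x_n)  # tan-sigmoid hidden layer
    y_n = f_out(b_out + w_out @ hidden)      # normalized output
    return denormalize(y_n, EC_MIN, EC_MAX)

# Hypothetical 5-11-1 network with random placeholder weights and input ranges.
rng = np.random.default_rng(0)
x_min, x_max = np.zeros(5), np.full(5, 100.0)
ec = predict_ec(np.full(5, 40.0), x_min, x_max,
                rng.normal(size=(11, 5)), rng.normal(size=11),
                0.1 * rng.normal(size=11), 0.0)
print(float(ec))
```

Inputs outside `x_min` and `x_max` normalize to values beyond [−1, 1], which corresponds to applying the equation outside the range it was trained on.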

S. No. . | Parameter . | Training . | Validation . | Testing . | All . | Network architecture . |
---|---|---|---|---|---|---|

1 | Electrical conductivity (mS/cm) | 0.94 | 0.94 | 0.96 | 0.95 | 5-11-1 |

2 | DO (mg/L) | 0.76 | 0.76 | 0.74 | 0.76 | 5-11-1 |

3 | BOD (mg/L) | 0.83 | 0.83 | 0.85 | 0.81 | 5-11-1 |

4 | Nitrate (mg/L) | 0.75 | 0.79 | 0.78 | 0.75 | 5-11-1 |

Stream water quality modeling is a highly complex nonlinear problem because of basin characteristics, atmospheric influences, and the turbulent nature of fluid flow. Linear regression models therefore could not give accurate results, since the cause and effect variables are not linearly correlated. Statistical nonlinear regression models would likely need more information about the physics of the problem to perform better, which could mean adding new terms to the models, and it is difficult to establish a stopping criterion for adding such terms. Neural network models alleviate this difficulty through the flexibility of the network architecture, especially the number of neurons in the hidden layer. This makes ANN models better suited than nonlinear regression models to highly nonlinear problems. They also need relatively little information and can work with easily measurable data. During training, the feedforward pass computes the network output and backpropagation adjusts the input-to-hidden and hidden-to-output weights to obtain the lowest error and the highest correlation.

## CONCLUSION

In this study, the potential and applicability of statistical linear and nonlinear regression models and ANNs are investigated for modeling stream water quality in a stream network of a non-Karst watershed. The study examines the use of climate and land use parameters to predict the SWQPs in a causal modeling framework with the help of statistical and neural network models, and extends the methodology of Anmala *et al.* (2015) to the Godavari River Watershed in the Telangana region of India. The use of ANNs, of PCA to reduce the input data dimension, and of RBFNNs is justified by the satisfactory results obtained for the highly nonlinear problem of stream water quality prediction in a causal modeling framework. The main conclusions are as follows. The predictions of stream water quality using ANNs with climate and land use factors as inputs are superior to those of the statistical linear and nonlinear regression approaches for all the WQPs. This is evident from the regression coefficients (obtained using the Pearson correlation coefficient) of the statistical linear, nonlinear, and neural network models. The PCA revealed that only five of the eight input parameters are effective in modeling stream water quality using neural networks. With the inputs selected by PCA, including land use factors along with climate parameters predicts the WQPs of the non-Karst watershed successfully. A concise equation for electrical conductivity prediction is developed using only the five effective input parameters. The procedure used to derive this equation can be applied to develop similar equations individually for all the WQPs of riverine watersheds for efficient environmental monitoring and assessment.

## DATA AVAILABILITY STATEMENT

All relevant data are available from online repositories. SRTM 30 m DEM data: https://earthexplorer.usgs.gov/. Water quality data: https://tspcb.cgg.gov.in/Pages/Envdata.aspx. Rainfall and temperature data: https://cdsp.imdpune.gov.in/. Rainfall and temperature data for the year 2021: https://power.larc.nasa.gov/data-access-viewer/. Global land cover mapping data at 30 m resolution: http://globallandcover.com/.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES

*Water Environment Research* **93**(11), 2360–2373.

*Earth Explorer*. https://earthexplorer.usgs.gov/