ABSTRACT
The present study explores the use of machine learning and geographical methods to forecast and assess water quality and to examine land use and land cover (LULC). Focusing on the Karamana River in India, it examines significant changes in LULC patterns and water quality parameters between 2001 and 2020. Through detailed numerical analysis and visualization, the study reveals a dynamic environment shaped by urban development, deforestation, and agricultural development. It also identifies fluctuations in water quality metrics, including temperature, dissolved oxygen, pH, conductivity, and biochemical oxygen demand (BOD), providing important insights into pollutants, ecological stresses, and human impacts. The Water Quality Index (WQI) is used to evaluate overall water quality, and correlation analysis elucidates the interactions between water quality parameters, offering insight into pollution sources and ecosystem dynamics. The findings illustrate the efficacy of the random forest method in soil classification, as it effectively handles an even distribution of soil types. Overall, the study emphasizes the importance of informed land management strategies and water quality management techniques for sustaining the Karamana River ecosystem.
HIGHLIGHTS
Water quality: analyzes key water quality parameters and pollutants.
Land use and land cover (LULC): examines the impact of urbanization and deforestation.
Machine learning: utilizes the random forest method for soil classification and water quality analysis.
Forecasting: integrates machine learning and geographical methods to predict future trends in water quality and land use.
Karamana River: focuses on a vital river in India.
ABBREVIATIONS
- LULC: land use and land cover
- GIS: geographic information system
- ERA: ecological risk assessments
- EBK: empirical Bayesian kriging
- USGS: United States Geological Survey
- DO: dissolved oxygen
- BOD: biochemical oxygen demand
- FC: faecal coliform
- TC: total coliform
- pH: potential of hydrogen
- WQI: water quality index
- mg/L: milligrams per litre
- μmhos/cm: micromhos per centimetre
- CNN: convolutional neural network
- SVM: support vector machine
- RF: random forest
INTRODUCTION
Approximately 2.5 billion people get their daily water supply from groundwater sources. The quality and storage capacity of aquifers, underground reservoirs of groundwater, have been adversely affected by pollution and by changes in land use and land cover (Bawa & Dwivedi 2019). Groundwater quality is considerably affected by growing human activity, including mining, industrialization, and urbanization (Lipczynska-Kochany 2018). The share of the world's population living in urban areas has risen from 30% in 1950 to more than 50%, a trend that is expected to continue globally and to remain a significant driver of land use change (Jena 2023). Growing populations may exacerbate environmental and climatic problems, notably by degrading groundwater quality.
‘Land use and land cover (LULC) changes’ denote modifications resulting from human activities that directly or indirectly affect the Earth's surface, significantly influencing groundwater quality and availability (Rasool et al. 2020). Alterations in land utilization, including heightened urbanization, deforestation, increased agricultural practices, and reduced vegetation cover, exert both direct and indirect effects on local ecosystems, water retention, water quality, precipitation patterns, surface water flow, groundwater infiltration, and the processes of water evaporation and plant transpiration (Fahad et al. 2021). In 1984, the United States Geological Survey (USGS) investigated the effects of modifying land surface conditions on groundwater quality. This initiated research examining the correlation between land use, land cover fluctuations, and water quality (Helsel et al. 1984).
The primary objective of the USGS research was to determine the geographical effects of changes in surface land conditions on groundwater quality. Groundwater levels are also influenced by LULC changes (Johnson & Belitz 2009; Salman et al. 2018; Ahmad et al. 2021; Islami et al. 2022). Assessing LULC often entails intricate and labour-intensive techniques; however, analyzing satellite imagery with GIS and machine learning can save researchers time and money (Gani et al. 2023). Reyes Gómez et al. (2017) conducted a chemical assessment of groundwater and identified a deterioration in water quality that correlated with an increasing frequency of LULC changes. A distinct association exists between the level of urbanization and the conversion of formerly agricultural land to impervious surfaces, which increases water runoff and reduces groundwater infiltration capacity (Hamad et al. 2012).
Moreover, paved or impermeable concrete highways substantially reduce natural areas, obstruct aquifer replenishment, and decrease the number of water bodies in urban locations. This in turn raises temperatures and degrades groundwater quality (Muthamilselvan et al. 2016). Response surface methodology (RSM) may be a useful analytical tool for investigating the impact of LULC alterations on groundwater quality. This research analyses the correlations among the water pollution index, different types of LULC, and other response variables via regression-based statistical modelling (RBSM). The RSM model uses statistical and mathematical techniques to establish a relationship between the response and the input variables (or factors), and Begum et al. (2023) reported that the model is grounded in accurate data. However, these approaches are frequently criticized for being opaque, ambiguous, sensitive, and unreliable (Hossain & Patra 2020; Uddin et al. 2023). Even so, the approach is becoming increasingly popular for evaluating surface water and groundwater quality, and it is standard practice to convert the data into a numerical format that presents a comprehensive picture of the state of water quality (Agbasi et al. 2023; Biswas et al. 2023).
The quality of groundwater and the associated health risks can be assessed by analyzing its hydrochemical properties and applying the total hazard index (THI). Multiple studies have estimated chronic daily intake to derive the THI and determine carcinogenic or non-carcinogenic health effects. These studies seek to measure the total dissolved organic carbon that infiltrates groundwater supplies and is subsequently ingested by humans through food and water consumption (Hossain & Patra 2020).
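RSM usually relates a response to its input factors through a second-order polynomial. The sketch below, assuming two coded factors and a full quadratic model, fits such a surface by ordinary least squares using only the Python standard library. The factor levels and "true" coefficients are hypothetical values chosen so that the fit can be checked against them; this is a generic illustration of a second-order response surface, not the RBSM model used in the cited work.

```python
from itertools import product


def design_row(x1, x2):
    """Terms of a full second-order response surface in two factors."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def solve_linear(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_response_surface(points):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2.

    points: list of (x1, x2, y). Solves the normal equations (X^T X) b = X^T y.
    """
    rows = [design_row(x1, x2) for x1, x2, _ in points]
    ys = [y for _, _, y in points]
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(p)]
    return solve_linear(xtx, xty)


# Hypothetical 3x3 factorial design at coded levels -1, 0, +1.
TRUE_COEFS = [2.0, 1.5, -0.8, 0.3, 0.2, -0.5]
points = [
    (x1, x2, sum(c * t for c, t in zip(design_row(x1, x2), TRUE_COEFS)))
    for x1, x2 in product((-1.0, 0.0, 1.0), repeat=2)
]
coefs = fit_response_surface(points)
print([round(c, 6) for c in coefs])
```

With real data, y would be a measured response (for example, a water pollution index), and the fitted quadratic (b11, b22) and interaction (b12) terms would describe curvature and factor interplay.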
Therefore, this study specifically aims to evaluate how LULC changes affect groundwater quality by pairing GIS data with machine learning approaches to forecast changes in groundwater parameters.
Human-induced LULC changes significantly affect natural ecosystems and contribute to ecological problems such as biodiversity loss, land degradation, and ecosystem fragility, so the substantial environmental risks they pose must be considered. One tool used to quantify this threat is the ecological risk assessment (ERA) index, which considers the overall damage to the ecosystem and its components. Zhang et al. (2020) showed that this method could be used to evaluate the risk factors of a specific area and to show how both naturally occurring and human-induced interventions have evolved. ERA has developed along three distinct lines: conventional, regional, and landscape. The landscape ERA approach, currently in widespread use, corresponds with LULC changes (Hou et al. 2020). The effect of LULC changes on groundwater quality can be assessed effectively by using machine learning algorithms to combine hydrological data with multi-temporal remotely sensed data in a geographic information system (GIS). Analysing LULC changes is a cost-effective technique that improves understanding of this connection by providing more dependable data on their influence on groundwater quality (Sidi Almouctar et al. 2021). Ibrahim et al. (2020) used a bacterial consortium to remove chemical contaminants from household wastewater; there were no harmful effects at the optimal dosage of 2.5 mg/L, and chemical oxygen demand, biochemical oxygen demand, total solids, dissolved solids, suspended solids, ammonia, nitrate, Kjeldahl nitrogen, and oil and grease all showed high removal percentages. Various physicochemical parameters are used to derive water quality indicators through GIS modelling, and machine learning has been used to show that groundwater quality varies spatially and temporally (Hossain et al. 2021; Uddin et al. 2023).
The geographical extent of groundwater contamination can be assessed by using data from adjacent monitoring wells to estimate values at unmeasured locations (Uddin et al. 2024). The most popular and straightforward interpolation techniques are inverse distance weighting (IDW), universal kriging, ordinary kriging, and empirical Bayesian kriging (EBK). EBK explicitly uses subsetting, that is, choosing subsets of the dataset that are statistically representative of the whole. Simulations provide several advantages over more conventional interpolation techniques, including the capacity to identify the most important factors independently by simulating actual events and then analysing the outcomes (Kumar et al. 2019).
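Of the techniques listed, IDW is the simplest to illustrate. The sketch below estimates a value at an unmeasured point as a distance-weighted average of nearby well readings. The well coordinates, readings, and the power parameter of 2 are hypothetical choices; kriging-based methods such as EBK, which model spatial covariance, are not shown.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (xi, yi, value) tuples, e.g. readings at monitoring wells.
    Each reading is weighted by 1 / distance**power; a point that coincides
    with a sample simply returns that sample's value.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical wells: (easting km, northing km, nitrate mg/L).
wells = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
print(round(idw(wells, 5.0, 0.0), 3))
```

Larger power values make the estimate more local; in practice the power is usually tuned, for example by cross-validation.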
Although the water quality index (WQI) offers thorough evaluations of river systems, sample collection and laboratory analysis are expensive because of the many water quality indicators included in its computation. To maximize the accuracy of WQI prediction with the smallest possible set of water parameters, Mo et al. (2024) sought to identify the most important water parameters and the most reliable models, taking seasonal variations in the water environment into account. Because of the growing demand for freshwater, the effect of land use on water quality is becoming a worldwide problem. Gani et al. (2023) used satellite imagery to classify LULC types and the root mean squared (RMS)–WQI model to determine water quality (WQ) status. Most WQ indicators were within the ECR surface water guideline levels, and the RMS–WQI results classified water quality as 'fair' at all test locations, with scores ranging from 66.50 to 79.08. Analysis of groundwater resources is a crucial technical tool for preventing illness and reducing water pollution. Manocha et al. (2023) carried out water quality evaluations using sequential parametric data gathered in real time. Many state-of-the-art approaches to modelling water quality evolution rely on a single time-invariant model to assess water quality.
Previous studies on the impact of LULC changes on water quality show that urbanization and agricultural growth are degrading water quality. Yet, despite the use of remote sensing and GIS techniques, research applying machine learning to satellite data for groundwater prediction is lacking. This research attempts to bridge these gaps by analyzing LULC changes and predicting water quality data with modern machine learning techniques and GIS methodologies, which could substantially change how the surrounding environment is monitored and managed. The study presents a fresh approach that combines geographic data with machine learning algorithms to forecast water quality properties. A comprehensive LULC study is first used to identify changes in water quality, and forecasting models for water quality are then developed using several machine learning approaches. The study also includes a WQI for groundwater sources that incorporates relevant parameters such as pH, dissolved oxygen (DO), biochemical oxygen demand (BOD), nitrates, and coliform levels. These components are essential for evaluating the suitability of groundwater for human use and for understanding the effect of LULC variations on groundwater quality.
METHODS
Data collection
Geographical data, historical water quality records, satellite images, and topographical maps were collected. Landsat 7 and Landsat 8 imagery was downloaded from USGS Earth Explorer. Several machine learning techniques, including convolutional neural networks (CNN), random forest (RF), and support vector machines (SVM), were applied to construct prediction models for water quality evaluation. The models included geographical and meteorological characteristics and were trained on historical data. Their performance was assessed using accuracy, precision, recall, and F1-score.
Correlation coefficient and linear regression
Properly managing available water resources is crucial to fulfilling future water demand. Water quality is defined by numerous physicochemical characteristics, which fluctuate considerably with factors such as the water supply, the kind of pollutants, and seasonal changes. Statistical analysis extracts a significant amount of information from such data: descriptive statistics, regression analysis, and correlation analysis are valuable methods for evaluating average values and predicting variables, especially when direct assessment is challenging. Here, these methods are used to examine the physicochemical characteristics of a river basin.
A correlation coefficient measures the degree of association between two variables. Correlation shows that two variables co-vary; it does not prove that one causes the other. Some processes, such as precipitation and runoff, are directly linked, but two variables can also be correlated because both depend on a common factor: for instance, two solutes detected at different times and places could be correlated because they derive from the same source. Establishing causation requires process knowledge or other evidence that is not based on statistical analysis.
Correlation measures are dimensionless and scaled to lie within −1 ≤ r ≤ 1. A value of r = 0 indicates no association between the two variables. r is positive when there is a direct relationship, meaning that as one variable increases, the other also tends to increase, and negative when the two vary in opposite directions. When one variable represents time or location, correlation can be used to assess whether there is a temporal or spatial trend.
Researchers must take care to choose the right correlation measure. Relationships in data may be linear or nonlinear. A key concept is the monotonic relationship, in which y consistently increases (or consistently decreases) as x increases. Such a relationship may be nonlinear, especially when both variables are non-negative, following power-function, piecewise linear, or exponential patterns. In that case a linear correlation measure is not appropriate: nonlinearity weakens a linear measure, yielding a smaller correlation coefficient and lower significance than a linear relationship with the same degree of scatter. Spearman's rho, Kendall's tau, and Pearson's r are commonly used tests for correlation. Spearman's rho and Kendall's tau are rank-based and detect monotonic relationships, whereas the better-known Pearson's r measures linear correlation, a special case of monotonic correlation.
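The point about nonlinearity can be checked numerically. The sketch below uses hypothetical data following an exponential (monotonic but nonlinear) pattern and computes Pearson's r directly and Spearman's rho as Pearson's r applied to ranks, with tied values given their average rank. Kendall's tau is omitted for brevity.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # sorted positions i..j -> ranks i+1..j+1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical monotonic but strongly nonlinear relationship.
x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rho = {spearman_rho(x, y):.3f}")
```

Spearman's rho equals 1 here because y rises strictly with x, whereas Pearson's r falls below 1 because the relationship is not a straight line.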
Pearson's r
Let x̄ and ȳ denote the sample means of X and Y. If the correlation coefficient r between X and Y is close to +1 or −1, the two variables are strongly linearly related, and it is practical to fit a linear relationship of the form Y = AX + B.
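For reference, the standard definition of Pearson's r in terms of the sample means x̄ and ȳ, and the ordinary least-squares estimates of A and B in Y = AX + B, are as follows (textbook forms, not formulas reproduced from the study):

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

A = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
  = r\,\frac{s_y}{s_x},
\qquad
B = \bar{y} - A\,\bar{x}
```

Here n is the number of paired observations (x_i, y_i), and s_x and s_y are the sample standard deviations of X and Y.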
Correlation-based K-means clustering for water quality parameters
The associations between water quality metrics were investigated using K-means clustering, a prominent unsupervised machine learning method, to discover underlying patterns and groups in the multidimensional water quality dataset. By grouping similar observations on the basis of feature similarity, the technique made it easier to identify links between the various water quality measures. Each cluster reflected a distinct profile of water quality attributes, enabling researchers to investigate how the metrics varied across the dataset. The cluster centroids were then compared to identify similarities and differences in water quality characteristics. This provided valuable insight into the interdependence of water quality measurements and the complex linkages that govern water quality dynamics, supporting more informed decision-making and environmental management.
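A minimal sketch of the K-means step follows, using standard K-means (Lloyd's algorithm) on z-scored features. The choice of three Table 5 columns (DO, conductivity, and BOD5), k = 2, and initialization from the first k rows are assumptions made for illustration, since the clustering configuration is not specified.

```python
import math

# (DO mg/L, conductivity μmhos/cm, BOD5 mg/L) for each measurement event in Table 5.
SAMPLES = [
    (1.9, 2941, 26.9), (7.4, 35, 0.1), (3.1, 1416, 13.2), (7.0, 30, 0.5),
    (0.5, 1717, 21.2), (7.0, 55, 0.9), (0.8, 1239, 11.9), (0.6, 1787, 14.1),
    (1.8, 2012, 5.5), (0.8, 2156, 7.4), (4.1, 511.4, 6.788),
]


def standardize(rows):
    """Z-score each column so that parameters on different scales weigh equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points are used as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


labels, centroids = kmeans(standardize(SAMPLES), k=2)
print(labels)
```

On these rows, the sketch groups the three low-conductivity, high-DO samples (all with WQI = 0) together with the sample whose conductivity is 511.4 μmhos/cm, and places the remaining seven samples in the other cluster.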
Groundwater, hidden beneath the earth's surface, is shielded from direct sunlight, seasonal temperature variations, and industrial contaminants; it therefore has relatively stable temperatures and lower DO levels than surface water. Although surface runoff and rain eventually make their way into groundwater, pollutants impair its quality more slowly. Groundwater pH is influenced by the surrounding soil and rock formations: agricultural chemicals raise alkalinity, whereas industrial contaminants typically increase acidity. Conductivity is usually high because of minerals dissolved from geological formations, and it rises further with industrialization and farming. Biochemical oxygen demand is generally stable but may increase when organic matter enters groundwater, especially near farms. For groundwater, the most important WQI parameters are pH, conductivity, dissolved solids, hardness, and nitrate levels. These differ from those emphasized in surface water studies because groundwater is shaped by deeper geological and human-induced changes rather than by direct exposure to the atmosphere. A review of WQI studies underscores the need for groundwater-specific weighting techniques and criteria, with particular attention to pollutants that reflect its distinctive features. Monitoring these components helps ensure that groundwater remains safe and sustainable for use by people and by nature.
RESULTS
LULC changes of the Karamana River Basin during 2001, 2011, and 2020
Year | Water body (km²) | Built-up area (km²) | Plantation (km²) | Forest (km²) |
---|---|---|---|---|
2001 | 7.3 | 63.65 | 507.66 | 123.41 |
2011 | 6.12 | 90.71 | 486.8 | 118.37 |
2020 | 5.65 | 132.72 | 461.08 | 102.55 |
Overall, the LULC changes from 2001 to 2020 illustrate the significant impact of human activities on various land categories. The decrease in water bodies, plantations, and forests, coupled with the increase in built-up areas, emphasizes the need for sustainable land management practices and policies that balance development with ecological preservation. Addressing these trends is essential to ensure ecosystems' and communities' long-term health and resilience.
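The changes described above can be quantified directly from the LULC table. The short sketch below computes the percentage change in each class between two years; the dictionary keys are shorthand labels for the table columns.

```python
# Areas (km²) of each LULC class in the Karamana River Basin, from the LULC table.
AREAS = {
    2001: {"water body": 7.3, "built-up": 63.65, "plantation": 507.66, "forest": 123.41},
    2011: {"water body": 6.12, "built-up": 90.71, "plantation": 486.8, "forest": 118.37},
    2020: {"water body": 5.65, "built-up": 132.72, "plantation": 461.08, "forest": 102.55},
}


def percent_change(start, end):
    """Relative change from start to end, in percent."""
    return (end - start) / start * 100.0


def lulc_changes(areas, year_from, year_to):
    """Percentage change of every LULC class between two years."""
    return {cls: percent_change(areas[year_from][cls], areas[year_to][cls])
            for cls in areas[year_from]}


for cls, pct in lulc_changes(AREAS, 2001, 2020).items():
    print(f"{cls:>10}: {pct:+.1f}%")
```

Between 2001 and 2020, built-up area roughly doubles (about +108.5%), while water bodies (about −22.6%), forest (about −16.9%), and plantation (about −9.2%) all shrink; the four classes sum to about 702 km² in each year.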
Data on water quality parameters and LULC changes for 2001, 2011, and 2020 are shown in Table 2.
Year | Water body (km²) | Built-up area (km²) | Plantation (km²) | Forest (km²) | WQ metric |
---|---|---|---|---|---|
2001 | 7.3 | 63.65 | 507.66 | 123.41 | 2 |
2011 | 6.12 | 90.71 | 486.8 | 118.37 | 2 |
2020 | 5.65 | 132.72 | 461.08 | 102.55 | 1 |
Table 2 displays LULC statistics and a water quality metric for 2001, 2011, and 2020. Compared with the preceding LULC table, it adds a 'WQ metric' column containing a single water quality metric that can be derived from several water quality measurements. These data make it easier to analyze temporal changes in land use patterns and their possible effects on water quality. Trends in urbanization, deforestation, and agricultural growth, together with the accompanying variations in water quality measures over time, may provide insights into environmental dynamics and guide sustainable land management strategies. Water quality is closely linked to LULC, as expanding urban areas and greater agricultural activity usually result in higher pollutant loads into water systems. Urbanization alters the natural surroundings and reduces their ability to filter pollutants, and the growing extent of impermeable surfaces generates more runoff, carrying pollutants into water bodies and influencing WQI values and individual water quality parameters. Here, WQI values of 1 and 2 correspond to 'good' and 'fair' quality, respectively. These general categories can, however, hide crucial information. For instance, in 2001 the WQI was rated 2 ('fair'), but DO levels were below those required for aquatic life and faecal coliform levels were high, suggesting significant pollution. This discrepancy exposes the limits of relying solely on the WQI, which cannot adequately reflect the influence of individual water quality parameters. Overall, variations in the WQI track variations in LULC, which emphasizes the need for sustainable land management to preserve water quality.
The equation for multiple linear regression
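For reference, a multiple linear regression with k predictors takes the general form below, with coefficients usually estimated by ordinary least squares. The choice of predictors and response (for example, LULC class areas and a water quality measure) is an illustrative assumption here, not a reproduction of the study's fitted equation.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon,
\qquad
\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}
```

Here X is the n × (k + 1) design matrix whose first column is all ones, and y is the vector of n observed responses. A unique estimate requires XᵀX to be invertible, which in particular needs at least k + 1 observations.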
Year | Temperature (°C) | DO (mg/L) | pH | Conductivity (μmhos/cm) | BOD5 (mg/L) | Nitrate-N + nitrite-N (mg/L) | FC (MPN/100 mL) | TC mean (MPN/100 mL) |
---|---|---|---|---|---|---|---|---|
2001 | 29.38 | 0.47 | 6.35 | 1.78 | 4.058 | 0.96 | 1.75 | 1.74 |
2011 | 27.25 | 3.21 | 7.09 | 1.25 | 9.91 | 1.8 | 0.61 | 1.44 |
2020 | 25.34 | 6.53 | 7.75 | 0.78 | 15.18 | 2.56 | 0.41 | 1.17 |
Table 3 shows the annual averages of water quality parameters at the Karamana River monitoring sites for 2001, 2011, and 2020. Samples were collected from the same sites each year to ensure consistency. While these annual averages give a broad view of long-term trends in water quality, they may hide seasonal variations or transient changes that matter for effective environmental monitoring and management. More frequent sampling (daily, weekly, or monthly) would capture these fluctuations and give a more realistic picture of water quality dynamics; relying only on yearly averages may therefore delay the detection and resolution of specific water quality problems. Future studies could complement yearly averages with monthly or seasonal sampling to give a fuller understanding of water quality variations, enabling more preventive measures and better management techniques.
Water quality parameters standard values
The water quality standard parameters are mentioned in Table 4. The optimal water temperature for most aquatic life ranges from 0 to 20 °C. Temperatures between 20 and 25 °C are suitable for some species but may cause stress, while temperatures above 25 °C can be harsh, lowering oxygen levels and increasing disease risk. DO levels above 6 mg/L support diverse aquatic life and healthy ecosystems. Levels between 4 and 6 mg/L support some species but limit sensitive ones, and levels below 4 mg/L cause hypoxia, leading to fish deaths and severe effects. A pH range of 6.5–8.5 is ideal for most species and supports natural activities, while a range of 6–9 can sustain some life but may indicate stress – pH values below 6 or above 9 harm physiological functions and ecological balance. Conductivity values below 500 μmhos/cm indicate low dissolved solids and pollutants, while values between 500 and 1,000 μmhos/cm are acceptable but may suggest ion or mineral impact. Values above 1,000 μmhos/cm indicate contamination from various sources. Biochemical oxygen demand levels below 3 mg/L reflect low organic pollution and high water quality, while 3–6 mg/L indicate moderate pollution. Levels above 6 mg/L suggest high pollution, reducing DO levels and impacting ecosystems. Nitrate and nitrite levels below 10 mg/L are safe for ecosystems and humans, while levels between 10 and 20 mg/L may cause eutrophication but are manageable. Levels above 20 mg/L indicate contamination and are harmful to water quality. The absence of FC (0 MPN/100 mL) indicates no contamination and low illness risk. Levels below 100 MPN/100 mL indicate mild pollution, still safe for some uses, while levels above 100 MPN/100 mL indicate high pollution, posing health hazards. The absence of TC (0 MPN/100 mL) indicates low contamination and high quality, while levels below 500 MPN/100 mL indicate minor pollution, treatable for use. Levels above 500 MPN/100 mL indicate severe pollution and urgent health risks.
Parameter | Good | Fair | Poor |
---|---|---|---|
Temperature | 0–20 °C | 20–25 °C | >25 °C |
DO | >6 mg/L | 4–6 mg/L | <4 mg/L |
pH | 6.5–8.5 | 6–9 | <6 or >9 |
Conductivity | <500 μmhos/cm | 500–1,000 μmhos/cm | >1,000 μmhos/cm |
Biochemical oxygen demand | <3 mg/L | 3–6 mg/L | >6 mg/L |
Nitrate-N + nitrite-N | <10 mg/L | 10–20 mg/L | >20 mg/L |
FC | 0 MPN/100 mL | <100 MPN/100 mL | >100 MPN/100 mL |
TC | 0 MPN/100 mL | <500 MPN/100 mL | >500 MPN/100 mL |
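The thresholds in Table 4 can be encoded as simple rules for screening individual measurements. In the sketch below, values that fall exactly on a band boundary (for example, DO = 6 mg/L or FC = 100 MPN/100 mL) are assigned to a band by assumption, because Table 4 does not specify them; the parameter keys are hypothetical names.

```python
# Good/Fair/Poor bands transcribed from Table 4. Boundary values are
# assigned by assumption because Table 4 leaves them unspecified.
RULES = {
    "temperature": lambda v: "Good" if v <= 20 else "Fair" if v <= 25 else "Poor",  # °C
    "do": lambda v: "Good" if v > 6 else "Fair" if v >= 4 else "Poor",  # mg/L
    "ph": lambda v: "Good" if 6.5 <= v <= 8.5 else "Fair" if 6 <= v <= 9 else "Poor",
    "conductivity": lambda v: "Good" if v < 500 else "Fair" if v <= 1000 else "Poor",  # μmhos/cm
    "bod": lambda v: "Good" if v < 3 else "Fair" if v <= 6 else "Poor",  # mg/L
    "nitrate_nitrite": lambda v: "Good" if v < 10 else "Fair" if v <= 20 else "Poor",  # mg/L
    "fc": lambda v: "Good" if v == 0 else "Fair" if v <= 100 else "Poor",  # MPN/100 mL
    "tc": lambda v: "Good" if v == 0 else "Fair" if v <= 500 else "Poor",  # MPN/100 mL
}


def classify_sample(sample):
    """Map each measured parameter to its Table 4 band."""
    return {name: RULES[name](value) for name, value in sample.items()}


# First measurement event in Table 5.
event = {"temperature": 25.5, "do": 1.9, "ph": 7.7, "conductivity": 2941,
         "bod": 26.9, "nitrate_nitrite": 2.8, "fc": 10875, "tc": 37375}
print(classify_sample(event))
```

For this event every parameter except pH and nitrate + nitrite falls in the Poor band, in line with its WQI of 2 in Table 5; how the bands are aggregated into the WQI is not specified here.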
Correlation between water quality parameters
Table 5 lists key water quality measures for the monitored water body: temperature, DO, pH, conductivity, biochemical oxygen demand, nitrate and nitrite, total and faecal coliform counts, and the WQI. Together, these measures determine the overall condition of the aquatic environment and, in turn, the water's suitability for drinking, recreation, and supporting aquatic life.
Temperature (°C) | DO (mg/L) | pH | Conductivity (μmhos/cm) | BOD5 (mg/L) | Nitrate-N + nitrite-N (mg/L) | FC (MPN/100 mL) | TC (MPN/100 mL) | WQI |
---|---|---|---|---|---|---|---|---|
25.5 | 1.9 | 7.7 | 2,941 | 26.9 | 2.8 | 10,875 | 37,375 | 2 |
25.1 | 7.4 | 7 | 35 | 0.1 | 0 | 165 | 409 | 0 |
27 | 3.1 | 7.2 | 1,416 | 13.2 | 4.41 | 2,992 | 19,917 | 2 |
26.3 | 7 | 7.1 | 30 | 0.5 | 0.32 | 211 | 496 | 0 |
29 | 0.5 | 7.1 | 1,717 | 21.2 | 4.41 | 3,317 | 15,833 | 2 |
29 | 7 | 7.4 | 55 | 0.9 | 0.32 | 800 | 1,400 | 0 |
28 | 0.8 | 7.1 | 1,239 | 11.9 | 1.87 | 4,250 | 11,000 | 2 |
27.6 | 0.6 | 7 | 1,787 | 14.1 | 1.49 | 8,542 | 17,500 | 2 |
26 | 1.8 | 6.8 | 2,012 | 5.5 | 1.8 | 20,458 | 31,167 | 2 |
28 | 0.8 | 6.8 | 2,156 | 7.4 | 1 | 14,825 | 20,733 | 2 |
28.571 | 4.1 | 6.775 | 511.4 | 6.788 | 1.373 | 2,150 | 3,100 | 2 |
Values for all parameters were recorded throughout the observation period, with each row in the table corresponding to a distinct measurement event. Temperature data provide insights into thermal changes that can affect chemical reactions, biological activity, and habitat suitability for aquatic animals. DO levels reflect the water's capacity to sustain aerobic life, and deviations from optimal levels indicate potential environmental stress. pH affects the solubility of various substances and the growth of aquatic plants and animals. Conductivity, the water's ability to carry an electrical current, is often correlated with salinity and dissolved ion concentrations. Biochemical oxygen demand measures the oxygen required by microbes to decompose organic material and thus indicates the level of organic pollution. Nitrate and nitrite levels point to potential nutrient pollution, while total and faecal coliform counts indicate bacterial contamination. In the table, WQI values of 0, 1, and 2 denote good, fair, and poor quality, respectively.
Water quality is considered ‘good’ (WQI = 0) when it meets or exceeds quality standards, making it fit for drinking, recreation, or providing habitat for aquatic organisms. High-quality water typically has environmental parameters such as temperature, DO, pH, and conductivity within acceptable limits. A WQI score of 1 signifies ‘fair’ water quality. This indicates that the water may differ somewhat from ideal conditions or contain significant amounts of specific contaminants yet remains suitable for various uses. Treatment or management interventions may be needed to make fair-quality water more appropriate for particular purposes, though it may still meet minimum legal standards. A WQI of 2 denotes ‘poor’ water quality, indicating that the water does not meet acceptable standards, posing risks to human health and the environment. Poor-quality water may exhibit significant deviations from ideal conditions, such as high contaminant levels, low DO, or abnormal pH values. Immediate action and comprehensive management strategies are often required to address the root causes of poor water quality and mitigate associated risks.
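The correlation analysis can be reproduced in outline from the Table 5 values. The sketch below computes pairwise Pearson coefficients for an illustrative subset of the columns; with only 11 measurement events, the coefficients are sensitive to individual rows.

```python
import math

# Columns transcribed from Table 5 (one value per measurement event).
TABLE5 = {
    "temperature": [25.5, 25.1, 27, 26.3, 29, 29, 28, 27.6, 26, 28, 28.571],
    "do": [1.9, 7.4, 3.1, 7, 0.5, 7, 0.8, 0.6, 1.8, 0.8, 4.1],
    "conductivity": [2941, 35, 1416, 30, 1717, 55, 1239, 1787, 2012, 2156, 511.4],
    "bod": [26.9, 0.1, 13.2, 0.5, 21.2, 0.9, 11.9, 14.1, 5.5, 7.4, 6.788],
    "fc": [10875, 165, 2992, 211, 3317, 800, 4250, 8542, 20458, 14825, 2150],
    "tc": [37375, 409, 19917, 496, 15833, 1400, 11000, 17500, 31167, 20733, 3100],
}


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(table):
    """Pearson's r for every ordered pair of columns."""
    names = list(table)
    return {(a, b): pearson_r(table[a], table[b]) for a in names for b in names}


matrix = correlation_matrix(TABLE5)
for pair in [("fc", "tc"), ("do", "conductivity"), ("temperature", "do")]:
    print(pair, round(matrix[pair], 2))
```

Faecal and total coliform counts come out strongly positively correlated, and DO and conductivity strongly negatively correlated.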
Confusion matrix for LULC classification:
The RF correctly classified 213 forest samples, while 26 forest samples were misclassified. It correctly categorized 212 plantation samples, and 20 were misclassified. In addition, 262 built-up area samples were correctly classified, and 23 were misclassified. Of the 223 water body samples, 21 were misclassified.
The SVM correctly classified 237 forest samples, while 13 were misclassified. The model successfully categorized 237 plantation samples, with 17 plantation samples misclassified. Additionally, 228 built-up area samples were correctly classified, while 22 were misclassified. For water bodies, 228 samples were classified correctly, and 18 samples were misclassified.
The CNN properly categorized 227 forest samples, with five forest samples misclassified. The model successfully categorized 249 plantation samples, with one plantation sample misclassified. Additionally, 256 built-up area samples were correctly classified, with three misclassified. A total of 258 samples were classified correctly for water bodies, and 1 sample was misclassified. These three models are compared and contrasted based on performance indicators.
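As a consistency check, the reported counts can be turned into overall accuracy and per-class recall. This minimal sketch assumes that each "misclassified" figure counts samples of that class assigned to some other class. The full confusion matrices, which would be needed for precision, are not reproduced in the text. The dictionary `COUNTS` and the helper names are illustrative.

```python
# Reported (correctly classified, misclassified) counts per class, from the text above.
COUNTS = {
    "CNN": {"forest": (227, 5), "plantation": (249, 1), "built-up": (256, 3), "water": (258, 1)},
    "SVM": {"forest": (237, 13), "plantation": (237, 17), "built-up": (228, 22), "water": (228, 18)},
    "RF": {"forest": (213, 26), "plantation": (212, 20), "built-up": (262, 23), "water": (223, 21)},
}


def overall_accuracy(per_class):
    """Overall accuracy = correctly classified samples / all samples."""
    correct = sum(c for c, _ in per_class.values())
    total = sum(c + m for c, m in per_class.values())
    return correct / total


def per_class_recall(per_class):
    """Recall for a class = correct / (correct + misclassified), assuming
    'misclassified' counts samples of that class assigned to another class."""
    return {name: c / (c + m) for name, (c, m) in per_class.items()}


def macro_recall(per_class):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(per_class)
    return sum(recalls.values()) / len(recalls)


if __name__ == "__main__":
    for model, per_class in COUNTS.items():
        total = sum(c + m for c, m in per_class.values())
        print(f"{model}: n={total}, accuracy={overall_accuracy(per_class):.2f}, "
              f"macro recall={macro_recall(per_class):.2f}")
```

Under this reading, each model was evaluated on 1,000 samples. The computed accuracies (0.99, 0.93, and 0.91) and macro-averaged recalls agree with Table 6 to two decimal places.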
Performance metrics
Table 6 summarizes the performance of the three classifiers (CNN, SVM, and RF) on four landscape classes: water bodies, built-up areas, plantations, and forests. The CNN performed best, with 99% accuracy, precision and recall of 0.99, and an F1-score of 0.98, and is therefore recommended as the preferred classification strategy. The SVM showed slightly lower but still strong performance, with an accuracy of 93%, precision and recall of 0.93, and an F1-score of 0.92. The RF, while effective, had the lowest accuracy (91%), with precision, recall, and F1-score of 0.91. Overall, the CNN is the most suitable of the three for the present multi-class classification task.
Model | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
CNN | 0.99 | 0.99 | 0.99 | 0.98 |
SVM | 0.93 | 0.93 | 0.93 | 0.92 |
RF | 0.91 | 0.91 | 0.91 | 0.91 |
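In Table 6, the CNN's F1-score (0.98) is slightly below its precision and recall (0.99). If the scores are macro-averaged over the four classes, this can occur because the mean of the per-class F1-scores need not equal the harmonic mean of the mean precision and mean recall. The averaging scheme is not stated, so this is an assumption, and rounding may also contribute. The sketch below computes macro metrics from a confusion matrix. The function name `macro_metrics` and the two-class matrix are hypothetical.

```python
def macro_metrics(cm):
    """Macro-averaged precision, recall and F1-score from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Classes with no predictions (or no samples)
    get a score of 0.0 to avoid division by zero.
    """
    n = len(cm)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                           # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0   # harmonic mean of p and r
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Hypothetical two-class matrix (not from the study).
    cm = [[1, 1],
          [0, 1]]
    p, r, f1 = macro_metrics(cm)
    print(f"macro precision={p:.3f}, macro recall={r:.3f}, macro F1={f1:.3f}")
```

For this matrix, the macro precision and recall are both 0.75, while the macro F1-score is about 0.667. Each class has an F1-score of 2/3 because one class has high precision and low recall and the other has the reverse.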
DISCUSSION
The statistics reveal a clear association between water quality indicators and LULC patterns in the Karamana River Basin (KRB), both of which changed notably between 2001 and 2020. These changes, documented through comprehensive data and visual representations, provide crucial insights into the evolving environmental dynamics of the area. The LULC analysis shows remarkable changes in certain categories over the study period: the extent of water bodies and forest decreased, while built-up and plantation areas increased. These complex processes of urbanization, deforestation, and agricultural growth are expected to profoundly affect the basin's ecological balance, hydrological regimes, and biodiversity. Water quality monitoring also provides important information on riverine ecosystems and aquatic habitats. Biochemical oxygen demand, temperature, pH, DO, and conductivity should be monitored because they vary over time. As key indicators of water quality, they trace the evolution of pollution, ecological stress, and human influence over the study period. Moreover, the correlation analysis clarifies the relationships between the water quality metrics, revealing possible interdependencies and causal links. Strong positive or negative correlations, such as that between faecal coliform (FC) and total coliform (TC) counts, provide important information about pollution sources and transmission pathways. Weaker correlations, such as that between temperature and DO, give a more nuanced understanding of environmental dynamics and ecosystem function. The effects of urbanization on water bodies have been documented for some time. For example, Ramachandra et al. (2012) studied rapid urbanization in Indian cities, particularly the Bangalore region, where urban expansion led to the drying of lakes and the deterioration of water quality.
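The correlation analysis described above can be sketched with the Pearson coefficient. This assumes Pearson's r was used, since the coefficient type is not stated here. The values in `SAMPLE` are invented solely to exercise the code: the FC and TC series are constructed to be strongly correlated, and the temperature and DO series weakly so. The helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def correlation_matrix(data):
    """Pairwise Pearson r for a dict mapping parameter name -> measurements."""
    names = list(data)
    return {(a, b): pearson_r(data[a], data[b]) for a in names for b in names}


# Hypothetical measurements, for illustration only (not the study's data).
SAMPLE = {
    "temperature": [26.1, 27.4, 28.0, 29.2, 30.5],
    "DO": [7.0, 6.6, 7.1, 6.5, 6.8],
    "FC": [120, 180, 210, 260, 330],
    "TC": [400, 560, 650, 790, 980],
}

if __name__ == "__main__":
    r = correlation_matrix(SAMPLE)
    for pair in [("FC", "TC"), ("temperature", "DO")]:
        print(pair, round(r[pair], 2))
```

A coefficient near +1 or -1 signals a strong linear relationship, such as the one expected between FC and TC counts. A value near 0 signals a weak one. Neither establishes causation on its own.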
Sridhar & Sathyanathan (2020) showed that urbanization significantly affected the spatial extent of surface water bodies, with urban growth causing a major decline and loss of water bodies. This is consistent with the observed decrease in water bodies in the KRB, accompanied by deteriorating water quality metrics such as increased biochemical oxygen demand and bacterial contamination. Mohan et al. (2011) examined the impact of rapid urbanization on LULC change in Delhi, showing that the city is expanding into peripheral regions, converting rural areas to urban use, and causing a significant decrease in crop and fallow land (Luo et al. 2020). The effect of land use and urbanization on water quality showed significant seasonality, being stronger in winter than in summer; however, as land use and urbanization disturbance increased, this seasonality diminished. It was suggested that urban expansion be diverted towards wasteland or sandy areas rather than productive agricultural land. Furthermore, Maurya et al. (2021) found a positive correlation between land use change and site-wise differences in the WQI, implying that human activities have strongly affected the water quality of the upper Ganges River. Regression analysis revealed similar relationships in the KRB, where the reduction in forest and plantation areas coincided with deteriorating water quality.
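The regression relationship mentioned above can be illustrated with a simple least-squares fit of a water quality metric against land cover. This is a minimal sketch assuming ordinary least squares with a single predictor, because the regression model and variables are not specified here. The function `ols_fit` and the forest-cover and BOD values are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    # If y is constant the fit is exact (slope 0); report R^2 = 1 by convention.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return intercept, slope, r_squared


if __name__ == "__main__":
    # Hypothetical series for illustration only (not the study's data):
    # forest cover (% of basin area) and BOD (mg/L) at five time points.
    forest_pct = [48.0, 45.0, 41.0, 38.0, 35.0]
    bod_mg_l = [1.8, 2.1, 2.6, 2.9, 3.3]
    a, b, r2 = ols_fit(forest_pct, bod_mg_l)
    print(f"BOD = {a:.2f} + ({b:.3f}) * forest%,  R^2 = {r2:.3f}")
```

A negative slope in such a fit means that lower forest cover is associated with higher BOD. With only a handful of time points, this indicates association rather than causation.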
Machine learning algorithms have been studied extensively for LULC evaluation. Mutale et al. (2024) evaluated the effectiveness and reliability of SVM, RF, and artificial neural network algorithms for detecting changes in land use and land cover. Arrighi & Castelli (2023) assessed how well five supervised machine learning classifiers could predict the ecological status of rivers using data from the catchments of ecological monitoring stations, including land use, climate, morphology, and water management parameters, and compared the classifiers' performance with the outcomes of canonical correlation analysis. Hussein et al. (2023) demonstrated the effectiveness of decision tree, SVM, K-nearest neighbours (K-NN), ET, and discriminant analysis (DA) classifiers, all available in the Classification Learner app in MATLAB. Their results showed SVM to be the most effective classifier for both raw and normalized data, with training prediction accuracies of 90.8% (raw) and 89.2% (normalized) and testing accuracies of 86.67% (raw) and 93.33% (normalized). Building on this work, the present study applied a CNN, an RF, and an SVM to the land cover classification task. The confusion matrices and performance metrics show how well each method classifies land cover. The CNN performed exceptionally, with 99% accuracy, 99% precision and recall, and a 98% F1-score. Its correct classification of forest, plantation, water body, and built-up area samples demonstrates its adaptability and efficacy, making it well suited to multi-class classification tasks. The SVM performed robustly but with slightly lower accuracy than the CNN, achieving precision and recall of 93% and an F1-score of 92%.
Although the RF achieved a respectable 91% on all four performance criteria used to evaluate the classification models (accuracy, precision, recall, and F1-score), it was less robust than the CNN and SVM. These metrics can also be represented graphically to compare the models visually. The CNN outperforms the other two models in accurately classifying land cover types. Even so, the results suggest that RF remains suitable for land cover classification tasks, providing reliable accuracy. Such classifiers can be highly advantageous in applications such as environmental monitoring and metropolitan strategic planning, that is, the careful and deliberate management of natural resources in metropolitan regions. Analyzing the impacts of LULC changes on water quality indicators reveals the intricate network of interactions between humans and the river's natural processes. These findings highlight the need to prioritize water quality monitoring programmes and development strategies to preserve the area's ecological harmony and economic success.
CONCLUSIONS
Water quality fluctuations and land use/land cover changes in the KRB demonstrate the complex relationship between human activities and ecological health. Over the period from 2001 to 2020, significant shifts in land cover were observed, including a decrease in water body coverage and an expansion of built-up areas, indicative of urbanization and infrastructure development. Concurrently, declines in plantation and forested regions suggest potential impacts of agricultural practices and deforestation on the landscape. These changes affect water quality, as evidenced by fluctuations in critical metrics such as temperature, DO, pH, and pollutant concentrations. While water quality was generally classified as ‘good’ in 2001 and 2020, a shift to ‘fair’ quality in 2011 underscores the vulnerability of aquatic ecosystems to changing land use patterns. High levels of biochemical oxygen demand, nitrate/nitrite concentrations, and coliform counts in specific years point to pollution sources that call for management and conservation efforts. The use of deep learning and machine learning models for land cover classification further shows how new approaches can support environmental monitoring and decision-making. Taken together, the findings indicate that the KRB and its environs need ecologically conscious land management practices to safeguard water resources and maintain ecological harmony.
ACKNOWLEDGEMENTS
The authors are grateful to the Department of Civil Engineering, Chandigarh University, for providing the infrastructure required to perform the research work.
FUNDING
This study received no funding.
ETHICS STATEMENT
This research did not involve human or animal subjects.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare no conflict of interest.