Abstract
With growing urbanization, water contamination has become a serious problem. Water quality is usually assessed using physicochemical parameters, which require manual sample collection. Moreover, physicochemical parameters alone are insufficient for water quality monitoring, because heavy rainfall and an abundance of air pollutants also cause water pollution. This study therefore modifies water quality monitoring by treating natural factors as influencing parameters and by using the latest technology for easy, global sampling coverage. The Rawal watershed is investigated using (a) physicochemical parameters, (b) air pollutants such as nitrogen dioxide (NO2), and (c) meteorological variables such as wind speed for the period June 2018 to September 2022. Correlation and regression analyses are performed. The results show negative correlations of NO2 with total dissolved solids (TDS) (magnitude range, 0.51–0.85), turbidity (0.53–0.65), pH (0.5–0.75), and dissolved oxygen (DO) (0.5–0.82), and a positive correlation with electrical conductivity (EC) (0.54–0.85). Regression analysis with LightGBM, multi-layer perceptron (MLP), and support vector machine (SVM), taking air pollutants and meteorological parameters as independent variables, gives root-mean-square errors (RMSE) ranging from 0.015 to 0.18. MLP gave RMSEs of 0.18 and 0.003 for TDS and pH, respectively. SVM performed well for DO, turbidity, and EC, with RMSE ranging from 0.015 to 0.027. Moreover, the floods of August 2022 are taken as a case study.
HIGHLIGHTS
Impact assessment of air pollutants on physicochemical parameters.
Meteorological features can have a moderate impact on water quality, i.e., wind speed on chl-α, EC, DO, and TDS, and air temperature on DO and TDS in August and September.
Machine learning approaches, i.e., LightGBM, MLP, and SVM, are applied for the analysis.
Floods can have a negative impact on water quality by introducing excess pollutants and nutrients into the water.
INTRODUCTION
With the growing human population, the demand for goods and other resources is increasing rapidly, giving rise to water and air pollution, which in turn affects the entire ecosystem and human health (Fuller et al. 2022). Water quality is affected by agricultural, industrial, and urban anthropogenic activities that release large quantities of pollutants, including nutrients, pathogens, and toxins, into surface waters. According to the latest United Nations (UN) report, more than 80% of the wastewater discharged into rivers results from human activities (Nations 2022). The discharge of wastewater (e.g., brine) degrades water quality, so that the water cannot be used directly for potable supply (via desalination) or for industrial applications (Panagopoulos 2021, 2022; Panagopoulos & Giannika 2022). Along with the anthropogenic causes of water contamination, the effects of air pollution and climatic changes on water quality cannot be ignored (Kan et al. 2012; Matyssek et al. 2012). An abundance of certain air pollutants, including nitrogen (N) compounds, can accelerate nutrient pollution or eutrophication in water, triggering a complex chain of events that harms the aquatic ecosystem (National Research Council 2000; Nie et al. 2018). In addition, high atmospheric carbon dioxide (CO2) concentrations can increase biological productivity in water bodies and lead to acidification (Doney et al. 2009), which has a direct or indirect impact on marine organisms. Besides air pollutants, climatic changes are a cause for concern for water contamination because they alter the water cycle (Stanković et al. 2019). Over the years, meteorological events such as heavy precipitation and floods have intensified with climatic changes, affecting water quality (Puczko & Jekatierynczuk-Rudczyk 2020).
Pakistan suffers from contaminated surface water and groundwater because of its growing urban population and the resulting rise in agricultural and industrial pollution (Fatima et al. 2022; Mehmood et al. 2022). This polluted water causes major waterborne diseases such as typhoid and cholera (Shah et al. 2016). Like most countries, Pakistan relies on grab sampling, i.e., collecting manual samples on site, for water quality assessment and management (Ahmed et al. 2021). This is a laborious and time-consuming task that depends on manual labour and is limited to collecting samples from the inlet and outlet streams of the sampling sites. Remote sensing is a newer approach that can enhance the sampling process by acquiring data at high spatial and temporal resolution from thousands of sampling points at a time (Usali & Ismail 2010; Gholizadeh et al. 2016). Remote sensing technology is also used to monitor the atmosphere (Martin 2008; Yang et al. 2017). This creates an opportunity to analyse air quality, meteorological factors, and other factors affecting the physicochemical parameters for any location at any time.
The associations among air pollutants, meteorological factors, and physicochemical water quality parameters remain unclear, because the influence of natural factors on water health is not backed by concrete evidence. Establishing such links requires continuous manual sample collection with tools and equipment, which is complex and time-consuming. In this study, the data collection problem is addressed by using remote sensing and data reanalysis techniques to acquire the data samples. Physicochemical parameters alone are insufficient to determine overall water quality, because naturally occurring phenomena, such as heavy precipitation and an abundance of air pollutants, can decompose and transport harmful pollutants or nutrients into water bodies. Thus, the present study proposes an improved water quality monitoring model based on a hybrid of remote sensing and data mining techniques, with a unique set of monitoring data that treats natural factors as influencing parameters and uses the latest technology for easy, global coverage in sample collection. Three categories of data, i.e., (a) air pollutants, (b) meteorological parameters, and (c) physicochemical parameters, are collected for the monsoon months (June to September) of 2018–2022 for the stream network of the Rawal watershed, located in Islamabad, Pakistan.
The unique dataset comprises 16 parameters: six air pollutants, namely, (i) carbon monoxide (CO), (ii) nitrogen dioxide (NO2), (iii) ozone (O3), (iv) sulphur dioxide (SO2), (v) formaldehyde (HCHO), and (vi) methane (CH4), acquired from the Sentinel-5 Precursor Level 2 (S5P-L2) TROPOspheric Monitoring Instrument (TROPOMI); three meteorological parameters, namely, (i) air temperature, (ii) wind speed, and (iii) total precipitation, taken from the ERA5 Climate Reanalysis Project (ERA5-CRP); and seven physicochemical parameters, i.e., (i) total dissolved solids (TDS), (ii) pH, (iii) electrical conductivity (EC), (iv) Secchi disk depth (SDD), (v) dissolved oxygen (DO), (vi) turbidity (Tur), and (vii) chlorophyll-α (chl-α), derived from Sentinel-2 Multispectral Imager (S2-MSI) Level 1C (L1C) imagery. Pearson correlation analysis and regression analysis using three machine learning techniques, i.e., LightGBM (LGBM), multi-layer perceptron (MLP), and support vector machine (SVM), are performed on the acquired data to explore the interrelationships among the three data categories. With the air pollutants and meteorological parameters as independent variables and the physicochemical water quality features as dependent variables, the lowest root-mean-square errors (RMSE) lie in the range of 0.015–0.18. In addition, the extracted data are examined using the weighted arithmetic water quality index (WAWQI) method. The new hybrid approach provides a practical and globally applicable method for analysing the associations among the features and monitoring the health of any water body.
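The WAWQI method mentioned above is not spelled out in this section. A minimal Python sketch of the commonly used formulation follows, assuming the standard definitions: sub-index Q_i = 100 (V_i − V_o)/(S_i − V_o), unit weight W_i = K/S_i with K = 1/Σ(1/S_i), and WQI = Σ Q_i W_i / Σ W_i. The ideal values V_o are conventionally 0 for most parameters, 7 for pH, and 14.6 mg/l for DO. The permissible limits S_i and observed values in the example are hypothetical placeholders, not the values used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameter:
    """One water quality parameter for the WAWQI calculation."""
    name: str
    observed: float     # V_i: observed (here, satellite-estimated) value
    standard: float     # S_i: permissible limit (hypothetical in the example)
    ideal: float = 0.0  # V_o: ideal value (conventionally 7 for pH, 14.6 mg/l for DO)


def wawqi(params: list[Parameter]) -> float:
    """Weighted arithmetic water quality index (sketch of the common formulation)."""
    if not params:
        raise ValueError("at least one parameter is required")
    # Proportionality constant K = 1 / sum(1 / S_i)
    k = 1.0 / sum(1.0 / p.standard for p in params)
    weighted_sum = 0.0
    weight_total = 0.0
    for p in params:
        q = 100.0 * (p.observed - p.ideal) / (p.standard - p.ideal)  # sub-index Q_i
        w = k / p.standard                                           # unit weight W_i
        weighted_sum += q * w
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical monthly means for one sample point (illustrative values only)
example = [
    Parameter("pH", observed=7.8, standard=8.5, ideal=7.0),
    Parameter("TDS", observed=250.0, standard=500.0),
    Parameter("DO", observed=8.0, standard=5.0, ideal=14.6),
]
print(round(wawqi(example), 2))
```

Because K normalizes the unit weights, Σ W_i equals 1 and the final division is formally redundant; it is kept so the function still reads as the general weighted-mean definition.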
Moreover, the study reveals that air pollutants and meteorological parameters have a significant impact on water quality, especially an abundance of certain air pollutants such as NO2, which is inversely correlated with the physicochemical parameters and causes a prominent disturbance in the concentrations of parameters including DO, pH, and TDS, with correlation magnitudes ranging from 0.61 to 0.86. Overall, the major contributions of the present study are summarized as follows:
An improved water quality monitoring model based on a hybrid of remote sensing and data mining techniques is proposed, using 16 unique parameters extracted for the stream network of the Rawal watershed in three categories, i.e., (a) air pollutants, (b) meteorological parameters, and (c) physicochemical parameters, for the monsoon months of June to September in 2018–2022. These sets of parameters have not previously been used in water quality monitoring models.
Correlation analysis is performed on the unique dataset extracted using remote sensing satellites and data reanalysis techniques to observe the dependencies between natural factors and physicochemical water quality parameters. This is the first study to acquire remote sensing data on air pollutants and meteorological features, together with physicochemical water quality parameters, for water quality monitoring.
Regression analysis is performed using three machine learning techniques, i.e., LGBM, MLP, and SVM, to find further dependencies among the 16 features, with the air pollutants and meteorological parameters taken as independent variables to predict five physicochemical water quality features individually, i.e., TDS, DO, pH, Tur, and EC.
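The correlation analysis listed among the contributions uses Pearson's coefficient. As a minimal sketch, assuming two equal-length series of monthly means (for example, NO2 and DO on matching dates; the values below are hypothetical, not data from this study), r can be computed as follows. Note that r is signed: a negative correlation reported with a magnitude of 0.51–0.85 corresponds to r between −0.85 and −0.51.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variance sums (the 1/n factors cancel in the ratio)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly means (illustrative only)
no2 = [1.1e-4, 1.4e-4, 1.2e-4, 1.6e-4]  # mol/m2
do = [8.4, 7.1, 7.9, 6.8]               # mg/l
print(f"r(NO2, DO) = {pearson_r(no2, do):.2f}")
```

On Python 3.10 and later, the standard library's `statistics.correlation(x, y)` returns the same coefficient; the explicit version above just makes the formula visible.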
This article is organized as follows: Section 2 discusses the related work. Section 3 covers the materials and methods used, i.e., the study area, data collection, and pre-processing, and describes the proposed methodology for the correlation and regression analysis along with the indexing method. Next, the results of the correlation and regression analysis are elaborated in Section 4, along with the flood case study. In Section 5, the conclusion and future work of this research are presented.
LITERATURE REVIEW
The relationship among water quality, air pollutants, and meteorological features is significant and has been studied over the years. The deposition of air pollutants such as NO2 can influence aquatic ecosystems, and eutrophication in water bodies has been attributed to high NO2 emissions (Lee & Schwartz 1981). In theory, eutrophication can result in a complex chain of events that disturbs water health (National Research Council 2000; Nie et al. 2018). However, supporting this theory requires access to tools and equipment that can show how strongly these natural factors affect water health. In general, this is a complex task, because such tools require manual collection and the acquired data are not available in real time. Thus, even though the literature suggests complex interactions among air pollutants, meteorological parameters, and physicochemical parameters, there is a gap between these suggested associations and the body of work that supports them and reflects the influence of air pollutants and air temperature on water quality. Moreover, no comparisons among such parameters using modern technology, including remote sensing or machine learning techniques, have been reported. Most studies have gathered data from monitoring stations to analyse the air and water pollution of their respective study areas separately, using statistical methods such as the generalized additive model (GAM) and other GIS-based techniques (Balaji et al. 2022; Ruhela et al. 2022). Table 1 summarizes the data acquisition, methodology, and parameters used in previous studies to address the relationships among meteorological, air, and water quality parameters.
Previous work on the relationships among meteorological, air, and water quality parameters
Paper | Study area | Time period | Data acquisition | Parameters | Methodology | Results |
---|---|---|---|---|---|---|
Balaji et al. (2022) | Madurai city, India | 2006 and 2020 | Tamil Nadu Pollution Control Board | Physicochemical, particulate matter, and lead | Spatial interpolation technique | Higher than prescribed limits |
Zhang et al. (2017) | Tianjin, China | 2000–2011 | Water quality monitoring station, meteorological station | Suspended solids, total dissolved solids, wind speed, rainfall, and solar radiation | GAM | Positive correlations between suspended solids and meteorological parameters |
Zhang & Zhi (2020) | Lake Erhai, China | 1999–2012 | China Meteorological Data Network, Yunnan Province Environmental Bulletin | Physicochemical, air temperature, precipitation, wind speed, and sunshine hours | GAM | Lower rainfall leads to poor water quality; total nitrogen increases with air temperature |
Zhang et al. (2021) | Lake Okeechobee, Florida, USA | January 1996–December 2010 | Eight sampling sites of the South Florida Water Management District | Physicochemical | GAM and random forest | Total nitrogen as predictor for chl-α |
Gintamo et al. (2021) | Cape Town, South Africa | 1979–2018 | South African Weather Service, National Groundwater Archive of South Africa | Physicochemical, temperature, precipitation | GIS | Decrease in water quality with high temperature and precipitation |
In addition to climatic changes and weather conditions, certain landscape changes can also worsen environmental quality, as suggested by Chen et al. (2021), who examined the impact of river dust on PM10 concentrations in the downstream areas of the Da'an and Dajia rivers; the results reveal that PM10 concentrations increased during both wet and dry seasons (Chen et al. 2021). Besides the traditionally used physicochemical features extracted with remote sensing techniques, other environmental factors can also be derived from satellite images. These include topographical parameters, which can be extracted from Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) data. Beeson et al. (2014) extracted slope and other topographic data from a DEM and concluded that special attention should be paid to the selection of spatial resolution and input source, as these keep changing with advancements in remote sensing and can prove critical in water quality models. Oyedotun (2019) examined land use changes for Chaohu Lake using Landsat MSS and OLI/TIRS images from 1979–2015; the results showed a 25% increase in built-up area, with improper land use activities degrading the basin. Oyedotun & Timothy (2022) extracted hydrological parameters such as the drainage network for Chaohu Lake using SRTM DEM data together with Landsat MSS and OLI/TIRS images for 1979–2015, focusing on the dynamics of the different streams by extracting changes in land use patterns.
However, existing studies acquire physicochemical water quality parameters and air pollutants separately using remote sensing. Physicochemical parameters have been gathered from satellite images from Landsat (Mohsen et al. 2021), Sentinel (Oiry & Barillé 2021), and MODIS (Arıman 2021). Semi-analytical methods relate the physicochemical parameters to satellite image bands and wavelengths, yielding equations that estimate parameters including temperature (Ritchie et al. 2003), total suspended solids (Imen et al. 2015; Sharaf El Din 2020), chl-α (Gitelson & Merzlyak 1998; Liu et al. 2018; Xu et al. 2019), Tur (Harrington et al. 1992; Kapalanga 2015; Lim & Choi 2015), and DO (Theologou et al. 2015; Ahmed et al. 2022a). Similarly, air pollutants such as CO, SO2, NO2, and PM10 have been estimated using S5P-L2 (Al-Alola et al. 2022) and MODIS satellites (Dinoi et al. 2010).
The literature reveals that modern technologies such as remote sensing and data mining techniques are more robust and economical methods for acquiring parameters. In addition, some studies discuss the influence of air pollutants and meteorological features on the physicochemical parameters of water. However, no studies were found that use modern technology, i.e., remote sensing and data mining techniques, to analyse the relationships among such features to enhance water quality monitoring and management. Thus, in this study, a hybrid of remote sensing and data mining techniques is employed to investigate the influence of air pollutants and meteorological variables on physicochemical water quality parameters through correlation and regression analysis. Unlike previous studies that assessed the air and water pollution of a study area separately, this study evaluates the interactions among meteorological, air, and water quality parameters. In addition, the data are acquired through remote sensing and data reanalysis techniques on a large scale from multiple sample points across the study area, unlike the traditional collection from the inlet or outlet points of streams and selected air/water quality monitoring stations. This provides a substantial set of data points for data analytics and for establishing strong associations among the extracted parameters.
MATERIALS AND METHODS
This article analyses the associations among air, meteorological, and physicochemical parameters for the Rawal watershed stream network. The steps and the respective methods used are discussed in this section.
Study area
The Rawal watershed (Ali et al. 2013) is located at latitude 33° 42′ N and longitude 73° 7′ E in the capital city of Islamabad. With a population of 1.2 million and a total area of 906.50 km2, Islamabad is the ninth most populous city of the nation. It has a humid subtropical climate with an average annual temperature of 28.5 °C. The cool winter season lasts for 3 months (December to March), with the lowest temperature of 4 °C in January, and the summers (May to August) are hot and humid, with the highest temperature of 38 °C in June. The monsoon season lasts from June to September, and the average annual precipitation is 1,143 mm. July has the most wet days (days with at least 1.016 mm of precipitation), with an average of 15.2 days. The average annual humidity of Islamabad is 77% (Weather by month 2022). The air quality of Islamabad falls in the 'Unhealthy for Sensitive Groups' category, with an air quality index of 115. However, it was termed the cleanest city of Pakistan for 2019, with an annual average PM2.5 concentration of 35.2 μg/m3, which makes it the tenth cleanest city of the nation; nonetheless, its air quality remains unsafe for young children and inhabitants with sensitive health conditions (PM2.5 2022).
(a) Map of Pakistan, (b) Rawal watershed, and (c) Rawal stream network.
Data collection
Three types of features are acquired in the current study, i.e., (a) air pollutants, (b) meteorological factors, and (c) physicochemical water quality features. The air pollutant data for the Rawal stream network are extracted from S5P-L2 satellite images (Veefkind et al. 2012) and comprise the concentrations of six pollutants: CO, NO2, ozone (O3), SO2, formaldehyde (HCHO), and methane (CH4). These data come from the TROPOMI imager on board the S5P satellite, which operates with a swath of 2,600 km, a spatial resolution of 3.5 km × 7 km, and four spectrometers, i.e., ultraviolet (UV), UV–visible (UV-VIS), near-infrared (NIR), and shortwave infrared (SWIR), covering eight spectral bands. NO2 concentrations are derived by measuring the solar light backscattered by the Earth's atmosphere with the UV-VIS spectrometer, using Band 4 for NO2 retrieval (Van Geffen et al. 2020). HCHO, O3, and SO2 concentrations are derived from Band 3 of the UV-VIS spectrometer (Theys et al. 2017; De Smedt et al. 2018; Garane et al. 2019). Band 7 of the SWIR spectrometer is used to measure CH4 and CO concentrations (Magro et al. 2021). The details of S5P-L2 are given in Table 2, and Table 3 describes each parameter along with its unit, sources, effects, water solubility, and other details. The meteorological parameters, namely, air temperature, wind speed, and total precipitation, are extracted from ERA5-CRP of the Copernicus Climate Change Service (C3S). ERA5-CRP (Hersbach et al. 2020) provides, through the Climate Data Store, a detailed record of the global atmosphere, ocean waves, and land surface. It combines historical observations with advanced modelling through data assimilation into a globally consistent dataset. ERA5-CRP has a spatial resolution of 31 km and is intended to cover 1950 to the present; however, only data from 1979 to July 2020 were available at the time of the study.
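Table 3 reports the TROPOMI gas columns in mol/m2, whereas much of the literature reports columns in molecules/cm2 and, for ozone, in Dobson units (DU). A small conversion sketch follows, using Avogadro's constant and the definition of 1 DU as about 2.6867 × 10^20 molecules/m2; the helper names and example column values are illustrative assumptions, not part of the study's workflow.

```python
AVOGADRO = 6.02214076e23         # molecules per mol (exact SI value)
DU_MOLECULES_PER_M2 = 2.6867e20  # 1 Dobson unit expressed in molecules per m2


def mol_m2_to_molec_cm2(column: float) -> float:
    """Convert a column density from mol/m2 to molecules/cm2 (1 m2 = 1e4 cm2)."""
    return column * AVOGADRO / 1e4


def mol_m2_to_dobson(column: float) -> float:
    """Convert a column density from mol/m2 to Dobson units."""
    return column * AVOGADRO / DU_MOLECULES_PER_M2


# Hypothetical tropospheric NO2 and total O3 columns (illustrative only)
print(f"{mol_m2_to_molec_cm2(1.0e-4):.3e} molecules/cm2")
print(f"{mol_m2_to_dobson(0.12):.1f} DU")
```

With these constants, 1 mol/m2 corresponds to roughly 2,241 DU.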
Daily totals of precipitation and daily averages of wind speed and air temperature are used in this study. Air temperature is taken at 2 m above the Earth's surface. Wind speed is taken at 10 m above the Earth's surface and represents the northward component of the neutral wind. Total precipitation is the accumulated rain and snow water that falls on the Earth's surface. Seven physicochemical parameters, namely, TDS, pH, EC, SDD, DO, Tur, and chl-α, are extracted from S2-MSI Level 1C (L1C) imagery (Baillarin et al. 2012). The S2 satellite carries the MSI imager with a swath width of 290 km. The S2-MSI L1C product contains geolocated top-of-atmosphere reflectance scaled by a factor of 10,000. The physicochemical parameters are estimated by applying adapted equations to the Sentinel-2 images. Ground truth data from the Rawal Dam Water Filtration Plant are used to verify the satellite-derived results, and for each parameter the equation with the lowest RMSE is selected as the adapted equation for the study area. The selected equation and the respective RMSE for each parameter are given in Table 4.
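The daily aggregation described above can be sketched as follows, assuming hourly ERA5 values have already been extracted for one grid cell as (timestamp, 2 m air temperature in K, 10 m northward wind in m s−1, total precipitation in m) tuples. The tuple layout and the conversions to °C and mm are illustrative choices, not part of the original workflow, and each hourly precipitation value is assumed to be the accumulation for that hour.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import fmean

# Hypothetical record layout: (timestamp, air_temp_K, wind_m_s, precip_m)
HourlyRecord = tuple[datetime, float, float, float]


def daily_aggregates(hourly: list[HourlyRecord]) -> dict[date, dict[str, float]]:
    """Daily mean air temperature and wind speed, and daily total precipitation."""
    by_day: dict[date, list[HourlyRecord]] = defaultdict(list)
    for record in hourly:
        by_day[record[0].date()].append(record)
    result = {}
    for day in sorted(by_day):
        rows = by_day[day]
        result[day] = {
            "air_temp_C": fmean(r[1] for r in rows) - 273.15,  # daily mean, K -> deg C
            "wind_m_s": fmean(r[2] for r in rows),             # daily mean
            "precip_mm": sum(r[3] for r in rows) * 1000.0,      # daily total, m -> mm
        }
    return result


hours = [
    (datetime(2022, 8, 1, 0), 300.15, 2.0, 0.001),
    (datetime(2022, 8, 1, 1), 302.15, 4.0, 0.002),
    (datetime(2022, 8, 2, 0), 299.15, 1.0, 0.0),
]
for day, values in daily_aggregates(hours).items():
    print(day, {k: round(v, 2) for k, v in values.items()})
```

Precipitation is summed while temperature and wind are averaged, mirroring the distinction between accumulated and instantaneous variables in the text.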
Product description
| Product | Launch date | Coverage/cycle/revisit time | Resolution | Bands | Spectral range (nm) |
|---|---|---|---|---|---|
| S5P-L2 | 13 October 2017 | Global, <1 d, 16 d | 3.5 × 7 km2 (launch date to August 2019); 3.5 × 5.5 km2 (6 August 2019 to present) | UV (Bands 1, 2) | 270–495 |
| | | | | UV-VIS (Bands 3, 4) | |
| | | | | NIR (Bands 5, 6) | 675–775 |
| | | | | SWIR (Bands 7, 8) | 2,305–2,385 |
| ERA5-CRP | Early 2020 | Global, 1 h | 0.28° × 0.28° (~31 km) | – | – |
| S2-MSI L1C | 23 June 2015 | Global, 5 d | 10–60 m | Ultra-Blue, Blue, Green, Red | 443–665 |
| | | | | VNIR (Bands 5, 6, 7, 8, 8a) | 705–865 |
| | | | | SWIR (Bands 9, 10, 11, 12) | 940–2,190 |
Six air pollutants, three meteorological, and seven physicochemical water quality parameters with their units and descriptions
| Category | Variable (unit) | Availability time period | Description |
|---|---|---|---|
| Air pollutants | CO (mol/m2) | 2018/06/28 − 2022/09/17 | Sources: combustion of fossil fuels, atmospheric oxidation of methane and other hydrocarbons, motor vehicle exhaust. Effects: unintentional and suicidal poisonings, dizziness, confusion, unconsciousness, and death. Water solubility: poor. |
| | NO2 (mol/m2) | 2018/06/28 − 2022/09/10 | Sources: fuel burning, emissions from cars, trucks, and buses, power plants, animal manure, precipitation falling across hard surfaces. Effects: acid rain, nutrient pollution, algal blooms, ozone formation, smog. Water solubility: high; forms nitric acid (HNO3) and nitrogen monoxide (NO). |
| | O3 (mol/m2) | 2018/09/08 − 2022/09/17 | Sources: volatile organic compounds and nitrogen oxides; chemical plants, gasoline pumps, oil-based paints. Effects: damage to sensitive vegetation, smog. Water solubility: partial; forms OH radicals. |
| | SO2 (mol/m2) | 2018/12/05 − 2022/09/17 | Sources: combustion of fossil fuels, steel making, fertilizer manufacturing. Effects: respiratory problems. Water solubility: high; forms sulphurous acid. |
| | HCHO (mol/m2) | | Sources: oxidation of hydrocarbons, decomposition of plant residues, automotive exhaust, cigarette smoke. Effects: allergic reactions, certain types of cancer. Water solubility: high; forms methylene glycol (CH₂(OH)₂). |
| | CH4 (parts per billion (ppb)) | 2019/02/08 − 2022/09/17 | Sources: agricultural activities, biomass burning. Effects: ozone formation. Water solubility: almost insoluble. |
| Meteorological | Air temperature (K) | 1979/01 − 2020/07 | Temperature of air at 2 m above the surface of land, sea, or inland waters |
| | Total precipitation (m) | | Accumulated liquid and frozen water comprising rain and snow |
| | Wind speed (m s−1) | | Northward component of the 'neutral wind' |
| Physicochemical | pH | 2015/06/23 − 2022/09/20 | Measure of hydrogen ion activity |
| | TDS (mg/l) | | Measure of organic and inorganic materials dissolved in water |
| | Tur (NTU) | | Measure of the clarity of a liquid |
| | SDD (m) | | Measure of light penetration into a water body |
| | chl-α (mg/l) | | Measure of the amount of algae growing in a water body |
| | DO (mg/l) | | Measure of the degree of pollution by organic matter |
| | EC (mS/cm) | | Measure of the capacity of water to convey electric current |
Adapted equations for the physicochemical water quality parameters
Variable | Adapted equation | Reference | RMSE |
---|---|---|---|
Turbidity | 35.121 − 14.489 (R3/R4) − 0.911 R8a | Khattab & Merkel (2014) | 7.65 NTU |
pH | 8.790 + 0.141 R11 − 0.228 (R3/R4) | Abdullah (2015) | 3.36 |
EC | 422.034 − 1,080.365 R11 | Abdullah (2015) | 228.7 mS/cm |
chl-α | 54.658 + 520.451 R2 − 1,221.89 R3 + 611.115 R4 − 198.199 R8a | Lim & Choi (2015) | 10.15 mg/l |
DO | 10.841 − 0.682 (R1/R8a) − 0.002 (R2/R8a + B2) | Abdullah (2015) | 2.82 mg/l |
TDS | 120.750 + 264.752 (R8a/R1) | Abdullah (2015) | 111.92 mg/l |
SDD | 0.2 + 1.4 ln (R2/R4) | Deutsch et al. (2014) | 0.22 m |
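To illustrate how the adapted equations in Table 4 can be applied, the sketch below converts L1C digital numbers to top-of-atmosphere reflectance using the scale factor of 10,000 and evaluates the TDS, SDD, and turbidity equations. It assumes that R_i denotes the reflectance of Sentinel-2 band i (e.g., R8a is band 8A); this reading and the dictionary keys are assumptions. For L1C scenes processed with baseline 04.00 or later (from January 2022), ESA additionally applies a radiometric offset of −1000 to the digital numbers, exposed here as an optional `offset` argument. The `rmse` and `select_equation` helpers show how candidate equations could be ranked against ground truth such as the Rawal Dam Water Filtration Plant data; the candidates and values in the example are hypothetical.

```python
from __future__ import annotations

import math
from typing import Callable

QUANTIFICATION_VALUE = 10_000.0  # L1C reflectance scale factor

Bands = dict[str, float]  # band name -> TOA reflectance (assumed meaning of R_i)


def to_reflectance(dn: float, offset: float = 0.0) -> float:
    """Convert an L1C digital number to TOA reflectance.

    Pass offset=-1000 for scenes from processing baseline 04.00 onwards.
    """
    return (dn + offset) / QUANTIFICATION_VALUE


# Adapted equations from Table 4 (R_i interpreted as band-i reflectance)
def tds(b: Bands) -> float:
    return 120.750 + 264.752 * (b["B8A"] / b["B1"])


def sdd(b: Bands) -> float:
    return 0.2 + 1.4 * math.log(b["B2"] / b["B4"])  # natural logarithm


def turbidity(b: Bands) -> float:
    return 35.121 - 14.489 * (b["B3"] / b["B4"]) - 0.911 * b["B8A"]


def rmse(predicted: list[float], observed: list[float]) -> float:
    """Root-mean-square error between predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need equal-length, non-empty sequences")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def select_equation(candidates: dict[str, Callable[[Bands], float]],
                    pixels: list[Bands], ground_truth: list[float]):
    """Return the candidate name with the lowest RMSE, plus all scores."""
    scores = {name: rmse([f(px) for px in pixels], ground_truth)
              for name, f in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical digital numbers for one water pixel
pixel = {k: to_reflectance(dn) for k, dn in
         {"B1": 1200, "B2": 1000, "B3": 900, "B4": 700, "B8A": 400}.items()}
print(round(tds(pixel), 2), round(sdd(pixel), 2), round(turbidity(pixel), 2))
```

Keeping each equation as a small function makes it straightforward to swap in the other Table 4 equations or to compare several published candidates per parameter with `select_equation`.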
Data pre-processing
Google Earth Engine (Google Earth Engine 2022) is used to pre-process the satellite images and to extract the parameters and sample points for the study area. The maps are prepared in ArcMap 10.8 (ArcGIS 2022). Because the satellite images cover a much larger area, GIS clipping tools are used to select the target boundaries of the area of interest, i.e., the Rawal stream network. Once the images are clipped, the dates are matched across the three datasets; the matching dates, the respective parameters, and the number of samples retrieved are shown in Table 5. For each monsoon month of each year (i.e., the matching dates of that month), 4,998 samples with the same latitude–longitude values are extracted from the clipped images for the three data categories, namely air pollutants, meteorological parameters, and physicochemical parameters. This gives a total of 84,966 samples from the pre-processed images of the study area. Once the data are compiled, the sample points extracted on the matching dates of each monsoon month (17 months in total) are averaged at each latitude–longitude location to obtain a single dataset per month.
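The monthly compilation step can be sketched as below, assuming each extracted sample is stored as (date, point identifier, {parameter: value}), with `None` marking a parameter that is unavailable on that date (e.g., CH4 before February 2019, per Table 3). The record layout and point identifiers are hypothetical. The sketch also checks the sample totals, using the number of monsoon months with matching dates per year as listed in Table 5.

```python
from collections import defaultdict
from datetime import date

POINTS_PER_MONTH = 4_998
# Monsoon months with matching dates per year, read from Table 5
MONTHS_PER_YEAR = {2018: 3, 2019: 4, 2020: 4, 2021: 4, 2022: 2}


def monthly_point_means(records):
    """Average each parameter per (year, month, point) over the matching dates.

    records: iterable of (date, point_id, {parameter: value or None}).
    """
    values = defaultdict(list)
    for day, point_id, params in records:
        for name, value in params.items():
            if value is not None:  # skip parameters missing on this date
                values[(day.year, day.month, point_id, name)].append(value)
    return {key: sum(v) / len(v) for key, v in values.items()}


samples_per_year = {y: m * POINTS_PER_MONTH for y, m in MONTHS_PER_YEAR.items()}
total_months = sum(MONTHS_PER_YEAR.values())
total_samples = sum(samples_per_year.values())
print(samples_per_year, total_months, total_samples)

# Hypothetical records for one sample point on two matching dates in June 2019
demo = [
    (date(2019, 6, 5), "p0001", {"NO2": 1.0e-4, "CH4": None}),
    (date(2019, 6, 10), "p0001", {"NO2": 3.0e-4, "CH4": 1850.0}),
]
print(monthly_point_means(demo))
```

The per-year products reproduce the "Total samples per year" rows of Table 5, and their sum reproduces the overall total of 84,966 samples.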
Samples compiled with matching dates between S5P-L2, S2-MSI, and ERA5-CRP
Year | Matching dates | No. of pre-processed images | No. of parameters | No. of samples per month
---|---|---|---|---
2018 | 2018/07/05, 2018/07/30; 2018/08/09, 2018/08/19, 2018/08/24, 2018/08/29; 2018/09/03, 2018/09/08, 2018/09/18 | 126 (S5P-L2), 9 (S2-MSI), 9 (ERA5-CRP) | 12 | 4,998
Total samples in 2018 | | | | 14,994
2019 | 2019/06/05, 2019/06/10, 2019/06/25; 2019/07/05, 2019/07/15, 2019/07/20; 2019/08/04, 2019/08/19, 2019/08/24; 2019/09/03, 2019/09/08, 2019/09/18 | 168 (S5P-L2), 12 (S2-MSI), 12 (ERA5-CRP) | 15 | 4,998
Total samples in 2019 | | | | 19,992
2020 | 2020/06/04, 2020/06/14, 2020/06/29; 2020/07/09, 2020/07/14, 2020/07/29; 2020/08/03, 2020/08/23; 2020/09/07, 2020/09/12, 2020/09/17, 2020/09/22, 2020/09/27 | 182 (S5P-L2), 13 (S2-MSI), 13 (ERA5-CRP) | 16 | 4,998
Total samples in 2020 | | | | 19,992
2021 | 2021/06/04, 2021/06/09, 2021/06/14, 2021/06/19, 2021/06/24, 2021/06/29; 2021/07/04, 2021/07/24; 2021/08/03, 2021/08/13, 2021/08/18; 2021/09/02, 2021/09/12, 2021/09/22, 2021/09/27 | 210 (S5P-L2), 15 (S2-MSI), 15 (ERA5-CRP) | 13 | 4,998
Total samples in 2021 | | | | 19,992
2022 | 2022/06/04, 2022/06/09, 2022/06/14, 2022/06/19, 2022/06/24, 2022/06/29; 2022/08/13, 2022/08/23 | 112 (S5P-L2), 8 (S2-MSI), 8 (ERA5-CRP) | 13 | 4,998
Total samples in 2022 | | | | 9,996
Total number of samples | | | | 84,966
Correlation and regression analysis
Pearson correlation analysis is a commonly used technique for measuring the linear relationship between two variables: the coefficient r, which ranges from −1 to +1, captures both the strength and the direction of the change in one variable as the other changes. The correlation analysis is performed on the collected data samples to identify trends among the physicochemical water quality parameters, the meteorological parameters, and the air pollutants. Because data are not available on the same dates for every product, certain parameters, including methane, wind speed, and other meteorological features, are unavailable for some years. Different parameter combinations are therefore used for different years to identify the important patterns with the correlation analysis.
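For reference, the coefficient can be computed directly from its definition, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The following is a plain-Python illustration with made-up values; the study's own implementation is not described here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(sxx * syy)


# Made-up monthly means at five sample points: TDS falls as NO2 rises.
no2 = [1.0, 2.0, 3.0, 4.0, 5.0]
tds = [10.0, 8.0, 6.0, 4.0, 2.0]
print(pearson_r(no2, tds))  # -1.0
```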
A regression model approximates the function that maps the input variables to a target, so that unseen values can be predicted with high accuracy. Here, regression analysis is performed with three machine learning algorithms: LGBM, MLP, and SVM. LGBM (Ke et al. 2017) is a highly efficient implementation of the gradient-boosting decision tree (GBDT), an ensemble model that uses decision trees as base learners. It relies on two techniques: gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB). GOSS retains the instances with large gradients and randomly samples those with small gradients when estimating the information gain used to choose split points. EFB speeds up tree training by bundling mutually exclusive features into a smaller number of features. By employing GOSS and EFB, LGBM trains faster than a conventional GBDT without a substantial loss of accuracy. The MLP regressor (Murtagh 1991) is a network of neurons trained with backpropagation. The main difference between an MLP used for classification and one used for regression is that the regression output layer is a single neuron with no activation function and the loss function is the mean squared error. SVM (Awad & Khanna 2015) is a popular machine learning algorithm used for modelling complex engineering systems. It is based on the principle of structural risk minimization and learns the mapping between input and output features by transforming the training data from the input space into a high-dimensional feature space, where a maximum-margin hyperplane is constructed. The air pollutants and meteorological features are taken as independent variables, and five of the physicochemical parameters, i.e., TDS, DO, pH, Tur, and EC, are each predicted individually from this feature set.
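To make the GOSS step more concrete, the following is a simplified sketch of how one boosting iteration might select its training subset. It is not LightGBM's implementation. The function name `goss_sample` is hypothetical, and the fractions `a` and `b`, which play the roles of LightGBM's `top_rate` and `other_rate` parameters, are illustrative.

```python
import random


def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Simplified gradient-based one-side sampling, after Ke et al. (2017).

    Keeps the top a*n instances by |gradient|, randomly samples b*n of the
    remaining ones, and up-weights the sampled small-gradient instances by
    (1 - a) / b so that the information-gain estimate stays roughly unbiased.
    Returns a list of (instance index, weight) pairs.
    """
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_k, rand_k = int(a * n), int(b * n)
    kept = [(i, 1.0) for i in order[:top_k]]          # large gradients: always kept
    rest = order[top_k:]
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(rand_k, len(rest)))  # small gradients: subsampled
    amplify = (1.0 - a) / b
    kept += [(i, amplify) for i in sampled]
    return kept


grads = [0.9, -0.05, 0.02, -0.8, 0.01, 0.3, -0.02, 0.04, 0.6, -0.03]
subset = goss_sample(grads, a=0.2, b=0.2)
print(subset)
```

The amplification factor (1 − a)/b compensates for under-sampling the small-gradient instances, which is what allows GOSS to reduce computation without a large loss of accuracy.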
Water quality index
Over the years, physicochemical and biological parameters have mostly been used to monitor whether the quality of water falls within set standards and guidelines. Values of these parameters beyond the defined limits can be harmful to human health. To express water quality in a standard form, researchers have developed a number of water quality indices (WQIs), which are widely used as an effective tool to describe the quality of water. WQI classification can also help to analyse trends in water quality over time and to identify how environmental and anthropogenic activities have affected the suitability of water for drinking or other uses.
WAWQI parameters calculated
Variable | Standard value (Sn) | Ideal value (Vid) | Proportionality constant (K) | Unit weight (wn)
---|---|---|---|---|
TDS | 1,000 mg/l | 0 | 1.74003 | 0.00174 |
pH | 8.5 | 7 | 1.74003 | 0.20471 |
EC | 2,000 mS/cm | 0 | 1.74003 | 0.00087 |
SDD | 18 m | 0 | 1.74003 | 0.09667 |
DO | 10 mg/l | 14.6 | 1.74003 | 0.17400 |
Tur | 5 NTU | 0 | 1.74003 | 0.34801 |
chl-α | 10 mg/l | 0 | 1.74003 | 0.17400 |
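The unit weights in the table follow wn = K/Sn with K = 1/Σ(1/Sn): for example, 1/1,000 + 1/8.5 + 1/2,000 + 1/18 + 1/10 + 1/5 + 1/10 ≈ 0.5747, so K ≈ 1.74003 and wn(pH) ≈ 1.74003/8.5 ≈ 0.20471. The sketch below checks this and computes the index with the quality rating qn = 100(Vn − Vid)/(Sn − Vid) and WQI = Σwn·qn/Σwn, the commonly used WAWQI formulation. The rating formula and the class boundaries in `classify` are assumptions, since they are not stated in this section.

```python
# Standard (Sn) and ideal (Vid) values transcribed from the WAWQI table.
PARAMS = {
    "TDS": (1000.0, 0.0),
    "pH": (8.5, 7.0),
    "EC": (2000.0, 0.0),
    "SDD": (18.0, 0.0),
    "DO": (10.0, 14.6),
    "Tur": (5.0, 0.0),
    "chl-a": (10.0, 0.0),
}


def unit_weights(params):
    """Return K = 1 / sum(1/Sn) and unit weights wn = K / Sn (they sum to 1)."""
    k = 1.0 / sum(1.0 / sn for sn, _ in params.values())
    return k, {name: k / sn for name, (sn, _) in params.items()}


def wawqi(observed, params=PARAMS):
    """Weighted arithmetic WQI over the parameters present in `observed`.

    qn = 100 * (Vn - Vid) / (Sn - Vid); WQI = sum(wn * qn) / sum(wn).
    """
    _, weights = unit_weights(params)
    num = den = 0.0
    for name, vn in observed.items():
        sn, vid = params[name]
        qn = 100.0 * (vn - vid) / (sn - vid)
        num += weights[name] * qn
        den += weights[name]
    return num / den


def classify(wqi):
    """Assumed class boundaries (a commonly used WAWQI scale)."""
    if wqi <= 25:
        return "Excellent"
    if wqi <= 50:
        return "Good"
    if wqi <= 75:
        return "Poor"
    if wqi <= 100:
        return "Very poor"
    return "Unfit for drinking"


# Made-up observations chosen so that every qn is about 40.
sample = {"TDS": 400.0, "pH": 7.6, "DO": 12.76, "Tur": 2.0}
score = wawqi(sample)
print(round(score, 1), classify(score))  # 40.0 Good
```

Because DO has an ideal value above its standard value, its rating grows as DO falls, so low oxygen raises the index as intended.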
RESULTS AND DISCUSSION
The aim of this study is to analyse the impact of air pollutants and meteorological factors on the quality of water as the sources of water pollution can be attributed to activities that are either anthropogenic or naturally occurring. Thus, the present study has proposed a hybrid of remote sensing and data mining techniques with a unique set of monitoring data. Firstly, the results of the correlation analysis using the Pearson method are discussed. Next, the results of the interrelationships amongst the three categories of data with the regression framework are presented. Finally, the WAWQI method results on the classification of the study area water quality are discussed.
Correlation analysis
The binned scatter plots for the associations are given as follows: (a) negative correlation between CO and pH for June 2019 with respect to (b) air temperature, (c) wind speed, and (d) total precipitation. Next, (e) negative correlation between NO2 and TDS for July 2019 with respect to (f) air temperature, (g) wind speed, and (h) total precipitation.
The binned scatter plots for the associations are given as follows: (a) negative correlation between NO2 and turbidity for August 2018 with respect to (b) air temperature, (c) wind speed, and (d) total precipitation. Next, (e) positive correlation between NO2 and chl-α for September 2018 with respect to (f) air temperature, (g) wind speed, and (h) total precipitation.
Figure 3(a) shows a downward trend from left to right for NO2 and Tur, indicating that these parameters are negatively correlated in August 2018. Figure 3(b)–3(d) shows that, with respect to the meteorological parameters, a high air temperature (29 °C), a low wind speed component (−0.12 m s−1), and a medium precipitation of 0.0015 m are observed for August. Figure 3(e) shows an upward trend for NO2 and chl-α, with a high air temperature (28 °C), a low wind speed component (0.05 m s−1), and a low precipitation level (0.00006 m). Thus, the correlation analysis shows that in each monsoon month there are several relationships between the air pollutants and the physicochemical parameters, which indicates a connection between air and water pollution. Furthermore, the meteorological parameters tend to take high values where the air pollutants and physicochemical parameters are negatively correlated and low values where they are positively correlated. This suggests that meteorological conditions may play a role in the associations between air pollutants and physicochemical parameters (Stull 2017; Gupta et al. 2018).
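Binned scatter plots such as those in Figure 3 summarize a noisy cloud of points by averaging within bins of the x variable. The sketch below assumes equal-count (quantile) bins and optionally averages a third variable, such as air temperature, that could be used to colour the points. The binning scheme and the number of bins used for the figures are not stated, so both are assumptions, and the function name is hypothetical.

```python
def binned_scatter(x, y, z=None, n_bins=10):
    """Equal-count binned scatter.

    Sorts the points by x, splits them into n_bins bins of (nearly) equal size,
    and returns per-bin means of x, y and, if given, a third variable z.
    """
    if len(x) != len(y) or (z is not None and len(z) != len(x)):
        raise ValueError("inputs must have the same length")
    zs = z if z is not None else [0.0] * len(x)
    rows = sorted(zip(x, y, zs), key=lambda t: t[0])
    n = len(rows)
    out = []
    for b in range(n_bins):
        chunk = rows[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer points than bins
        k = len(chunk)
        means = tuple(sum(col) / k for col in zip(*chunk))
        out.append(means if z is not None else means[:2])
    return out


# Made-up data: y falls as x rises, at a constant air temperature of 28 °C.
x = [float(v) for v in range(20)]
y = [100.0 - 2.0 * v for v in x]
temp = [28.0] * 20
for mx, my, mt in binned_scatter(x, y, temp, n_bins=4):
    print(mx, my, mt)
```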
(i) (a) The mean CO concentrations of June 2019, (b) NO2 mean concentrations for June 2019, (c) the mean HCHO concentrations for June 2019, (d) the mean wind speed component for June 2019, (e) the WQI of June 2019, and (f) the correlation matrix for the air pollutants, meteorological, and physicochemical parameters for June 2019. (ii) (a) The mean CO concentrations of July 2020, (b) NO2 mean concentrations for July 2020, (c) the mean HCHO concentrations for July 2020, (d) the mean O3 concentrations for July 2020, (e) the WQI of July 2020, and (f) the correlation matrix for the air pollutants, meteorological, and physicochemical water parameters for July 2020.
For the month of July, strong correlations are observed with the NO2 air pollutant. In July 2018, negative relationships are observed for NO2 with TDS and DO. For July 2019, the negative relationships include (i) CO with DO and TDS and (ii) NO2 with DO and TDS. The positive relationships for July 2019 include (i) NO2 with EC and (ii) CO with EC. For July 2020, the negative relationships include (i) NO2 with DO, TDS, pH, and Tur, (ii) CO with DO, pH, Tur, and TDS, (iii) O3 with DO, TDS, pH, and Tur, and (iv) HCHO with DO, TDS, pH, and Tur. The positive relationships for July 2020 include (i) NO2 with chl-α, EC, and SDD, (ii) CO with chl-α, EC, and SDD, (iii) O3 with EC and SDD, and (iv) HCHO with chl-α, EC, and SDD. For July 2021, the negative relationships are NO2 with TDS and Tur, and the positive relationship is NO2 with EC. Figure 4(ii) shows the air pollutants and meteorological parameters that had the most impact on the physicochemical parameters, the WQI, and the correlation matrix for July 2020.
The correlation matrices for the month of August for the years 2018–2021.
For the month of September, the major relationships are observed for the CO and NO2 pollutants with DO and EC. In September 2018, the prominent negative correlations include (i) air temperature with DO and TDS, (ii) wind speed with chl-α and EC, and (iii) NO2 with DO, TDS, pH, and Tur. The positive relationships for September 2018 include (i) wind speed with DO and TDS and (ii) NO2 with chl-α, EC, and SDD. For September 2019, negative relationships are observed for (i) CO with DO, pH, and TDS, (ii) NO2 with DO, TDS, pH, and Tur, and (iii) SO2 with pH and Tur. The positive relationships for September 2019 include (i) CO with EC, (ii) NO2 with chl-α, EC, and SDD, and (iii) SO2 with SDD and EC. For September 2020, the negative relationships include (i) SO2 with EC and (ii) CO with chl-α, EC, and SDD. The positive relationships for September 2020 include (i) CO with DO, pH, TDS, and Tur and (ii) SO2 with DO, pH, and TDS. For September 2021, the negative relationships include (i) NO2 with DO, TDS, and Tur, (ii) CO with DO and TDS, (iii) CH4 with DO and TDS, and (iv) SO2 with Tur. The positive relationships for September 2021 include (i) NO2 with chl-α and EC, (ii) CO with EC, (iii) SO2 with EC, (iv) HCHO with TDS, and (v) CH4 with chl-α and EC.
As meteorological data are available only up to July 2020, the relationships observed between physicochemical and meteorological parameters include (i) wind speed with chl-α, EC, DO, and TDS and (ii) air temperature with DO and TDS in August and September. The impact of meteorological factors such as wind speed depends on the topography and surroundings of the water body. TDS dynamics are influenced by the maximum wind speed, which induces sediment resuspension (Evans 1994; Zhang et al. 2017). Wind-driven sediment resuspension is directly linked to water quality because it increases phosphorus and nitrogen concentrations (Reddy et al. 1996). The meteorological–physicochemical relationships are most prevalent in August, when the average temperature and humidity of Islamabad are 28.7 °C and 73%, respectively. Meteorological factors can intensify the hydrological cycle, i.e., change precipitation patterns, which can mobilize the transport of pollutants to water bodies. Table 7 summarizes the negative and positive correlations found in the monsoon months. NO2–TDS is the dominant relationship in all months, followed by NO2–DO, while NO2–EC is the most common positive relationship. These relationships are most prevalent in August, which may be related to the strong control that the Asian monsoon exerts over water flow regimes (Mamun et al. 2021).
The prominent relationships amongst air pollutants, meteorological, and physicochemical features for monsoon season (2018–2022)
No. | Relation | Correlation | Prevailing month(s)
---|---|---|---
1 | NO2 and TDS | Negative | August
2 | NO2 and DO | Negative | June
3 | NO2 and Tur | Negative | June
4 | NO2 and pH | Negative | June and September
5 | CO and DO | Negative | June and September
6 | CO and TDS | Negative | September
7 | NO2 and EC | Positive | August
8 | NO2 and SDD | Positive | June
9 | NO2 and chl-α | Positive | August and September
10 | CO and EC | Positive | July and September
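Table 7 condenses the monthly correlation matrices into a list of recurring relationships. One way to produce such a summary is sketched below. The |r| ≥ 0.5 cut-off is an assumption suggested by the correlation ranges reported in the conclusions, which start at about 0.5, and the input layout, the function name, and the r values are hypothetical.

```python
from collections import Counter, defaultdict

THRESHOLD = 0.5  # assumed cut-off for a "prominent" correlation


def prominent_relations(monthly_r, threshold=THRESHOLD):
    """Summarize recurring strong correlations across monsoon months.

    monthly_r maps (year, month name) -> {(pollutant, parameter): r}.
    Returns {(pollutant, parameter): (sign, prevailing month, count)}.
    """
    months = defaultdict(Counter)
    signs = defaultdict(Counter)
    for (_, month), pairs in monthly_r.items():
        for pair, r in pairs.items():
            if abs(r) >= threshold:
                months[pair][month] += 1
                signs[pair]["Negative" if r < 0 else "Positive"] += 1
    summary = {}
    for pair, counter in months.items():
        month, count = counter.most_common(1)[0]
        summary[pair] = (signs[pair].most_common(1)[0][0], month, count)
    return summary


# Made-up correlation coefficients for three monsoon months.
monthly_r = {
    (2018, "August"): {("NO2", "TDS"): -0.72, ("NO2", "EC"): 0.61},
    (2019, "August"): {("NO2", "TDS"): -0.58, ("CO", "EC"): 0.31},
    (2020, "July"): {("NO2", "TDS"): -0.55, ("CO", "EC"): 0.52},
}
for pair, info in prominent_relations(monthly_r).items():
    print(pair, info)
```

In this sketch a tie between months is resolved in favour of the month encountered first, whereas Table 7 lists tied months together (e.g., "June and September").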
The most prominent relationships found over the 5-year period are listed in Table 7 and are discussed in detail as follows:
NO2 and TDS
This relationship is most dominant in August. When NO2 dissolves in water, it reacts to form nitric acid (HNO3), which forms nitrate salts when it is neutralized. Thus, NO2 exists and reacts as a gas in the air, as an acid in water droplets, or as a salt (Neil Cape & Lammel 1996). Together, these gases, acids, and salts contribute to the pollution effects that have been observed and attributed to acid rain (Singh & Agrawal 2007). Nitrates contribute indirectly to changes in TDS levels in water because they fuel algal blooms (Paerl 1988; Lopes et al. 2021). An excess of nitrates, i.e., concentrations of 10 mg/L or higher, gives rise to nutrient pollution that creates dead zones in the water (Diaz & Rosenberg 2008), a condition known as hypoxia (Khangaonkar et al. 2018). These zones contain little or no oxygen because the decomposition of algal blooms (red tides) consumes the oxygen, which makes them dangerous for the survival of aquatic life. These toxic red tides also shade out other plants and cause them to die. This phenomenon is referred to as eutrophication (Jingzhong et al. 1985), and it makes the bottom strata of the water unsafe for fish and other aquatic animals and plants. It is also estimated that between 12 and 44% of the nitrogen loading of coastal water bodies comes from the air (Price et al. 1997).
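The first two steps of this chain can be summarized with standard textbook reactions: NO2 disproportionates in water to nitric acid and nitric oxide, and the acid is then neutralized to a nitrate salt. The choice of ammonia as the neutralizing base is only an example, since any base yields the corresponding nitrate. These balanced equations are general chemistry rather than results of this study.

```latex
\begin{aligned}
3\,\mathrm{NO_2(g)} + \mathrm{H_2O(l)} &\rightarrow 2\,\mathrm{HNO_3(aq)} + \mathrm{NO(g)} \\
\mathrm{HNO_3(aq)} + \mathrm{NH_3(aq)} &\rightarrow \mathrm{NH_4NO_3(aq)}
\end{aligned}
```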
NO2 and DO
This relationship is most dominant in June. An abundance of NO2 can lead to nutrient pollution or eutrophication, and the sources of NO2 pollution include agricultural runoff and the burning of fuel. With an abundance of nitrate salts in the water, hypoxia, i.e., the depletion of DO, can occur (Correll 1998; Kann & Welch 2005; Arend et al. 2011). Nutrient enrichment causes the overgrowth of algae, which then sink and decompose at the bottom of the water body, consuming oxygen. Eutrophication thereby affects the survival of aquatic life through dead zones and red tides. Thus, the negative correlations observed for this relationship are consistent with an excess of NO2 leading to a decline of DO in water.
NO2 and Tur
The NO2–turbidity relationship is most prominent in June over the 5-year period. Eutrophic waters generally have low water quality because of frequent algal blooms and low oxygen levels. The excess of nutrients results in high productivity in such waters, which can increase turbidity and reduce water clarity. The water turns green or brownish, making it hard for aquatic organisms to find prey and watch for predators (Lehtiniemi et al. 2005). NO2 and turbidity are therefore linked, and whether the relationship is positive or negative depends on the range of concentrations of the two parameters: one parameter can have a positive effect on another up to a threshold, beyond which the effect can become negative (Odum et al. 1979).
NO2 and pH
This relationship is noticeable in June and September. Ocean acidification (Gattuso & Hansson 2011) is a by-product of the chain of reactions set off by eutrophication: the decomposition of algae and plants releases carbon dioxide in abundance, which lowers the pH of the water. This process slows the growth of fish and can eventually reduce fisheries, resulting in smaller harvests (Turner & Chislock 2010). The relationship observed between NO2 and pH is consistent with an inverse relationship between the two parameters.
NO2 and EC
The correlation between NO2 and conductivity is strongest in August compared with the other monsoon months. Conductivity in a water body is affected by the increase in nitrogen and phosphorus nutrients caused by eutrophication. This finding is consistent with other studies that relate conductivity to nitrogen (Kløve 2001).
NO2 and chl-α
NO2 and chl-α have a dominant relationship in August and September. Nutrients such as nitrate and phosphate drive phytoplankton growth and metabolism (Filstrup & Downing 2017; Filstrup et al. 2018), so the higher the nutrient concentrations, the higher the chlorophyll-α. A strong relationship of this kind is also evident in Japanese lakes, which supports the association found in this study.
NO2 and SDD
The relationship between NO2 and SDD is most dominant in June. Excess nitrate salts give rise to eutrophication, which increases turbidity and decreases water clarity, making the water cloudy and making it difficult for aquatic organisms to find prey. The resulting reduction in light penetration lowers the Secchi depth. Lake Tahoe is an example of this, where eutrophication caused a decrease in the Secchi depth, as observed by Goldman et al. (2003). In this study the relationship between SDD and NO2 is positive, but its direction depends on the range of concentrations of both parameters. Nevertheless, the two parameters have a prominent association.
CO and EC
This association is more prevalent in July and September. CO may be indirectly associated with the EC of water, as wetlands and near-coastal regions are among the major sources of CO. The formation of CO in oceans and lakes is attributed to methanogenic, sulphate-reducing, and acetogenic bacteria (Conrad 1988). Among the low-molecular-weight carbonyl compounds in natural water bodies, CO is the dominant one; it is a product of the photochemical degradation of dissolved organic matter (DOM) in the sea and is emitted to the atmosphere (Chen et al. 1978; Kieber et al. 1990; Mopper et al. 1991; Weber 2020). DOM affects surface water ecosystems by influencing water temperature, biogeochemical processes, and water transparency (Solomon et al. 2015). Moreover, DOM can accelerate hypoxia and eutrophication (Ledesma et al. 2012; Kritzberg et al. 2020), which in turn affect the conductivity of a water body.
(i) The mean CO concentrations, (ii) the CO–EC correlation with EC on the y-axis and CO on the x-axis, (iii) the mean EC concentrations, and (iv) the CO–EC correlation with CO on the y-axis and EC on the x-axis are shown with respect to (a) wind speed and (b) air temperature for September 2018.
Regression analysis
Each model predicts a physicochemical parameter from the air pollutant and meteorological dataset and is evaluated by the RMSE; the results for the three regression models are shown in Table 8. Because the correlation analysis shows that CO and NO2 have significant associations with the physicochemical parameters, two further feature combinations are also tested: (a) CO with the meteorological parameters and (b) NO2 with the meteorological parameters. The SVM regression model performs best for predicting DO, EC, and Tur, with RMSEs of 0.01477, 0.024616, and 0.026881, respectively. For TDS, the lowest RMSE (0.18330) is achieved by the MLP regressor using all six air pollutants and the meteorological parameters, and the second-best RMSE (0.189) is given by the SVM regressor with the NO2 and meteorological dataset. For pH, the best RMSE of 0.0029 is achieved by the MLP with either the CO or the NO2 dataset. Overall, the SVM regressor performs best among the three regression models, with RMSEs in the range 0.015–0.03 for DO, Tur, and EC.
RMSE of the regression models for estimating the concentrations of TDS, DO, pH, Tur, and EC
Independent variables | Model | TDS | DO | pH | Tur | EC
---|---|---|---|---|---|---
All air pollutants and meteorological | LGBM | 0.20559 | 0.06112 | 0.0422 | 0.08886 | 0.08579
All air pollutants and meteorological | MLP | **0.18330** | 0.01626 | 0.0067 | 0.02776 | 0.072097
All air pollutants and meteorological | SVM | 0.205118 | **0.01477** | 0.00302 | **0.026881** | **0.024616**
CO and meteorological | LGBM | 0.27382 | 0.08491 | 0.04981 | 0.1158 | 0.10947
CO and meteorological | MLP | 0.1912 | 0.01562 | 0.002929 | 0.027552 | 0.042880
CO and meteorological | SVM | 0.189846 | 0.01688 | 0.0030 | 0.030389 | 0.091375
NO2 and meteorological | LGBM | 0.2612 | 0.08691 | 0.0518 | 0.1167 | 0.10571
NO2 and meteorological | MLP | 0.192496 | 0.015552 | **0.002927** | 0.0284148 | 0.04883
NO2 and meteorological | SVM | 0.18984 | 0.016887 | 0.003022 | 0.0304012 | 0.09137
Bold values indicate the best result (lowest RMSE) for each parameter.
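The evaluation metric and the model comparison in Table 8 can be reproduced with a few lines of code, using RMSE = √(mean((y − ŷ)²)). The absolute values depend on how the targets were scaled, which is not described in this section. The RMSE values below are transcribed from Table 8, while the feature-set labels and function names are hypothetical.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


TARGETS = ("TDS", "DO", "pH", "Tur", "EC")
# (feature set, model) -> RMSE per target, transcribed from Table 8.
TABLE = {
    ("All", "LGBM"): (0.20559, 0.06112, 0.0422, 0.08886, 0.08579),
    ("All", "MLP"): (0.18330, 0.01626, 0.0067, 0.02776, 0.072097),
    ("All", "SVM"): (0.205118, 0.01477, 0.00302, 0.026881, 0.024616),
    ("CO", "LGBM"): (0.27382, 0.08491, 0.04981, 0.1158, 0.10947),
    ("CO", "MLP"): (0.1912, 0.01562, 0.002929, 0.027552, 0.042880),
    ("CO", "SVM"): (0.189846, 0.01688, 0.0030, 0.030389, 0.091375),
    ("NO2", "LGBM"): (0.2612, 0.08691, 0.0518, 0.1167, 0.10571),
    ("NO2", "MLP"): (0.192496, 0.015552, 0.002927, 0.0284148, 0.04883),
    ("NO2", "SVM"): (0.18984, 0.016887, 0.003022, 0.0304012, 0.09137),
}


def best_per_target(table=TABLE):
    """Return {target: ((feature set, model), lowest RMSE)}."""
    return {
        target: min(((combo, values[i]) for combo, values in table.items()),
                    key=lambda item: item[1])
        for i, target in enumerate(TARGETS)
    }


for target, (combo, value) in best_per_target().items():
    print(target, combo, value)
```

Running it selects the MLP with all features for TDS, the MLP with the NO2 set for pH, and the SVM with all features for DO, Tur, and EC, which matches the text.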
Water quality index
Floods in Pakistan (2022)
In August 2022, Pakistan experienced severe flooding during the monsoon rains, affecting 81 districts. This catastrophic event had far-reaching consequences, including the loss of lives, property, and agricultural land, impacting approximately 33 million people. One of the significant repercussions of such flooding is the introduction of excessive nutrients, pollutants, and harmful sediments into water bodies. These substances disrupt the delicate balance of the aquatic ecosystem and contaminate the water and other food resources. This process, known as eutrophication, leads to a decrease in the concentration levels of certain physicochemical parameters. Of particular concern is the proximity of the flood-affected districts in Punjab to Rawal Lake. Rawal Lake is a crucial water resource as it is part of the Soan River Basin, which forms the midstream of the three sub-basins of the Indus River. Since all the sub-basins eventually flow into the Indus River, the flooding events can indirectly impact the water health of Rawal Lake. The influx of floodwaters, carrying sediments, pollutants, and nutrients, can potentially deteriorate the water quality of the lake, posing challenges to the sustainability of the ecosystem.
Furthermore, it is essential to recognize that Pakistan heavily relies on the Indus River and its tributaries for surface water resources. Therefore, the consequences of flooding extend beyond immediate areas of impact, as the disruption caused by flooding can affect the overall water health of these crucial water sources. The devastating floods in August 2022 have not only caused immediate devastation but also pose long-term challenges for water management and preservation in Pakistan. Addressing the indirect impacts of flooding on the water quality of Rawal Lake and the broader implications for surface water resources is critical in ensuring the well-being and sustainability of both the environment and the population reliant on these water bodies.
Air and meteorological parameters of Rawal network for the 13th (11 days after the first flood hit Pakistan) and 23rd (3 days after the second flood hit) of August 2022.
Comparison of water quality of July (before flood) and August (after flood). (a) WQI for 29 July 2022, (b) WQI for 23 August 2022, and (c) map of Pakistan showing waterways, Rawal watershed, and the flooded areas.
In summary, the data and images provide insights into the impact of the floods on Rawal Lake's water health. The concentrations of CO and the classification of water quality indicate notable changes after the flooding events. Understanding these effects is crucial for assessing and mitigating the environmental consequences of such natural disasters on water bodies.
CONCLUSION AND FUTURE WORK
Rawal watershed is surrounded by highly populated sites, and the water quality of the watershed is traditionally assessed with physicochemical parameters such as turbidity, pH, and DO. The data are collected by manual grab sampling during field visits. This yields a limited set of variables that is insufficient to fully determine the quality of the water, because these populated sites are subject to anthropogenic emissions of air pollutants and to meteorological factors that can influence water health. Therefore, the present study collected three categories of data for 2018–2022: (a) physicochemical parameters, including pH, TDS, electrical conductivity, DO, SDD, turbidity, and chlorophyll-α, from S2-MSI L1C satellite imagery; (b) air pollutants, i.e., CO, NO2, O3, SO2, HCHO, and CH4, extracted from S5P-L2; and (c) meteorological parameters, i.e., air temperature, wind speed, and total precipitation, taken from the ERA5-CRP project. Environmental factors are thus treated as influencing parameters, and remote sensing provides easy sample collection with global coverage, to propose a water quality monitoring model with a unique set of features. Pearson's correlation and regression analyses are performed on this new dataset, and the WAWQI method is applied to rank the water quality. Moreover, the floods of August 2022 are taken as an example to observe the impact of natural calamities on the quality of water.
The correlation analysis shows four prominent negative relationships between the physicochemical parameters and the air pollutants across all monsoon months. The top associations are NO2–TDS, with correlations ranging from 0.51 to 0.85, and NO2–DO, with correlations ranging from 0.5 to 0.82. These are followed by NO2–Tur (range, 0.53–0.65) and NO2–pH (range, 0.5–0.75). These negative correlations are most common in June for Tur, pH, and DO, whereas the NO2–TDS relationship is dominant in August. Both months have an ‘Unfit for drinking’ rating with the WAWQI method. The correlations are consistent with the eutrophication process, in which an abundance of nitrate nutrients sets off a chain reaction of changes in Tur, pH, TDS, and DO. Four positive associations are observed between the NO2 and CO pollutants and the physicochemical parameters: NO2–EC (range, 0.54–0.85), NO2–chl-α (range, 0.53–0.79), NO2–SDD (range, 0.5–0.74), and CO–EC (range, 0.51–0.67). NO2–chl-α and NO2–EC are most common in August, while NO2–SDD and CO–EC are prominent in June and July, respectively. These correlations support the view that high nitrate levels increase phytoplankton, raising chl-α, decreasing water clarity (SDD), and increasing EC. The meteorological features can have a moderate impact on water quality, but owing to limited data availability, the relationships observed for July 2018 to July 2020 are limited to (i) wind speed with chl-α, EC, DO, and TDS and (ii) air temperature with DO and TDS in August and September. Wind speed has a positive correlation with DO (range, 0.55–0.60) and with TDS (range, 0.57–0.71). The relationship between wind speed and TDS is plausible, as wind causes sediment resuspension, which changes TDS levels.
Moreover, the meteorological features are also examined for the flooding events of August 2022 to observe the negative impact of natural calamities. The flood-affected districts of Punjab lie close to Rawal Lake, and flooding can introduce an excess of pollutants and nutrients into the water bodies, giving rise to eutrophication and eventually lowering water quality. The results show that the precipitation level is much lower after the flooding events, whereas wind speed shifts from low to medium levels and air temperature remains constant. Air temperature is, however, much higher over the lake than over other parts of the stream, as Rawal watershed has a south-facing slope that is warmer and prone to high air temperatures.
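The before/after comparison of meteorological variables around the floods can be sketched as follows. The study does not state how its "low" and "medium" levels are defined, so this sketch assumes tercile cut points computed from a longer reference record of the same variable; the function names, event window, and wind-speed values are all hypothetical.

```python
from datetime import date
from statistics import mean, quantiles


def level(value: float, cuts: list[float]) -> str:
    """Classify a value as low/medium/high using two tercile cut points."""
    low_cut, high_cut = cuts
    if value <= low_cut:
        return "low"
    if value <= high_cut:
        return "medium"
    return "high"


def before_after(series: list[tuple[date, float]],
                 event_start: date,
                 event_end: date,
                 reference: list[float]) -> dict[str, tuple[float, str]]:
    """Mean and level of a variable before and after an event window.

    reference: longer record of the same variable used to set the tercile cuts.
    """
    cuts = quantiles(reference, n=3)  # two cut points -> three classes
    before = [v for d, v in series if d < event_start]
    after = [v for d, v in series if d > event_end]
    result = {}
    for label, values in (("before", before), ("after", after)):
        m = mean(values)
        result[label] = (m, level(m, cuts))
    return result


if __name__ == "__main__":
    # Hypothetical daily wind speeds (m/s) and a hypothetical event window.
    wind = [(date(2022, 8, 1), 2.0), (date(2022, 8, 2), 3.0),
            (date(2022, 8, 20), 5.0), (date(2022, 8, 21), 6.0)]
    reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    print(before_after(wind, date(2022, 8, 10), date(2022, 8, 15), reference))
```

Setting the cut points from a reference record rather than from the event window itself keeps the "low/medium/high" labels comparable before and after the event.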
Regression analysis using machine learning techniques, i.e., LightGBM (LGBM), MLP, and SVM, is applied with the air pollutants and meteorological parameters as independent variables to predict the concentrations of TDS, pH, turbidity, EC, and DO. MLP gave the best results for TDS and pH, with RMSEs of 0.18 and 0.003, respectively, while SVM performed well for DO, turbidity, and EC, with RMSEs of 0.015, 0.027, and 0.025, respectively. In addition, the WAWQI method is used to classify the water quality of the Rawal stream network. This index is calculated from the physicochemical water parameters alone and does not consider air pollutants, meteorological factors, or other hydrological features (Ahmed et al. 2022b) such as slope, aspect, lithology, geology, and land cover/land use. The WAWQI method may therefore be biased by location and by the weights assigned to specific water quality parameters.
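The regression setup can be sketched with scikit-learn: air-pollutant and meteorological features as inputs, one water quality parameter as the target, and candidate models compared by cross-validated RMSE. This is a minimal sketch under stated assumptions: the hyperparameters, the 5-fold split, the feature scaling, and the synthetic data are illustrative choices rather than the study's configuration, and LightGBM is included only if the `lightgbm` package is installed. The helper names `make_synthetic_data`, `build_models`, `cv_rmse`, and `compare_models` are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_synthetic_data(n: int = 120, seed: int = 0):
    """Synthetic stand-in: 6 pollutant + 3 meteorological features, 1 target."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 9))
    y = 0.5 * X[:, 1] - 0.3 * X[:, 6] + rng.normal(scale=0.1, size=n)
    return X, y


def build_models() -> dict:
    """Candidate regressors; hyperparameters are illustrative assumptions."""
    models = {
        "SVM": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.01)),
        "MLP": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
        ),
    }
    try:  # optional dependency: skip LightGBM if it is not installed
        from lightgbm import LGBMRegressor
        models["LightGBM"] = LGBMRegressor(n_estimators=200, random_state=0, verbose=-1)
    except (ImportError, OSError):
        pass
    return models


def cv_rmse(model, X, y, n_splits: int = 5, seed: int = 0) -> float:
    """Mean RMSE over shuffled K-fold splits, fitting a fresh clone per fold."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kf.split(X):
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        pred = fitted.predict(X[test_idx])
        scores.append(np.sqrt(mean_squared_error(y[test_idx], pred)))
    return float(np.mean(scores))


def compare_models(X, y) -> dict[str, float]:
    """Cross-validated RMSE for each candidate model on one target."""
    return {name: cv_rmse(model, X, y) for name, model in build_models().items()}


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for name, rmse in sorted(compare_models(X, y).items(), key=lambda kv: kv[1]):
        print(f"{name}: RMSE = {rmse:.3f}")
```

Running the comparison once per target (TDS, pH, turbidity, EC, DO) and keeping the lowest-RMSE model mirrors a per-parameter model choice. Because RMSE is expressed in the units of the target, values are comparable across parameters only if all targets are scaled the same way.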
Future work will therefore investigate an improved water quality indexing technique that can effectively analyse and interpret the impact of human activities on water quality. This may include integrating natural factors that directly or indirectly influence overall water quality, including topographical parameters such as slope and aspect and hydrological parameters such as lithology, geology, and soil type (Ahmed et al. 2022c). For instance, slope steepness plays a critical role in water contamination by regulating the speed at which rainfall runoff flows downslope. Steep slopes can produce rapid flow, leading to soil erosion, the swift transport of pollutants and sediments, and disturbance to aquatic ecosystems. Similarly, soil type is an important factor in water pollution: soils with high infiltration capacity reduce the amount of runoff, thereby mitigating potential contamination. Integrating natural factors and developing advanced indexing techniques based on machine learning should therefore contribute to effective water quality monitoring and management. In addition, studying the impact of natural disasters such as floods, landslides, wildfires, and droughts on water quality can help identify patterns and clarify how these disasters affect water health. Such analyses can provide valuable insights and inform strategies aimed at protecting ecosystems for the benefit and welfare of humanity.
ACKNOWLEDGEMENT
Research and development for this study were conducted at the IoT Lab, NUST-SEECS, Islamabad, Pakistan.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.