Abstract
Beyond its influence on water availability and hydrological risks, climate change also affects water quality, and these effects are still in the early stages of investigation. This study aims to consolidate recent interdisciplinary research on the application of artificial intelligence (AI) to the assessment and prediction of water quality parameters. It specifically explores the relationship between climate change and water quality parameters at Sandia station, situated within the Narmada basin in Central India. As global climatic patterns continue to shift, the repercussions for water resources have gained prominence. In this work, electrical conductivity is predicted using the Keras deep learning API on TensorFlow. The root-mean-square error (RMSE), coefficient of determination (R2), Nash–Sutcliffe efficiency (NSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are calculated between observed and predicted values to assess model performance. A total of ten models are developed, differing in the number of past monthly values used as input. The results indicate that Model 8, with ten inputs, performs best in terms of R2 (0.889). These results indicate that AI can be very helpful in analyzing possible future threats to water used for drinking, livestock, irrigation, and so on.
HIGHLIGHTS
Long short-term memory (LSTM) models are used to predict water quality.
Unique study of electrical conductivity in the central part of India.
Study of impacts of climate change on India based on water quality parameters.
The study is carried out at the Sandia hydrological site.
Five performance efficiency parameters are used to assess predictive capability.
INTRODUCTION
The primary consequences of climate change on water availability manifest as floods and droughts, which are highly devastating extreme events (Goyal & Surampalli 2018). In addition to these quantitative shifts, climate change also influences the quality of surface water (Tiwari et al. 2022). For instance, drought conditions can alter the concentration and quality of surface or subsurface water, occasionally constraining water supply (Delpla et al. 2009; Mishra & Nagarajan 2010; Mohammed & Scholz 2018; Kumar et al. 2021). While the deterioration of water quality can directly affect the extraction of surface water, wells might need to be discontinued because of concerns about both groundwater quality and safety issues associated with flooding risks (Doocy et al. 2013). However, although these effects are generally acknowledged, only a limited number of scientific studies have examined how climate change modifies water quality. The overarching observation is that the decline in drinking water quality under climate change increases the prevalence of situations that pose potential health risks (Rai & Singh 2015; Dile et al. 2018; Van Allemann et al. 2019). Central India is also vulnerable to climate extremes and anthropogenic activities that not only degrade the quality of potable water but also threaten flora and fauna. The Narmada River flows through the central part of India, and its basin is a vital water resource region that sustains the livelihoods of millions while supporting diverse ecosystems. The Narmada River serves as a lifeline, providing water for irrigation, domestic use, and industrial activities (Bhagwat & Maity 2014).
However, the water resources of the Narmada basin are under steadily increasing pressure from the mounting impact of climate change. With the rise in global temperature and the emergence of unpredictable weather patterns, the balance between natural systems has been disturbed, which dramatically affects the water quality of the river (Corwin & Lesch 2005; Mauser & Bach 2009; Kumar 2016; Heddam & Kisi 2018).
Electrical conductivity (EC) of water is the ability of the water to conduct an electric current. The health of water bodies is affected by their EC level (Pal et al. 2015). EC helps monitor the salinity of irrigation water, which is added to the soil and affects plant health and crop yield (Corwin & Lesch 2005). High salinity causes water stress in plants. EC also helps detect the mixing of groundwater and surface water and trace the movement of water. Sources of water can also be tracked with the help of EC, which aids in understanding flow hydraulics (Lau et al. 2019). EC can also indicate some properties of subsurface materials. In addition to salinity, EC can indicate the presence of contaminants such as heavy metals or other pollutants in water (Gomaa et al. 2020). Overall, EC is a versatile parameter that offers valuable information about the electrical properties of materials, aiding various scientific, industrial, and environmental applications. However, the measurement and prediction of EC are quite challenging, and efficient models are difficult to train in data-scarce regions of South Asian countries.
Furthermore, the specific influence of climate change on water quality in the Narmada basin remains understudied. This study endeavors to bridge this gap by investigating how climate change impacts EC as a water quality parameter at Sandia station. The findings hold implications not only for the Narmada basin but also for broader water resource management strategies in the face of climate change. By unraveling the complexities of this relationship, the paper contributes to a deeper understanding of the potential challenges posed by climate-induced alterations in water quality, ultimately aiding the formulation of informed mitigation and adaptation strategies. The paper also examines how alterations in physicochemical parameters affect the quality of water resources such as rivers and lakes. Subsequently, the discussion turns to the projected implications for the production of potable water and the quality of the supplied water. A novel method is proposed to predict EC on a monthly scale using Keras, which runs on the TensorFlow environment. Keras is an application programming interface (API) designed with human users in mind: it follows best practices for reducing cognitive load by offering consistent and simple APIs, minimizing the number of user actions required for common use cases, and providing clear and actionable error messages. Keras also places high priority on documentation and developer guides. The models are first trained and tested, and the trained models are then used for prediction. A total of ten models were analyzed, each having between three and 12 hidden layers. The models' performances are then compared using efficiency parameters such as the root-mean-square error (RMSE).
Overall, these models are promising tools for predicting EC on a monthly basis and for supporting sound and informed decisions based on those predictions. This work offers the reader new insights into the application of artificial intelligence (AI) to water quality parameter prediction using the long short-term memory (LSTM) technique, a comparatively recent method that achieved good accuracy in this study.
Study area
Data collection
Electrical conductivity descriptive statistics for Sandia hydrological site (μmho/cm)
Mean | Standard error | Median | Mode | Standard deviation | Sample variance |
---|---|---|---|---|---|
293.7143 | 4.437902 | 277 | 227 | 96.92524 | 9,394.502 |
Electrical conductivity data statistics for Sandia hydrological site (μmho/cm)
Kurtosis | Skewness | Range | Minimum | Maximum | Confidence level (95.0%) |
---|---|---|---|---|---|
2.096373 | 1.189028 | 682 | 98 | 780 | 8.720301 |
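The summary statistics in the two tables above are linked by simple identities, which allows a quick consistency check. The sketch below uses only the tabulated values. The inferred sample size, and the reading of the "Confidence level (95.0%)" entry as the half-width of a 95% confidence interval for the mean, are assumptions made for illustration and are not stated by the data provider.

```python
# Values copied from the descriptive-statistics tables (units: μmho/cm).
mean = 293.7143
std_error = 4.437902
std_dev = 96.92524
sample_variance = 9394.502
minimum, maximum, value_range = 98, 780, 682
conf_half_width = 8.720301  # "Confidence level (95.0%)" entry

# Standard error = standard deviation / sqrt(n), so n can be inferred.
n_inferred = round((std_dev / std_error) ** 2)

# The sample variance should equal the squared standard deviation.
variance_gap = abs(std_dev ** 2 - sample_variance)

# Implied multiplier t such that half-width = t * standard error.
multiplier = conf_half_width / std_error

# Assumed reading: 95% confidence interval for the mean EC.
ci_low, ci_high = mean - conf_half_width, mean + conf_half_width

print(n_inferred, round(multiplier, 3), round(ci_low, 2), round(ci_high, 2))
```

Under these assumptions the tabulated values are mutually consistent: the range equals the maximum minus the minimum, the variance matches the squared standard deviation, and the implied sample size is 477 monthly values.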
Electrical conductivity monthly data from the Sandia hydrological site, Narmada River.
Monthly distribution of electrical conductivity data from the Sandia hydrological site, Narmada River, classified into observed, trend, seasonality, and residual components.
The monthly EC data at the Sandia hydrological site on the Narmada River have been decomposed into four distinct components: observed, trend, seasonality, and residual. The observed component represents the actual measured EC values for each month. It includes all the variations and fluctuations in the data, which can be influenced by a variety of factors such as weather, environmental changes, and human activities. The trend component captures the long-term changes or patterns in the EC data. It indicates a consistent increase in conductivity over time.
The seasonality component reveals regular fluctuations that result from seasonal factors such as rainfall, temperature, or agricultural practices. Lastly, the variability that can be attributed to neither trend nor seasonality is represented by the residuals. The residuals incorporate random fluctuations and unexplained variations, which could be caused by errors in the measurement of EC or by unpredictable factors. Decomposing the data into these four components gave a better understanding of the fundamental patterns and causes of variation in EC at the Sandia hydrological site, which in turn supported better-informed decisions and the identification of potential environmental trends or issues.
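The decomposition described above can be illustrated with a classical additive scheme, in which the observed series is the sum of trend, seasonality, and residual. The paper does not state which decomposition method or software was used, so the following is only a minimal sketch under that assumption. The function name `decompose_additive`, the 12-month period, and the synthetic series are illustrative choices and are not taken from the Sandia data.

```python
def decompose_additive(series, period=12):
    """Classical additive decomposition: series = trend + seasonal + residual."""
    n = len(series)
    half = period // 2
    # Centered moving average (2 x period, for an even period) as the trend.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    # Average detrended value for each calendar position, centered on zero.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    seasonal = [means[t % period] - offset for t in range(n)]
    # Whatever is left after removing trend and seasonality.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic 4-year monthly series (NOT the Sandia data): a rising trend plus
# a repeating seasonal pattern that sums to zero over the year.
pattern = [-30, -20, -10, 0, 10, 20, 30, 20, 10, 0, -10, -20]
synthetic = [250 + 0.5 * t + pattern[t % 12] for t in range(48)]
trend, seasonal, residual = decompose_additive(synthetic)
print(trend[20], seasonal[0], residual[20])
```

The trend is undefined for the first and last six months because the centered window does not fit there. For the synthetic series, the linear trend and the seasonal pattern are recovered and the residual is zero, which is the behavior expected when no random variation is present.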
METHODOLOGY
The Supplementary Material presents a flowchart of the research methodology. Monthly water samples were collected at the Sandia Ghat site in the first week of each month and analyzed by CWC New Delhi, India. Water quality data for the 40 years from 1981 to 2020 were acquired from the divisional office of CWC Sandia, Madhya Pradesh, India. As mentioned previously, the Keras API was used for water quality parameter prediction. Keras is an open-source deep learning framework written in Python that is now part of the TensorFlow platform. It was originally developed independently of TensorFlow and was later merged with it, starting from version 2.0. Its user-focused design makes it easy for researchers and developers to build and experiment with neural networks. LSTM, a type of recurrent neural network (RNN) architecture, is also used in this study. LSTMs are a specific variant of RNNs designed to address some of the limitations of traditional RNNs when dealing with long sequences and capturing long-term dependencies in data. The LSTM model has been used for groundwater hydrology (Zhang et al. 2018; Afzaal et al. 2019; Bowes et al. 2019; Supreetha et al. 2020; Wunsch et al. 2021) and rainfall–runoff forecasting (Hu et al. 2018; Kratzert et al. 2018). Literature on the use of LSTM for water quality parameters, however, remains limited.
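To make the modelling setup concrete, the sketch below shows one way a monthly series could be windowed and passed to a Keras LSTM. It is not the authors' code: the layer size, optimizer, loss, and the function names `make_windows` and `build_lstm` are assumptions for illustration. The only elements taken from the paper are the idea of using a fixed number of past months as input and predicting 12 months, as suggested by the input and batch shapes reported in Table 3.

```python
def make_windows(series, n_in, n_out=12):
    """Split a 1-D series into (input window, target window) pairs."""
    inputs, targets = [], []
    for start in range(len(series) - n_in - n_out + 1):
        inputs.append(series[start : start + n_in])
        targets.append(series[start + n_in : start + n_in + n_out])
    return inputs, targets


def build_lstm(n_in, n_out=12, units=32):
    """Hypothetical single-layer LSTM; hyperparameters are NOT from the paper."""
    from tensorflow import keras  # imported lazily so the helper above runs without TensorFlow

    model = keras.Sequential([
        keras.layers.Input(shape=(n_in, 1)),  # n_in months, one feature (EC)
        keras.layers.LSTM(units),
        keras.layers.Dense(n_out),            # one output per predicted month
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


# Toy series standing in for monthly EC values (not real data).
toy_series = list(range(30))
X, y = make_windows(toy_series, n_in=10, n_out=12)
print(len(X), X[0], y[0])
```

Before training, the input windows would need to be converted to a floating-point array of shape (samples, n_in, 1), for example with NumPy, and the series would normally be scaled. Varying `n_in` from 3 to 12 would give ten configurations analogous to those compared in this study.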
Variation graphs for the ten LSTM models for: (a) R2; (b) RMSE values; (c) NSE values; (d) MAE values; and (e) MAPE percentages.


R2 measures how well the independent variables account for the variability in the dependent variable. R2 values range from 0 to 1 (0%–100%). The different R2 values have the following indications:
➢ R2 = 0: This means that the independent variables do not explain any of the variability in the dependent variable, and the regression model does not fit the data at all.
➢ R2 = 1: This indicates that the independent variables perfectly explain all the variability in the dependent variable, and the regression model fits the data perfectly.
➢ 0 < R2 < 1: In practice, R2 values typically fall in this range, and values closer to 1 indicate that the model explains a larger share of the variability.
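For reference, one common definition of R2 is the squared Pearson correlation between observed and predicted values, written out below. The symbols are assumptions introduced here: O_i and P_i denote the observed and predicted EC for month i, with means \bar{O} and \bar{P}, over n months. Whether the study used exactly this form is not stated in this section, although it is consistent with Table 3 reporting different R2 and NSE values for the same model.

```latex
R^2 = \left(
  \frac{\sum_{i=1}^{n} (O_i - \bar{O})(P_i - \bar{P})}
       {\sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^2}\;\sqrt{\sum_{i=1}^{n} (P_i - \bar{P})^2}}
\right)^2
```

Defined this way, R2 measures how closely the predictions follow the pattern of the observations, but it is insensitive to a constant offset or scaling of the predictions.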



➢ When NSE equals 1, it indicates a flawless model fit where the simulated values precisely align with the observed values.
➢ If NSE is greater than 0, it signifies that the model is delivering predictions superior to the average of the observed values.
➢ When NSE equals 0, it suggests that the model's performance is equivalent to using the mean of the observed values, with no improvement.
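The five performance criteria can be computed directly from paired observed and predicted values. The sketch below uses the standard textbook definitions and assumes that R2 is the squared Pearson correlation. The function names and the three-point example are illustrative and are not taken from the study.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error, in the units of the data."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error, in the units of the data."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mape(obs, pred):
    """Mean absolute percentage error (%); observations must be non-zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / variability about the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def r2_pearson(obs, pred):
    """R2 taken as the squared Pearson correlation (an assumption here)."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)


# Illustrative values only: every prediction is 50 units too high.
observed = [200.0, 300.0, 400.0]
predicted = [250.0, 350.0, 450.0]
print(r2_pearson(observed, predicted), nse(observed, predicted),
      rmse(observed, predicted), mae(observed, predicted))
```

In this example the predictions track the observations perfectly apart from a constant bias, so the squared correlation is 1 while NSE falls to 0.625. This shows how a model can have a high R2 and a noticeably lower NSE at the same time.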
RESULTS AND DISCUSSION
A total of ten models were applied to predict the EC of the Narmada River at the Sandia hydrological site. Table 3 and Figure 5 present the variation in the performance metrics of the models during prediction. The middle value of the input shape is the number of months used as input to the network. The middle value of the batch shape is the number of months for which the given LSTM network produced predictions. A higher R2 value indicates a better fit. In this table, the R2 values range from 0.748 to 0.889. Model 8, with the highest R2 value (0.889), appears to have the best fit. Golian et al. (2015) employed R2 as a goodness-of-fit criterion for discharge prediction and achieved a highest value of 0.86. Salehnia et al. (2017) also used R2 as a reliable indicator for comparing drought indices.
Variation of different performance parameters of the models during prediction
S No. | Input shape | Batch shape | R2 | RMSE | NSE | MAE | MAPE (%) |
---|---|---|---|---|---|---|---|
1 | (1–3–1) | (1–12–1) | 0.845 | 129.19 | 0.32 | 109.06 | 15 |
2 | (1–4–1) | (1–12–1) | 0.878 | 109.41 | 0.51 | 94.14 | 12 |
3 | (1–5–1) | (1–12–1) | 0.833 | 141.82 | 0.18 | 115.96 | 16 |
4 | (1–6–1) | (1–12–1) | 0.825 | 107.75 | 0.52 | 90.86 | 12 |
5 | (1–7–1) | (1–12–1) | 0.872 | 98.72 | 0.60 | 86.92 | 11 |
6 | (1–8–1) | (1–12–1) | 0.748 | 139.51 | 0.20 | 116.62 | 17 |
7 | (1–9–1) | (1–12–1) | 0.842 | 113.56 | 0.47 | 96.44 | 13 |
8 | (1–10–1) | (1–12–1) | 0.889 | 92.43 | 0.65 | 83.04 | 11 |
9 | (1–11–1) | (1–12–1) | 0.835 | 89.91 | 0.67 | 78.77 | 11 |
10 | (1–12–1) | (1–12–1) | 0.853 | 80.25 | 0.74 | 72.27 | 9 |
Lower RMSE values indicate better predictive accuracy. Model 10, with the lowest RMSE (80.25), is a strong contender for the best model. RMSE has also been used to evaluate wavelet neural networks for daily river-flow prediction (Krishna et al. 2011). Nourani et al. (2012) used RMSE for calibration and verification of genetic programming neural networks to identify the best results for different types of watersheds. Other researchers have also used RMSE for flood forecasting (Golian et al. 2015), change in river topology and vegetation (Watanabe & Kawahara 2016), streamflow predictions (Tegegne et al. 2017), and hazard mapping (Kumar et al. 2022), among others.
An NSE value above 0.5 can be considered acceptable (Quansah et al. 2021). The NSE values obtained here lie between 0.18 and 0.74. Model 3, with the lowest NSE of 0.18, shows little improvement over simply using the mean of the observed values. Model 10, with the maximum value of 0.74, performs best, followed by Models 9 and 8. Fares et al. (2014) obtained NSE values between 0.65 and 0.74 and rated the model's general performance as good. Jiang et al. (2015) also obtained NSE values between 0.71 and 0.90 and found NSE a satisfactory parameter for assessing performance.
MAE is a useful tool for understanding the uncertainty in a model's predictions (Freedman et al. 2014). Lower MAE values indicate better performance, as they represent how close the model's predictions are to the actual values. Model 10 has the lowest MAE of 72.27, indicating the best predictive performance among the ten models. Model 9 also has a relatively low MAE of 78.77, showing good predictive accuracy. Models 8 and 5 have MAE values in the 80s, indicating reasonable predictive performance. Models 2, 4, and 7 have MAE values in the 90s, indicating decent but slightly less accurate predictions compared with the previous models. Models 3 and 6 have higher MAE values of about 116, suggesting less accurate predictions. In summary, Models 10 and 9 perform the best in terms of prediction accuracy, while Models 3 and 6 perform the worst. MAE was also used by previous researchers for rainfall–runoff simulation (Hu et al. 2018), monthly stream flow (Bisht et al. 2020), hourly flood predictions (Tiwari & Chatterjee 2010), and so on.
Lower MAPE values indicate better predictive accuracy. LSTM Model 10 has the lowest MAPE of 9%, indicating the best predictive accuracy among the ten models: on average, its predictions deviate from the actual values by only 9%. Models 5, 8, and 9 all have MAPE values of 11%, which also indicates very good predictive accuracy. Models 2 and 4, with MAPE values of 12%, show good relative accuracy that is well within acceptable limits. Models 7 and 3 have MAPE values of 13% and 16%, respectively, which shows that they are slightly less precise than the preceding ones but may still be acceptable depending on the application of the EC prediction. The larger the percentage deviation from the actual values, the less reliable a model is for monthly prediction. To sum up, Model 10 has the best predictive accuracy, followed closely by Models 5, 8, and 9, which also performed well. Moderate accuracy is achieved with Models 2 and 4, whereas Models 7 and 3 are less accurate. Wilby et al. (2003) used MAPE for performance evaluation of conceptual models in runoff analysis.
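The metric-by-metric comparison above can be reproduced from the values in Table 3. The short sketch below copies the tabulated values and picks the best model for each metric, treating R2 and NSE as higher-is-better and RMSE, MAE, and MAPE as lower-is-better. The variable names are illustrative.

```python
# Performance values copied from Table 3, keyed by model number.
table3 = {
    1:  {"R2": 0.845, "RMSE": 129.19, "NSE": 0.32, "MAE": 109.06, "MAPE": 15},
    2:  {"R2": 0.878, "RMSE": 109.41, "NSE": 0.51, "MAE": 94.14,  "MAPE": 12},
    3:  {"R2": 0.833, "RMSE": 141.82, "NSE": 0.18, "MAE": 115.96, "MAPE": 16},
    4:  {"R2": 0.825, "RMSE": 107.75, "NSE": 0.52, "MAE": 90.86,  "MAPE": 12},
    5:  {"R2": 0.872, "RMSE": 98.72,  "NSE": 0.60, "MAE": 86.92,  "MAPE": 11},
    6:  {"R2": 0.748, "RMSE": 139.51, "NSE": 0.20, "MAE": 116.62, "MAPE": 17},
    7:  {"R2": 0.842, "RMSE": 113.56, "NSE": 0.47, "MAE": 96.44,  "MAPE": 13},
    8:  {"R2": 0.889, "RMSE": 92.43,  "NSE": 0.65, "MAE": 83.04,  "MAPE": 11},
    9:  {"R2": 0.835, "RMSE": 89.91,  "NSE": 0.67, "MAE": 78.77,  "MAPE": 11},
    10: {"R2": 0.853, "RMSE": 80.25,  "NSE": 0.74, "MAE": 72.27,  "MAPE": 9},
}

# Direction of improvement for each metric.
higher_is_better = {"R2": True, "RMSE": False, "NSE": True, "MAE": False, "MAPE": False}

best = {}
for metric, higher in higher_is_better.items():
    pick = max if higher else min
    best[metric] = pick(table3, key=lambda model: table3[model][metric])

print(best)
```

On the tabulated values, Model 8 ranks first on R2, while Model 10 ranks first on RMSE, NSE, MAE, and MAPE.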
In this study, the best-performing model in terms of R2 is Model 8, with a value of 0.89, which compares well with other models proposed by researchers in recent times. Ubah et al. (2021) used artificial neural network (ANN) models (feed-forward multilayer neural networks) to forecast EC, with the best-performing model having an R2 value of 0.95. Other researchers used feed-forward ANN and statistical methods for EC prediction and obtained an R2 value of 0.83 (Satish et al. 2022). Sunori et al. (2023) used a regression tree model known as a general regression neural network (GRNN) for prediction of EC and found it satisfactory for forecasting.
CONCLUSION
In this paper, a comprehensive study was conducted to understand the impacts of climate change on water quality parameters, with a special focus on EC at the Sandia hydrological observation site in the Narmada River basin, Central India. It aimed to help bridge the gap in understanding the intricate relationship between water quality and climate change, using artificial neural networks as a supporting tool. EC is a crucial parameter affecting water quality. To predict EC in the context of climate change, LSTM models were used. A total of ten models, differing in their input parameters, were developed and applied to water quality prediction. Five performance criteria were used: RMSE, coefficient of determination (R2), NSE, MAE, and MAPE. Among the models, Model 8 performs best in terms of R2, with the highest value of 0.889, demonstrating excellent predictive capability. With the lowest MAE of 72.27 and MAPE of 9%, Model 10 also performs well. The MAPE value of 11% exhibited by Models 5, 8, and 9 also shows consistently good accuracy. These results show that the performance of the LSTM model in predicting EC at the observation site is acceptable for decision-making and for assessing the impact of climate change. One limitation of the model is that it is only as accurate as the data obtained from the site. Because data collection and testing are manual, there may be some discrepancies in the results produced. Further studies can be carried out for different parameters, different neural networks, and data-processing tools such as wavelets.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.