ABSTRACT
This study investigates the application of an artificial neural network (ANN) framework for analysing water pollution caused by solids. To address this challenge, we develop a convolutional neural network trained with a transfer learning strategy based on AlexNet. The network is trained on images of water samples with low, medium, and high concentrations of total suspended solids and achieves a validation accuracy of 99.85% and a precision of 99.85%, which is highly competitive with other approaches. Our model demonstrates significant improvements in speed and reliability over conventional image processing methods, effectively predicting pollution levels. Our findings suggest that ANNs can serve as an effective tool for real-time monitoring and management of water pollution, facilitating proactive decision-making and policy formulation.
HIGHLIGHTS
Artificial neural networks are used to estimate water pollution caused by total suspended solids.
Samples are prepared with different concentrations of suspended solids and filmed to obtain images.
A convolutional neural network with a transfer learning strategy is trained and tested for analysing its performance.
Low, medium, and high concentrations of total suspended solids in water can be identified with high accuracy and modest computer resources.
INTRODUCTION
Water pollution caused by solids, including suspended and dissolved particles, poses critical challenges to environmental sustainability, public health, and economic development (Khan et al. 2022). Solids in water, particularly suspended solids, are an important pollutant affecting rivers and lakes, impacting their usability for activities such as energy production, agricultural irrigation, fishing, and recreation. High concentrations of suspended solids in water can lead to adverse ecological effects, including algal blooms, extreme turbidity, and reduced light penetration, which in turn inhibits photosynthesis within aquatic systems (Adjovu et al. 2023). In addition, total suspended solids (TSS) can carry a wide range of pollutants, including heavy metals (Aradpour et al. 2021), microorganisms, and nutrients (Nodefarahani et al. 2020; Naderian et al. 2024), which can severely impact aquatic ecosystems and human health. This parameter is essential in evaluating urban water systems because suspended particles can absorb contaminants such as hydrocarbons and organic matter, making high TSS levels a potential indicator of contamination (Rossi et al. 2005). Furthermore, elevated TSS concentrations often indicate possible organic or faecal contamination in water (Howard et al. 2004). Understanding and mitigating these impacts requires accurate monitoring and prediction of pollution levels. For instance, in Mexico, the Ministry of Environment and Natural Resources has proposed the NOM-001-SEMARNAT-2021 standard (National Advisory Committee for Standardization of the Environment and Natural Resources of Mexico 2021), which sets parameters for analysing the quality of contaminated water, including TSS. To make water quality information accessible to the public, the Ministry of Environment and Natural Resources also proposed a system for estimating water quality based on three contamination parameters: biochemical oxygen demand, chemical oxygen demand (COD), and TSS. TSS is among these parameters because elevated TSS levels diminish the ability of water bodies to support diverse aquatic life. These parameters help identify conditions ranging from nearly natural states, unaffected by human activity, to water showing clear signs of wastewater discharges and severe deforestation (Ministry of Environment and Natural Resources Mexico 2011).
Traditional methods for identifying and classifying TSS in water samples include using a Gooch crucible with a 2.4-cm glass fibre filter, a Buchner funnel, a membrane filter, and a Gooch crucible with an asbestos mat (Smith & Greenberg 1963). While effective, these methods require specialised laboratory facilities and trained personnel, as well as significant time and financial resources. These limitations are particularly challenging for regions with limited access to advanced water quality laboratories, making it difficult to monitor and manage the treatment of TSS in water. This underscores the need for innovative approaches to enhance the efficiency and accuracy of water quality assessment (Kim et al. 2024). An artificial neural network (ANN) framework offers a promising solution to these challenges, especially considering that water quality data are inherently complex, with numerous interdependent variables such as turbidity, TSS, and COD. An ANN excels in managing and interpreting complex, non-linear relationships within large datasets, providing more accurate predictions than traditional linear models (Noori et al. 2022). Such a framework is capable of learning from historical data to predict future pollution levels, which is crucial for proactive environmental management (Schauser & Steinberg 2001). Furthermore, once trained, ANN models can rapidly process and analyse data, significantly reducing the time required for water quality assessment, a beneficial feature for real-time monitoring and decision-making (Palani et al. 2008). These frameworks can be easily scaled to incorporate additional data sources and parameters, enhancing their applicability across different water bodies and pollution scenarios. ANNs continuously improve their performance as more data become available and thus remain relevant and accurate over time, adapting to changes in pollution patterns and environmental conditions (Noori et al. 2013). An additional benefit of their use is that these frameworks can be seamlessly integrated with Internet of Things (IoT) devices and remote sensing technologies. For example, Lloret et al. (2011) developed a wireless sensor network in which each sensor node captures images from the field and uses image processing to detect issues in the leaves, such as deficiencies, pests, or diseases. Once an issue is detected, the sensor sends a notification to the farmer via the network. Similarly, Sabari et al. (2020) designed a system for the continuous monitoring of water quality parameters based on the concept of IoT, in which the proposed model utilises various sensors to measure the required parameters. In another example, Patil et al. (2019) designed a system that informs users about upcoming floods through notifications and alerts. The system uses sensors to gather flood-related data and provides information about nearby safe locations for evacuation.
The integration of ANNs with IoT devices and remote sensing technologies enables continuous, automated monitoring of water quality, providing real-time data that enhance the responsiveness and effectiveness of pollution control measures. By providing reliable and detailed insights into pollution dynamics, ANNs support informed decision-making by policymakers, environmental agencies, and stakeholders. This leads to better resource allocation, targeted pollution control strategies, and, ultimately, more effective environmental protection.
In the field of water management, several ANN models have been used primarily to characterise both the quantity and quality of water (Farmaki et al. 2010; Najafzadeh et al. 2021). One of the most common applications of ANNs in water monitoring is through remote sensing (Wagle et al. 2020), where satellite images are used to predict different water levels and the evolution of contaminants (Agrawal & Petersen 2021). Another significant application of ANNs is in process modelling, particularly in wastewater treatment plants (WWTPs), as well as in the control processes within these facilities (Hamed et al. 2004). ANNs have also been widely utilised for predicting water quality parameters through various machine learning (ML) methods (Haghiabi et al. 2018; Kim et al. 2024).
In this article, we present a novel approach to assess water quality by developing a convolutional neural network (CNN) capable of predicting high, medium, and low pollution levels based on TSS concentrations. This method offers a cost-effective, rapid, and non-invasive alternative for water quality monitoring, enabling the classification of water quality using a single image captured with a smartphone camera. Early studies under similar conditions already point towards the benefits of using CNNs in the classification of water contamination by TSS (Lopez-Betancur et al. 2022). The remainder of this paper is organised as follows: In the Methods section, we present in detail the sample preparation, the experimental procedure, and the development of the CNN for classifying water quality based on solids, as well as its training and calibration phases. In the Results section, we report the findings of our study based on the followed methodology. In the Discussion section, we summarise the findings of this work and its impact on environmental science. Finally, we present an outlook for future work in the Conclusions section.
METHODS
Samples
To carry out this research, 30 water samples were prepared with different levels of TSS as pollutants. These samples were obtained by selecting clays with particle sizes smaller than 60 μm, since TSS are considered matter with a particle diameter of less than 62 μm (Bilotta & Brazier 2008). This process involved sieving the material through a 60 μm mesh sieve to achieve material homogenisation. It is important to note that the selected clays are primarily composed of iron and aluminium, the most common types of clays found in urban environments (Perry & Taylor 2009). This selection aims to mimic natural contamination in urban water sources.
Sample supplies and image data obtained by cropping frames from the video recording: (a) clays used to make the samples, (b) 5 × 5 × 5 cm³ container, (c) a sample in the translucent container, (d) cropped image associated with a high TSS concentration, (e) cropped image associated with a medium TSS concentration, and (f) cropped image associated with a low TSS concentration.
The number of samples varied across different classes: 4 samples for the low class, 10 samples for the medium class, and 16 samples for the high class. This variation is due to the different ranges of TSS concentrations within each class. The low class exhibited minimal variability in TSS concentrations, the medium class displayed a broader range of concentrations, and the high class had the widest range of TSS concentrations.
Experimental procedure
Scheme of the experimental setup (see text). Samples are placed in a transparent cubic container on a magnetic stirrer and illuminated laterally using an 18-inch edge-lit LED ring light. Images were captured with an iPhone 12.
CNN development
In the domain of deep learning, various types of algorithms exist, with CNNs being among the most widely used (Baek et al. 2020). These models offer several advantages: (1) they reduce both time and costs (e.g., material and labour costs), (2) enable forecasting across different system phases, (3) simplify complex systems to enhance comprehension, and (4) predict target values even in situations where site access is challenging (Barzegar et al. 2020). Therefore, in this study, we propose the use of a CNN.
The primary task of a CNN is classification. Initially, it performs feature extraction from the input image. These features are then fed into a neural network (NN), producing output probabilities that indicate the classification of the input image into a specific category (Ferentinos 2018). However, training a CNN from scratch requires two main conditions: (1) access to a large dataset with labelled data and (2) significant computational and memory resources (Morid et al. 2021).
An alternative to training CNNs from scratch is the transfer learning (TL) strategy, which allows leveraging knowledge acquired from large datasets of non-environmental data to address specific environmental challenges, such as water quality analysis. Specifically, parameters from well-trained CNN models on non-environmental datasets, which contain diverse images (e.g., ImageNet models like AlexNet (Yuan & Zhang 2016), VGGNet (Purwono et al. 2023), and ResNet (Wu et al. 2019)), can be transferred to tailor a CNN model for analysing water quality.
AlexNet architecture. (a) Training with the large ImageNet dataset (input image size 224 × 224 × 3). (b) Five convolutional layers with ReLU as the activation function and max-pooling for size reduction. (c) Three FC layers; the last layer uses softmax as the activation function for the classifier.
To carry out the TL process used in our work, we use Python as the programming language, because it offers a large number of packages that facilitate the use of ML algorithms (Raschka & Mirjalili 2019) and is free to use. The entire procedure was conducted in the Jupyter Notebook environment within the Visual Studio Code Integrated Development Environment. All computations were performed on a computer equipped with an 8th generation Intel i5 processor and 8 GB of RAM, which constitutes relatively modest computational resources. The well-trained AlexNet model was obtained from the Torchvision package, a Python package that includes several pre-trained network models. The modification of the AlexNet CNN involved removing the original classifier, which distinguished 1,000 different classes, and replacing it with a three-class water quality classifier based on different concentrations of TSS: high, medium, and low. Subsequently, the parameters obtained from the pre-training of AlexNet were unfrozen so that all weights of the network model could be adjusted.
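As an illustration of this step, the following minimal sketch (not the exact published pipeline; variable names are our own) loads the pre-trained AlexNet from Torchvision, swaps the 1,000-class output layer for a three-class one, and leaves all parameters unfrozen for fine-tuning:

```python
import torch.nn as nn
from torchvision import models

# Load AlexNet with weights pre-trained on the ImageNet dataset.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Replace the original 1,000-class output layer with a three-class
# classifier (high, medium, and low TSS concentration).
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 3)

# Unfreeze all parameters so the pre-trained weights are fine-tuned.
for param in model.parameters():
    param.requires_grad = True
```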
In the ML realm, among the most important elements to define are the hyperparameters of the ANN. These are configurations of the network that affect its structure, learning, and performance. Hyperparameters differ from parameters in that they are not automatically modified or adjusted during training; instead, they must be specified beforehand (Yu & Zhu 2020). The hyperparameters established in this work were the optimisation algorithm, learning rate, batch size, and number of epochs. Below, we describe each of these.
The optimisation process in artificial intelligence (AI) involves identifying optimal parameters that improve the performance of a CNN model. One of the classic methods for this process is the stochastic gradient descent (SGD) optimisation method (Newton et al. 2018). However, tuning the learning rate of SGD, as a hyperparameter, is often challenging because the magnitudes of different parameters vary significantly and need to be adjusted throughout the training process (Zhang 2018). Therefore, in our study, we used the Adam optimiser, as it is an efficient stochastic optimisation method that only requires first-order gradients and has low memory requirements. This method computes individual adaptive learning rates for different parameters based on estimates of the first and second moments of the gradients (Kingma & Ba 2014), iteratively finding values that minimise the error (loss).
The standard Adam update rules can be written as

m_t = β₁·m_{t−1} + (1 − β₁)·g_t,  v_t = β₂·v_{t−1} + (1 − β₂)·g_t²,
m̂_t = m_t/(1 − β₁^t),  v̂_t = v_t/(1 − β₂^t),
θ_t = θ_{t−1} − α·m̂_t/(√(v̂_t) + ε),

where g_t is the gradient of the loss at step t, m_t and v_t are the first- and second-moment estimates, α is the learning rate, β₁ and β₂ are the exponential decay rates of the moment estimates, and ε is a small constant for numerical stability (Kingma & Ba 2014).
One of the crucial hyperparameters is the number of epochs used during training. An epoch entails presenting each sample in the training dataset to the network, giving the model an opportunity to update its internal parameters. Each epoch consists of one or more batches. As each sample or batch is processed through the network, the error is computed, and the back propagation (BP) algorithm is applied to adjust the weights and biases of the network. During BP, the error is propagated backward through the network, gradients of the weights are calculated with respect to the error, and these gradients are used to update the weights, aiming to minimise the error. This process facilitates model learning and performance enhancement. Although the number of epochs is typically large, our approach employs 50 epochs.
In turn, the batch size determines the number of samples propagated through the CNN and used to update the model parameters in each iteration until training is complete. Larger batch sizes facilitate greater computational parallelism and can often enhance performance; however, they require more memory and can introduce latency during training. For the creation of the CNN model, a total of 6,515 images with varying suspended solid concentrations were used, with 5,862 images allocated to the training set and 653 images to validation. Consequently, given the total number of images available, we opted for a batch size of 50 images. A minimal training-loop sketch is given after the hyperparameter table below.
Hyperparameters used in the development of the CNN model
| Hyperparameter | Value |
|---|---|
| Optimisation algorithm | Adam |
| Learning rate | 0.000005 |
| Batch size | 50 |
| Epochs | 50 |
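The following sketch illustrates a training loop consistent with these hyperparameters (Adam, learning rate 0.000005, batch size 50, 50 epochs). It reuses the `model` object from the previous sketch; the dataset folder layout and preprocessing transforms are illustrative assumptions rather than the exact published pipeline:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Preprocessing matching the 224 x 224 RGB input expected by AlexNet,
# normalised with the ImageNet statistics used during pre-training.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical folder layout: one sub-folder per class (high/medium/low).
train_set = datasets.ImageFolder('data/train', transform=transform)
train_loader = DataLoader(train_set, batch_size=50, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.000005)

for epoch in range(50):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # forward pass and error
        loss.backward()                          # back propagation
        optimizer.step()                         # Adam weight update
```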
Representation of the TL process in the CNN model used: (a) well-trained AlexNet CNN, (b) removal of the last classifier layer for the new task, (c) reuse of the pre-trained model, (d) new training dataset of water samples and modification of the CNN, and (e) classification result.
Validation metrics
For evaluating the proposed CNN, the most common metrics are based on four possible prediction outcomes: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) (Seliya et al. 2009). In this study, we use accuracy, precision, recall, F-measure, the receiver operating characteristic (ROC) curve, and the confusion matrix as validation metrics. The first four are defined as

Accuracy = (TP + TN)/(TP + TN + FP + FN),
Precision = TP/(TP + FP),
Recall = TP/(TP + FN),
F_β = (1 + β²)·(Precision·Recall)/(β²·Precision + Recall).

In our study, we set β = 1, so the F-measure reduces to the harmonic mean of precision and recall.
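As a sketch, these metrics can be computed from the model predictions with scikit-learn (assuming `y_true` and `y_pred` hold the true and predicted class labels collected over the validation set; the weighted averaging scheme for the three-class problem is our assumption):

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# y_true and y_pred: integer class labels over the validation set.
accuracy = accuracy_score(y_true, y_pred)
# Per-class scores are aggregated here with weighted averaging;
# other averaging schemes (macro, micro) are equally possible.
precision = precision_score(y_true, y_pred, average='weighted')
recall = recall_score(y_true, y_pred, average='weighted')
f_measure = f1_score(y_true, y_pred, average='weighted')  # beta = 1
```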
The ROC curve illustrates how a classifier balances correctly identified true positives against false positives. It provides a comprehensive view of the classifier's effectiveness, independent of class distribution or error costs (Davis & Goadrich 2006). The area under the ROC curve (AUC) represents the probability that a randomly selected positive instance is ranked higher than a randomly selected negative instance according to the model's predictions. In general, a classifier with a larger AUC performs better than one with a smaller AUC. This curve is commonly used as a validation metric.
Finally, a confusion matrix, a widely used tool in classification problems, is employed in this research. This tool provides detailed information about the predicted classifications (Deng et al. 2016). The confusion matrix is particularly beneficial for evaluating the overall performance of the classification model, which is crucial for guiding subsequent improvements. It is structured such that each cell corresponds to a specific class assigned by the model, with rows representing the actual classes and columns representing the predicted classes. Ideally, correctly classified instances align along the diagonal of the matrix, while misclassified instances appear in the off-diagonal cells.
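Both tools can be computed in a few lines with scikit-learn; in this sketch, `y_true` and `y_pred` are as above, `y_score` is assumed to hold the per-class softmax probabilities, and the one-vs-rest multi-class scheme is our assumption:

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

# Rows correspond to actual classes, columns to predicted classes.
cm = confusion_matrix(y_true, y_pred)

# Multi-class AUC computed one-vs-rest from the per-class softmax
# probabilities y_score (shape: n_samples x n_classes).
auc = roc_auc_score(y_true, y_score, multi_class='ovr')
```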
RESULTS
Learning curve of the CNN: (a) classifier accuracy by epoch for the training and validation datasets and (b) classifier loss by epoch for the training and validation datasets.
Confusion matrix obtained on the validation dataset: (a) normalised confusion matrix and (b) confusion matrix without normalisation.
ROC curve obtained from the evaluation of CNN performance: (a) ROC curve and AUC value for each class and (b) ROC curve and overall AUC value.
Table 2 summarises the remaining evaluation metrics for the CNN performance on the training and validation datasets. This table includes the accuracy, precision, recall (or sensitivity), F-measure, and training time. The metrics reach 0.9985–0.9986 on the validation data and 0.9997 on the training data. Notably, despite the relatively long training time, the CNN demonstrates excellent performance.
Validation metrics obtained from the validation and training dataset
| Metric | Validation values | Training values |
|---|---|---|
| Accuracy | 0.9985 | 0.9997 |
| Precision | 0.9985 | 0.9997 |
| Recall | 0.9985 | 0.9997 |
| F-measure | 0.9986 | 0.9997 |
| Training time | – | 354 min 13.4 s |
A random feature map from an image in the validation dataset through each convolutional layer: (a) random input image; feature map after (b) convolutional layer 1, (c) convolutional layer 2, (d) convolutional layer 3, (e) convolutional layer 4, and (f) convolutional layer 5.
DISCUSSION
The issue of water pollution caused by TSS requires action from environmental policymakers, both concerning the connection of individuals to the public sewer system and in monitoring the health of water bodies. In this context, ML techniques offer valuable tools for decision-making based on physical data obtained from simple water imaging (Lopez-Betancur et al. 2022). The CNN developed in this study exemplifies a tool that delivers excellent performance with modest computational resources.
Figure 5, which depicts the learning curve of the proposed model, shows that the CNN developed for water quality classification based on TSS concentration demonstrates strong performance compared with other studies (Lopez-Betancur et al. 2022). This is evident from epoch 10 onwards, with an accuracy exceeding 0.99, along with high precision, sensitivity, and F-measure. However, fluctuations occur in subsequent training epochs. These fluctuations are primarily due to images with low and medium solid concentrations, where the solid content is so minimal that it becomes challenging to identify within the sample, especially during constant agitation. With an agitation speed of 300 rpm and a selected image crop area of 450 × 450 pixels out of a total of 1,920 × 1,080 pixels, some images may not adequately capture the TSS concentration in the water sample. Fluctuations in the learning process are likely due to difficulties in distinguishing between the low and medium classes. This is supported by the confusion matrix, which shows one misclassified image between the low class (label 3) and the medium class (label 2) in terms of true labels and predictions (Figure 6(b)). It is important to emphasise that overfitting can be ruled out based on the learning curve analysis. The learning curve shows that as the number of epochs increases, the model accuracy continues to improve. Although there are fluctuations in accuracy and loss, these variations are relatively small, within a range between 0.0004 and 0.02, especially when the accuracy reaches a high value of 0.99. This minor fluctuation indicates stability and high performance, suggesting that the model is learning effectively and is unlikely to be overfitting.
Regarding the high class in TSS concentration, it is evident that this class was the most accurately classified by the network. With the highest number of images used for this category, no misclassified labels were found, resulting in an accuracy of 1.0. This can be attributed to the higher TSS concentrations in these samples and the abundance of training data, enabling the proposed CNN model to correctly associate high concentrations with this label.
As shown by the previously presented metrics (Table 2), despite the presence of one misclassified image, the ROC curve indicates that the classification probability is nearly 1.0 for all classes. Specifically, the high class achieves a perfect identification probability of 1.0. For the medium and low classes, the probability of correct identification is 0.99999, which is rounded to 1.0000 in the ROC plot (Figure 7(a)) for clarity due to decimal precision. In addition, the overall AUC for the model is 0.999996, also rounded to 1.0000 in the ROC plot (Figure 7(b)), indicating a very high level of accuracy. These results suggest that the developed network has a high probability of accurately classifying TSS concentrations.
The feature maps presented in Figure 8 provide a detailed view of how the network processes and classifies images, thereby validating its ability to detect relevant features. These feature maps reveal which features are activated at each layer of the network, illustrating how the network processes the image, from detecting simple edges and textures to identifying complex patterns. Reviewing these maps confirms that the CNN focuses on differences in suspended solids and effectively learns to identify relevant patterns while ignoring factors such as optical distortions and unwanted radiation. In addition, the dimensions of each feature map at the output of the layers demonstrate that the model is correctly structured according to the AlexNet architecture.
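Feature maps of this kind can be extracted with forward hooks; the following is a minimal sketch under the assumption that `model` is the fine-tuned AlexNet from the Methods section and `image` is one preprocessed validation image:

```python
import torch

feature_maps = {}

def save_output(name):
    # Return a hook that stores the given layer's output under `name`.
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()
    return hook

# The five convolutional layers sit at indices 0, 3, 6, 8, and 10
# of model.features in the Torchvision AlexNet implementation.
for idx in [0, 3, 6, 8, 10]:
    model.features[idx].register_forward_hook(save_output(f'conv{idx}'))

model.eval()
with torch.no_grad():
    model(image.unsqueeze(0))  # one forward pass fills feature_maps
```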
Although there are various research studies focused on predicting contamination by TSS using images, many of them exhibit variations in both precision and practicality. For example, Zhang et al. (2002) achieved a precision of 91% in TSS detection by utilising combined optical and microwave data obtained by the Landsat 5 TM and ERS-2 SAR satellites, respectively, alongside an ANN. However, this approach relies on complex and costly technologies, which may limit its applicability in resource-restricted contexts. Similarly, the study by Saberioon et al. (2020) used Sentinel-2A images and reported an R² of 80% for TSS estimation, although this method also involves costs associated with remote sensing. In the work of Mustafa et al. (2017), indices derived from satellite images obtained by Landsat 8 OLI were applied, finding a correlation of 65.1% between the Airborne Water Extraction Index and TSS, indicating limitations in the precision of their estimates. Furthermore, Pai et al. (2007) characterised TSS in effluents from a hospital WWTP using genetic algorithms for neural networks, yielding errors of 23.14 and 51.73% in TSS determination. When applying a genetic model, errors of 23.14 and 26.67% were reported. These results reflect the diversity of existing approaches and their respective limitations.
CONCLUSIONS
This study developed a CNN framework using TL to classify three levels of TSS in water, achieving an accuracy of 99.85% despite training on small datasets. Although the model performed exceptionally well overall, low TSS concentrations were the most challenging to classify, and were occasionally confused with medium concentrations. However, this confusion was observed in only one image, and the model's performance remained highly accurate overall. The model was optimised for low computational costs, ensuring reproducibility and practical applicability. Given the high cost and time of laboratory TSS determinations, this approach offers a novel, cost-effective alternative for classifying solids.
The model is limited to a TSS concentration range of up to 6,000 mg/L, with higher concentrations potentially complicating classification. Controlled lighting conditions were used to minimise classification errors, but the model is not suited for samples with significant dissolved solids, which may cause errors due to their varied composition and colouration. The computational power required for finer analysis remains a constraint, though this method offers a practical solution for regions with limited access to water quality laboratories, such as small communities in Mexico. This tool could aid sustainable water management, improving monitoring and decision-making for rivers, lakes, irrigation, and WWTPs.
Future work aims to expand the model to classify water into five quality levels, aligning with the Mexican environmental standards. This includes addressing systematic experimental errors by incorporating varied particle types, lighting conditions, and particle sizes. The performance of alternative CNN architectures will also be explored to advance AI applications in environmental science, focusing on additional water quality parameters and contaminants of global concern.
ACKNOWLEDGEMENTS
We acknowledge Salomón Borjas García for valuable support.
FUNDING
We acknowledge support from CIC-UMSNH under grants 18371 and 26140.
AUTHOR CONTRIBUTIONS
All authors contributed equally to this research paper.
DATA AVAILABILITY STATEMENT
All relevant data are available from https://github.com/Itzel-LS/WQPTSSCANN.
CONFLICT OF INTEREST
The authors declare there is no conflict.