ABSTRACT
This study investigates the application of an artificial neural network (ANN) framework for analysing water pollution caused by suspended solids. To address the challenge, we develop a convolutional neural network trained with a transfer learning strategy based on AlexNet. We feed the network images of water samples with low, medium, and high concentrations of total suspended solids and achieve a validation accuracy of 99.85% with a precision of 99.85%, highly competitive with other approaches. Our model demonstrates significant improvements in speed and reliability over conventional image processing methods, effectively predicting pollution levels. Our findings suggest that ANNs can serve as an effective tool for real-time monitoring and management of water pollution, facilitating proactive decision-making and policy formulation.
HIGHLIGHTS
Artificial neural networks are used to estimate water pollution caused by total suspended solids.
Samples are prepared with different concentrations of suspended solids and filmed to obtain images.
A convolutional neural network with a transfer learning strategy is trained and tested for analysing its performance.
Low, medium, and high concentrations of total suspended solids in water can be identified with high accuracy and modest computer resources.
INTRODUCTION
Figure 1 | Sample supplies and image data obtained by cropping frames from the video recordings: (a) clays used to make the samples, (b) container of 5 × 5 × 5 cm³, (c) a sample in the translucent container, (d) cropped image associated with a high TSS concentration, (e) cropped image associated with a medium TSS concentration, and (f) cropped image associated with a low TSS concentration.
Figure 2 | Scheme of the experimental setup (see text). Samples are placed in a transparent cubic container on a magnetic stirrer and illuminated laterally using an 18-inch Ring Light Edge-Lit LED. Images were captured with an iPhone 12.
Traditional methods for identifying and classifying TSS in water samples include using a Gooch crucible with a 2.4-cm glass fibre filter, a Buchner funnel, a membrane filter, and an asbestos-matted Gooch crucible (Smith & Greenberg 1963). While effective, these methods require specialised laboratory facilities and trained personnel, as well as significant time and financial resources. These limitations are particularly challenging for regions with limited access to advanced water quality laboratories, making it difficult to monitor and manage the treatment of TSS in water. This underscores the need for innovative approaches to enhance the efficiency and accuracy of water quality assessment (Kim et al. 2024). An artificial neural network (ANN) framework offers a promising solution to these challenges, especially considering that water quality data are inherently complex, with numerous interdependent variables such as turbidity, TSS, and chemical oxygen demand (COD). An ANN excels at managing and interpreting complex, non-linear relationships within large datasets, providing more accurate predictions than traditional linear models (Noori et al. 2022). Such a framework is capable of learning from historical data to predict future pollution levels, which is crucial for proactive environmental management (Schauser & Steinberg 2001). Furthermore, once trained, ANN models can rapidly process and analyse data, significantly reducing the time required for water quality assessment, a beneficial feature for real-time monitoring and decision-making (Palani et al. 2008). These frameworks can be easily scaled to incorporate additional data sources and parameters, enhancing their applicability across different water bodies and pollution scenarios. ANNs continuously improve their performance as more data become available, and thus remain relevant and accurate over time, adapting to changes in pollution patterns and environmental conditions (Noori et al. 2013).

An additional benefit is that these frameworks can be seamlessly integrated with Internet of Things (IoT) devices and remote sensing technologies. For example, Lloret et al. (2011) developed a wireless sensor network in which each sensor node captures images from the field and uses image processing to detect issues in the leaves, such as deficiencies, pests, or diseases. Once an issue is detected, the sensor sends a notification to the farmer via the network. Similarly, Sabari et al. (2020) designed a system for the continuous monitoring of water quality parameters based on the concept of IoT, in which the proposed model utilises various sensors to measure the required parameters. In another example, Patil et al. (2019) designed a system that informs users about upcoming floods through notifications and alerts; it uses sensors to gather flood-related data and provides information about nearby safe locations for evacuation.
The integration of ANNs with IoT devices and remote sensing technologies enables continuous, automated monitoring of water quality, providing real-time data that enhance the responsiveness and effectiveness of pollution control measures. By providing reliable and detailed insights into pollution dynamics, ANNs support informed decision-making by policymakers, environmental agencies, and stakeholders. This leads to better resource allocation, targeted pollution control strategies, and, ultimately, more effective environmental protection.
In the field of water management, ANN models have primarily been used to characterise both the quantity and quality of water (Farmaki et al. 2010; Najafzadeh et al. 2021). One of the most common applications of ANNs in water monitoring is through remote sensing (Wagle et al. 2020), where satellite images are used to predict water levels and the evolution of contaminants (Agrawal & Petersen 2021). Another significant application of ANNs is in process modelling, particularly in wastewater treatment plants (WWTPs), as well as in process control within these facilities (Hamed et al. 2004). ANNs have also been widely utilised for predicting water quality parameters through various machine learning (ML) methods (Haghiabi et al. 2018; Kim et al. 2024).
In this article, we present a novel approach to assessing water quality by developing a convolutional neural network (CNN) capable of predicting high, medium, and low pollution levels based on TSS concentrations. This method offers a cost-effective, rapid, and non-invasive alternative for water quality monitoring, enabling the classification of water quality from a single image captured with a smartphone camera. Early studies under similar conditions already point towards the benefits of using CNNs for classifying water contamination by TSS (Lopez-Betancur et al. 2022). The remainder of this paper is organised as follows: in the Methods section, we detail the sample preparation, the experimental procedure, and the development of the CNN for classifying water quality based on solids, including its training and calibration phases. In the Results section, we report the findings obtained with this methodology. In the Discussion section, we summarise the findings of this work and their impact on environmental science. Finally, we present an outlook for future work in the Conclusions section.
METHODS
Samples
To carry out this research, 30 water samples were prepared with different levels of TSS as pollutants. These samples were obtained by selecting clays with particle sizes smaller than 60 μm, since TSS are considered matter with a particle diameter of less than 62 μm (Bilotta & Brazier 2008). This process involved sieving the material through a 60 μm mesh sieve to achieve material homogenisation. It is important to note that the selected clays are primarily composed of iron and aluminium, the most common types of clays found in urban environments (Perry & Taylor 2009). This selection aims to mimic natural contamination in urban water sources.
For preparing the samples, distilled water was used. Clays were weighed on an analytical balance with a precision of 0.1 mg. The TSS concentration in the samples ranged from 40 to 6,000 mg/L. Based on these concentrations, the samples were divided into three water quality categories: low, medium, and high. Low concentrations of solids correspond to less polluted water samples with 40–70 mg/L of TSS, medium concentrations to 80–400 mg/L, and high concentrations range from 500 to 6,000 mg/L, corresponding to strongly polluted water (Figure 1). This classification aligns with the regulations and standards in Mexico. For instance, the 'low' class corresponds to TSS concentrations permitted for the discharge of treated wastewater into rivers, streams, canals, drains, reservoirs, lakes, and lagoons, as well as for the irrigation of green areas (National Advisory Committee for Standardization of the Environment and Natural Resources of Mexico 2021); it represents water of good quality in terms of TSS. The 'medium' class includes water that is considered acceptable but still contaminated, following the criteria proposed by the Ministry of Environment and Natural Resources in Mexico for classifying water quality based on suspended solids. The 'high' class corresponds to heavily contaminated water, classified using the same indicator (Ministry of Environment and Natural Resources of Mexico 2011).
The number of samples varied across different classes: 4 samples for the low class, 10 samples for the medium class, and 16 samples for the high class. This variation is due to the different ranges of TSS concentrations within each class. The low class exhibited minimal variability in TSS concentrations, the medium class displayed a broader range of concentrations, and the high class had the widest range of TSS concentrations.
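To make the class boundaries concrete, the thresholding logic can be expressed in a few lines of Python; the helper below is purely illustrative and not part of the laboratory protocol.

```python
# Illustrative helper (hypothetical, not part of the laboratory protocol):
# maps a TSS concentration in mg/L to the three classes defined above.
def tss_class(concentration_mg_l: float) -> str:
    if 40 <= concentration_mg_l <= 70:
        return "low"      # dischargeable, good quality in terms of TSS
    if 80 <= concentration_mg_l <= 400:
        return "medium"   # acceptable but contaminated
    if 500 <= concentration_mg_l <= 6000:
        return "high"     # heavily contaminated
    raise ValueError("concentration outside the ranges studied")
```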
Experimental procedure
To carry out the experimental part of this work, we kept photographic records of each water sample containing solids. Each record was made by placing 100 mL of the sample in a transparent cubic container with a 5 cm edge. The sample was illuminated laterally using a white diffuse light source, specifically an 18-inch Ring Light Edge-Lit LED, positioned 20 cm away from the water sample; the lateral placement avoids reflections on the water container. Due to the particle size of the TSS, a magnetic stirrer was required to prevent sedimentation. The stirrer was operated at 300 rpm, the speed at which the solids remained in constant suspension, with a 1.5-inch hexagonal stirring capsule placed at the centre of the water container. The images were captured through a 1-min video recorded using an iPhone 12 with a 12 MP camera at a resolution of 1,920 × 1,080 progressive-scan pixels (high definition), 30 frames/s (fps), and 2.5× zoom. The entire experimental setup, shown in Figure 2, was mounted on a levelled anti-vibration optical table, with the smartphone held statically on a tripod and its camera shutter positioned 15 cm from the sample, inside a booth with black curtains to prevent external light intrusion. The setup was documented to ensure repeatability of the video recordings.

The images used in the development of the CNN were extracted from the videos at a rate of 4 fps. This was implemented in Python, a versatile and accessible programming language (Thaker & Shukla 2020), using the MoviePy package. A total of 240 images per sample was generated, resulting in 7,200 images overall. Following individual analysis, 685 images were excluded due to blur, leaving a dataset of 6,515 images for further analysis. The original dimensions of the captured images were 1,920 × 1,080 pixels, but they were centrally cropped to 450 × 450 pixels to reduce noise from the magnetic stirring capsule, the container edges, and the vortices induced by the magnetic stirrer. The cropping was performed with the OpenCV library in Python.
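As an illustration of the frame-extraction and cropping step described above, the following sketch uses MoviePy and OpenCV; the file names and output pattern are assumptions.

```python
# Sketch of the frame-extraction and cropping step (MoviePy 1.x import
# path); the file names are assumptions.
import cv2
from moviepy.editor import VideoFileClip

CROP = 450  # central crop size in pixels

clip = VideoFileClip("sample.mp4")  # 1-min, 1,920 x 1,080, 30 fps recording
for i, frame in enumerate(clip.iter_frames(fps=4)):  # sample at 4 fps
    h, w = frame.shape[:2]
    y0, x0 = (h - CROP) // 2, (w - CROP) // 2
    crop = frame[y0:y0 + CROP, x0:x0 + CROP]
    # MoviePy yields RGB arrays; OpenCV expects BGR when writing.
    cv2.imwrite(f"frame_{i:04d}.png", cv2.cvtColor(crop, cv2.COLOR_RGB2BGR))
```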
CNN development
In the domain of deep learning, various types of algorithms exist, with CNNs being among the most widely used (Baek et al. 2020). These models offer several advantages: (1) they reduce both time and costs (e.g., material and labour costs), (2) enable forecasting across different system phases, (3) simplify complex systems to enhance comprehension, and (4) predict target values even in situations where site access is challenging (Barzegar et al. 2020). Therefore, in this study, we propose the use of a CNN.
The primary task of a CNN is classification. Initially, it performs feature extraction from the input image. These features are then fed into a neural network (NN), producing output probabilities that indicate the classification of the input image into a specific category (Ferentinos 2018). However, training a CNN from scratch requires two main conditions: (1) access to a large dataset with labelled data and (2) significant computational and memory resources (Morid et al. 2021).
An alternative to training CNNs from scratch is the transfer learning (TL) strategy, which allows leveraging knowledge acquired from large datasets of non-environmental data to address specific environmental challenges, such as water quality analysis. Specifically, parameters from well-trained CNN models on non-environmental datasets, which contain diverse images (e.g., ImageNet models like AlexNet (Yuan & Zhang 2016), VGGNet (Purwono et al. 2023), and ResNet (Wu et al. 2019)), can be transferred to tailor a CNN model for analysing water quality.
Figure 3 | AlexNet architecture: (a) training with the ImageNet big-data corpus (input image size 224 × 224 × 3), (b) five convolutional layers with ReLU as the activation function and max-pooling for size reduction, and (c) three FC layers, the last of which uses SoftMax as the activation function for the classifier.
To carry out the TL process used in our work, we use Python as the programming language, because it has a large number of packages that facilitate the use of ML algorithms (Raschka & Mirjalili 2019) and is free to use. The entire procedure was conducted in the Jupyter Notebook environment within the Visual Studio Code Integrated Development Environment. All computations were performed on a computer equipped with an 8th-generation Intel i5 processor and 8 GB of RAM, i.e., relatively modest computational resources. The well-trained AlexNet model was obtained from the Torchvision package, a Python package that includes several pre-trained network models. The modification of the AlexNet CNN involved removing the final classifier layer, which originally distinguished 1,000 classes, and replacing it with a three-class water quality classifier based on TSS concentration: high, medium, and low. Subsequently, the pre-trained AlexNet parameters were unfrozen so that the weights of the network model could be fine-tuned.
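A minimal sketch of this modification, assuming Torchvision 0.13 or later (earlier releases use the `pretrained=True` flag instead of the weights enum):

```python
# Minimal sketch of the TL modification, assuming Torchvision >= 0.13
# (earlier versions use `pretrained=True` instead of the weights enum).
import torch.nn as nn
from torchvision import models

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Replace the 1,000-class ImageNet head with a three-class classifier
# (high, medium, low TSS concentration).
model.classifier[6] = nn.Linear(4096, 3)

# Leave all parameters unfrozen so the pre-trained weights are fine-tuned.
for param in model.parameters():
    param.requires_grad = True
```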
In the ML realm, one of the most important elements to define are the hyperparameters of the ANN. These consist of configurations of the network that affect its structure, learning, and performance. Hyperparameters differ from parameters in that they are not automatically modified or adjusted during training; instead, they must be specified beforehand (Yu & Zhu 2020). The hyperparameters established in this work were the optimisation algorithm, learning rate, batch size, and number of epochs. Below, we describe each of these.
The optimisation process in artificial intelligence (AI) involves identifying optimal parameters that improve the performance of a CNN model. One of the classic methods for this process is the stochastic gradient descent (SGD) optimisation method (Newton et al. 2018). However, tuning the learning rate of SGD, as a hyperparameter, is often challenging because the magnitudes of different parameters vary significantly and need to be adjusted throughout the training process (Zhang 2018). Therefore, in our study, we used the Adam optimiser, an efficient stochastic optimisation method that only requires first-order gradients and has low memory requirements. This method computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients (Kingma & Ba 2014) and iteratively finds parameter values that minimise the error (loss).
For a parameter vector $\theta$ with gradient $g_t$ at step $t$, the Adam update rules are

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2,$$

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_t = \theta_{t-1} - \alpha\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon},$$

where $m_t$ and $v_t$ are the first- and second-moment estimates of the gradient, $\beta_1$ and $\beta_2$ their exponential decay rates, $\alpha$ the learning rate, and $\epsilon$ a small constant that prevents division by zero (Kingma & Ba 2014).
One of the crucial hyperparameters is the number of epochs used during training. An epoch entails one pass through every sample in the training dataset, each sample providing an opportunity to update the model's internal parameters. Each epoch consists of one or more batches. As each sample or batch is processed through the network, the error is computed and the backpropagation (BP) algorithm is applied to adjust the weights and biases of the network. During BP, the error is propagated backwards through the network, gradients of the weights with respect to the error are calculated, and these gradients are used to update the weights so as to minimise the error. This process facilitates model learning and performance enhancement. Although the number of epochs is typically large, our approach employs 50 epochs.
The batch size, in turn, determines the number of samples propagated through the CNN and used to update the model parameters in each iteration until training is complete. Larger batch sizes facilitate greater computational parallelism and can often enhance performance; however, they require more memory and can introduce latency during training. For the creation of the CNN model, a total of 6,515 images with varying suspended solid concentrations were used, with 5,862 images allocated to the training set and 653 images to validation. Given the total number of images available, we opted for a batch size of 50 images. The hyperparameters used are summarised in Table 1.
Table 1 | Hyperparameters used in the development of the CNN model

| Hyperparameter | Value |
| --- | --- |
| Optimisation algorithm | Adam |
| Learning rate | 0.000005 |
| Batch size | 50 |
| Epochs | 50 |
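Putting these hyperparameters together, a condensed training loop might look like the sketch below; the dataset directory layout and the preprocessing are assumptions, and `model` refers to the modified AlexNet from the previous sketch.

```python
# Condensed training-loop sketch using the hyperparameters of Table 1.
# The dataset directory ("data/train") and the preprocessing are
# assumptions; `model` is the modified AlexNet from the previous sketch.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # AlexNet's expected input size
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("data/train", transform=transform)
loader = DataLoader(train_set, batch_size=50, shuffle=True)  # batch size 50

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.000005)  # learning rate

model.train()
for epoch in range(50):                      # 50 epochs
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()                      # backpropagation of the error
        optimizer.step()                     # Adam weight update
```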
Figure 4 | Representation of the TL process in the CNN model used: (a) well-trained AlexNet CNN, (b) removal of the last classifier layer for a new task, (c) reuse of the pre-trained model, (d) new training dataset of water samples and modification of the CNN, and (e) classification result.
Validation metrics
For evaluating the proposed CNN, the most common metrics are based on four possible prediction outcomes: true positives ($TP$), true negatives ($TN$), false positives ($FP$), and false negatives ($FN$) (Seliya et al. 2009). In this study, we use accuracy, precision, recall, F-measure, receiver operating characteristic (ROC), and the confusion matrix as validation metrics. The first four are defined as

$$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}, \qquad \text{Precision} = \frac{TP}{TP+FP}, \qquad \text{Recall} = \frac{TP}{TP+FN},$$

$$F_\beta = (1+\beta^2)\,\frac{\text{Precision}\cdot\text{Recall}}{\beta^2\,\text{Precision}+\text{Recall}},$$

where $\beta$ weights recall relative to precision in the F-measure. In our study, we set $\beta = 1$, so the F-measure reduces to the harmonic mean of precision and recall.
The ROC curve illustrates how a classifier trades off correctly identified positives (the true positive rate, $TP/(TP+FN)$) against incorrectly flagged negatives (the false positive rate, $FP/(FP+TN)$). It provides a comprehensive view of the classifier's effectiveness, independent of class distribution or error costs (Davis & Goadrich 2006). The area under the ROC curve (AUC) represents the probability that a randomly selected positive instance is ranked higher than a randomly selected negative instance according to the model's predictions. In general, a classifier with a larger AUC performs better than one with a smaller AUC. This curve is commonly used as a validation metric.
Finally, a confusion matrix, a widely used tool in classification problems, is employed in this research. This tool provides detailed information about the predicted classifications (Deng et al. 2016). The confusion matrix is particularly beneficial for evaluating the overall performance of the classification model, which is crucial for guiding subsequent improvements. It is structured such that each cell corresponds to a specific class assigned by the model, with rows representing the actual classes and columns representing the predicted classes. Ideally, correctly classified instances align along the diagonal of the matrix, while misclassified instances appear in the off-diagonal cells.
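For reference, all of the metrics above can be computed with scikit-learn once the validation labels and per-class probabilities have been collected; in the sketch below, `y_true`, `y_score`, and the weighted averaging scheme are assumptions.

```python
# Evaluation sketch with scikit-learn. `y_true` (integer labels) and
# `y_score` (per-class probabilities, shape [n_samples, 3]) are assumed to
# come from the validation loop; the "weighted" averaging is an assumption.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_pred = np.argmax(y_score, axis=1)  # predicted class for each image

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="weighted"))
print("Recall   :", recall_score(y_true, y_pred, average="weighted"))
print("F-measure:", f1_score(y_true, y_pred, average="weighted"))
print("AUC      :", roc_auc_score(y_true, y_score, multi_class="ovr"))
print(confusion_matrix(y_true, y_pred))  # rows: actual; columns: predicted
```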
RESULTS
Figure 5 | Learning curves of the CNN: (a) classifier accuracy by epoch for the training and validation datasets and (b) classifier loss by epoch for the training and validation datasets.
Figure 6 | Confusion matrices obtained on the validation dataset: (a) normalised confusion matrix and (b) confusion matrix without normalisation.
Figure 7 | ROC curves obtained from the evaluation of CNN performance: (a) ROC curve and AUC value for each class and (b) ROC curve and overall AUC value.
Table 2 summarises the remaining evaluation metrics for the CNN performance on the training and validation datasets, including accuracy, precision, recall (or sensitivity), F-measure, and training time. The validation metrics each reach 0.9985 or above, and the training metrics reach 0.9997. Notably, despite the relatively long training time, the CNN demonstrates excellent performance.
Table 2 | Validation metrics obtained from the validation and training datasets

| Metric | Validation values | Training values |
| --- | --- | --- |
| Accuracy | 0.9985 | 0.9997 |
| Precision | 0.9985 | 0.9997 |
| Recall | 0.9985 | 0.9997 |
| F-measure | 0.9986 | 0.9997 |
| Training time | – | 354 min 13.4 s |
Figure 8 | A random feature map from an image in the validation dataset after each convolutional layer: (a) random input image; feature map after (b) convolutional layer 1, (c) convolutional layer 2, (d) convolutional layer 3, (e) convolutional layer 4, and (f) convolutional layer 5.
DISCUSSION
The issue of water pollution caused by TSS requires action from environmental policymakers concerning the connection of individuals to the public sewer system as well as monitoring the health of water bodies. In this context, ML techniques offer valuable tools for decision-making based on physical data obtained from simple water imaging (Lopez-Betancur et al. 2022). The CNN developed in this study exemplifies a tool that delivers excellent performance with modest computational resources.
Figure 5, which depicts the learning curve of the proposed model, shows that the CNN developed for water quality classification based on TSS concentration demonstrates strong performance compared with other studies (Lopez-Betancur et al. 2022). This is evident from epoch 10 onwards, with an accuracy exceeding 0.99, along with high precision, sensitivity, and F-measure. However, fluctuations occur in subsequent training epochs. These fluctuations are primarily due to images with low and medium solid concentrations, where the solid content is so minimal that it becomes challenging to identify within the sample, especially during constant agitation. With an agitation speed of 300 rpm and a selected image crop of 450 × 450 pixels out of a total of 1,920 × 1,080 pixels, some images may not adequately capture the TSS concentration in the water sample. The fluctuations in the learning process are thus likely due to difficulties in distinguishing between the low and medium classes. This is supported by the confusion matrix, which shows one image misclassified between the low class (label 3) and the medium class (label 2) in terms of true labels and predictions (Figure 6(b)). It is important to emphasise that overfitting can be ruled out based on the learning curve analysis: as the number of epochs increases, the model accuracy continues to improve, and although there are fluctuations in accuracy and loss, these variations are relatively small, within a range of 0.0004–0.02, especially once the accuracy reaches a high value of 0.99. This minor fluctuation indicates stability and high performance, suggesting that the model is learning effectively and is unlikely to be overfitting.
Regarding the high class in TSS concentration, it is evident that this class was the most accurately classified by the network. With the highest number of images used for this category, no misclassified labels were found, resulting in an accuracy of 1.0. This can be attributed to the higher TSS concentrations in these samples and the abundance of training data, enabling the proposed CNN model to correctly associate high concentrations with this label.
As shown by the previously presented metrics (Table 2), despite the presence of one misclassified image, the ROC curve indicates that the classification probability is nearly 1.0 for all classes. Specifically, the high class achieves a perfect identification probability of 1.0. For the medium and low classes, the probability of correct identification is 0.99999, displayed as 1.0000 in the ROC plot (Figure 7(a)) owing to rounding. In addition, the overall AUC for the model is 0.999996, likewise displayed as 1.0000 (Figure 7(b)), indicating a very high level of accuracy. These results suggest that the developed network has a high probability of accurately classifying TSS concentrations.
Regarding the feature maps presented in Figure 8, these provide a detailed view of how the network processes and classifies images, thereby validating its ability to detect relevant features. These feature maps reveal which features are activated at each layer of the network, illustrating how the network processes the image – from detecting simple edges and textures to identifying complex patterns. Reviewing these maps confirms that the CNN focuses on differences in suspended solids and effectively learns to identify relevant patterns, while ignoring factors such as optical distortions and unwanted radiation. In addition, the dimensions of each feature map at the output of the layers demonstrate that the model is correctly structured according to the architecture used by AlexNet.
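For readers wishing to reproduce such visualisations, feature maps like those in Figure 8 can be captured with forward hooks on the convolutional layers; the sketch below assumes the fine-tuned Torchvision AlexNet (`model`) and a preprocessed validation image tensor (`image`).

```python
# Sketch for capturing per-layer feature maps (cf. Figure 8) via forward
# hooks; `model` (fine-tuned AlexNet) and `image` (a preprocessed
# 3 x 224 x 224 validation tensor) are assumptions.
import torch

feature_maps = {}

def save_map(name):
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()
    return hook

# In Torchvision's AlexNet, the five convolutional layers sit in
# model.features at indices 0, 3, 6, 8, and 10.
for idx in (0, 3, 6, 8, 10):
    model.features[idx].register_forward_hook(save_map(f"conv_{idx}"))

model.eval()
with torch.no_grad():
    model(image.unsqueeze(0))  # populates feature_maps for inspection
```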
Although various studies have focused on predicting contamination by TSS using images, many exhibit variations in both precision and practicality. For example, Zhang et al. (2002) achieved a precision of 91% in TSS detection by combining optical and microwave data from the Landsat 5 TM and ERS-2 SAR satellites, respectively, with an ANN. However, this approach relies on complex and costly technologies, which may limit its applicability in resource-restricted contexts. The study by Saberioon et al. (2020) used Sentinel-2A images and reported an R² of 0.80 for TSS estimation, although this method also involves costs associated with remote sensing. In the work of Mustafa et al. (2017), indices derived from Landsat 8 OLI satellite images were applied, finding a correlation of 65.1% between the Airborne Water Extraction Index and TSS, indicating limitations in the precision of their estimates. Furthermore, Pai et al. (2007) characterised TSS in effluents from a hospital WWTP using genetic algorithms for neural networks, yielding errors of 23.14 and 51.73% in TSS determination; when applying a genetic model, errors of 23.14 and 26.67% were reported. These results reflect the diversity of existing approaches and their respective limitations.
CONCLUSIONS
This study developed a CNN framework using TL to classify three levels of TSS in water, achieving an accuracy of 99.85% despite training on small datasets. Although the model performed exceptionally well overall, low TSS concentrations were the most challenging to classify, and were occasionally confused with medium concentrations. However, this confusion was observed in only one image, and the model's performance remained highly accurate overall. The model was optimised for low computational costs, ensuring reproducibility and practical applicability. Given the high cost and time of laboratory TSS determinations, this approach offers a novel, cost-effective alternative for classifying solids.
The model is limited to a TSS concentration range of up to 6,000 mg/L, with higher concentrations potentially complicating classification. Controlled lighting conditions were used to minimise classification errors, but the model is not suited for samples with significant dissolved solids, which may cause errors due to their varied composition and colouration. The computational power required for finer analysis remains a constraint, though this method offers a practical solution for regions with limited access to water quality laboratories, such as small communities in Mexico. This tool could aid sustainable water management, improving monitoring and decision-making for rivers, lakes, irrigation, and WWTPs.
Future work aims to expand the model to classify water into five quality levels, aligning with the Mexican environmental standards. This includes addressing systematic experimental errors by incorporating varied particle types, lighting conditions, and particle sizes. The performance of alternative CNN architectures will also be explored to advance AI applications in environmental science, focusing on additional water quality parameters and contaminants of global concern.
ACKNOWLEDGEMENTS
We acknowledge Salomón Borjas García for valuable support.
FUNDING
We acknowledge support from CIC-UMSNH under grants 18371 and 26140.
AUTHOR CONTRIBUTIONS
All authors contributed equally to this research paper.
DATA AVAILABILITY STATEMENT
All relevant data are available from https://github.com/Itzel-LS/WQPTSSCANN.
CONFLICT OF INTEREST
The authors declare there is no conflict.