Abstract
Salt water adversely affects human health and plant growth. In response to the growing interest in non-contact determination of salt concentration in water, this study proposes a novel approach based on S parameter measurements, which describe how electromagnetic waves are scattered. First, the relationship between the salt concentration in water and its permittivity, a distinguishing property of liquids, is shown. Then, using correlations derived from a set of S parameter measurements, it is shown that the salt concentration in water can be predicted. Finally, building on the relations established between permittivity, salt concentration and the S parameters, a system that allows non-contact determination of salt concentration is proposed. Because the proposed system makes its predictions with a classifier, decision tree algorithms are employed for this purpose. To evaluate the suitability and success of the algorithms, a set of classification experiments was conducted using water samples with different levels of salt concentration. The results show that the Hoeffding tree algorithm achieved the best results and is the most suitable of the tested decision tree algorithms for determining the salt concentration of liquids. The proposed non-contact approach can therefore be used to determine the salt concentration in water reliably and quickly, provided its hardware and software components can be embedded into a prototype system.
HIGHLIGHTS
Showing the relationship between salt concentration in water and permittivity values.
Predicting salt concentration.
Proposing a system that allows non-contact determination of salt concentration.
INTRODUCTION
The effects of salt water on human health and life can be examined from two perspectives. First, salt in drinking water causes direct health problems; second, salt in water used for irrigation, agriculture and animal husbandry damages plants, soil, animals and ecosystems. The geochemistry of salts was investigated by Deniz & Kadioglu (2021), and the effects of water salinity on human health by Jabed et al. (2020), who showed a relationship between drinking water salinity and both hypertension and excessively diluted urine. High salinity in drinking water has been shown to affect human health and may pose a significant health risk, especially for diabetic and renal dialysis patients who must control their daily salt intake (Ekinci et al. 2011). Water quality and the associated health risk were evaluated in the Terme River, Turkey, using physicochemical quality indices and multivariate analysis (Ustaoğlu et al. 2021).
Salt water affects agriculture and plants as well as human health. Salinity is one of the environmental factors that limit plant growth, because water with high salinity damages plants (Bartels & Sunkar 2005). Highly saline soils are barren and inhibit plant growth, because salinity reduces the ability of plants to take up water (Fipps 2003). In addition, excess salt in the transpiration stream damages cells in transpiring leaves (Munns & Tester 2008).
The amount of salt in water can be determined with ultrasonic waves, because the speed of ultrasound changes with salt concentration and temperature (Wong & Zhu 1995). The absorption and fluorescence properties of salts and their aqueous solutions have been investigated using UV-vis spectroscopy and spectrofluorometry (Chai 2008). Since the refractive index of a medium increases with its salinity, salinity sensors based on the refractive index have also been developed (Quan & Fry 1995).
Optical techniques based on various fibre optic sensors have been proposed for in situ monitoring of water salinity (Zhao & Liao 2002; Meng 2014). Such systems use separate optoelectronic components, such as a laser source, photodetector, microcontroller and microprocessor, which makes them costly. In addition, a separate communication setup is required to transfer on-site data to the laboratory. It is therefore important to determine with low-cost systems whether water contains salt and, if so, how much. To this end, an effective optoelectronic system was proposed for remote sensing of salinity (Villarreal et al. 2018).
Microstrip antennas are widely used in applications such as data communications, medicine (Daliri 2010) and agriculture (Yahaya 2012). In this study, a microstrip patch antenna was used to determine the salt concentration in water. First, the relationship between the salt concentration in water and its permittivity, a distinguishing property of liquids, was investigated. This relationship suggested that salt concentration might also be related to the scattering (S) parameters. A measurement system was therefore designed to measure the S parameters; it consists of a microstrip patch antenna designed and manufactured in this study and a commercial Vector Network Analyser (VNA). The S parameters of the liquids classified in this study, i.e. different brands of bottled water and water with different salt concentrations, were measured with the VNA and the antenna. After the relationship between the measured permittivity values and the S parameters was derived, the relationship between the Smax value and the salt concentration was determined by curve fitting.
The approach proposed in this study was designed to allow quick, automated and non-contact determination of salt concentration. Such an automated process requires a software application, embedded in the designed system, that relies on a machine learning algorithm. In this study, decision tree algorithms were chosen as classifiers, and their performances were compared fairly using well-known performance metrics. The performance evaluation showed that the Hoeffding tree algorithm is the most suitable of the tested algorithms for determining the salt concentration in water. The remainder of this paper is organised as follows. The experimental setup and methodology are presented in Section 2. The classifiers used by the proposed system and the metrics employed to evaluate the results are explained in Section 3. The results are reported in Section 4, and the paper is concluded in Section 5.
EXPERIMENTAL SETUP AND METHODOLOGY
S parameters with incident and reflected waves in a two-port linear circuit.
Transmitting antennas convert electrical signals from a transmitter into electromagnetic waves and send them into free space. Receiving antennas, conversely, capture electromagnetic waves from free space, convert them into electrical signals and pass these to a receiver. The antenna used in this study acts as both transmitter and receiver. The main parameters to consider in microstrip antenna design are the patch, the ground plane, the feeding method, and the dielectric constant and thickness of the dielectric layer. The patch is a conductive structure, positioned on the dielectric layer, that radiates or absorbs electromagnetic waves. A circular copper patch was chosen for its symmetrical radiation characteristic. The dielectric layer is the material between the radiating patch and the ground plane on the underside. FR-4 with a thickness of 1.6 mm and a dielectric constant of 4.4 was used as the dielectric layer. A high substrate dielectric constant promotes surface waves in patch antennas, so a material with a very high dielectric constant was avoided. The ground plane is made of a conductor similar to that of the radiating patch. Coaxial probe feeding was chosen because it is easy to manufacture and has low spurious radiation and low line losses.
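The substrate choices above (circular patch, FR-4, dielectric constant 4.4, thickness 1.6 mm) fix the patch radius once a resonant frequency is chosen. The sketch below uses the standard fringing-corrected closed-form design equation for a circular patch given in Balanis's Antenna Theory. The paper does not report its patch radius or design frequency, so the 2.4 GHz target is a hypothetical choice inside the 1.5–3 GHz measurement band, not the authors' design value.

```python
import math


def circular_patch_radius_cm(f_r_hz: float, eps_r: float, h_cm: float) -> float:
    """Physical radius (cm) of a circular microstrip patch for a target
    dominant-mode resonant frequency, using the fringing-corrected
    closed-form design equation from Balanis, "Antenna Theory".

    The empirical constants assume the substrate height h is given in cm.
    """
    # Radius (cm) before the fringing-field correction
    F = 8.791e9 / (f_r_hz * math.sqrt(eps_r))
    # Fringing fields make the patch look electrically larger,
    # so the physical radius is reduced by this factor
    correction = 1 + (2 * h_cm / (math.pi * eps_r * F)) * (
        math.log(math.pi * F / (2 * h_cm)) + 1.7726
    )
    return F / math.sqrt(correction)


# Substrate values from the text: FR-4, eps_r = 4.4, h = 1.6 mm = 0.16 cm.
# The 2.4 GHz target is a HYPOTHETICAL choice inside the 1.5-3 GHz band;
# the paper does not state the antenna's design frequency.
radius = circular_patch_radius_cm(2.4e9, 4.4, 0.16)
print(f"patch radius = {radius:.2f} cm")
```

With these inputs the formula gives a radius of roughly 1.7 cm; a real design would then be tuned in an electromagnetic simulator, since the closed form ignores the probe feed and the liquid loading.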
The flowchart of the methodology used in this study is given in Figure 3. First, pure water and salt water samples were classified using four different classification algorithms. Then, the performances of these algorithms were evaluated with a set of performance metrics obtained with and without 10-fold cross validation.
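The 10-fold cross validation mentioned above trains each classifier on nine folds and tests it on the remaining one, rotating until every sample has been tested exactly once. The following is a minimal, unstratified sketch of the index splitting; the function name and the 12-sample example are hypothetical, and many toolkits additionally stratify folds by class, which this sketch omits.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # reproducible shuffle
    # Spread the remainder so that fold sizes differ by at most one
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        splits.append((train, test))
        start += size
    return splits


# Hypothetical example: 12 samples split into 10 folds
for train, test in k_fold_indices(12, k=10):
    # A classifier would be trained on `train` and scored on `test` here
    print(len(train), sorted(test))
```

With very small datasets, as here, some folds hold a single sample, so per-fold scores are coarse and the pooled predictions over all folds are what the metrics are computed from.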
CLASSIFIERS AND PERFORMANCE METRICS
In the following subsections, decision trees and performance metrics used in this study are explained.
Decision trees
The Rep tree algorithm, which builds on the reduced-error pruning method originally proposed by Quinlan (1987), uses regression tree logic. It builds several trees over different iterations and selects the best among them. It splits nodes using entropy-based information gain and minimises error through reduced-error pruning (Devasena 2014; Srinivasan & Mekala 2014).
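The entropy-based information gain used to choose splits can be sketched as follows. This covers only the split criterion for a numeric attribute; the reduced-error pruning step is not shown, and the Smax readings and labels are hypothetical illustration values, not measurements from this study.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[float], labels: Sequence[Hashable], threshold: float) -> float:
    """Entropy reduction obtained by splitting a numeric attribute at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Weighted entropy of the two child nodes (empty children contribute nothing)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


# Hypothetical Smax readings (dB) and labels; not measured data
smax = [-10.0, -11.0, -12.0, -20.0, -21.0, -22.0]
label = ["pure", "pure", "pure", "salt", "salt", "salt"]
print(information_gain(smax, label, threshold=-15.0))  # clean split between the classes
print(information_gain(smax, label, threshold=-21.5))  # poorer split, lower gain
```

A tree learner evaluates candidate thresholds like these for every attribute and splits on the one with the highest gain.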
The Hoeffding tree was proposed by Hulten et al. (2001). At each node of the decision tree, it uses a statistical quantity known as the Hoeffding bound to decide whether, and on which attribute, to split the node. It reads each sample only once and processes it in constant time, which makes it suitable for data streams.
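The split decision described above can be illustrated with the Hoeffding bound, epsilon = sqrt(R² ln(1/δ) / (2n)). A leaf is split once the information gain of the best attribute exceeds that of the runner-up by more than epsilon. This is a minimal sketch: the gain values, δ and sample counts are hypothetical, and the tie-breaking threshold used in practical implementations is omitted.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with range `value_range` is within epsilon of the mean of n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(gain_best: float, gain_second: float, n_classes: int, delta: float, n: int) -> bool:
    """Split a leaf once the best attribute's information gain beats the
    runner-up by more than epsilon. For information gain, R = log2(n_classes)."""
    epsilon = hoeffding_bound(math.log2(n_classes), delta, n)
    return gain_best - gain_second > epsilon


# Hypothetical gains for the binary salt/pure task, delta = 1e-7
print(should_split(0.50, 0.45, n_classes=2, delta=1e-7, n=100))   # gap too small: keep waiting
print(should_split(0.90, 0.10, n_classes=2, delta=1e-7, n=1000))  # gap exceeds epsilon: split
```

Because epsilon shrinks as n grows, the tree defers a split until it has seen enough samples to be confident, which is why each sample only needs to be read once.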
Random Forest, proposed by Breiman (2001), performs well even when classifying large amounts of data. Instead of splitting each node on the best of all attributes in the dataset, it splits each node on the best of a randomly selected subset of attributes. Each tree is trained on a dataset drawn from the original dataset by sampling with replacement (bootstrapping), and the trees are not pruned (Breiman 2001). Random Forest typically classifies more accurately than a single decision tree because it combines the votes of many such trees.
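The three ingredients named above (bootstrap sampling, a random attribute subset at each node, and majority voting) can be shown in a compact, simplified sketch. For brevity it grows depth-1 trees (decision stumps) instead of the fully grown, unpruned trees of Breiman's method, so each tree has a single node and draws its attribute subset once. The feature values are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Row = Sequence[float]
Tree = Callable[[Row], str]


def fit_stump(X: List[Row], y: List[str], features: List[int]) -> Tree:
    """Depth-1 tree: best single threshold over the candidate features,
    scored by training accuracy."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
            correct = left.count(left_lab) + right.count(right_lab)
            if best is None or correct > best[0]:
                best = (correct, f, t, left_lab, right_lab)
    _, f, t, left_lab, right_lab = best
    return lambda row: left_lab if row[f] <= t else right_lab


def fit_forest(X: List[Row], y: List[str], n_trees: int = 25, m_features: int = 1, seed: int = 0) -> List[Tree]:
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap: draw n rows with replacement
        feats = rng.sample(range(d), m_features)     # random attribute subset for the (single) node
        forest.append(fit_stump([X[i] for i in boot], [y[i] for i in boot], feats))
    return forest


def predict(forest: List[Tree], row: Row) -> str:
    """Majority vote over the trees."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]


# Hypothetical features [Smax (dB), peak frequency (GHz)]; not measured data
X = [[-10.0, 2.40], [-11.0, 2.41], [-12.0, 2.42], [-20.0, 2.30], [-21.0, 2.31], [-22.0, 2.32]]
y = ["pure", "pure", "pure", "salt", "salt", "salt"]
forest = fit_forest(X, y)
print(predict(forest, [-30.0, 2.20]), predict(forest, [-5.0, 2.50]))
```

Individual stumps can be wrong, for example when a bootstrap sample happens to contain only one class, but the majority vote smooths out these errors.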
The Logistic Model Tree (LMT) combines logistic regression with a decision tree (Lavrač 2003; Landwehr et al. 2005). Whereas an ordinary decision tree assigns a fixed class to each of its leaves, LMT is a decision tree whose leaves contain logistic regression models, so that the tree as a whole forms a piecewise linear logistic model (Lavrač 2003). The LogitBoost algorithm is used to build a logistic regression model at each node of the tree, and the node is then split using the C4.5 criterion. Each LogitBoost run at a child node continues from the results obtained at its parent node. Finally, the tree is pruned (Sumner et al. 2005).
Performance metrics
RESULTS
When an electromagnetic wave is applied to a liquid, the liquid's molecules interact with the wave. This interaction polarises and depolarises the molecules to an extent that depends on the liquid's dielectric permittivity. As a result, the wave slows down and its magnitude decreases, because energy is lost to friction among the reorienting molecules. In this study, the real part of the permittivity (εr′) measures how much energy from an external electric field is stored in a liquid sample, and the imaginary part (εr′′) measures how dissipative, or lossy, the sample is to an external electric field.
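For salt water specifically, dissolved ions add a conduction term σ/(ωε0) to εr′′, which is one reason the imaginary part grows with salt concentration. The short sketch below evaluates that term; it is a textbook relation rather than the authors' model, and the conductivity values are hypothetical.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)


def ionic_loss(sigma: float, f_hz: float) -> float:
    """Conductivity contribution sigma / (omega * eps0) to the relative
    imaginary permittivity eps_r'' at frequency f_hz (sigma in S/m)."""
    return sigma / (2.0 * math.pi * f_hz * EPS0)


# Hypothetical conductivities (S/m) at 2 GHz; not values measured in this study
for sigma in (0.01, 1.0, 5.0):
    print(f"sigma = {sigma:5.2f} S/m -> extra eps_r'' = {ionic_loss(sigma, 2e9):.2f}")
```

The term scales linearly with conductivity and inversely with frequency, so its effect is strongest at the lower end of the measured band.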
Curve fitting builds a mathematical model of the data and quantifies how well it fits with the coefficient of determination (R2). R2 lies between 0 and 1, and the closer it is to 1, the better the model fits the data. In this study, curve fitting was used to determine the relations between salt concentration, permittivity and the S parameters. For this purpose, the permittivity of the liquid samples was first measured over 1.5–3 GHz with the coaxial probe method. The measurement results are presented in Figure 4. As the salt content of a sample increased, its permittivity changed accordingly, and the pure water and salt water curves also differed. The relation between salt concentration and permittivity is presented in Figure 5. The R2 values between salt concentration and the real and imaginary parts of the permittivity were 0.94 and 0.98, respectively, indicating a very consistent relation between salt concentration and permittivity.
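The R2 computation described above can be sketched for a straight-line least-squares fit. The paper does not state the functional form of its fitted curves, so the linear model here is an assumption, and the (salt %, Smax) pairs are hypothetical.

```python
from typing import Sequence, Tuple


def linear_fit_r2(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line y = a*x + b and its coefficient of determination R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    ss_tot = sum((yi - my) ** 2 for yi in y)                         # total sum of squares
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical (salt %, Smax dB) pairs for illustration only
salt = [10, 25, 50, 75, 100]
smax_db = [-12.1, -13.0, -15.2, -16.9, -19.0]
slope, intercept, r2 = linear_fit_r2(salt, smax_db)
print(f"Smax = {slope:.3f} * salt + {intercept:.2f}, R2 = {r2:.3f}")
```

For nonlinear fits the same R2 = 1 - SS_res/SS_tot formula applies once the model's predictions replace a*x + b.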
Permittivity measurement of liquid samples. (a) The real part of permittivity and (b) the imaginary part of permittivity.
Relation between salt concentration and permittivity. (a) The real part of permittivity and (b) the imaginary part of permittivity.
The S parameter measurements of the liquid samples made with the proposed measurement system are given in Figure 6. The relation between the peak points (Smax), i.e. the maximum amplitude values of the S parameter measurements, and the permittivity values is presented in Figure 7. As the salt concentration in water increases, ε′ decreases and ε′′ increases. The relation between Smax and salt concentration is shown in Figure 8: Smax decreases as the salt concentration increases. An R2 value of 0.93 indicates a good fit, so S parameter measurements can be used to determine the salt concentration or the permittivity.
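Extracting Smax from a sweep, as described above, amounts to converting the measured complex S parameter to decibels and locating its largest amplitude. The text does not say which S parameter (for example S11 or S21) or which amplitude scale was used, so this sketch assumes |S| in dB; the sweep values are hypothetical.

```python
import cmath
import math
from typing import Sequence, Tuple


def magnitude_db(s: complex) -> float:
    """|S| expressed in decibels: 20*log10(|S|)."""
    return 20.0 * math.log10(abs(s))


def find_smax(freqs_hz: Sequence[float], s_db: Sequence[float]) -> Tuple[float, float]:
    """Return (frequency, amplitude) at the largest amplitude of a sweep."""
    i = max(range(len(s_db)), key=lambda k: s_db[k])
    return freqs_hz[i], s_db[i]


# Hypothetical complex S-parameter samples across part of the 1.5-3 GHz band
freqs = [1.5e9, 2.0e9, 2.5e9, 3.0e9]
s = [cmath.rect(0.20, 0.1), cmath.rect(0.35, 0.4), cmath.rect(0.50, 0.9), cmath.rect(0.30, 1.3)]
f_peak, s_peak = find_smax(freqs, [magnitude_db(v) for v in s])
print(f"Smax = {s_peak:.2f} dB at {f_peak / 1e9:.2f} GHz")
```

The resulting Smax value per sample is the quantity that is then related to permittivity and salt concentration by curve fitting.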
Relation between Smax and permittivity. (a) The real part of permittivity and (b) the imaginary part of permittivity.
Since there is a strong relation between salt concentration and the S parameters, S parameter measurements can be used both to decide whether a liquid sample is salt water or pure water and to predict its salt concentration. To verify this experimentally, S parameter measurements were made on seven different brands of bottled water and on salt water at different concentrations. Each measurement was made twice, and a dataset was created. Using this dataset, the algorithms first predicted whether a liquid sample was salt water or pure water, and then predicted the sample's salt concentration. The results were also validated with 10-fold cross validation. Tables 1 and 2 list the results in terms of the performance metrics employed in this study and the total numbers of correct and incorrect classifications for each classification algorithm.
Table 1. Performance metrics obtained when the aim was to predict whether liquid samples were salt water or pure water (CV: 10-fold cross validation)
Algorithm | Class | Precision (without CV) | Recall (without CV) | F-Score (without CV) | MCC (without CV) | Precision (with CV) | Recall (with CV) | F-Score (with CV) | MCC (with CV) |
---|---|---|---|---|---|---|---|---|---|
Hoeffding Tree | Salt water | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Hoeffding Tree | Pure water | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Hoeffding Tree | Average | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
LMT | Salt water | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
LMT | Pure water | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
LMT | Average | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Random Forest | Salt water | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Random Forest | Pure water | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Random Forest | Average | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Rep Tree | Salt water | 0.83 | 1 | 0.90 | 0.84 | 1 | 0.80 | 0.88 | 0.83 |
Rep Tree | Pure water | 1 | 0.85 | 0.92 | 0.84 | 0.87 | 1 | 0.93 | 0.83 |
Rep Tree | Average | 0.93 | 0.91 | 0.91 | 0.84 | 0.92 | 0.91 | 0.91 | 0.83 |
Table 2. Performance metrics obtained when the aim was to predict the salt concentration of liquid samples (CV: 10-fold cross validation)
Algorithm | Class | Precision (without CV) | Recall (without CV) | F-Score (without CV) | MCC (without CV) | Correct instances, total (without CV) | Incorrect instances, total (without CV) | Precision (with CV) | Recall (with CV) | F-Score (with CV) | MCC (with CV) | Correct instances, total (with CV) | Incorrect instances, total (with CV) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Hoeffding Tree | 10% salt | 1 | 1 | 1 | 1 |  |  | 1 | 0.50 | 0.66 | 0.66 |  |  |
Hoeffding Tree | 25% salt | 1 | 1 | 1 | 1 |  |  | 0.66 | 1 | 0.80 | 0.76 |  |  |
Hoeffding Tree | 50% salt | 1 | 1 | 1 | 1 |  |  | 1 | 1 | 1 | 1 |  |  |
Hoeffding Tree | 75% salt | 1 | 1 | 1 | 1 |  |  | 1 | 1 | 1 | 1 |  |  |
Hoeffding Tree | 100% salt | 1 | 1 | 1 | 1 |  |  | 1 | 1 | 1 | 1 |  |  |
Hoeffding Tree | Average / total | 1 | 1 | 1 | 1 | 10 | 0 | 0.93 | 0.90 | 0.89 | 0.88 | 9 | 1 |
LMT | 10% salt | 1 | 1 | 1 | 1 |  |  | 1 | 1 | 1 | 1 |  |  |
LMT | 25% salt | 1 | 1 | 1 | 1 |  |  | 1 | 0.50 | 0.67 | 0.66 |  |  |
LMT | 50% salt | 1 | 1 | 1 | 1 |  |  | 1 | 0.50 | 0.67 | 0.66 |  |  |
LMT | 75% salt | 1 | 1 | 1 | 1 |  |  | 1 | 1 | 1 | 1 |  |  |
LMT | 100% salt | 1 | 1 | 1 | 1 |  |  | 0.50 | 1 | 0.67 | 0.61 |  |  |
LMT | Average / total | 1 | 1 | 1 | 1 | 10 | 0 | 0.90 | 0.80 | 0.80 | 0.78 | 8 | 2 |
Random Forest | 10% salt | 1 | 1 | 1 | 1 |  |  | 1 | 0.50 | 0.66 | 0.66 |  |  |
Random Forest | 25% salt | 1 | 1 | 1 | 1 |  |  | 0.66 | 1 | 0.80 | 0.76 |  |  |
Random Forest | 50% salt | 1 | 1 | 1 | 1 |  |  | 1 | 1 | 1 | 1 |  |  |
Random Forest | 75% salt | 1 | 1 | 1 | 1 |  |  | 1 | 1 | 1 | 1 |  |  |
Random Forest | 100% salt | 1 | 1 | 1 | 1 |  |  | 1 | 1 | 1 | 1 |  |  |
Random Forest | Average / total | 1 | 1 | 1 | 1 | 10 | 0 | 0.93 | 0.90 | 0.89 | 0.88 | 9 | 1 |
Rep Tree | 10% salt | 0.33 | 1 | 0.50 | 0.40 |  |  | 0 | 0 | 0 | −0.50 |  |  |
Rep Tree | 25% salt | – | 0 | – | – |  |  | 0 | 0 | 0 | −0.25 |  |  |
Rep Tree | 50% salt | – | 0 | – | – |  |  | 0 | 0 | 0 | −0.25 |  |  |
Rep Tree | 75% salt | 0.50 | 1 | 0.66 | 0.61 |  |  | – | 0 | – | – |  |  |
Rep Tree | 100% salt | – | 0 | – | – |  |  | 0 | 0 | 0 | −0.16 |  |  |
Rep Tree | Average / total | – | 0.4 | – | – | 4 | 6 | – | 0 | – | – | 0 | 0 |
In predicting whether a liquid sample was salt water, all algorithms except Rep Tree achieved Precision, Recall, F-Score and MCC values of 1, i.e. they classified every sample correctly. Rep Tree, in contrast, achieved a precision of 0.83, a recall of 1, an F-Score of 0.90 and an MCC of 0.84 for the salt water class. Its salt water recall of 1 indicates that it correctly identified all the salt water samples, whereas its salt water precision of 0.83 shows that some of the samples it labelled as salt water were actually pure water. Consistently, its recall for pure water dropped to 0.85, meaning that some pure water samples were misclassified as salt water, while its pure water precision of 1 shows that every sample it labelled as pure water was indeed pure water. The F-Score values reflect the same pattern. With cross validation, Rep Tree's recall for salt water fell to 0.80 and its MCC decreased to 0.83. Figure 9 presents the total number of correct and incorrect classifications for each classification algorithm when the aim was to predict whether the liquid samples were salt water or pure water.
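The interplay between precision and recall described above can be checked with the standard confusion-matrix formulas. In the sketch below, the 12-sample confusion matrix (5 salt water, 7 pure water, one pure water sample labelled as salt water) is hypothetical and chosen only to show how a single error lowers salt water precision and pure water recall at the same time; undefined metrics are returned as NaN.

```python
import math


def _ratio(num: float, den: float) -> float:
    """num/den, or NaN when the metric is undefined (den == 0)."""
    return num / den if den else float("nan")


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Precision, recall, F-score and MCC for the positive class of a 2x2 confusion matrix."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    f_score = _ratio(2 * precision * recall, precision + recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, denom)
    return {"precision": precision, "recall": recall, "f_score": f_score, "mcc": mcc}


# Hypothetical 12-sample test: 5 salt, 7 pure; one pure sample labelled as salt
salt_view = binary_metrics(tp=5, fn=0, fp=1, tn=6)  # salt water as the positive class
pure_view = binary_metrics(tp=6, fn=1, fp=0, tn=5)  # pure water as the positive class
print(salt_view)
print(pure_view)
```

Swapping the positive class exchanges the roles of false positives and false negatives, which is why the same error appears as lower precision for one class and lower recall for the other, while MCC is identical from both points of view.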
Predicting whether liquid samples were salt water or pure water (a) without cross validation and (b) with cross validation.
Five classes were created for predicting the salt concentration of the liquid samples. Without cross validation, Hoeffding Tree, LMT and Random Forest achieved a value of 1 for all performance metrics, whereas Rep Tree performed poorly. With cross validation, all results were worse than without it. This indicates that when a new liquid sample is analysed with the proposed system, its concentration level may be predicted incorrectly even though the decision whether it is salt water or pure water is correct.
The accuracy, Kappa and MAE values of the algorithms are shown in Figures 10–12, respectively. Without cross validation, Hoeffding Tree, LMT and Random Forest achieved an accuracy of 100% and a Kappa value of 1. Rep Tree achieved an accuracy of 91% and a Kappa value of 0.83 in predicting whether a liquid sample was salt water, but only an accuracy of 40% and a Kappa value of 0.25 in predicting the salt concentration. With cross validation, the accuracy and Kappa values of Hoeffding Tree, Random Forest and LMT decreased: in predicting salt concentration, Hoeffding Tree and Random Forest achieved an accuracy of 90% and a Kappa value of 0.87, and LMT an accuracy of 80% and a Kappa value of 0.75. With cross validation, Rep Tree classified none of the salt concentration samples correctly, so its accuracy was 0%. Hoeffding Tree had the lowest MAE values.
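Accuracy, Cohen's Kappa and MAE, as reported above, can be computed as in the following sketch. Kappa corrects accuracy for the agreement expected by chance from the class frequencies. The paper does not define its MAE for classifiers, so the MAE here assumes the common convention of averaging the absolute differences between predicted class probabilities and the one-hot true label. The 10-sample, 5-class run is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal class frequencies
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    if p_e == 1.0:  # degenerate case: a single class everywhere
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def mae_probabilities(y_true: Sequence[Hashable], probs: List[Dict[Hashable, float]],
                      classes: Sequence[Hashable]) -> float:
    """Mean absolute difference between predicted class probabilities and the
    one-hot true label, averaged over classes and instances (assumed convention)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        total += sum(abs(p.get(c, 0.0) - (1.0 if c == t else 0.0)) for c in classes) / len(classes)
    return total / len(y_true)


# Hypothetical 5-class run: two samples per level, one 10% sample predicted as 25%
levels = ["10%", "25%", "50%", "75%", "100%"]
y_true = [lv for lv in levels for _ in range(2)]
y_pred = ["25%"] + y_true[1:]
print(accuracy(y_true, y_pred), round(cohen_kappa(y_true, y_pred), 3))
```

Two classifiers with equal accuracy can still differ in MAE, because MAE also reflects how confident the predicted probabilities are.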
Accuracy. (a) Predicting whether a liquid sample was salt water or not and (b) predicting the salinity level.
Kappa. (a) Predicting whether a liquid sample was salt water or not and (b) predicting the salinity level.
MAE. (a) Predicting whether a liquid sample was salt water or not and (b) predicting the salinity level.
Taking all the performance metrics into consideration, all the algorithms are suitable for predicting whether a liquid sample is salt water. Among them, Rep Tree had the lowest performance on every metric: its accuracy was 91.66%, while the others achieved 100%. Whereas the average Precision, Recall, F-Score, MCC and Kappa values were 1 for the other algorithms, Rep Tree's cross-validated averages were 0.92, 0.91, 0.91, 0.83 and 0.82, respectively. In predicting the salt concentration of the liquid samples, Hoeffding Tree, LMT and Random Forest achieved an accuracy of 100%, while Rep Tree achieved 40%. As expected, all the algorithms performed worse with cross validation. Nevertheless, Hoeffding Tree and Random Forest achieved an accuracy of 90%, a precision of 0.93, a recall of 0.90, an F-Score of 0.89 and a Kappa value of 0.87. Although their metric values were almost identical, Hoeffding Tree had a lower mean absolute error (MAE) than Random Forest; it can therefore be concluded that Hoeffding Tree made better classifications.
CONCLUSION
Motivated by the increasing interest in non-contact detection of salinity, this study proposed a novel approach for this purpose. Since the proposed approach relies on a software component, i.e. a classifier, as well as hardware components, a set of classification experiments was performed to evaluate different decision tree algorithms and determine the most suitable one for salinity detection. The results show that the Hoeffding tree algorithm achieved the best results and Rep tree the worst. The main limitation of the proposed approach is that its results can be uncertain when the salt concentration is below 5%. Future work will focus on developing a VNA-embedded prototype of the proposed system. Supported by a software application based on the Hoeffding tree algorithm, such a prototype will be able to determine the salt concentration in water quickly. It will be a standalone, automated solution whose only operating requirement is a power supply.
FUNDING
No funding was received for conducting this study.
CONFLICTS OF INTERESTS
The authors have no conflicts of interest to declare that are relevant to the content of this article.
AUTHORS’ CONTRIBUTION
The first author conceived and planned the experiments and prepared the samples. Both authors carried out the experiments and contributed to the interpretation of the results. Both authors discussed the results and contributed to the final manuscript.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.