Forecasting the efficiency of solar still production (SSP) can reduce the capital risks involved in a solar desalination project. Solar desalination is an attractive method of water desalination and offers a more reliable water source. In this study, SSP was estimated from data obtained through experimental fieldwork. SSP is assumed to be a function of ambient temperature, relative humidity, wind speed, solar radiation, feed flow rate, temperature of feed water, and total dissolved solids in feed water. Back-propagation artificial neural network (ANN) models with two transfer functions were adopted for predicting SSP. The best performance was obtained by the ANN model with one hidden layer of eight neurons that employed the hyperbolic tangent transfer function. Results of the ANN model were compared with those of a stepwise regression (SWR) model. The ANN model produced more accurate results than the SWR model in all modeling stages. Mean values of the coefficient of determination and root mean square error for the ANN model were 0.960 and 0.047 L/m^{2}/h, respectively. Relative errors of SSP values predicted by the ANN model were about ±10%. In conclusion, the ANN model showed greater potential for accurately predicting SSP, whereas the SWR model showed poor performance.

## INTRODUCTION

Solar stills are widely used in solar desalination. Nonetheless, solar still production (SSP) is very low. Accordingly, increasing SSP has been the focus of intensive study. Much of the literature has focused on experiments to find better solar still designs that improve SSP (e.g., Tanaka & Nakatake 2009; Kabeel *et al.* 2013; Ayoub *et al.* 2015; Koilraj Gnanadason *et al.* 2015). These experimental studies are usually costly, laborious, and time-consuming. Consequently, mathematical modeling (MM) may be the best alternative for finding better designs. MM is one of the most effective methods for providing a clear understanding of solar still behavior and for enhancing SSP. Moreover, MM by artificial intelligence (AI) gives the most accurate results and is much faster than classical MM.

Artificial neural networks (ANNs), which are one of these AI technologies, are mathematical models that try to mimic the structures and functions of biological neural networks used for solving complex problems (Ali Abdoli *et al.* 2012; Behboudian *et al.* 2014; Nezhad *et al.* 2016). ANNs have been used in various ways to model and predict the performance of various desalination and solar desalination systems ranging from predicting solar still performance (Santos *et al.* 2012; Mashaly & Alazba 2015, 2016a, 2016b, 2016c; Mashaly *et al.* 2015) to optimizing the performance of solar-powered membrane distillation units (Porrazzo *et al.* 2013), to simulating the reverse osmosis desalination process (Khayet *et al.* 2011). These networks have also been able to control multi-stage flash desalination plants (Tayyebi & Alishiri 2014) as well as analyze seawater desalination systems (Gao *et al.* 2007).

SSP is one of the most important elements in the solar desalination process and an essential quantity for analyzing solar still performance. Hence, it is necessary to investigate and predict SSP and to consider the effects of meteorological and operational parameters on it. The objective of this research was to develop SSP models using both an ANN model and a stepwise regression (SWR) model. The performance of the two approaches was then assessed and compared using statistical indicators.

## MATERIALS AND METHODS

### Experimental procedure

^{2}single stage of C6000 panel (F Cubed, Carocell Solar Panel, Australia). The solar-still panel was manufactured using modern, cost-effective materials such as coated polycarbonate plastic. When heated, the panel distilled a film of water that flowed over the absorber mat of the panel. The panel was fixed at an angle of 29° to the horizontal. The basic construction materials were galvanized steel legs, an aluminum frame, and polycarbonate covers. The transparent polycarbonate was coated on the inside with a special material to prevent fogging (patented by F Cubed, Australia). A cross-sectional view of the solar still is presented in Figure 1. The operating principle of the system is summarized in the following paragraphs.

The water was fed to the panel using a centrifugal pump (model PKm 60, 0.5 HP, Pedrollo, Italy) with a constant flow rate of 10.74 L/h. The feed was supplied by eight drippers/nozzles, creating a film of water that flowed over the absorbent mat. Underneath the absorbent mat was an aluminum screen that helped to distribute the water across the mat. Beneath the aluminum screen was an aluminum plate. Aluminum was chosen for its hydrophilic properties, to assist in the even distribution of the sprayed water. Water flowed through and over the absorbent mat, and solar energy was absorbed and partially collected inside the panel; as a result, the water was heated and hot air circulated naturally within the panel. First, the hot air flowed towards the top of the panel, then reversed its direction to approach the bottom of the panel. During this circulation, the humid air touched the cooled surfaces of the transparent polycarbonate cover and the bottom polycarbonate layer, causing condensation. The condensed water flowed down the panel and was collected as a distilled stream.

Seawater was used as feed water input to the system. The solar still system was run from 02/23/2013 to 04/23/2013. Raw seawater was obtained from the Gulf, Dammam, in eastern Saudi Arabia (26°26′24.19″N, 50°10′20.38″E). The initial concentration of total dissolved solids (TDS) in the raw seawater, along with its pH, density (*ρ*), and electrical conductivity (EC), were 41.4 ppt, 8.02, 1.04 g cm^{−3}, and 66.34 mS cm^{−1}, respectively. The production, or the amount of distilled water produced (SSP) by the system during a time period, was obtained by collecting the cumulative amount of water produced over time. The temperature of the feed water (*T_{F}*) was measured using thermocouples (T-type, UK). Temperature data for feed brine water were recorded on a data logger (model 177-T4, Testo, Inc., UK) at 1 min intervals. The amount of feed water (*M_{F}*) was measured by a calibrated digital flow meter mounted on the feed water line (micro-flo, Blue-White, USA). The amounts of brine water and distilled water were measured with a graduated cylinder. TDS concentration and EC were measured using a TDS-calibrated meter (Cole-Parmer Instrument, Vernon Hills, USA). A pH meter (model 3510 pH meter, Jenway, UK) was used to measure pH. A digital density meter (model DMA 35_{N}, Anton Paar, USA) was used to measure *ρ*.

The seawater was fed separately to the panel using the pump described above. The residence time (the time taken for the water to pass through the panel) was approximately 20 minutes. Therefore, the flow rates of the feed water, the distilled water, and the brine water were measured every 20 minutes. The TDS of feed water (*TDS_{F}*) was also measured every 20 minutes. The weather data, such as ambient temperature (*T_{o}*), relative humidity (*RH*), wind speed (*WS*), and solar radiation (*SR*), were obtained from the weather station mentioned above. There is one dependent variable, SSP, and seven independent variables: *T_{o}*, *RH*, *WS*, *SR*, *TDS_{F}*, *M_{F}*, and *T_{F}*.

### Artificial neural network

*W*) which represents the strength of the connection. In the hidden layer, a weighted sum of the inputs is passed through a function called the transfer function, which determines the relationship between the inputs and the output (Eren *et al.* 2012; Valderrama *et al.* 2015; Wadi Abbas Al-Fatlawi *et al.* 2015). The output layer neuron (*Y_{k}*) is expressed as:

$$Y_{k} = f\left(\sum_{j} W_{kj}\, h_{j} + B_{k}\right) \quad (1)$$

where *f* is the transfer function; *W_{kj}* are the weights from the hidden layer to the output layer; *B_{k}* are the biases in the output layer; and *h_{j}* is the neuron's activation value in the hidden layer, mathematically expressed as (Haykin 1999):

$$h_{j} = f\left(\sum_{i} W_{ji}\, X_{i} + B_{j}\right) \quad (2)$$

where *W_{ji}* are the weights from the input layer to the hidden layer; *X_{i}* are the inputs; and *B_{j}* are the biases in the hidden layer. The transfer function could be sigmoid (SIG), as shown in Equation (3), or hyperbolic tangent (TANH), as displayed in Equation (4):

$$f(x) = \frac{1}{1 + e^{-x}} \quad (3)$$

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \quad (4)$$
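The layer-by-layer computation described above can be illustrated with a minimal Python sketch of one forward pass through a network with a single hidden layer. The function names, the 7-8-1 layout used in the demonstration, and the random weights are assumptions made for illustration only; they are placeholders, not the trained values of the model developed in this study.

```python
import math
import random


def sigmoid(x):
    """Sigmoid (SIG) transfer function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, b_hidden, w_out, b_out, f=math.tanh):
    """One forward pass through a single-hidden-layer network.

    f is the transfer function (hyperbolic tangent by default).
    """
    # Hidden activations: h_j = f(sum_i W_ji * X_i + B_j)
    h = [f(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_hidden, b_hidden)]
    # Output neurons: Y_k = f(sum_j W_kj * h_j + B_k)
    return [f(sum(w * hj for w, hj in zip(row, h)) + b)
            for row, b in zip(w_out, b_out)]


# Placeholder weights for a 7-input, 8-hidden, 1-output layout (hypothetical values).
rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(7)] for _ in range(8)]
b_hidden = [rng.uniform(-1, 1) for _ in range(8)]
w_out = [[rng.uniform(-1, 1) for _ in range(8)]]
b_out = [rng.uniform(-1, 1)]

x = [0.5] * 7  # seven already-normalized inputs (illustrative values)
y = forward(x, w_hidden, b_hidden, w_out, b_out)
print(y)
```

Passing `f=sigmoid` switches the same sketch to the SIG transfer function, which is how the two transfer functions could be compared on identical weights.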

### ANN model development

*T_{o}*, *RH*, *WS*, *SR*, *TDS_{F}*, *M_{F}*, and *T_{F}*. The output parameter was SSP. The data points obtained from the experimental work were randomly divided into training (70%), testing (20%), and validation (10%) sets containing 112, 32, and 16 data points, respectively. The data used in the validation stage to check the performance of the ANN model were not used in the training stage. Different ANN architectures with one hidden layer were trained. The optimal number of neurons in the hidden layer was determined by trial and error: the number of hidden neurons was varied from 1 to 10 to find the best ANN architecture. The number of training iterations was fixed at 200,000. The learning rate and momentum factor were fixed at 0.01 and 0.8, respectively. Before training, all input and output parameters were automatically normalized between 0.15 and 0.85 by the software. Normalization accelerates ANN training and improves the network's generalization capability. The following formula was used for normalization:

$$X_{n} = 0.15 + 0.70\,\frac{X - X_{\min}}{X_{\max} - X_{\min}} \quad (5)$$

where $X$ is the original value of the input and output parameters; $X_{n}$ is the normalized value; and $X_{\max}$ and $X_{\min}$ are the maximum and minimum values of the input and output parameters, respectively.
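As a small illustration of scaling data into the 0.15 to 0.85 range mentioned above, the following Python sketch applies linear min-max scaling and its inverse. The linear form and the function names `normalize` and `denormalize` are assumptions for illustration; the software used in the study may implement the step differently. The three sample values are the minimum, mean, and maximum solar radiation reported in Table 1.

```python
def normalize(values, low=0.15, high=0.85):
    """Linearly rescale values so that min -> low and max -> high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot normalize a constant series")
    scale = (high - low) / (v_max - v_min)
    return [low + scale * (v - v_min) for v in values]


def denormalize(norm_values, v_min, v_max, low=0.15, high=0.85):
    """Map normalized values back to the original units."""
    scale = (v_max - v_min) / (high - low)
    return [v_min + scale * (n - low) for n in norm_values]


# Solar radiation in W/m^2: minimum, mean, and maximum from Table 1.
sr = [75.10, 587.55, 920.69]
sr_norm = normalize(sr)
sr_back = denormalize(sr_norm, min(sr), max(sr))
print(sr_norm)
print(sr_back)
```

The inverse mapping matters in practice because a network trained on normalized SSP produces normalized outputs, which have to be converted back to L/m^{2}/h before errors are computed.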

### Stepwise regression

*T_{o}*, *RH*, *WS*, *SR*, *TDS_{F}*, *M_{F}*, and *T_{F}*). The SWR equation in its general form can be written as:

$$Y = B_{0} + \sum_{i=1}^{n} B_{i} X_{i} \quad (6)$$

where *Y* is the output value using the SWR method; *B_{i}* (i = 0, 1, 2, … , n) are the regression coefficients; and *X_{i}* (i = 1, 2, … , n) are the input variables.

SWR involves finding appropriate independent variables and values for the coefficients. To this end, SWR analysis was carried out using the Statistical Package for Social Science (IBM SPSS Statistics 22) program (SPSS Inc., Chicago, IL, USA). The basic procedure of SWR was: (1) identifying an initial model; (2) iteratively altering the model from the previous step by adding or removing an independent variable in accordance with the ‘stepping criteria: probability-of-F-to-enter <= 0.05, probability-of-F-to-remove >= 0.1’; and (3) ending the search when no further step is possible under the stepping criteria, or when a specified maximum number of steps has been reached. In brief, the SWR technique first chooses the independent variable most correlated with the dependent variable, and then chooses the independent variable that correlates most strongly with the residual variance in the dependent variable. This procedure continues until adding another independent variable does not increase the coefficient of determination (CD) by a significant amount. A detailed description of SWR can be found in numerous sources (Morrison 1990; Brereton 1995; Hocking 1996).
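The selection loop described above can be sketched in a few lines of Python. This is a simplified, forward-only illustration under stated assumptions: it has no removal step, and it replaces the SPSS probability-of-F criteria with a hypothetical minimum gain in CD (`min_gain`). The helper names and the synthetic data are likewise invented for the example and are not the study's data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_cd(columns, y):
    """Least-squares fit of y = B0 + sum(Bi * Xi); returns (coefficients, CD)."""
    rows = [[1.0] + [col[k] for col in columns] for k in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    pred = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yk - pk) ** 2 for yk, pk in zip(y, pred))
    ss_tot = sum((yk - mean_y) ** 2 for yk in y)
    return coef, 1.0 - ss_res / ss_tot


def forward_select(data, y, min_gain=0.01):
    """Add the variable giving the largest CD until the gain drops below min_gain."""
    selected, best_cd = [], 0.0
    while len(selected) < len(data):
        trials = {name: fit_cd([data[s] for s in selected] + [data[name]], y)[1]
                  for name in data if name not in selected}
        name, cd = max(trials.items(), key=lambda kv: kv[1])
        if cd - best_cd < min_gain:
            break
        selected.append(name)
        best_cd = cd
    return selected, best_cd


# Synthetic example: y depends on "a" and "b" only; "c" is irrelevant.
data = {
    "a": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "b": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    "c": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
y = [1.0 + 2.0 * a + 5.0 * b for a, b in zip(data["a"], data["b"])]
selected, cd = forward_select(data, y)
print(selected, round(cd, 3))
```

A full stepwise procedure would also re-test variables already in the model and drop any that no longer meet the removal criterion; that step is omitted here to keep the sketch short.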

### Statistical parameters for models’ assessment

*SSP_{o,i}* denotes the observed value; *SSP_{p,i}* is the predicted value; $\overline{SSP}_{o}$ is the mean of the observed values; $\overline{SSP}_{p}$ is the mean of the predicted values; and *n* is the total number of observations.
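For readers who want to reproduce the four indicators used later (CD, RMSE, IA, and MARE), the Python sketch below computes them from paired observed and predicted values. The defining equations are not reproduced in this text, so the forms used here are assumptions: CD is taken as the squared Pearson correlation, IA as Willmott's index of agreement, and MARE as the mean absolute relative error in percent. The sample values are invented for illustration.

```python
import math


def assess(obs, pred):
    """Return CD, RMSE, IA, and MARE (%) for observed and predicted series."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    s_oo = sum((o - mean_o) ** 2 for o in obs)
    s_pp = sum((p - mean_p) ** 2 for p in pred)
    sq_err = sum((p - o) ** 2 for o, p in zip(obs, pred))
    # Assumed form: squared Pearson correlation between observed and predicted.
    cd = s_op ** 2 / (s_oo * s_pp)
    rmse = math.sqrt(sq_err / n)
    # Assumed form: Willmott's index of agreement.
    ia = 1.0 - sq_err / sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
                            for o, p in zip(obs, pred))
    # Assumed form: mean absolute relative error, in percent (obs must be nonzero).
    mare = 100.0 / n * sum(abs((o - p) / o) for o, p in zip(obs, pred))
    return {"CD": cd, "RMSE": rmse, "IA": ia, "MARE": mare}


# Invented observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
result = assess(observed, predicted)
print(result)
```

Under these definitions a perfect model gives CD = 1, IA = 1, RMSE = 0, and MARE = 0, which is the sense in which values "close to one" or "close to zero" indicate good agreement.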

## RESULTS AND DISCUSSION

### Data description and general findings

Table 1 shows descriptive statistics for the experimental data used in the ANN and SWR models. Table 2 presents the Pearson correlation coefficients, computed using SPSS, between SSP and all input parameters. The mean ± SD (standard deviation) values were 26.64 ± 3.68 °C, 23.36 ± 12.90%, 2.44 ± 3.12 km/h, 587.55 ± 181.93 W/m^{2}, 0.21 ± 0.04 °C, 36.66 ± 4.27 L/min, 80.23 ± 29.42 PPT, and 0.50 ± 0.24 L/m^{2}/h for *T_{o}*, *RH*, *WS*, *SR*, *T_{F}*, *M_{F}*, *TDS_{F}*, and SSP, respectively. The coefficient of variation (CV) differed among the parameters, with the greatest variation observed in *WS* and the smallest in *M_{F}*. Pearson correlation coefficients for *T_{o}*, *WS*, *T_{F}*, and *TDS_{F}* were negative, while those for *RH*, *SR*, and *M_{F}* were positive. *SR* shows the highest Pearson correlation coefficient (0.733), while *RH* shows the lowest (0.014). The two-tailed significance test indicated significant correlations for *WS*, *SR*, *M_{F}*, and *TDS_{F}*, but not for *T_{o}*, *RH*, and *T_{F}*, whose significance values were greater than 0.01, as indicated in Table 2. The effects of meteorological and operational factors on SSP were investigated experimentally, and our findings indicate that the most important factor affecting SSP is *SR*. SSP was observed to decrease as *TDS_{F}* increased. A full demonstration of these experimental results can be found in Mashaly *et al.* (2016).

| Parameter | Mean ± SD | Range | CV (%) |
|---|---|---|---|
| T_{o} (°C) | 26.64 ± 3.68 | 16.87–33.23 | 14.00 |
| RH (%) | 23.36 ± 12.90 | 12.90–70.00 | 55.00 |
| WS (km/h) | 2.44 ± 3.12 | 0.00–12.65 | 128.00 |
| SR (W/m^{2}) | 587.55 ± 181.93 | 75.10–920.69 | 31.00 |
| T_{F} (°C) | 0.21 ± 0.04 | 0.13–0.25 | 20.00 |
| M_{F} (L/min) | 36.66 ± 4.27 | 22.10–42.35 | 12.00 |
| TDS_{F} (PPT) | 80.23 ± 29.42 | 41.40–130.00 | 37.00 |
| SSP (L/m^{2}/h) | 0.50 ± 0.24 | 0.05–0.97 | 48.00 |


SD, standard deviation; CV, coefficient of variation.
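The CV column can be checked with a short Python sketch that recomputes CV as 100 × SD / mean from the tabulated means and standard deviations. The dictionary keys and the helper name `cv_percent` are illustrative choices. Because the tabulated means and SDs are themselves rounded, the recomputed values can differ slightly from the CV column.

```python
# (mean, SD) pairs as listed in Table 1, in the table's units.
stats = {
    "T_o": (26.64, 3.68),
    "RH": (23.36, 12.90),
    "WS": (2.44, 3.12),
    "SR": (587.55, 181.93),
    "T_F": (0.21, 0.04),
    "M_F": (36.66, 4.27),
    "TDS_F": (80.23, 29.42),
    "SSP": (0.50, 0.24),
}


def cv_percent(mean, sd):
    """Coefficient of variation in percent."""
    return 100.0 * sd / mean


cv = {name: cv_percent(mean, sd) for name, (mean, sd) in stats.items()}
most_variable = max(cv, key=cv.get)
least_variable = min(cv, key=cv.get)
for name, value in cv.items():
    print(f"{name}: {value:.1f}%")
print("largest CV:", most_variable, "| smallest CV:", least_variable)
```

The ranking produced this way agrees with the statement in the text that wind speed varies most and feed flow rate varies least.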

| | T_{o} | RH | WS | SR | T_{F} | M_{F} | TDS_{F} |
|---|---|---|---|---|---|---|---|
| SSP | | | | | | | |
| Pearson correlation | −0.070 | 0.014 | −0.307* | 0.733* | −0.061 | 0.228* | −0.403* |
| Sig. (2-tailed) | 0.376 | 0.864 | 8 × 10^{−5} | 2.82 × 10^{−28} | 0.445 | 0.004 | 1.28 × 10^{−7} |


*Correlation is significant at the 0.01 level (2-tailed).

### ANN model selection

^{2}/h, IA values from 0.988 to 0.994, and MARE values from 6.690 to 10.061%. For the TANH function in the training phase, the ANN architectures’ CD values ranged from 0.955 to 0.990, RMSE values from 0.024 to 0.051 L/m^{2}/h, IA values from 0.988 to 0.997, and MARE values from 5.074 to 9.973%. The results and Figure 2 show that the TANH function performed better than the SIG function. The most appropriate ANN architecture was obtained with 8 neurons in the hidden layer and the TANH function, as this gave the best SSP predictions with the lowest error. The developed ANN model is therefore 7-8-1: it contains 7 input neurons, 8 neurons in the hidden layer, and 1 output neuron, as illustrated in Figure 3. The developed ANN model's CD value was 0.990, RMSE value was 0.024 L/m^{2}/h, IA value was 0.997, and MARE value was 5.074%. The relative contribution of each input to the output (SSP) for the developed ANN model was examined. The results revealed that *T_{o}* and *WS* are not significant, contributing only 3.48% and 3.52%, respectively. *TDS_{F}* and *SR* were found to be the dominant inputs, which is in accordance with the results of Tanaka & Nakatake (2006) and Dev *et al.* (2011). *RH*, *M_{F}*, and *T_{F}* were also found to have a moderate influence on the SSP prediction, with contributions of 16.41, 17.93, and 15.85%, respectively. The developed ANN model can be solved using a spreadsheet (e.g., Microsoft Excel) and expressed as an algebraic system of equations.

### SWR models

SWR models were developed in SPSS by iteratively adding and removing terms from a multi-linear model based on their significance within the regression. All seven independent variables (*T_{o}*, *RH*, *WS*, *SR*, *T_{F}*, *M_{F}*, and *TDS_{F}*) were used as candidates to create predictive models for SSP. SWR produced five models with 1–5 predictor variables, where *SR* was involved in each set of predictor variables, as listed in Table 3. The variables *SR* and *TDS_{F}* had the greatest effect on SSP modeling. The effect of *T_{F}* was not considerable, whereas in the ANN model it was moderately influential. The effects of *WS*, *M_{F}*, and *RH* were low. CD values associated with the five models ranged from 0.545 to 0.902. Corresponding standard errors of the estimate (SEE) ranged from 0.071 to 0.164 L/m^{2}/h. It can be noted from Table 3 that the absence or presence of some of the input variables significantly affects the performance of the SWR models. Model 1, with *SR* alone, performed worst, with CD = 0.545 and SEE = 0.164 L/m^{2}/h. Model 2 performed better than Model 1, owing to the presence of *TDS_{F}*. The CD value of Model 2 was 59.82% higher than that of Model 1, and its SEE value was 47.56% lower. Beyond step 2 (Model 2), the accuracy changed only slightly. The CD values of Models 3, 4, and 5 were 2.30%, 2.87%, and 3.56% higher than that of Model 2, and their SEE values were 11.63%, 13.95%, and 17.44% lower. Model 5, which involved *SR*, *TDS_{F}*, *WS*, *M_{F}*, and *RH* as predictor variables, was the best model for estimating SSP (CD = 0.902, SEE = 0.071 L/m^{2}/h).
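The percentage changes quoted in this paragraph follow directly from the CD and SEE values in Table 3. The short Python sketch below reproduces the arithmetic; the helper name `pct_change` is an illustrative choice.

```python
# CD and SEE (L/m^2/h) for SWR Models 1-5, as listed in Table 3.
cd = {1: 0.545, 2: 0.871, 3: 0.891, 4: 0.896, 5: 0.902}
see = {1: 0.164, 2: 0.086, 3: 0.076, 4: 0.074, 5: 0.071}


def pct_change(new, ref):
    """Percentage change of new relative to ref (positive means an increase)."""
    return 100.0 * (new - ref) / ref


# Model 2 relative to Model 1.
print(f"CD:  {pct_change(cd[2], cd[1]):+.2f}%")
print(f"SEE: {pct_change(see[2], see[1]):+.2f}%")

# Models 3-5 relative to Model 2.
for model in (3, 4, 5):
    print(model,
          f"CD {pct_change(cd[model], cd[2]):+.2f}%",
          f"SEE {pct_change(see[model], see[2]):+.2f}%")
```

The large jump from Model 1 to Model 2 followed by small gains afterwards is what identifies *SR* and *TDS_{F}* as the two predictors carrying most of the explanatory power.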

| Model | Mathematical expression | CD | SEE (L/m^{2}/h) |
|---|---|---|---|
| 1 | | 0.545 | 0.164 |
| 2 | | 0.871 | 0.086 |
| 3 | | 0.891 | 0.076 |
| 4 | | 0.896 | 0.074 |
| 5 | | 0.902 | 0.071 |


CD, coefficient of determination; SEE, standard error of the estimate.

### Comparing ANN and SWR models

^{2}/h, IA of 0.997, and MARE of 5.074%, as presented in Table 4 for the training process. The CD and IA values are close to one while the RMSE and MARE values are close to zero, demonstrating excellent agreement between the observed values and those predicted by the ANN model. Figure 4 illustrates that the majority of points obtained by the SWR model during the training process lie below the 1:1 line. The REs of the SWR model are between −100 and +24%, and its CD decreased to 0.902. The RMSE and MARE values of the SWR model (0.146 L/m^{2}/h and 36.297%, respectively) were 508.33% and 615.35% higher than those of the developed ANN model. The IA value of the SWR model was 9.63% lower than that of the ANN model during the training process.

| Stage | Model | CD | RMSE (L/m^{2}/h) | IA | MARE (%) |
|---|---|---|---|---|---|
| Training | ANN | 0.990 | 0.024 | 0.997 | 5.074 |
| Training | SWR | 0.902 | 0.146 | 0.901 | 36.297 |
| Testing | ANN | 0.918 | 0.070 | 0.976 | 10.593 |
| Testing | SWR | 0.864 | 0.152 | 0.871 | 26.310 |
| Validation | ANN | 0.972 | 0.047 | 0.991 | 7.152 |
| Validation | SWR | 0.941 | 0.166 | 0.882 | 25.506 |


In the testing process, Figure 4 illustrates how SSP values mostly follow the 1:1 line, showing a strong match between the observed and predicted values. The statistics of the ANN model were CD = 0.918, RMSE = 0.070 L/m^{2}/h, IA = 0.976, and MARE = 10.593%, as shown in Table 4. As shown in Figure 4, 71.88% and 18.75% of the REs fall between −10 and 10% for the ANN and SWR models, respectively. The CD value of the SWR model was about 6% lower than that of the developed ANN model. The RMSE of the SWR model (0.152 L/m^{2}/h) was roughly double that of the developed ANN model. The IA value of the SWR model was about 11% lower than that of the developed ANN model, while the MARE value of the SWR model was about 2.5 times that of the developed ANN model during the testing process.

In the validation process, Figure 4 shows that some values predicted by the SWR model were imprecise, while most of the values predicted by the developed ANN model were acceptable. The average REs for the ANN and SWR models were −2.27% and −25.47%, respectively. It is obvious from Table 4 that stronger agreement between the observed and predicted values was obtained using the ANN model. This agreement is reflected in the CD, RMSE, IA, and MARE results. The CD and IA values for the ANN model were close to one, while the RMSE value was close to zero. The MARE value of the SWR model was about 3.57 times that of the ANN model during the validation process. The CD and IA values of the SWR model were approximately 3.19% and 11% lower, respectively, than those of the ANN model, and its RMSE was about 3.5 times that of the ANN model. As seen from the findings in Table 4 and Figure 4, the ANN model showed better prediction performance than the SWR model and predicted SSP with sufficient accuracy. These results are consistent with the studies of Jalali-Heravi & Garkani-Nejad (2002) and Rahimi-Ajdadi & Abbaspour-Gilandeh (2011).

On the basis of the average values over all the modeling stages, the ANN model (CD = 0.960, RMSE = 0.047 L/m^{2}/h, IA = 0.988, and MARE = 7.606%) performed better than the SWR model (CD = 0.902, RMSE = 0.155 L/m^{2}/h, IA = 0.885, and MARE = 29.371%). The average REs for the SWR model were −30.55%, −25.96%, and −25.47% for the training, testing, and validation data sets, respectively. These errors indicate that the SWR model tended to underpredict SSP. This systematic underprediction may be due to inappropriate specification of the relationships between SSP and some of the variables used in the modeling process. The SWR model is easier to use and gives results that are deemed reasonable, but the ANN model gave the most precise estimates. Consequently, the developed ANN model predicts SSP with high accuracy and offers the flexibility to model non-linear relationships.
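The stage-averaged indicators quoted above can be reproduced from the training, testing, and validation rows of Table 4. The Python sketch below is a simple arithmetic check; the data layout and the helper name `stage_mean` are illustrative choices.

```python
# Table 4 values per stage: (CD, RMSE in L/m^2/h, IA, MARE in %).
table4 = {
    "ANN": {
        "training": (0.990, 0.024, 0.997, 5.074),
        "testing": (0.918, 0.070, 0.976, 10.593),
        "validation": (0.972, 0.047, 0.991, 7.152),
    },
    "SWR": {
        "training": (0.902, 0.146, 0.901, 36.297),
        "testing": (0.864, 0.152, 0.871, 26.310),
        "validation": (0.941, 0.166, 0.882, 25.506),
    },
}
names = ("CD", "RMSE", "IA", "MARE")


def stage_mean(model):
    """Average each indicator over the three modeling stages."""
    rows = list(table4[model].values())
    return {name: sum(row[k] for row in rows) / len(rows)
            for k, name in enumerate(names)}


ann_mean = stage_mean("ANN")
swr_mean = stage_mean("SWR")
for name in names:
    print(f"{name}: ANN {ann_mean[name]:.3f} | SWR {swr_mean[name]:.3f}")

# Ratio of average SWR error to average ANN error.
print(f"RMSE ratio: {swr_mean['RMSE'] / ann_mean['RMSE']:.2f}")
print(f"MARE ratio: {swr_mean['MARE'] / ann_mean['MARE']:.2f}")
```

Averaging across stages gives each stage equal weight regardless of its number of data points (112, 32, and 16), so these means are unweighted summaries rather than pooled statistics.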

## CONCLUSION

ANN and SWR models were applied to model solar still production (SSP) using ambient temperature (*T_{o}*), *RH*, *WS*, *SR*, the temperature of feed water (*T_{F}*), the TDS of feed water (*TDS_{F}*), and the flow rate of feed water (*M_{F}*) as explanatory variables. A feed-forward back-propagation algorithm was used in the training process of the ANN model. Several ANN architectures with different numbers of neurons in the hidden layer were trained to determine which architecture gave the best performance and produced minimal errors. The ANN model with the 7-8-1 architecture was selected as the best model for modeling SSP. The performance of the ANN and SWR models was evaluated using the CD, RMSE, IA, and MARE statistical performance indicators. The effectiveness and validity of the ANN model were confirmed by its high values of CD and IA and its low values of RMSE and MARE. Results revealed that the ANN can learn the relationships between the input variables and SSP very well; it provided better prediction results than the SWR model. As a final conclusion, the ANN model gives the most precise and reliable estimates, while the SWR model is easier to use but less accurate.

## ACKNOWLEDGEMENT

The project was financially supported by King Saud University, Vice Deanship of Research Chairs.