Artificial intelligence for predicting solar still production and comparison with stepwise regression under arid climate

Forecasting the efficiency of solar still production (SSP) can reduce the capital risks involved in a solar desalination project. Solar desalination is an attractive method of water desalination and offers a more reliable water source. In this study, SSP was estimated from data obtained in experimental fieldwork. SSP is assumed to be a function of ambient temperature, relative humidity, wind speed, solar radiation, feed flow rate, temperature of feed water, and total dissolved solids in feed water. Back-propagation artificial neural network (ANN) models with two transfer functions were adopted for predicting SSP. The best performance was obtained by the ANN model with one hidden layer of eight neurons that employed the hyperbolic tangent transfer function. Results of the ANN model were compared with those of a stepwise regression (SWR) model. The ANN model produced more accurate results than the SWR model in all modeling stages. Mean values of the coefficient of determination and root mean square error for the ANN model were 0.960 and 0.047 L/m²/h, respectively. Relative errors of the SSP values predicted by the ANN model were about ±10%. In conclusion, the ANN model showed greater potential for accurately predicting SSP, whereas the SWR model showed poor performance.

doi: 10.2166/aqua.2017.046

Ahmed F. Mashaly (corresponding author), A. A. Alazba
Alamoudi Water Research Chair, King Saud University, Riyadh, Saudi Arabia
E-mail: mashaly.ahmed@gmail.com

A. A. Alazba
Agricultural Engineering Department, King Saud University, Riyadh, Saudi Arabia


INTRODUCTION
However, SSP is one of the most important elements in the solar desalination process and an essential tool used to analyze solar still performance. Hence, it is necessary to investigate and predict SSP and also consider the effects of meteorological and operational parameters on it. The objective of this research was to develop SSP models using both an ANN model and a stepwise regression (SWR) model.
The two approaches were compared in terms of model performance. Also, an assessment of the performance of the ANN and SWR models using a statistical comparison was conducted.

Experimental procedure
The experiments were conducted at the Agricultural […] The solar-still system used in the experiments was constructed from a 6 m² single stage of C6000 panel (F Cubed, Carocell Solar Panel, Australia). The solar-still panel was manufactured using modern, cost-effective materials such as coated polycarbonate plastic. When heated, the panel distilled a film of water that flowed over the absorber mat of the panel. The panel was fixed at an angle of 29° to the horizontal.
The basic construction materials were galvanized steel legs, an aluminum frame, and polycarbonate covers. The transparent polycarbonate was coated on the inside with a special material to prevent fogging (patented by F Cubed, Australia).
A cross-sectional view of the solar still is presented in Figure 1. The working principle of the system is summarized in the following paragraphs.
The water was fed to the panel using a centrifugal pump (model PKm 60, 0.5 HP, Pedrollo, Italy) with a constant flow rate of 10.74 L/h. The feed was supplied by eight drippers/nozzles, creating a film of water that flowed over the absorbent mat. Underneath the absorbent mat was an aluminum screen that helped to distribute the water across the mat.
Beneath the aluminum screen was an aluminum plate. Aluminum was chosen for its hydrophilic properties, to assist in the even distribution of the sprayed water. Water flowed through and over the absorbent mat, and solar energy was absorbed and partially collected inside the panel; as a result, the water was heated and hot air circulated naturally within the panel.
First, the hot air flowed towards the top of the panel, then reversed its direction to approach the bottom of the panel.
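The output-layer neuron $Y_k$ is expressed as:

$$Y_k = f\left(\sum_{j} W_{kj} h_j + B_k\right) \qquad (1)$$

where $f$ is the transfer function; $W_{kj}$ are the weights from the hidden layer to the output layer; $B_k$ are the biases in the output layer; and $h_j$ is the neuron's activation value in the hidden layer, mathematically expressed as (Haykin):

$$h_j = f\left(\sum_{i} W_{ji} X_i + B_j\right) \qquad (2)$$

where $W_{ji}$ are the weights from the input layer to the hidden layer; $X_i$ are the inputs; and $B_j$ are the biases in the hidden layer. The transfer function could be sigmoid (SIG), as shown in Equation (3), or hyperbolic tangent (TANH), as displayed in Equation (4):

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (3)$$

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (4)$$

This formula was used for the normalization process:

$$X_n = \frac{X_o - X_{\min}}{X_{\max} - X_{\min}} \times (0.85 - 0.15) + 0.15 \qquad (5)$$

where $X_o$ is the original value of the input and output parameters; $X_n$ is the normalized value; and $X_{\max}$ and $X_{\min}$ are the maximum and minimum values of the input and output parameters, respectively.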
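The normalization step can be illustrated with a short Python sketch. It assumes the linear mapping onto the 0.15 to 0.85 range that the study reports; the function names and the sample solar radiation readings are hypothetical and serve only as an illustration.

```python
def normalize(values, lo=0.15, hi=0.85):
    """Map raw values linearly onto [lo, hi] using their own min and max."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("all values are identical; cannot normalize")
    return [lo + (hi - lo) * (x - x_min) / span for x in values]


def denormalize(scaled, x_min, x_max, lo=0.15, hi=0.85):
    """Invert normalize() for a single scaled value."""
    return x_min + (scaled - lo) * (x_max - x_min) / (hi - lo)


# Hypothetical solar radiation readings (W/m^2), for illustration only.
solar_radiation = [200.0, 450.0, 700.0]
scaled = normalize(solar_radiation)
```

The inverse mapping is needed to convert a normalized network output back to physical SSP units.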

Stepwise regression
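This method uses multiple linear regression analysis to determine the relationship between the independent variables and a dependent variable. In this study, SWR analysis was carried out to determine the relations between the dependent variable (SSP) and the independent variables ($T_o$, RH, WS, SR, $\mathrm{TDS}_F$, $M_F$, and $T_F$). The SWR equation in the general form can be written as:

$$Y = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_n X_n \qquad (6)$$

where $Y$ is the output value using the SWR method; $B_i$ ($i = 0, 1, 2, \dots, n$) are the regression coefficients; and $X_i$ are the independent variables. The SWR includes finding appropriate independent variables and values for the $B_i$ coefficients. To achieve this aim, SWR analysis was carried out using the Statistical Package for the Social Sciences (SPSS).

The statistical parameters used for model assessment are given in Equations (7)-(10):

$$\mathrm{CD} = \frac{\left[\sum_{i=1}^{n} \left(SSP_{o,i} - \overline{SSP}_o\right)\left(SSP_{p,i} - \overline{SSP}_p\right)\right]^2}{\sum_{i=1}^{n} \left(SSP_{o,i} - \overline{SSP}_o\right)^2 \sum_{i=1}^{n} \left(SSP_{p,i} - \overline{SSP}_p\right)^2} \qquad (7)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(SSP_{o,i} - SSP_{p,i}\right)^2} \qquad (8)$$

$$\mathrm{IA} = 1 - \frac{\sum_{i=1}^{n} \left(SSP_{o,i} - SSP_{p,i}\right)^2}{\sum_{i=1}^{n} \left(\left|SSP_{p,i} - \overline{SSP}_o\right| + \left|SSP_{o,i} - \overline{SSP}_o\right|\right)^2} \qquad (9)$$

$$\mathrm{MARE} = \frac{100}{n}\sum_{i=1}^{n} \left|\frac{SSP_{o,i} - SSP_{p,i}}{SSP_{o,i}}\right| \qquad (10)$$

where $SSP_{o,i}$ denotes the observed value; $SSP_{p,i}$ is the predicted value; $\overline{SSP}_o$ is the mean of the observed values; $\overline{SSP}_p$ is the mean of the predicted values; and $n$ is the total number of observations.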
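The four assessment statistics can be computed with a few lines of Python. The sketch below assumes the standard definitions (CD as the squared correlation between observed and predicted values, IA as Willmott's index of agreement, MARE in percent); the function name `evaluate_model` is hypothetical.

```python
import math


def evaluate_model(observed, predicted):
    """Return CD, RMSE, IA, and MARE (%) for paired observed/predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    pairs = list(zip(observed, predicted))

    cov = sum((o - mean_o) * (p - mean_p) for o, p in pairs)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    sq_err = sum((o - p) ** 2 for o, p in pairs)
    ia_den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in pairs)

    return {
        "CD": cov ** 2 / (var_o * var_p),          # squared correlation
        "RMSE": math.sqrt(sq_err / n),             # same units as SSP
        "IA": 1.0 - sq_err / ia_den,               # index of agreement
        "MARE": 100.0 / n * sum(abs((o - p) / o) for o, p in pairs),  # percent
    }
```

Note that MARE is undefined when an observed value is zero, so night-time or zero-production records would need to be excluded before calling this function.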

Data description and general findings
Table 1 shows some of the descriptive statistics for the data obtained from the experimental work and used in the ANN and SWR models.

where $F_1$, $F_2$, $F_3$, $F_4$, $F_5$, $F_6$, $F_7$, and $F_8$ are computed as follows:

All seven independent variables ($T_o$, RH, WS, SR, $T_F$, $M_F$, and $\mathrm{TDS}_F$) were used to create predictive models for SSP.
SWR produced five models with 1-5 predictor variables, where SR was involved in each set of predictor variables, as listed in Table 3.
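A sequence of nested regression models of this kind arises naturally from stepwise selection. The following Python sketch is a simplified, forward-only variant and not the SPSS procedure used in the study: it replaces the probability-of-F criteria with an assumed partial F-statistic threshold (`f_enter`, a hypothetical default of 4.0) and omits the removal step. The function names are illustrative.

```python
import numpy as np


def residual_ss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, f_enter=4.0):
    """Add predictors one at a time while the best partial F exceeds f_enter."""
    n, p = X.shape
    selected = []
    current_rss = residual_ss(X, y, selected)  # intercept-only model
    while len(selected) < p:
        df_resid = n - len(selected) - 2  # after adding one more predictor
        if df_resid <= 0:
            break
        best = None
        for c in range(p):
            if c in selected:
                continue
            new_rss = residual_ss(X, y, selected + [c])
            # Partial F-statistic for adding column c to the current model.
            f_stat = (float("inf") if new_rss <= 0.0
                      else (current_rss - new_rss) / (new_rss / df_resid))
            if best is None or f_stat > best[0]:
                best = (f_stat, c, new_rss)
        if best is None or best[0] < f_enter:
            break
        selected.append(best[1])
        current_rss = best[2]
    return selected  # column indices in order of entry
```

Each prefix of the returned list corresponds to one model in the nested sequence, so a run that selects five columns yields five candidate models with one to five predictors.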

Comparing ANN and SWR models
To assess the capability of the ANN and SWR models in predicting SSP, their findings in the different modeling stages were compared with one another. The performance of the ANN and SWR models was evaluated using the statistical parameters in Table 4. The observed and predicted SSP values from the ANN and SWR models during the training, testing, and validation processes are compared in Figure 4 in the form of a scatter (1:1) plot (left panel) and a relative error (RE) plot (right panel). The data were mostly evenly and tightly distributed around the 1:1 line, […]

Solar stills are widely used in solar desalination. Nonetheless, solar still production (SSP) is very low. Accordingly, increasing SSP has been the focus of intensive study. Much of the literature has focused on experiments to find a better design for solar stills to improve the SSP (e.g., Tanaka & Nakatake; Kabeel et al.; Ayoub et al.; Koilraj Gnanadason et al.). Usually, these experimental studies are costly, laborious, and time-consuming. Consequently, mathematical modeling (MM) may be the best alternative for finding better designs. MM is one of the most effective methods for providing a clear understanding of solar still behavior and enhancing SSP. Moreover, MM by artificial intelligence (AI) can give more accurate results and be much faster than classical MM. Artificial neural networks (ANNs), which are one of these AI technologies, are mathematical models that try to mimic the structures and functions of biological neural networks and are used for solving complex problems (Ali Abdoli et al.; Behboudian et al.; Nezhad et al.). ANNs have been used in various ways to model and predict the performance of desalination and solar desalination systems, ranging from predicting solar still performance (Santos et al.; Mashaly & Alazba, a, b, c; Mashaly et al.) to optimizing the performance of solar-powered membrane distillation units (Porrazzo et al.) and simulating the reverse osmosis desalination process (Khayet et al.). These networks have also been able to control multi-stage flash desalination plants (Tayyebi & Alishiri) as well as analyze seawater desalination systems (Gao et al.).
During this process of circulation, the humid air touched the cooled surfaces of the transparent polycarbonate cover and the bottom polycarbonate layer, causing condensation. The condensed water flowed down the panel and was collected in the form of a distilled stream. Seawater was used as feed water input to the system. The solar still system was run from 02/23/2013 to 04/23/2013. Raw seawater was obtained from the Gulf, Dammam, in eastern Saudi Arabia (26°26′24.19″N, 50°10′20.38″E). The initial total dissolved solids (TDS) concentration, pH, density (ρ), and electrical conductivity (EC) of the seawater were 41.4 ppt, 8.02, 1.04 g cm⁻³, and 66.34 mS cm⁻¹, respectively. The production, or the amount of distilled water produced (SSP) by the system during a time period, was obtained by collecting the cumulative amount of water produced over time. The temperature of the feed water ($T_F$) was measured using thermocouples (T-type, UK). Temperature data for feed brine water were recorded on a data logger (model 177-T4, Testo, Inc., UK) at 1 min intervals. The amount of feed water ($M_F$) was measured by a calibrated digital flow meter mounted on the feed water line (micro-flo, Blue-White, USA). The amounts of brine water and distilled water were measured by graduated cylinder. TDS concentration and EC were measured using a TDS-calibrated meter (Cole-Parmer Instrument, Vernon Hills, USA). A pH meter (model 3510 pH meter, Jenway, UK) was used to measure pH. A digital density meter (model DMA 35N, Anton Paar, USA) was used to measure ρ. The seawater was fed separately to the panel using the pump described above. The residence time (the time taken for the water to pass through the panel) was approximately 20 minutes. Therefore, the flow rates of the feed water, the distilled water, and the brine water were measured every 20 minutes. The TDS of the feed water ($\mathrm{TDS}_F$) was also measured every 20 minutes. The weather data, such as ambient temperature ($T_o$), relative humidity (RH), wind speed (WS), and solar radiation (SR), were obtained from the weather station mentioned above. Here, there is one dependent variable, SSP, and seven independent variables: $T_o$, RH, WS, SR, $\mathrm{TDS}_F$, $M_F$, and $T_F$.

Artificial neural network
A particular architecture of ANNs is the feed-forward back-propagation neural network (FFBNN), which is popular and commonly applied in many applications, including making predictions. The input parameters are presented to the network through the input layer, which acts as a buffer layer. The number of neurons in this layer is equal to the number of input (independent) parameters. These neurons are connected to the neurons of the adjacent hidden layer, and all interconnections have a weight ($W$) that represents the strength of the connection. In the hidden layer, a weighted summation of the inputs is passed through a function called the transfer function, which determines the relationship between the inputs and the output (Eren et al.; Valderrama et al.; Wadi Abbas Al-Fatlawi et al.). The output layer neuron ($Y_k$) is expressed by Equation (1).

Figure 1 | Cross-sectional view of the solar still.
ANN model development
Commercial Qnet 2000 software was employed in this study to develop the ANN model from the experimental data to predict SSP. The input parameters of the ANN model were $T_o$, RH, WS, SR, $\mathrm{TDS}_F$, $M_F$, and $T_F$. The output parameter was SSP. The data points obtained from the experimental work were randomly divided into training (70% of the data points), testing (20%), and validation (10%) sets, which have 112, 32, and 16 data points, respectively. The data used in the validation stage to check the performance of the ANN model were not used in the training stage. Different ANN architectures with one hidden layer were trained. The optimal number of neurons in the hidden layer was determined by a trial-and-error procedure, in which the number of neurons in the hidden layer was varied from 1 to 10 to find the best ANN architecture. The number of training iterations was fixed at 200,000. The learning rate and momentum factor were fixed at 0.01 and 0.8, respectively. Before the training process, all of the input and output parameters were automatically normalized between 0.15 and 0.85 by the software. The normalization process accelerates the ANN training and increases the network's generalization capabilities.

The SWR analysis used the IBM SPSS Statistics 22 program (SPSS Inc., Chicago, IL, USA). The basic procedures of SWR were: (1) identifying an initial model; (2) iteratively altering the model from the previous step by adding or removing an independent variable in accordance with the stepping criteria (probability-of-F-to-enter ≤ 0.05, probability-of-F-to-remove ≥ 0.1); and (3) ending the search when stepping is no longer possible given the stepping criteria, or when a specified maximum number of steps has been reached. In brief, the SWR technique chooses the most correlated independent variable first, and then chooses the second independent variable that most correlates with the residual variance in the dependent variable. This procedure continues until the choice of an additional independent variable does not increase the coefficient of determination (CD) by a significant amount. A detailed description of SWR can be found in numerous sources (Morrison; Brereton; Hocking).

Statistical parameters for models' assessment
The performances of both models (ANN and SWR) were evaluated by these statistical error criteria: CD, root mean square error (RMSE), index of agreement (IA), and mean absolute relative error (MARE). The CD indicates the proportion of the total variance in the observed data that can be explained by the model. RMSE was used to estimate the sensitivity and extremum effect of the predicted value. IA values closer to one indicate better agreement between the model and the observations. MARE expresses accuracy as a percentage. The closer MARE is to zero, the better the model accuracy. The mathematical expressions for calculation of these statistical parameters are stated in Equations (7)-(10).

[…] Figure 3. The developed ANN model's CD value was 0.990, RMSE value was 0.024 L/m²/h, IA value was 0.997, and MARE value was 5.074%. The relative contribution of each input to the output (SSP) for the developed ANN model was examined. The results revealed that $T_o$ and WS are not significant, contributing only 3.48% and 3.52%, respectively. Also, $\mathrm{TDS}_F$ and SR were found to be the dominant inputs, which is in accordance with the results of Tanaka & Nakatake and Dev et al. RH, $M_F$, and $T_F$ were found to have a moderate influence on the SSP prediction, with contributions of 16.41%, 17.93%, and 15.85%, respectively. The developed ANN model can be solved using a spreadsheet (e.g., Microsoft Excel) and expressed by an algebraic system of equations:
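A minimal Python sketch of such an algebraic system is given here, assuming the architecture reported in the study (seven inputs, one hidden layer of eight neurons, one output, hyperbolic tangent transfer function). The weights and biases are random placeholders, not the fitted values of the developed model, and the names `W_ji`, `B_j`, `W_kj`, `B_k`, and `predict_normalized_ssp` are illustrative.

```python
import math
import random

N_INPUTS = 7   # T_o, RH, WS, SR, TDS_F, M_F, T_F (normalized)
N_HIDDEN = 8   # hidden neurons F_1 ... F_8

# Placeholder weights and biases: random values, NOT the fitted model.
rng = random.Random(0)
W_ji = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
B_j = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
W_kj = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
B_k = rng.uniform(-1, 1)


def predict_normalized_ssp(x):
    """Return the network output for seven normalized inputs."""
    if len(x) != N_INPUTS:
        raise ValueError("expected seven inputs")
    # Hidden layer: F_j = tanh(sum_i W_ji * X_i + B_j)
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_ji, B_j)]
    # Output layer: Y = tanh(sum_j W_kj * F_j + B_k)
    return math.tanh(sum(w * f for w, f in zip(W_kj, hidden)) + B_k)


x_example = [0.5] * N_INPUTS  # hypothetical normalized inputs
y = predict_normalized_ssp(x_example)
```

Because every step is a weighted sum followed by a closed-form function, the same calculation can be laid out cell by cell in a spreadsheet once the trained weights are available.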

Figure 2 | Statistical performance of the developed ANN model with various transfer functions and hidden neurons during the training phase.

Figure 3 | Architecture of the optimal ANN model.

Figure 4 | Performance of the developed ANN and SWR models during the training, testing, and validation processes.

Table 2 | Pearson correlation coefficient values

In the SWR models listed in Table 3, the variables SR and $\mathrm{TDS}_F$ have the greatest effect on SSP modeling. The effect of $T_F$ was not considerable in the SWR models, whereas in the ANN model it was moderately influential. The effect of WS, $M_F$, and RH […] and RH as predictor variables.

Table 3 | Developed SWR models for estimating SSP
CD, coefficient of determination; SEE, standard error of the estimate.

[…] the developed ANN model. The average REs for the ANN and SWR models were −2.27% and −25.47%, respectively. It is obvious from Table 4 that stronger agreement between the observed and predicted values was obtained using the ANN model. This agreement is reflected in the CD, RMSE, IA, and MARE results. The CD and IA values for the ANN model were close to one, while the RMSE value was close to zero. The MARE value for the SWR model was almost 3.57 times that of the ANN model during the validation process. Additionally, the SWR model had CD and IA values that were approximately 3.19% and 11% lower, respectively, than those of the ANN model, and its RMSE value was almost 3.5 times that of the ANN model. As seen from the findings in Table 4 and Figure 4, the ANN model showed better prediction performance than the SWR model and presented sufficient accuracy in the prediction of SSP. These results are consistent with the studies of Jalali-Heravi & Garkani-Nejad and Rahimi-Ajdadi & Abbaspour-Gilandeh. On the basis of the average values in all the modeling stages, the ANN model (CD = 0.960, RMSE = 0.047 L/m²/h, IA = 0.988, and MARE = 7.606%) performed better than the SWR model (CD = 0.902, RMSE = 0.155 L/m²/h, IA = 0.885, and MARE = 29.371%). The average REs for the SWR model were −30.55%, −25.96%, and −25.47% when using the training, testing, and validation data sets, respectively. These errors indicate that the SWR model tended to underpredict SSP. This systematic underprediction may be due to inappropriate specification of the relationships between SSP and some variables used in the modeling process. However, the SWR model is easier to use and gives results that are deemed reasonable […]

Table 4 | Statistical performance of the developed ANN and SWR models during training, testing, and validation