Today water distribution utilities are trying to improve operational efficiency through increased demand intelligence from their largest customers. Moving to prognostic operations allows utilities to optimally schedule and scale resources to meet demand more reliably and economically. Commercial greenhouses are large water consumers. In order to produce effective forecasting models for greenhouse water demand, the factors that drive demand must be enumerated and prioritized. In this study greenhouse water demand was modeled using artificial neural networks trained with a dataset containing eight input factors for a commercial greenhouse growing bell peppers. The dataset contained water usage, climatic and temporal data for the years 2012–2014. This model was then evaluated using the Extended Fourier Amplitude Sensitivity Test, a global sensitivity analysis, in order to determine the importance, or sensitivity, of each input factor. It was found that time of day, solar radiation, and outdoor temperature (°C) had the largest effects on the model output. These outputs could be used to contribute to the generation of a simplified demand-forecasting model.

## NOTATION

The following symbols are used in this paper:

- *ANN*: Artificial neural network
- Fourier coefficient
- Fourier coefficient
- Total output variance of the model
- Variance caused by all factors except the *i*th input factor
- Variance caused by the *i*th input factor
- *F*_{i}: Forecast or modeled value of the *i*th parameter
- *GSA*: Global sensitivity analysis
- *L*: Litre
- *LSA*: Local sensitivity analysis
- *M*: Interference factor, usually taken as 4
- *N*: Number of data points
- *O*_{i}: Observed value of the *i*th parameter
- *SCADA*: Supervisory control and data acquisition
- *S*_{i}: First order sensitivity indices
- *S*_{Ti}: Total order sensitivity indices
- *X*: Input factor
- *Y*: Model output
- ha: Hectare (10,000 m^{2})
- *r*: Pearson product-moment correlation coefficient
- *s*: Activation variable for eFAST transformation, taken uniformly between ±π
- *x*_{i}: Transformed value of input *i*
- *φ*_{i}: Random phase-shift used in eFAST input transformation
- *ω*_{i}: Incommensurate frequency assigned to input *i*

## INTRODUCTION

The primary mandate of water utilities is to provide a safe, uninterrupted supply of potable water, which can often make network optimization a lower priority. However, water utilities can account for up to 40% of a municipality's energy bill, of which approximately 80% is used for distribution (pumping) (United States Environmental Protection Agency 2013). Inefficient pumping schedules can therefore be costly. The development of demand forecasting models plays a key role in operational practice and can be vital in developing pumping schedules that allow the network to operate at equilibrium (Fodya & Harley 2014). Pumping schedules are not the only area that demand forecasting can affect. Network upgrades can be very expensive, so a utility should be certain that they are necessary. Network upgrades are often proposed based on network hydraulic models (Jain & Ormsbee 2002), which can identify areas of interest such as low-pressure zones. This approach has a vulnerability: the hydraulic model is based on estimates of demand, which are typically derived from low-resolution billing figures. This inaccuracy can then be propagated forward, leading to suboptimal network upgrade forecasts.

Numerous studies have been undertaken to provide more accurate methods for water demand forecasting. Donkor *et al.* (2014) provided a comprehensive review of many of these studies, which took place between 2000 and 2010. The review outlined three main concerns that were not addressed in the works studied:

- **Model practicality**: Is the model easily implemented by the utility?
- **Input selection**: Are the proposed model inputs/indicators easily/inexpensively monitored?
- **Input importance**: Are all proposed inputs/indicators necessary and have they been prioritized?

The necessity of model practicality cannot be overemphasized. Most utilities are streamlined facilities that are often challenged by regular day-to-day operation and do not have the resources to manage complex operational models. The selection of easily monitored input factors is also critical. Some inputs may be deemed relevant, but if they cannot be measured well, or easily, the model's practicality is limited. This is particularly true of models with high temporal resolution. Examples of such factors include gross national product and inflation rate (Firat *et al.* 2009), appliance ownership and efficiency (Williamson *et al.* 2002), and household size and income (Polebitski & Palmer 2010).

The significance of input factor priority, as stated by Donkor *et al.* (2014), is to ensure inputs are chosen that have notable influence on the model output and do not create spurious relationships. There is, however, another purpose for the selection or screening of input variables, and that is computational cost. The inclusion of non-essential input variables in a model can be computationally expensive as the model attempts to determine relationships that may be very weak or nonexistent between extraneous variables. Such an expense can translate into an increase in the computational time and may also compromise accuracy through overgeneralization of the model.

This study addresses the concerns raised by Donkor *et al.* (2014) while trying to characterize the major influencing factors for large water consumers in greenhouse agriculture. Figure 1 shows the capacity breakdown by consumer type for a large water utility in Essex County, Ontario. Inspection of the figure reveals that over three-quarters of all water goes to commercial greenhouses. This is unsurprising, as this region of southwestern Ontario has the densest concentration of commercial greenhouses in North America.

The planned approach is to model greenhouse water usage behaviour using an artificial neural network (ANN) and to screen input factors using global sensitivity analysis (GSA). GSA allows the underlying relationships between the inputs and outputs of the model to be quantified, making it possible to remove input factors that have little to no effect on the output. This technique provides a basis for model simplification and allows the model developer to focus on inputs that are easily monitored by the water utility without compromising model accuracy. For this water utility, an easily executable and more accurate demand forecasting model for greenhouse water usage will greatly improve day-to-day operations, enable optimization of pumping schedules, and promote improved infrastructure planning. This procedure is not unique to greenhouse water demand modeling; it has great potential in water distribution networks that are dominated by a single sector, e.g. lumber/wood products, petroleum, and oil refinement.

### Data screening

Data screening is a term used to describe the process of filtering out inputs that have little to no effect on the output of a model (Morris 1991; Saltelli *et al.* 2004). Fu *et al.* (2012) explored the use of GSA, more specifically Sobol's method, on two water distribution networks to ensure that only variables with significant effects on the output were included in the model. Sobol's method is a variance-based method for quantifying the effect each input has on the output while also taking into account the interactions between input variables. It was found that removal of the insensitive factors allowed the model to become more computationally efficient without compromising accuracy. This study will use the Extended Fourier Amplitude Sensitivity Test (eFAST) as the method of GSA. The choice of eFAST was based on the recommendations of Saltelli *et al.* (1999), who reported lower computational cost and similar accuracy when compared to the method of Sobol. eFAST has also been used in studies pertaining to wind turbine power output (McKay *et al.* 2014), crop growth models (Wang *et al.* 2013; Vanuytrecht *et al.* 2014) and water treatment models (Cosenza *et al.* 2013). Further details on GSA will be provided later in this paper.

### Study area

### Greenhouse water usage studies

There have been several studies conducted on agricultural water needs; however, research on commercial greenhouse water usage is extremely limited. This section discusses studies of crop water usage, some of which do not involve greenhouses. Orgaz *et al.* (2005) examined plant water demand in unheated plastic greenhouses in order to determine crop coefficients for calculating evapotranspiration rates, where evapotranspiration is the sum of soil water evaporation and plant transpiration. The procedure was carried out for four prevalent local crops (melon, watermelon, sweet pepper, and green bean) with a soil growth medium. It was shown that there were considerable differences in water requirements dependent on crop growth stage, season, and growing practices. In a United Nations report (Doorenbos & Pruitt 1977) it was found that solar radiation had the largest effect on evapotranspiration rates of various crops. Ma *et al.* (2013) also examined the environmental factors influencing water evaporation, soil water evaporation, and plant transpiration. Using regression analysis, equations were developed to describe the behaviour of these evaporative processes based on three climatic factors: indoor temperature, indoor humidity, and solar radiation. A genetic algorithm was then used to optimize these equations, determining the minimum values of evaporation and transpiration and the values of the corresponding climatic factors at which they occur. These idealized climatic factors could then be implemented in the artificial environment of the greenhouse in order to reduce plant-watering needs. The main focus of these studies, along with several others (Thompson *et al.* 2007; Capraro *et al.* 2008; Bernier *et al.* 2010), was to improve watering schemes inside the greenhouse operation. These studies are beneficial to greenhouse operations in terms of water conservation, but do not address the needs of the water utility, which requires a more general view of greenhouse water usage in order to forecast water needs.

### Greenhouse operations

In this region, greenhouse operations use large storage tanks to supply the crop with water. These tanks are typically filled in the evening when demands on the water network are low. This raises an issue when modeling greenhouse water demand, since the water usage inside the greenhouse will not be reflected in consumption from the water utility until the storage tanks are refilled. Another facet of greenhouse watering is recycling, which is the process of collecting and treating runoff or excess water and mixing it with fresh water so that it can be used again in the crop watering process. This process allows for the efficient use of fresh water and reduces the associated costs. The limit to the reuse of this excess water in most cases is salinity, more specifically the level of sodium chloride (NaCl) (Trajkova *et al.* 2006). Salinity tolerances vary from crop to crop, and to ensure these values are not exceeded, salinity levels are measured in the greenhouse using the electrical conductivity of the water. Greenhouse operations can also utilize alternative water sources such as wells and ponds. Usage of these alternative sources can dramatically affect the amount of water needed from the utility, but can also pose issues for crop health. Use of untreated water such as recycled, rain, and pond water can potentially destroy crops, as there is no way of knowing what contaminants it contains, which could cost the grower millions of dollars. Because of this risk, most operations in this region use small-scale water treatment facilities. These treatment facilities are very limited in capacity, as the costs associated with larger-scale options outweigh the costs of potable municipal water; this, combined with evolving regulations on alternative water sourcing, solidifies the need for municipal water sources in greenhouse operations. This study examines the use of water inside the greenhouse, meaning the water that has been sent to the plants. The methods and technologies mentioned above are not used in every greenhouse operation worldwide, so examination of plant watering trends will be of use to any water utility dealing with greenhouse demand.

## THE DATA

The analysis utilizes data reported every 15 minutes for each of the factors in Table 1. The data are for 1.42 ha of greenhouse growing bell peppers and cover the years 2012, 2013 and 2014. The data were collected from the supervisory control and data acquisition (SCADA) system of the greenhouse operator and contain 100,609 data points for each factor. The greenhouse logs numerous factors such as water electrical conductivity, fertilizer levels, and growth medium weight, along with climatic and temporal data. It should be noted that the data were collected from a heated greenhouse, meaning the greenhouse is heated to a minimum of 20 °C during the winter months, with a few exceptions occurring during the cleanout process at the beginning and end of the year. Many of the logged factors require instrumentation inside the greenhouse operation, which would be difficult if not impossible for the water utility to implement on a large scale. The dataset used here nevertheless contains two factors that require internal (inside the greenhouse) monitoring: greenhouse humidity and greenhouse temperature. The purpose of using these internal factors is to compare their importance to that of the external (measured outside the greenhouse) factors and to determine whether the water demand can be reliably forecast using only external factors. The rationale for the use of external greenhouse factors is that a water utility can easily monitor them by installing a small-scale weather station at its distribution centre. One issue that may arise is the double layer of polyethylene used in greenhouse construction. This material diffuses the solar radiation, leading to different values being recorded by indoor and outdoor sensors. For this analysis, the numerical value of the solar radiation is of little importance, as this study examines the effect that input factor variation has on the output of the model (water usage).

Input factor | Range (min–max) | Unit |
---|---|---|
Greenhouse Temperature | 2.11–37.11 | Celsius (°C) |
Outdoor Temperature | −23.30 to 34.51 | Celsius (°C) |
Cumulative Solar Radiation | 0–3,096 | Joule per square centimetre (J/cm^{2}) |
Solar Radiation | 0–1,045.65 | Watt per square metre (W/m^{2}) |
Wind Speed | 0–13.69 | Metre per second (m/s) |
Greenhouse Relative Humidity | 31–100 | Percent (%) |
Time | 0–23.75 | Decimal hour |
Month | 1–12.97 | Decimal month (Jan 01 = 1, Dec 31 = 12.97) |
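The two temporal inputs in Table 1 are decimal encodings of a timestamp. The text does not give the encoding formula, so the following is a minimal sketch under an assumption: the decimal month is taken as the month number plus the elapsed fraction of that month, which reproduces the endpoints listed in the table (Jan 01 = 1, Dec 31 = 12.97, and 23:45 = 23.75). The function names are hypothetical.

```python
import calendar
from datetime import datetime


def decimal_hour(ts: datetime) -> float:
    """Hour of day as a decimal value, e.g. 23:45 -> 23.75."""
    return ts.hour + ts.minute / 60.0


def decimal_month(ts: datetime) -> float:
    """Month number plus elapsed fraction of that month (assumed encoding)."""
    days_in_month = calendar.monthrange(ts.year, ts.month)[1]
    return ts.month + (ts.day - 1) / days_in_month


# Last 15-minute record of a day and the last day of a year
last_slot = decimal_hour(datetime(2013, 6, 1, 23, 45))
last_day = round(decimal_month(datetime(2013, 12, 31)), 2)
first_day = decimal_month(datetime(2012, 1, 1))
print(last_slot, last_day, first_day)
```

Other encodings (for example day of year divided by days per year) would give slightly different values in the middle of the range, so this should be read as one plausible reading of the table rather than the operator's actual logging format.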

The entire data collection system is connected to the greenhouse operator's commercial SCADA system and all data are collected in one software package. Greenhouse operations are divided into zones and each zone has its own water feed. Each feed is metered for flow, temperature and the other characteristics mentioned previously, which are fed into the control software for analysis. The water usage data used in this analysis are measured with flow sensors as cumulative water usage in litres every 15 minutes; the cumulative count resets every day at 0700 hours.
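Because the meter reports a cumulative count that resets daily, per-interval usage has to be recovered by differencing. The sketch below illustrates one way this could be done; it is not the authors' preprocessing. It assumes that any drop in the cumulative reading marks a reset, and that the first reading after a reset is itself the usage since the reset. The readings are hypothetical.

```python
import numpy as np


def interval_usage(cumulative):
    """Convert a cumulative meter series into per-interval usage.

    Assumption: a decrease in the cumulative reading indicates a counter
    reset, so the new reading is taken as the usage for that interval.
    """
    cumulative = np.asarray(cumulative, dtype=float)
    delta = np.diff(cumulative)
    reset = delta < 0
    delta[reset] = cumulative[1:][reset]
    return delta


# Hypothetical 15-minute cumulative readings in litres; the counter
# resets between the third and fourth reading.
readings = [120.0, 180.0, 260.0, 15.0, 40.0]
usage = interval_usage(readings)
print(usage)
```

A reset that coincides with an interval of exactly zero flow would not be detected by this rule; using the known reset time (0700 hours) instead of the sign of the difference would avoid that.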

### Preliminary data analysis

Table 2 summarizes the correlation coefficients for each pair of factors. A value of 1 exists along the diagonal, which shows that the correlation between a factor and itself is perfectly linear. Table 2 will also provide a basis for validation of the sensitivities that will be produced using the GSA. The results in Table 2 show that the input factor with the strongest linear relationship with water usage (output) is solar radiation (*r* = 0.753). Weaker correlation exists between water usage and greenhouse temperature (*r* = 0.453) and humidity (*r* = −0.404). Table 2 also exposes relationships between input factors. Greenhouse temperature and solar radiation have the strongest linear relationship among the inputs (*r* = 0.617). Greenhouse temperature and humidity (*r* = −0.547) and solar radiation and humidity (*r* = −0.534) show moderate negative correlation. The existence of these moderate-to-strong correlation coefficients between input factors shows that there is multicollinearity within the inputs. Multicollinearity can cause issues when modeling, particularly in linear regression models (De Veaux & Ungar 1994), as it can increase the variance of coefficient estimates and make these estimates sensitive to minor changes. How these effects are to be dealt with will be addressed in a later section.
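As an illustration of how a matrix like Table 2 is produced, the following sketch computes Pearson product-moment correlation coefficients for a small synthetic dataset. The data, the variable names and the coefficients used to generate them are invented for the example and are not the greenhouse measurements.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


# Synthetic stand-ins for three factors (not the study data)
rng = np.random.default_rng(0)
solar = rng.uniform(0, 1000, 500)
humidity = 90 - 0.04 * solar + rng.normal(0, 8, 500)
water = 5 * solar + rng.normal(0, 1500, 500)

# Full correlation matrix, one column per factor
data = np.column_stack([humidity, solar, water])
corr = np.corrcoef(data, rowvar=False)
print(np.round(corr, 3))
```

The matrix is symmetric with ones on the diagonal, which is why Table 2 reports only the lower triangle.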

Factor | Month | Time | Humidity | Solar radiation | Cumulative solar radiation | Wind speed | Greenhouse temperature | Outdoor temperature | Water usage |
---|---|---|---|---|---|---|---|---|---|
Month | 1.000 | | | | | | | | |
Time | −0.002 | 1.000 | | | | | | | |
Humidity | 0.252 | −0.169 | 1.000 | | | | | | |
Solar Radiation | −0.028 | 0.150 | −0.534 | 1.000 | | | | | |
Cumulative Solar Radiation | −0.050 | 0.092 | −0.286 | −0.055 | 1.000 | | | | |
Wind Speed | −0.111 | 0.104 | −0.061 | 0.178 | −0.246 | 1.000 | | | |
Greenhouse Temperature | −0.079 | 0.138 | −0.547 | 0.617 | 0.161 | 0.040 | 1.000 | | |
Outdoor Temperature | 0.339 | 0.085 | −0.238 | 0.312 | 0.370 | −0.182 | 0.521 | 1.000 | |
Water Usage | 0.019 | 0.062 | −0.404 | 0.753 | −0.078 | 0.097 | 0.453 | 0.328 | 1.000 |

## THE MODEL

*et al.*1994).

## GLOBAL SENSITIVITY ANALYSIS

The purpose of performing a sensitivity analysis (SA) of a model output is to determine which inputs have the greatest effect on the output. Two classes of SA are distinguished: local (LSA) and global (GSA). Saltelli *et al.* (1999) have provided some insight into their differences. LSA involves varying input factors one at a time while holding the other factors fixed and examining the effects on the output. LSA is undertaken at a central point in the input space, which limits the ability to observe the effects of interactions between factors since the region of the input space explored is nil. GSA explores a finite region of the input space along a search path and addresses input interaction by examining the variance of the output averaged across all inputs. There are several methods for executing a GSA; herein the variance-based eFAST method is used. The choice of eFAST was based on information found in previous sections and also on the results of studies that analyzed the convergence of various screening techniques (Vanrolleghem *et al.* 2015), in which eFAST showed superior performance in terms of computational cost and reliability versus Morris Screening (Morris 1991) and standardized regression coefficients (Saltelli *et al.* 2008a).
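The limitation of LSA with respect to interactions can be seen in a toy example. The sketch below uses a hypothetical two-input model that consists only of an interaction term; it is not the greenhouse model. A one-at-a-time perturbation at the centre of the input space reports no effect for either input, while the output variance over the whole input space is clearly non-zero.

```python
import numpy as np


def model(x1, x2):
    """Hypothetical model made of a pure interaction term."""
    return x1 * x2


# Local, one-at-a-time analysis at the centre point (0, 0)
h = 0.1
base = model(0.0, 0.0)
local_x1 = (model(h, 0.0) - base) / h
local_x2 = (model(0.0, h) - base) / h

# Global view: output variance over the input space [-1, 1] x [-1, 1]
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(100_000, 2))
global_var = model(x[:, 0], x[:, 1]).var()

print(local_x1, local_x2, global_var)
```

For this model the exact output variance is 1/9, all of which comes from the interaction and none of which is visible to the local analysis.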

### Extended Fourier Amplitude Sensitivity Test

The eFAST method, developed by Saltelli *et al.* (1999), is an extension of the Fourier Amplitude Sensitivity Test (FAST), which was introduced by Cukier *et al.* (1973). FAST and eFAST are quantitative, variance-based methods for carrying out a GSA, meaning both methods quantify the effect each input factor has on the variance of the output of the model. Equation (1) illustrates the quantification of sensitivity, where *Y* is the output of the model, *X* is an input factor, E(*Y*|*X*) is the expected value of *Y* based on a fixed value of *X*, and var_{X} is the variance taken over all values of *X*. The advantage of using eFAST over FAST is that FAST calculates only first order effects, which do not account for interaction between input factors; eFAST allows for the quantification of first order and total indices, which allows for the calculation of interaction effects.
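Equation (1) is not reproduced in this text. For orientation, the first order sensitivity described by the definitions above is commonly written in the variance-based literature as the variance of the conditional expectation normalized by the total output variance. The form below is that standard expression, offered as an assumption about what Equation (1) contains; the authors' notation may differ, and the symbol *S* is used here only for illustration.

```latex
S = \frac{\operatorname{var}_{X}\left[\, E(Y \mid X) \,\right]}{\operatorname{var}(Y)}
```

A value of *S* close to 1 means that fixing *X* removes almost all of the output variance, while a value close to 0 means that *X* alone explains very little of it.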

The method relies on a set of frequencies (*ω*), one of which must be assigned to each input factor; an algorithm for choosing frequencies is proposed by Saltelli *et al.* (2008b). The main criterion for choosing frequencies is that they must be incommensurate, meaning that no frequency can be expressed as a linear combination of the others with integer coefficients.

The transformation used in eFAST is given in Equation (2), where *x*_{i} is the transformed value of the *i*th input factor, *ω*_{i} is the frequency chosen for input *i*, *s* is a set of evenly spaced values chosen between −π and π used for activation, and *φ*_{i} is a random phase-shift, chosen uniformly between 0 and 2π, used to ensure the sampling curve does not pass through the same points twice. Equation (2) is used for a normalized dataset for which the values of the inputs fall between 0 and 1. Modification of this equation to encompass any input values will be addressed later in this paper. A major issue when dealing with the frequency domain is the Nyquist frequency and aliasing. These issues are dealt with in eFAST by defining a sample size that is sufficiently large.
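Equation (2) is not reproduced in this text. The sketch below uses the search-curve transformation usually attributed to Saltelli *et al.* (1999), x_i = 1/2 + (1/π) arcsin(sin(ω_i s + φ_i)), which matches the symbols defined above and maps onto the unit interval; treating it as the content of Equation (2) is an assumption. The frequency, the sample size and the linear rescaling to a physical range are illustrative choices and are not SimLab's procedure.

```python
import numpy as np


def efast_transform(omega, s, phi):
    """Map the activation variable s onto [0, 1] along the search curve.

    Assumed form: x = 1/2 + arcsin(sin(omega * s + phi)) / pi
    """
    return 0.5 + np.arcsin(np.sin(omega * s + phi)) / np.pi


# Evenly spaced activation values between -pi and pi
n_samples = 1025                      # illustrative sample size
s = np.linspace(-np.pi, np.pi, n_samples, endpoint=False)

rng = np.random.default_rng(0)
phi = rng.uniform(0.0, 2.0 * np.pi)   # random phase-shift
x = efast_transform(omega=11, s=s, phi=phi)

# Simple min-max rescaling to the solar radiation range in Table 1 (W/m^2)
solar = 0.0 + x * (1045.65 - 0.0)
print(x.min(), x.max(), solar.max())
```

Because the arcsine of a sine is a triangle wave, the transformed values are spread uniformly over the unit interval, which is what makes the sample representative of a uniformly distributed input.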

#### eFAST first order indices

The first order sensitivity index is denoted *S*_{i} (Equation (3)) and is calculated the same way as in FAST, by assigning a unique incommensurate frequency (*ω*_{i}) to each input factor, then evaluating the ratio of the variance associated with each frequency (Equation (4)) to the total variance of the output (Equation (5)). This is made possible by using Parseval's Theorem. The summation of first order indices (*S*_{i}) for a linear model should be equal to 1, showing that all of the variance of the model output is accounted for without including the effects of interaction. If this is the case, the first order indices are sufficient for calculating the importance, or sensitivity, of each factor.
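The procedure described above can be sketched numerically. Equations (3) to (5) are not reproduced in this text, so the code follows the usual FAST formulation as an assumption: the variance attributed to input *i* is twice the sum of the squared Fourier amplitudes at the first *M* harmonics of its frequency, and the total variance is twice the sum over all non-zero frequencies. The test model, the frequencies 11 and 35 and the sample size are hypothetical choices for the illustration.

```python
import numpy as np


def first_order_fast(model, omegas, n_samples=2049, M=4, seed=0):
    """Estimate first order indices from the output spectrum (sketch)."""
    rng = np.random.default_rng(seed)
    omegas = np.asarray(omegas)
    s = np.linspace(-np.pi, np.pi, n_samples, endpoint=False)
    phi = rng.uniform(0.0, 2.0 * np.pi, size=len(omegas))
    # One column per input factor, values in [0, 1]
    x = 0.5 + np.arcsin(np.sin(np.outer(s, omegas) + phi)) / np.pi
    y = model(x)
    # Squared Fourier amplitude at each integer frequency
    spectrum = np.abs(np.fft.rfft(y) / n_samples) ** 2
    total_var = 2.0 * spectrum[1:].sum()
    harmonics = np.arange(1, M + 1)
    return np.array(
        [2.0 * spectrum[harmonics * w].sum() / total_var for w in omegas]
    )


def linear_model(x):
    """Hypothetical additive model: Y = 2*x1 + x2."""
    return 2.0 * x[:, 0] + x[:, 1]


S = first_order_fast(linear_model, omegas=[11, 35])
print(np.round(S, 3), round(float(S.sum()), 3))
```

For this additive model the exact first order indices are 0.8 and 0.2, so the estimates should sum to nearly 1. The small shortfall comes from truncating each input's spectrum at *M* harmonics.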

#### eFAST total indices

To account for interaction between input factors, a calculation of the total order indices (*S*_{Ti}) must be performed. Saltelli *et al.* (1999) proposed a method for evaluating these total order effects by calculating the variance caused by all factors excluding the input factor being examined (that is, all but the *i*th factor), which is calculated using Equation (6). This procedure is performed by assigning one frequency to the input factor being examined and assigning another, much lower frequency to all of the other inputs; an algorithm for assigning these frequencies is proposed by Saltelli *et al.* (2008a). This allows for the calculation of the total effect, or total index, of each input factor using Equation (7). The procedure for calculating the eFAST sensitivities was carried out using SimLab sensitivity analysis software created by the European Commission Joint Research Centre. SimLab implements the procedure outlined in the previous sections and generates an eFAST sample that is unique to each input factor by modifying Equation (2) using the mean and standard deviation of each factor to ensure that the full range of possible values is sampled. SimLab also implements the algorithm for selecting frequencies for each input factor on the basis of a selected sample size, which was chosen to be 1,480 based on the recommendations of Saltelli *et al.* (1999).
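The total order procedure can be sketched in the same spirit. Equations (6) and (7) are not reproduced in this text, so the code below follows the common description of eFAST as an assumption: the factor under examination receives a high frequency, all other factors receive low frequencies no greater than ω_i/(2M), the variance of the complementary set is read from the spectrum below ω_i/2, and the total index is one minus its share of the total variance. The model, frequencies and sample size are hypothetical, and this is not SimLab's implementation.

```python
import numpy as np


def efast_indices(model, n_inputs, omega_i=64, M=4, n_samples=2049, seed=0):
    """First and total order indices, one search curve per factor (sketch)."""
    rng = np.random.default_rng(seed)
    s = np.linspace(-np.pi, np.pi, n_samples, endpoint=False)
    max_low = omega_i // (2 * M)               # cap for the other factors
    harmonics = np.arange(1, M + 1) * omega_i
    first = np.zeros(n_inputs)
    total = np.zeros(n_inputs)
    for i in range(n_inputs):
        omegas = np.empty(n_inputs, dtype=int)
        omegas[i] = omega_i                    # factor being examined
        others = [j for j in range(n_inputs) if j != i]
        omegas[others] = 1 + np.arange(n_inputs - 1) % max_low
        phi = rng.uniform(0.0, 2.0 * np.pi, size=n_inputs)
        x = 0.5 + np.arcsin(np.sin(np.outer(s, omegas) + phi)) / np.pi
        y = model(x)
        spectrum = np.abs(np.fft.rfft(y) / n_samples) ** 2
        D = 2.0 * spectrum[1:].sum()           # total output variance
        D_i = 2.0 * spectrum[harmonics].sum()  # variance tied to factor i
        D_not_i = 2.0 * spectrum[1 : omega_i // 2 + 1].sum()
        first[i] = D_i / D
        total[i] = 1.0 - D_not_i / D
    return first, total


def product_model(x):
    """Hypothetical model with an interaction: Y = x1 * x2."""
    return x[:, 0] * x[:, 1]


first, total = efast_indices(product_model, n_inputs=2)
print(np.round(first, 3), np.round(total, 3), round(float(total.sum()), 3))
```

For this product model the exact values are 3/7 for each first order index and 4/7 for each total index, so the total indices sum to more than 1 because the interaction is counted once for each factor it involves.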

## RESULTS AND DISCUSSION

### MATLAB neural network model

The coefficient of determination (*r*^{2}) for the model error is the square of the Pearson product-moment correlation coefficient. For this model *r*^{2} is equal to 0.712. The average absolute relative error, given by Equation (8), is a statistic used in evaluating the performance of neural networks (Adamowski & Karapataki 2010), where *O*_{i} is the observed or target output, *F*_{i} is the forecast or modeled output and *N* is the number of data points. This statistic cannot be used in this case because of division by zero: the dataset contains a large portion of zero water usage data points, which cause the terms of Equation (8) to become undefined or infinite.
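The division-by-zero problem can be demonstrated with a few values. Equation (8) is not reproduced in this text; the sketch assumes the common form of the average absolute relative error, the mean of |(O_i − F_i)/O_i| over the *N* data points, which may differ from the authors' expression by a constant factor such as 100. The observed and forecast values are hypothetical.

```python
import numpy as np


def aare(observed, forecast):
    """Average absolute relative error (assumed form of Equation (8))."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Suppress NumPy warnings so the infinite result can be inspected
    with np.errstate(divide="ignore", invalid="ignore"):
        return float(np.mean(np.abs((observed - forecast) / observed)))


def r_squared(observed, forecast):
    """Square of the Pearson product-moment correlation coefficient."""
    r = np.corrcoef(observed, forecast)[0, 1]
    return float(r ** 2)


# Hypothetical 15-minute usage values in litres, including zero-usage points
obs = np.array([0.0, 120.0, 300.0, 0.0, 80.0])
fc = np.array([10.0, 100.0, 330.0, 5.0, 90.0])

print(aare(obs, fc))                     # not finite because of the zeros
mask = obs > 0
print(aare(obs[mask], fc[mask]))         # finite on the non-zero subset
print(r_squared(obs, fc))
```

Restricting the statistic to non-zero observations gives a finite value but discards exactly the periods of zero usage that make up a large part of the dataset, which is consistent with the authors' decision not to report it.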

^{4}) data points, which are represented by a solid black line in Figure 6. It can now be seen that each cycle represents 1 year and that the peak water usage occurs around the mid-point of the cycle, which corresponds to the mid-point of the year and matches the seasonal pattern shown in Figure 3. Since this pattern is repeated for each of the 3 years, it can be said that there is a relationship between time of year and greenhouse water usage. It should be noted that Figure 6 shows negative values across the output space, with the largest occurrences appearing at the beginning of 2014. These negative values can be caused by overfitting, in which the model draws spurious relationships that do not exist and that create noise in the output signal. Overfitting can be caused by the inclusion of variables that do not in reality have an effect on the output of the model, such as wind speed. The large number of zero water usage data points might also be the cause of the negative consumption values. To rectify this, a floor of 0 could be set in the model to prevent any negative values from occurring. Also, Figure 6 shows peak water usage as ≈4,800 L, whereas in Figure 3 the maximum is ≈7,500 L. This shows that there are errors in the model in terms of magnitude, but the seasonal patterns have been captured. Overall, this model contained at least one variable (wind speed) that is known to have little to no effect on water usage. This inclusion will likely have a negative effect on model performance and can be used to test the results of the GSA, as wind speed should not appear as a highly influential input.
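The floor of 0 suggested above amounts to a single post-processing step on the model output. A minimal sketch, using hypothetical output values rather than the study's results:

```python
import numpy as np

# Hypothetical raw model outputs in litres, including negative values
raw = np.array([-35.0, 0.0, 410.5, -2.1, 4800.0])

# Apply a floor of 0 so that no negative consumption is reported
floored = np.maximum(raw, 0.0)
print(floored)
```

The floor removes physically impossible values but does not address their cause, so it complements rather than replaces the removal of uninformative inputs.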

### Global sensitivity analysis

The results of the eFAST GSA are shown in Table 3. The summation of the first order indices is 0.7176, which means that approximately 28% of the variance of the model output is not accounted for in the first order indices. Since the first order indices (*S*_{i}) do not sum to 1, the total order indices (*S*_{Ti}) need to be calculated in order to determine the sensitivities caused by the interaction between inputs. The summation of *S*_{Ti} is greater than 1 (1.6169), which also shows that there are interaction effects between input variables.

Input factor | First order index (*S*_{i}) | Rank | Total index (*S*_{Ti}) | Rank |
---|---|---|---|---|
Time | 0.4071 | 1 | 0.6385 | 1 |
Solar Radiation | 0.2051 | 2 | 0.3397 | 2 |
Outdoor Temperature | 0.0514 | 3 | 0.1818 | 3 |
Cumulative Solar Radiation | 0.0173 | 4 | 0.1221 | 4 |
Greenhouse Relative Humidity | 0.0141 | 5 | 0.0818 | 6 |
Month | 0.0104 | 6 | 0.0981 | 5 |
Wind | 0.0086 | 7 | 0.0767 | 8 |
Greenhouse Temperature | 0.0036 | 8 | 0.0782 | 7 |
Σ | 0.7176 | | 1.6169 | |

Table 3 contains first order (*S*_{i}) and total order (*S*_{Ti}) sensitivity indices for each factor, ranked in order of first order sensitivity. The factors that have the most influence on the variance of the output are time, solar radiation, and outdoor temperature, accounting for 92.4% of the first order indices and 71.7% of the total order indices. As expected, wind ranks low on the first order (0.0086) and total order indices (0.0767), but it seems unusual that greenhouse temperature has the lowest first order sensitivity (0.0036), when intuitively it should have a much higher effect. One possible explanation for this low ranking is overgeneralization during the training process, as it ranks 8th in first order and 7th in total order, where it has similar sensitivity to that of wind. Another possibility is that the greenhouse temperature has no effect on the watering schemes used by the greenhouse, and that the top ranked factors like time and solar radiation are the main drivers. Multicollinearity might be thought to be an issue, as greenhouse temperature and solar radiation have a high correlation coefficient (0.617), except that relative humidity also has a high correlation coefficient with solar radiation (−0.534) and it ranked higher in the sensitivity index. This, coupled with the findings of Doorenbos & Pruitt (1977), which showed solar radiation to be a major factor in plant water need, strengthens the reliability of the results of the GSA. The input factor month ranks 6th in the first order and 5th in the total order indices. This is counterintuitive when Figure 6 shows such a clear yearly trend. This issue might be resolved by using weeks or days in the ANN model in place of decimal months. The reasoning is that the narrow range of months (1–12.97) may cause generalization issues when training the ANN, as small changes in the month may be difficult to correlate with water usage, whereas using a larger range may prevent this and generate a higher sensitivity for the seasonal input factor. Another explanation for the low ranking of the month is that all of the seasonal effects are captured within the highly seasonal solar radiation and that the month is not specifically used in the watering schemes. When looking at the total effect, the top four factors are in the same order as in the first order index, and the final four factors switch ranking (5 with 6, and 7 with 8). These bottom four factors account for only 20.7% of the total indices and 5.1% of the first order indices.
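The shares quoted in this discussion can be recomputed from the values in Table 3. The short sketch below does that arithmetic; the dictionary and function names are hypothetical, and the index values are copied from the table.

```python
# Sensitivity indices copied from Table 3
first_order = {
    "Time": 0.4071,
    "Solar Radiation": 0.2051,
    "Outdoor Temperature": 0.0514,
    "Cumulative Solar Radiation": 0.0173,
    "Greenhouse Relative Humidity": 0.0141,
    "Month": 0.0104,
    "Wind": 0.0086,
    "Greenhouse Temperature": 0.0036,
}
total_order = {
    "Time": 0.6385,
    "Solar Radiation": 0.3397,
    "Outdoor Temperature": 0.1818,
    "Cumulative Solar Radiation": 0.1221,
    "Greenhouse Relative Humidity": 0.0818,
    "Month": 0.0981,
    "Wind": 0.0767,
    "Greenhouse Temperature": 0.0782,
}


def share(indices, names):
    """Fraction of the summed indices contributed by the named factors."""
    return sum(indices[n] for n in names) / sum(indices.values())


top3 = ["Time", "Solar Radiation", "Outdoor Temperature"]
bottom4 = ["Greenhouse Relative Humidity", "Month", "Wind",
           "Greenhouse Temperature"]

top3_first = share(first_order, top3)
top3_total = share(total_order, top3)
bottom4_first = share(first_order, bottom4)
bottom4_total = share(total_order, bottom4)
print(top3_first, top3_total, bottom4_first, bottom4_total)
```

The recomputed shares agree with the percentages quoted in the text to within about 0.1 percentage point, with the small differences attributable to rounding.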

## CONCLUSION

The results of the GSA provided insights into the factors driving water usage in greenhouses growing bell peppers in southwestern Ontario. For the case studied, it was determined that time (decimal hour), solar radiation, and outdoor temperature are the three main factors responsible for the variance in the model output. These factors account for 66.3% of the model variance and 92.4% of the first order sensitivity (*S*_{i}). The rank of these inputs remains constant through the analysis of the total indices (*S*_{Ti}), of which they account for 71.7%. Inclusion of cumulative solar radiation in the model would increase the total sensitivity accounted for to 79.2% and would not require any extra instrumentation, since solar radiation is already being monitored. Although time of year, more specifically month, did not rank high in the GSA, it should also be included in the model considering the seasonal variation observed in the output data. The main result of the GSA was justification of the use of only external input factors in a greenhouse water demand model, which is useful to a water utility since these factors are accessible and are shown to be influential predictors of greenhouse water demand. These external factors can be measured by the water utility by installing metering devices at a centralized location that collects and sends data back to the utility. Since this method deals with short-term forecasting (15 minutes), the current readings from the telemetry should provide reliable results for forecasting the next period.

This study focused on water usage inside the greenhouse in order to provide a depiction of the watering habits utilized by the growers. Since there is a disparity in technologies implemented in greenhouse operations throughout the world (water recycling, water storage, and climate control) the ability to forecast in-greenhouse water usage will allow for a broader application of the techniques and results of this study. In order to address the issues related to technology and alternative water supply, this process can be refined by installing metering on the water lines feeding the greenhouse operation in order to capture the amount of water drawn from the water utility. This utility or fresh water demand can then be subtracted from the greenhouse demand modeled using the results of this study, leaving only the recycled water usage and providing a full picture of greenhouse water use.

This method can be applied to various crop types such as tomatoes, cucumbers, melons and flowers, with the goal of developing a comprehensive view of greenhouse water demand. These results can be implemented in a piecewise water demand-forecasting model that will address the unique combination of greenhouse crops at any given location. Using this technique, greenhouse operators can expose underlying patterns in the manual watering schemes utilized by many growers in order to work towards full automation of crop watering.

## ACKNOWLEDGEMENTS

The authors would like to thank Union Water Supply System, Ontario Clean Water Agency, and Ontario Greenhouse Vegetable Growers for their continued support of this research.