Abstract
Accurate models of water withdrawal are crucial in anticipating the potential water use impacts of drought and climate change. Machine learning methods can simulate the complex, nonlinear relationship between water use and potential explanatory factors, but rarely incorporate the hierarchical nature of water use data. This work presents a novel approach for the prediction of water withdrawals across multiple usage sectors using an ensemble of models fit at different hierarchical levels. Models were fit at the facility and sectoral grouping levels, as well as across facility clusters defined by temporal water use characteristics. Using repeated holdout cross-validation and a dataset of over 300,000 observations of monthly water withdrawal across 1,509 facilities, this study demonstrates that ensemble predictions led to statistically significant improvements in predictive performance in five of the eight sectors analyzed. The use of ensemble modeling resulted in lower predictive errors compared to facility models in 65% of facilities analyzed. The relative improvement gained by ensemble modeling was greatest for facilities with fewer observations and higher variance, indicating its potential value in predicting withdrawal for facilities with relatively short data records or data quality issues.
HIGHLIGHTS
Hierarchical ensemble models reduce predictive errors for a majority of facilities analyzed.
Cluster analysis is used to build models for groups of facilities with similar temporal water use behavior.
Ensemble models are most beneficial in facilities with high variance and fewer observations of withdrawal.
INTRODUCTION
Sustainable water resources management requires accurate models, predictions, and projections of water demand. Short-term water use forecasting can be crucial in drought management and utility operations. Longer-term projections of water use can help identify potential supply risks under conditions of population growth (Vörösmarty et al. 2000) and climate change (Brown et al. 2013; Fiorillo et al. 2021). These models also form an important component of integrated water systems models and decision support systems that simulate hydrologic water supply, infrastructure, demand, and reuse (Willuweit & O'Sullivan 2013; Sharvelle et al. 2017). Accurate models and projections of water demand are especially valuable in locations where water management institutions have relatively limited control on water use. For instance, this is the case in many areas of the Eastern U.S. where large portions of withdrawal are not subject to permitting requirements (Virginia Department of Environmental Quality 2022). However, the factors that govern water demand are highly complex and involve interactions between climatic and environmental conditions, socio-economic factors, pricing, and institutional governance structures. Given this complexity, it is unsurprising that many water use forecasts turn out to be inaccurate in hindsight (Pacific Institute 2013; Perrone et al. 2015).
Recognizing this need, numerous studies have used statistical regression models to identify the environmental, socio-economic, and institutional factors associated with greater volumes of water use. For instance, multiple studies have demonstrated the relationship between climatic conditions, land use, and water use at the municipal scale (Balling et al. 2008; House-Peters et al. 2010; Mini et al. 2014; Lee et al. 2015; Toth et al. 2018). Several studies have leveraged water use data to characterize drivers of broad-scale geographic variability in per-capita municipal water use efficiency and trends (Sankarasubramanian et al. 2017; Worland et al. 2018; Chinnasamy et al. 2021). Because the factors that influence water use tend to be complex and nonlinear, there is increasing use of machine learning to model and predict water use. Machine learning models have been widely applied in the prediction of physical hydrologic systems (e.g., Akrami et al. 2014; Guimarães Santos & Silva 2014; Alizadeh et al. 2017a, 2017b). Methods including random forests, boosted regression trees, and artificial neural networks have been leveraged to identify climatic and governance factors that influence municipal and irrigation demand (Toth et al. 2018; Bolorinos et al. 2020; Fiorillo et al. 2021; Lamb et al. 2021). Short-term urban demand forecasting has also benefited from methods such as long short-term memory networks (Hu et al. 2019; Mu et al. 2020; Fu et al. 2022; Zanfei et al. 2022), neural networks (Huang et al. 2021; Huang et al. 2022; Liu et al. 2023), and hybrid approaches (Guo et al. 2022). When compared with linear regression approaches, machine learning models are often able to achieve lower predictive errors (Toth et al. 2018; Bolorinos et al. 2020; Wongso et al. 2020), pointing toward their potential value in water use modeling.
Across this body of research, one factor that is rarely explicitly considered is the impact of data structure on model predictions and inferences. Water use data are inherently hierarchical, with multiple options for grouping and categorizing observations. For instance, water use datasets often include observations through time for multiple water users. These water users in turn can be grouped or classified based on geographic location, water use sector, or institutional governance structures. Depending on their structure, regression approaches may be capturing different drivers of variability that lead to different management implications. For example, models of cross-sectional variability across water users and locations (where there is a single record, such as a long-term average withdrawal, for each water user) can assist in targeting conservation measures (Deoreo & Mayer 2012; Suero et al. 2012). Models of temporal variability (where multiple observations through time are available) can lead to more accurate predictions of water use under different policy and drought conditions (Hester & Larson 2016).
These different approaches can provide greater insights into the nature of the relationship between water use and the various factors that influence it. For example, cross-sectional analyses have found a positive correlation between income and water use (Balling et al. 2008; House-Peters et al. 2010; Sankarasubramanian et al. 2017) that is not present in longitudinal studies (Shortridge & DiCarlo 2020). This suggests that water use is greater in locations or households with higher incomes, but not necessarily during periods of greater economic growth. Recognizing this, longitudinal regression has become a standard statistical approach in modeling water use (Polebitski & Palmer 2010; House-Peters & Chang 2011; Baerenklau et al. 2014; Shortridge & DiCarlo 2020), where model parameters can vary across groups within population-level constraints. This provides a middle ground between pooled regression models, where all observations are grouped together and described via a single set of model parameters, and unpooled regression where a unique model is fit for each group in the data (Gelman & Hill 2007).
Recent advances in machine learning have begun to develop new approaches that account for hierarchical data structures. For example, the mixed effects random forest (MERF) approach models individual predictions through time as an additive function of a random forest (RF) model of population-level mean behavior processes and individual-level random effects (Hajjem et al. 2014; Capitaine et al. 2021). Several studies have proposed methods that integrate regression and classification trees within a mixed modeling framework to address subgroups and hierarchies in clinical trial data (Fokkema et al. 2018; Seibold et al. 2019; Fokkema et al. 2021). Other methods leverage ensemble learning, in which multiple models are independently fit to a dataset and their predictions are aggregated into a single prediction (Eygi Erdogan et al. 2021). Ensemble learning has been found to generally reduce model variance, which results in more accurate predictions on new data (Kuncheva 2014; James et al. 2021). This aspect of ensemble modeling has the potential to improve accuracy in water use prediction, particularly given the data quality issues and errors previously observed in many water use datasets (Zhang & Balay 2014; Chini & Stillwell 2017; McCarthy et al. 2022). Beyond the general approach of leveraging machine learning models within a hierarchical data structure, many of these previous studies also present examples of context-specific algorithm development, as they were specifically designed to be compatible with clinical trial data. The development of hierarchical machine learning modeling approaches tailored to water use data has the potential to both increase predictive accuracy relative to current methods and provide new inferences about water use behavior and influences across different hierarchical levels.
The objective of this research was to develop and assess a novel algorithm for prediction of water withdrawals across multiple usage sectors using an ensemble of predictive regression models fit at different hierarchical levels. This work leverages 29 years of monthly withdrawal data from approximately 2,500 water using facilities across Virginia. Models were fit at different grouping levels, ranging from single-facility models to sector-wide models using multiple climatic and socio-economic predictor variables. A cluster analysis was conducted to identify clusters of facilities with similar temporal patterns of water withdrawal and fit cluster-level models. Grouping level models were then combined into a weighted ensemble prediction using quadratic programming. The predictive accuracy of all models was evaluated through a repeated holdout cross-validation approach, and compared to a null model where facility-level withdrawal was based on long-term averages. Finally, the facility-level characteristics associated with improved ensemble predictions were identified to better understand the conditions in which ensemble modeling provides the most value.
METHODS
Data sources and processing
This analysis used long-term records of water withdrawal provided by the Virginia Department of Environmental Quality (VDEQ). All water users in the U.S. state of Virginia who withdraw more than 37,854 l (10,000 U.S. gallons) per day are required to report monthly water withdrawal to VDEQ. This dataset includes 313,321 nonzero monthly withdrawal records between 1990 and 2018 from 2,579 water using facilities across eight water use sectors (Table 1). Note that agriculture refers to livestock and agricultural processing operations, rather than crop irrigation. Additional details on withdrawal data are presented in Shortridge & DiCarlo (2020). However, many of these facilities only have short-term records of water withdrawal or a majority of months with zero reported withdrawals. To ensure that all facilities had sufficient data available for model training, weighting, and validation, only facilities with at least 36 nonzero withdrawal observations were retained for inclusion in the analysis. This number was selected because at least 2 years of data are needed to calculate withdrawal anomalies, and at least 2 additional years are needed to split the data into testing and training datasets.
Sector | All data: facilities (n) | All data: observations (n) | Retained for analysis: facilities (n) | Retained for analysis: observations (n) | Retained for analysis: total water use (MG/month) |
---|---|---|---|---|---|
Agriculture (Ag) | 155 | 7,032 | 36 | 5,914 | 129 |
Aquaculture (Aq) | 14 | 2,978 | 12 | 2,913 | 866 |
Commercial (Com) | 463 | 55,573 | 292 | 52,721 | 740 |
Industrial (Ind) | 211 | 37,464 | 154 | 36,968 | 16,000 |
Irrigation (Irr) | 727 | 23,036 | 187 | 18,157 | 1,310 |
Mining (Min) | 91 | 14,394 | 70 | 14,260 | 1,320 |
Municipal (Mun) | 894 | 166,747 | 735 | 164,718 | 24,900 |
Thermoelectric (Thm) | 24 | 6,097 | 23 | 6,073 | 201,000 |
Total | 2,579 | 313,321 | 1,509 | 301,724 | 246,000 |
A total of 13 socio-economic variables were included as potential predictors of water withdrawal, representing a variety of population, economic, and land-use characteristics that have been shown to have relationships with water withdrawals in previous research (Sankarasubramanian et al. 2017; Worland et al. 2018; Shortridge & DiCarlo 2020). Additionally, three climatic predictor variables were included to account for widespread evidence of the relationship between weather and water withdrawals (House-Peters & Chang 2011; Brown et al. 2013; Lee et al. 2015). Additional details on predictor variable data, sources, processing, and formatting are provided in Supplementary material.
Modeling approach
Model name | Description and rationale |
---|---|
Facility-grouping level | Separate model fit to each facility in the dataset. Captures facility-level water use behavior, but not generalizable to other facilities. |
Sector-grouping level | Model fit using data from all facilities within each water use sector. Captures general water use behavior across multiple facilities at the expense of accuracy at individual facility level. |
Large cluster grouping level | Model fit using data from all facilities within each large cluster (k = 3). Clusters are defined based on temporal water use patterns, and thus contain facilities with similar withdrawal patterns even if they are different water use sectors. |
Small cluster grouping level | Model fit using data from all facilities within each small cluster (k = 8). Same as large clusters, but with facilities partitioned into smaller groups with less in-group variability in temporal withdrawal patterns. |
Ensemble | Withdrawal predictions are a weighted average of the four grouping level models above. |
Null | Withdrawal predictions are equal to the long-term average withdrawal in each month for each facility. Included as a baseline for comparison. |
Facility grouping and clustering
The water withdrawal data used in this study can be grouped at multiple hierarchical levels. Each water using facility has multiple observations of water use through time. Facilities are often categorized by water use sector, under the assumption that two facilities in the same water use sector will exhibit similar water use behavior. For this study, predictive models were fit at four different levels of facility grouping: facility level, sectoral level, small cluster level, and large cluster level (Table 2). At the finest level, facility-level grouping entailed fitting a distinct model for each facility in the dataset. This allows for the model to be highly tailored to the water use characteristics of that facility but less generalizable to new data, particularly in instances where a facility does not have many observations to draw from (Gelman & Hill 2007). The next level of grouping was the sectoral level, where a single model was fit to all facilities within that sector. This provides a representation of generalized water use patterns in a given sector, such as the higher irrigation withdrawals that are observed during periods of high temperature and low rainfall (Shortridge & DiCarlo 2020). This provides a model of how sectoral water withdrawals relate in general with predictor variables but will likely result in less accurate predictions for a single facility.
One limitation with sectoral grouping is that facilities in a single sector might actually exhibit very different patterns of water use (Attaallah 2018; McCarthy et al. 2022). Thus, the small and large cluster grouping levels were determined based on the results of a hierarchical cluster analysis (Everitt et al. 2011) that identified coherent facility groupings based on five water use characteristics calculated for each facility:
Mean withdrawal volume (MG/month), log transformed.
Coefficient of variation: standard deviation of withdrawal divided by mean.
Seasonality: the lowest 3-month mean withdrawal divided by the highest 3-month mean withdrawal, where lower values indicate greater seasonality in withdrawal volume.
Autocorrelation: maximum degree of autocorrelation observed at any time lag.
Number of observations: the number of nonzero withdrawal observations available.
To determine the optimal number of clusters, facilities were divided into k ∈ {1, …, 15} hierarchical clusters based on Euclidean distance. Gap statistic estimates for each value of k exhibited non-monotonic behavior indicating well-defined clusters at k = 3 and k = 8, suggesting that there were three large clusters of facilities that could be further divided into eight smaller subclusters (Tibshirani et al. 2001). An analysis of correspondence between cluster assignment and sector indicated that clusters generally did not correspond to a single sector. This suggests that there are certain patterns of water use behavior that cannot be explained simply by sectoral classifications, consistent with previous research (Attaallah 2018; McCarthy et al. 2022). Thus, models were also fit at the large (k = 3) and small (k = 8) cluster levels, where data from all facilities within a single cluster were combined into a single model. Additional details and results of the cluster analysis are included in Supplementary material.
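The clustering steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the synthetic facilities, the 24-lag search window, the wrap-around 3-month seasonality windows, the feature standardization, and the Ward linkage are all assumptions made for illustration, and the gap statistic step is omitted. The sketch computes the five characteristics for each facility, builds one hierarchical tree on Euclidean distance, and cuts it at k = 3 and k = 8.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def facility_features(withdrawal, months, max_lag=24):
    """Five water use characteristics for one facility (illustrative).

    withdrawal: nonzero monthly withdrawals (MG/month), in time order.
    months: calendar month (1-12) of each observation; every calendar
            month must appear at least once.
    """
    w = np.asarray(withdrawal, dtype=float)
    m = np.asarray(months)
    mean = w.mean()
    cov = w.std(ddof=1) / mean  # coefficient of variation
    # Assumption: seasonality uses 3-month windows of the mean monthly
    # climatology, wrapping around the end of the year.
    clim = np.array([w[m == k].mean() for k in range(1, 13)])
    win = (clim + np.roll(clim, -1) + np.roll(clim, -2)) / 3.0
    seasonality = win.min() / win.max()
    # Assumption: autocorrelation is searched over lags 1..max_lag.
    # Simplification: lags count observations, so gaps left by dropped
    # zero-withdrawal months are ignored.
    d = w - mean
    denom = np.sum(d * d)
    n_lags = min(max_lag, len(w) - 1)
    acf = [np.sum(d[lag:] * d[:-lag]) / denom for lag in range(1, n_lags + 1)]
    return np.array([np.log(mean), cov, seasonality, max(acf), len(w)])


def cut_tree(features, ks=(3, 8)):
    """Cut one hierarchical tree at several cluster counts."""
    X = np.asarray(features, dtype=float)
    # Assumption: features are standardized so none dominates the distance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Assumption: Ward linkage; the source specifies only Euclidean distance.
    tree = linkage(Z, method="ward", metric="euclidean")
    return [fcluster(tree, t=k, criterion="maxclust") for k in ks]


# Synthetic facilities (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
rows = []
for _ in range(40):
    n = int(rng.integers(36, 120))           # record length in months
    months = (np.arange(n) % 12) + 1
    base = rng.lognormal(mean=1.0, sigma=1.0)
    amp = rng.uniform(0.0, 0.8)              # strength of the seasonal cycle
    series = base * (1 + amp * np.sin(2 * np.pi * months / 12))
    series = series * rng.lognormal(0.0, 0.2, size=n)
    rows.append(facility_features(series, months))
feats = np.array(rows)

large, small = cut_tree(feats, ks=(3, 8))
```

Because both cuts come from the same tree, each small cluster sits inside a single large cluster, which mirrors the nested structure described in the text.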
Classical regression and machine learning models
The four grouping level models, as well as the ensemble model, were then used to generate withdrawal predictions for the testing dataset. Thus, the testing dataset was not used in either the initial model fitting or the ensemble weighting.
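A minimal sketch of the ensemble weighting step follows. The source states that grouping level predictions are combined into a weighted average using quadratic programming, with weights determined on data separate from the testing dataset. The specific objective and constraints below (least squares on the weighting dataset, nonnegative weights that sum to one) are assumptions, as are the function name, the synthetic predictions, and the use of SciPy's SLSQP solver.

```python
import numpy as np
from scipy.optimize import minimize


def ensemble_weights(preds, observed):
    """Weights for a convex combination of model predictions (illustrative).

    preds: (n_obs, n_models) predictions on the weighting dataset.
    observed: (n_obs,) observed withdrawals on the same dataset.
    """
    P = np.asarray(preds, dtype=float)
    y = np.asarray(observed, dtype=float)
    n_models = P.shape[1]

    def sse(w):
        r = y - P @ w
        return r @ r

    def grad(w):
        return -2.0 * P.T @ (y - P @ w)

    # Assumed constraints: weights are nonnegative and sum to one.
    res = minimize(
        sse,
        x0=np.full(n_models, 1.0 / n_models),
        jac=grad,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_models,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return res.x


# Hypothetical predictions from four grouping level models.
rng = np.random.default_rng(1)
truth = rng.normal(size=300)
noise_sd = [0.3, 0.6, 0.9, 1.2]
preds = np.column_stack(
    [truth + rng.normal(scale=s, size=300) for s in noise_sd]
)

# Weights come from the weighting rows only; test rows stay untouched.
weight_idx, test_idx = np.arange(0, 200), np.arange(200, 300)
w = ensemble_weights(preds[weight_idx], truth[weight_idx])
ensemble_test = preds[test_idx] @ w
```

Keeping the weights on the unit simplex makes the ensemble a weighted average in the sense of Table 2, and any single grouping level model remains a feasible special case (weight one on that model, zero elsewhere).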
Model evaluation
RESULTS
Model selection and performance
In each iteration of the holdout cross-validation, three model formulations (GLM, GAM, and RF) were compared for each of the four grouping levels (sector, large cluster, small cluster, and facility) based on out-of-sample RMAE in the weighting dataset. The formulation with the lowest RMAE was selected for the grouping level model and for incorporation into the ensemble model in that holdout iteration. The frequency with which each formulation was selected (i.e., minimized out-of-sample RMAE) at each grouping level is presented in Table 3. For the cluster-level groupings and for most sector-level groupings, the most frequently selected formulation was the GLM. More complex formulations (GAM and RF) were more often selected for the facility-level grouping models and for the agriculture, mining, and municipal sector-level models. The relatively strong performance of the simpler linear models could be due to a potential for overfitting with the GAM and RF formulations, where their flexibility results in a lower bias relative to model training data but greater variance and error when applied to new datasets (Hastie et al. 2009).
Grouping | GLM | GAM | RF |
---|---|---|---|
Sector-level groupings | | | |
Agriculture | 47.5% | 19.2% | 33.3% |
Aquaculture | 62.6% | 9.1% | 28.3% |
Commercial | 53.5% | 41.4% | 5.1% |
Industrial | 52.5% | 41.4% | 6.1% |
Irrigation | 61.6% | 33.3% | 5.1% |
Mining | 31.3% | 45.5% | 23.2% |
Municipal | 24.2% | 35.4% | 40.4% |
Thermoelectric | 51.5% | 18.2% | 30.3% |
Cluster-level groupings | | | |
Large cluster | 55.2% | 22.2% | 22.6% |
Small cluster | 46.5% | 25.4% | 28.2% |
Facility-level groupings | | | |
Facility | 28.6% | 16.7% | 54.4% |
A summary of mean RMAE for each model grouping level is presented in Table 4. The ensemble model had the lowest mean RMAE in all sectors except agriculture and industrial, where the facility-grouping level models had the lowest mean RMAE. Paired, two-sided Wilcoxon rank sum tests were used to compare the distribution of RMAE in the ensemble and facility-grouping models for each sector. In the aquaculture, commercial, irrigation, mining, and municipal sectors, the use of the ensemble formulation resulted in statistically significant reductions in RMAE compared to the facility-level models. The only sector in which the ensemble model resulted in a statistically significant increase in error was agriculture.
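The paired comparison described above can be sketched in Python as follows. The error values are synthetic and hypothetical, and the pairing unit is an assumption. Note that SciPy's `scipy.stats.wilcoxon` implements the Wilcoxon signed-rank test, which is the Wilcoxon test for paired samples; whether this matches the exact procedure used in the study is also an assumption.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired RMAE values for one sector: one pair per unit of
# comparison (for example, per facility), ensemble versus facility model.
rng = np.random.default_rng(2)
facility_err = rng.uniform(0.3, 1.0, size=100)
ensemble_err = facility_err - rng.normal(loc=0.02, scale=0.03, size=100)

# Two-sided test of whether the paired differences are centered on zero.
stat, p_value = wilcoxon(ensemble_err, facility_err, alternative="two-sided")

# The sign of the mean difference shows the direction of the change:
# negative means the ensemble reduced error on average.
mean_change = float(np.mean(ensemble_err - facility_err))
```

A two-sided test reports only whether the two error distributions differ, so the direction (reduction or increase in RMAE) has to be read from the means, as is done in the text for agriculture.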
Sector | Null | Sector-level | Large cluster | Small cluster | Facility | Ensemble | p-value |
---|---|---|---|---|---|---|---|
Agriculture | 1.623 | 1.471 | 1.595 | 1.513 | **1.364** | 1.377 | 7.54 × 10⁻⁵ |
Aquaculture | 0.333 | 0.330 | 0.340 | 0.341 | 0.316 | **0.313** | 5.00 × 10⁻³ |
Commercial | 0.667 | 0.658 | 0.658 | 0.659 | 0.651 | **0.638** | <10⁻⁵ |
Industrial | 0.560 | 0.576 | 0.571 | 0.572 | **0.438** | 0.460 | 5.37 × 10⁻¹ |
Irrigation | 0.879 | 0.862 | 0.870 | 0.869 | 0.851 | **0.846** | <10⁻⁵ |
Mining | 0.822 | 0.835 | 0.831 | 0.831 | 0.774 | **0.738** | <10⁻⁵ |
Municipal | 0.601 | 0.572 | 0.578 | 0.572 | 0.538 | **0.508** | <10⁻⁵ |
Thermoelectric | 0.533 | 0.526 | 0.528 | 0.530 | 0.472 | **0.471** | 9.31 × 10⁻¹ |
Bold values indicate the grouping level with the lowest mean RMAE for that sector. p-values refer to the significance level of a two-sided paired Wilcoxon rank sum test between the facility-grouping and ensemble models.
Ensemble model structure
To better understand the facility characteristics associated with improved ensemble model performance relative to facility-grouping models, the predictive improvement from use of a model ensemble for each facility was regressed against facility water use characteristics. The results of this regression are presented in Table 5. The ensemble model tended to provide the most improvement relative to the facility-grouping model in facilities with a lower number of observations, higher coefficient of variation, and less autocorrelation. These are all conditions that create a potential for facility model training datasets that are less representative of withdrawal behavior as a whole due to smaller sample size and greater data variance through time. Because this can result in models that are overfit to training data and less generalizable to unseen data, the incorporation of other, more general model formulations into an ensemble can provide particularly noticeable reduction in out-of-sample predictive errors in this context.
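The regression described above can be sketched as an ordinary least squares fit. Everything in the example is hypothetical: the synthetic data, the coefficient values used to generate it, the definition of improvement as facility-model error minus ensemble error, and the restriction to two continuous predictors (the study's regression also includes other characteristics and sector indicators).

```python
import numpy as np
from scipy import stats


def ols(X, y):
    """Ordinary least squares with an intercept.

    Returns coefficient estimates, standard errors, and two-sided p-values.
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = len(y) - X1.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    p = 2 * stats.t.sf(np.abs(beta / se), dof)
    return beta, se, p


# Hypothetical facilities: number of observations and coefficient of variation.
rng = np.random.default_rng(3)
n_fac = 500
n_obs = rng.uniform(36, 348, size=n_fac)
cov = rng.uniform(0.1, 2.0, size=n_fac)

# Assumed definition: improvement = facility-model error - ensemble error,
# so positive values mean the ensemble performed better.
improvement = (
    0.01 - 3e-5 * n_obs + 0.016 * cov + rng.normal(scale=0.01, size=n_fac)
)

beta, se, p = ols(np.column_stack([n_obs, cov]), improvement)
```

In this setup a negative coefficient on the number of observations and a positive coefficient on the coefficient of variation correspond to the pattern reported in the text: larger gains from the ensemble for short, highly variable records.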
Term | Estimate | Std. Error | p-value |
---|---|---|---|
Intercept | 1.40 × 10⁻² | 8.00 × 10⁻³ | 7.80 × 10⁻² |
log(Water.Use.MGM) | −1.00 × 10⁻³ | 1.00 × 10⁻³ | 1.27 × 10⁻¹ |
n.obs.nonzero | −2.85 × 10⁻⁵ | 1.13 × 10⁻⁵ | 1.20 × 10⁻² |
Water.Use.COV | 1.60 × 10⁻² | 1.00 × 10⁻³ | < 0.001 |
Water.Use.ACF.strength | −2.20 × 10⁻² | 4.00 × 10⁻³ | < 0.001 |
Water.Use.Seasonality | 7.00 × 10⁻³ | 5.00 × 10⁻³ | 1.86 × 10⁻¹ |
UseType (aquaculture) | 3.00 × 10⁻³ | 1.30 × 10⁻² | 8.39 × 10⁻¹ |
UseType (commercial) | −1.00 × 10⁻³ | 7.00 × 10⁻³ | 8.32 × 10⁻¹ |
UseType (industrial) | −5.00 × 10⁻³ | 8.00 × 10⁻³ | 4.66 × 10⁻¹ |
UseType (irrigation) | −1.60 × 10⁻² | 7.00 × 10⁻³ | 2.60 × 10⁻² |
UseType (mining) | 1.50 × 10⁻² | 8.00 × 10⁻³ | 7.80 × 10⁻² |
UseType (municipal) | 4.00 × 10⁻³ | 7.00 × 10⁻³ | 6.03 × 10⁻¹ |
UseType (thermoelectric) | −5.00 × 10⁻³ | 1.10 × 10⁻² | 6.31 × 10⁻¹ |
DISCUSSION
The use of ensemble modeling resulted in a statistically significant reduction in mean out-of-sample RMAE in five of the eight sectors assessed (aquaculture, commercial, irrigation, mining, and municipal) relative to models fit using just facility-specific data. The only sector in which the ensemble model resulted in a statistically significant increase in error was agriculture. Across all facilities, the ensemble model resulted in error reduction relative to facility-grouping level models in over 60% of facilities. Thus, while it does not necessarily result in greater predictive accuracy for all facilities in this dataset, it does result in predictive improvements across the population of water users as a whole. In this sense, its value is likely highest in situations where a general modeling approach is needed to simulate longitudinal water withdrawals across a heterogeneous body of water users, rather than a model of a single water user, especially for users that have a long and robust record of water withdrawals. Particularly for large water users (especially thermoelectric facilities) that tend to dominate overall withdrawal volumes (McCarthy et al. 2022), facility-specific models will likely prove most accurate. However, our results demonstrate that there are numerous other types of water use where ensemble modeling provides predictive value.
These results are consistent with other studies that have found a frequent benefit of using machine learning approaches to predict water withdrawal when compared to linear models (e.g., Toth et al. 2018; Bolorinos et al. 2020; Wongso et al. 2020). In this study's model selection process, non-linear GAM and RF models were selected most often for agriculture, mining, and municipal sector-level models, as well as for the majority of facility-level models (Table 3). In these instances, the added flexibility of non-linear approaches seems to provide the most value. However, the other sector-level models and the large cluster models most often used GLMs, indicating that in these instances a simple linear approach is suitable. This suggests a potential for overfitting when using machine learning approaches (Hastie et al. 2009). Because ensemble learning has been found to generally reduce model variance and overfitting (Kuncheva 2014; James et al. 2021), this possibly explains some of the benefit demonstrated by ensemble learning in this study.
These results have several practical implications for water supply management. One notable finding is the relatively low performance of sector-level models. Comparing mean RMAE across all holdout iterations, sector-level models were never the lowest error formulation and in the industrial and mining sectors actually resulted in greater error than a null model based only on long-term average withdrawal (Table 4). This suggests that the relationships between socio-economic and climatic conditions and water withdrawal vary too widely across facilities in those sectors to be beneficial in making facility-level predictions. The better performance of cluster-level models in these sectors, combined with the higher weights attributed to cluster-level models within ensembles (Figure 4), suggests the importance of classifying water users based on usage behavior rather than sectoral classifications alone. It is also important to note that a large body of research on water use focuses on large municipal utilities with long-term records. While the importance and influence of these water users is clear, these results demonstrate that some modeling approaches that are effective in these contexts will be less so among smaller or more newly established water users with shorter and more variable withdrawal records. Alternative methods such as the ensemble modeling approach presented here could be particularly valuable in locations where water use is fairly decentralized and for estimating the impact of newly established water withdrawals.
While these results demonstrate a value in the use of ensemble modeling in predicting withdrawal across a broad, heterogeneous body of water users, there are several limitations that could be addressed through further research. This work is based on data from a single state, and additional research would be needed to demonstrate if the findings here are generalizable to other locations with different climates and regulatory contexts for water use. Additionally, across the 29 years of data included in our dataset, it is possible that certain institutional changes or conditions occurred that would influence water use. For example, more widespread use of water-efficient appliances has led to a documented decrease in household water use (Deoreo & Mayer 2012) and severe drought combined with public educational campaigns led to reduced outdoor water use (Bolorinos et al. 2020). While these unaccounted-for variables would not impact the overall conclusions about the potential value of ensemble modeling, they could potentially improve model performance across many formulations.
Several areas of additional research could be envisioned to build on the results presented here. For instance, this research grouped facilities based on temporal water usage characteristics and sectoral classifications. However, water withdrawals could also depend on regulatory governance or water source, as well as geographic location or climate regions. Exploration of alternative grouping strategies could be a valuable area of additional research. Different methods for model evaluation could also advance the results presented here. For instance, this work used a global error metric (RMAE) across all observations, but event detection metrics (Liemohn et al. 2021) that quantify the degree to which models capture specific conditions of interest, such as periods of high withdrawal, may be of value, as could deviation-based metrics (Barati et al. 2014). Quantifying model outcomes across multiple performance criteria, as in Adnan et al. (2023), could also provide a more comprehensive view of model performance. Similarly, this work employs a random cross-validation approach to identify generalizable relationships between withdrawal and predictor variables that are valid across the full period of data. However, practical forecasting needs might be better served by models with sequential training, weighting, and testing periods. While this research compared the ensemble machine learning approach to modeling forms that are commonly applied to withdrawal data, additional research that compares the approach to other hierarchical machine learning methods could lead to further improvements. Finally, periods of uncertainty not accounted for here, such as pandemics or natural disasters, will likely produce conditions and water use behavior that exceed the range of conditions included in this study, yet these are crucial times for ensuring reliable water supply. Additional research on water use under such conditions will be critical for improving water management during times of extreme stress.
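The contrast between random holdout and sequential training, weighting, and testing periods can be sketched as follows. This is an illustration only, not the procedure used in this study: it assumes observations are indexed in time order, and the function names, fractions, and seed are hypothetical.

```python
import random


def random_holdout(n_obs, test_fraction, seed=0):
    """Random split: test months are drawn from anywhere in the record."""
    rng = random.Random(seed)
    indices = list(range(n_obs))
    rng.shuffle(indices)
    n_test = int(round(n_obs * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def sequential_periods(n_obs, weight_fraction, test_fraction):
    """Sequential split: earliest months train the models, the next block
    is reserved for fitting ensemble weights, and the latest block is
    held out for testing. Assumes index order equals time order."""
    n_test = int(round(n_obs * test_fraction))
    n_weight = int(round(n_obs * weight_fraction))
    n_train = n_obs - n_weight - n_test
    if n_train <= 0:
        raise ValueError("fractions leave no observations for training")
    train = list(range(n_train))
    weight = list(range(n_train, n_train + n_weight))
    test = list(range(n_train + n_weight, n_obs))
    return train, weight, test


# Two years of hypothetical monthly observations.
train_r, test_r = random_holdout(24, 0.25, seed=1)
train_s, weight_s, test_s = sequential_periods(24, 0.25, 0.25)
```

With the sequential split, every test month falls after every training and weighting month, which mimics a forecasting setting. The random split instead mixes early and late months in both sets, which suits the goal of identifying relationships valid across the full record.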
CONCLUSION
Water withdrawal data are inherently hierarchical, often composed of multiple observations for each water user through time, and multiple ways of grouping and categorizing those users. Machine learning models are becoming more widely used in water use modeling, but rarely account for the hierarchical nature of water withdrawal data or make use of this structure to improve predictions. This work presents a novel approach for prediction of longitudinal water withdrawals across multiple usage sectors using an ensemble of machine learning models fit at different hierarchical grouping levels. These grouping levels included facility and sectoral-level models, as well as facility clusters determined based on temporal water use characteristics. Grouping level models were also combined into an ensemble model that predicted withdrawal as a weighted average of predictions from each individual grouping level model. For all model structures, relative error depended strongly on the sector assessed, with the highest predictive errors in agricultural, irrigation, and municipal sectors. The ensemble models achieved statistically significant reductions in error compared to facility-level models in the majority of water use sectors assessed. The use of an ensemble model resulted in more accurate predictions relative to the facility model in 63% of facilities, and ensemble improvements were greatest for facilities with relatively few records and high variance in withdrawal. This points to their potential value in predicting withdrawal for facilities with relatively short records of withdrawal or data quality issues that could lead to highly variable withdrawal estimates. 
Inspection of the weights used in the ensemble model indicated that small cluster weights were often higher than sector-level weights, pointing toward the limitations of sector-level models and the potential benefits of considering the behavior of facilities with similar water use patterns, even if they are in different sectors. The ensemble modeling method presented here can thus provide a general approach for prediction of water withdrawals that can be applied across heterogeneous, multi-sector groupings of water users.
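The weighted-average combination summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the weights are taken as given (how they are estimated is not shown here), they are normalized to sum to one, and the function name, level names, and numbers are hypothetical rather than taken from the study.

```python
def ensemble_predict(level_predictions, weights):
    """Weighted average of predictions from each grouping-level model.

    level_predictions: dict mapping level name -> list of monthly predictions
    weights: dict mapping level name -> non-negative weight
    """
    levels = list(level_predictions)
    total_weight = sum(weights[level] for level in levels)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    n_months = len(level_predictions[levels[0]])
    combined = []
    for month in range(n_months):
        weighted_sum = sum(
            weights[level] * level_predictions[level][month] for level in levels
        )
        # Dividing by the total normalizes the weights to sum to one.
        combined.append(weighted_sum / total_weight)
    return combined


# Hypothetical predictions for two months from three grouping levels.
predictions = {
    "facility": [10.0, 12.0],
    "cluster": [8.0, 14.0],
    "sector": [12.0, 10.0],
}
# Hypothetical weights; the cluster weight exceeds the sector weight.
weights = {"facility": 0.5, "cluster": 0.3, "sector": 0.2}

ensemble = ensemble_predict(predictions, weights)
```

Here the ensemble prediction for each month lies between the individual model predictions, pulled toward the levels with larger weights.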
ACKNOWLEDGEMENTS
I would like to gratefully acknowledge the Virginia Department of Environmental Quality for providing the data used in this project.
DATA AVAILABILITY STATEMENT
All code and data used in this analysis are available at: https://osf.io/5pqvx/?view_only=a9b67a7867eb411585897076bf36a433.
CONFLICT OF INTEREST
The authors declare there is no conflict.