Municipal water withdrawal (MWW) information is of great significance for water supply planning, including the planning, optimization, and management of water supply pipeline networks. Currently, most MWW data are reported as spatial aggregates over large survey regions, or are missing altogether, and therefore cannot meet the growing demand for spatially detailed data in many applications. In this paper, six different models are constructed and evaluated for estimating global MWW from aggregated MWW data and gridded raster covariates. Among the models, the artificial neural network-based indirect model (NNM) shows the best accuracy, with higher R2 and lower NMAE and NRMSE at different spatial scales. The estimates from the NNM model are consistent with census and survey data and outperform the existing global gridded MWW dataset. Finally, the NNM model is applied to map global gridded MWW for the year 2015 at 0.1° × 0.1° resolution. The proposed method can be applied to the wider class of aggregated output learning problems, and the high-resolution global gridded MWW data can be used in hydrological models and water resources management.

  • Different models are constructed and evaluated in estimating gridded municipal water withdrawal.

  • Global fine-resolution municipal water withdrawal data are generated using aggregated data and an artificial neural network model.

  • A gridded indirect artificial neural network model based on per capita municipal water withdrawal achieved better performance than the other models.

  • Uncertainty analysis indicates the robustness of a gridded indirect artificial neural network model at regional scale.

  • The artificial neural network-based method can be applied to a broader aggregated output learning problem.

Graphical Abstract


Municipal water use is defined as the water used for domestic or household purposes or for public services (Ritchie & Roser 2017). Municipal water withdrawal (MWW), meaning the amount of water withdrawn from surface or groundwater sources for municipal use, currently accounts for 12% of total water withdrawal globally. This percentage varies dramatically across regions and countries, ranging from 0.45% in Somalia to 100% in Monaco (FAO 2021).

It is expected that domestic water use will increase significantly over the first half of this century (Wada et al. 2016b; Boretti & Rosa 2019). This trend places a heavy burden on water resources management, especially in highly urbanized areas (Wang et al. 2021c). It is therefore very important to estimate MWW at high spatial resolution in order to assess its pattern and distribution. Moreover, understanding the spatial patterns of MWW is an important step towards improving water resource utilization efficiency, reducing water shortage, and developing mitigation and adaptation strategies for water resources (Vandecasteele et al. 2014; Bierkens et al. 2015). Furthermore, the development of high-resolution global hydrological models requires reliable estimates of water withdrawals for validation.

However, there is a lack of high-resolution MWW data. Currently, the spatial granularity of MWW data is too coarse, with most data reported at the region (e.g., country, state, or basin) level. An important reason is that high-quality MWW data are difficult to obtain in many regions, especially in developing countries: measuring and collecting water use data for a very large number of domestic water users is costly, and domestic and industrial water are mixed in the water supply network. In addition, the comparability of MWW data from different countries and regions is often in doubt because of inconsistent statistical definitions and methods (Ellingson et al. 2019).

Consequently, there is a huge gap between the importance and the availability of high-resolution MWW data. Fine-resolution estimation and mapping of MWW is necessary and valuable. In general, the process of redistributing coarse region-level (e.g., administrative unit) census or survey data to a finer scale (e.g., pixel level) is called spatial disaggregation (also spatialization or downscaling).

Currently, multiple methods have been used to produce gridded MWW. Most studies use population or population density as a proxy to disaggregate MWW. For instance, Guo et al. (2013) established a relationship model between per capita domestic water use and per capita GDP and, in combination with a population distribution map, estimated the distribution of domestic water use on a 4-km grid in parts of Northwest China. Huang et al. (2018) used population density maps as the proxy for disaggregating domestic water withdrawal from the administrative division level to the grid level (0.5° resolution). Vandecasteele et al. (2014) estimated European public water withdrawal at 5 × 5 km resolution for 2006 using population and tourism density as proxies. Moreover, gridded MWW estimation and projection modules have been incorporated in many global hydrological models, such as PCR-GLOBWB (Wada & Bierkens 2014), WaterGAP (Flörke et al. 2013), and H08 (Hanasaki et al. 2013). Wada et al. (2016a) estimated domestic water use on a global scale at a high resolution of 0.1° by multiplying the number of persons in a grid cell by the country-specific per capita domestic water extraction. The methods applied in hydrological models are mostly driven by population numbers, GDP per capita, and per capita water use intensities (Wada et al. 2016b). Lips (2020) constructed a high-resolution water demand method for households and industries with a spatial resolution of 30 arc-seconds, following the estimation approach of Wada et al. (2016b).
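The population-proxy idea shared by these studies can be illustrated with a minimal sketch. It assumes a single region with a known MWW total and a list of pixel population counts; the function name, the even-split fallback for unpopulated regions, and the numbers are hypothetical and are not taken from any of the cited studies.

```python
def disaggregate_by_population(region_total, pixel_population):
    """Split a regional MWW total across pixels in proportion to population."""
    total_pop = sum(pixel_population)
    if total_pop == 0:
        # Assumption: with no population information, fall back to an even split.
        n = len(pixel_population)
        return [region_total / n] * n
    per_capita = region_total / total_pop  # one per capita value for the whole region
    return [per_capita * p for p in pixel_population]


# Hypothetical region: a total of 120 units spread over four pixels.
pixels = disaggregate_by_population(120.0, [1000, 3000, 0, 2000])
print(pixels)
```

Because a single per capita value is applied to every pixel of a region, this kind of proxy cannot capture within-region differences in water use intensity.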

In summary, current studies and models have four main limitations. Firstly, the resolution is still too coarse, mostly 0.5° (about 50 km at the equator) (e.g., Flörke et al. 2013; Hanasaki et al. 2013; Wada & Bierkens 2014; Huang et al. 2018). Secondly, some studies have created high-resolution MWW data, but mostly at local or regional scale (e.g., Guo et al. 2013; Vandecasteele et al. 2014; Lips 2020). Thirdly, most studies establish the relationship between MWW and its main influencing factors in simple empirical ways that rely on empirical parameters or formulas. Lastly, existing studies mostly convert the spatial disaggregation problem into a standard supervised regression problem by first aggregating the covariates within each census unit.

In standard supervised regression, each training example in the training set has its own output. In the disaggregation problem, the training set consists of subsets of examples, and the true individual target of each training example is unknown. Instead, an aggregated target is known for each subset of examples. This framework is called the aggregated output learning problem (Musicant et al. 2007). Several approaches have been proposed to handle this kind of problem, such as the aggregated linear model (Hernández-González et al. 2019), the linear exponential model (Derval et al. 2020), aggregated Gaussian processes (Zhu et al. 2022a), and machine learning models (Musicant et al. 2007; Zhang et al. 2021). As far as we know, no such approach has yet been applied to MWW disaggregation.

There are many factors that can influence MWW, such as climate and environmental conditions, socioeconomic factors, technology development, and policy interventions (Wang et al. 2021c). With the development of satellite remote sensing technology, massive geospatial data, which can be used to derive covariates in MWW estimation, are available at high spatial-temporal resolutions.

This study aims to develop a framework for estimating MWW at global scale with a high resolution of 0.1°. Different models, including a traditional regression model and machine learning models, are constructed and evaluated for estimating global gridded MWW from aggregated MWW data and gridded covariates. The study has three new aspects. Firstly, aggregated output learning models are constructed and verified. Secondly, different models are methodically compared in estimating gridded MWW with high spatial resolution at global scale. Thirdly, a more accurate gridded MWW dataset is provided, which benefits global water resources management, especially in developing countries.

Data

MWW data

In this study, country-level and sub-national MWW data for the year 2015, or around 2015, were collected from various datasets (Table 1): (I) country-level MWW data from the FAO AQUASTAT database (FAO 2021); (II) United States state-level and county-level data from USGS (Dieter et al. 2018); since many counties are very small, we aggregated counties into 279 zones based on the principle of spatial proximity; (III) China provincial-level, prefecture-level, and basin-level data collected from the Water Resources Bulletins issued by departments at different levels; (IV) Brazil municipality-level data acquired from the Brazil National Water Agency; (V) Russian district MWW data estimated from Shiklomanov et al. (2011).

Table 1

Summary information of reported or surveyed MWW

| Region | Spatial statistic unit | Number of data records | Data source |
| --- | --- | --- | --- |
| World | Country | 180 countries | FAO AQUASTAT (FAO 2021) |
| USA | State and county | 53 states, 3,223 counties, and 279 aggregated zones | USGS (Dieter et al. 2018) |
| China | Province, prefecture and basin | 31 provincial level administrative divisions, 355 prefectural level divisions, and 63 basins | China national, provincial, and basin Water Resources Bulletin |
| Brazil | Municipality | 5,570 municipalities | Brazil National Water Agency |
| Russia | District | 7 districts | Estimated from Shiklomanov et al. (2011) |

Geospatial covariates

Domestic water use is influenced both by natural conditions, such as climate and water availability, and by social conditions, such as income level and water use habits. The choice of covariates should reflect the natural and social factors that affect domestic water use. Climate factors, topographic factors, and NDVI reflect natural conditions, while population density, nighttime light, fossil fuel CO2 emissions, NO2 density, and HDI reflect socioeconomic conditions. Because our goal is to produce gridded MWW estimates, we consider only gridded datasets (e.g., remote sensing and reanalysis data). Considering data availability, a total of 13 geospatial covariates are used: three related to climate, four related to topography, and six others. The geospatial covariates and their respective data sources are listed in Table 2.

Table 2

Datasets and derived covariates

| Dataset | Resolution | Derived covariate | Description |
| --- | --- | --- | --- |
| MSWEP v2.0 (Beck et al. 2017) | 0.1° | p_15 | Precipitation, 2015 |
| CRU TS v4.04 (Harris et al. 2020) | 0.5° | ta_15 | Mean temperature, 2015 |
| CERES_EBAF-Surface_Edition4.0 (Kato et al. 2018) | 1° | nr_15 | Net surface radiation, 2015 |
| GMTED2010 (Danielson & Gesch 2011) | 30 arc-seconds | dem | Average elevation |
| | | sd_dem | Standard deviation of elevation |
| | | slp | Average slope |
| | | sd_slp | Standard deviation of slope |
| GIMMS3g (Fensholt & Proud 2012) | 5 arc-minutes | ndvi_15 | NDVI, 2015 |
| WorldPop 2000–2020 UN adjusted 1 km (Lloyd et al. 2019) | 1 km | pd_15 | Population density, 2015 |
| VIIRS Stray Light Corrected Nighttime Day/Night Band Composites Version 1 (Mills et al. 2013) | 5 arc-minutes | ntl_15 | Nighttime light index, 2015 |
| ODIAC Fossil Fuel Emission Dataset 2019 (Oda 2015) | 1 km | co2_15 | Fossil fuel CO2 emissions, 2015 |
| Merged TM4NO2A version 2.3 (Georgoulias et al. 2019) | 0.25° | no2_15 | NO2 vertical column density, 2015 |
| HDI (Kummu et al. 2018) | 5 arc-minutes | HDI | Human Development Index, 2015 |

Climate-related covariates include precipitation, mean temperature, and net solar radiation for the year 2015. We derive annual precipitation from the Multi-Source Weighted-Ensemble Precipitation (MSWEP) dataset (version 2; 0.1° spatial resolution) (Beck et al. 2017). MSWEP merges a variety of gauge-, satellite-, and reanalysis-based precipitation datasets to achieve better accuracy. For mean temperature, we employ the CRU TS v4.04 dataset (Harris et al. 2020). In addition, net solar radiation is derived from the surface irradiances of the Edition 4.0 Clouds and the Earth's Radiant Energy System (CERES) Energy Balanced and Filled (EBAF) data product (Kato et al. 2018).

For topographic variables, elevation, slope, and the standard deviations of elevation and slope are extracted from the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) (Danielson & Gesch 2011), an enhanced elevation model covering the whole globe released by the U.S. Geological Survey and the National Geospatial-Intelligence Agency. GMTED2010 is available at three spatial resolutions (30, 15, and 7.5 arc-seconds); the 30 arc-second product is used in this study. Slope is first derived from the elevation raster. For each 0.1° × 0.1° grid cell, the means and standard deviations of elevation and slope are then calculated from the elevation and slope data.
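The per-cell statistics can be sketched as a block aggregation. A 0.1° cell spans 12 × 12 pixels of a 30 arc-second raster (360 / 30 = 12), so the aggregation factor would be 12; the toy example below uses a factor of 2 for readability. The function name and the choice of the population (rather than sample) standard deviation are assumptions, and the study itself performs this step in ArcGIS.

```python
import statistics


def block_stats(fine, factor):
    """Aggregate a 2-D fine grid into coarse cells of factor x factor pixels.

    Returns two 2-D lists: per-cell means and per-cell standard deviations.
    Trailing rows/columns that do not fill a whole block are ignored.
    """
    rows, cols = len(fine), len(fine[0])
    means, stds = [], []
    for r0 in range(0, rows - rows % factor, factor):
        mean_row, std_row = [], []
        for c0 in range(0, cols - cols % factor, factor):
            block = [fine[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            mean_row.append(statistics.fmean(block))
            # Assumption: population standard deviation within the cell.
            std_row.append(statistics.pstdev(block))
        means.append(mean_row)
        stds.append(std_row)
    return means, stds


# Toy 4 x 4 elevation raster aggregated with factor 2 (use 12 for 30" -> 0.1 degree).
elevation = [[1, 3, 10, 10],
             [5, 7, 10, 10],
             [0, 0, 2, 4],
             [0, 0, 6, 8]]
dem, sd_dem = block_stats(elevation, 2)
print(dem, sd_dem)
```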

Other geospatial covariates include the Normalized Difference Vegetation Index (NDVI), population density, nighttime light index, CO2 emissions, NO2 vertical column density, and the Human Development Index (HDI). Mean NDVI for the year 2015 is derived from the GIMMS3g dataset (Fensholt & Proud 2012). WorldPop provides peer-reviewed, open, high-resolution population distribution data; in this study, population density is derived from the WorldPop dataset (Lloyd et al. 2019). Nighttime light reflects the extent and intensity of human activities and can be used to assess socioeconomic development. The VIIRS Stray Light Corrected Nighttime Day/Night Band Composites Version 1 (Mills et al. 2013) are used to obtain the nighttime light index. Satellite-based fossil fuel CO2 emissions and NO2 concentrations reflect the anthropogenic influence on the atmosphere, which may help in estimating MWW. In this study, the ODIAC Fossil Fuel Emission Dataset 2019 (Oda 2015) and Merged TM4NO2A version 2.3 (Georgoulias et al. 2019) are used to derive fossil fuel CO2 emissions and NO2 concentrations. HDI is often used to describe the development status of an area and is thus included in this study. HDI for the year 2015 is retrieved from Kummu et al. (2018).

Because these gridded datasets have different resolutions, resampling or aggregation is performed before model construction and application. Datasets with a resolution coarser than 0.1° are resampled to a 0.1° cell size using the bilinear method, while datasets with a resolution finer than 0.1° are aggregated to 0.1°. ArcGIS software is used to perform these spatial analysis procedures.
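Bilinear resampling of a coarse grid onto a finer one can be sketched as follows. This is a simplified, dependency-free illustration rather than the ArcGIS implementation: it assumes the fine cell size divides the coarse one exactly (e.g., factor 5 for 0.5° to 0.1°), interpolates between coarse cell centres, and replicates edge values outside the outermost centres. All function names are hypothetical.

```python
def bilinear(grid, y, x):
    """Interpolate a 2-D list at fractional (row, col) cell-centre coordinates."""
    rows, cols = len(grid), len(grid[0])
    # Clamp to the outermost cell centres (edge replication, an assumption).
    y = min(max(y, 0.0), rows - 1.0)
    x = min(max(x, 0.0), cols - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, rows - 1), min(x0 + 1, cols - 1)
    fy, fx = y - y0, x - x0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def resample(grid, factor):
    """Resample a coarse grid to a grid that is `factor` times finer."""
    rows, cols = len(grid), len(grid[0])
    fine = []
    for i in range(rows * factor):
        # Centre of fine cell i expressed in coarse cell-centre coordinates.
        y = (i + 0.5) / factor - 0.5
        fine.append([bilinear(grid, y, (j + 0.5) / factor - 0.5)
                     for j in range(cols * factor)])
    return fine


coarse = [[0.0, 10.0],
          [20.0, 30.0]]
fine = resample(coarse, 5)  # e.g. 0.5 degree cells -> 0.1 degree cells
print(len(fine), len(fine[0]))
```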

Data pre-processing is a crucial step that helps enhance the quality of data which directly affects the ability of our model to learn. In this study, the features are first transformed using inverse hyperbolic sine transformation and then standardized. The pre-processing of the MWW and its covariates is performed using ArcGIS software complemented with Python scripts.
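The two pre-processing steps can be sketched for a single covariate as below. The inverse hyperbolic sine, asinh(v) = ln(v + sqrt(v² + 1)), behaves like a logarithm for large values but is defined at zero and for negative values, which suits skewed covariates such as population density. The function name, the sample values, and the use of the population standard deviation are assumptions.

```python
import math
import statistics


def asinh_standardize(values):
    """Apply the inverse hyperbolic sine, then scale to zero mean and unit variance."""
    transformed = [math.asinh(v) for v in values]
    mean = statistics.fmean(transformed)
    std = statistics.pstdev(transformed)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0] * len(values)
    return [(t - mean) / std for t in transformed]


# Hypothetical, strongly skewed covariate values (e.g. population density).
z = asinh_standardize([0.0, 1.0, 10.0, 1000.0, 100000.0])
print(z)
```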

Methods

Notations

The task of this study is to use region-level aggregated observations of MWW together with spatially gridded covariates to estimate gridded MWW.

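Suppose the gridded inputs are $x_1, x_2, \dots, x_n$, where each $x_i \in \mathbb{R}^d$ contains the values of the features for the $i$th pixel. Here $n$ is the number of pixels and $d$ is the number of features. Inputs are stored on the rows of matrix $X$:
$$X = [x_1, x_2, \dots, x_n]^{\mathsf{T}} \in \mathbb{R}^{n \times d}$$
The corresponding gridded outputs are represented by:
$$y = [y_1, y_2, \dots, y_n]^{\mathsf{T}}$$
where each $y_i$ is unknown and needs to be solved.

Suppose there are $k$ regions for which the aggregated output is known; the region set is represented by $R = \{1, 2, \dots, k\}$. For region $r \in R$, the gridded inputs are represented by $X_r = [x_{r1}, x_{r2}, \dots, x_{rn_r}]^{\mathsf{T}}$, where $x_{rj} \in \mathbb{R}^d$, $n_r$ is the number of pixels in region $r$, and the corresponding gridded outputs are represented by $y_r = [y_{r1}, y_{r2}, \dots, y_{rn_r}]^{\mathsf{T}}$.

We have data of a set of region-level MWW $Y = [Y_1, Y_2, \dots, Y_k]^{\mathsf{T}}$, where each $Y_r \in \mathbb{R}$ is the aggregated MWW of region $r$. We denote region-level average covariates as $\bar{X} = [\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k]^{\mathsf{T}}$.

Population information will also be used in this study, and $p_i$ is the population count of the $i$th pixel:
$$p = [p_1, p_2, \dots, p_n]^{\mathsf{T}}$$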

Artificial neural network

The artificial neural network (ANN) is one of the most common supervised machine learning models. The excellent performance and adaptability of machine learning have demonstrated its potential in many fields (Rasouli et al. 2012; Povak et al. 2014; He et al. 2016; Pelletier et al. 2016; Lamorski et al. 2017; Yan et al. 2019). Compared with traditional approaches (including traditional statistical models and empirical models), machine learning approaches have often proven to be more effective, precise, and flexible. Machine learning uses one or more algorithms to explore the relationship between responses and their predictors without requiring an explicit mathematical form of the model. Complex nonlinear relationships can be handled readily with machine learning, which may facilitate the discovery of underlying mechanisms (Zhu et al. 2022b). Machine learning models that rely on ancillary and remotely sensed data have been applied to the spatialization of aggregated data (Musicant et al. 2007; Gervasoni et al. 2018; Qiu et al. 2019; Derval et al. 2020; Šimbera 2020). The ANN considered here is a network with multiple hidden layers between the input and output layers (LeCun et al. 2015). Conventional ANNs are typically feedforward networks, in which information moves only forward from the input layer through the hidden layers to the output layer. Recently, ANNs have been widely used in various fields because of their high accuracy and their ability to model complex, non-linear relationships.

In this study, we define an ANN model with four layers: an input layer, two hidden layers, and an output layer (Figure 1). The input layer consists of 13 neurons, matching the number of geospatial covariates; the first hidden layer has 100 neurons and the second has 60 neurons; the output layer has a single neuron. We use ReLU (Agarap 2019) as the activation function after each layer. Model building, training, validation, and application are implemented with Python and PyTorch on Google Colaboratory.
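The forward pass of a network with this 13-100-60-1 layout can be sketched without any dependencies, as below. This is only an illustration of the architecture: the study itself uses PyTorch, and the weight initialisation, the zero biases, the helper names, and the reading of "after each layer" as including the output layer (which keeps predictions non-negative) are all assumptions.

```python
import random


def init_layer(n_in, n_out, rng):
    """Create one dense layer as (weights, biases); initialisation is an assumption."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def relu(values):
    return [max(0.0, v) for v in values]


def dense(x, layer):
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """Feed one pixel's 13 covariates through the network; ReLU follows every layer."""
    h = x
    for layer in layers:
        h = relu(dense(h, layer))
    return h[0]  # single output neuron


rng = random.Random(0)
sizes = [13, 100, 60, 1]  # input, two hidden layers, output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
print(n_params)
```

With these layer sizes the network has 13·100 + 100 + 100·60 + 60 + 60·1 + 1 trainable parameters, which is small enough to train on region-level targets alone.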
Figure 1

ANN model structure used in this study.


Models

Six models of three types were used in this study (Figure 2). The first type contains a single model, a region-level linear regression model (hereafter RegLin), which is used as a baseline (Equation (1)):
$$Y = \bar{X}\beta + \varepsilon \tag{1}$$
where $Y$ is the vector of region-level MWW, $\bar{X}$ is the matrix of region-level average covariates, $\beta$ is a vector of parameters, and $\varepsilon$ is a vector of random errors. The averaged values of covariates within the regions are used to train the model.
Figure 2

Different types of models used to estimate MWW.

Figure 3

Regions used in model construction.


The second and third types are both pixel-level learning models that use aggregated output information. In general, such a model can be broken into two parts: the relationship between the gridded inputs and the unknown gridded outputs, and the relationship between the gridded outputs and the aggregated outputs. The first part postulates a relationship (f) between the gridded inputs and outputs, even though the gridded outputs are not directly observed. The second part converts the unobserved gridded outputs to region-level outputs using a sum or a weighted sum.

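The second type of model directly establishes a relationship between MWW and its corresponding geospatial covariates. It includes two models: a linear-exponential regression model (LinExp) (Equation (2)) and an ANN model (NN) (Equation (3)):
$$\hat{Y}_r = \sum_{j=1}^{n_r} \exp\left(x_{rj}^{\mathsf{T}}\beta\right) \tag{2}$$
$$\hat{Y}_r = \sum_{j=1}^{n_r} f_{\mathrm{NN}}\left(x_{rj}; \theta\right) \tag{3}$$
where $\hat{Y}_r$ is the estimated MWW of region $r$, $x_{rj}$ is the covariate vector of the $j$th of the $n_r$ pixels in region $r$, $\beta$ is a vector of parameters, and $f_{\mathrm{NN}}$ is the ANN with parameters $\theta$.
The third type of model is indirect: it first establishes the relationship between per capita MWW and the geospatial covariates, and then converts per capita MWW to MWW based on the population distribution. It includes three models: an indirect linear model (LinM) (Equation (4)), an indirect linear-exponential model (LinExpM) (Equation (5)), and an indirect ANN model (NNM) (Equation (6)):
$$\hat{Y}_r = \sum_{j=1}^{n_r} p_{rj}\, x_{rj}^{\mathsf{T}}\beta \tag{4}$$
$$\hat{Y}_r = \sum_{j=1}^{n_r} p_{rj} \exp\left(x_{rj}^{\mathsf{T}}\beta\right) \tag{5}$$
$$\hat{Y}_r = \sum_{j=1}^{n_r} p_{rj}\, f_{\mathrm{NN}}\left(x_{rj}; \theta\right) \tag{6}$$
where $p_{rj}$ is the population count of the $j$th pixel in region $r$.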
Solving the aggregated output learning problem consists of learning the function f that predicts MWW accurately. This is equivalent to finding the parameters θ of the model by minimizing a loss function. In this study, we use the mean absolute error (MAE) as the loss function (Equation (7)):
$$L(\theta) = \frac{1}{k}\sum_{r=1}^{k}\left|Y_r - \hat{Y}_r\right| \tag{7}$$
where $Y_r$ and $\hat{Y}_r$ represent the surveyed and estimated MWW for region $r$, respectively, and $k$ is the number of regions.
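The two-part structure described above (a pixel-level function followed by a sum or population-weighted sum, scored with MAE at region level) can be sketched for the three parametric pixel-level models. The functional forms follow the verbal descriptions of the direct and indirect models; the absence of an explicit intercept handling, the function names, and the toy numbers are assumptions. The ANN variants would replace the linear term with a network output.

```python
import math


def predict_region(pixels, populations, beta, model):
    """Aggregate pixel-level estimates into one region-level MWW estimate.

    pixels      -- list of covariate vectors, one per pixel in the region
    populations -- list of pixel population counts (used by indirect models)
    beta        -- parameter vector of the pixel-level function
    model       -- "LinExp" (direct), "LinM" or "LinExpM" (indirect)
    """
    total = 0.0
    for x, p in zip(pixels, populations):
        lin = sum(b * xi for b, xi in zip(beta, x))
        if model == "LinExp":
            total += math.exp(lin)        # direct: pixel MWW, plain sum
        elif model == "LinM":
            total += p * lin              # indirect: per capita MWW times population
        elif model == "LinExpM":
            total += p * math.exp(lin)    # indirect, linear-exponential per capita MWW
        else:
            raise ValueError(f"unknown model: {model}")
    return total


def mae_loss(observed, estimated):
    """Mean absolute error over regions."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)


# Hypothetical region with two pixels; the first covariate acts as a constant term.
pixels = [[1.0, 0.5], [1.0, 2.0]]
populations = [100, 300]
beta = [0.1, 0.2]
y_hat = predict_region(pixels, populations, beta, "LinM")
print(y_hat, mae_loss([170.0, 50.0], [160.0, 55.0]))
```

Only the region-level total enters the loss, so the pixel-level values are never compared with pixel-level observations; they are constrained solely through their aggregates.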

To find θ, the loss function is minimized with an optimization algorithm. In this study, optimization is done using the AdamW algorithm (Loshchilov & Hutter 2019). Adam (adaptive moment estimation) is a variant of standard stochastic gradient descent; it uses first-order gradients and estimates of the first and second moments of these gradients to adapt the learning steps. AdamW additionally decouples the weight decay from the gradient-based update.
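A single AdamW update can be sketched as follows. This is a plain-Python illustration of the update rule, not the PyTorch optimizer used in the study; the hyperparameter values are assumptions, since the learning rate and weight decay used by the authors are not stated, and the one-parameter toy problem is hypothetical.

```python
import math


def init_state(n_params):
    """Optimizer state: step counter plus first and second moment estimates."""
    return {"t": 0, "m": [0.0] * n_params, "v": [0.0] * n_params}


def adamw_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """Return updated parameters after one AdamW step (state is updated in place)."""
    state["t"] += 1
    t = state["t"]
    new_theta = []
    for i, (w, g) in enumerate(zip(theta, grad)):
        # Exponential moving averages of the gradient and its square.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialised averages.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        # Decoupled weight decay: shrink the weight directly, not via the gradient.
        w = w - lr * weight_decay * w
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_theta.append(w)
    return new_theta


# Toy problem: minimise the absolute error |theta - 3| using its subgradient.
theta = [0.0]
state = init_state(1)
for _ in range(500):
    grad = [1.0 if theta[0] > 3.0 else -1.0]
    theta = adamw_step(theta, grad, state, lr=0.05, weight_decay=0.0)
print(theta)
```

Because the MAE subgradient has constant magnitude, the normalised Adam step stays close to the learning rate, so the iterate approaches the minimiser and then oscillates in a small band around it.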

For model building, we used MWW data of 821 regions, comprising 279 aggregated zones in the United States, 355 prefectural-level divisions in China, seven districts in Russia, and 180 other countries (Figure 3). To evaluate the accuracy of the different models, a 2 × 5-fold cross-validation procedure was performed. The procedure randomly divides the 821 regions into five folds; each fold is used once as the validation set, with the remaining folds as the training set. This procedure is repeated twice. The results serve as a measure of the models' generalization ability. After the models had been trained, global gridded MWW was generated using the global geospatial covariates as model inputs.
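The region-level splitting can be sketched as a repeated k-fold generator. The random seed, the function name, and the way the remainder region is assigned to a fold are assumptions; what matters is that whole regions, not pixels, are held out.

```python
import random


def repeated_kfold(n_items, n_folds=5, n_repeats=2, seed=0):
    """Yield (repeat, fold, train_indices, validation_indices) for region indices."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        order = list(range(n_items))
        rng.shuffle(order)  # a fresh random partition for every repeat
        for fold in range(n_folds):
            val = order[fold::n_folds]
            val_set = set(val)
            train = [i for i in order if i not in val_set]
            yield rep, fold, train, val


splits = list(repeated_kfold(821))  # 821 regions -> 2 x 5 = 10 train/validation splits
print(len(splits), len(splits[0][2]), len(splits[0][3]))
```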

Accuracy assessment

Three statistical metrics, namely the coefficient of determination (R2), the normalized mean absolute error (NMAE), and the normalized root mean square error (NRMSE), are used to evaluate model performance. Compared with absolute performance metrics (e.g., MAE and RMSE), relative metrics (e.g., NMAE and NRMSE) are more intuitive and more comparable across datasets with different scales. Several normalization methods are in common use: one can normalize by the mean, the range, the standard deviation, or the interquartile range of the observations. In this study, we normalize by the mean of the observations. Furthermore, the Taylor diagram, which shows the standard deviation, correlation coefficient, and root mean square error (RMSE) at the same time, is also used to evaluate model performance.
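The three metrics can be written compactly as below, with NMAE and NRMSE normalized by the mean of the observations and expressed in percent. The R2 definition used here (one minus the ratio of residual to total sum of squares) is an assumption, since the exact formula is not given in the text; the sample values are hypothetical.

```python
import math


def r2(obs, est):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - e) ** 2 for o, e in zip(obs, est))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def nmae(obs, est):
    """Mean absolute error as a percentage of the mean observation."""
    mae = sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)
    return 100.0 * mae / (sum(obs) / len(obs))


def nrmse(obs, est):
    """Root mean square error as a percentage of the mean observation."""
    mse = sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs)
    return 100.0 * math.sqrt(mse) / (sum(obs) / len(obs))


obs = [10.0, 20.0, 30.0, 40.0]
est = [12.0, 18.0, 33.0, 37.0]
print(r2(obs, est), nmae(obs, est), nrmse(obs, est))
```

Because RMSE squares the errors, NRMSE is dominated by the few largest regions in a skewed dataset, which is one reason it can exceed 100% even when R2 is high.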

Model performance for the whole training regions

To evaluate the different models, the performance metrics R2, NMAE, and NRMSE are calculated during the 2 × 5-fold cross-validation procedure for the training regions, which include 279 merged regions in the United States, 355 prefectures and cities in China, seven districts in Russia, and 180 other countries from the FAO AQUASTAT database. The accuracy statistics of the 2 × 5-fold cross-validation for the different models are given in Table 3. Note that values in bold represent the best result for each performance metric. Among the six models studied, the two best are LinM, with the highest R2 (0.94) and the lowest NRMSE (90.52%), and NNM, with the second highest R2 (0.92) and the lowest NMAE (31.09%). These indicators suggest that both LinM and NNM are well able to simulate the distribution of MWW globally. Because MAE is used as the loss function, NMAE is the most directly relevant criterion; by this criterion, NNM achieved the best performance, followed by LinM and LinExpM. Overall, the pixel-level indirect models (LinM, LinExpM, and NNM) perform better than the other models.

Table 3

Accuracy assessment results of 2 × 5-fold cross-validation

| Model | R2 | NMAE (%) | NRMSE (%) |
| --- | --- | --- | --- |
| RegLin | 0.78 ± 0.16 | 57.32 ± 10.24 | 189.89 ± 24.93 |
| LinExp | 0.77 ± 0.30 | 48.00 ± 25.87 | 159.66 ± 100.27 |
| NN | 0.80 ± 0.12 | 43.02 ± 8.90 | 162.35 ± 38.12 |
| LinM | **0.94 ± 0.04** | 31.36 ± 5.69 | **90.52 ± 22.20** |
| LinExpM | 0.91 ± 0.05 | 32.34 ± 5.55 | 99.95 ± 20.83 |
| NNM | 0.92 ± 0.08 | **31.09 ± 6.18** | 108.17 ± 43.43 |

Comparisons at different countries and spatial levels

After evaluating the models, global gridded MWW is generated from each model using the gridded covariates. We compared our modelled MWW with existing surveyed (or estimated) data for different countries and spatial levels (Figure 4). The performance of the modelled MWW against the existing MWW datasets was also evaluated using Taylor diagrams (Figure 5).
Figure 4

Different countries and spatial levels: (a) US county, (b) US state, (c) China water resources zone, (d) China province, and (e) Brazil municipality.

Figure 5

Taylor Diagram of models by (a) US county, (b) US state, (c) China water resources zone, (d) China province, and (e) Brazil municipality.


US county-level and state-level MWW estimated by our models is compared with the USGS data. These comparisons cover 3223 counties and 53 states. The results are shown in Table 4 and Figure 5(a). At both levels, the NNM model shows the best accuracy, with the highest R2 and the lowest NMAE and NRMSE. At county level, the R2, NMAE, and NRMSE for the NNM model are 0.76, 45.37%, and 172.77%, respectively. The state-level comparison is much better: the R2, NMAE, and NRMSE for the NNM model are 0.99, 10.39%, and 14.82%, respectively.

Table 4

US county and state comparison results

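| Model | R2, US county (N = 3223) | NMAE (%), US county | NRMSE (%), US county | R2, US state (N = 53) | NMAE (%), US state | NRMSE (%), US state |
| --- | --- | --- | --- | --- | --- | --- |
| RegLin | 0.73 | 78.99 | 302.08 | 0.98 | 75.57 | 113.50 |
| LinExp | 0.72 | 53.70 | 191.16 | 0.96 | 25.81 | 32.95 |
| NN | 0.57 | 62.59 | 265.95 | 0.76 | 31.87 | 64.01 |
| LinM | 0.71 | 55.84 | 198.24 | 0.98 | 36.22 | 47.02 |
| LinExpM | 0.74 | 51.05 | 184.27 | 0.97 | 20.81 | 27.21 |
| NNM | 0.76 | 45.37 | 172.77 | 0.99 | 10.39 | 14.82 |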

According to the water resources zoning of China, there are 10 first-level water resources zones and 82 second-level water resources zones. The MWW data were acquired from the Water Resources Bulletins of the different basins. Because MWW data for some of the second-level water resources zones could not be accessed, we used data of the first-level water resources zones instead. The results of the above models by China water resources zone are shown in Table 5 and Figure 5(c). The NNM model produces better estimates than the other models, achieving the highest R2 (0.98) and the lowest NMAE (10.18%) and NRMSE (16.01%).

Table 5

Model performance by China water resources zone

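| Model | R2, China water resources zone (N = 63) | NMAE (%), water resources zone | NRMSE (%), water resources zone | R2, China province (N = 31) | NMAE (%), province | NRMSE (%), province |
| --- | --- | --- | --- | --- | --- | --- |
| RegLin | 0.83 | 30.72 | 53.78 | 0.70 | 33.95 | 52.15 |
| LinExp | 0.87 | 25.22 | 39.46 | 0.85 | 22.28 | 30.60 |
| NN | 0.79 | 28.18 | 55.42 | 0.73 | 28.82 | 45.38 |
| LinM | 0.88 | 26.78 | 40.35 | 0.86 | 24.45 | 30.35 |
| LinExpM | 0.92 | 21.25 | 31.90 | 0.89 | 21.14 | 27.22 |
| NNM | 0.98 | 10.18 | 16.01 | 0.98 | 6.64 | 10.57 |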

For the provincial-level comparison in China, the results are similar to those for the other areas. The NNM model shows the best accuracy, with the highest R2 and the lowest NMAE and NRMSE: 0.98, 6.64%, and 10.57%, respectively.

Brazilian municipalities are administrative subdivisions of the Brazilian states. Brazil currently has 27 states and 5570 municipalities. The model performance by Brazil municipality is somewhat different from previous cases (Table 6 and Figure 5(e)). Here it is shown that LinM is the best model, although NNM is still relatively good and acceptable.

Table 6

Model performance by Brazil municipality

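| Model | R2, Brazil municipality (N = 5570) | NMAE (%) | NRMSE (%) |
| --- | --- | --- | --- |
| RegLin | 0.97 | 52.17 | 485.12 |
| LinExp | 0.91 | 45.57 | 295.16 |
| NN | 0.40 | 86.19 | 654.92 |
| LinM | 0.97 | 38.64 | 211.03 |
| LinExpM | 0.95 | 37.37 | 211.49 |
| NNM | 0.86 | 41.33 | 377.97 |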

Lastly, country-level MWW from our best model (NNM) and from Huang et al. (2018) is compared against FAO AQUASTAT (Figure 6). Huang et al. (2018) reconstructed a global monthly gridded (0.5°) sectoral water withdrawal dataset for the period 1971–2010 using a spatial-temporal statistical downscaling method. MWW data for the year 2010 from Huang et al. (2018) were used for the comparison. The estimates from our NNM model agree more closely with the FAO data than those from Huang et al. (2018).
Figure 6

Country-level MWW comparison between FAO and Huang et al. (2018), and NNM model of this study.

Figure 7

Global MWW distribution estimated by (a) LinM and (b) NNM models.


Overall, the NNM model performs best in most regions. LinM also performs well and is stable. In addition, the pixel-level indirect models (LinM, LinExpM, and NNM) estimate MWW better than the other models, as shown by the results in Figure 5 and Tables 4–6.

Results of global MWW and its gridded distribution

Through the above verification and performance comparison, we conclude that NNM is the best model and LinM is the second best model. Thus, we present the estimation results of global gridded MWW only considering these two models.

Global total MWW

Global total MWW for the year 2015 is estimated at 449.81 km3 by LinM and 476.96 km3 by NNM (Table 7), close to the estimates from other studies. According to Wada et al. (2011) and Gleick (2012), world total MWW for the year 2000 was 453.15 km3. Global total domestic water use for the year 2010 was estimated at approximately 450 km3 by Wada et al. (2016a) and at 422 km3 by Huang et al. (2018). The World Resources Institute (Gassert et al. 2014) estimated world total MWW at 534.59 km3 for the year 2010. Table 7 shows the world and continental total MWW from our models.

Table 7

World and continental MWW (km3)

Continent        LinM      NNM
Africa           36.84     34.30
Asia             248.70    253.85
Europe           64.41     66.51
North America    57.76     79.18
Oceania          3.03      4.44
South America    39.07     38.68
World            449.81    476.96
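
The arithmetic behind Table 7 can be checked with a few lines of Python. This is only a consistency check on the values transcribed from the table; the variable names are illustrative.

```python
# Continental totals (km3) transcribed from Table 7: (LinM, NNM).
table7 = {
    "Africa": (36.84, 34.30),
    "Asia": (248.70, 253.85),
    "Europe": (64.41, 66.51),
    "North America": (57.76, 79.18),
    "Oceania": (3.03, 4.44),
    "South America": (39.07, 38.68),
}

# Sum each model's column and round to the precision used in the table.
linm_total = round(sum(linm for linm, _ in table7.values()), 2)
nnm_total = round(sum(nnm for _, nnm in table7.values()), 2)

# Difference between the two models per continent (NNM minus LinM).
gap = {name: round(nnm - linm, 2) for name, (linm, nnm) in table7.items()}
largest_gap = max(gap, key=lambda name: abs(gap[name]))
```

Both columns sum to the world row (449.81 and 476.96 km3), and the two models differ most for North America, where NNM is 21.42 km3 higher than LinM.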

Spatial distribution of global MWW

Global and regional gridded MWW from the two models are shown in Figures 7–10. These figures depict detailed local variations. Because of their higher water-use intensity, urban areas stand out at this scale. The spatial patterns from the two models are consistent with each other. Higher MWW mostly occurs in South Asia, Europe, southeastern China, and the southeastern United States.
Figure 8

MWW distribution in the United States estimated by (a) LinM and (b) NNM models.

Figure 9

MWW distribution in China and India estimated by (a) LinM and (b) NNM models.

Figure 10

MWW distribution in the Mediterranean rim estimated by (a) LinM and (b) NNM models.


ANN model performance

The excellent performance of the NNM in different regions and scales demonstrates the predictive accuracy of the ANN model. Generally, machine learning models have better performance than traditional regression models (Povak et al. 2014; Lamorski et al. 2017; Yan et al. 2019). The results from the cross-validation show that the performance of the NNM reaches satisfactory levels, indicating that the employed technique is suitable for estimating MWW.

Limitations of the machine learning models are also identified. The behavior of machine learning models may be less intuitive to interpret than that of traditional regression models because their algorithms cannot be fully described mechanistically. To address this problem, interpretation methods such as feature importance quantification and partial dependence analysis can provide a deeper understanding of the models. Recently, a new interpretation approach, SHapley Additive exPlanations (SHAP), proposed by Lundberg & Lee (2017), has shown promising performance in terms of its interpretability (Matin & Pradhan 2021; Wang et al. 2021a, 2021b). SHAP provides local and global interpretability simultaneously, and it has a more solid theoretical foundation than other methods. Unfortunately, this approach does not currently support our aggregated output learning problem. Overfitting is another common problem in machine learning; it can be mitigated by tuning the model parameters with grid search or random search, and sometimes by trial and error.

Uncertainty in MWW estimation

For regions whose water use data were used for model training, the aggregated region-level uncertainty of the municipal water use estimates was assessed with the coefficient of variation (CV) from 2 × 5-fold cross-validation (Figure 11). The CV is below 20% in most regions, indicating that the model is robust at the region level. High CV values occur mostly in Africa, the Middle East and the Tibetan Plateau, where municipal water use is relatively low.
Figure 11

Region-level (a) standard deviation, (b) mean, and (c) CV from 2 × 5-fold cross-validation.
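
For readers unfamiliar with the 2 × 5-fold scheme, the following minimal sketch shows how such splits can be generated: the training regions are shuffled and cut into five folds, and this is done twice, giving ten train/test pairs. The helper name `repeated_kfold`, the seed, and the region count of 23 are hypothetical; how the resulting predictions are combined into a CV per region is not specified in this section and is not reproduced here.

```python
import numpy as np

def repeated_kfold(n_samples, n_splits=5, n_repeats=2, seed=0):
    """Yield (train_idx, test_idx) pairs for n_repeats x n_splits cross-validation."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        # A fresh shuffle per repeat gives different fold memberships.
        order = rng.permutation(n_samples)
        folds = np.array_split(order, n_splits)
        for k in range(n_splits):
            test_idx = folds[k]
            train_idx = np.concatenate(
                [folds[j] for j in range(n_splits) if j != k]
            )
            yield train_idx, test_idx

# Hypothetical example with 23 regions.
splits = list(repeated_kfold(n_samples=23))

# Count how often each region is held out across all splits.
held_out = np.zeros(23, dtype=int)
for train_idx, test_idx in splits:
    held_out[test_idx] += 1
```

Each region is held out exactly once per repeat, so with two repeats every region receives two out-of-sample estimates.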

At the pixel level, however, the uncertainty of the municipal water use estimates is much larger. For the NNM model, the uncertainty of per-grid-cell MWW is expressed as the CV computed across 20 replicates of the model (Figure 12). The CV can exceed 1, which indicates that the model is not very robust at the pixel level and that changes in the training data can cause marked changes in the results. The high-CV areas include most of Africa, the greater Qinghai-Tibet Plateau, the Andes Mountains and the Brazilian Highlands, as well as Papua Island, where per capita domestic water withdrawal is relatively low.
Figure 12

Pixel-level (a) standard deviation, (b) mean, and (c) CV in MWW estimation among the 20 NNM replicates.
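
The three panels of such a figure correspond to simple per-cell statistics over the stack of replicate maps. The sketch below is a minimal illustration under stated assumptions: the replicates are stacked along the first axis, the sample standard deviation (`ddof=1`) is used, and cells with a zero mean are set to NaN because the CV is undefined there. The function name and the toy values are hypothetical, not data from the study.

```python
import numpy as np

def replicate_statistics(stack):
    """Per-cell standard deviation, mean and CV over axis 0 (the replicates)."""
    stack = np.asarray(stack, dtype=float)
    sd = stack.std(axis=0, ddof=1)
    mean = stack.mean(axis=0)
    # CV = sd / mean; undefined where the mean is zero, so leave NaN there.
    cv = np.full(mean.shape, np.nan)
    np.divide(sd, mean, out=cv, where=mean > 0)
    return sd, mean, cv

# Toy stack: 20 replicates on a 1 x 3 grid (illustrative values only).
stable = 100.0 + np.linspace(-5.0, 5.0, 20)        # high use, small spread
unstable = np.concatenate([np.zeros(18), [1.0, 3.0]])  # low use, large spread
empty = np.zeros(20)                                # no estimated use
stack = np.stack([stable, unstable, empty], axis=1).reshape(20, 1, 3)

sd, mean, cv = replicate_statistics(stack)
```

In the toy example the low-use cell has a CV well above 1 even though its absolute spread is small, which mirrors the pattern described above: high CV values concentrate where withdrawals are low.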


Feature importance

Neural networks are often considered ‘black boxes’ because they are difficult or impossible to interpret. Here we use permutation feature importance (PFI) to compare the relative importance of the input variables. PFI is a global interpretation method that describes the average behavior of a machine learning model: it measures the increase in the prediction error of a model after a feature (variable) is shuffled (Molnar 2018).
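
The idea behind PFI can be illustrated with a short, model-agnostic sketch. It assumes RMSE as the error measure and averages over several shuffles; both choices, the function name `permutation_importance`, and the toy model are illustrative rather than the configuration used in this study. The sketch returns unsigned importance only and does not reproduce the signed scores reported in the figures.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE after shuffling each column of X in turn."""
    rng = np.random.default_rng(seed)

    def rmse(pred):
        return float(np.sqrt(np.mean((pred - y) ** 2)))

    baseline = rmse(predict(X))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between feature j and the response.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(rmse(predict(X_perm)) - baseline)
        scores[j] = np.mean(increases)
    return scores

# Toy model: strong dependence on feature 0, weak on feature 1, none on feature 2.
def predict(X):
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = predict(X)
pfi = permutation_importance(predict, X, y)
```

Shuffling a feature the model ignores leaves the error unchanged, so its score is zero, while the score grows with how strongly the predictions depend on the shuffled feature.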

Figures 13 and 14 show the PFI scores for the NN and NNM models, respectively. Positive scores indicate positive correlations and negative scores indicate negative correlations. For the NN model, population density (pd_15), fossil fuel CO2 emissions (co2_15), and the nighttime light index (ntl_15) are the most influential variables and are positively correlated with MWW, indicating that population and socioeconomic factors dominate when MWW is used as the direct response. Slope (slp) and the standard deviation of slope (sd_slp) are the least important variables. In contrast, topographic variables (sd_dem, slp) and temperature (ta_15) are the most important variables for the NNM model, which uses per capita MWW as the response. These variables are all negatively correlated with per capita MWW.
Figure 13

PFI in estimating MWW for NN model.

Figure 14

PFI in estimating per capita MWW for NNM model.

Figure 15

Linear coefficient for each feature of LinM model.

Compared with the NNM model, the LinM model is more intrinsically interpretable. The variable coefficients of the LinM model are shown in Figure 15. Note that all the independent variables except population (Pop) are inverse hyperbolic sine transformed and standardized. The linear coefficients of the LinM features (Figure 15), whose absolute values indicate the relative importance among features, are not always consistent with the PFI values of the same features (Figure 16), which can be confusing. A likely reason is collinearity between features, which causes mutual interference and uncertainty in the coefficients.
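
The preprocessing mentioned here can be sketched as follows. The inverse hyperbolic sine, asinh(x) = ln(x + sqrt(x^2 + 1)), behaves like a logarithm for large values but is defined at zero, which suits heavy-tailed covariates with many zero cells. The function name, the handling of constant columns, and the sample values are assumptions made for illustration.

```python
import numpy as np

def asinh_standardize(X):
    """Apply asinh column-wise, then scale each column to zero mean and unit variance."""
    Z = np.arcsinh(np.asarray(X, dtype=float))
    mu = Z.mean(axis=0)
    sigma = Z.std(axis=0)
    # Assumption: constant columns are left at zero instead of dividing by zero.
    sigma[sigma == 0] = 1.0
    return (Z - mu) / sigma

# Hypothetical covariates with zeros and a heavy right tail.
X = np.array([
    [0.0, 0.0],
    [1.0, 10.0],
    [10.0, 1.0e3],
    [100.0, 1.0e5],
])
Z = asinh_standardize(X)
```

After standardization the coefficients of a linear model are on a common scale, which is what allows their absolute values to be read as relative importance, subject to the collinearity caveat noted above.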
Figure 16

PFI in estimating per capita MWW for LinM model.


In this study, six different models, region-level linear regression model (RegLin), pixel-level direct linear-exponential model (LinExp), pixel-level direct ANN model (NN), pixel-level indirect linear model (LinM), pixel-level indirect linear-exponential model (LinExpM), and pixel-level indirect ANN model (NNM), are constructed and evaluated in estimating gridded MWW.

  • (1) The pixel-level indirect models (LinM, LinExpM, and NNM) show better performance than the other two types of models.

  • (2) NNM and LinM provide good estimates of global gridded MWW, with verification NMAE values of 31.09% and 31.34%, respectively, by global country/region. NNM is more accurate in MWW estimation, whereas LinM is more interpretable.

  • (3) Region-level uncertainty analysis indicates the robustness of our model at the region scale. For major regions, the CV from cross-validation is less than 0.2.

  • (4) However, the estimates are relatively uncertain at the pixel level, as shown by the high CV of gridded MWW across the 20 NNM replicates. Although this does not greatly affect the suitability of the methods or the reliability of the estimates in regions with relatively high municipal water use, further study is needed to understand its causes and how to reduce it.

  • (5) Future research should pay more attention to time series of MWW, which have many more application scenarios in hydrological modelling and water resources management.
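
To make the notion of a pixel-level indirect model concrete, the following is a minimal sketch of aggregated output learning for a linear per capita model. It rests on assumptions that are not taken from the text: pixel MWW is modeled as population times a linear function of the covariates, so each region total is linear in the coefficients and can be fitted by ordinary least squares on population-weighted covariate sums. All data are synthetic and noise-free, and every name is hypothetical; this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_regions, n_features = 600, 40, 2

X = rng.normal(size=(n_pixels, n_features))       # pixel covariates
pop = rng.uniform(10.0, 1000.0, size=n_pixels)    # pixel population
region = rng.integers(0, n_regions, size=n_pixels)
region[:n_regions] = np.arange(n_regions)         # every region gets a pixel

# Synthetic "truth": per capita MWW is linear in the covariates.
true_w = np.array([0.10, -0.05])
true_b = 1.0
pixel_mww = pop * (X @ true_w + true_b)

# Only region totals are treated as observed.
y_region = np.bincount(region, weights=pixel_mww, minlength=n_regions)

# Region total = sum_i pop_i * (x_i . w + b), so the design matrix is the
# per-region sum of population-weighted [covariates, 1].
design_pixel = pop[:, None] * np.column_stack([X, np.ones(n_pixels)])
A = np.zeros((n_regions, n_features + 1))
np.add.at(A, region, design_pixel)

coef, *_ = np.linalg.lstsq(A, y_region, rcond=None)
w_hat, b_hat = coef[:-1], coef[-1]

# Disaggregate: predict MWW for every pixel from the fitted coefficients.
pixel_pred = pop * (X @ w_hat + b_hat)
region_pred = np.bincount(region, weights=pixel_pred, minlength=n_regions)
```

In this noise-free setting the coefficients are recovered from the region totals alone. With real data the fit is only constrained at the region level, which is consistent with the larger pixel-level uncertainty noted in conclusion (4).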

This research was carried out with support from the National Key Research and Development Program of China (2021YFE0103900) and the National Natural Science Foundation of China (41901047).

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Agarap A. F. 2019 Deep Learning using Rectified Linear Units (ReLU). arXiv:1803.08375 [cs, stat]. http://arxiv.org/abs/1803.08375.
Beck H. E., van Dijk A. I. J. M., Levizzani V., Schellekens J., Miralles D. G., Martens B. & de Roo A. 2017 MSWEP: 3-hourly 0.25° global gridded precipitation (1979–2015) by merging gauge, satellite, and reanalysis data. Hydrology and Earth System Sciences 21 (1), 589–615. https://doi.org/10.5194/hess-21-589-2017.
Bierkens M. F. P., Bell V. A., Burek P., Chaney N., Condon L. E., David C. H., de Roo A., Döll P., Drost N., Famiglietti J. S., Flörke M., Gochis D. J., Houser P., Hut R., Keune J., Kollet S., Maxwell R. M., Reager J. T., Samaniego L., Sudicky E., Sutanudjaja E. H., van de Giesen N., Winsemius H. & Wood E. F. 2015 Hyper-resolution global hydrological modelling: what is next?: ‘Everywhere and locally relevant’. Hydrological Processes 29 (2), 310–320. https://doi.org/10.1002/hyp.10391.
Boretti A. & Rosa L. 2019 Reassessing the projections of the World Water Development Report. npj Clean Water 2 (1), 1–6. https://doi.org/10.1038/s41545-019-0039-9.
Danielson J. J. & Gesch D. B. 2011 Global Multi-Resolution Terrain Elevation Data 2010 (GMTED2010). US Department of the Interior, US Geological Survey, Washington, DC, USA.
Derval G., Docquier F. & Schaus P. 2020 An Aggregate Learning Approach for Interpretable Semi-supervised Population Prediction and Disaggregation Using Ancillary Data. In, Cham, 2020. https://doi.org/10.1007/978-3-030-46133-1_40.
Dieter C. A., Maupin M. A., Caldwell R. R., Harris M. A., Ivahnenko T. I., Lovelace J. K., Barber N. L. & Linsey K. S. 2018 Estimated use of Water in the United States in 2015. (Circular, 1441). Reston, VA.
Ellingson N., Hargiss C. L. M. & Norland J. 2019 Understanding municipal water Use and data availability: a case study across North Dakota, USA. Water Resources Management 33 (14), 4895–4907. https://doi.org/10.1007/s11269-019-02411-8.
FAO 2021 AQUASTAT – FAO's Global Information System on Water and Agriculture.
Fensholt R. & Proud S. R. 2012 Evaluation of earth observation based global long term vegetation trends – comparing GIMMS and MODIS global NDVI time series. Remote Sensing of Environment 119, 131–147. https://doi.org/10.1016/j.rse.2011.12.015.
Flörke M., Kynast E., Bärlund I., Eisner S., Wimmer F. & Alcamo J. 2013 Domestic and industrial water uses of the past 60 years as a mirror of socio-economic development: a global simulation study. Global Environmental Change 23 (1), 144–156. https://doi.org/10.1016/j.gloenvcha.2012.10.018.
Gassert F., Landis M., Luck M., Reig P. & Shiao T. 2014 Aqueduct Global Maps 2.1. Water Resources Institute (WRI), Washington, DC.
Georgoulias A. K., van der A. R. J., Stammes P., Boersma K. F. & Eskes H. J. 2019 Trends and trend reversal detection in 2 decades of tropospheric NO2 satellite observations. Atmospheric Chemistry and Physics 19 (9), 6269–6294. https://doi.org/10.5194/acp-19-6269-2019.
Gervasoni L., Fenet S., Perrier R. & Sturm P. 2018 Convolutional neural networks for disaggregated population mapping using open data. In: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), October 2018, Turin, Italy. https://doi.org/10.1109/DSAA.2018.00076.
Gleick P. H. 2012 The World's Water: the Biennial Report on Freshwater Resources. Island Press, Washington, DC.
Guo B., Chen Y., Shen Y., Li W. & Wu C. 2013 Spatially explicit estimation of domestic water use in the arid region of northwestern China: 1985–2009. Hydrological Sciences Journal 58 (1), 162–176. https://doi.org/10.1080/02626667.2012.745081.
Hanasaki N., Fujimori S., Yamamoto T., Yoshikawa S., Masaki Y., Hijioka Y., Kainuma M., Kanamori Y., Masui T., Takahashi K. & Kanae S. 2013 A global water scarcity assessment under shared socio-economic pathways – part 1: water use. Hydrology and Earth System Sciences 17 (7), 2375–2391. https://doi.org/10.5194/hess-17-2375-2013.
Harris I., Osborn T. J., Jones P. & Lister D. 2020 Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset. Scientific Data 7 (1). https://doi.org/10.1038/s41597-020-0453-3.
He X., Chaney N. W., Schleiss M. & Sheffield J. 2016 Spatial downscaling of precipitation using adaptable random forests. Water Resources Research 52 (10), 8217–8237.
Hernández-González J., Inza I., Granado I., Basurko O. C., Fernandes J. A. & Lozano J. A. 2019 Aggregated outputs by linear models: an application on marine litter beaching prediction. Information Sciences 481, 381–393. https://doi.org/10.1016/j.ins.2018.12.083.
Huang Z., Hejazi M., Li X., Tang Q., Vernon C., Leng G., Liu Y., Döll P., Eisner S., Gerten D., Hanasaki N. & Wada Y. 2018 Reconstruction of global gridded monthly sectoral water withdrawals for 1971–2010 and analysis of their spatiotemporal patterns. Hydrology and Earth System Sciences 22 (4), 2117–2133. https://doi.org/10.5194/hess-22-2117-2018.
Kato S., Rose F. G., Rutan D. A., Thorsen T. J., Loeb N. G., Doelling D. R., Huang X., Smith W. L., Su W. & Ham S.-H. 2018 Surface irradiances of edition 4.0 Clouds and the Earth's Radiant Energy System (CERES) Energy Balanced and Filled (EBAF) data product. Journal of Climate 31 (11), 4501–4527. https://doi.org/10.1175/JCLI-D-17-0523.1.
Kummu M., Taka M. & Guillaume J. H. A. 2018 Gridded global datasets for gross domestic product and human development index over 1990–2015. Scientific Data 5 (1), 180004. https://doi.org/10.1038/sdata.2018.4.
Lamorski K., Šimůnek J., Sławiński C. & Lamorska J. 2017 An estimation of the main wetting branch of the soil water retention curve based on its main drying branch using the machine learning method. Water Resources Research 53 (2), 1539–1552. https://doi.org/10.1002/2016WR019533.
LeCun Y., Bengio Y. & Hinton G. 2015 Deep learning. Nature 521 (7553), 436–444. https://doi.org/10.1038/nature14539.
Lips S. E. 2020 Towards A Global High Resolution Water Demand Dataset. MSc Thesis, Utrecht University.
Lloyd C. T., Chamberlain H., Kerr D., Yetman G., Pistolesi L., Stevens F. R., Gaughan A. E., Nieves J. J., Hornby G., MacManus K., Sinha P., Bondarenko M., Sorichetta A. & Tatem A. J. 2019 Global spatio-temporally harmonised datasets for producing high-resolution gridded population distribution datasets. Big Earth Data 3 (2), 108–139. https://doi.org/10.1080/20964471.2019.1625151.
Loshchilov I. & Hutter F. 2019 Decoupled Weight Decay Regularization. arXiv:1711.05101 [cs, math]. Available from: http://arxiv.org/abs/1711.05101.
Lundberg S. M. & Lee S.-I. 2017 A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, December 2017, Red Hook, NY, USA. https://dl.acm.org/doi/10.5555/3295222.3295230.
Matin S. S. & Pradhan B. 2021 Earthquake-induced building-damage mapping using explainable AI (XAI). Sensors 21 (13), 4489. https://doi.org/10.3390/s21134489.
Mills S., Weiss S. & Liang C. 2013 VIIRS day/night band (DNB) stray light characterization and correction. In: SPIE Optical Engineering + Applications. San Diego, California, United States. https://doi.org/10.1117/12.2023107.
Molnar C. 2018 Interpretable Machine Learning (Second Edition). Leanpub.
Musicant D. R., Christensen J. M. & Olson J. F. 2007 Supervised learning by training on aggregate outputs. In: Seventh IEEE International Conference on Data Mining (ICDM 2007), October 2007, Omaha, NE, USA. https://doi.org/10.1109/ICDM.2007.50.
Oda T. 2015 ODIAC Fossil Fuel CO2 Emissions Dataset. National Institute for Environmental Studies. https://doi.org/10.17595/20170411.001.
Pelletier C., Valero S., Inglada J., Champion N. & Dedieu G. 2016 Assessing the robustness of Random Forests to map land cover with high resolution satellite image time series over large areas. Remote Sensing of Environment 187, 156–168.
Povak N. A., Hessburg P. F., McDonnell T. C., Reynolds K. M., Sullivan T. J., Salter R. B. & Cosby B. J. 2014 Machine learning and linear regression models to predict catchment-level base cation weathering rates across the southern Appalachian Mountain region, USA. Water Resources Research 50 (4), 2798–2814. https://doi.org/10.1002/2013WR014203.
Qiu Y., Zhao X., Fan D. & Li S. 2019 Geospatial disaggregation of population data in supporting SDG assessments: a case study from Deqing County, China. ISPRS International Journal of Geo-Information 8 (8), 356. https://doi.org/10.3390/ijgi8080356.
Rasouli K., Hsieh W. W. & Cannon A. J. 2012 Daily streamflow forecasting by machine learning methods with weather and climate inputs. Journal of Hydrology 414–415, 284–293. https://doi.org/10.1016/j.jhydrol.2011.10.039.
Ritchie H. & Roser M. 2017 Water Use and Stress. Our World in Data. Available from: https://ourworldindata.org/water-use-stress.
Shiklomanov I. A., Babkin V. I. & Balonishnikov Z. A. 2011 Water resources, their use, and water availability in Russia: current estimates and forecasts. Water Resources 38 (2), 139–148. https://doi.org/10.1134/S009780781101012X.
Šimbera J. 2020 Neighborhood features in geospatial machine learning: the case of population disaggregation. Cartography and Geographic Information Science 47 (1), 79–94. https://doi.org/10.1080/15230406.2019.1618201.
Vandecasteele I., Bianchi A., Batista e Silva F., Lavalle C. & Batelaan O. 2014 Mapping current and future European public water withdrawals and consumption. Hydrology and Earth System Sciences 18 (2), 407–416. https://doi.org/10.5194/hess-18-407-2014.
Wada Y. & Bierkens M. F. P. 2014 Sustainability of global water use: past reconstruction and future projections. Environmental Research Letters 9 (10), 104003. https://doi.org/10.1088/1748-9326/9/10/104003.
Wada Y., van Beek L. P. H., Viviroli D., Dürr H. H., Weingartner R. & Bierkens M. F. P. 2011 Global monthly water stress: 2. water demand and severity of water stress: global monthly water stress, 2. Water Resources Research 47 (7). https://doi.org/10.1029/2010WR009792.
Wada Y., de Graaf I. E. M. & van Beek L. P. H. 2016a High-resolution modeling of human and climate impacts on global water resources. Journal of Advances in Modeling Earth Systems 8 (2), 735–763. https://doi.org/10.1002/2015MS000618.
Wada Y., Flörke M., Hanasaki N., Eisner S., Fischer G., Tramberend S., Satoh Y., van Vliet M. T. H., Yillia P., Ringler C., Burek P. & Wiberg D. 2016b Modeling global water use for the 21st century: the Water Futures and Solutions (WFaS) initiative and its approaches. Geoscientific Model Development 9 (1), 175–222. https://doi.org/10.5194/gmd-9-175-2016.
Wang F., Wang Y., Zhang K., Hu M., Weng Q. & Zhang H. 2021a Spatial heterogeneity modeling of water quality based on random forest regression and model interpretation. Environmental Research 202, 111660. https://doi.org/10.1016/j.envres.2021.111660.
Wang K., Tian J., Zheng C., Yang H., Ren J., Liu Y., Han Q. & Zhang Y. 2021b Interpretable prediction of 3-year all-cause mortality in patients with heart failure caused by coronary heart disease based on machine learning and SHAP. Computers in Biology and Medicine 137, 104813. https://doi.org/10.1016/j.compbiomed.2021.104813.
Wang Y., Zhou Y., Franz K., Zhang X., Ding K. J., Jia G. & Yuan X. 2021c An agent-based framework for high-resolution modeling of domestic water use. Resources, Conservation and Recycling 169, 105520. https://doi.org/10.1016/j.resconrec.2021.105520.
Yan J., Jia S., Lv A. & Zhu W. 2019 Water resources assessment of China's transboundary river basins using a machine learning approach. Water Resources Research 55 (1), 632–655. https://doi.org/10.1029/2018WR023044.
Zhang Y., Charoenphakdee N., Wu Z. & Sugiyama M. 2021 Learning from Aggregate Observations. arXiv:2004.06316 [cs, stat]. Available from: http://arxiv.org/abs/2004.06316.
Zhu H., Howes A., van Eer O., Rischard M., Li Y., Sejdinovic D. & Flaxman S. 2022a Aggregated Gaussian Processes with Multiresolution Earth Observation Covariates. arXiv:2105.01460 [stat]. Available from: http://arxiv.org/abs/2105.01460.
Zhu M., Wang J., Yang X., Zhang Y., Zhang L., Ren H., Wu B. & Ye L. 2022b A review of the application of machine learning in water quality evaluation. Eco-Environment & Health 1 (2), 107–116. https://doi.org/10.1016/j.eehl.2022.06.001.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).