Water use intensity (WUI) reveals water withdrawals with respect to economic output. Decomposing WUI into factors provides inner-system information affecting the indicator. The present study investigates variability in WUI among provinces in China by clustering the principal components of the decomposed factors. Motivated by the index decomposition method, the authors decomposed WUI into seven factors: water use in agricultural, industrial, household and ecological sectors, exploitation rate of water resources, per capita water resources and population intensity. Those seven factors condense into four principal components under application of principal component analysis. Comprehensive WUI is calculated by these four components. Then the cluster analysis is applied to get different patterns in WUI. The principal components and the comprehensive intensity are taken as cluster variables. The number of clusters is determined to be three by applying the *k*-means clustering method and the *F*-statistic value. Variability in WUI is detected by implementing three clustering algorithms, namely *k*-means, fuzzy *c*-means and the Gaussian mixture model. WUI in China is clustered into three clusters by the *k*-means clustering method. Characteristics of each cluster are analyzed.

## INTRODUCTION

The water shortage crisis has become a key issue restricting China's sustainable development. To overcome it, China's authorities advocated Water-Saving Society Construction in 1998. The government then issued total volume controls and quotas for agricultural irrigation and for high water-consumption industries. Water management decisions must also account for the amount of water required to maintain the diversity and ecological characteristics of the ecosystem. In 2012, the Chinese government proposed the development strategy of Construction of Water Ecological Civilization, which takes water ecological civilization construction as its nucleus. In the following year, the water use limiting index was set up, and this target has been included in the future medium- and long-term plans for national economic and social development.

Water use has gained substantial attention. Focus lies mainly on the relationship between economic growth and water resources. A range of single or mixed methods, such as data envelopment analysis (Bian *et al.* 2014) and stochastic frontier analysis (Ferro *et al.* 2014), have been applied. Based on structural decomposition analysis, Cazcarro *et al.* (2013) found that growth in Spanish demand would have implied an increase in water. By using index number analysis, Luyanga *et al.* (2006) found that despite considerable increase in water tariffs, sectoral water efficiency in Namibia had not improved. Wang *et al.* (2014) analyzed the relationship between economic sectors and water use based on an input–output model. Some researchers have studied the factors of water use from the perspective of water footprint (Hubacek *et al.* 2009; Wang *et al.* 2013).

The goal of this study is to find a regional pattern of water use intensity (WUI) in China. In China, the largest developing country, there are great disparities in economic development among provinces. Water resources exhibit highly heterogeneous geographical distributions. Most important of all, contradictions between environmental protection and economic development lead to variations in performance in different regions. Therefore, finding the differences in WUI among regions is vital for China's water management policy so that it can adapt to different regions. Water use in this study refers to water diverted or withdrawn from its source. In this article, water use is equivalent to water supply, assuming there is no loss of water. WUI is defined as the water use per unit of economic output, and thus it should be influenced by several factors. The goal is achieved in two steps. The first is to identify the contributions of different influencing factors and their combination to the WUI. The second is to apply cluster analysis to obtain different patterns in WUI.

The IPAT model will be extended to identify factors of WUI. The IPAT model (Chertow 2001) represents environmental impact (I) as the product of population (P), affluence (A) and technology (T). Commoner *et al.* (1971) were the first to give the algebraic formulation and to apply it to data analysis. Since not all factors are significant, principal component analysis (Syms 2008) will be applied to extract new directions from those factors.

A clustering algorithm groups, or clusters, objects according to measured or perceived intrinsic characteristics or similarity. In recent years, a variety of clustering approaches (Berkhin 2006; Shamshirband *et al.* 2015) have been developed for applications in statistical data analysis. Three clustering approaches (*k*-means, fuzzy *c*-means and the Gaussian mixture model) are applied to determine regions with similar WUI patterns. In this way, the study helps to fill the knowledge gaps on WUI in China.

## METHODOLOGY

### WUI factor identification

WUI measures the pressure of the economy on water resources in terms of the volume of water per unit of value added. Let *I* be the WUI. Then *I* = *S*/*Y*, where *S* is water use in physical units and *Y* represents the value of gross domestic product (GDP). In the decomposition of *I*, *G* signifies the water use of a region, *R* represents the total water resources, and *P* is the population size.

The formula gives a simple tool for understanding the components of water use intensity: the share of each sector in water use is called the structure distribution of water use, covering the agricultural, industrial, household and ecological structures of water use; *G*/*R* represents the exploitation rate of water resources; *R*/*P* means per capita water resources; and *P*/*Y* can be termed the population intensity of a region.
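Written out, the factors described above multiply back to the intensity itself. The identity below is a sketch of one reading consistent with the definitions of *S*, *Y*, *G*, *R* and *P*; the sector index $i$ and the symbol $S_i$ (water use in sector $i$, with $\sum_i S_i = G = S$) are assumptions introduced here for illustration and may differ from the authors' notation.

$$
I = \frac{S}{Y} = \left( \sum_{i=1}^{4} \frac{S_i}{G} \right) \cdot \frac{G}{R} \cdot \frac{R}{P} \cdot \frac{P}{Y}
$$

Under this reading the four sector shares sum to one, and the remaining ratios telescope to $G/Y$, so the product equals $S/Y$.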

### Principal component analysis of WUI

*X* is given as in (2). Since the first few principal components contain more of the variance than the later components, *X* can be reasonably well approximated by including only the first few components in (2).

### *k*-means clustering, fuzzy *c*-means clustering and the Gaussian mixture model

In general, the goal of using cluster methods is to find an optimal grouping for which the observations or objects within each cluster are similar, but the clusters are dissimilar to each other.

Mathematically, *k*-means clustering is an optimization problem: find the *k* cluster centers and assign the objects to the nearest cluster center, such that the squared error between the empirical mean of a cluster and the points in the cluster is minimized. In *k*-means clustering, clusters are represented by a central vector, which may not necessarily be a member of the data set.
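The alternating structure of this optimization can be sketched in a few lines. The code below is a minimal illustration of Lloyd's algorithm, not the authors' MATLAB implementation; the function name `kmeans`, the fixed starting centers and the toy points are assumptions made for the example.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels, centers = kmeans(points, centers=[(0.0, 0.0), (5.0, 5.0)])
```

Note that the returned centers are means of the members, so they need not coincide with any observed object.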

Fuzzy *c*-means clustering allows one object to partially belong to more than one cluster (Sadri & Burn 2011). The aim of this clustering algorithm is to minimize the total intra-cluster variance.
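Partial membership can be made concrete with the standard fuzzy *c*-means update rules. The sketch below assumes the common formulation with a fuzzifier `m` (set to 2 here); the fuzzifier value, the function names and the toy data are assumptions, since the text does not state the settings used.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Degree to which each point belongs to each cluster, for fixed centers."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point lying exactly on a center belongs fully to that center.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [u / total for u in row]
        else:
            row = [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Cluster centers as membership-weighted means of the points."""
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centers.append(
            tuple(sum(w * x for w, x in zip(weights, col)) / total
                  for col in zip(*points))
        )
    return centers


# Hypothetical one-dimensional example with two clusters.
points = [(0.0,), (1.0,), (4.0,)]
u = fcm_memberships(points, centers=[(0.0,), (4.0,)])
new_centers = fcm_centers(points, u)
```

Iterating the two functions until the memberships stop changing gives the usual algorithm; each row of `u` sums to one.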

In Gaussian mixture models, clusters are defined as objects belonging most likely to the same Gaussian distribution (Krishnamurthy n.d.); *k* clusters are found by learning *k* Gaussian distributions. In this paper, two-dimensional Gaussian mixture models are estimated for WUI.
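The phrase "belonging most likely to the same Gaussian distribution" corresponds to the expectation step of the usual fitting procedure. The sketch below is a univariate simplification (the paper estimates two-dimensional models) with mixture parameters taken as given rather than learned; all names and numbers are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(xs, weights, means, variances):
    """Posterior probability of each mixture component for each object."""
    result = []
    for x in xs:
        joint = [w * gaussian_pdf(x, mu, var)
                 for w, mu, var in zip(weights, means, variances)]
        total = sum(joint)
        result.append([j / total for j in joint])
    return result


def hard_labels(resp):
    """Assign each object to its most probable component."""
    return [max(range(len(row)), key=row.__getitem__) for row in resp]


# Hypothetical data and parameters: two components with equal weight and variance.
xs = [-1.2, -0.8, 0.1, 2.9, 3.4]
resp = responsibilities(xs, weights=[0.5, 0.5], means=[-1.0, 3.0], variances=[1.0, 1.0])
gmm_labels = hard_labels(resp)
```

In a full fit, these posterior probabilities would be used to re-estimate the weights, means and variances, and the two steps would be repeated until convergence.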

### Parameter specification

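*F*-statistic value criteria. Suppose there are *r* clusters in the *k*-means clustering, with $2 \le r \le [\sqrt{n}]$, where $n$ is the number of objects and the operator $[\cdot]$ means taking the integer part. For the $j$th cluster, let $x_i^{(j)}$ be the $i$th object, $n_j$ be the number of objects, and $\bar{x}^{(j)}$ be its center. Let $\bar{x}$ be the center of the whole data set. Then the *F*-statistic is defined as

$$
F = \frac{\sum_{j=1}^{r} n_j \lVert \bar{x}^{(j)} - \bar{x} \rVert^2 / (r-1)}{\sum_{j=1}^{r} \sum_{i=1}^{n_j} \lVert x_i^{(j)} - \bar{x}^{(j)} \rVert^2 / (n-r)}.
$$

The clustering is significant if $F > F_\alpha(r-1, n-r)$, where $\alpha$ is the significance level and $F_\alpha(r-1, n-r)$ is the upper critical value of the *F*-distribution with parameters $(r-1, n-r)$. The optimal number of clusters *k* is selected by maximizing the difference $F - F_\alpha(r-1, n-r)$.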
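A between-cluster to within-cluster variance ratio of this kind is straightforward to compute once the labels are known. The sketch below assumes the usual form of the statistic, with $r-1$ and $n-r$ degrees of freedom; the function name `f_statistic` and the toy data are hypothetical, and the critical value of the *F*-distribution would still have to come from a table or a statistics library.

```python
def f_statistic(points, labels):
    """Ratio of between-cluster to within-cluster mean squared deviation."""
    n = len(points)
    clusters = sorted(set(labels))
    r = len(clusters)
    dim = len(points[0])
    # Center of the whole data set.
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = 0.0
    within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Squared distance of the cluster center from the overall center,
        # weighted by cluster size.
        between += len(members) * sum((a - b) ** 2 for a, b in zip(center, overall))
        # Squared distances of members from their own cluster center.
        within += sum(sum((a - b) ** 2 for a, b in zip(p, center)) for p in members)
    # Undefined when within == 0 (all objects coincide with their centers).
    return (between / (r - 1)) / (within / (n - r))


# Hypothetical toy data: two tight, well-separated clusters.
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
toy_f = f_statistic(toy_points, [0, 0, 1, 1])
```

Large values indicate compact clusters that lie far apart, which is why a larger statistic is read as a better clustering.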

## EXPERIMENTAL RESULTS

### Data

Seven influencing factors (agricultural structure of water use, industrial structure of water use, household structure of water use, ecological structure of water use, exploitation rate of water resources, per capita water resources and population intensity) were used to analyze WUI in China. Data were obtained from the China Statistical Yearbook for the period 2004–2013 and cover all provinces except Hong Kong, Macao and Taiwan. Economic output is measured at current prices and population size is counted as resident population.

All provinces shared a similar decreasing trend in WUI over the study period. The original data were averaged over the period to reduce temporal variability. The data were then normalized using the *Z*-score method.
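The two preprocessing steps, averaging over years and then standardizing across provinces, can be sketched as follows. The province keys and values are made up for illustration, and the use of the sample standard deviation (divisor n − 1) is an assumption, since the text does not say which variant was used.

```python
import statistics


def z_scores(values):
    """Standardize values to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation; an assumption
    return [(v - mean) / std for v in values]


def normalize_factor(yearly_by_province):
    """Average one factor over the years, then Z-score it across provinces."""
    provinces = list(yearly_by_province)
    averages = [statistics.fmean(yearly_by_province[p]) for p in provinces]
    return dict(zip(provinces, z_scores(averages)))


# Hypothetical yearly values of a single factor for three provinces.
factor = {"A": [1.0, 3.0], "B": [4.0, 6.0], "C": [7.0, 9.0]}
normalized = normalize_factor(factor)
```

Applying `normalize_factor` to each of the seven factors in turn would give the standardized input for the principal component analysis.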

### Principal component analysis

The WUI indicators were computed at provincial level by using the normalized data. Let be the normalized value of .

Eigenvalues of the correlation matrix of the normalized factors are 2.8489, 1.4789, 1.1643, 0.7910, 0.5175, 0.1994 and 0. The values are relatively low after the fourth component. The first four eigenvalues together account for more than 85% of the total (6.2831 out of 7, about 89.8%), leading to the conclusion that the data set can be condensed into a reduced set of four new variables, which are called principal components.
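The choice of four components can be checked directly from the reported eigenvalues. The sketch below accumulates the share of the total explained by the leading eigenvalues and stops at the first count that reaches the 85% level; the function name `components_needed` is hypothetical, and the eigenvalues are assumed to be sorted in decreasing order.

```python
# Eigenvalues as reported in the text, in decreasing order.
eigenvalues = [2.8489, 1.4789, 1.1643, 0.7910, 0.5175, 0.1994, 0.0]


def components_needed(eigenvalues, threshold=0.85):
    """Smallest number of leading eigenvalues whose share reaches the threshold."""
    total = sum(eigenvalues)
    running = 0.0
    for count, value in enumerate(eigenvalues, start=1):
        running += value
        if running / total >= threshold:
            return count, running / total
    return len(eigenvalues), 1.0


count, share = components_needed(eigenvalues)
```

With these values the first three eigenvalues cover about 78% of the total and the first four about 90%, so four is the smallest count that passes the threshold.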

Component loadings are shown in Table 1. The columns refer to the principal components while the rows represent factors of WUI. The number marked with an asterisk indicates the greatest absolute value in each column.

| Variable | 1st principal component | 2nd principal component | 3rd principal component | 4th principal component |
|---|---|---|---|---|
| Agricultural structure of water use | 0.9080\* | 0.3788 | 0.0887 | −0.1066 |
| Industrial structure of water use | −0.6762 | −0.6970 | 0.1332 | 0.0964 |
| Household structure of water use | −0.7970 | 0.1918 | −0.4132 | 0.0790 |
| Ecological structure of water use | −0.4784 | 0.7985\* | −0.1953 | −0.0129 |
| Exploitation rate of water resources | −0.0968 | 0.2153 | 0.8748\* | 0.2468 |
| Per capita water resources | 0.5028 | −0.0457 | −0.3134 | 0.8027\* |
| Population intensity | 0.6640 | −0.3562 | −0.2575 | −0.2424 |

* indicates the greatest absolute value in the column.

The authors defined $t_2$ as the ecological component of water use. For $t_3$, the coefficient of the exploitation rate of water resources is the largest number, which reflects the supply ability of water resources. The authors defined $t_3$ to be the water supply capacity component. For $t_4$, the coefficient of per capita water resources is much greater than the others, which indicates the abundance of water resources, so the authors called $t_4$ the carrying capacity component.

### Variability in WUI

Regional variability in WUI was studied by the clustering method. The four principal components and the comprehensive component were selected as clustering variables. The clustering process was carried out on a PC using MATLAB R2012a. Random initialization was used.

The clustering number was determined by using *k*-means clustering. For clustering numbers 2, 3, 4 and 5, the differences between the *F*-statistic and its critical value are 2.7319, 10.7960, 7.8990 and 5.4394. The optimal value of 3 was obtained since it corresponded to the greatest difference.

Then *k*-means, fuzzy *c*-means and the Gaussian mixture model were applied to study the variability of WUI. Each clustering was repeated 50 times with the clustering number set to 3. The run with the maximal *F*-statistic was selected as the clustering result, and the *F*-statistic values were averaged to provide the final estimate.

The average *F*-statistics of the *k*-means, fuzzy *c*-means and Gaussian mixture model are 8.6609, 4.2398 and 8.8242, and the maximal *F*-statistics are 12.0543, 8.3612 and 10.3283, respectively. All clustering results are significant at the 95% confidence level. The greater the *F*-statistic, the better the performance of the clustering algorithm; by this measure, *k*-means clustering and the Gaussian mixture model performed better than fuzzy *c*-means clustering. The maximal *F*-statistic, however, was obtained in *k*-means clustering. Therefore, the final clustering result was determined by the *k*-means clustering with the maximal *F*-statistic.

Figure 3 shows comprehensive WUI for each province. The horizontal axis refers to provinces and the vertical axis represents comprehensive WUI. There is a downtrend in the WUI from cluster 1 to cluster 3. Each province in cluster 3 has negative comprehensive water use intensity. Thus provinces in cluster 3 have the best performance in WUI and those in cluster 1 have the worst.

From the above, one can see that provinces having the worst performance in WUI are located in the western and the northern parts of China. The best performing provinces are mainly developed areas (except Chongqing). But three developed provinces, namely, Guangdong, Jiangsu and Shandong are in the average-performing cluster.

## CONCLUSIONS AND DISCUSSION

### Influencing variables of WUI

Among the seven influencing variables, those in economic sectors or activities have more important roles. Equation (7) shows that agricultural structure of water use is the principal contributor to WUI, followed by those in industrial and household sectors.

Performance in WUI is almost proportional to the socio-economic status of provinces. Developed areas in China have the best performance in WUI. Ten undeveloped provinces perform worst in WUI. WUI in the other provinces is at an intermediate level.

### Contributors to regional variations in WUI

Component loadings (Table 1) show that regional variations in WUI are mainly caused by the agricultural structure of water use and the ecological structure of water use. The ecological structure of water use plays a secondary but non-negligible role.

Variability in WUI coincides with differences in ecological status. Neighboring provinces share similar WUI levels due to similar socio-economic and ecological status. WUI decreases from the northwest to the southeast, showing a highly heterogeneous geographical distribution. Because China has a vast territory, its eco-environmental problems are closely related to regional geographical characteristics (Ouyang *et al.* 2000). Provinces in cluster 1 (except Tibet) suffer most from salinization and desertification. Many provinces in cluster 2 have serious soil erosion problems. The worse the eco-environmental problems, the fewer ecological functions water use serves and the greater the WUI. Thus three developed provinces, Guangdong, Jiangsu and Shandong, unlike other developed provinces, are all in the average-performance cluster because eco-environmental problems in these provinces are moderate, neither very serious nor minor.

### Implications for the water ecological civilization of China

The decomposition of the influencing factors of WUI and the analysis of regional variability provide useful references for examining the effects of existing water-saving policies and for supporting the formulation of future policies for the Water Ecological Civilization of China.

To lower water use intensity, the agricultural sector remains the focus, as the agricultural structure of water use has the largest positive coefficient in comprehensive WUI. The share of agriculture in China's GDP dropped to 9.2% in 2014, which marks a turning point for the national economy. Thus reducing the size of the agricultural sector, promoting water-saving irrigation and implementing water-saving agricultural technologies should be placed high on the agenda of China's authorities.

The second priority in lowering water use intensity should be industrial restructuring. More effort should be devoted to reducing or controlling the size of high water-consumption industries such as the power, paper-making, metallurgy, chemicals and textile (silk) industries.

Since the ecological structure of water use plays the second most important role in regional variability, local policy makers should plan the construction of a water ecological civilization according to their own region's ecological characteristics. Although each province should implement the strictest water resources management system, the emphasis should differ between clusters. For example, provinces in cluster 1 should place more emphasis on restoration and protection of aquatic ecosystems. Provinces in cluster 2 should emphasize water and soil conservation, and those in cluster 3 the construction of water conservancy areas.

Finally, it should be pointed out that a more detailed breakdown of WUI may provide more guidance for comparing WUI among provinces. Also, other decomposition and clustering techniques could be applied to obtain comparable results if compelling reasons are provided.

## ACKNOWLEDGEMENTS

The research work is supported by the National Natural Science Foundation of China (51276081).