## Abstract

Household water consumption plays an important role in addressing water shortage and achieving sustainable water development. To identify, assess, and analyze the impact of family structure on household water consumption, this study develops a mathematical statistical method to conduct multi-scenario simulations of average annual household water consumption based on data from the 2016 China Family Panel Studies (CFPS). The Kolmogorov–Smirnov test and the independent two-sample *t*-test were used to identify the best-fitting distribution, and the probability distribution and expected value of average annual household water consumption were obtained from its probability density function. The results demonstrated that the Birnbaum–Saunders distribution was the optimal distribution; families comprising one and two generations were dominant in terms of water consumption; and the number of water-saving households was far less than that of households with high levels of water consumption. The findings of this study have valuable implications for water governance and domestic water planning.

## HIGHLIGHTS

This study develops a mathematical statistical method to conduct multi-scenario simulations of average annual household water consumption based on data from the 2016 China Family Panel Studies (CFPS).

The mathematical statistical method was used to compare the degree of fit between the samples of water consumption of various families and the distribution of various functions.

## INTRODUCTION

Water shortage is a common challenge worldwide. The sixth goal of Sustainable Development Goals (SDGs) is to ensure the availability and sustainable management of water and sanitation. Water shortage is caused not only by natural factors but also by human and social factors (Corbella & Pujol 2009). Domestic water accounts for less than 10% of all human water consumption. Domestic water includes urban and rural domestic water. Specifically, it includes water for residential, public, and livestock purposes. There are significant regional differences in global water consumption, and people from different countries have diverse levels of water consumption. China has one of the highest water consumption levels in the world. According to the 2019 China Water Resources Bulletin, China's total water consumption was 602.12 billion m^{3}, of which 87.17 billion m^{3} was domestic water, accounting for 14.4% of the total water consumption. In 2019, China's population was 1.398 billion, compared with the global population of 7.673 billion, accounting for 18.2% of the global population. China's per capita total water consumption is 431 m^{3}. The urban per capita domestic water consumption was 225 L/day, and the per capita domestic water consumption of rural residents was 89 L/day. This substantial water consumption level requires a detailed longitudinal analysis of household water consumption. Therefore, examining household water consumption in China is of great significance not only for China but also for global sustainable water development.

There is extensive literature on household water consumption regarding individual consumption preferences, willingness to pay, water-saving cognition, and institutional culture (Basu *et al.* 2017; Vieira *et al*. 2018; Garcia *et al.* 2019; Liao *et al*. 2021). The characteristics and influencing factors of household water consumption are hot topics in water governance and domestic water planning. Existing studies analyze household water consumption mainly from the supply and demand sides. From the perspective of the water demand side, consumers' attitudes toward water savings are closely related to the degree of sustainable water development. Social and demographic characteristics (Jorgensen *et al*. 2010), family characteristics, and water resource utilization and protection (Syme *et al.* 2004) are the key factors that have impacted the changes in household water consumption. Willis *et al.* (2011) used econometrics, questionnaire surveys, factor analysis, and cluster analysis to explore the relationship between water-saving attitudes and household water consumption based on 132 independent households on the Gold Coast of Australia and found that residents with positive attitudes toward water saving had significantly less water consumption than those with negative attitudes. Fielding *et al.* (2012) collected the water consumption data of 1,008 households in Australia and found that factors such as population, social psychology, behavior, occupancy rate, and infrastructure determine household water consumption. Studies have shown that demographic factors are a key element that affects household water consumption; families with a strong water-saving culture and habits consume less water. Salman found that there is a significant correlation between water consumption and variables such as household type and age, which is of great significance to the management of regional water resources. 
Dudkiewicz & Laska (2019) used the input–output analysis method to quantitatively analyze the life cycle water consumption of Chinese households from 2002 to 2015 and found that demographic changes can reduce the household water consumption of rural families but increase the household water consumption of urban families. Shan *et al.* (2015) analyzed three major factors pertinent to the behavior of domestic water consumers: end-use behaviors, socio-demographic and property characteristics, and psychosocial constructs. In addition, the impact of behavior on household water consumption has received increasing attention in recent years (Cary 2008; Shahangian *et al.* 2022).

The management and study of the water supply side have always been key to understanding household water consumption. The main influencing factors include climate change and meteorology (Slavíková *et al.* 2013), water pricing, and water policy (Jorgensen *et al*. 2010). For example, the Chinese government has regarded price as one of the important tools of water resource supervision for more than two decades. The sharp rise in water prices in China has improved water use efficiency. However, the implementation of water resource taxes has shortcomings such as unclear responsibilities, low collection rates, and poor governance capabilities (Olmstead *et al*. 2007). In 2013, China's National Development and Reform Commission pointed out that, although the wealthiest 5% of families were willing to pay three times or more for basic water consumption, about 80% of low-income families were unwilling to do so. Therefore, if the Chinese central government adopted reform measures to reduce the demand for large amounts of water, most families would be unable to meet their basic water needs because of the high prices. Jorgensen *et al.* (2014) pointed out that the changes in water policy by the water authorities in England and Wales have clearly deviated from social equity: the new charging policy is unfair to low-income families, and the charging strategy does not fully consider the differences in supply and demand caused by social and geographic disparities. Keshavarzi *et al.* (2006) quantitatively analyzed the impact of price and non-price factors on residential water demand through household survey data in 10 countries and found that there was a complementarity between household water-saving behaviors and average water prices. Martins & Fortunato (2005) analyzed panel data obtained from a 72-month survey of five communities in Portugal and established that household size was positively correlated with residential water demand. 
Although it has weak elasticity, price plays a role in water demand management. Zhang *et al.* (2017) used difference-in-differences models to evaluate China's water price reforms and found that the policy reform reduced annual residential water demand by 3–4% in the short run and by 5% in the long run. Lam (2010) found that consumers' subjectivity to water saving has a positive effect on alleviating water use in arid areas, and further asserted that household income and educational background have significant but inconsistent effects on water use.

At present, scholars generally study how consumers affect domestic water consumption by analyzing family members at the individual level. Individual differences within families do have a certain impact on household water consumption. In practice, however, most families are composed of multiple people, and the consumption of household resources, including water and energy, is the result of collective living; household water consumption is not simply the sum of individual water consumption (Ren *et al.* 2016; Hu *et al.* 2020). In addition, household water consumption depends on the size of the family and on the consumption differences between age groups. Existing studies also indicate that there are scale effects, intergenerational effects, and marginal effects on household consumption of resources such as electricity (Hu *et al.* 2020; Wu *et al.* 2021). It is reasonable to assume that the same pattern applies to water consumption. Therefore, family structure, namely, the combination of different population sizes and different generations, has a crucial impact on household water consumption.

When a family consumes water resources, it is necessary to consider not only the size of the family, but also the needs of people of different ages to achieve a better balance of household water consumption. It is important to consider the preferences and needs of various family members in daily household water consumption, especially when different generations are living together, and it is particularly important to consider the difference between generations. Therefore, studying the simulation and prediction of the impact of family structure on household water consumption is crucial for an effective analysis of the changes in household water consumption. Finding the optimal household structure can provide theoretical support and guide the rational formulation of household water control standards and policies.

Currently, there is limited literature on the impact of family structure on household water consumption examined through mathematical statistical methods. Innovative statistical and machine learning methods have been introduced in recent years to analyze household water consumption (Duerr *et al.* 2018; Dimauro *et al.* 2022). Since household water consumption has different characteristics, finding a comprehensive and precise model for simulation and prediction that includes the main factors is challenging (Fontdecaba *et al.* 2013; Chenoweth *et al.* 2016). The novelty of this study is to calculate the annual household water consumption of different family types from the perspective of household structure based on the China Family Panel Studies (CFPS) data, to obtain the optimal distribution through mathematical function simulations, and finally to estimate the probability distribution and expected value of the average annual household water consumption of different family structures through probability density functions. This study provides a demographic profile of household water consumption and preliminarily reveals the correlation between family structure and water consumption by using statistical analysis and mathematical model fitting. The data were first cleaned and statistically analyzed, then candidate models were compared using an iterative approach, and finally, the optimal Birnbaum–Saunders model was selected to fit the probability distribution of the family structure and household water consumption. The results of this study help to fully understand the impact of family structure on household water consumption, thereby providing a basis for precise policy implementation. A further contribution of this study is that it conducts an extensive literature review of household water consumption and identifies an optimal mathematical model based on a detailed database.

## MATERIALS AND METHODS

### Data source and processing

The research data were obtained from the 2016 CFPS. The CFPS is an open database provided by the Institute of Social Science Survey of Peking University, China. The CFPS is a national longitudinal survey of Chinese communities, families, and individuals. The CFPS is designed to collect individual-, family-, and community-level longitudinal data in contemporary China. The studies focus on the economic and non-economic well-being of the Chinese population, with a wealth of information covering topics such as economic activities, education outcomes, family dynamics and relationships, migration, and health. The CFPS has successfully interviewed almost 15,000 families and almost 30,000 individuals within these families, with an approximate response rate of 79%. All members over the age of 9 in a sampled household are interviewed. These individuals constitute the core respondents of CFPS.

To reduce the effects of contingency and ensure the validity and reliability of the results, the sample was preliminarily processed as follows. People in the sample who were not living at home and residents who were not economically related to the family were excluded. In addition, many families in China work in other cities all year round and only return home during holidays; these families also needed to be excluded as outliers. Because they could not be directly identified in the sample, a simple calculation was performed. As noted in the Introduction, the per capita domestic water consumption of rural residents in China is 89 L/day. For a rural family of only one person consuming at this level, taken here as the minimum standard, the annual water consumption is about 32.5 m^{3}. Therefore, families with an annual water consumption of less than 30 m^{3} were assumed not to live locally all year round and were excluded from the sample. After this data processing, 5,321 families were retained in the sample. The numbers of families with one to four generations were 1,648, 2,025, 1,262, and 386, respectively. Families with two generations accounted for the largest proportion of the sample, followed by families with one generation. Families with four generations were the least common (Table 1).
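The threshold arithmetic above can be checked with a minimal sketch. The constant names, the helper `keep_resident_households`, and the five sample values are hypothetical illustrations, not CFPS variables or data.

```python
# Annual use of a one-person rural household at 89 L/day, and the 30 m^3 filter.
RURAL_L_PER_DAY = 89     # per capita rural domestic use cited in the text
DAYS_PER_YEAR = 365
THRESHOLD_M3 = 30.0      # exclusion threshold stated in the text

# Convert litres per day to cubic metres per year (1 m^3 = 1,000 L).
annual_m3 = RURAL_L_PER_DAY * DAYS_PER_YEAR / 1000

def keep_resident_households(annual_use_m3):
    """Keep households whose annual use is at least the threshold."""
    return [v for v in annual_use_m3 if v >= THRESHOLD_M3]

# Hypothetical annual consumption values in m^3, for illustration only.
sample = [12.0, 30.0, 45.5, 183.25, 28.9]
kept = keep_resident_households(sample)

print(f"one-person annual use: {annual_m3:.3f} m^3")
print("retained:", kept)
```

The computed value of roughly 32.5 m^{3} sits slightly above the 30 m^{3} cut-off, so the filter is a little more permissive than the one-person benchmark.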

| Number of generations in family | Sample number | Proportion of total sample (%) |
|---|---|---|
| 1 | 1,648 | 30.97 |
| 2 | 2,025 | 38.06 |
| 3 | 1,262 | 23.72 |
| 4 | 386 | 7.25 |
| Total | 5,321 | 100 |


The data covered most of mainland China (Table 2), spanning 24 provinces. The sample distribution was relatively uniform. Nineteen regions had a sample size of more than 100, while in eight regions, the proportion of the sample size exceeded 5%. Gansu Province had the largest sample size (641), accounting for 12.05% of the total sample. Henan Province had the second largest sample with 604 families, accounting for 11.35% of the total sample. The sample numbers in Guangdong, Hebei, Liaoning, Shandong, and Shanghai were 423, 344, 585, 273, and 279, and they accounted for 7.95, 6.46, 10.99, 5.13, and 5.24% of the total sample, respectively. Beijing had one of the smallest samples, with only 44 families, which accounted for 0.83% of the total sample. The remaining regions together accounted for about 40% of the total sample.

| Regions | Sample number | Proportion of total sample (%) |
|---|---|---|
| Anhui | 109 | 2.05 |
| Beijing | 44 | 0.83 |
| Fujian | 59 | 1.11 |
| Gansu | 641 | 12.05 |
| Guangdong | 423 | 7.95 |
| Guangxi | 111 | 2.09 |
| Guizhou | 197 | 3.70 |
| Hebei | 344 | 6.46 |
| Henan | 604 | 11.35 |
| Heilongjiang | 173 | 3.25 |
| Hubei | 106 | 1.99 |
| Hunan | 157 | 2.95 |
| Jilin | 119 | 2.24 |
| Jiangsu | 98 | 1.84 |
| Jiangxi | 108 | 2.03 |
| Liaoning | 585 | 10.99 |
| Shandong | 273 | 5.13 |
| Shanxi | 120 | 2.26 |
| Shaanxi | 376 | 7.07 |
| Shanghai | 279 | 5.24 |
| Sichuan | 42 | 0.79 |
| Tianjin | 155 | 2.91 |
| Yunnan | 122 | 2.29 |
| Zhejiang | 76 | 1.43 |
| Others | 109 | 2.05 |
| Total | 5,321 | 100 |


### Calculation of household water consumption

*P*, the tier 1, tier 2, and tier 3 water prices were represented as *P*_{1}, *P*_{2}, and *P*_{3}, respectively. The corresponding caps of tiered water consumption at the three levels were *T*_{1}, *T*_{2}, and *T*_{3}, respectively. Thus, the equations for household water consumption at the three levels, represented by *C*_{1}, *C*_{2}, and *C*_{3}, respectively, and the annual household water consumption amount *T* were calculated as follows:
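As an illustration only, the sketch below shows one common way to invert a three-tier (increasing-block) tariff. It assumes that the known quantity is the household's annual water bill, that each tier's price applies only to the volume inside that tier, and that the caps are cumulative volumes. The function name, argument names, and all numbers are hypothetical, and this is not presented as the exact calculation used in this study.

```python
def tiered_consumption(bill, prices, caps):
    """Recover per-tier and total volume from an annual bill (illustrative).

    prices: (p1, p2, p3) unit prices of tiers 1 to 3
    caps:   (t1, t2, t3) cumulative volume caps of tiers 1 to 3
    """
    p1, p2, p3 = prices
    t1, t2, t3 = caps
    # Fill tier 1 first, then spend what is left of the bill on higher tiers.
    c1 = min(bill / p1, t1)
    rest = bill - c1 * p1
    c2 = min(rest / p2, t2 - t1)
    rest -= c2 * p2
    c3 = min(rest / p3, t3 - t2)
    return (c1, c2, c3), c1 + c2 + c3

# Hypothetical tariff: 2, 3, 4 currency units per m^3; caps at 100, 200, 1000 m^3.
tiers_small, total_small = tiered_consumption(150.0, (2.0, 3.0, 4.0), (100.0, 200.0, 1000.0))
tiers_large, total_large = tiered_consumption(700.0, (2.0, 3.0, 4.0), (100.0, 200.0, 1000.0))
print(tiers_small, total_small)
print(tiers_large, total_large)
```

A household whose bill stays within the first tier is billed at a single price, while a larger bill is split across tiers before the volumes are summed.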

Table 3 presents the descriptive statistics of the household residential water consumption.

| | Families with one generation | Families with two generations | Families with three generations | Families with four generations |
|---|---|---|---|---|
| Average | 183.25 | 191.02 | 209.09 | 178.08 |
| Standard deviation | 135.69 | 147.35 | 174.41 | 151.54 |
| Maximum | 1,926.00 | 2,372.93 | 1,442.43 | 1,488.04 |
| Minimum | 30.00 | 30.00 | 32.97 | 32.97 |
| Range | 1,896.00 | 2,342.93 | 1,409.47 | 1,455.08 |
| Median | 153.19 | 158.35 | 171.43 | 132.26 |
| Mode | 125.00 | 102.13 | 137.14 | 34.78 |
| Coefficient of variation | 0.74 | 0.77 | 0.83 | 0.85 |
| Kurtosis | 24.94 | 38.89 | 11.75 | 20.43 |
| Skewness | 3.21 | 4.13 | 2.75 | 3.44 |


Unit: m^{3}/year.
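As a quick consistency check on Table 3, the coefficient of variation can be recomputed as the ratio of the tabulated dispersion value (second row) to the average (first row). The sketch below uses only the values printed in the table; the dictionary name and keys are illustrative.

```python
# (average, dispersion) pairs copied from the first two rows of Table 3, in m^3/year.
table3 = {
    "one generation": (183.25, 135.69),
    "two generations": (191.02, 147.35),
    "three generations": (209.09, 174.41),
    "four generations": (178.08, 151.54),
}

# Coefficient of variation = dispersion / average, rounded as in the table.
cv = {family: round(spread / mean, 2) for family, (mean, spread) in table3.items()}

for family, value in cv.items():
    print(f"{family}: CV = {value}")
```

The ratios reproduce the tabulated coefficients of variation, which suggests that the dispersion row reports the sample standard deviation rather than the standard error of the mean.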

^{1}, and 7.25% of all the families with four generations in the samples of this study, which exceeds the national average. Second, considering the population structure, four-generation families generally include very old people and very young children, who consume less water, while families of three generations and below consist mainly of young adults, who consume more water. Finally, there is a diminishing marginal effect of family structure on water use, and four generations is a critical inflection point. The kurtosis coefficients of all four categories of families were greater than zero, indicating that the distribution of annual household water consumption was more sharply peaked and heavier-tailed than the normal distribution. The skewness coefficients were all greater than zero: the peak of the frequency distribution was shifted to the left, and the long tail extended to the right, which indicates a positively skewed distribution. The coefficients of variation of household water consumption of families with one to four generations were all less than one, indicating that the sample data were relatively concentrated and representative (Figure 1).
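The shape statistics discussed above can be computed as in the following minimal sketch. It uses a synthetic right-skewed sample (a lognormal draw with arbitrary parameters), not the CFPS data, so the numbers are not comparable with Table 3.

```python
import numpy as np
from scipy import stats

# Synthetic, right-skewed stand-in for annual household water use (m^3/year).
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=5.0, sigma=0.7, size=5000)

skewness = stats.skew(sample)             # > 0: long tail to the right
excess_kurtosis = stats.kurtosis(sample)  # Fisher definition: normal distribution = 0
cv = sample.std(ddof=1) / sample.mean()   # coefficient of variation

print(f"skewness={skewness:.2f}, excess kurtosis={excess_kurtosis:.2f}, CV={cv:.2f}")
```

Positive excess kurtosis under this definition signals a sharper peak and heavier tails than the normal distribution, which is why right-skewed candidates such as the Birnbaum–Saunders and lognormal distributions are natural choices.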

### Building the fitting model

*et al*. 2021; Wee *et al*. 2021). The consistency test function was as follows:

where *C*_{i} represents the household water consumption of families with different generation numbers obeying a certain distribution, *F* refers to the cumulative distribution function, *I*_{α} is the confidence interval, and *ρ* is the significance probability in the Kolmogorov–Smirnov test, *ρ* ∈ [0, 1]. A high frequency of *ρ* ≥ 0.05 indicated that the simulated distribution was accepted often: there were no significant differences between the sample distribution of the actual data and the distribution derived by the simulation, and the degree of fit was high. Conversely, a high frequency of *ρ* < 0.05, that is, a high rejection frequency, indicated significant differences between the distribution of the actual data and the simulated distribution, and the degree of fit was low. Eight distributions were tested: the Birnbaum–Saunders, Burr, Gamma, Generalized Extreme Value, inverse Gaussian, Log-Logistic, lognormal, and t Location-Scale distributions. Finally, the distributions with more rejections were directly excluded, and the three candidate distributions with the highest number of acceptances and the highest degree of fit were retained, namely, the Birnbaum–Saunders, lognormal, and inverse Gaussian distributions.
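The screening step can be sketched with SciPy as follows. This is an illustration under stated assumptions rather than the procedure used in this study: only four of the eight distributions are shown, the location parameter is fixed at zero, the data are synthetic, and `ks_fit_report` is a hypothetical helper name. In SciPy, the Birnbaum–Saunders distribution is available as `fatiguelife`.

```python
import numpy as np
from scipy import stats

# A subset of the candidate families, by their SciPy names.
CANDIDATES = {
    "Birnbaum-Saunders": stats.fatiguelife,
    "lognormal": stats.lognorm,
    "inverse Gaussian": stats.invgauss,
    "gamma": stats.gamma,
}

def ks_fit_report(data):
    """Fit each candidate by maximum likelihood and run a one-sample K-S test."""
    report = {}
    for name, dist in CANDIDATES.items():
        params = dist.fit(data, floc=0)  # location fixed at 0 (assumption)
        result = stats.kstest(data, dist.cdf, args=params)
        report[name] = (result.statistic, result.pvalue)
    return report

# Synthetic stand-in for one family type's annual water use (m^3/year).
rng = np.random.default_rng(42)
data = stats.fatiguelife.rvs(0.7, loc=0, scale=150, size=500, random_state=rng)

report = ks_fit_report(data)
for name, (d_stat, p_value) in report.items():
    print(f"{name:18s} D={d_stat:.3f} p={p_value:.3f}")
```

Because the parameters are estimated from the same data that are then tested, the resulting p-values tend to be optimistic; repeating the comparison over many simulated samples, as described in the Results, reduces the reliance on any single test.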

*Φ*(*x*) is the distribution function of the standard normal distribution. Let *X*_{1}, *X*_{2}, …, *X*_{N} be a simple random sample of size *N* from the population of the B–S distribution, with observed values *x*_{1}, *x*_{2}, …, *x*_{N}, respectively, at each point in time. Thus, the shape and scale parameters could be determined using the maximum likelihood method. The maximum likelihood function is expressed as follows:
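For reference, the standard two-parameter form of the B–S distribution is sketched below. The symbols α (shape) and β (scale) and the notation x_1, …, x_N for the observations are the conventional textbook ones and are assumed here; they may differ from the notation used in this study's own equations.

```latex
% Distribution function and density of the B-S distribution, x > 0
F(x;\alpha,\beta) = \Phi\!\left(\frac{1}{\alpha}\left[\sqrt{\frac{x}{\beta}} - \sqrt{\frac{\beta}{x}}\,\right]\right),
\qquad
f(x;\alpha,\beta) = \frac{1}{2\alpha\beta\sqrt{2\pi}}
  \left[\left(\frac{\beta}{x}\right)^{1/2} + \left(\frac{\beta}{x}\right)^{3/2}\right]
  \exp\!\left[-\frac{1}{2\alpha^{2}}\left(\frac{x}{\beta} + \frac{\beta}{x} - 2\right)\right]

% Likelihood and log-likelihood for observations x_1, ..., x_N
L(\alpha,\beta) = \prod_{i=1}^{N} f(x_i;\alpha,\beta),
\qquad
\ln L(\alpha,\beta) = -N\ln\!\left(2\alpha\beta\sqrt{2\pi}\right)
  + \sum_{i=1}^{N}\ln\!\left[\left(\frac{\beta}{x_i}\right)^{1/2} + \left(\frac{\beta}{x_i}\right)^{3/2}\right]
  - \frac{1}{2\alpha^{2}}\sum_{i=1}^{N}\left(\frac{x_i}{\beta} + \frac{\beta}{x_i} - 2\right)
```

The density follows from differentiating the distribution function, and maximizing the log-likelihood over α and β gives the maximum likelihood estimates of the shape and scale parameters.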

## RESULTS

### Characteristics of household water consumption

The Birnbaum–Saunders distribution, inverse Gaussian distribution, and lognormal distribution were chosen as the candidate distributions after comparing the degree of fit of various distributions of household water consumption.

### Analysis of distribution simulation

To obtain the optimal distribution, further optimization of the three candidate distributions was required. The specific optimization method is as follows:

First, the Monte Carlo method was used to perform repeated simulations using the sample data of household water consumption. Then, the consistency between the sample data and the simulated data distribution was tested using the two-sample Kolmogorov–Smirnov test and the independent two-sample *t*-test. Five groups of Kolmogorov–Smirnov tests were conducted based on the data of household water consumption and the three candidate distributions; each group of simulations was performed 100 times. This large number of simulation cycles helped to ensure the stability and reliability of the calculation results. Table 4 shows the number of simulations with *P* ≥ 0.05. In each simulation, if *P* ≥ 0.05, the simulated distribution was accepted. Consequently, the degree of acceptance of a candidate distribution depended on the number of simulations with *P* ≥ 0.05. For example, the first entry in Table 4 indicates that, in the first simulation cycle for families with one generation using the Birnbaum–Saunders distribution, 95 of the 100 simulations yielded *P* ≥ 0.05. Based on the principle that the higher the number of acceptances, the better the simulation result, the acceptances of all three candidate distributions differed very little, and the degree of fit was high for each.
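One way to sketch the acceptance-counting procedure is shown below, assuming that each simulation draws a sample of the same size as the data from the fitted candidate distribution and compares it with the data by a two-sample Kolmogorov–Smirnov test. The helper name `acceptance_count`, the fixed zero location, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def acceptance_count(data, dist, n_sim=100, alpha=0.05, seed=0):
    """Count simulations whose two-sample K-S p-value is at least alpha."""
    rng = np.random.default_rng(seed)
    params = dist.fit(data, floc=0)  # location fixed at 0 (assumption)
    accepted = 0
    for _ in range(n_sim):
        simulated = dist.rvs(*params, size=len(data), random_state=rng)
        if stats.ks_2samp(data, simulated).pvalue >= alpha:
            accepted += 1
    return accepted

# Synthetic stand-in for one family type's annual water use (m^3/year).
rng = np.random.default_rng(1)
data = stats.fatiguelife.rvs(0.7, scale=150, size=300, random_state=rng)

# Three simulation cycles of 100 simulations each for one candidate distribution.
counts = [acceptance_count(data, stats.fatiguelife, seed=cycle) for cycle in range(3)]
print(counts)
```

Each entry of `counts` plays the role of one cell in Table 4: the number of accepted simulations out of 100 for a given family type, distribution, and cycle.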

| Family types | Distribution functions | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Families with one generation | B–S | 95 | 94 | 97 | 97 | 94 | 91 | 93 | 95 | 94 | 97 |
| | I–G | 93 | 91 | 89 | 97 | 91 | 87 | 89 | 93 | 96 | 90 |
| | Lognormal | 90 | 91 | 92 | 94 | 94 | 90 | 88 | 90 | 91 | 90 |
| Families with two generations | B–S | 94 | 94 | 89 | 87 | 96 | 94 | 95 | 91 | 92 | 93 |
| | I–G | 93 | 86 | 88 | 93 | 93 | 89 | 91 | 87 | 89 | 89 |
| | Lognormal | 93 | 90 | 93 | 88 | 89 | 95 | 91 | 91 | 94 | 91 |
| Families with three generations | B–S | 88 | 97 | 96 | 92 | 93 | 93 | 95 | 93 | 91 | 96 |
| | I–G | 94 | 95 | 90 | 88 | 94 | 89 | 92 | 95 | 91 | 97 |
| | Lognormal | 94 | 94 | 89 | 93 | 99 | 95 | 93 | 92 | 95 | 91 |
| Families with four generations | B–S | 100 | 97 | 95 | 99 | 98 | 98 | 93 | 97 | 97 | 94 |
| | I–G | 96 | 93 | 95 | 94 | 100 | 97 | 94 | 99 | 98 | 98 |
| | Lognormal | 95 | 98 | 97 | 94 | 96 | 100 | 97 | 98 | 94 | 99 |


For the three candidate distributions, the independent two-sample *t*-test was used to select the best distribution for the sample data of families with one to four generations. Table 5 shows the two-tailed significance coefficient of each pair of candidate distributions, i.e., the *P* value obtained. Here, *P*_{BI} refers to the two-tailed significance coefficient of the independent *t*-test between the Birnbaum–Saunders distribution and the inverse Gaussian distribution, *P*_{BL} to that between the Birnbaum–Saunders distribution and the lognormal distribution, and *P*_{IL} to that between the inverse Gaussian distribution and the lognormal distribution. If the *P* value was less than 0.05, there was a significant difference between the two distributions. If the *P* value was less than 0.01, the difference was considered extremely significant. If the *P* value was greater than 0.05, there was no significant difference between the two distributions.

For families with one generation, Table 5 shows a significant difference between the Birnbaum–Saunders distribution and the inverse Gaussian distribution (*P*_{BI} = 0.02). The average number of simulated acceptances of the Birnbaum–Saunders distribution was 86.18, which was higher than that of the inverse Gaussian distribution (83.36). Thus, the Birnbaum–Saunders distribution was superior to the inverse Gaussian distribution. The value of *P*_{BL} was 0.00 (less than 0.01), indicating an extremely significant difference between the Birnbaum–Saunders distribution and the lognormal distribution. As the average number of acceptances for the Birnbaum–Saunders distribution (86.18) was higher than that for the lognormal distribution (82.82), the Birnbaum–Saunders distribution was regarded as superior to the lognormal distribution. For families with two generations, *P*_{BI} was less than 0.05 and the number of simulated acceptances of the Birnbaum–Saunders distribution (84.27) was higher than that of the inverse Gaussian distribution (81.82), which indicates that the Birnbaum–Saunders distribution was superior to the inverse Gaussian distribution. For families with three or four generations, the *P* values were all greater than 0.05, indicating that there was no significant difference among the three candidate distributions; all of them could be considered suitable. Based on the simulations of families with one or two generations, it can be concluded that the Birnbaum–Saunders distribution has the highest degree of fit for the distribution pattern of household water consumption for families with different generations. Thus, the Birnbaum–Saunders distribution was the optimal distribution in this study.
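The pairwise comparison can be illustrated with SciPy's `ttest_ind`, assuming that the test compares the per-cycle acceptance counts of two distributions. The sketch uses the ten cycles listed in Table 4 for families with one generation. The averages in Table 5 differ from the means of those ten cycles, so the sketch illustrates the procedure only and is not expected to reproduce Table 5.

```python
from scipy import stats

# Acceptance counts for families with one generation, copied from Table 4.
bs_counts = [95, 94, 97, 97, 94, 91, 93, 95, 94, 97]  # Birnbaum-Saunders
ig_counts = [93, 91, 89, 97, 91, 87, 89, 93, 96, 90]  # inverse Gaussian

# Independent two-sample t-test (two-tailed, equal variances assumed by default).
result = stats.ttest_ind(bs_counts, ig_counts)

mean_bs = sum(bs_counts) / len(bs_counts)
mean_ig = sum(ig_counts) / len(ig_counts)

print(f"mean B-S={mean_bs:.1f}, mean I-G={mean_ig:.1f}")
print(f"t={result.statistic:.3f}, two-tailed P={result.pvalue:.3f}")
```

A two-tailed *P* value below 0.05 together with a higher mean acceptance count is the pattern that leads the text to prefer one distribution over another.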

| | Families with one generation | Families with two generations | Families with three generations | Families with four generations |
|---|---|---|---|---|
| Number of acceptances in the B–S distribution | 86.18 | 84.27 | 85.18 | 88.36 |
| Number of acceptances in the I–G distribution | 83.36 | 81.82 | 84.36 | 88.00 |
| Number of acceptances in the lognormal distribution | 82.82 | 83.36 | 85.27 | 88.36 |
| *P*_{BI} | 0.02 | 0.04 | 0.49 | 0.70 |
| *P*_{BL} | 0.00 | 0.39 | 0.94 | 1.00 |
| *P*_{IL} | 0.61 | 0.13 | 0.44 | 0.69 |


Currently, the B–S distribution is widely used in statistical reliability analysis. Its characteristics are as follows: (a) it arises from the process of fatigue; (b) its density function is skewed to the right, which is consistent with the descriptive statistics of the sample data of household water consumption and further illustrates the advantages of the B–S distribution in this context; (c) its failure rate function has an inverted bathtub shape; and (d) its scale parameter is its median.
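Properties (b) and (d) can be checked numerically with SciPy, where the B–S distribution is implemented as `fatiguelife` with the shape passed as the first argument and the scale as `scale`. The parameter values below are hypothetical and are not the estimates of this study.

```python
from scipy import stats

# Hypothetical shape and scale parameters, for illustration only.
alpha, beta = 0.7, 150.0
bs = stats.fatiguelife(alpha, scale=beta)

median = bs.median()                      # property (d): equals the scale parameter
skewness = float(bs.stats(moments="s"))   # property (b): positive for a right skew

print(f"median={median:.2f}, scale={beta:.2f}, skewness={skewness:.2f}")
```

Because the median equals the scale parameter, the fitted scale can be read directly as the typical annual water consumption of a family type.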

### Probability distribution and expectation of household water consumption

When the number of generations in a family increased from one to three, household water consumption increased gradually. The highest household water consumption, 215.16 t, was observed in families with three generations. When the number of generations in a family further increased to four, household water consumption decreased greatly. Household water consumption of families with four generations was the lowest (173.37 t). Thus, families with three generations can be regarded as high-consumption families, while families with four generations are water-conserving families. Household water consumption of families with one and two generations lies between that of the other two types of families.

The probability of the occurrence of high levels of water consumption increases from families with one generation to families with two generations and then decreases from families with two generations to families with four generations. The probability of the occurrence of high levels of water consumption in families with two generations was the highest (38%), and that in families with four generations was the lowest (0.06). It can be inferred that families in China currently tend to have fewer generations living together or to consist of members of the same generation.

Here, *x* represents the sample data of household water consumption for families with one to four generations. The integrals of the product of *x* and the fitted probability density function are the mathematical expectations of household water consumption, and Figure 5 shows the calculated results. The average water consumption per generation decreased gradually as the number of generations increased and reached the lowest value in families with four generations. The average water consumption per household increased gradually for families with one to three generations and reached the highest value in families with three generations. When the number of generations increased from three to four, the average household water consumption decreased significantly and reached the lowest value. The expected household water consumption of families with one to four generations is consistent with the probability distribution, indicating that the fitted Birnbaum–Saunders distribution is reasonable and feasible for describing the effect of the number of generations in families on household water consumption.
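The expectation described above is the integral of *x* multiplied by the density. As a hedged illustration, the sketch below evaluates that integral numerically for a B–S density and compares it with the known closed-form mean. The parameters `ALPHA` and `BETA`, the midpoint rule, and the truncation of the integral at 50 times the scale parameter are all assumptions made for this example and are not taken from the study.

```python
import math

def bs_pdf(x, alpha, beta):
    """Birnbaum-Saunders density for x > 0 (shape alpha, scale beta)."""
    a = math.sqrt(x / beta)
    b = math.sqrt(beta / x)
    z = (a - b) / alpha
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    return (a + b) / (2.0 * alpha * x) * phi

def expected_value(alpha, beta, upper_mult=50.0, steps=100_000):
    """Midpoint-rule approximation of E[X] = integral of x * f(x) over (0, upper)."""
    upper = upper_mult * beta  # assumed cut-off; the tail beyond it is negligible here
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h      # midpoint of the i-th sub-interval
        total += x * bs_pdf(x, alpha, beta)
    return total * h

# Hypothetical parameters for illustration only (not values from the paper).
ALPHA, BETA = 0.8, 180.0

numeric_mean = expected_value(ALPHA, BETA)
closed_form_mean = BETA * (1.0 + ALPHA ** 2 / 2.0)

print(numeric_mean, closed_form_mean)
```

Agreement between the two values confirms that the numerical integral reproduces the theoretical expectation. With parameters fitted separately for each family type, the same routine would give one expected value per type.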

In general, families with one and two generations were dominant in terms of household water consumption, with a cumulative probability of about 67.5%. However, the number of water-saving households was far less than that of households with high levels of water consumption. In conclusion, it seems that the current dominant family structure and household lifestyle are not conducive to water saving.

### Reliability test

*p* > 0, the optimal Birnbaum–Saunders distribution is effective and feasible.
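A goodness-of-fit check of this kind can be illustrated with a one-sample Kolmogorov–Smirnov test, the test named in the abstract. The sketch below is an assumption about how such a check might look and is not the authors' procedure. It draws a synthetic sample from a B–S distribution with hypothetical parameters, computes the K–S statistic against the same distribution, and converts it to a *p*-value using the asymptotic Kolmogorov series. When the parameters are estimated from the same sample, this *p*-value is only approximate.

```python
import math
import random

def bs_cdf(x, alpha, beta):
    """Birnbaum-Saunders CDF for x > 0 (shape alpha, scale beta)."""
    z = (math.sqrt(x / beta) - math.sqrt(beta / x)) / alpha
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bs_sample(n, alpha, beta, rng):
    """Draw n B-S variates from standard normal draws Z:
    X = beta * (alpha*Z/2 + sqrt((alpha*Z/2)^2 + 1))^2."""
    out = []
    for _ in range(n):
        w = alpha * rng.gauss(0.0, 1.0) / 2.0
        out.append(beta * (w + math.sqrt(w * w + 1.0)) ** 2)
    return out

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and the hypothesised CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_pvalue_asymptotic(d, n, terms=100):
    """Asymptotic p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 n d^2)."""
    lam2 = n * d * d
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam2)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))

# Hypothetical parameters and synthetic data (not values from the paper).
ALPHA, BETA, N = 0.8, 180.0, 500
rng = random.Random(2016)
sample = bs_sample(N, ALPHA, BETA, rng)

d_stat = ks_statistic(sample, lambda x: bs_cdf(x, ALPHA, BETA))
p_value = ks_pvalue_asymptotic(d_stat, N)

print(d_stat, p_value)
```

A *p*-value above the chosen significance level means the hypothesised distribution is not rejected for the sample.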

## CONCLUSION

Based on the 2016 CFPS data, this study classified the sample families according to the number of generations in a family. A mathematical statistical method was then used to compare the degree of fit between the water consumption samples of the various family types and various distribution functions, which yielded three candidate distributions. The optimal distribution was then selected by further comparison based on the test results. Finally, the probability distribution of the average annual household water consumption of the four types of families and the expected values of the household average and the generation average were calculated using the probability density function of the optimal distribution.

The results of this study demonstrated that the Birnbaum–Saunders distribution is the optimal distribution for families with one to four generations. The average household water consumption was largest in three-generation families and smallest in four-generation families. The number of water-saving households (four-generation families) was far smaller than the number of households with high levels of water consumption (three-generation families). Families with one and two generations dominated household water consumption, and their water consumption accounted for about 67.5% of the total water consumption.

This study showed that family structure is an important factor influencing household water consumption and that the Birnbaum–Saunders distribution provides the best fit for describing the relationship between family structure and household water consumption. The prediction of household water consumption is complicated. This study approaches it from the perspective of household structure and highlights the correlation between family structure and water consumption through statistical analysis and numerical simulation. Policymakers can use these predictions to devise effective policies that intervene in household water consumption and promote sustainable water development. For example, the results indicate that the number of water-saving households was far smaller than the number of households with high levels of water consumption. Therefore, the Chinese government needs to strengthen education and policy incentives for household water saving. In addition, the results show that the highest household water consumption in China is currently observed in families with three generations. The Chinese government needs to pay attention to this phenomenon, which is shaped by many factors, including the impact of the one-child policy, the challenge of ageing, and the influence of China's traditional 'big family' culture.

This study has some limitations that should be addressed in future research. First, the data in this study are from China. Whether the conclusions apply to other countries, especially developing countries, needs to be verified with data from those countries. Second, the methods in this study are statistical analysis and mathematical modeling, and the results need to be verified by other methods. Finally, policy interventions are important factors that need further evaluation, such as the future impact of China's newly introduced three-child policy.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## ACKNOWLEDGEMENTS

All co-authors deeply mourn the passing of the first author Ms. Mei Wang. May her soul rest in peace and light forever.

## Author notes

^{†} Deceased