Abstract
Population movement, such as commuting, can affect water supply pressure and efficiency in modern cities. However, there is a gap in the research concerning the relationship between water use and population mobility, which is of great significance for urban sustainable development. In this study, we analyzed the spatial–temporal dynamics of the population and its underlying mechanisms using multi-source geospatial big data, including Baidu heat maps (BHMs), land use parcels, and points of interest (POIs). Combined with water consumption, sewage volume, and river depth data, the impact of population dynamics on water use was investigated. The results showed obvious differences in population dynamics between weekdays and weekends, with a ratio of 1.11 for the total population. Spatially, the population was concentrated mainly in areas associated with enterprises, industries, shopping, and leisure activities during the daytime, while at nighttime it was centered primarily around residential areas. Moreover, the population showed a significant impact on water use, resulting in co-periods of 24 h and 7 days, and both water consumption and wastewater production were observed to be proportional to the population density. This study can offer valuable implications for urban water resource allocation strategies.
HIGHLIGHTS
Analysis of spatiotemporal population distribution and mobility based on the Baidu heat map.
Population dynamics mechanisms related to land use.
A novel idea exploring the impact of population dynamics on water use.
Valuable implications for optimizing and controlling water supply and wastewater treatment systems.
INTRODUCTION
Rapid urbanization worldwide has led to a significant influx of people into cities, presenting new issues in urban planning and the environment. These issues include unbalanced social development, water scarcity, traffic congestion, and air pollution (Bao et al. 2023). In particular, water scarcity has been exacerbated by the increasing water demand resulting from urban population growth, compounded by irrational water resource utilization (Sadeghi et al. 2023). Within a city, an unevenly distributed population produces spatially varying water demand; as a result, highly populated regions face tremendous pressure on water supply. For the efficient management of water resources, it is crucial to investigate the relationship between population distribution and water usage patterns, as this informs water supply planning and supports sustainable city development (Bakchan et al. 2022).
Several factors can affect water consumption, such as population, climate, seasonality, economy, and water price (Rasifaghihi et al. 2020; Wang et al. 2020; Yu et al. 2023). Among them, the importance of the population has been emphasized by previous studies (Atinkpahoun et al. 2018). Water consumption is affected not only by the total population but also by its dynamic behavior. For example, Yu et al. (2023) suggested that water consumption and wastewater treatment plant (WWTP) discharges in a certain area are closely related to the total population of the area, which varies over time due to human movement. Smolak et al. (2020) considered the impact of total population on water consumption. Some previous studies reported the differences in urban water consumption before and after the coronavirus (COVID-19) pandemic (Kalbusch et al. 2020; Bakchan et al. 2022), while others compared the differences in population aggregation patterns (Jia et al. 2020; Zeng et al. 2021; Zhang et al. 2022). These studies indirectly showed that changes in population behavior had a certain impact on water use. In addition, studies of water use patterns in residential areas have shown that water use and wastewater discharge are related to the commuting and lifestyle habits of the population (Atinkpahoun et al. 2018). It has been reported that in residential areas, water consumption can peak just before office hours, decline in the afternoon, and then increase again during the evening (Kavya et al. 2023). The mechanisms linking population dynamics and water use are a subject worthy of further research, with significant implications for optimizing and controlling water supply and wastewater treatment systems in high-population-density cities.
Nevertheless, although the relationship between the total population and water use has been demonstrated, there is still a gap in the precise quantitative analysis of the impact of population dynamics on water use. Similarly, population dynamics are not sufficiently taken into account in water use predictions. These shortcomings can increase the operating costs and energy consumption of water supply and WWTP systems, and may even lead to pollution of the water environment if sewage cannot be treated effectively. The gap is primarily owing to the difficulty of characterizing the dynamics of population activity. In recent years, the rapid development of mobile devices and the abundance of location-based services (LBS) big data have enabled researchers to analyze human mobility patterns with finer temporal resolution (Gu et al. 2018). This also makes it possible to study the impact of population mobility on urban water consumption and WWTP emissions. The Baidu heat map (BHM) is one of the most popular sources of dynamic LBS data and has been widely applied in studies of the behavioral characteristics and spatiotemporal distribution of urban residents due to its good accessibility (Li et al. 2019; Bao et al. 2023). Previous studies have shown that the spatiotemporal dynamic distribution of populations is influenced by residents' behaviors and is subject to specific spatiotemporal rules, and that most behaviors are predictable (Song et al. 2010; Zhang et al. 2023). For example, residents frequently commute among a few fixed locations such as residences and workplaces (Kung et al. 2014), and they may prefer to go to leisure places or rest at home on weekends. The spatiotemporal changes in population density are influenced by differences in population activities and spatial functions (Shi et al. 2020), which can be analyzed by combining geospatial data, such as point of interest (POI) data. With these multi-source big data and water use data, the relationship between water use and population dynamics can be explored.
Based on these contexts, this study aims to analyze the influence of spatiotemporal dynamic changes in the population on urban water usage using the northern part of the Haidian District in Beijing, which is one of the biggest cities in China, as the study area. Using BHM data, we analyzed the temporal and spatial distribution characteristics of the population on weekdays and weekends. Subsequently, combined with POI data, we investigated the underlying mechanisms of population dynamics by analyzing the relationship between population and land use. Then, water consumption and sewage volume data were used to analyze the relationship between water use and population dynamics. The main contribution of this study to the current knowledge is the innovative empirical analysis of the relationships between spatiotemporal population dynamics and urban water usage using multi-source geospatial big data to provide valuable insights for the optimal allocation of urban water resources.
The remainder of this article is organized as follows. The sections ‘Materials and Methods’ and ‘Methods’ introduce the study area, data sources, and the employed methods. The section ‘Results’ sequentially analyzes the spatiotemporal dynamics of the population, the relationship between population density and land use, and the relationship between water use and population. The section ‘Discussions’ discusses urban water usage under the influence of population dynamics. Finally, the conclusions are given in the last section.
MATERIALS AND METHODS
Study area
Data and preprocessing
BHM data
Baidu is the largest search engine company in China with a massive user base and influence, holding a 66% market share in the Chinese search engine market. As of 2022, over 90% of internet users in China are Baidu users. In 2011, Baidu launched a big data visualization product called BHM (Li et al. 2019). This product utilizes location information from users accessing Baidu services such as Baidu Maps, Baidu Search, and Baidu Weather (Lyu & Zhang 2019). The heat value of human flows in different areas is calculated and visualized on Baidu Maps. The BHM is updated every 15 min, allowing for a real-time representation of crowd heat in specific areas and providing dynamic population distribution information (Zhang et al. 2020). Due to these advantages, it has been widely used for urban research.
This study collected the original BHM data for a duration of 14 days, from February 21, 2022, to March 6, 2022. The data were collected at hourly intervals, resulting in a total of 336 BHM images. The acquired raw data underwent image processing, georeferencing, and projection conversion, and were saved in the GeoTIFF format with RGB (Red, Green, Blue) bands. The raw data depict variations in population concentration across different regions using a range of colors: black, blue, light blue, cyan, green, yellow, orange, and red. These colors correspond to an ascending trend in population density. To perform quantitative analysis on the regional heat data, we assigned integer values from 0 to 7 to these eight colors in ascending order, representing the heat values. After incorporating geographical coordinate information, the processed data were stored in the GeoTIFF format. Subsequently, the vector boundary file of the study area was used to clip the rasters, yielding the heat map data of the study area. The heat values of each raster pixel were then identified and stored in another database. It should be noted that the collected data raise no personal privacy issues.
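The color-to-heat-value assignment described above can be sketched as a nearest-color lookup. This is only an illustration: the RGB triplets in `REFERENCE_COLOURS` are placeholder assumptions rather than the palette actually used by Baidu, and the function names `heat_value` and `classify_image` are hypothetical.

```python
# Hypothetical reference colours (R, G, B) for the eight heat classes, ordered
# from lowest (0) to highest (7) density. These values are placeholders and
# would need to be replaced by the colours sampled from real BHM images.
REFERENCE_COLOURS = [
    (0, 0, 0),        # 0: black
    (0, 0, 255),      # 1: blue
    (100, 180, 255),  # 2: light blue
    (0, 255, 255),    # 3: cyan
    (0, 255, 0),      # 4: green
    (255, 255, 0),    # 5: yellow
    (255, 165, 0),    # 6: orange
    (255, 0, 0),      # 7: red
]


def heat_value(pixel):
    """Return the index (0-7) of the reference colour closest to `pixel`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(
        range(len(REFERENCE_COLOURS)),
        key=lambda i: squared_distance(pixel, REFERENCE_COLOURS[i]),
    )


def classify_image(rgb_rows):
    """Convert a 2-D grid of RGB tuples into a grid of integer heat values."""
    return [[heat_value(pixel) for pixel in row] for row in rgb_rows]


if __name__ == "__main__":
    demo = [[(0, 0, 0), (250, 10, 5)], [(10, 250, 240), (250, 170, 10)]]
    print(classify_image(demo))
```

Matching by nearest color rather than by exact equality is a design choice that tolerates small color shifts introduced by image compression; whether that tolerance is needed depends on how the images were captured.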
Land use data
In this study, we used POI data and land use parcel data to assess land use patterns. POI data refer to specific locations of various geographic entities and typically include attributes such as the name, address, coordinates, and category. POI categories resemble land use categories and effectively depict people's preferences and social functions. The density of different POI types can serve as an indicator of land use and functional zoning (Wu et al. 2018).
In our study, we obtained 15,532 POIs in 2022, including 23 major categories (e.g., enterprises, commercial houses, tourist attractions, finance, and insurance services). Referring to the Chinese Land Use Classification Criteria (GB/T21010-2017) and considering the characteristics of the study area (Bao et al. 2023), we identified the main POI categories that were most relevant to human behavior patterns. These categories were further classified into commercial (CPOI), residential (RPOI), shopping (BPOI), tourist attraction (TPOI), education (EPOI), science (SPOI), and government (GPOI) categories. Specifically, the CPOI includes enterprises, finance, and insurance services; the RPOI represents residential areas; the BPOI signifies shopping services; the TPOI denotes tourist attractions; the SPOI includes science and culture services; the EPOI comprises a variety of schools; and the GPOI represents government agencies and social groups.
Water quantity data
Because daily water consumption data were not accessible, only the annual water consumption of each water user in 2021 was obtained. However, for developed regions with a high wastewater collection ratio, the daily wastewater volume received by WWTPs on dry days can provide a measurable indicator of water consumption. Therefore, under the assumption of negligible leakage in the sewage network, the daily wastewater volume was used as a proxy for daily water consumption.
The daily sewage volume data recorded by the flow meters at the Yongfeng WWTP from November 2021 to March 2022 were collected. The Yongfeng WWTP is located in XBW and primarily handles wastewater from the town. These data provide valuable insights into the patterns of wastewater generation, capacity requirements of the wastewater treatment infrastructure, the need for planning expansions or upgrades, and the overall efficiency of the system. The sewage volume can vary based on factors such as population size, industrial activities, and water usage patterns. Considering the high population density and significant population mobility in XBW, it was selected as a representative area to analyze the relationship between population mobility and water usage.
In addition, since the treated water from the Yongfeng WWTP is discharged into the Nansha River, we installed a water level meter in the river at the outfall of the Yongfeng WWTP to monitor water level changes. The monitoring frequency was set at 5-min intervals, and the data were collected from November 26, 2021 to February 9, 2022. During this period, there was almost no precipitation in Beijing, and variations in river water levels were primarily influenced by the volume of recycled water discharged from upstream sewage plants after treatment.
METHODS
The TPI and the PDI
Spatial autocorrelation analysis
This study computed the PDI for 200 m × 200 m spatial units during the active period of 8:00–23:00 and then carried out spatial autocorrelation analysis to investigate the spatial clustering pattern of the population. Moran's I index (Moran 1948) is an index for spatial autocorrelation analysis that is capable of evaluating the dependency and heterogeneity of a certain variable in space. It can be categorized into global Moran's I and local Moran's I according to its application manner.
Both indicators have values ranging from −1 to 1. Under a certain level of significance, a positive value of the global Moran's I indicates positive spatial autocorrelation, while a negative value suggests negative spatial autocorrelation. When the value approaches 0, it means the absence of spatial autocorrelation. The local Moran's I values can be used to distinguish four spatial types. A positive value indicates High–High (HH) clustering or Low–Low (LL) clustering, where the local average is either higher or lower than the overall average. Conversely, a negative value suggests High–Low (HL) clustering or Low–High (LH) clustering, where high values are surrounded by low values or vice versa (Zeng et al. 2021).
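As an illustration of the global index, the sketch below computes Moran's I on a small regular grid. It assumes binary rook-contiguity weights (cells sharing an edge are neighbors), which is one common choice and not necessarily the weighting used in this study; the function name `global_morans_i` is hypothetical.

```python
def global_morans_i(grid):
    """Global Moran's I for a 2-D grid of values.

    Assumption: binary rook-contiguity weights, i.e. w_ij = 1 when cells i and
    j share an edge and 0 otherwise.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    if denominator == 0:
        raise ValueError("Moran's I is undefined for a constant field")

    numerator = 0.0
    weight_sum = 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    numerator += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1

    return (n / weight_sum) * (numerator / denominator)


if __name__ == "__main__":
    clustered = [[1, 1, 0, 0]] * 4          # high values grouped on one side
    checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
    print(global_morans_i(clustered))       # positive: similar values adjoin
    print(global_morans_i(checkerboard))    # negative: unlike values adjoin
```

The two toy grids mirror the interpretation given above: a field whose high values sit next to each other yields a positive index, while an alternating field yields a negative one.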
Kernel density estimation
Random forest
Random forest (RF) is a tree-based machine learning model that incorporates the principles of ensemble learning and randomness (Breiman 2001). It operates by constructing multiple decision trees by randomly selecting m sub-samples and n sub-features from the original dataset. Each tree is trained independently on different subsets of the data and random subsets of the features. The final prediction is made by combining the predictions of individual trees, typically through averaging or voting. The randomness makes the model more robust and less prone to overfitting. The randomness in RF does not only come from randomly selecting a subset of data samples, i.e., bagging, but also from the feature randomness that enhances the diversity among the trees. By leveraging the strength of multiple trees and the inherent randomness, RF can effectively capture complex relationships and interactions in the data. It is particularly suitable for handling nonlinear and high-dimensional datasets (Bao et al. 2023). Recognizing the good performance of the RF model, we used it to capture the intricate and nonlinear relationship between population distribution and land use types across different periods.
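The two sources of randomness described above can be sketched as follows. This is not a full RF implementation: it only shows how the rows and features for one tree might be drawn and how per-tree outputs are averaged for regression. The function names and the toy sizes are assumptions, and many implementations draw the feature subset at every split rather than once per tree.

```python
import random


def draw_tree_inputs(n_samples, n_features, m_rows, n_feats, rng):
    """Pick the row and feature indices that one tree would be trained on.

    Rows are drawn with replacement (a bootstrap sample, i.e. bagging);
    feature indices are drawn without replacement.
    """
    rows = [rng.randrange(n_samples) for _ in range(m_rows)]
    features = sorted(rng.sample(range(n_features), n_feats))
    return rows, features


def aggregate(tree_predictions):
    """Combine per-tree regression outputs by simple averaging."""
    return sum(tree_predictions) / len(tree_predictions)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the sketch is repeatable
    # Toy sizes: 100 spatial units, 7 predictor columns, 3 features per tree.
    for tree_id in range(3):
        rows, features = draw_tree_inputs(100, 7, 100, 3, rng)
        print(tree_id, len(set(rows)), features)
    print(aggregate([0.8, 1.1, 0.9]))
```

Because each bootstrap sample repeats some rows and omits others, the trees see different data, which is what makes their averaged prediction more stable than that of any single tree.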
In addition, to evaluate the accuracy of the RF model, we used mean absolute error (MAE), root mean squared error (RMSE), and coefficient of determination (R2) to assess its performance. MAE measures the average absolute errors between predicted values and actual values. RMSE represents the square root of the average of the squared differences between predicted and actual values. The coefficient of determination, often denoted as R2, represents the proportion of the variance in the dependent variable that is predictable from the independent variables, indicating the extent to which the model can explain the variations in the observations. The better the model's performance, the closer the MAE and RMSE are to 0, and the closer R2 is to 1.
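The three metrics can be written out directly from their definitions. The sketch below uses only the standard library; the sample numbers are made up for illustration and are not results from this study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]    # illustrative values only
    predicted = [1.5, 2.0, 2.5, 4.0]
    print(mae(observed, predicted))
    print(rmse(observed, predicted))
    print(r_squared(observed, predicted))
```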
Linear regression model and continuous wavelet analysis
The relationship between population dynamics and daily sewage volume was analyzed by adopting a linear regression model based on ordinary least squares. The sewage volume was taken as the dependent variable and the TPI as the independent variable to construct the regression model.
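For a single predictor, the ordinary least squares estimates have a closed form, sketched below. The variable names `tpi` and `sewage` and all numbers are illustrative assumptions, not data from this study.

```python
def fit_ols(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    tpi = [1.0, 2.0, 3.0, 4.0]      # hypothetical daily population index
    sewage = [3.0, 5.0, 7.0, 9.0]   # hypothetical daily sewage volume
    slope, intercept = fit_ols(tpi, sewage)
    print(slope, intercept)
```

A positive fitted slope would indicate that days with a larger population index tend to have a larger sewage volume, which is the relationship the regression is meant to test.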
RESULTS
Temporal characteristics of population aggregation
In terms of the variation mode, the populations exhibited daily periodicity. The period between 8:00 and 23:00 demonstrated the highest activity level, as indicated by a high heat value, suggesting increased movement and concentration. On the contrary, during the period of 0:00–7:00, the active population decreased, and there was a decrease in the positioning frequency as individuals were primarily at rest, resulting in a low heat value. Furthermore, significant population fluctuations were observed during commuting hours (7:00–9:00 and 17:00–20:00). These findings align with our understanding of human activity patterns during work and sleep times.
Spatial distribution characteristics of population aggregation
Compared to the weekdays, weekends exhibited lower peak population density and generally lower levels of population aggregation, with a more dispersed population distribution. The high aggregation areas on weekends primarily comprised commercial residential communities, such as Yongwang Home and Tujing Jiayuan in XBW Town, Dong Xiaoying Village and Cuibei Jiayuan in SZ Town, Tongzeyuan in SJT Town, and Baijiatong District in WQ Town. The secondary level of population aggregation areas consisted of smaller residential areas and villages. Notably, the villages in Beijing are urbanized villages with high housing density and a diverse population of residents. Along the Sixth Ring Road, there was a noticeable increase in population from 16:00 to 18:00, most likely due to recreational activities or evening dining.
To quantitatively analyze the population differences between weekdays and weekends, we computed the ratio of heat values during the active periods for weekdays and weekends in spatial terms (Figure 6(c)). Additionally, we derived the ratio of TPI values for different land use plots (Figure 6(d)). According to the figure, Yongfeng Industrial Park and ZE-Park emerged as the main areas with ratios greater than 1, confirming the higher population density on weekdays in industrial and enterprise zones. The ratios for industry and office land parcels also exceeded 1, further supporting the observation. In contrast, residential areas and villages exhibited ratios below 1 during the daytime, indicating a larger proportion of people resting at home during weekends compared to weekdays.
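The ratio described above can be sketched per spatial unit as the mean weekday heat value divided by the mean weekend heat value. The cell identifiers, the input layout (a dictionary of heat-value lists), and the numbers are assumptions made for illustration.

```python
def weekday_weekend_ratio(weekday_heat, weekend_heat):
    """Ratio of mean weekday heat to mean weekend heat for each cell.

    Both arguments map a cell identifier to a list of heat values observed
    during the active period. Cells with a zero weekend mean get None, because
    the ratio is undefined there.
    """
    ratios = {}
    for cell, weekday_values in weekday_heat.items():
        weekend_values = weekend_heat[cell]
        weekday_mean = sum(weekday_values) / len(weekday_values)
        weekend_mean = sum(weekend_values) / len(weekend_values)
        ratios[cell] = weekday_mean / weekend_mean if weekend_mean else None
    return ratios


if __name__ == "__main__":
    # Hypothetical cells: one office-like, one residential-like, one empty.
    weekday = {"office": [6, 6, 5], "housing": [3, 3, 3], "empty": [0, 0, 0]}
    weekend = {"office": [3, 2, 3], "housing": [4, 4, 4], "empty": [0, 0, 0]}
    print(weekday_weekend_ratio(weekday, weekend))
```

A ratio above 1 marks a cell that is busier on weekdays, and a ratio below 1 marks a cell that is busier on weekends, matching the reading of the figure given above.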
Relationship between population distribution and land use
The RF regression achieved satisfactory fitting accuracy for all four time periods for both the weekdays and weekends. For weekdays, during the 0:00–7:00 time period, the population density distribution was mainly influenced by residential land use. During the 8:00–12:00 and 13:00–17:00 time periods, commercial land use played a dominant role in determining the population distribution, indicating that a large number of people were concentrated in the enterprise, finance, and insurance functioning zones. During 18:00–23:00, shopping land use and commercial land use were the main influencing factors, suggesting intensive activities for shopping and working purposes in this period. For weekends, similar patterns were observed during the 0:00–7:00 time period, in which residential land use strongly influenced the population density distribution. During the remaining three time periods on weekends, shopping land use and residential land use were the main influencing factors of the population distribution.
In general, the population distribution at nighttime (0:00–7:00) was mainly influenced by residential land use on both weekdays and weekends. During the weekday daytime (8:00–17:00), working was the main activity, while during the weekend daytime, shopping and leisure activities became the primary focus. In the evening (18:00–23:00), most people engaged in shopping and leisure activities.
Relationship between population and water use
DISCUSSIONS
Population dynamics and mechanisms
As described in the ‘Results’ section, a high degree of population aggregation and differentiation within the study area was observed, both temporally and spatially. The changes in population quantity and density differed between weekdays and weekends. This phenomenon can be attributed to variations in resident activity patterns and urban spatial functional differentiation. On weekdays, most people are engaged in work activities, resulting in a higher population density in workplaces, such as enterprises and finance districts. Consequently, the heat map value increases along the main roads during commuting times. On weekends, people engage in leisure and rest activities, such as shopping, dining out, visiting scenic spots, and spending time at home. This shift in activities leads to higher population density in residential areas, shopping centers, and recreational zones, and this phenomenon is consistent with previous studies (Wang et al. 2011; Li et al. 2019; Bao et al. 2023). Additionally, we investigated the characteristics and differences of sub-regions. Within the region, XBW Town and WQ Town are characterized by commercial, financial, and industrial functions, leading to a higher population density on weekdays compared to weekends. Conversely, SZ Town and SJT Town are more oriented to residential and leisure functions, resulting in a slightly higher concentration of people on weekends compared to weekdays.
In the results of the RF analysis, higher feature importance indicates that this particular land use type is more influential on population density during that time period, suggesting a closer association between residents and activities related to this land use type. The results of the RF feature importance show that on weekdays, people generally follow a pattern of ‘rest-work-leisure-rest’ throughout the day, while on weekends, the activity pattern shifts to ‘rest-leisure-rest’. Similar results were found in previous studies (Bao et al. 2023). On the other hand, the intensity variation of the influencing factors reflects not only the daily activity patterns of residents but also the heterogeneity of urban functional zoning. At night, residential land significantly influences urban population distribution, whereas during the day, the distribution is primarily affected by commercial land (on weekdays) and shopping land (on weekends). Different from previous studies in other cities (Feng et al. 2019; Li et al. 2019), enterprises have a larger influence during weekdays' daytime than educational and scientific institutions. This can be explained by the differences in urban function zoning between cities. Specifically, unlike the region studied by the literature (Feng et al. 2019; Li et al. 2019), there is a higher density of enterprises and financial institutions in this study area, while educational institutions are relatively rare. The spatial and temporal distribution of the population is actually the result of the purpose-oriented activities of the residents, which are affected by the distribution of the functional zones. The combined effect of institutional change, market economy, planning, and regulation diversified urban space and functional zoning. Therefore, it can be said that the interaction between residents' activity purpose and spatial function difference leads to the spatial and temporal variation of population distribution.
The impact of population dynamics on water use
In addition, a 7-day period is evident. According to the previous analysis, the movement of the population in XBW Town is 19% higher on weekdays than on weekends. Hence, the weekly variation of the water-using population resulted in the 7-day periodic pattern of wastewater discharges. Combined with Figure 10, it can be concluded that the pronounced separation of jobs and residences in metropolitan areas can have a significant impact on urban water use and wastewater discharges. This is consistent with previous literature (Atinkpahoun et al. 2018). Therefore, it may be feasible to adjust the water supply mode and the wastewater treatment operation mode to be more efficient and less energy-consuming, in line with the activity and migration patterns of the population, so as to ensure more cost-effective operation of the city.
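A weekly period of this kind can be checked on a daily series with a much simpler tool than continuous wavelet analysis. The sketch below swaps in plain sample autocorrelation as a stand-in, not the method used in this study, and runs it on a synthetic series in which weekdays are 19% higher than weekends; the series and function names are illustrative assumptions.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((v - mean) ** 2 for v in series)
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


def dominant_lag(series, max_lag):
    """Lag in 1..max_lag with the highest sample autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda k: autocorrelation(series, k))


if __name__ == "__main__":
    # Synthetic daily series: five weekdays at 1.19, two weekend days at 1.0,
    # repeated for eight weeks. These are not measured sewage volumes.
    week = [1.19] * 5 + [1.0] * 2
    daily_series = week * 8
    print(dominant_lag(daily_series, 10))
```

Autocorrelation only reveals a period that holds over the whole record, whereas wavelet analysis can also show when in the record a period is present, which is why the latter suits real, non-stationary data better.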
Limitations and implications
It should be noted that the measurement of water consumption and wastewater production is a complex and systematic project that may also involve factors such as spatial geographic features, types of water use, and water resource management. The current research still has some limitations. Limited by data sources, this study used only 14 days of BHM data, together with daily sewage and annual water consumption data. On the one hand, although the BHM data can represent a significant number of Baidu users, it is based on sampled data and may contain potential uncertainties. Additionally, it may not capture data from individuals without smartphones, such as children and the elderly, or from those who do not use Baidu-related products; these exceptions could result in some deviations in the results. On the other hand, the length and accuracy of the water volume data can still be improved in the future for a more incisive analysis. Therefore, the analysis presented here provides preliminary results regarding water usage and sewage discharge under the influence of population changes, and there is still a significant distance to cover in using population-related data to provide precise technical guidance for water resource supply and demand management.
Despite its preliminary characteristics, this study can demonstrate the potential to utilize population dynamics for predicting water consumption and sewage discharge, optimizing water supply infrastructure, and improving water resources management level. These findings are of significant importance to society's sustainable development and urban well-being. Some important issues remain to be addressed by future studies. Higher spatial resolution of big data may further improve the knowledge of water use mechanisms and help in urban water consumption prediction. As advanced machine learning methods, such as relevance vector machine tuned with improved Manta-Ray foraging optimization (RVM-IMRFO), the hybrid adaptive neuro-fuzzy inference system coupled with the new hybrid heuristic algorithm techniques (ANFIS-WCAMFO), and spatiotemporal attention long short-term memory (STA-LSTM), have been developed rapidly, they can be effectively employed in the modeling and prediction of water and wastewater time series. Furthermore, constructing the population estimation models using multi-source big data, clarifying the water use quotas of different population groups during various periods, and further assessing the actual water use efficiency are of great significance in promoting the efficient use of water resources. These efforts will bring substantial improvements to the accuracy of evaluation results and the innovation of evaluation work and therefore provide substantial references to urban planning and water resource management strategies.
CONCLUSIONS
In this study, we analyzed the spatiotemporal dynamics and influencing mechanisms of population using BHM, land use parcels, and POI data, and studied the impact of population dynamics on water use using data on water consumption, sewage volume, and river depth. The main findings of this article include:
- i. From a temporal perspective, we observed different patterns of population change between weekdays and weekends. On weekdays, there are two additional minor peaks observed during commuting hours, distinguishing them from weekends. In addition, the activity time of residents on weekends tends to be slightly delayed compared to weekdays.
- ii. Spatially, the populations display High–High (HH) clustering on both weekdays and weekends, with significant overlap in spatial distribution. The population density is generally higher on weekdays compared to weekends. Specifically, on weekdays, high population concentrations are mainly observed in industrial parks and enterprise districts, while on weekends, they tend to be more concentrated in residential areas. Moreover, different sub-regions have different population dynamic characteristics.
- iii. In terms of the underlying mechanisms of population dynamics, during the daytime, population distribution is primarily influenced by work-related areas on weekdays, while shopping and recreation areas have a predominant impact on weekends. During nighttime, residential areas play a major role in population distribution. The interaction between residents' activity purpose and spatial function difference leads to the spatial and temporal variations of population distribution.
- iv. Population density and water use are positively related and both show diurnal and weekly patterns. The daily routine and commuting activities of the residents, and the distribution of urban functional areas, jointly lead to variations in urban water consumption and wastewater discharge.
By comprehensively analyzing the interaction between population dynamics and water use, our study contributes to a better understanding of urban water management and provides valuable insights for sustainable resource allocation and planning. While we have demonstrated the influence of changing population dynamics on urban water use, our study still has some limitations stemming from data acquisition and the scale of our study. To further advance our understanding of the complex mechanisms underlying water usage in urban areas, future work should integrate more multi-source big data and higher-resolution water quantity data. This will enable a more in-depth exploration of these dynamics, ultimately contributing to the enhancement of the operational efficiency of water supply and wastewater treatment systems.
ACKNOWLEDGEMENTS
We thank the Beijing Water Science and Technology Institute for providing the water quantity data.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.