This study proposes a rule-based algorithm for rain gauge network design in urban areas. Three general criteria for selecting rain gauge sites are summarized: (i) installation in open space; (ii) priority consideration of important regions together with even distribution; and (iii) strong signal and weak interference. Aided by spatial kernel density analysis, the candidate locations were first determined by clustering residential buildings. Secondly, overlay and buffer spatial analyses were carried out to optimize the candidate sites and avoid signal interference. Finally, the quality of each site location was evaluated by cross-validation using observed historical rainfall and ranked by mean square error for final consideration. A case study in Xicheng District, Beijing, China was selected to demonstrate the proposed method. The results showed that the method can be applied well in urban areas and can account for complex urban features through the definition of rules. It can thus provide scientific evidence for decision making in rain gauge site selection.

Urban floods have increased in recent years owing to rapid climate change (e.g. more frequent heavy rainfall) and urbanization (e.g. changes in hydrological characteristics). Studies on rainfall data acquisition and spatial interpolation have attracted much attention in both the academic and industrial communities, because rainfall data are of particular importance in urban flood forecasting, water resources management and regional planning. Rainfall network design, which determines the density and distribution of rain gauges, is thus a challenging task. It is required to obtain rainfall observations effectively to meet the various requirements of data usage, whilst minimizing the investment (Karimi Hosseini et al. 2011).

Site selection, i.e. choosing optimal locations for monitoring stations, is one of the essential tasks in designing rainfall monitoring networks. Site selection in general is considered to have long-term impacts on both business profitability (e.g. retail store siting) and the effectiveness of the operational services (e.g. emergency response) that depend on it (Tayman & Pol 2011; Fan 2014). In the past decades, a number of spatial-analysis-aided methods have been proposed and tested for supporting various site selection problems. For example, Stewart & Lambert (2011) incorporated bivariate regression in spatial clustering to determine the optimal locations of an ethanol plant considering many complex geographic and economic factors. Fan (2014) developed a simulated annealing algorithm to search for the optimal locations of emergency centers. In terms of rain gauge site selection, geostatistics has been widely adopted. Pardo-Igúzquiza (1998) combined the geostatistical variance-reduction method with a simulated annealing algorithm to determine the optimal number and location of rain gauges for regional rainfall estimation. Volkmann et al. (2010) designed a rain gauge network for flash flood forecasting based on multiple criteria at a catchment scale with complex terrain characteristics. Putthividhya & Tanaka (2012) employed geostatistical analysis to optimize the rain gauge network for spatial precipitation mapping. Shaghaghian & Abedini (2013) proposed a coupled geostatistical and multivariate technique for rain gauge network design.

Novel entropy-based methods have also been proposed recently for rain gauge network design and optimization. Vivekanandan & Jagtap (2012) introduced entropy theory and a nonparametric test to optimize the network density, and reduced the number of gauge stations from 25 to 19 (i.e. 774 km² per gauging station). Some studies attempted to incorporate entropy into spatial analysis in order to consider the temporal and spatial patterns of rainfall data (Awadallah 2012; Su & You 2014; Wei et al. 2014). Additionally, artificial intelligence techniques have been employed to search for the optimal locations of rainfall stations. Examples using ant colony optimization and genetic algorithms were reported by Li et al. (2009) and Karimi Hosseini et al. (2010, 2011), respectively.

The aforementioned studies have demonstrated the success of applying various methods, particularly those aided by spatial analysis, in rain gauge network design for specific purposes. However, they all focused on relatively large-scale areas (e.g. catchments). Few studies tested their applicability in urban areas, where complex land cover and signal transmission need to be considered. Moreover, rapid urbanization (Grimmond et al. 2010; United Nations 2014), land use change, population growth and the urban heat island effect (Shepherd 2005; Jisong & Wenjun 2007) have significantly influenced the spatiotemporal characteristics of rainfall and the geographic factors on the ground, which calls for a generic methodology that considers diverse factors in rain gauge network design.

Therefore, the objective of this study is to propose a Rule-based algorithm aided by Spatial Kernel Density (RSKD) for optimally selecting rain gauge locations in urban areas. It is intended to consider various complex urban conditions, including geographical conditions, historical rainfall and electromagnetic radiation. A case study in Xicheng District, Beijing is selected to demonstrate the applicability of the proposed RSKD. This paper is organized as follows: Section 1 gives a short overview of the previous literature on rain gauge site selection and highlights the objective of this study; Section 2 presents the details of the proposed method and how the site selection criteria are considered; Section 3 introduces the case study and the relevant data used; Section 4 discusses the application of the proposed RSKD and the corresponding results; finally, conclusions are drawn and future work is outlined.

Site selection criteria

The World Meteorological Organization (WMO) argues that the density of rain gauges in practical applications is always lower than that recommended in industrial standards (WMO 1994; Awadallah 2012). Although WMO recommends densities of rain gauge stations for different types of catchments, there are no specific criteria suitable for urban regions. The reason might be that the requirements are diverse and hard to quantify mathematically. In practice, in urban areas, locations are normally selected where a better gauge signal can be received, for instance areas without overhead cover, with weak electromagnetic interference and with open space for installation. Nevertheless, there is a lack of scientific evidence to support this practice. This study attempts to summarize general site selection rules applicable to urban areas by considering both complex geographical characteristics and the requirements of rainfall monitoring itself. They are expressed as follows:

  • Rule 1: installation in open space. The candidate sites should be located in open space to avoid occlusion by buildings or other obstacles. In this study, for instance, the roofs of garbage collection buildings and public toilets are the major candidate locations;

  • Rule 2: priority consideration of important regions and even distribution. The candidate sites should be placed in prioritized regions such as high-population-density neighborhoods, central business districts and tourism areas in order to achieve high accuracy of estimated rainfall. They are then expected to be distributed uniformly in the other regions (e.g. sub-districts); and

  • Rule 3: strong signal and weak interference. The candidate locations should be placed in regions with strong signal and weak interference to ensure smooth data transmission using General Packet Radio Service and to avoid electromagnetic radiation interference. For example, they should be far away from electrical substations and high-voltage power lines.

Workflow of RSKD

Figure 1 shows the workflow of the proposed rule-based algorithm for rain gauge site selection aided by spatial kernel density. It includes three major steps:
Figure 1

The workflow of the proposed RSKD algorithm.


Step 1: selection of candidate sites

The grid-based initial siting schema, which divides the study area into a grid and uses each grid cell as one initial site, is widely used (Shepherd et al. 2004; Su & You 2014). However, it is not suitable for siting in small urban areas given the complex surroundings, such as signal transmission, buildings and infrastructure. According to Shepherd et al. (2004), the final rain gauge sites obtained from the grid-based schema needed to be shifted or relocated because many sites were on private property. The authors recommended placement on public properties such as schools and community colleges. In this study, garbage collection buildings and public toilet roofs (i.e. the candidate siting spatial data) are taken as the initial sites input to RSKD. The reasons are that (i) these properties are owned and managed by the government; (ii) they are normally in open space in urban areas and rarely covered by obstacles; and (iii) their clustering and compactness are consistent with those of residential buildings, which are the primary rainfall monitoring areas to be protected from flood hazards.

In applying Rule 1, garbage collection buildings and public toilets are selected as potential locations for candidate sites. The corresponding spatial data (i.e. polygon features) are referred to as candidate siting spatial data (see Figure 1). The constraint data are then prepared to generate the candidate sites in consideration of Rule 2. In this study, residential building polygons are the main constraint, since the clustering of such data indicates the population density and the city development pattern.

Spatial density analysis spreads a phenomenon across a landscape based on the quantities measured at specific locations and their spatial relationships. The generated density surfaces can thus show where point or line features (e.g. population) are concentrated. The analysis employs a kernel density function to calculate the density of the surrounding elements as a measure of the importance of various phenomena. A typical quadratic form of the kernel function for point u, as employed in this study, can be written as (Silverman 1986):
[Equation (1): quadratic kernel function K(u) with search-window width h]
where K is the quadratic kernel function representing the density of residential buildings, and h is the width of the search window, which indicates the distance between the point u and the candidate surrounding observations. Determination of the width of the search window (h) includes the following steps: (i) calculate the geometric center (GC) of the n input points, denoted as $(\bar{X}, \bar{Y})$; (ii) calculate the Euclidean distances between GC and the surrounding points and determine the median distance (Dm); the median identifies the location that minimizes the overall Euclidean distance to the points in a dataset and is considered to be a robust measure of central tendency for a dataset; and (iii) calculate h using the following formulas (ESRI 2014):
$$ h = 0.9 \times \min\left(SD,\ \sqrt{\frac{1}{\ln 2}}\, D_m\right) \times n^{-0.2} \tag{2a} $$
$$ SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n} + \frac{\sum_{i=1}^{n} (y_i - \bar{Y})^2}{n}} \tag{2b} $$
where i is the index of the input points, $(x_i, y_i)$ are the coordinates of point i, and SD is the standard distance, which represents the compactness of a distribution.
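
The bandwidth steps above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the study. It assumes the default search radius documented by ESRI, h = 0.9 × min(SD, sqrt(1/ln 2) × Dm) × n^(−0.2), and, for the density itself, the kernel K(r) = (3/π)(1 − r²)² for r < 1 given by Silverman (1986), which may differ in detail from Equation (1). The function names and the sample coordinates are hypothetical.

```python
import math
from statistics import median


def search_radius(points):
    """Search-window width h from n input points given as (x, y) tuples."""
    n = len(points)
    # (i) geometric (mean) center of the input points
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # (ii) median of the Euclidean distances to the center, Dm
    dm = median(math.hypot(x - cx, y - cy) for x, y in points)
    # standard distance SD, a measure of compactness
    sd = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)
    # (iii) assumed ESRI default search radius
    return 0.9 * min(sd, math.sqrt(1.0 / math.log(2.0)) * dm) * n ** -0.2


def kernel_density(u, points, h):
    """Density at location u; assumed kernel (3/pi) * (1 - r^2)^2 for r < 1."""
    ux, uy = u
    total = 0.0
    for x, y in points:
        r2 = ((x - ux) ** 2 + (y - uy) ** 2) / h ** 2
        if r2 < 1.0:  # points outside the search window contribute nothing
            total += (3.0 / math.pi) * (1.0 - r2) ** 2
    return total / h ** 2


# Hypothetical building centroids in projected coordinates (metres).
buildings = [(0.0, 0.0), (40.0, 10.0), (35.0, 60.0), (80.0, 20.0), (300.0, 300.0)]
h = search_radius(buildings)
print(h, kernel_density((40.0, 20.0), buildings, h))
```

Evaluating `kernel_density` on a regular grid of locations would give a density surface of the kind used to locate the most compact residential areas.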

Step 2: optimization of candidate sites

Rule 3 is applied to remove or relocate the candidate sites that fall into signal interference areas, with the purpose of avoiding strong electromagnetic interference. To do so, spatial buffer analysis is used to determine the signal interference areas. In urban areas, electrical substations and high-voltage power lines are the major sources of strong electromagnetic interference. They are input into the spatial buffer analysis to generate the restricted zones for rain gauge sites, where the buffer radius is normally determined according to expert experience or the corresponding regulations (e.g. the National Power facilities Protection Regulations in China (National Energy Administration 2012)). In order to ensure weak or zero signal interference, the buffer radius in this study is set to 50 m (i.e. the input parameter of the buffer analysis). Finally, an optimization aided by spatial overlay analysis is applied to the candidate sites and buffer areas to avoid the influence of the geographic surroundings. Candidate sites falling into the buffer areas are marked as poor locations and nominated for deletion.
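
The buffer and overlay step can be illustrated with a short Python sketch. It assumes, for simplicity, that candidate sites and substations are reduced to points, that power lines are polylines, and that all coordinates are in a projected system with metre units; the study itself worked with polygon features in GIS buffer and overlay tools. All names and coordinates below are hypothetical.

```python
import math


def dist_point_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def filter_sites(sites, substations, power_lines, radius=50.0):
    """Split sites into (kept, removed) using a buffer of the given radius."""
    kept, removed = {}, {}
    for site_id, p in sites.items():
        near_substation = any(
            math.hypot(p[0] - s[0], p[1] - s[1]) <= radius for s in substations
        )
        near_line = any(
            dist_point_to_segment(p, a, b) <= radius
            for line in power_lines
            for a, b in zip(line, line[1:])
        )
        (removed if near_substation or near_line else kept)[site_id] = p
    return kept, removed


# Hypothetical inputs (metres).
sites = {1: (0.0, 0.0), 2: (90.0, 0.0), 3: (30.0, 40.0), 4: (200.0, 60.0), 5: (200.0, 30.0)}
substations = [(0.0, 10.0)]
power_lines = [[(150.0, 0.0), (250.0, 0.0)]]
kept, removed = filter_sites(sites, substations, power_lines)
print(sorted(kept), sorted(removed))
```

Sites in `removed` correspond to the poor locations nominated for deletion; the remaining sites would then be re-numbered before evaluation.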

Step 3: evaluation of optimized sites

This step evaluates the quality of the optimized candidate sites from step 2 for final consideration. It is needed because the number of candidate sites obtained in step 2 may exceed the available budget and thus lead to over-investment. The k-fold cross-validation method is employed to obtain the mean square error (MSE) for ranking. According to Haberlandt (2007) and Looper & Vieux (2012), a lower MSE means a higher quality (e.g. accuracy) of the candidate site, since MSE indicates how closely the estimated rainfall matches the observations. Thus, the candidate sites with lower quality can be removed, whereas the accuracy of the rainfall interpolated from the surrounding high-quality rain gauges can still be maintained. This allows a trade-off between the number of sites and the investment. The MSE can be calculated as:
$$ MSE = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{z}(x_i) - z(x_i) \right]^2 \tag{3} $$
where $\hat{z}(x_i)$ is the estimated rainfall at point $x_i$; $z(x_i)$ is the true rainfall at $x_i$; and n is the total number of points used for estimation. The recorded rainfall data, which can normally be obtained from the local meteorological department, are the input of the k-fold method. The initial value at each candidate site is interpolated from these recorded rainfall data and is considered to be the true value $z(x_i)$. In applying the k-fold method, all candidate sites are divided into one set of testing samples and k-1 sets of training samples. The estimated rainfall at a given site is interpolated from the training samples. The detailed procedure includes the following steps: (i) the candidate sites are divided into k (i.e. 6 in this study) parts, of which one part is used as the testing sample and the remaining k-1 parts are used for training; (ii) the true and estimated rainfall of the testing samples are used for MSE calculation; (iii) steps (i) and (ii) are iterated over all candidate sites to obtain the corresponding MSE; and (iv) all candidate sites are ranked based on the calculated MSE for final consideration. The number of rain gauges is normally determined based on the available investment budget, and their locations are determined according to the proposed RSKD algorithm.
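
The cross-validation procedure in steps (i) to (iv) can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: the interpolator is not specified in the text, so inverse distance weighting (IDW) is assumed here; the folds are assigned by a seeded random shuffle; and each site receives the squared error obtained in the single fold where it is held out. All function names and sample values are hypothetical.

```python
import math
import random


def idw(target, known, power=2.0):
    """Inverse-distance-weighted estimate at target from (x, y, value) tuples."""
    num = den = 0.0
    for x, y, value in known:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:  # target coincides with a known point
            return value
        w = d ** -power
        num += w * value
        den += w
    return num / den


def kfold_site_errors(sites, k=6, seed=0):
    """sites: dict id -> (x, y, true_rainfall). Returns dict id -> squared error."""
    ids = sorted(sites)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # k roughly equal parts
    errors = {}
    for test_ids in folds:
        held_out = set(test_ids)
        train = [sites[i] for i in ids if i not in held_out]
        for i in test_ids:
            x, y, truth = sites[i]
            errors[i] = (idw((x, y), train) - truth) ** 2
    return errors


def mse(errors):
    """Mean square error over all evaluated sites."""
    return sum(errors.values()) / len(errors)


def rank_sites(errors):
    """Site ids ordered from lowest to highest error."""
    return sorted(errors, key=errors.get)


# Hypothetical candidate sites on a grid (metres) with rainfall in mm.
sites = {
    i: (100.0 * (i % 4), 100.0 * (i // 4), 20.0 + 0.5 * i)
    for i in range(12)
}
errors = kfold_site_errors(sites, k=6)
print(mse(errors), rank_sites(errors)[:3])
```

Repeating the calculation for several rainfall events and comparing the resulting rankings is one way to check how stable the ranking is across events.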

A case study in Xicheng District, Beijing, China is selected to demonstrate the proposed RSKD algorithm. The district is a narrow region covering a total area of approximately 50 km², extending about 5 km from east to west and 10 km from north to south. At present, there are 34 rain gauge stations located in 15 sub-districts. The locations of these rain gauges were determined without sufficient scientific consideration (e.g. some are occluded by buildings or trees, or are too close to each other), which affects the accuracy of rainfall estimated for ungauged areas. This problem is typical of urban areas in China, and the network needs to be improved to yield an optimal rainfall network. Figure 2(a) shows the study area and Figure 2(b) shows the sources of strong signal interference, which include numerous electrical substations and a high-voltage power line.
Figure 2

Maps of the case study: (a) main roads and lakes; and (b) electrical substation and high-voltage power line.

Figure 3(a) is the map of public toilets and garbage collection buildings, which are the initial candidate locations used as inputs of RSKD. Spatial kernel density analysis of residential buildings is performed for the whole area and for the individual sub-district areas (15 sub-districts in total), respectively. Figure 3(b) shows the result for the whole area, and Figure 3(c) is the mosaic of the results for the 15 sub-district areas. In line with Rule 2, the former (Figure 3(b)) identifies the most compact areas over the whole district, and the latter (Figure 3(c)) ensures the even distribution of candidate sites over all 15 sub-district areas. The most aggregated areas, marked in red, indicate the most suitable areas in which to deploy rain gauges. The overlay of Figure 3(b) and Figure 3(c) not only achieves the same even distribution as the gridded schema does, but also gives more consideration to the significance of residential density in different regions. Eventually, spatial overlay with the initial candidate locations (Figure 3(a)) results in a total of 60 candidate sites for further analysis (Figure 3(d)). Figure 3(e) shows the overlay of the 60 candidate sites and a buffer zone map created with a 50 m radius around electrical substations and high-voltage power lines. Sites falling into the buffer areas (such as Nos. 35, 36, 56 and 49) are potentially poor sites and are deleted. After optimization, the number of candidate sites is reduced from 60 to 42. They are re-numbered as shown in Figure 3(f).
Figure 3

Case study map.

Table 1 lists five classes of MSE with the corresponding candidate sites. It shows that 7 candidate sites (i.e. 21, 23, 30, 32, 34, 35 and 37) achieve the lowest MSE (i.e. <0.3), which implies that they are the most suitable, highest-quality locations for rain gauge deployment. Figure 4(a) compares the MSE trend of the candidate sites under three observed rainfall events. The result shows that the MSE rises and falls in a similar way in all three events (e.g. the peaks occur at sites 1, 25, 36 and 42, and the troughs at sites 4, 22, 31 and 35). This implies that the proposed RSKD algorithm achieves good reliability and robustness against the impacts of different rainfall patterns.
Table 1

The quality of candidate sites based on MSE ranking

MSE range     Number of sites falling into the range     Site no.
[0, 0.3)      7                                          21, 23, 30, 32, 34, 35, 37
[0.3, 0.5)    12                                         4, 7, 22, 26, 27, 28, 29, 31, 33, 38, 39, 16
[0.5, 1.0)    10                                         3, 5, 6, 13, 14, 17, 18, 19, 20, 24
[1.0, 1.3)    5                                          9, 10, 15, 25, 40
[1.3, 3.0]    8                                          36, 8, 12, 11, 41, 2, 1, 42
Figure 4

MSE tendency and relationship with distance (from candidate site to kernel center). (a) The MSE tendency of three rainfalls; (b) MSE and Distance (First rainfall); (c) MSE and Distance (Second rainfall); (d) MSE and Distance (Third rainfall).

Figures 4(b), 4(c) and 4(d) also show that the MSE is significantly correlated with the distance between the candidate sites and the kernel center. The closer a site is to the kernel center, the smaller the MSE, which indicates a small difference between the estimated and the true rainfall. A similar positive correlation between MSE and distance to the kernel center is observed under the different rainfall events. The correlation coefficients between MSE and distance to the kernel center are 0.526, 0.450 and 0.503, respectively, for the three events. This suggests that a good candidate site should be close to the kernel density center. Figure 5(a) is the overlay of the candidate sites and the kernel map of the whole area, and Figure 5(b) is the overlay of the candidate sites and the kernel map of the 15 sub-district areas. Sites 21, 23, 30, 35 and 37 (star symbols in Figures 5(a) and 5(b)), which have the lowest MSE, lie near the centers of the spatial kernels (i.e. the distance is close to zero). In contrast, some sites (i.e. Nos. 1, 11, 36, 41 and 42) have larger MSE values. They are far away from the kernel centers, which indicates poor site selection for rain gauges.
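
The relationship between MSE and distance to the kernel center can be checked with a correlation coefficient, as in the sketch below. The text does not state which coefficient was used, so the Pearson coefficient is assumed here, and the distance and MSE values are hypothetical illustrations rather than data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: distance to kernel center (km) and MSE per site.
distance_km = [0.1, 0.3, 0.6, 0.9, 1.2, 1.8]
site_mse = [0.2, 0.5, 0.4, 1.1, 0.9, 2.4]
print(pearson_r(distance_km, site_mse))
```

A positive value indicates that the error tends to grow with distance from the kernel center, which is the pattern reported for the three rainfall events.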
Figure 5

Overlay map between kernel density map and candidate sites: (a) whole area map; (b) 15 subdistrict map.


This study proposed a rule-based rain gauge network design aided by spatial kernel density analysis. It has the following advantages: (i) it places emphasis on complex urban features (e.g. residential buildings, garbage collection buildings and public toilets) while also ensuring an even spatial distribution; and (ii) it is relatively more robust than other clustering methods (e.g. the k-means algorithm), since it is not sensitive to randomly specified initial locations of the cluster centers. However, the RSKD algorithm neglects the potential impacts of climate factors (e.g. wind and temporal variation), which need to be investigated further.

This study proposed a rule-based algorithm for rain gauge network design (RSKD) that considers complex geographic factors in urban areas, such as residential building density, land cover and other environmental factors. A case study in Xicheng District, Beijing, China showed that, based on the defined rules, 60 initial candidate sites were obtained from spatial kernel density analysis of the spatial patterns of residential buildings. The number was reduced to 42 to avoid interference from electrical substations and high-voltage power lines. The quality of the proposed candidate sites was evaluated using cross-validation and the corresponding MSE ranking. The results showed that the MSE values of 7 sites fell into [0, 0.3), 12 sites into [0.3, 0.5), 10 sites into [0.5, 1.0) and 13 sites were ≥1.0; this MSE ranking can be used for the final selection of rain gauges according to the available investment budget. It was also found that a positive correlation exists between MSE and the distance from candidate sites to the kernel center. This finding can facilitate better decision making by giving priority to siting rain gauges at the spatial kernels of geographic factors (e.g. residential buildings), particularly in study cases where meteorological data are scarce. The proposed RSKD algorithm can be applied well to rain gauge network design in urban areas, where it accounts for complex urban features and thus allows both an emphasis on the patterns of important regions and the consideration of even distribution. Future work is needed to compare it with other spatial analysis methods such as geostatistics and information entropy.

This research is partially funded by Key Laboratory for Urban Geomatics of National Administration of Surveying, Mapping and Geoinformation (Research on spatial anomaly clustering pattern mining of urban management event in complex urban environment, NO.20141204NY) and Science Foundation of Beijing University of Civil Engineering and Architecture (Research on key technology of urban elaborate management based on IOT and MMS, NO.ZF15071). We would like to thank the editors and reviewers for their inspiring comments on earlier versions of this paper.

Awadallah, A. G. 2012 Selecting optimum locations of rainfall stations using kriging and entropy. International Journal of Civil and Environmental Engineering IJCEE-IJENS 12 (1), 36–41.
ESRI, Inc. 2014 (accessed 24 September).
Grimmond, C., Roth, M., Oke, T. R., Au, Y. C., Best, M., Betts, R., Carmichael, G., Cleugh, H., Dabberdt, W. & Emmanuel, R. 2010 Climate and more sustainable cities: climate information for improved planning and management of cities (producers/capabilities perspective). Procedia Environmental Sciences 1, 247–274.
Jisong, S. & Wenjun, S. 2007 The effect of urban heat island on winter and summer precipitation in Beijing region. Chinese Journal of Atmospheric Sciences (in Chinese) 31 (02), 311–320.
Karimi Hosseini, A., Bozorg Haddad, O. & Shadkam, S. 2010 Rainfall network optimization using transinformation entropy and genetic algorithm. In: 21st Century Watershed Technology: Improving Water Quality and Environment, 21–24 February 2010, Universidad EARTH, Costa Rica.
Karimi Hosseini, A., Bozorg Haddad, O. & Mariño, M. A. 2011 Site selection of raingauges using entropy methodologies. Proceedings of the ICE-Water Management 164 (7), 321–333.
Li, X., He, J. & Liu, X. 2009 Intelligent GIS for solving high–dimensional site selection problems using ant colony optimization techniques. International Journal of Geographical Information Science 23 (4), 399–416.
National Energy Administration 2012 National Power facilities Protection Regulations. http://www.nea.gov.cn/2012-01/04/c_131262622.htm (accessed 8 October 2014).
Shaghaghian, M. R. & Abedini, M. J. 2013 Rain gauge network design using coupled geostatistical and multivariate techniques. Scientia Iranica 20 (2), 259–269.
Shepherd, J. M., Taylor, O. O. & Garza, C. 2004 A dynamic GIS–Multicriteria Technique for siting the NASA–Clark Atlanta Urban Rain Gauge Network. Journal of Atmospheric and Oceanic Technology 21 (9), 1346–1363.
Silverman, B. W. (ed.) 1986 Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York.
Tayman, J. & Pol, L. 2011 Retail site selection and geographic information systems. Journal of Applied Business Research (JABR) 11 (2), 46–54.
United Nations 2014 Department of Economic and Social Affairs, Population Division (2014). World Urbanization Prospects: The 2014 Revision, Highlights (ST/ESA/SER.A/352).
Vivekanandan, N. & Jagtap, R. S. 2012 Evaluation and selection of rain gauge network using entropy. Journal of the Institution of Engineers (India): Series A 93 (4), 223–232.
WMO 1994 Guide to hydrological Practices: Data Acquisition and Processing, Analysis, Forecasting and other Applications. WMO, Geneva.