This paper demonstrates the potential for crowdsourced rainfall data to infill gaps in the official rain gauge network and to provide new datasets for use in research. We use data from the Met Office Weather Observations Website (WOW) over 10 years (2011–2020) to generate two open-source datasets for Britain: multi-parameter raw data in an easy-to-use format, and an hourly rainfall dataset. We have compiled and prepared the data and detail here the station selection, rain depth calculation, and data resampling to hourly intervals used to create a consistent dataset for further processing (including statistical quality control) and application. Mapping the new rainfall dataset establishes that WOW observations fill spatial gaps in the official ground-based rain gauge network over Britain, particularly in urban areas. This could be particularly useful for post-event analysis of rainfall that results in pluvial flash flooding. Here, we focus on Britain but, due to agreements with meteorological services in Belgium, the Netherlands, Australia, New Zealand, Sweden, and the Republic of Ireland, plus many citizen scientists globally opting to share data via WOW, there is potential for the development of similar datasets using these methods around the world.
Processing of British citizen science crowdsourced data from the Met Office WOW database from 2011 to the end of 2020.
Distribution of rainfall data from ground-based rain gauges in Britain.
The potential of citizen science weather data.
The Met Office maintains a network of ground-based weather stations throughout the United Kingdom (UK) to record key observations; most frequently these include temperature, wind speed and direction, and rainfall. The stations are located at approximately 40 km intervals, spaced predominantly to capture information about the low-pressure frontal systems that dominate UK weather (Met Office 2016). In 2021, there were 256 ground-based weather stations in the UK reporting hourly rainfall observations (Met Office 2021)1, with data shared via the Met Office Integrated Data Archive System (MIDAS). The Met Office rain gauge network in Britain, although comprehensive, may not always provide the density of rainfall observations required for all applications (Schilling 1991; Villarini et al. 2008; Ochoa-Rodriguez et al. 2015). Highly localised, single-cell convective storms that arise quickly and discharge relatively high volumes of rainfall are small enough to occur between formal rain gauges. Equally, the discharge of the highest intensity rainfall from multi- and super-cell storms may not correspond with rain gauge locations (Schroeer et al. 2018). Rain gauge network inadequacy will be exacerbated in the future due to changing weather patterns and climate that will result in an increase in convective storms, especially in the summer months (Brooks 2013; Kendon et al. 2014; Miller & Hutchins 2017).
The Met Office does not rely solely on ground-based monitoring; weather observations are made using weather balloons, satellite, and weather radar (referred to simply as radar henceforth). Radar covers 99% of the UK; however, the accuracy of radar is affected by ground clutter, overshooting, ‘bright band’, and drift (Wilson & Brandes 1979; Sauvageot 1994; Joss & Lee 1995; Krajewski et al. 2010). Radar is particularly useful for determining the temporal and spatial distribution of rain over a wide area (Sauvageot 1994), and is considered most reliable when blended with rain gauge data to correct for errors in precipitation depth (Steiner et al. 1999; Trapero et al. 2009; Rabiei & Haberlandt 2015).
Responsible flood authorities, including the Environment Agency (EA), Scottish Environment Protection Agency (SEPA), Natural Resources Wales (NRW), and Lead Local Flood Authorities (LLFAs), rely on accurate weather forecasting to predict when and where flooding will occur. The EA, NRW, and SEPA (referred to as ‘the agencies’ henceforth) supplement the Met Office monitoring network with additional rain gauges to satisfy their remit relating to flood management (referred to as ‘Official’ data henceforth). The agencies add a further ∼1,200 rain gauges to the ground-based hourly or sub-hourly rainfall observation network in Britain, bringing the total to approximately 1,500 (Environment Agency 2017; Natural Resources Wales 2022; Scottish Environmental Protection Agency 2022). A further data source is citizen scientists who share observations from private automated weather stations (PAWS) (Figure 1) to online platforms, e.g., the Met Office Weather Observations Website (WOW).
Aims and objectives
We aim to create two datasets from WOW citizen science observations, at (i) the original reporting intervals and (ii) hourly intervals, and to use these to determine potential uses for the rainfall data. Our objectives are to select PAWS data from stations within Britain with sufficient data to warrant inclusion in a database, to generate descriptive statistics on those PAWS (e.g., location and duration of reporting), and to determine whether PAWS rainfall data fill gaps in the Official ground-based rain gauge network.
Rain gauge data
The Met Office Weather Observations Website (WOW) is available online for sharing and viewing weather data (see https://wow.metoffice.gov.uk/). It was established in 2011 as a digital repository for manual weather observations collected by trained volunteer climate observers and observations from weather stations operated by any registered user. The platform accepts observations from automatic and manual weather stations/equipment around the world. There are linkages with the Belgian, Dutch, Swedish, Irish, Australian, and New Zealand meteorological services, resulting in relatively high numbers of weather stations reporting to WOW in the respective nations.
The WOW platform is one of several ways data can be shared from PAWS. There are open platforms such as WOW and Weather Underground (WU) that accept data from a variety of weather stations. There are also proprietary platforms where data from a particular brand of weather station can be uploaded (e.g., Davis Instruments, Netatmo).
The weather observations used in this research were provided by the Met Office at the original reported time interval.2 WOW data were provided as date- and time-stamped reports. A report comprises all parameter observations for a given weather station at a given time. A total of 55 parameters (including hazard warnings) are contained within each record. The parameters in each report are determined by the operator, but most typically comprise air temperature, dew point, pressure (at station), relative humidity, rainfall rate, rainfall accumulation, wind speed, and wind direction. In addition to weather observations, each report includes a unique identifier, station ID, latitude, longitude, and a date- and time-stamp. The reporting interval of WOW observations is user determined and may range from 1 min to >24 h. WOW contributors can choose to provide station metadata, including the make and model of the weather station, and details of the installation that allow a site rating to be assigned derived from World Meteorological Organization guidelines. Contributors may also link to their personal weather-related websites, many of which present higher-resolution temporal data (than the publicly available WOW archive observations), and often explain the motivation for hosting a weather station along with details of the installation and equipment.
WOW station observation summary statistics were generated to help understand what data were available and to facilitate data processing, including location (latitude and longitude); count of the total reports; counts of reports per year; counts of observations for air temperature, dew point, pressure, relative humidity, rain rate, wind speed, wind direction and rain accumulation; start and end date of records; duration of reporting in days; mean, mode, and minimum intervals between reports. When undertaking preliminary exploration of the data, it was clear there were peculiarities. Anomalies were noted with the duration of reporting, with some stations having very short operational records, and the time intervals between observations could be variable or erratic. The WOW instructions encourage users to set up a test station to ensure that data are being shared as intended; however, it appeared that these test stations may remain in the exported data.
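The reporting statistics listed above can be sketched for a single station as follows. This is a minimal illustration rather than the processing code used for the datasets: the function name `summarise_station` and the dictionary keys are hypothetical, and the input is assumed to be a list of Python `datetime` report times for one station.

```python
from datetime import datetime
from statistics import mean, mode


def summarise_station(timestamps):
    """Summarise the reporting record of one station (hypothetical helper).

    timestamps: list of datetime objects, one per report.
    Duplicate timestamps are NOT removed here, so the minimum interval
    can be zero for stations that send simultaneous reports.
    """
    if not timestamps:
        raise ValueError("station has no reports")
    times = sorted(timestamps)
    # Intervals between successive reports, in minutes
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
    return {
        "report_count": len(times),
        "start": times[0],
        "end": times[-1],
        "duration_days": (times[-1] - times[0]).total_seconds() / 86400,
        "mean_interval_min": mean(gaps) if gaps else None,
        # statistics.mode returns the first mode found (Python 3.8+)
        "mode_interval_min": mode(gaps) if gaps else None,
        "min_interval_min": min(gaps) if gaps else None,
    }


# Example: three reports 15 min apart, then a one-hour gap
example = summarise_station([
    datetime(2018, 1, 1, 10, 0),
    datetime(2018, 1, 1, 10, 15),
    datetime(2018, 1, 1, 10, 30),
    datetime(2018, 1, 1, 11, 30),
])
```

Per-parameter observation counts and counts of reports per year would be added in the same way, by counting the non-missing values in each field.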
Station metadata was not provided with the observation records; therefore, a method of filtering stations was developed to ensure only genuine stations providing a usable amount of hourly or shorter interval rainfall data are included in the generation of statistics and for further analysis. The filters are applied to the weather station summary statistics and the subsequent station list is used to select rainfall observations from corresponding stations. The following filters are applied to the station summary data table: (i) duration station active >28 days, to remove those with shorter observational records; (ii) record count per year >365; (iii) record length >600 rain records, to remove those reporting for less than one month at hourly intervals;3 (iv) minimum interval between observations <61 min, to remove stations reporting at intervals greater than hourly; and (v) modal interval between observations <61 min, to remove stations frequently reporting at intervals greater than hourly. The efficacy of the filtering is assessed in detail during the application of quality control measures (see Conclusions).
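The five station filters can be expressed as a single predicate over a row of the station summary table. The sketch below is illustrative only: the key names are hypothetical, and how the per-year record count is reduced to one figure per station (here a single `records_per_year` value) is an assumption, since the text does not specify it.

```python
def passes_filters(stats):
    """Return True if a station summary row passes all five filters.

    stats: dict with hypothetical keys describing one station.
    ASSUMPTION: 'records_per_year' is a single representative count
    per station; the paper does not state how it is derived.
    """
    return (
        stats["duration_days"] > 28             # active for more than 28 days
        and stats["records_per_year"] > 365     # more than 365 records per year
        and stats["rain_record_count"] > 600    # more than 600 rain records
        and stats["min_interval_min"] < 61      # reports hourly or more often
        and stats["mode_interval_min"] < 61     # usually reports hourly or more often
    )


# Example: a long-running 15-min station and a short-lived test station
good_station = {
    "duration_days": 400, "records_per_year": 30000,
    "rain_record_count": 35000, "min_interval_min": 15, "mode_interval_min": 15,
}
test_station = {
    "duration_days": 3, "records_per_year": 200,
    "rain_record_count": 200, "min_interval_min": 5, "mode_interval_min": 5,
}
kept = [s for s in (good_station, test_station) if passes_filters(s)]
```

Because the filters look only at record counts and intervals, a station that passes can still have recorded zero rainfall, which is why such stations are handled at a later step.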
Determining rainfall at original time intervals
Individual station files with the following parameters were generated for the filtered stations: Station ID, Observation Date and Time, Latitude, Longitude, Rain Rate4 (mm h⁻¹), and Rain Accumulation (mm). During exploratory analysis, we noted that rainfall accumulated from one report to the next and reset daily. The time of the reset varied, with some operators using the British meteorological standard reporting day of 9 am GMT, while others used midnight (commonly the default for PAWS). Rainfall (mm) per report was therefore calculated by deducting the rainfall accumulation at time t from the rainfall accumulation at time t + 1. Prior to calculating the depth of rainfall at each interval, we remove any duplicate reports, as there were instances of weather stations sending simultaneous reports, potentially due to different sensors reporting separately (e.g., the wind vane generating a different report to the rain gauge). Where this occurred, one of the simultaneous reports contains zero in the rainfall accumulation field; therefore, records are sorted by date and time (ascending) and rainfall accumulation (descending), with the first record (containing any rain observation) being retained.
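The duplicate removal and differencing steps can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, the input is assumed to be (timestamp, accumulation) pairs with no missing values, and the treatment of the daily reset (taking the new accumulation as the depth since the reset) is an assumption, as the text does not describe how resets are handled.

```python
from datetime import datetime


def rain_depth_per_report(reports):
    """Convert running accumulation (mm) into rain depth per report (mm).

    reports: list of (timestamp, accumulation_mm) tuples for one station.
    Returns a list of (timestamp, depth_mm); the first report has no
    predecessor and therefore yields no depth.
    """
    # Sort by time ascending, then accumulation descending, so the first
    # record at each timestamp is the one holding any rain observation.
    ordered = sorted(reports, key=lambda r: (r[0], -r[1]))
    deduped = []
    for ts, acc in ordered:
        if deduped and deduped[-1][0] == ts:
            continue  # drop simultaneous duplicate report
        deduped.append((ts, acc))

    depths = []
    for (_, prev_acc), (ts, acc) in zip(deduped, deduped[1:]):
        diff = acc - prev_acc
        # ASSUMPTION: a negative difference marks the daily reset, so the
        # depth since the reset is taken to be the new accumulation.
        depths.append((ts, acc if diff < 0 else diff))
    return depths


# Example: a duplicate zero report at 09:15 and a reset before 09:45
example_depths = rain_depth_per_report([
    (datetime(2018, 6, 1, 9, 0), 1.0),
    (datetime(2018, 6, 1, 9, 15), 0.0),
    (datetime(2018, 6, 1, 9, 15), 1.5),
    (datetime(2018, 6, 1, 9, 30), 2.0),
    (datetime(2018, 6, 1, 9, 45), 0.25),
])
```

Any rain that fell between the last report before a reset and the reset itself cannot be recovered from the accumulation field alone, so the depth at a reset is a lower bound.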
Calculating rainfall observations at hourly intervals
Observations are available at a variety of time intervals, ranging from 1 to 60 min. It was noted during data cleaning that the reporting interval could vary for any given station. To generate a consistent dataset, we aggregate rainfall observations to total hourly rainfall (mm). There are applications where sub-hourly data are desirable; therefore, the original data are retained and can be reviewed should more detailed analysis be required. Observations are summed for the given hour, with the depth of rainfall assigned to the hour following the time at which it fell, in accordance with meteorological practice (e.g., all rain falling from 10:00:00 to 10:59:59 was assigned to 11 am). Data in the accumulation field is primarily used to create the hourly dataset. The Met Office recommends that rainfall accumulation be recorded; however, it was noted during exploratory analysis that some PAWS only reported rain as a rate. When resampling from the original time interval to hourly, the sum of the rainfall depth within the hour is used or, where necessary, the mean rate taken.
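The hourly resampling rule described above (rain falling from 10:00:00 to 10:59:59 is assigned to 11 am) can be sketched as follows. The function names are hypothetical, and the inputs are assumed to be per-report rain depths (mm) or, for rate-only stations, per-report rain rates (mm per hour).

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hour_label(ts):
    """Label a timestamp with the end of the clock hour in which it falls."""
    return ts.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


def hourly_totals(depths):
    """Sum per-report rain depths (mm) into hour-ending totals."""
    totals = defaultdict(float)
    for ts, depth_mm in depths:
        totals[hour_label(ts)] += depth_mm
    return dict(totals)


def hourly_mean_rate(rates):
    """Fallback for rate-only stations: mean reported rate within each hour."""
    grouped = defaultdict(list)
    for ts, rate in rates:
        grouped[hour_label(ts)].append(rate)
    return {hour: sum(vals) / len(vals) for hour, vals in grouped.items()}


# Example: two reports in the 10:00 hour and one exactly at 11:00
totals = hourly_totals([
    (datetime(2018, 6, 1, 10, 0, 0), 0.5),
    (datetime(2018, 6, 1, 10, 59, 59), 0.25),
    (datetime(2018, 6, 1, 11, 0, 0), 1.0),
])
```

Under this labelling a report stamped exactly on the hour is counted in the following hour, consistent with the 10:00:00 to 10:59:59 example. Hours with no reports are simply absent from the result, so they are not confused with hours of zero rainfall.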
Gap filling potential
As an example, PAWS reporting to WOW during 2018 are used to assess gap filling.5 This analysis provides a snapshot for illustrative purposes, so as not to over-represent the potential by considering all PAWS sharing data via WOW. Official weather stations reporting hourly or sub-hourly observations in 2018 were obtained from MIDAS (Met Office 2006) and from the EA, NRW, and SEPA (from Villalobos-Herrera et al. (2022)). Data were available from 1,115 stations managed by the EA, SEPA, and NRW, and an additional 228 stations reporting via MIDAS. WOW PAWS were selected where the sum of all rain observations was greater than 0 mm (i.e., they had recorded rainfall), and there were observations made in 2018.
To determine the ‘coverage’ of stations, both the WOW and Official weather stations are plotted in GIS, on a base map derived from the Mean High-Water Mark for Great Britain (ONS 2021) with the urban areas as defined by Morton et al. (2020) for the Land Cover Map, 2015. The land area derived from these base maps is 230,147 km² for Britain, of which 15,593 km² is classified as Urban. We calculate the ‘coverage’ of Britain and Urban Britain at a series of extrapolation distances from station locations, using radii of 1, 2, 5, 10, 20, and 40 km. We exclude any of the extrapolation extent beyond the British coastline. For Urban Britain, we select extrapolation extents that coincide with urban land use and are within the coastline of Britain. From here, we calculate the area coverage (in km² and as a percentage of total) for Britain and Urban Britain at each radial extent, for both the WOW stations and the Official stations independently. We determine the ‘unique’ coverage at each radial extent, and for Britain and Urban Britain, by deducting areas where there is overlap between the WOW and Official coverage maps. This allows us to calculate the area (in km² and as a percentage of total) uniquely covered by WOW and Official stations, at each radial extent. Finally, the WOW and Official coverage at each radial extent is summed to determine the respective total cover for Britain and Urban Britain.
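The coverage and ‘unique’ coverage calculation can be illustrated with a simple raster approximation. This is not the GIS workflow used in the paper: it assumes station positions in planar kilometre coordinates and a land (or urban) mask supplied as a set of grid cells, which stands in for clipping to the coastline or to urban land use. All names are hypothetical.

```python
import math


def covered_cells(stations, radius_km, mask_cells, cell_km=1.0):
    """Cells of the mask whose centre lies within radius_km of any station.

    stations: list of (x_km, y_km); mask_cells: set of (i, j) grid indices
    for land (or urban land), so anything outside the mask is excluded.
    """
    covered = set()
    for i, j in mask_cells:
        cx, cy = (i + 0.5) * cell_km, (j + 0.5) * cell_km
        if any(math.hypot(cx - sx, cy - sy) <= radius_km for sx, sy in stations):
            covered.add((i, j))
    return covered


def coverage_summary(wow, official, radius_km, mask_cells, cell_km=1.0):
    """Percentage of the mask covered uniquely by each network and combined."""
    wow_cov = covered_cells(wow, radius_km, mask_cells, cell_km)
    off_cov = covered_cells(official, radius_km, mask_cells, cell_km)

    def pct(cells):
        return 100.0 * len(cells) / len(mask_cells)

    return {
        "wow_only_pct": pct(wow_cov - off_cov),        # overlap deducted
        "official_only_pct": pct(off_cov - wow_cov),   # overlap deducted
        "combined_pct": pct(wow_cov | off_cov),
    }


# Example: a 10 km x 10 km mask with two neighbouring stations, 1 km radius
mask = {(i, j) for i in range(10) for j in range(10)}
summary = coverage_summary([(2.5, 2.5)], [(3.5, 2.5)], 1.0, mask)
```

Repeating the call for radii of 1, 2, 5, 10, 20, and 40 km, once with the land mask and once with the urban mask, gives the same family of figures as the method describes, at the resolution of the chosen grid.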
There were 3,920 unique weather stations within Britain sharing data to WOW between 2011 and 2020. The filtering process described in the Methods removes 1,203 (31%) of these, leaving 2,717 for further analysis. The station observation summary statistics are available online; they do not include observation data but serve as a handy lookup to establish whether WOW data may be available for a given time and location (data reference: 10.25405/data.ncl.21724970). The observation data at the original time interval for multiple parameters, and the aggregated hourly rainfall data, are available as two easily accessible .csv format datasets and have also been published (10.25405/data.ncl.21724970).
The number of PAWS reporting to WOW varies with time, peaking at 1,375 in 2016. This figure conceals a high degree of turnover, with new stations added and others no longer reporting. The mean duration of PAWS reporting to WOW was 3.6 years. There were 183 (7%) PAWS reporting for nine or more years and 29% reporting for five or more years. 643 (24%) of PAWS reported for less than one year. The duration of reporting is a function of when the station began uploading data; therefore, as time passes, it is expected that the number of longer records will increase. The relatively short station lifespan means that WOW observations may not be ideal for long-term climate studies; however, the analysis presented in this paper did not include manual daily data provided by trained climate observers, which is also available via WOW and covers a longer period.
The distribution of PAWS was not even across Britain, and it varied between years. Although PAWS numbers reduced post-2016, their distribution was more widespread, particularly in Scotland. The number of PAWS fluctuated from 351 in 2011 to 1,306 in 2020, with the percentage located in urban areas remaining reasonably consistent, ranging from 54% in 2011 to 48% in 2020. There was a relatively high concentration of stations around Exeter, presumably due to the presence of the Met Office headquarters and people with a particular interest in the weather/WOW.
The modal interval of reporting for all PAWS was 15 min. The interval between reports is determined by the weather station operator and was found to be variable between and within PAWS observations (i.e., there were PAWS that did not always report at the same interval over the duration of their lifespan). These discrepancies in intervals of reporting undermine the quality of data from PAWS. By deploying and connecting a Netatmo PAWS to WOW, we noted that data aggregation and transfer provided by a third-party application can take place at irregular intervals, varying from a few seconds difference to several hours (e.g., successive reports at intervals of 2 min, 2 min 20 s, and 25 min). A comparable irregularity was noted by de Vos et al. (2019). The discrepancy between the reporting intervals is less significant when considering longer-duration events where observations may be aggregated, but at the shortest time-scales desirable for urban hydrology, the potential loss of accuracy can be problematic, not to mention highly frustrating. Although the Met Office recommends rainfall accumulation as the preferred field for rainfall measurements, we noted during the filtering process that 58 stations provided rainfall rate rather than accumulation (zero count in rainfall accumulation field). In fact, many PAWS had a higher record count for rate than accumulation. It was not clear why the rainfall rate was reported at more time steps than rainfall occurred (according to the rainfall accumulation field). The anomaly was discounted where there were values available in the rainfall accumulation field, as this was assumed to be the correct value in accordance with WOW guidance.
When observations were resampled to consistent hourly intervals, 11 PAWS that reported no rainfall were removed from the dataset. There were 2,690 stations with data in the accumulation field, which was used as the primary source of rainfall data. There were 16 PAWS with no observations in the rainfall accumulation field; therefore, rainfall rate was used. These were reviewed individually and appeared to be reporting hourly, making the rainfall rate (mm h⁻¹) and the hourly rainfall accumulation (mm) the same.
In British urban areas, there is a clear benefit in considering data from WOW PAWS at the ≤5 km scale, with 30% of urban areas being within 5 km of WOW gauges only (see Figure 2, panel (b)). By combining WOW and Official gauges, the proportion of urban area within 5 km of an observation point increases to 84% (from 54% for Official alone). At the 1 and 2 km extents, there are increases of 265% and 176%, respectively, when WOW locations are combined with Official. For high-resolution delineation of rainfall, the benefit of including WOW PAWS is therefore clear. As seen for Britain as a whole, the Official monitoring network has adequate coverage at the 20 and 40 km resolution for Urban Britain.
The process for selecting data for the dataset creation was lengthy and constituted a barrier to use of the WOW data, which has now been resolved by the provision of the newly processed datasets. The variability and inconsistencies in reporting intervals for WOW data remain, devaluing data reliability and quality. It is a matter of luck as to whether observations are available at an appropriate time interval and duration. This paper highlights some of the issues that may be encountered in data downloaded from the WOW archive. The potential value of rainfall data in WOW has been demonstrated by the simple metric of the number of WOW PAWS in urban areas, as compared to Official gauges (633 and 178, respectively). This is further confirmed by the comparison of the extent of the area within 1, 2, 5, 10, 20, and 40 km of WOW and Official rain gauges in both urban and rural areas. The proliferation of WOW rain gauges in urban areas indicates the potential benefit of incorporating WOW data into the assessment of impacts of rainfall in urban areas, for example, during the post-event analysis of pluvial flooding. The PAWS rain gauges also potentially provide data in areas where there is no Official monitoring, thus expanding the rain gauge network across Britain. ‘Potential’ is emphasised as the erratic nature of PAWS observations obtained via WOW and the relatively short reporting duration of many PAWS remain barriers to be addressed to support the widespread uptake of PAWS data use.
Recommendations for using WOW rainfall data
There are several potential pitfalls to consider when using these generated WOW rainfall datasets to support further research. It is recommended that PAWS records are plotted for a ‘sanity check’ before proceeding to any detailed analysis or interpretation. The graphical representation of data is a useful tool in identifying inconsistent or excessive rainfall depths. We noted that data issues can be intermittent, meaning that although there may be an error at one point in time, there may still be useful data available from a given station. Due to variations in reporting intervals, the rainfall rate can be misleading, as although the rate is reported as mm h⁻¹, it may be for a 2-min interval for one record but then a 30-min interval for the next. This makes rainfall accumulation an easier parameter to work with, and, as the rainfall rate can be back calculated from the rainfall accumulation, it is the recommended parameter for analysis. Caution is advised when using the dataset to ensure that the number of reports and the interval of reporting are sufficient for the intended application. PAWS with more reports are likely to be useful for a wider range of applications; however, a station reporting for only a short duration may have captured an event of interest. A further consideration is that the most popular PAWS rain gauges available to purchase by citizen scientists are unheated. When temperatures are below zero, precipitation is therefore unlikely to be accurately recorded. Unheated gauges will also struggle to accurately represent hail, which can be associated with the intense convective events during which high spatial resolution observations would be particularly beneficial.
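The back calculation of rainfall rate from accumulation mentioned above amounts to scaling the depth recorded in a report by the length of the interval it covers. The short sketch below is illustrative only; the function name is hypothetical, and it assumes the depth and the interval since the previous report are already known.

```python
def rate_mm_per_hour(depth_mm, interval_minutes):
    """Back calculate a rainfall rate (mm per hour) from a depth and its interval."""
    if interval_minutes <= 0:
        raise ValueError("interval must be positive")
    return depth_mm * 60.0 / interval_minutes


# The same 0.5 mm depth implies very different rates at different intervals
rate_2min = rate_mm_per_hour(0.5, 2)     # 15.0 mm per hour
rate_30min = rate_mm_per_hour(0.5, 30)   # 1.0 mm per hour
```

This is why a reported rate cannot be compared between records without also knowing the reporting interval, whereas depths derived from accumulation can be summed directly.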
The difficulties in data processing highlighted issues that undermine confidence in the datasets, in particular the difficulty in accessing bulk station metadata, which was not available during this research. Researchers have long pointed out the necessity for good metadata (Muller et al. 2013), going beyond what can be shared currently via WOW, e.g., incorporating any quality assurance and/or quality control procedures. Enforcing the provision of metadata may dissuade station operators; however, the lack of metadata ultimately risks undermining confidence in data quality, rendering the database less attractive to potential users. If station metadata were required parameters for WOW contributors, additional analysis would be possible, e.g., assessing the reliability of different types of weather station, which has only been possible in relatively small-scale trials (Bell et al. 2015), or when working with data from a proprietary platform, e.g., Netatmo or Davis (de Vos et al. 2017; de Vos et al. 2019; Bárdossy et al. 2021).
The data processing undertaken as described in this paper has generated accessible and easy-to-use datasets of citizen science data from WOW spanning 10 years from 2011 to 2020 and provided metrics by which to make some initial judgements on the usefulness of the data. The resulting datasets and summary statistics are open-source and accessible (via 10.25405/data.ncl.21724970). Rainfall observations from WOW can fill gaps in the Official monitoring network, particularly in urban areas, where around 50% of the WOW gauges were located. There are limitations that make working with WOW citizen science rainfall data less straightforward than Official monitoring data, including issues with the time intervals of reporting and the longevity of PAWS. The research presented in this paper does not consider the quality of WOW citizen science rainfall data, which is a key concern of many potential users (and is being addressed in further research). It is the case that hydrologists may have to become more accustomed to working with non-traditional data sources or be more willing to accept the inaccuracies in data, to take advantage of the increased resolution such data sources provide. We also acknowledge that Official data (gauge and radar) are themselves not as accurate as desired. Further work being undertaken on these datasets includes the statistical quality control of rainfall observations from WOW and the use of quality-controlled data in hydrological modelling. It is hoped that with an assessment of the quality of WOW data the potential for gap filling of the Official observation network using WOW citizen science data will be more comprehensively addressed.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository (https://doi.org/10.25405/data.ncl.21724970.v1).
CONFLICT OF INTEREST
The authors declare there is no conflict.
1. There are many more rain gauges reporting daily rainfall, operated by the Met Office, Climate Observers, etc.
2. Historical hourly observations in 31-day increments are available to any user via download from WOW.
3. The filter does not consider the observation at this point; therefore, there may be stations where the depth of rain recorded by the station was zero.
4. The rate of rainfall between observations extrapolated to mm h⁻¹.
5. 2018 was the last complete year at the time of the research and was selected as the most current so as not to over-represent the potential by including any PAWS that ever reported to WOW.