Facilitating radar precipitation data processing, assessment and analysis: a GIS-compatible python approach


 A review of existing tools for radar data processing revealed a lack of open source software for automated processing, assessment and analysis of weather radar composites. The ArcGIS-compatible Python package radproc attempts to reduce this gap. Radproc provides an automated raw data processing workflow for nationwide, freely available German weather radar climatology (RADKLIM) and operational (RADOLAN) composite products. Raw data are converted into a uniform HDF5 file structure used by radproc's analysis and data quality assessment functions. This enables transferability of the developed analysis and export functionality to other gridded or point-scale precipitation data. Thus, radproc can be extended by additional import routines to support any other German or non-German precipitation dataset. Analysis methods include temporal aggregations, detection of heavy rainfall and an automated processing of rain gauge point data into the same HDF5 format for comparison to gridded radar data. A set of functions for data exchange with ArcGIS allows for visualisation and further geospatial analysis. The application on a 17-year time series of hourly RADKLIM data showed that radproc greatly facilitates radar data processing and analysis by avoiding manual programming work and helps to lower the barrier for non-specialists to work with these novel radar climatology datasets.


INTRODUCTION
The high spatial and temporal variability of rainfall (Ramos et al. ; Fischer et al. ) dictates that high resolution precipitation data are needed (Thorndahl et al. ; Winterrath et al. ). Weather radar observations can help to satisfy this demand and particularly improve severe weather detection and quantification of precipitation during storm events (Krajewski & Smith ; Heistermann et al. ; Wright et al. ). Moreover, technical developments in hardware and software engineering as well as an increasing availability of data, some of which are available even free of charge, allow a wider audience to apply radar products.
Working with radar data, however, presents a string of challenges which make many potential users still reluctant to take advantage of these data. Weather radar is an indirect measurement method suffering from numerous potential error sources and uncertainties in terms of precipitation quantification (Krajewski & Smith ; Gjertsen et al. ; Raghavan ; Meischner ; Sene ; Seo et al. ). Consequently, the use of radar-based precipitation estimates necessitates additional effort for data quality assessment and possibly further corrections. Yet, many national weather services have recently put much effort into re-analyses of radar data time series applying state-of-the-art bias correction and adjustment algorithms. Keupp et al. () give an overview of current reanalysis activities in Europe aimed at the establishment of radar climatologies.
These projects, such as the radar climatology RADKLIM (Winterrath et al. a, b)
Beyond the outlined uncertainties regarding data quality, several technical barriers exist that can prevent potential users from working with radar data. These include different file formats for exchange and storage, provision in proprietary binary file formats, a scarcity of easy-to-use and free-of-charge processing software, spatial visualisation and clipping tools, missing compatibility or interfaces to Geographic Information Systems (GIS) and the vast amount of data (Heistermann et al. , ; Fischer et al. ). As a consequence, the processing of radar data not only requires considerable expertise in data handling and programming, but it also takes much time to develop user-customised workflows, which discourages many potential users.
Despite initiatives such as the OPERA (Operational Program on the Exchange of Weather Radar Information) weather radar information model (Michelson et al. ), which has been widely adopted for international data exchange within Europe (Heistermann et al. ), many national weather authorities still provide weather radar composites in their own custom formats. This also holds true for the German weather radar data products RADKLIM and its counterpart for operational applications called RADOLAN ('Radar Online Adjustment'). Consequently, software adaptation and interfacing to support the processing of different data formats is still necessary on a national scale, and available tools for automated, GIS-compatible radar data processing, e.g., for American (Zhang & Srinivasan )
as a possible solution to the bottlenecks in current weather radar data processing and assessment in Germany and beyond. Radproc intends to lower the barrier to radar data usage by automating radar and rain gauge data processing and providing an interface for data exchange with GIS.
First, an overview of the data basis and existing tools for the processing of German weather radar composites is provided in order to illustrate the motivation and need for developing radproc. Moreover, a short outlook on software for the processing of radar data in other countries is given.
Next, the development goals, the implementation of the technical framework and its potential for transferability to other precipitation data are presented. Afterwards, radproc's functional scope is demonstrated based on a typical workflow including raw data processing, temporal aggregation, heavy rainfall detection and data export to ArcGIS. Finally, limitations are discussed, a perspective of future developments and improvements is given and conclusions are drawn.

Data basis
The German Weather Service (Deutscher Wetterdienst, DWD) operates a network of 17 ground-based C-band Doppler radar stations and, in 2005, launched the operational application of the RADOLAN programme to provide near real-time nationwide quantitative precipitation estimations on a 1 km² raster in temporal resolutions of 5 and 60 minutes (Winterrath et al. ). The hourly radar rainfall composites are adjusted to precipitation measurements from a network of approximately 1,300 rain gauges (Bartels et al. ; Keupp et al. ).
In 2017, the DWD concluded its 'radar climatology' project, in which all available weather radar data have been reanalysed back to the year 2001 applying state-of-the-art bias correction and adjustment algorithms (Winterrath et al.
In recent years, a variety of processing tools supporting German weather radar data have been developed that contain different functions and target different user groups.
The following review is structured according to the software distribution model since the availability, costs and customisability of a tool are factors strongly influencing a user's choice of software.

Open source software
According to the open source definition, software is regarded as open source software (OSS) if its source code is made available and its license grants the rights to use and modify the software to anyone and for any purpose, including non-exclusive commercial exploitation and redistribution of derivative works of the software itself (St Laurent ). Moreover, it has an active and growing user community and a website with extensive documentation. Besides many functions dedicated to the typical tasks of weather radar raw data processing outlined above, wradlib provides some visualisation tools and a function for reading in a single binary RADOLAN or RADKLIM file into a NumPy array.
NumPy (Oliphant ) is a widely used package for scientific computing with Python and provides an array object for efficient numerical computations. Nevertheless, the processing workflows beyond single file import as well as the data structure for storage and further analysis have to be developed and programmed by the user. For users with little programming skill, this is a difficult, time-consuming and most likely discouraging task. Furthermore, many users working with spatial data use GIS. Wradlib provides functions to export radar data as GeoTIFFs or ESRI ASCII files as well as a series of functions for georeferencing and reprojection, but it does not have any direct interface to any GIS nor does it support clipping the nationwide composites. However, the latter is an important feature to reduce the amount of data and to limit analyses to a desired study area. In addition, GIS users might encounter serious difficulties installing wradlib due to separate and possibly incompatible installations of GDAL (Geospatial Data Abstraction Library; http://www.gdal.org), which is indispensable for georeferencing.
In Germany, several other OSS projects have been developed in recent years that aim to lower the barrier for working with RADOLAN by providing processing functions, most of them being rather small projects for data conversion, visualisation or for solving some very specific tasks.
• The radolan Go library (https://gitlab.cs.fau.de/since/radolan) supports the parsing and visualisation of several RADOLAN products but does not provide any analysis functions.
• The Java Radolan parser (http://www.bitplan.com/index.php/Radolan) is a Java port and extension of the radolan Go library for interactive and animated visualisation of different RADOLAN products. Aggregation functions and RADKLIM support are in preparation.
• Rdwd (https://github.com/brry/rdwd) is an actively maintained R package (R Core Team ) to select, download and read DWD climate data into R. It does not support RADOLAN and RADKLIM data yet, but an extension is in preparation.
• The C++ RADOLAN library (https://github.com/meteoubonn/radolan) offers several functions for the import of RADOLAN files and conversion to NetCDF and Shapefile format, but it does not seem to be an actively maintained project anymore.
• The RADAR and ArViRadDB toolkit (https://www.hs-rm.de) is a collection of freely available compiled routines supporting the hourly RADOLAN RW product and targeted at some very specific needs in hydrological engineering as well as the detection of heavy rainfall intervals. RADAR yields GIS-compatible ESRI ASCII files as outputs, but it does not provide a real GIS integration that would allow data clipping.
Consequently, besides wradlib, which is primarily targeted at advanced users from the weather radar community, there is a scarcity of OSS for RADOLAN data processing and a total absence of OSS for the automated processing and analysis of RADKLIM and the temporally highly resolved RADOLAN and RADKLIM products.

Commercial software
To the best of the author's knowledge, there are currently six commercial software products available that support the processing of German radar data, each of them with different target groups and functionalities.
• The ArcGIS extension NVIS (http://www.itwh.de) is intended for RADOLAN visualisation, precipitation nowcasting, analysis of heavy rainfall events and calibration of sewer system models. NVIS offers functions for operational or short-term analysis of hourly RADOLAN RW data and seamless GIS integration, but is neither targeted at long-term climatological analysis nor does it support any radar composites with 5 minute temporal resolution.
• The HydroNET-SCOUT toolkit (http://hydronet-scout.de/) supports the import and visualisation of several different radar data formats, radar raw data correction, precipitation nowcasting and warnings as well as the export of temporally or spatially aggregated precipitation data.
• AquaZIS (http://www.aquaplan.de) is a software that supports the analysis and management of a variety of meteorological data. It is primarily targeted at time series and heavy rainfall analysis for water management tasks and provides the possibility to export results into Shapefiles.
• The KISTERS Meteo and HydroMaster toolkit (https://water.kisters.de) is a collection of software products for water management tasks providing a wide range of analysis, nowcasting and visualisation tools for many meteorological datasets including all presented RADOLAN and RADKLIM products.
Nevertheless, specialised commercial software products like these are hardly affordable for many users. Water management authorities undoubtedly prefer these products for operational purposes due to their easy-to-use, reliable and tailor-made functionality. However, for smaller companies with little financial resources, companies striving to develop new technologies and methods and especially in research, the use of OSS could make the radar data more accessible and facilitate the development of new methods.
Moreover, as most of the commercial software products seem to be completely based on GUIs, it is not obvious to what extent they allow for customisation and individual analysis of the data.
All reviewed commercial tools support the processing of some or all RADOLAN products, but up to now, none of them explicitly indicate the support of RADKLIM data on their websites or in their release notes.

Many of the presented open source and commercial tools aim to support either data visualisation, retrospective analysis of single rainfall events, operational rainfall nowcasting or raw radar station data processing including the application of correction, merging and gauge adjustment algorithms, whereas the number of tools supporting long-term climatological analysis of temporally highly resolved radar data is still rather small. Furthermore, the review revealed that there is a considerable gap in regard to the number of provided functions and ease of usability between open source and commercial software products.

Outlook on software tools for other radar data formats
Without claiming to be exhaustive, this section gives an overview of software tools for processing other radar data and of recent developments that should lead to a standardisation of radar data formats. The great variety of different radar data formats that exists internationally impedes data exchange and has led to the development of many software tools, which are only applicable in specific regions or individual countries. However, there are increasing efforts to foster data exchange and cooperation through a standardisation of data formats and the creation of regional radar composites. In Europe, this is coordinated by the European
and Hydro-NEXRAD is a prototype software to provide hydrologists with radar-rainfall maps (Seo et al. ).
In Southeast Asia, radar data exchange is encouraged in order to create a regional radar composite for disaster risk reduction. This is supposed to be achieved by using the same data format, GRIB2 (General Regularly-distributed Information in Binary form) (Kakihara ). This can be used by SATAID (https://www.wis-jma.go.jp/cms/sataid/app.html), which is a software for daily weather analysis and forecasting widely used by meteorological service providers in Southeast Asia. It has been developed by the Japan Meteorological Agency's Meteorological Satellite Center.
In South Korea, the web-based module WERM-S has been developed for rainfall erosivity index calculations from radar data provided in ASCII format (Risal et al. ).

The South African Weather Service uses the commercial HydroNet software for its rainfall monitoring and decision support based on radar data (http://www.weathersa.co.za/product-and-services/2-uncategorised/443-hydronet) and the application of this software is also endorsed in Australia
in a separate module (see Figure 1). This way, a partial use of radproc without clipping and GIS export functions, is
In the current version 0.1.4, radproc consists of five modules for data processing and analysis as well as an API and a sample-data module (see Figure 1).
• The raw module comprises all functions for the automated processing and import of RADOLAN and RADKLIM raw data into HDF5. This includes extracting the binary data from compressed monthly or daily data archives, importing data into monthly DataFrames and saving these into radproc's uniform HDF5 file structure.
• The dwd_gauge module offers automated processing and import of rain gauge data with 1 minute resolution provided by the DWD into the same HDF5 file structure as the radar data.
• The core module offers a variety of functions to load data from HDF5 and to resample them to annual, seasonal, monthly, daily or hourly precipitation sums. All functions of this module build solely upon the created HDF5 files and are thus independent from the original raw data formats.
• The heavyrain module contains functions for the calculation of duration sums as well as for the identification and counting of heavy rainfall events exceeding arbitrary thresholds. As it loads all data from HDF5 files via the core module, it is independent of the raw precipitation data formats.
• The arcgis module comprises all functions based on the ArcGIS arcpy package, e.g., functions for clipping data to a study area and for data exchange between DataFrames and raster datasets or attribute tables.
• The API module serves for more convenient function calls and takes care of exception handling, e.g., in case ArcGIS is not available.
• The sampledata module contains data for facilitating the use of radproc such as the projection file for the stereographic projection defined by DWD for RADOLAN products.
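Among the functions listed above, the duration sums of the heavyrain module can be pictured as rolling-window sums over consecutive time steps. The following is a minimal sketch of that idea using pandas only; the function name duration_sums, the cell IDs and the sample values are hypothetical and do not reproduce radproc's own implementation.

```python
import pandas as pd


def duration_sums(df, window):
    """Return rolling precipitation sums over `window` consecutive intervals.

    Rows are time steps, columns are cells. The first `window - 1` rows have
    no complete window and are therefore NaN.
    """
    return df.rolling(window=window, min_periods=window).sum()


# Hypothetical hourly series for two cells (mm per hour).
index = pd.date_range("2016-06-01 00:50", periods=6, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame(
    {101: [0.0, 2.0, 5.0, 1.0, 0.0, 0.0],
     102: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]},
    index=index,
)

# Three-hour duration sums, evaluated independently for every cell.
three_hour_sums = duration_sums(hourly, window=3)
print(three_hour_sums)
```

Because the window is defined as a number of rows, the same sketch applies to hourly and 5-minute data alike, as long as the time series has no missing intervals.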
Radproc's fundamental concept constitutes a conversion of all input data into a standardised HDF5 file containing a uniform structure with one group per year and therein monthly DataFrames as datasets. Thus, one HDF5 file contains the entire time series of a precipitation dataset for a defined study area split into monthly portions. The splitting is necessary in order to keep the required working memory to a manageable amount and to enable the processing of temporally highly resolved data for large study areas
Figure 2). These are fundamental features since radproc is intended, among other applications, for data quality assessment, which necessarily involves the intercomparison of different datasets.
The only difference in data processing between gridded and point-scale data can be the export of results into GIS, because data can either be exported to raster datasets or to new fields in attribute tables. As all environment settings during raster export (e.g., location, spatial reference) are derived from a so-called ID raster required as input parameter (see section on raw data processing below), this function is, as well as all analysis functions, neither limited to Germany nor to the RADOLAN, RADKLIM and gauge datasets currently implemented in radproc.
Thus, radproc's data processing workflow is transferable to any other precipitation time series dataset, provided that the required individual import routine converts the dataset into monthly pandas DataFrames, stores them into the described uniform HDF5 format and creates an ID raster for it in order to clip and export the data.
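The conversion step that such an additional import routine would have to perform can be sketched as follows. This is an illustration under stated assumptions, not radproc's code: a plain dictionary stands in for the HDF5 file, the key pattern "year/month" is a hypothetical rendering of the group and dataset layout, and the function name split_into_monthly_frames is invented for this example.

```python
import numpy as np
import pandas as pd


def split_into_monthly_frames(df):
    """Split a time-indexed DataFrame into one DataFrame per calendar month.

    Returns a dict keyed by "YYYY/M", mimicking one group per year with
    monthly datasets. The key pattern is an assumption for illustration.
    """
    monthly = {}
    for (year, month), part in df.groupby([df.index.year, df.index.month]):
        monthly["{}/{}".format(year, month)] = part
    return monthly


# Hypothetical hourly series for three cells, labelled by cell ID.
index = pd.date_range("2016-12-31 00:00", periods=72, freq=pd.Timedelta(hours=1))
data = np.ones((len(index), 3), dtype="float32")
series = pd.DataFrame(data, index=index, columns=[10, 11, 12])

# The dict plays the role of the HDF5 file in this sketch.
store = split_into_monthly_frames(series)
for key in sorted(store):
    print(key, store[key].shape)
```

Any dataset that can be brought into this shape, with time steps as rows and cell or station IDs as columns, could then be passed on to analysis functions that only rely on the monthly portions.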

A TYPICAL DATA PROCESSING WORKFLOW USING RADPROC
In the following, a typical basic radar data processing and analysis workflow including raw data processing, temporal aggregation, heavy rainfall detection and data exchange with ArcGIS using radproc and the 17-year time series of the hourly RADKLIM RW product is illustrated and an overview of the most important functions is given. Whereas the RADKLIM and DWD gauge raw data processing is specific for Germany, the analyses and GIS exports shown in the other subsections are equally applicable for any other precipitation dataset imported into radproc's standardised HDF5 file format, introduced above.

RADKLIM raw data processing and clipping
The raw RADOLAN and RADKLIM data are usually pro-
As the RADKLIM data formats were adopted from RADOLAN, the data processing is very similar for both products and both will be referred to as RADOLAN throughout this section.
All raw data archives need to be unzipped for data import using the function unzip_RW_binaries() for hourly data or unzip_YW_binaries() for 5-minute data from radproc's raw module. Both functions automatically generate a folder structure of yearly and monthly directories for
Optionally, if ArcGIS is available, a polygon GIS shapefile or feature class containing the outline of a study area can be passed to the processing function. In that case, radproc's arcgis module is accessed to create a so-called ID raster for the national RADOLAN grid in stereographic projection which allows for spatial localisation of the numbered RADOLAN pixels. Each ID value of this raster corresponds to a DataFrame column since these are labelled with the ID numbers. The tool automatically detects the input radar data product and applies the corresponding grid size and location. The ID raster is then clipped to the extent of the given shapefile to obtain the IDs located within the study area. Finally, the clipped ID raster is converted into a one-dimensional NumPy array called ID array, and NoData values are removed (see Figure 3). The resulting ID array is used to select the RADOLAN pixels within the study area upon DataFrame creation.
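The step from a clipped ID raster to a column selection can be illustrated with NumPy and pandas alone. The sketch below assumes a toy grid of 3 x 4 cells, a hypothetical NoData marker of -9999 and a boolean mask standing in for the clipping that radproc performs with ArcGIS; none of the names are taken from radproc.

```python
import numpy as np
import pandas as pd

NODATA = -9999  # hypothetical NoData marker of the clipped ID raster


def id_raster_to_id_array(clipped_id_raster, nodata=NODATA):
    """Flatten a clipped ID raster and drop NoData cells."""
    flat = np.asarray(clipped_id_raster).ravel()
    return flat[flat != nodata]


# Toy 3 x 4 national grid whose cells are numbered 0..11.
n_rows, n_cols = 3, 4
national_ids = np.arange(n_rows * n_cols).reshape(n_rows, n_cols)

# Pretend the study area covers only four cells; the rest becomes NoData.
inside = np.zeros((n_rows, n_cols), dtype=bool)
inside[1:3, 1:3] = True
clipped = np.where(inside, national_ids, NODATA)

id_array = id_raster_to_id_array(clipped)

# One DataFrame row per time step, one column per national cell ID.
index = pd.date_range("2016-06-01", periods=2, freq="D")
values = np.arange(2 * n_rows * n_cols, dtype="float32").reshape(2, -1)
national_df = pd.DataFrame(values, index=index, columns=national_ids.ravel())

# Keep only the columns whose IDs lie within the study area.
study_area_df = national_df.loc[:, id_array]
print(study_area_df)
```

Since the column labels are the cell IDs themselves, the selection needs no coordinate arithmetic, and the same ID raster can later be used to write values back to their grid positions.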
The generated HDF5 file with monthly datasets, which is compressed by default to save hard drive space, can be directly and quickly accessed by pandas and is the basis for all other radproc functions. The entire workflow of raw data processing is illustrated in Figure 4.

Temporal aggregation
Besides the use of precipitation sums for climatological or hydrological analysis or as model inputs, the aggregation of longer time periods should always be one of the first steps in a workflow using weather radar data in order to assess data quality in a given study area. Many systematic measurement and correction errors which cause bias such as spokes, clutter pixels or areas of missing data, are visible, e.g., in a map showing the mean annual precipitation sum.
From any HDF5 file having the structure described above, single monthly DataFrames can be loaded with
After the loops, the output DataFrame is finally resampled to the target frequency.
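The iterative aggregation can be sketched in a few lines of pandas. The example is a simplified illustration, assuming that the monthly DataFrames are already available as an in-memory list; the helper names annual_sums and fake_month are hypothetical, and grouping by calendar year stands in for the final resampling step.

```python
import numpy as np
import pandas as pd


def annual_sums(monthly_frames):
    """Aggregate an iterable of monthly DataFrames to annual sums.

    Each month is reduced on its own so that only one monthly DataFrame plus
    the small intermediate results have to be held in memory at a time.
    """
    partial = []
    for month_df in monthly_frames:
        # Reduce the month to a single row of sums per cell, labelled by year.
        partial.append(month_df.groupby(month_df.index.year).sum())
    combined = pd.concat(partial)
    # Final step after the loop: merge the partial results per year.
    return combined.groupby(level=0).sum()


def fake_month(year, month, n_cells=2):
    """Build a hypothetical monthly DataFrame with 1 mm per hour and cell."""
    start = pd.Timestamp(year=year, month=month, day=1)
    hours = start.days_in_month * 24
    index = pd.date_range(start, periods=hours, freq=pd.Timedelta(hours=1))
    return pd.DataFrame(np.ones((hours, n_cells)), index=index,
                        columns=range(n_cells))


months = [fake_month(2015, 11), fake_month(2015, 12), fake_month(2016, 1)]
result = annual_sums(months)
print(result)
```

Reducing each month before concatenation keeps the intermediate result small, which is the reason for iterating over monthly portions instead of loading the entire time series at once.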

Data exchange with ArcGIS
Radproc's arcgis module provides a set of functions for data exchange between ArcGIS and Python as well as some geospatial analysis functions, e.g., for extended zonal statistics and data extraction from raster cells to points.
For the export of radar data from DataFrames to single raster datasets, the function export_to_raster() can be used, whereas the function export_dfrows_to_gdb() handles the export of entire DataFrames into new File Geodatabases.
The latter function exports every DataFrame row to one raster dataset, whereby it automatically derives the file names from the DataFrame index. Additionally, a list of statistical parameters can be passed to the function to calculate some statistical characteristics from the input DataFrame and export these, too. For example, a statistics list with the entries 'mean' and 'max' yields two additional exported raster datasets, each of them containing the mean and maximum value per cell, respectively. Figure 6 shows the function call and its results for exporting the DataFrame with the annual precipitation sums generated in the 'temporal aggregation' subsection.
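The statistics described above amount to a column-wise reduction of the DataFrame, i.e., one value per cell across all rows. The following sketch shows that computation with pandas only; it does not call radproc or arcpy, and the function name cell_statistics as well as the sample values are hypothetical.

```python
import pandas as pd


def cell_statistics(df, statistics):
    """Return a DataFrame holding one row per requested statistic.

    Each statistic is computed per cell (column) across all rows of `df`,
    so every resulting row could be written to one raster dataset.
    """
    return df.agg(list(statistics), axis=0)


# Hypothetical annual precipitation sums (mm) for three cells.
annual = pd.DataFrame(
    {1: [700.0, 800.0, 900.0],
     2: [650.0, 600.0, 550.0],
     3: [1000.0, 1200.0, 1100.0]},
    index=[2001, 2002, 2003],
)

stats = cell_statistics(annual, ["mean", "max"])
print(stats)
```

The resulting rows have the same columns as the input, so they can be treated like any other DataFrame row when exporting to rasters.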
Moreover, feature-class attribute tables can be directly imported into pandas DataFrames with attribute_table_to_df() and, in return, a list of DataFrame columns can be joined to an attribute table using join_df_columns_to_attribute_table(). Besides data exchange with other geodata, this provides a seamless integration of point feature-classes, which is the typical geodata format for rain gauge
Using the same iterative approach as the resampling functions, find_heavy_rainfalls() accesses a given HDF5 file via the load functions in the core module and checks the time series between the parameters year_start and year_end for rainfall intervals exceeding specific thresholds.
Here, the parameter thresholdValue defines the rainfall intensity threshold in mm per time unit (given by input data) to be checked for exceedance independently for each raster cell.
Additionally, the parameter minArea specifies the number of raster cells in which the threshold must be exceeded for the interval to be selected, whereby these cells do not need to be adjacent. This parameter can be used to consider the surface area of rainfall cells, but also to take potentially known cells biased by clutter into account. Finally, the time period to be checked can be described in more detail by setting the season parameter to periods such as year, summer, winter or any single month or range of months.
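The selection logic behind the threshold, area and season parameters can be sketched for a single DataFrame as follows. This is a simplified illustration rather than the code of find_heavy_rainfalls(): the function name select_heavy_intervals, its arguments and the sample data are hypothetical, a threshold is counted as exceeded when it is reached or surpassed (an assumption), and summer is assumed to span June to August.

```python
import pandas as pd


def select_heavy_intervals(df, threshold, min_area=1, months=None):
    """Return the rows (time steps) in which at least `min_area` cells
    reach `threshold`; cells are tested independently, adjacency is ignored.

    `months` optionally restricts the check to a list of month numbers.
    """
    if months is not None:
        df = df[df.index.month.isin(months)]
    # Count, per time step, the cells at or above the threshold.
    n_cells_over = (df >= threshold).sum(axis=1)
    return df[n_cells_over >= min_area]


# Hypothetical hourly precipitation (mm/h) for three cells.
index = pd.to_datetime(["2016-01-10 12:50", "2016-06-01 14:50",
                        "2016-06-01 15:50", "2016-07-05 16:50"])
hourly = pd.DataFrame(
    {1: [120.0, 30.0, 110.0, 5.0],
     2: [0.0, 105.0, 101.0, 2.0],
     3: [0.0, 0.0, 0.0, 100.0]},
    index=index,
)

# Intervals in summer with at least two cells at or above 100 mm/h.
heavy = select_heavy_intervals(hourly, 100.0, min_area=2, months=[6, 7, 8])
print(heavy)
```

Raising the minimum number of cells above one is a simple way to suppress intervals in which only an isolated, possibly clutter-affected cell reaches the threshold.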
As an example, Figure 7 shows a function call, which checks whether a precipitation amount of 100 mm/h (as the input dataset RW has an hourly resolution) has been
Moreover, this cell selection gives an idea, in how many cells such high rainfall amounts occurred.
As a result, this short analysis of the RADKLIM RW dataset reveals that, throughout the entire dataset, a

DWD MR90 rain gauge data processing
In order to facilitate data comparison and, thus, data quality assessment, radproc's dwd_gauge module provides functions for automated rain gauge data processing. Currently, only
file path. However, instead of exporting the rows of the output DataFrame to rasters as shown in Figure 6, the rows can be exported to new fields of a feature class attribute table using join_df_columns_to_attribute_table().

FUTURE DEVELOPMENTS, LIMITATIONS AND CONCLUSIONS
In this paper, the Python library radproc providing a GIScompatible platform for automated radar data processing and analysis was introduced.
The software review revealed that there is a considerable
Nevertheless, the chosen implementation of radproc still has some technical limitations. The most important one is that all data processing operations are performed in working memory. Unfortunately, in HDF5 files, the space reserved for column header information in flexible 'table' datasets is limited to approximately 2,000 columns, which is by far exceeded by the number of cells in most study areas.
Consequently, the monthly DataFrames need to be stored as 'fixed' tabular datasets, which do not support flexible operations such as searching and selecting subsets of the data. Instead, the DataFrames need to be loaded into working memory entirely. For the hourly RW products, this is not a major issue, but for 5-minute radar data, the size of the study area that can be processed is scaled with the available memory. Furthermore, the pandas HDF5 API does not yet provide options for flexible metadata storage, which spills over to radproc.
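The dependence on available memory can be made tangible with a rough back-of-the-envelope estimate. The sketch below assumes an unclipped national grid of 1100 x 900 cells and 32-bit floating point values; both figures are assumptions for illustration, and the overhead of index and column labels is ignored.

```python
def monthly_frame_bytes(n_cells, intervals_per_day, days=31, bytes_per_value=4):
    """Estimate the in-memory size of the value block of one monthly DataFrame.

    Index and column labels are ignored; 4 bytes per value corresponds to
    32-bit floats, which is an assumption about the storage type.
    """
    return n_cells * intervals_per_day * days * bytes_per_value


GIB = 1024 ** 3
N_CELLS = 1100 * 900  # assumed size of the unclipped national grid

# One 31-day month of hourly data versus 5-minute data (288 steps per day).
hourly = monthly_frame_bytes(N_CELLS, intervals_per_day=24)
five_minute = monthly_frame_bytes(N_CELLS, intervals_per_day=288)

print("hourly:   {:.1f} GiB".format(hourly / GIB))
print("5-minute: {:.1f} GiB".format(five_minute / GIB))
```

Under these assumptions, a month of 5-minute data needs twelve times the memory of a month of hourly data, which is why clipping to a study area matters most for the temporally highly resolved products.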
Like all other reviewed OSS, radproc does not provide any GUI, yet. Hence, its application still requires a certain readiness by the user to learn some very basic Python syntax. But with the increasing number of online courses and radproc's extensive documentation including an installation guide, a full library reference and tutorials directly generated from Jupyter Notebooks, this is feasible without much effort even for users without any prior programming skills.
However, due to the tight integration of ArcGIS, it could be an option to develop a radproc GIS toolbox to facilitate application. So far, this has been tested, but the connection between the separate Python installations of ArcGIS and a scientific distribution like Anaconda (https://www.anaconda.com), which is necessary to access all of radproc's dependencies and the ArcGIS arcpy module, is rather difficult to establish. Enabling an Anaconda IDE such as Spyder or the Jupyter Notebook to import arcpy is easy and quickly done, but enabling the import of any additional site-packages into ArcGIS, which is necessary to execute GIS tools accessing radproc, is much more complicated and sparsely documented. This might become easier through the planned porting to Python 3 and the implementation of ArcGIS Pro and will be pursued within this context.
Another option for future developments and a repeated request is the addition of a module to support QGIS as an alternative to ArcGIS in order to turn the entire workflow into an open source project. Due to radproc's extensible modular structure, such a QGIS module or any other modules to support further radar or gauge data formats could be added, but neither of these is specifically planned, yet, except for the additional DWD gauge data import routines described in the previous section. Currently, an additional module for the calculation of rainfall erosivity, the R factor of the Universal Soil Loss Equation (Wischmeier & Smith ), is being developed and will be added in future.
Radproc constitutes a powerful open source tool for automated weather radar data processing and analysis and has considerable potential for further development and improvement. It contributes to facilitating radar data processing, allowing non-specialised users to cope with the vast amount of binary data and put the novel RADKLIM dataset to use.
Thus, radproc can help to enable radar data usage for all applications that benefit from high resolution precipitation data, e.g., in research, hydrological engineering, disaster control, erosion and flood protection and environmental planning.