Abstract
Rainfall forecasting is a high-priority research problem because rainfall results from the complex interplay of multiple factors. Despite extensive studies, a systematic quantitative review of recent developments in rainfall forecasting is lacking in the literature. This study presents a systematic quantitative review of statistical, numerical weather prediction (NWP), and machine learning (ML) techniques for rainfall forecasting. The review adopted the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach for screening keywords and abstracts, yielding 110 qualified papers from multiple databases. The impact of rainfall threshold, meteorological parameters, topography, algorithm, geographic location, horizontal model resolution, and lead time on the rainfall forecast was examined. The review shows the importance of precipitable water vapor (PWV), along with other meteorological parameters, for accurate nowcasting in coastal and mountainous regions. Rainfall forecast uncertainty grows with lead time, which makes NWP models less popular for short-term forecasting. Pre-processing techniques increased the accuracy of ML models by accounting for extreme values and detecting the irregularly distributed multi-scale features of rainfall in space and time. Future research can focus on hybrid models with improved nowcasting accuracy. The output of such hybrid models can serve as input to the decision support systems required for urban flood risk management.
HIGHLIGHTS
The PRISMA method is applied for a systematic quantitative review.
The Global Navigation Satellite System-derived precipitable water vapor (PWV) is found to be capable of analyzing the real-time profile of water vapor content.
Forecasts can be improved by considering additional meteorological parameters along with the PWV.
A longer lead time in the NWP model increases the forecast uncertainty.
There is a significant improvement in the forecast by machine learning models after pre-processing.
INTRODUCTION
According to Intergovernmental Panel on Climate Change (IPCC) reports, climate change is responsible for extreme weather events (Pachauri et al. 2014; IPCC 2021), which lead to floods and droughts around the world. On a global scale, flooding causes significant loss of life and damage to property (Balica et al. 2012). The recent Sixth Assessment Report (AR6) comprises contributions from three working groups, covering the physical science basis; impacts, adaptation, and vulnerability; and mitigation, with a greater focus on regional information that can be utilized for climate risk assessments (IPCC 2021). With changing hydro-climatology, the frequency and severity of extreme rainfall events are likely to increase, leading to frequent flooding (Imhoff et al. 2020). While these natural disasters cannot be prevented, the resilience of society toward these events can be improved by adopting state-of-the-art management techniques. One such technique is the early forecast of extreme events, which provides sufficient lead time for preparedness. Reliable mitigation and management measures can be adopted through a decision support system if flooding can be predicted well in advance by integrating flood modeling and rainfall forecasting. Recent technical advances have significantly enhanced rainfall forecasting skill (Bauer et al. 2015; Bhomia et al. 2019), which has steadily evolved over the years. In this context, nowcasting is a useful tool for short-term rainfall forecasting in specific urban settings.
Advances in rainfall forecasting techniques depend on developments in technology and in knowledge of atmospheric processes (Benjamin et al. 2019; Randall et al. 2019). Table 1 summarizes the evolution of rainfall forecasting techniques from 1900 to date and the predominant methods adopted during different periods. There has been a gradual development from statistical methods that ignore the underlying processes, predominant in the pre-2000 era, to more process-based approaches and, at present, to big data and the internet of things (IoT) for refined rainfall forecasting. This was possible only because of advances in the real-time measurement of weather data and in high-performance computing, including supercomputers (Golding 2000; Kumar et al. 2017; Wiston & Mphale 2018). Weather satellites deployed by various nations and advanced weather radar systems for microclimate monitoring have contributed greatly to the improvement of rainfall forecasting (Randall et al. 2019). Present-day research is mainly focused on very short-term nowcasting and on very long-term decadal forecasting, which reflects an overlap of interannual variability and long-term climate change (Krishnan & Sugi 2003; Vareed et al. 2013; Sun et al. 2014; Liu et al. 2015; Foresti et al. 2016; Salvi et al. 2017; Choudhury et al. 2019; Smith et al. 2019; Imhoff et al. 2020). While nowcasting is essential for the timely prediction of flooding at the local scale, decadal forecasting is important for long-term planning that considers the impact of climate variability.
Time period | Years | Methods | Observational data |
---|---|---|---|
Pre-2000 | 1900–1940 | • Predominance of statistical methods • Method of correlation coefficients • Frontal analysis • Multiple regression equations | • Surface observations • Graphs, maps for determining pressure contours |
Pre-2000 | 1941–2000 | • Correlated upper-level air/waves with the surface pressure pattern • During 1981–1990, focus was on long-range forecast (LRF) methods, i.e., stochastic methods • Power regression models • NWP models | • Aircraft used to collect real-time weather data • Satellite and radar observations became popular • Use of sea surface temperature, mean sea level pressure • Zonal and meridional wind data |
Post-2000 | 2000–2017 | • Progress in meteorology • Use of high-speed computational facility • Radar rainfall forecast • Advent of empirical/ML methods | • Ground-based/weather satellites • Deployment of extensive radar networks |
Current research | 2018–2021 | • Focus on speedy and effective algorithms • Era of big data, IoT for rainfall forecast • Considered the dependence of rainfall on a number of atmospheric variables • Efforts to enhance forecast reliability | • Focus on collecting data of high temporal and spatial resolution • Advanced radar data • Wireless and fully automated system for data collection |
Numerical weather prediction (NWP) models are becoming increasingly popular for short-term rainfall forecasting. The short-term rainfall forecasting NWP models adopted by different countries are summarized in Table 2. NWP models are classified into global models and mesoscale models. Global models are used for the medium-range forecast, as they cannot run at high resolution. In contrast, mesoscale models are used for the short-range forecast. Mesoscale models require weather forecasts obtained from global models for initialization and for adjusting the boundary conditions (Diagne et al. 2013; Cogan 2016; Ramírez & Vindel 2017). Global and mesoscale models can provide surface weather details and can be used efficiently for climatic simulations over a given region of interest (Pu & Kalnay 2019). A global model has global coverage but limited ability to explicitly resolve convective systems, while a regional (mesoscale) model with fine grid spacing (a few kilometers) performs better in analyzing convective-scale features (WMO 2017).
Center | Country | Model | Application |
---|---|---|---|
BOM | Australia | STEPS | Short-term rainfall forecast |
KMA | Korea | MAPLE/KLAPS | Short-term rainfall forecast |
ECMWF | Europe | SWIRLS | Short-term/nowcasting |
IMD | India | WRF/GFS | Short-term rainfall forecast |
HKO | Hong Kong | SWIRLS | Nowcasting |
NOAA | USA | GFS | Short-term rainfall forecast |
JMA | Japan | MSM | Very short-term forecast |
UKMO | UK | NIMROD | Very short-term forecast |
Meteo France | France | AROME | Short-term rainfall forecast |
CPTEC | Brazil | BAM | Short-term rainfall forecast |
SCMO | China | TITAN/TREC | Nowcasting |
CMC | Canada | GEPS | Quantitative precipitation forecast |
BOM, Bureau of Meteorology; KMA, Korea Meteorological Administration; ECMWF, European Centre for Medium-Range Weather Forecasts; IMD, India Meteorological Department; HKO, Hong Kong Observatory; NOAA, National Oceanic and Atmospheric Administration; JMA, Japan Meteorological Agency; UKMO, United Kingdom's Meteorological Office; Meteo France, France Meteorological Service; CPTEC, Centro de Previsão do Tempo e Estudos Climáticos (Portuguese for Center for Weather Forecast and Climatic Studies); SCMO, Shanghai Central Meteorological Observatory; CMC, Canadian Meteorological Centre; STEPS, Short-Term Ensemble Prediction System; MAPLE, McGill Algorithm for Precipitation Nowcasting by Lagrangian Extrapolation; KLAPS, Korea Local Analysis and Prediction System; SWIRLS, Short-range Warning of Intense Rainstorms in Localized Systems; WRF, Weather Research and Forecasting; GFS, Global Forecast System; MSM, Mesoscale Model; NIMROD, Nowcasting and Initialisation for Modelling Using Regional Observation Data System; AROME, Applications of Research to Operations at Mesoscale; BAM, Brazilian Global Atmospheric Model; TITAN, Thunderstorm Identification, Tracking, Analysis, and Nowcasting; TREC, tracking radar echoes by correlation vectors; GEPS, Global Ensemble Prediction System.
STEPS is a probabilistic precipitation forecast approach that combines an extrapolation nowcast and an NWP forecast (Bowler et al. 2006). The Flash Flood Guidance System (FFGS) in the USA is a hydro-meteorological modeling system that combines remote sensing of rainfall (radar/satellite) and NWP output to provide early warning of flash floods for the next 6 h (WMO 2017).
The quantitative rainfall nowcasting techniques used by different countries are based on radar echo tracking and extrapolation to produce real-time forecasts. Depending on the extrapolation method, these techniques are classified as ‘area trackers’ or ‘cell trackers’. The more sophisticated methods blend multiple observation systems with the NWP output for an accurate nowcast (WMO 2017). The IMD uses the Short-range Warning of Intense Rainstorms in Localized Systems (SWIRLS-2) and the Warning Decision Support System (WDSS) for nowcasting (Roy et al. 2019). SWIRLS uses an area-tracking extrapolation method for radar echoes, while the WDSS uses centroid tracking for nowcasting in the Indian region (Li & Lai 2004; Lakshmanan et al. 2007). The Korea Meteorological Administration (KMA) employs the McGill Algorithm for Precipitation Nowcasting by Lagrangian Extrapolation (MAPLE) and the Korea Local Analysis and Prediction System (KLAPS) for short-term rainfall forecast (Germann & Zawadzki 2002, 2004). MAPLE uses variational echo tracking with semi-Lagrangian advection of radar reflectivity and correlation of the forecast with the observation. The Shanghai Central Meteorological Observatory (SCMO) uses Thunderstorm Identification, Tracking, Analysis, and Nowcasting (TITAN) for storm forecasting and rainfall nowcasting (Dixon & Wiener 1993). TITAN uses a cell-tracking method for echo movement tracking and extrapolation. The Hong Kong Observatory (HKO) developed the SWIRLS model for rainfall nowcast operations, a robust method for tracking existing rain systems (Li & Lai 2004). The United Kingdom's Meteorological Office (UKMO) developed the automated NIMROD (Nowcasting and Initialisation for Modelling Using Regional Observation Data) system, which uses pixel-based linear extrapolation of radar echoes for quantitative precipitation forecast.
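The idea behind area-tracking extrapolation can be illustrated with a deliberately simplified sketch. It is not the algorithm of SWIRLS, MAPLE, TITAN, or NIMROD: it assumes a single motion vector for the whole radar field, estimates it by minimizing the squared difference between two consecutive frames (operational trackers typically use correlation over many sub-areas), and then advects the latest frame by that vector. The function names `shift`, `best_displacement`, and `extrapolate` are hypothetical.

```python
def shift(field, dx, dy, fill=0.0):
    """Translate a 2-D grid by (dx, dy) cells; cells entering the grid get `fill`."""
    ny, nx = len(field), len(field[0])
    out = [[fill] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            sy, sx = y - dy, x - dx  # source cell that moves into (y, x)
            if 0 <= sy < ny and 0 <= sx < nx:
                out[y][x] = field[sy][sx]
    return out


def best_displacement(prev, curr, max_shift=2):
    """Return the (dx, dy) whose shift of `prev` best matches `curr` (least squares)."""
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            moved = shift(prev, dx, dy)
            err = sum(
                (m - c) ** 2
                for row_m, row_c in zip(moved, curr)
                for m, c in zip(row_m, row_c)
            )
            if err < best_err:
                best, best_err = (dx, dy), err
    return best


def extrapolate(prev, curr, steps=1):
    """Advect the latest frame `steps` time steps ahead along the estimated motion."""
    dx, dy = best_displacement(prev, curr)
    return shift(curr, dx * steps, dy * steps)
```

Because the echo is moved without growth or decay, the sketch corresponds to simple persistence along the motion vector, which is one reason why extrapolation skill degrades quickly with lead time.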
Globally, radar networks and satellite data of high spatial and temporal resolution are in use for weather forecasts (Liguori & Rico-Ramirez 2014; Heuvelink et al. 2020). Satellite-based information is used in mountainous regions and other areas where only a limited number of rain gauge measurements are available (Duan & Bastiaanssen 2013; Zhu et al. 2020). Several studies reported that rainfall forecasts for the next 0–6 h can be obtained by integrating NWP models and radar extrapolation techniques (Bowler et al. 2006; He et al. 2013; Liguori & Rico-Ramirez 2014; Wang et al. 2016; Chu et al. 2018; Shehu & Haberlandt 2021). It is evident that several models have been developed in the recent past for short-term rainfall forecasting. Since rainfall depends on several factors and has complex nonlinear associations, the same model may yield different results in different regions. The main objective of this systematic quantitative review is to appraise such variabilities associated with some of the recent models used for short-term rainfall forecasting. The review summarizes the findings and the factors affecting forecasting accuracy by considering rainfall forecasting methods at multiple time scales. The methods considered in this study are (1) statistical methods, (2) NWP models, and (3) machine learning (ML).
METHODOLOGY
DESCRIPTIVE ANALYSIS OF SHORT-TERM RAINFALL FORECASTING TECHNIQUES
Short-term rainfall forecast based on GNSS-derived PWV
The IMD and other weather forecasting agencies have been using statistical methods for rainfall forecast for more than 100 years (Thapliyal 1982; Gowariker et al. 1989; Parthasarathy et al. 1993; Singh & Pai 1996; Thapliyal 1997; Rajeevan et al. 2000; Gadgil et al. 2005; Munot & Kumar 2007; Rajeevan et al. 2007; Kumar et al. 2012). The traditional statistical models are based on long-term measurements of rainfall and its dependence on various meteorological parameters (Abbot & Marohasy 2018). Statistical models such as the auto-regressive moving average (ARMA) and auto-regressive integrated moving average (ARIMA) have been applied to nonlinear hydrological processes. However, their accuracy mainly depends on user knowledge and experience (Anh et al. 2019). These models mainly fall under the paradigm of stationarity, whereas natural processes like rainfall are chaotic. Despite the many efforts devoted to statistical forecasting, there is still considerable scope for improving forecast efficiency. This review focuses on a more advanced statistical technique based on GNSS-derived PWV.
PWV derived from the Global Navigation Satellite System (GNSS) is a promising basis for rainfall analysis and has wide applications in rainfall forecasting, global climate analysis (Yao et al. 2017), and improving NWP (Gutman et al. 2003; Gendt et al. 2004; Guo et al. 2021). This forecasting technique falls under statistical methods for short-term rainfall forecast and is further classified here by location. The GNSS-derived PWV reflects the water vapor stored in a vertical air column above a certain area, expressed as the height of an equivalent column of liquid water (Manandhar et al. 2018). The GNSS comprises a satellite constellation that transmits radio signals through the atmosphere to ground-based GNSS receivers. Atmospheric water vapor slows the propagated signal along its path, contributing to a delay referred to as the ‘tropospheric delay’. This delay is mainly expressed as the zenith total delay (ZTD), from which the PWV can be obtained by a conversion factor (Shi et al. 2015; He et al. 2019; Li et al. 2020; Zhao et al. 2020). In coastal and mountainous regions, convection arises from rapid spatial variation in water vapor content. The GNSS is capable of analyzing the real-time distribution of water vapor content, which can be used for precipitation nowcasting (Yuan et al. 2014). Therefore, PWV is considered to be a more reliable parameter for rainfall forecasts in mountainous areas (Kawase et al. 2006).
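The ZTD-to-PWV conversion mentioned above can be sketched as follows. The review does not state which formulas the individual studies used, so this example makes its own assumptions: the hydrostatic part of the delay is removed with a Saastamoinen-type model driven by surface pressure, and the conversion factor is computed from a weighted mean temperature estimated with the empirical relation Tm = 70.2 + 0.72 Ts (Ts is the surface temperature in kelvin). The function names and the refractivity constants are illustrative choices, not values taken from the reviewed papers.

```python
import math

RHO_W = 1000.0    # density of liquid water, kg/m^3
R_V = 461.5       # specific gas constant of water vapor, J/(kg K)
K2_PRIME = 0.221  # assumed refractivity constant k2', K/Pa (22.1 K/hPa)
K3 = 3739.0       # assumed refractivity constant k3, K^2/Pa (3.739e5 K^2/hPa)


def zenith_hydrostatic_delay_m(pressure_hpa, lat_deg, height_km):
    """Saastamoinen-type zenith hydrostatic delay (m) from station pressure."""
    f = 1.0 - 0.00266 * math.cos(2.0 * math.radians(lat_deg)) - 0.00028 * height_km
    return 0.0022768 * pressure_hpa / f


def conversion_factor(surface_temp_k):
    """Dimensionless factor that maps zenith wet delay to PWV."""
    tm = 70.2 + 0.72 * surface_temp_k  # assumed empirical mean temperature, K
    return 1.0e6 / (RHO_W * R_V * (K3 / tm + K2_PRIME))


def pwv_mm(ztd_m, pressure_hpa, surface_temp_k, lat_deg, height_km):
    """PWV (mm) = conversion factor x (ZTD - hydrostatic delay)."""
    zwd_m = ztd_m - zenith_hydrostatic_delay_m(pressure_hpa, lat_deg, height_km)
    return conversion_factor(surface_temp_k) * zwd_m * 1000.0
```

With these assumptions the conversion factor is close to 0.16, so a wet delay of roughly 6 mm corresponds to about 1 mm of PWV. Station pressure and temperature are therefore needed alongside the GNSS observations.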
This review summarizes the use of GNSS-derived PWV for rainfall nowcasting in the temperate, subtropical, and tropical regions in Tables 3–5, respectively. These tables show how the true forecast rate (TFR) and the false forecast rate (FFR) change with the precipitation threshold, the lead time of the forecast, the length of the data record, and the algorithm applied for the rainfall forecast/nowcast. The TFR is defined as the ratio of the number of correctly forecasted rainfall events to the actual number of rainfall events. The FFR represents false alarms and is defined as the ratio of the number of falsely forecasted rainfall events (no rainfall occurred) to the actual number of rainfall events.
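The two scores can be computed directly from paired event flags. The sketch below follows the definitions given in the text, so both rates are normalized by the number of observed rainfall events. Some studies normalize false alarms differently, so values from different papers may not be strictly comparable. The function name `forecast_rates` is hypothetical.

```python
def forecast_rates(observed, forecast):
    """Return (TFR, FFR) in percent from two equal-length sequences of event flags.

    TFR = correctly forecasted events / observed events
    FFR = forecasted events with no rain / observed events (definition used in the text)
    """
    if len(observed) != len(forecast):
        raise ValueError("observed and forecast must have the same length")
    events = sum(1 for o in observed if o)
    if events == 0:
        raise ValueError("no observed rainfall events; rates are undefined")
    hits = sum(1 for o, f in zip(observed, forecast) if o and f)
    false_alarms = sum(1 for o, f in zip(observed, forecast) if f and not o)
    return 100.0 * hits / events, 100.0 * false_alarms / events


# Four observed events, three of them forecast, plus one false alarm.
tfr, ffr = forecast_rates([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0])
```

Note that with this normalization the FFR can exceed 100% when false alarms outnumber observed events.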
Author | Study region | Threshold | Lead time | Algorithm | Data | TFR (%) | FFR (%) |
---|---|---|---|---|---|---|---|
Benevides et al. (2015) | Temperate region of Lisbon Portugal | 1.5 mm/h | 1–6 h | Least-square fitting analysis | 2010–2012 hourly data | 75 | 65 |
Benevides et al. (2019) | Temperate region of Lisbon Portugal | 0.5 mm/h | 1 h | Nonlinear auto-regressive exogenous neural network model | 2011–2015 hourly data | 71.9 | 23.3 |
Łoś et al. (2020) | Central and northern Poland | – | 0–2 h | Random forest classifier | 2017–2019 hourly data | 87 | – |
TFR, true forecast rate; FFR, false forecast rate.
Author | Study region | Threshold | Lead time | Algorithm | Data | TFR (%) | FFR (%) |
---|---|---|---|---|---|---|---|
Yao et al. (2017) | Zhejiang Province, China | 0.6–0.8 mm/h | 5.15 h | Precise Point Positioning (PPP) data-processing software | 2014–2015 hourly data | 82 | 66 |
Zhao et al. (2018) | Zhejiang Province, China | – | 2–6 h | Least-square fitting time-series analysis | 2014–2015 hourly data | 90 | 60–66 |
Zhao et al. (2020) | Zhejiang Province, China | – | 2–6 h | Precise Point Positioning (PPP) data-processing software | 2014–2015 hourly data | >95 | <30 |
Li et al. (2020) | Subtropical region of Hong Kong | 1.1–1.7 mm/h | 5.15 h | Pre-processing based on WMO, U.S. National Weather Service criteria | 2010–2019 hourly data | 95.5 | 28.9 |
Li et al. (2022) | Subtropical region of Hong Kong | Anomaly-based percentile thresholds | 4.13 h | GNSS data acquisition and pre-processing | 2010–2019 hourly data | 97.6 | 13.4 |
Author | Study region | Threshold | Lead time | Algorithm | Data | TFR (%) | FFR (%) |
---|---|---|---|---|---|---|---|
Manandhar et al. (2018) | Tropical region of Singapore | 0.3–0.4 mm/h | 5 min | – | 2010–2013 | 87.7 | 38.6 |
Manandhar et al. (2019a) | Tropical region of Singapore | – | 5 min | Data-driven ML technique (SVM) | 2012–2015 | 80.4 | 20.3 |
Manandhar et al. (2019b) | Tropical region of Singapore | 0.2–0.3 mm/h | 45–60 min | – | 2010–2016 | 79.62 | 50.38 |
Liu et al. (2019) | Tropical region of Singapore | 0.1 mm/h | 10–60 min | Improved BP-NN | 2010–2012 | >96 | <40 |
Biswas et al. (2021) | Tropical region of Singapore and Brazil | 0.7–0.9 mm | 6 h | GPS-derived atmospheric gradient and residual | 2010–2013 and 2016 | 87 | 36.6 |
SVM, support vector machine; BP-NN, back-propagation neural network.
Researchers (Benevides et al. 2015; Yao et al. 2017; Zhao et al. 2020) compared long series of PWV values with rainfall series over the same period. They found that when factors such as PWV, PWV variation, and the rate of change of PWV reached a particular value, the rainfall probability increased significantly. This particular value is known as the rainfall threshold. Tables 3–5 show that different thresholds give different forecast results. Across the studies listed in these tables, the TFR ranged from about 72% to more than 95%, while the FFR varied from about 13 to 66%. This indicates that, overall, PWV is a good indicator of rainfall occurrence. In temperate regions, rainfall forecasts were obtained with a lead time of 1–6 h, in subtropical regions 1–5.15 h, and in tropical regions 5–60 min. For meteorological nowcasting, the suggested lead time is up to 30 min. Tables 3–5 therefore indicate that GNSS-derived PWV can be successfully applied for nowcasting in tropical regions. Also, the forecast accuracy is found to be higher in tropical and subtropical regions than in temperate regions (Li et al. 2020).
In all these studies, hourly data are used for analysis, with data lengths of 1, 2, 3, 6, and 9 years. It is observed that with an increase in data length, rainfall forecast probability beyond 2 days can be obtained (Seco et al. 2012; Benevides et al. 2015), thereby allowing time for preparedness during extreme events. In tropical regions, rainfall is mostly convective and lasts less than 30 min (Manandhar et al. 2019b). Therefore, hourly sampled data are too coarse for this region. Different algorithms yield different forecast results. Rainfall depends on a number of parameters such as PWV, temperature, pressure, and relative humidity, and the method that encompasses most of these factors gives the most accurate TFR. The influence of PWV and of different algorithms on the rainfall forecast is explained in detail below.
Relationship between PWV and rainfall
A positive relationship was observed between PWV and rainfall (Champollion et al. 2004; Bastin et al. 2007; Yan et al. 2009; Brenot et al. 2014; Zhao et al. 2018), with PWV increasing just before the rainfall event and decreasing after it (Yao et al. 2017; Barindelli et al. 2018). However, a rise in PWV will not always cause rainfall if other factors are not favorable (Sharifi et al. 2015). Certain external factors, such as thermodynamic variations, need to be considered because they affect the rainfall (Shoji 2013). Thermodynamic properties are mainly associated with the temperature profile and moisture content, which determine the convection process (Pall et al. 2007; Lepore et al. 2015). Thermodynamic indices are important for understanding the atmospheric instability that ultimately triggers rainfall (Ajilesh et al. 2020). Manandhar et al. (2019b) stated that for temperate and subtropical regions, an apparent change in PWV values is observed some hours before the rain. In contrast, PWV values are very high for tropical regions and change only marginally before the rain.
Relationship between threshold PWV and rainfall
Researchers (Jin & Luo 2009; Manandhar et al. 2018; Zhao et al. 2018) have studied the influence of the seasonal variation in PWV on the rainfall forecast. The seasonal variation in the PWV value was found to be larger on rainy days than on non-rainy days (Manandhar et al. 2018). This suggests that there is a threshold PWV beyond which rainfall can occur. According to Jin & Luo (2009), the threshold PWV value for rainfall forecast is location- and season-specific. Seasonal variations in PWV were observed over many GNSS stations. The authors noted that PWV values are higher during the intermonsoon season than during the monsoon season: because the temperature in the intermonsoon season is high, the air can hold more water vapor, which increases the PWV values. Zhao et al. (2018) reported a similar result (the PWV value is higher in summer than in winter) based on a rainfall forecast experiment. Researchers concluded that the threshold PWV value is sensitive to the location (Jin & Luo 2009; Benevides et al. 2015; Yao et al. 2017; Manandhar et al. 2018, 2019b). The PWV range in temperate, subtropical, and tropical regions is about 0–45, 0–80, and 30–70 mm, respectively. The PWV values in tropical regions are higher because of high temperature and relative humidity. Yao et al. (2017) noted that the same threshold gives different rainfall forecast results at different stations, which again shows that threshold values vary with the location. If the calculated PWV value (the evaluating criterion) exceeds the threshold value, the probability of rainfall occurrence is high (Benevides et al. 2015).
Researchers stated that the threshold on the maximum PWV contributes to rainfall forecast in the tropics, whereas, in temperate and subtropical regions, the threshold on the maximum rate of PWV increment is the deciding factor (Benevides et al. 2015; Yao et al. 2017; Manandhar et al. 2019b).
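A threshold rule of the kind discussed in this section can be sketched in a few lines. The example below is illustrative only and does not reproduce any of the cited studies: it flags a time step as "rain likely" when either the PWV value or its rate of increase reaches a threshold supplied by the user. The function name `flag_rain` and the threshold values in the usage lines are hypothetical; in practice, thresholds would have to be calibrated per station and season, as the studies above emphasize.

```python
from typing import List, Optional


def flag_rain(
    pwv_mm: List[float],
    step_h: float,
    pwv_threshold: Optional[float] = None,
    rate_threshold: Optional[float] = None,
) -> List[bool]:
    """Flag each time step where PWV or its increment rate reaches a threshold.

    pwv_mm          PWV series in mm, equally spaced in time
    step_h          sampling interval in hours
    pwv_threshold   threshold on the PWV value (mm), or None to ignore it
    rate_threshold  threshold on the PWV increment rate (mm/h), or None to ignore it
    """
    flags = []
    for i, value in enumerate(pwv_mm):
        # Backward difference; the first sample has no predecessor, so its rate is 0.
        rate = (value - pwv_mm[i - 1]) / step_h if i > 0 else 0.0
        by_value = pwv_threshold is not None and value >= pwv_threshold
        by_rate = rate_threshold is not None and rate >= rate_threshold
        flags.append(by_value or by_rate)
    return flags


series = [30.0, 31.0, 35.0, 36.0]  # hypothetical hourly PWV values in mm
rate_based = flag_rain(series, step_h=1.0, rate_threshold=3.0)   # temperate-style rule
value_based = flag_rain(series, step_h=1.0, pwv_threshold=36.0)  # tropical-style rule
```

Keeping the two predictors as separate optional arguments mirrors the regional distinction in the text, where the PWV value matters most in the tropics and the increment rate matters most elsewhere.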
Influence of different algorithms
Effect of using multiple meteorological parameters on the rainfall forecast
Predictor | Benevides et al. (2015) | Yao et al. (2017) | Manandhar et al. (2018) | Zhao et al. (2018) | Benevides et al. (2019) | Liu et al. (2019) | Manandhar et al. (2019a) | Manandhar et al. (2019b) | Li et al. (2020) | Zhao et al. (2020) |
---|---|---|---|---|---|---|---|---|---|---|
PWV value | × | × | √ | √ | √ | √ | √ | × | √ | √ |
PWV variation/PWV increment | √ | √ | × | × | × | × | × | √ | √ | √ |
PWV increment rate/first derivative | × | √ | × | × | × | × | × | √ | √ | √ |
ZTD variation | × | × | × | × | × | × | × | × | × | √ |
ZTD first derivative | × | × | × | √ | × | × | × | × | × | √ |
PWV second derivative | × | × | √ | × | × | × | × | × | × | × |
Monthly PWV | × | √ | × | × | × | × | × | √ | × | × |
Seasonal PWV | × | × | √ | × | × | × | × | × | × | × |
Pressure | × | × | × | × | √ | √ | × | × | × | × |
Temperature | × | × | × | × | √ | √ | √ | × | × | × |
Relative humidity | × | × | × | × | √ | √ | √ | × | × | × |
Dew point temperature | × | × | × | × | × | √ | √ | × | × | × |
PWV decrement | × | × | × | × | × | × | × | × | √ | × |
PWV decrement rate | × | × | × | × | × | × | × | × | √ | × |
Solar radiation | × | × | × | × | × | × | √ | × | × | × |
PWV, precipitable water vapor; ZTD, zenith total delay.
A few studies carried out in the temperate regions of Austria, Europe as a whole, and Italy include Karabatić et al. (2011), Guerova et al. (2016), and Barindelli et al. (2018), respectively. Karabatić et al. (2011) observed that prediction errors occurred mainly because of the uneven topography of the alpine region of Austria. Therefore, the authors considered the effect of station height, latitude, and temperature gradient while calculating/extrapolating the pressure at the station. Guerova et al. (2016) studied GNSS meteorology in Europe and presented the state of the art in weather prediction, climate monitoring, and the assimilation of GNSS products into NWP models. Barindelli et al. (2018) evaluated the relationship between PWV time variations and rainfall events. The authors observed a peak in PWV when the rain clouds approached the station, followed by a decrease of 5–10 mm when they moved past the station.
Yao et al. (2017) considered PWV variation, monthly PWV, and the rate of change of PWV values. In this case, the obtained TFR was 81% and the FFR was 66%. In the subtropical region, Li et al. (2020) considered two new predictors, PWV decrement and the rate of the PWV decrement, for the first time, along with the PWV values for short-term rainfall forecast/nowcast. With this addition, the TFR improved by about 20 percentage points, from 75 to 95.5%, compared with the studies in the temperate region, and the FFR was significantly reduced to 28.9%. Zhao et al. (2018) considered the PWV value and ZTD as the main parameters. The TFR was quite good at 90%, but the FFR was around 65%, which was not an improvement. Therefore, to improve the results, Zhao et al. (2020) considered additional parameters such as the PWV first derivative, ZTD variation, and ZTD first derivative, and noted that the FFR was significantly reduced to 30% with the same TFR of 90%. Li et al. (2022) used seven predictors, including hourly PWV and six types of its derivatives, giving an overall picture of PWV variation prior to the rainfall event. The TFR was found to be as high as 97.6%, and the FFR was reduced to 13.4% for a prediction window of 4.13 h. This comparison is plotted in Figure 5.
From Table 6, it is seen that only one study has considered the impact of solar radiation on rainfall forecast. Very few researchers have considered the decrement of PWV and the first-order and second-order derivatives of PWV values. However, the performance is marginally higher for the three-factor (three meteorological parameters) and five-factor (five meteorological parameters) models. Li et al. (2020) obtained a better forecast by using a combination of PWV increment, PWV increment rate, PWV decrement, PWV decrement rate, and PWV values. This is one of the reasons for the better performance of the method in the tropical and subtropical regions compared to the temperate region. It is evident from the above discussion that the PWV, when reinforced with other meteorological parameters, improved the TFR of short-term rainfall forecasts. Further studies are needed to identify and optimize the set of meteorological parameters required for accurate nowcasting.
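The derivative-type predictors listed in Table 6 can be illustrated with a minimal sketch. It assumes a PWV series sampled at a fixed interval and uses plain finite differences; the reviewed studies may define increments and rates differently (for example, between local extremes of the series), so the function name `pwv_derivatives`, its parameters, and the sample values are hypothetical and for illustration only.

```python
def pwv_derivatives(pwv_mm, step_hours=1.0):
    """Return (first, second) finite-difference derivatives of a PWV series.

    pwv_mm     -- PWV samples in mm at a fixed sampling interval (assumed)
    step_hours -- sampling interval in hours (assumed, 1 h by default)
    """
    if step_hours <= 0:
        raise ValueError("step_hours must be positive")
    if len(pwv_mm) < 3:
        raise ValueError("need at least three samples")
    # First derivative: change in PWV per hour between consecutive samples.
    first = [(b - a) / step_hours for a, b in zip(pwv_mm, pwv_mm[1:])]
    # Second derivative: change of the first derivative per hour.
    second = [(b - a) / step_hours for a, b in zip(first, first[1:])]
    return first, second


# Hypothetical hourly PWV values (mm), not taken from any reviewed study.
first, second = pwv_derivatives([40.0, 42.0, 47.0, 45.0])
```

In this toy series, the sign change in the first derivative marks the PWV peak, which is the kind of feature the threshold-based schemes discussed above look for.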
NWP models
The NWP model represents the numerical simulation of atmospheric conditions to forecast the evolution of weather. A high-resolution model is required for the detailed representation and understanding of the atmospheric condition. The chaotic nature of the weather significantly affects the NWP forecast. A summary of rainfall forecast by popular NWP models is listed in Table 7. The different NWP models presented in Table 7 include the fifth-generation mesoscale model (MM5), Australian Community Climate Earth-System Simulator (ACCESS), Global/Regional Integrated Model system (GRIMs), WRF, Advanced Research WRF (ARW), GFS, the UKMO, Advanced Regional Prediction System (ARPS), and the Unified Model (UM). Table 7 describes various studies based on different factors such as the location of the study, different models used for short-term rainfall forecast, resolution of the model, and the forecast lead time. The following section explains the factors that influence the rainfall forecast by NWP models.
Reference . | Coverage . | Model . | Resolution . | Lead time . |
---|---|---|---|---|
Mass et al. (2002) | Washington | MM5 | 4/12/36 km | 24 h |
Goswami et al. (2012) | India | MM5 | 10/30/90/60 km | 24 h |
Shrestha et al. (2012) | Southeast Australia | ACCESS | 80 km/37.5 km/12 km/5 km | 1–24 h |
Jang & Hong (2014) | Korea | GRIMs | 25/50/100 km | 24 h |
Kumar et al. (2016) | India | WRF | 5/15/45 km | 24 h |
Li et al. (2016) | China | WRF | 5/10/15/20/30/45 km | 24/48/72/96 h |
Prakash et al. (2016) | India | GFS UKMO | 22/17 km | Days 1–5 |
Shahrban et al. (2016) | Southeast Australia | ACCESS | 12 km | 13–24 h |
Wang et al. (2016) | China | ARPS | 3 km | 1–6 h |
Jee & Kim (2017) | Korea | WRF | 5 km – outer 1 km – inner | 18/12/6 h |
Moya-Álvarez et al. (2018) | Peru | WRF | 0.75/3/6/18 km | 10 days |
Chu et al. (2018) | China | WRF-ARW | 3 km | 1–6 h |
Jabbari et al. (2020) | Korea | WRF | 1/2/4/8/12/16/20 km | 12/24/36/48/60/72 h |
Sridevi et al. (2018) | India | GFS | 25 km | Days 1–5 |
Zhou et al. (2018) | China | WRF | 10 km | 36 h |
Bhomia et al. (2019) | India | WRF | – | 24/48 h |
Sharma et al. (2021) | India | UM | 10/40 km | 24/48/72 h |
MM5, fifth-generation mesoscale model; ACCESS, Australian Community Climate Earth-System Simulator; GRIMs, Global/Regional Integrated Model system; WRF, Weather Research and Forecasting; ARW, Advanced Research WRF; GFS, Global Forecast System; UKMO, the UK Met Office Unified Model; ARPS, Advanced Regional Prediction System; UM, Unified Model.
Influence of horizontal resolution on the rainfall forecast accuracy of NWP models
The NWPs are mostly grid models, in which the horizontal resolution is defined as the spacing between the grid cells. For other models with a global domain, such as spectral models, it is related to the number of waves that can be resolved by the model (Giunta et al. 2019). It was observed that the accuracy of the forecast can be enhanced with finer grid cells (Subramanian & Gopalakrishnan 2020). The short-term rainfall forecast skill depends on the location, season, and model resolution (Mass et al. 2002; Das et al. 2008). Researchers have performed sensitivity studies to understand the impact of horizontal resolution on NWP model forecast (Martin 1998; Gallus 1999; Goswami et al. 2012; Jang & Hong 2014; Li et al. 2016; Wang et al. 2016). To assess the performance of different methods, the statistical measures, including the probability of detection (POD), bias score (BS), false alarm ratio (FAR), critical success index (CSI), and equitable threat score (ETS), were used and are listed in Table 8.
Acronym . | Full form . | Description . |
---|---|---|
POD | Probability of Detection | Fraction of events correctly forecasted |
ETS | Equitable Threat Score | Fraction of correctly forecast events, adjusted for the hits that would occur purely due to random chance |
CSI | Critical Success Index | Fraction of observed and/or forecast events that were correctly forecast (correct non-events are not counted) |
FAR | False Alarm Ratio | Fraction of forecast events that were actually non-events |
BS | Bias Score | The ratio of predicted to observed rain |
RMSE | Root Mean Square Error | Average error magnitude |
TFR | True Forecast Rate | The ratio of the number of correctly forecasted rainfall events to the actual number of rainfall events |
FFR | False Forecast Rate | The ratio of the number of forecasted rainfall events for which no rainfall actually occurred to the actual number of rainfall events |
r | Correlation Coefficient | Measures the degree of a linear relationship between observed and forecasted data |
R2 | Coefficient of Determination | How well the model represents the data |
MAE | Mean Absolute Error | Used for error characterization of a model |
NSE | Nash–Sutcliffe Efficiency | To measure the model performance |
CE | Coefficient of Efficiency | To check how well the observed and forecasted value fits |
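The categorical scores in Table 8 can all be derived from a 2 × 2 contingency table of forecast versus observed events. The sketch below uses the standard textbook definitions for POD, FAR, CSI, BS (as a frequency bias), and ETS, and follows the wording of Table 8 for TFR and FFR. The class and function names and the sample counts are illustrative assumptions, and individual studies may use slightly different conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    hits: int               # event forecast and observed
    misses: int             # event observed but not forecast
    false_alarms: int       # event forecast but not observed
    correct_negatives: int  # no event forecast and none observed


def verification_scores(c: Contingency) -> dict:
    """Compute categorical scores; assumes at least one observed and one forecast event."""
    total = c.hits + c.misses + c.false_alarms + c.correct_negatives
    observed = c.hits + c.misses
    forecast = c.hits + c.false_alarms
    if observed == 0 or forecast == 0:
        raise ValueError("scores are undefined without observed and forecast events")
    # Hits expected from a random forecast with the same event frequencies.
    random_hits = observed * forecast / total
    union = c.hits + c.misses + c.false_alarms
    return {
        "POD": c.hits / observed,
        "FAR": c.false_alarms / forecast,
        "CSI": c.hits / union,
        "BS": forecast / observed,
        "ETS": (c.hits - random_hits) / (union - random_hits),
        "TFR": c.hits / observed,          # per Table 8: correct forecasts / actual events
        "FFR": c.false_alarms / observed,  # per Table 8: false forecasts / actual events
    }


# Hypothetical counts, for illustration only.
scores = verification_scores(Contingency(hits=50, misses=10, false_alarms=20, correct_negatives=920))
```

Note that, under these definitions, TFR coincides with POD, whereas FFR differs from FAR because it is normalized by the number of observed events rather than the number of forecast events.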
Table 9 shows the effect of horizontal resolution on the rainfall forecast skill. Mass et al. (2002) applied the MM5 model and found that rainfall is underestimated at 12 km resolution because the coarser-resolution model does not resolve small-scale features. However, increasing the resolution to 4 km overpredicts the rainfall, thereby reducing the overall forecast skill. Jang & Hong (2014) applied the GRIMs with horizontal resolutions of 25, 50, and 100 km for the quantitative forecast of heavy rainfall events over the Korean peninsula. It was noted that with enhanced resolution, complex topography is better represented and the ETS improves.
Reference . | Horizontal resolution, from (km) . | Horizontal resolution, to (km) . | Improvement in POD/ETS (%) . | Improvement in CSI (%) . | Improvement in BS (%) . | FAR reduction (%) . |
---|---|---|---|---|---|---|
Mass et al. (2002) | 12 | 4 | – | – | 35 | – |
Jang & Hong (2014) | 100 | 50 | 13 | – | – | – |
Kumar et al. (2016) | 45 | 5 | 40 | – | 24.3 | – |
Li et al. (2016) | 45 | 20 | 12 | 9.0 | 3.0 | 13.6 |
Sharma et al. (2017) | 40 | 17 | 29 | – | – | 24 |
Sridevi et al. (2018) | 25 | 12 | – | – | 20 | – |
Sharma et al. (2021) | 45 | 10 | 10 | 9.0 | – | 10 |
POD, probability of detection; ETS, equitable threat score; CSI, critical success index; FAR, false alarm ratio; BS, bias score.
According to Kumar et al. (2016), increasing the resolution improves the rainfall forecasting skill. The POD increases by 40% when the grid spacing is refined from 45 to 5 km, and the BS increases by 24.3%. This indicates that a better topographic representation positively impacts rainfall forecasting in the mountainous region. Sridevi et al. (2018) found that a high-resolution model has less overestimation with a BS of 1–1.25, while a low-resolution model has a significant overestimation with a BS of 1.5–2, implying that higher resolution improves forecast performance. Similar results were reported by other studies in the literature (Li et al. 2016; Sharma et al. 2017, 2021). Li et al. (2016) observed that at high resolution (5 km), the BS was proportional to the rainfall threshold.
It is concluded that the standard verification metrics (POD, ETS, CSI, and BS) improve with finer horizontal resolution, accompanied by a reduction in the FAR. The accuracy improvement was mainly due to the more accurate prediction of moisture and temperature at increased horizontal resolution. The above discussion shows that a coarser horizontal resolution may not accurately represent land surface characteristics and topography, ultimately influencing the rainfall forecast accuracy. The finer resolution better represents orographic and mesoscale features and can be considered an essential step toward accurate short-term rainfall forecasts. However, model skill does not necessarily improve with increased horizontal resolution (Wang et al. 2004). Moreover, the computational cost increases substantially with the resolution (Mass et al. 2002).
Influence of lead time on rainfall forecast accuracy
The lead time is defined as the length of time between the issuance of a forecast and the occurrence of the forecasted event (Subramanian & Gopalakrishnan 2020). For many flood warning/forecasting studies, a lead time of 6–48 h is considered optimal (Herath et al. 2016). However, the forecast uncertainty increases with the lead time due to limited knowledge about the complex atmospheric processes. Therefore, research efforts are directed toward increasing the lead time along with forecast accuracy, so that real-time measures can be taken against extreme events. The effect of lead time on rainfall forecasting skill has been examined in theoretical and modeling studies (Das et al. 2008; Shrestha et al. 2012; Jang & Hong 2014; Wang et al. 2016; Jee & Kim 2017; Chu et al. 2018; Sridevi et al. 2018; Bhomia et al. 2019; Jabbari et al. 2020; Sharma et al. 2021). Das et al. (2008) applied different mesoscale models, namely the MM5, the Regional Spectral Model (RSM), the Eta model, and the WRF model, over India to check the forecast skill. It was noted that the mesoscale models performed better for the 1-day forecast, and that the models' skill decreases with an increase in the lead time. The performance scores are summarized in Table 10. Shrestha et al. (2012) found that the POD, which indicates the ability of the model to correctly diagnose the event, decreases with the lead time (POD reduces from 70 to 30%). The FAR increases, and the BS, which measures the mean difference between forecast and observation, fluctuates as the lead time increases. According to Jang & Hong (2014), the root mean square error (RMSE) increases and the bias decreases when the lead time is extended by 24 h. The decrease in forecasting skill with an increase in the lead time was also confirmed by other studies (Chu et al. 2018; Sridevi et al. 2018; Jabbari et al. 2020).
Study . | Lead time, from . | Lead time, to . | Case . | Improvement in POD/RS (%) . | Improvement in CSI/ETS (%) . | Improvement in bias (%) . | Improvement in FAR/RMSE (%) . |
---|---|---|---|---|---|---|---|
Das et al. (2008) | Day 1 | Day 3 | MM5-Western India | – | ETS = 44.5 | – | – |
Das et al. (2008) | Day 1 | Day 3 | ETA-Western India | – | 25.5 | – | – |
Das et al. (2008) | Day 1 | Day 3 | MM5-Eastern India | – | 39 | – | – |
Das et al. (2008) | Day 1 | Day 3 | ETA-Eastern India | – | 10 | – | – |
Shrestha et al. (2012) | Short | Long | – | 30 | 33 | – | 31 |
Jang & Hong (2014) | 24 h | 48 h | – | – | – | – | RMSE = 13.67 |
Wang et al. (2016) | 1 h | 6 h | – | – | 11 | 11.55 | – |
Jee & Kim (2017) | 6 h | 18 h | – | 4 | ETS-2 | – | 3 |
Chu et al. (2018) | 1 h | 6 h | – | – | 9 | – | – |
Jabbari et al. (2020) | 0–12 h | 61–72 h | Event-2001 | – | – | 28 | – |
Jabbari et al. (2020) | 0–12 h | 61–72 h | Event-2007 | – | – | 11 | – |
Jabbari et al. (2020) | 0–12 h | 61–72 h | Event-2011 | – | – | 24 | – |
Sridevi et al. (2018) | Day 1 | Day 5 | 12.5 km | RS = 2 | – | – | – |
Sridevi et al. (2018) | Day 1 | Day 5 | 25 km | RS = 1 | – | – | – |
Bhomia et al. (2019) | 24 h | 48 h | – | – | 7 | 4 | 5 |
Sharma et al. (2021) | Day 1 | Day 3 | – | 11 | 7.5 | – | 11 |
POD, probability of detection; ETS, equitable threat score; CSI, critical success index; FAR, false alarm ratio; RMSE, root mean squared error; RS, ratio score.
Results reported in the literature (Shrestha et al. 2012; Jee & Kim 2017; Bhomia et al. 2019; Sharma et al. 2021) clearly show that the FAR increased with the lead time, while the CSI and POD decreased. A longer lead time causes large internal variabilities (i.e., chaotic variabilities of the climate caused by the model itself), resulting in high uncertainty in the forecast skill (Lafaysse et al. 2014). Therefore, it is crucial to determine the optimum lead time for NWP models before employing them for rainfall forecasting.
Influence of location
The location and its topography play an important role in the short-term rainfall forecast efficiency. Shrestha et al. (2012) showed that NWP models such as the Australian Community Climate Earth-System Simulator-VICTAS (ACCESS-VT) and the Australian Community Climate Earth-System Simulator-Australia (ACCESS-A) overestimated rainfall by up to 60% at low elevations and underestimated rainfall by up to 30% at high elevations. Shahrban et al. (2016) reported similar results, with the ACCESS-A model overestimating rainfall in low precipitation areas and underestimating it in high rainfall areas. Kumar et al. (2016) noted that with an increase in resolution, better topographic representations over hilly areas were obtained, improving the forecasting efficiency over the mountainous regions compared to plain areas. Bhomia et al. (2019) found a high correlation between IMD observed (ground observation) and WRF forecasted rainfall over central India, with less bias and standard deviation (SD), indicating a better forecast efficiency. The Western Ghats (WG) and the north-east (NE) region of India receive the highest amount of monsoon rainfall during the months of June, July, August, and September (JJAS). Over the WG region (72 °E–76 °E, 13 °N–21 °N), the correlation between the IMD observed rainfall and WRF forecasted rainfall was good, but the increased BS indicated overestimation. A high SD was found over the NE region (90 °E–98 °E, 22 °N–30 °N), showing low forecast efficiency. This suggests a lower accuracy of WRF over high terrain areas. The BS was found to be lower over the north-west (NW) and south-east (SE) regions of India, suggesting the underestimation of rainfall. Sridevi et al. (2018) found that the GFS model overestimated rainfall by 1–3 mm compared to observed rainfall in different parts of India. It is apparent that more studies are needed to clearly draw the bounds within which NWP models work efficiently with respect to resolution, lead time, and location.
ML methods
The above discussion clearly highlights the need for alternative methods of short-term rainfall forecasting that are versatile and, at the same time, provide high efficiency. In the recent past, researchers have attempted to resolve some of the drawbacks of statistical and physically based models by using the capabilities of ML approaches or hybrid models (Hong 2008; Sumi et al. 2012; Cramer et al. 2017; Balamurugan & Manojkumar 2021; Ridwan et al. 2021). Artificial neural network (ANN), k-nearest neighbor (KNN), SVM, decision tree (DT), and RF are some of the popular ML models that have been employed to handle complex nonlinear associations (Hong & Pai 2007; Hong 2008; Sumi et al. 2012; Akrami et al. 2014; Cramer et al. 2017; Abbot & Marohasy 2018). These models are based entirely on the relationship between input and output variables and do not require knowledge of the underlying physical process (Solomatine & Ostfeld 2008). The ML models have been applied for short-term as well as long-term rainfall forecasts.
Pre-processing of ML techniques
It was noted that the peak values of rainfall and lag effect are not efficiently captured by ML models (Dawson & Wilby 2001; Jain & Srinivasulu 2004; De Vos & Rientjes 2005). Therefore, researchers proposed pre-processing the rainfall data before model application to improve the forecast accuracy (Chau & Wu 2010; Kalteh 2017; Ouyang & Lu 2018; Li & Zhang 2019).
In this review, three pre-processing algorithms were studied: (1) singular spectrum analysis (SSA); (2) wavelet analysis (WA); and (3) ensemble empirical mode decomposition (EEMD). These three pre-processing techniques are powerful mathematical tools that analyze the internal structure of non-stationary time series (Chau & Wu 2010; Feng et al. 2015; Ouyang & Lu 2018). Furthermore, these techniques are reported to be capable of removing the noise, providing time–frequency domains of the analyzed signal, and giving insights into the physical aspect of the data series, which helps improve the forecast quality. Table 11 shows the performance comparison of various ML methods with and without pre-processing. It is seen that ML methods were used for daily rainfall forecast (Partal & Kisi 2007; Chau & Wu 2010; Kisi & Cimen 2012; Sumi et al. 2012; Unnikrishnan & Jothiprakash 2020) and monthly rainfall forecast (Nourani et al. 2009; Ramana et al. 2013; Feng et al. 2015; Kalteh 2017; Ouyang & Lu 2018; Li & Zhang 2019) in different parts of the world. Conjunction/hybrid models performed better than a single model in both cases. The ML models were used to forecast rainfall a few days to a few months ahead; however, the forecast accuracy reduces with an increase in the lead time. The three data pre-processing techniques stated above were mainly used by studies presented in Table 11. The performance of each of these techniques for rainfall forecasting is explained below.
Author . | Study region . | Data . | Model . | MAE with pre-processing (mm) . | RMSE with pre-processing (mm) . | CE/NSE with pre-processing . | r/R2 with pre-processing . | MAE without pre-processing (mm) . | RMSE without pre-processing (mm) . | CE/NSE without pre-processing . | r/R2 without pre-processing . |
---|---|---|---|---|---|---|---|---|---|---|---|
Partal & Kisi (2007) | Turkey | Daily precipitation data 1987–2001, Afyon station | WT + Neuro-fuzzy | – | 1.06 | – | 0.913 | – | 3.53 | – | 0.037 |
Partal & Kisi (2007) | Turkey | Izmir station | WT + Neuro-fuzzy | – | 2.17 | – | 0.913 | – | 6.83 | – | 0.124 |
Partal & Kisi (2007) | Turkey | Mugla station | WT + Neuro-fuzzy | – | 3.07 | – | 0.881 | – | 8.14 | – | 0.146 |
Nourani et al. (2009) | Iran | Monthly precipitation 1973–1999 | WT + ANN | – | – | – | 0.784 | – | – | – | 0.31 |
Chau & Wu (2010) | China | Daily rainfall, Wuxi station 1988–2007 | ANN + SVR + SSA | – | 4.18 | 0.87 | – | – | 10.59 | 0.17 | – |
Chau & Wu (2010) | China | Zhenwan station 1989–1998 | ANN + SVR + SSA | – | 3.18 | 0.92 | – | – | 10.68 | 0.09 | – |
Kisi & Cimen (2012) | Turkey | Daily precipitation 1987–2001, Afyon station | WT + SVM | 9.0 | 21.4 | 0.647 | 0.815 | 14.2 | 38.7 | 0.154 | 0.103 |
Kisi & Cimen (2012) | Turkey | Izmir station | WT + SVM | 13.6 | 46.5 | 0.593 | 0.782 | 19.6 | 71.6 | 0.037 | 0.276 |
Sumi et al. (2012) | Japan | Daily rainfall 1975–2009 | ANN + Multi-model + PCA | – | 7.555 | 0.9973 | – | – | 14.633 | 0.9880 | – |
Ramana et al. (2013) | India | Monthly rainfall 1901–1975 | WA + ANN | – | 63.01 | 94.78 | 0.974 | – | 163.79 | 64.73 | 0.807 |
Feng et al. (2015) | China | Monthly rainfall 1960–2012, Yeniugou station | WA + SVM | 10.424 | 12.642 | 0.863 | 0.929 | 12.018 | 12.568 | 0.806 | 0.905 |
Feng et al. (2015) | China | Qilian station | WA + SVM | 7.828 | 12.689 | 0.892 | 0.945 | 11.57 | 18.777 | 0.762 | 0.875 |
Feng et al. (2015) | China | Tuole station | WA + SVM | 7.345 | 11.574 | 0.888 | 0.943 | 11.66 | 18.92 | 0.7 | 0.888 |
Kalteh (2017) | Iran | Monthly precipitation data 1986–2005 | ANN + SSA | – | 52.257 | 0.731 | 0.858 | – | 91.096 | 0.183 | 0.444 |
Ouyang & Lu (2018) | China | Monthly rainfall 1964–2013, Bamiansha-zhao, Chaganhua, and Chatai station | MGGP + ESN + EEMD + WT + SSA | 1.7703 | 2.6787 | 0.9953 | – | 3.8910 | 5.15570 | 0.9850 | – |
Li & Zhang (2019) | China | 1983–2013 Monthly average precipitation data | SSA + DA + SVR | 5.6120 | 7.4430 | 0.9751 | 0.9782 | 23.8732 | 30.4194 | 0.3430 | 0.5383 |
Unnikrishnan & Jothiprakash (2020) | India | 1961–2013 Daily rainfall data | SSA–ARIMA–ANN | 8.29 | 15.58 | 0.69 | 0.68 | 9.06 | 21.83 | 0.39 | 0.41 |
WA, wavelet analysis; WT, wavelet transform; EEMD, ensemble empirical mode decomposition; SSA, singular spectrum analysis; ARIMA, auto-regressive integrated moving average; ANN, artificial neural network; ESN, echo state networks; SVM, support vector machine; SVR, support vector regression; PCA, principal component analysis; ANFIS, adaptive neuro-fuzzy inference system; DA, dragonfly algorithm; MGGP, multi-gene genetic programming.
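The continuous performance indices compared in Table 11 can be computed from paired observed and forecast series. The following is a minimal sketch using the usual textbook definitions of MAE, RMSE, NSE, and the Pearson correlation coefficient r. The function name `error_indices` and the sample values are illustrative assumptions, and the individual studies may normalize or report these indices differently (for instance, as percentages).

```python
import math


def error_indices(observed, forecast):
    """Return MAE, RMSE, NSE and Pearson r for two equal-length series."""
    n = len(observed)
    if n != len(forecast) or n < 2:
        raise ValueError("series must have equal length of at least two")
    errors = [f - o for o, f in zip(observed, forecast)]
    mean_obs = sum(observed) / n
    mean_fc = sum(forecast) / n
    sse = sum(e * e for e in errors)                     # sum of squared errors
    sst = sum((o - mean_obs) ** 2 for o in observed)     # spread of observations
    var_fc = sum((f - mean_fc) ** 2 for f in forecast)   # spread of forecasts
    cov = sum((o - mean_obs) * (f - mean_fc) for o, f in zip(observed, forecast))
    if sst == 0 or var_fc == 0:
        raise ValueError("NSE and r are undefined for a constant series")
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sse / n),
        "NSE": 1.0 - sse / sst,   # 1 is a perfect fit; 0 matches the observed mean
        "r": cov / math.sqrt(sst * var_fc),
    }


# Hypothetical rainfall values (mm), for illustration only.
indices = error_indices([0.0, 2.0, 4.0, 6.0], [1.0, 2.0, 3.0, 6.0])
```

Lower MAE and RMSE and higher NSE and r indicate a better forecast, which is the direction of change reported for the pre-processed models in Table 11.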
SSA as a pre-processing technique
WA and EEMD as a pre-processing technique
Ouyang & Lu (2018) applied multi-gene genetic programming (MGGP) and echo state networks (ESN) methods for monthly rainfall forecasts. The authors used SSA, wavelet transform (WT), and ensemble empirical mode decomposition (EEMD) as data pre-processing techniques. WT and SSA performed better, while the performance of EEMD at all three stations was inferior. Among all, the WT technique was recommended for short-term rainfall forecast as it can capture the exact locality of any variation in the data series (Ramana et al. 2013; Ouyang & Lu 2018). Partal & Kisi (2007) applied a wavelet and neuro-fuzzy conjunction model for a 1-day ahead daily precipitation forecast (Figure 7(c)). The determination coefficient of the neuro-fuzzy method was around 0.1, while the conjunction model increased it roughly 8–9 times, significantly improving the results. This may be due to the efficient forecast of extreme values by the conjunction model. In another study, wavelet transform was combined with support vector regression (WSVR) for daily precipitation forecast (Kisi & Cimen 2012). The mean absolute error (MAE), RMSE, Nash–Sutcliffe Efficiency (NSE), and R2 values for the single SVR model (without pre-processing) and the hybrid WSVR model are presented in Table 11. Figure 7(d) shows the reduction in MAE and RMSE values after pre-processing at two different stations in Turkey. Previous studies conclude that WT is a superior tool for detecting irregularly distributed multi-scale rainfall features in space and time (Partal & Kisi 2007; Kisi & Cimen 2012; Ouyang & Lu 2018).
The above discussion shows that pre-processing considerably improved the model performance, mainly by detecting the irregular components and improving the forecast of extreme values. Traditional ML methods failed to capture the peak values efficiently, a shortcoming that hybrid models can significantly reduce. From Table 11, it is seen that the error indices (i.e., RMSE and MAE) reduce with the application of pre-processing techniques, which may be attributed to the removal of fluctuations in the rainfall series.
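The idea of splitting a rainfall series into subseries before modeling can be illustrated with a minimal wavelet sketch. It uses the Haar wavelet, chosen here only because it is the simplest to write out; the reviewed studies may have used other mother wavelets and decomposition levels. The function names and the sample series are hypothetical. Each returned subseries has the same length as the input, and the subseries sum back to the original series, so each could be modeled separately and the forecasts added.

```python
import math


def haar_dwt(x):
    """One level of the Haar transform: returns (approximation, detail)."""
    if len(x) % 2:
        raise ValueError("series length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds a series twice as long."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / s, (a - d) / s))
    return out


def additive_subseries(x, levels):
    """Split x into `levels` detail subseries plus one smooth subseries."""
    if levels < 1 or len(x) % (2 ** levels):
        raise ValueError("length must be a multiple of 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    parts = []
    for level, detail in enumerate(details, start=1):
        # Reconstruct this detail level alone, with all other coefficients zero.
        comp = haar_idwt([0.0] * len(detail), detail)
        for _ in range(level - 1):
            comp = haar_idwt(comp, [0.0] * len(comp))
        parts.append(comp)
    smooth = approx
    for _ in range(levels):
        smooth = haar_idwt(smooth, [0.0] * len(smooth))
    parts.append(smooth)
    return parts


# Hypothetical daily rainfall values (mm), for illustration only.
series = [0.0, 2.0, 4.0, 0.0, 10.0, 6.0, 1.0, 1.0]
parts = additive_subseries(series, levels=2)
```

With two levels, the last subseries is the block average over four days (the slowly varying part), while the detail subseries carry the short-lived fluctuations and peaks.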
SUMMARY OF THE REVIEW AND THE WAY FORWARD
Important observations and recommendations from this review
This review applied the PRISMA method for a systematic review (SR) and critically analyzed 110 papers on short-term rainfall forecasting techniques, selected from 1,200 papers retrieved from various databases after removing duplicates. The study evaluated statistical procedures, physically based numerical weather forecasting models, and ML techniques for the said purpose. The GNSS-derived PWV was found to be capable of analyzing the real-time profile of water vapor content. The method performs well in mountainous regions as it is less affected by altitude and is found to be suitable for nowcasting in tropical regions. The relationship between rainfall, PWV, derivatives of PWV, and other meteorological parameters such as temperature, pressure, relative humidity, and solar radiation is well appraised in the study. It is noted that the threshold PWV value is sensitive to both the season and the location. It is concluded that when GNSS-derived PWV is combined with a suitable ML algorithm, the rainfall nowcast accuracy increases.
The discussed NWP models, ACCESS-VT, ACCESS-A, WRF, and GFS, show that rainfall is underestimated in high elevation areas and overestimated in low elevation areas. It is noted that uncertainty in the forecast increases with the lead time as a longer lead time enhances the internal variabilities in the model. It is concluded that rainfall forecast by the NWP model depends on location, model resolution, season, topography, and forecast lead time. It is also inferred that optimum resolution and lead time can significantly improve forecast accuracy.
The importance of ML techniques in short-term rainfall forecasting, with and without pre-processing, was examined in this review. A significant improvement in rainfall forecasting is observed with pre-processing techniques. Of the three pre-processing techniques, the WT and SSA performed better than the EEMD in detecting irregularities, noise, and extreme rainfall values. The pre-processing enhanced the forecast quality through independent modeling of each subseries.
Future recommendations and development needed
The threshold PWV value is season- and location-specific. Additional studies are required for the tropical, subtropical, and temperate regions to strengthen the observed results. Efforts are needed to generalize the threshold PWV by relating it to seasonal variation, location, and easily observed meteorological parameters. The review highlights the need to explore the optimum data length for rainfall forecasting in the temperate, tropical, and subtropical regions. Different ML algorithms can be combined with GNSS-derived PWV techniques to improve nowcasting skill. More sensitivity studies are needed to understand the influence of parameters such as topography, resolution, and lead time on the forecast skill of NWP models. This review focused on only a few specific NWP models; it should be extended to other NWP models to evaluate their efficiency for short-term rainfall forecasting. Among the ML techniques, only three pre-processing techniques were reviewed; other pre-processing techniques should be explored depending on the desired output. This review did not explicitly consider the influence of ground information, radar, and satellite data on rainfall forecasting. Further research is needed to reduce the static noise in radar data, which affects forecast efficiency.
The past few decades have witnessed frequent high-intensity, short-duration extreme rainfall events, including cloudbursts, which cannot yet be forecast with sufficient accuracy. This calls for continuous review and updating of existing short-term rainfall forecasting and nowcasting techniques. In addition, new techniques need to account efficiently for the non-stationarity in rainfall time series. A location-based, high-resolution nowcasting model is recommended specifically for urban catchments. Densely populated urban areas tend to have higher temperatures, creating local heat differences that can result in localized rainfall events. Therefore, it is important to quantify the impact of urbanization on rainfall nowcast skill. Real-time forecasting remains difficult because of the complexity of atmospheric processes. Recently, Ravuri et al. (2021) developed an observation-driven approach using deep generative models (DGMs) for skillful nowcasting. The potential of DGMs for accurate rainfall nowcasting should be explored in detail, particularly their performance with respect to lead time, the incorporation of uncertainty at multiple spatio-temporal scales, and the forecasting of high-intensity rainfall events. Similarly, efforts are needed to improve the capability of NWP models by considering the nonlinearities and randomness in rainfall events. For this purpose, the possibility of coupling NWP with a stochastic model (De Luca & Capparelli 2022) can be further evaluated and demonstrated. A rainfall forecast or nowcast with an adequate lead time is a mandatory input to the hydrologic models used for urban flood forecasting. Such an integrated module, along with a suitable decision support system, is urgently needed for effective urban flood management.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.