ABSTRACT
To forecast monthly rainfall, autoregressive integrated moving average (ARIMA) modelling was performed in RStudio. Monthly rainfall data (in mm) covering 30 years were collected for eight localities in the Mandi district of Himachal Pradesh: Bharol, Chachiyot, Jogindernagar, Karsog, Kataula, Mandi, Sarkaghat, and Sundernagar. Before the ARIMA models were applied, the augmented Dickey–Fuller test was used to determine whether the rainfall series were stationary. The best models were selected using the autocorrelation and partial autocorrelation functions together with the minimum values of the Akaike and Bayesian information criteria. The best fit was ARIMA(0,0,2)(2,1,0)12 for five stations and ARIMA(0,0,0)(1,1,0)12 for the remaining three stations. The Ljung–Box test was used to validate the models by checking the residuals for remaining autocorrelation. The fitted models were then used to forecast monthly rainfall for the 10-year period 2021–2030. Model validity and performance were evaluated using error statistics (mean absolute error = 34.65, root-mean-square error = 33.58, and correlation coefficient = 0.93). The mean and standard deviation of the predicted data were close to those of the observed data, indicating that the ARIMA models predict monthly rainfall satisfactorily.
HIGHLIGHTS
Autoregressive integrated moving average (ARIMA) modelling, which goes beyond the trend analyses traditionally applied, has been used for the first time in the study area.
Little work has been done on rainfall forecasting in the study area.
INTRODUCTION
Water resources hold a unique position among natural resources because they are present everywhere and affect both the environment and human existence. The need to increase food production has led people to irrigate land for millennia (Figuères et al. 2012). Agriculture is India's main economic sector, and rainfall is one of the most important natural resources for a country like India, where the majority of people depend on agriculture and irrigation. Several water sources can be used for agriculture, including rivers, canals, and sand bore wells; however, rainfall is the primary source of water for all of them. Rainfall is a natural climatic phenomenon that is difficult to forecast. Because agriculture contributes so much to the country's economy, rainfall forecasting plays a crucial role in combating water scarcity in agriculture (Agrawal 2006). Rainfall predictions are far from accurate mainly because of unreliable initial conditions, limitations in modelling sub-scale processes, and a lack of adequate spatial resolution (Bustamante et al. 1999; Chou et al. 2005).
Rainfall is an essential part of the water cycle (Eltahir & Bras 1996; Trenberth et al. 2003) and, in meteorology and climatology, the most important and dynamic factor associated with atmospheric circulation (Kidd & Huffman 2011). Organisms are predominantly composed of liquid water, which serves numerous functions and should not be regarded merely as an inert solvent. Even after extensive research, several properties of water remain puzzling, and life is said to rely on its exceptional characteristics (Chaplin 2001). Precise and dependable precipitation data are essential for analysing climatic patterns and fluctuations, for managing water resources effectively, and for accurate weather, climate, and hydrological forecasts (Larson & Peck 1974; Jiang et al. 2012). Precipitation at the Earth's surface is typically monitored directly with gauges (Kidd 2001). Nevertheless, gauge measurements have many limitations, including inadequate areal coverage and gaps over oceans and sparsely populated regions (Xie & Arkin 1996; Rana et al. 2015; Kidd et al. 2017). Satellite observations compensate for these shortcomings: sophisticated infrared and microwave instruments offer more consistent spatial coverage and complete temporal coverage over large regions of the world (Xie et al. 2003; Kidd & Levizzani 2011). These precipitation data products have proven useful across a diverse range of research fields. Several datasets have been used to quantify global and regional climate change patterns, and the products have been used to study climatology and changes in climate means and extremes in several regions (Alexander et al. 2006; Kunkel et al. 2015).
A number of stochastic models have been developed to represent hydrological time series such as runoff, rainfall, evaporation, and monthly streamflow. These include autoregressive moving average (ARMA), seasonal autoregressive integrated moving average (SARIMA), and autoregressive integrated moving average (ARIMA) models of varying orders. Yule (1927), Keskin et al. (2006), Modarres (2007), Nirmala & Sundaram (2010), Asklany et al. (2011), Hassan Ali & Mahgoub Mohamed (2015), and Meshram et al. (2015) are a few examples of studies that used these models. Time series forecasting is a multidisciplinary analytical technique for prediction problems. It is simple and adaptable, because its implementation requires only historical observations of the variables of interest (Alsharif et al. 2019). The most important time series model is the Box-Jenkins approach, also known as ARIMA (Ali 2013). When the Box-Jenkins model, or any stochastic model in general, is applied to a specific problem, the three stages of model construction (identification, estimation, and diagnostic checking) should be followed (Hipel et al. 1977). An ARIMA model combines three operators: an autoregressive (AR) function applied to past rainfall values, a moving average (MA) function applied to purely random errors, and an integration (I) component, the differencing operator, which removes non-stationarity by differencing the series (Swain et al. 2018). ARIMA is a linear statistical model commonly used to simulate and forecast time series with temporal correlation (Wang et al. 2021). An AR process expresses the current value in terms of past observations and a random disturbance term, whereas an MA process models the series as a linear combination of white-noise terms. Like artificial neural networks, the ARIMA model, which combines AR and MA processes, is a popular approach for forecasting rainfall (Ahmed Osmani et al. 2021).
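For reference, the seasonal models selected in this study can be written in the usual multiplicative backshift notation, where $B y_t = y_{t-1}$, $s = 12$ is the seasonal period, and $\varepsilon_t$ is white noise. This is an added sketch following the standard convention in which MA polynomials enter with a plus sign (as in R); constant and drift terms are omitted, which is an assumption consistent with the model labels in the results table.

```latex
\begin{aligned}
&\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}(1-B^{s})^{D}\,y_t = \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t,\\
&\text{ARIMA}(0,0,2)(2,1,0)_{12}:\quad (1-\Phi_1 B^{12}-\Phi_2 B^{24})(1-B^{12})\,y_t = (1+\theta_1 B+\theta_2 B^{2})\,\varepsilon_t,\\
&\text{ARIMA}(0,0,0)(1,1,0)_{12}:\quad (1-\Phi_1 B^{12})(1-B^{12})\,y_t = \varepsilon_t.
\end{aligned}
```

In both selected models $d = 0$ and $D = 1$: the series is differenced only at the seasonal lag, $y_t - y_{t-12}$, before the AR and MA terms are applied.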
The predictive power of the ARIMA model has been greatly improved by technological breakthroughs in fields like computers, communication, remote sensing, and geographic information systems.
Previous research on rainfall in Himachal Pradesh has concentrated on trends: seasonal and annual rainfall trends (Jaswal et al. 2015), rainfall trend analysis (Ganguly et al. 2015), the impacts of rainfall trends (Sharma et al. 2019), and rainfall trend assessment (Mohd Wani et al. 2017). All of these studies used the Mann–Kendall test to analyse rainfall trends. In this work, ARIMA models were used to simulate and forecast monthly rainfall in the Mandi district of Himachal Pradesh, India.
The specific objectives were as follows:
(a) To determine the variation of rainfall distribution over the study area using statistical parameters.
(b) To determine the pattern of rainfall.
(c) To forecast rainfall using ARIMA modelling.
MATERIALS AND METHODS
Study area and data collection
For this study, the Mandi district, located at 31.72°N latitude and 76.92°E longitude, was selected (Figure 1). The district covers a geographical area of 3,951 km² (1,525 mi²), which is 7.10% of the area of the state, and its elevation ranges from 450 to 4,800 m. Mandi is bounded by six districts: Bilaspur, Hamirpur, Kangra, Kullu, Shimla, and Solan. Annual temperatures in Mandi range between 6.7 °C (44.06 °F) and 39.6 °C (103.28 °F): from 6.7 °C (44.06 °F) to 26.2 °C (79.16 °F) in winter and from 18.9 °C (66.02 °F) to 39.6 °C (103.28 °F) in summer. The maximum, minimum, and average relative humidity values are 100, 65, and 92%, respectively. Annual rainfall in the Mandi district ranges from 800 to 1,800 mm. Under the Köppen–Geiger climate classification, the selected stations have a monsoon-influenced humid subtropical climate, characterised by hot, humid summers and cool winters. The rainfall data were obtained from the Prediction of Worldwide Energy Resource (POWER) project of the National Aeronautics and Space Administration (NASA) Langley Research Center, which uses satellite measurements of the electromagnetic energy emitted by raindrops and clouds to estimate rainfall. Monthly rainfall data (in mm) for the 30 years from 1991 to 2020 were extracted for eight localities in the Mandi district (Bharol, Chachiyot, Jogindernagar, Karsog, Kataula, Mandi, Sarkaghat, and Sundernagar) for rainfall modelling and forecasting.
Design of model
Establishing time series
Autocorrelation test of time series
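The autocorrelation function (ACF) and partial autocorrelation function (PACF) guide the identification of candidate ARIMA orders. As a minimal plain-Python sketch (not the study's R code), the sample ACF below uses the divide-by-n estimator that R's acf() also uses, and the PACF is derived from it with the Durbin–Levinson recursion. Values outside roughly ±1.96/√n are conventionally treated as significant. The cosine series is a hypothetical 12-month cycle, not the POWER rainfall data.

```python
import math

def acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag (biased, divide-by-n estimator)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def pacf_from_acf(r):
    """Partial autocorrelations from r_1..r_K via the Durbin-Levinson recursion."""
    pac, prev = [], []  # prev holds phi_{k-1,1..k-1}
    for k in range(1, len(r) + 1):
        num = r[k - 1] - sum(prev[j - 1] * r[k - j - 1] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * r[j - 1] for j in range(1, k))
        phi_kk = num / den
        # update phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then append phi_kk
        prev = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pac.append(phi_kk)
    return pac

# Hypothetical monthly series with a 12-month cycle (not the study data)
y = [math.cos(2 * math.pi * t / 12) for t in range(360)]
r = acf(y, 24)
band = 1.96 / math.sqrt(len(y))   # approximate 95% limits under white noise
seasonal_spike = r[11] > band     # lag 12 stands out for a seasonal series

# Theoretical ACF of an AR(1) process with phi = 0.5: the PACF cuts off after lag 1
pac = pacf_from_acf([0.5, 0.25, 0.125])
```

A spike at lag 12 in the ACF, together with the PACF pattern at seasonal lags, is what points towards seasonal terms such as the (P,1,Q)12 components reported for the Mandi stations.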
Check for stationarity of data
The time series data were checked for stationarity, that is, to confirm that their mean and variance remain constant over time. Stationarity was assessed with the augmented Dickey–Fuller (ADF) test, a unit root test, using the adf.test() function in R (Tam 2013).
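To illustrate what the unit root test examines, the sketch below computes the t-statistic of γ in the Dickey–Fuller regression Δy_t = α + γ y_{t−1} + e_t in plain Python. It is deliberately simplified: it omits the lagged differences that make the test "augmented" and has no trend term. If the study used tseries::adf.test(), that function by default includes a linear trend and trunc((n − 1)^(1/3)) lagged differences, so its statistic and critical values differ. The value −2.86 is the commonly tabulated asymptotic 5% critical value for the constant-only case; the input is simulated white noise, not study data.

```python
import math
import random

# Asymptotic 5% critical value, Dickey-Fuller test with constant and no trend.
# A statistic below it rejects the unit root, i.e. suggests stationarity.
DF_CRIT_5PCT = -2.86

def dickey_fuller_t(y):
    """t-statistic of gamma in dy_t = alpha + gamma * y_{t-1} + e_t.

    Plain (non-augmented) Dickey-Fuller regression fitted by OLS.
    """
    x = y[:-1]                                    # lagged level y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]   # first difference
    n = len(dy)
    mx, mdy = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (di - mdy) for xi, di in zip(x, dy))
    gamma = sxy / sxx
    alpha = mdy - gamma * mx
    sse = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = math.sqrt(sse / (n - 2) / sxx)     # OLS standard error of gamma
    return gamma / se_gamma

rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(360)]  # 30 years x 12 months
t_stat = dickey_fuller_t(noise)
print(f"DF t = {t_stat:.2f}, stationary at 5%: {t_stat < DF_CRIT_5PCT}")
```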
Autoregressive integrated moving average model
Parameter estimation and model selection
The parameters were estimated with the auto.arima() function of the R package forecast, which searches over candidate ARIMA models and fits the most appropriate one to the time series data. The best model was identified from the Akaike information criterion (AIC) or Bayesian information criterion (BIC): the model with the lowest AIC value was selected as the best fit (Awan & Aslam 2020).
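The lowest-AIC rule can be illustrated with a small, hypothetical comparison. The sketch fits three simple candidates by least squares on the same sample (a mean-only model, an AR term at lag 1, and a seasonal AR term at lag 12), scores each with a Gaussian AIC, n ln(SSE/n) + 2k with additive constants dropped, and keeps the minimum. This is not how auto.arima() searches: it fits full seasonal ARIMA models and, by default, ranks them by the small-sample corrected AICc through its ic argument. The candidates and the simulated series are illustrative assumptions.

```python
import math
import random

def fit_lag_model(y, lag, start):
    """Least-squares fit on y[start:]; returns (SSE, n, k).

    lag=None -> mean-only model  y_t = c + e_t
    lag=L    -> y_t = c + phi * y_{t-L} + e_t
    k counts the regression coefficients plus the noise variance.
    """
    target = y[start:]
    n = len(target)
    my = sum(target) / n
    if lag is None:
        return sum((v - my) ** 2 for v in target), n, 2
    x = y[start - lag:len(y) - lag]               # aligned lagged values y_{t-L}
    mx = sum(x) / n
    phi = (sum((a - mx) * (b - my) for a, b in zip(x, target))
           / sum((a - mx) ** 2 for a in x))
    c = my - phi * mx
    sse = sum((b - c - phi * a) ** 2 for a, b in zip(x, target))
    return sse, n, 3

def gaussian_aic(sse, n, k):
    # -2 log-likelihood for Gaussian errors (constants dropped) + 2k
    return n * math.log(sse / n) + 2 * k

# Hypothetical seasonal series: y_t = 0.8 * y_{t-12} + e_t over 30 years of months
rng = random.Random(7)
y = []
for t in range(360):
    y.append((0.8 * y[t - 12] if t >= 12 else 0.0) + rng.gauss(0.0, 1.0))

candidates = {"mean only": None, "AR, lag 1": 1, "seasonal AR, lag 12": 12}
# Every candidate is scored on the same observations (t >= 12) so AICs are comparable
aics = {name: gaussian_aic(*fit_lag_model(y, lag, start=12))
        for name, lag in candidates.items()}
best = min(aics, key=aics.get)
print(best, {name: round(a, 1) for name, a in aics.items()})
```

Scoring all candidates on the same observations matters: AIC values computed on samples of different length are not comparable.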
Ljung–Box test for model validation
Station | ARIMA model | AIC | Ljung–Box p-value |
---|---|---|---|
Bharol | ARIMA(0,0,0)(1,1,0)[12] | 3,958.356 | 0.2365 |
Chachiyot | ARIMA(0,0,2)(2,1,0)[12] | 3,931.858 | 0.9378 |
Jogindernagar | ARIMA(0,0,0)(1,1,0)[12] | 3,958.356 | 0.2365 |
Karsog | ARIMA(0,0,2)(2,1,0)[12] | 3,812.969 | 0.7566 |
Kataula | ARIMA(0,0,0)(1,1,0)[12] | 3,958.356 | 0.2365 |
Mandi | ARIMA(0,0,2)(2,1,0)[12] | 3,931.858 | 0.9378 |
Sarkaghat | ARIMA(0,0,2)(2,1,0)[12] | 3,931.858 | 0.9378 |
Sundernagar | ARIMA(0,0,2)(2,1,0)[12] | 3,931.858 | 0.9378 |
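The table reports a Ljung–Box p-value for each selected model; all exceed 0.05, so the hypothesis of no residual autocorrelation is not rejected at the 5% level. The statistic is Q = n(n + 2) Σ_{k=1}^{h} r_k²/(n − k), compared with a χ² distribution with h minus the number of fitted ARMA coefficients degrees of freedom. The plain-Python sketch below is an illustration under stated simplifications: the number of lags used in the study is not given, and the closed-form χ² tail used here is valid only for even degrees of freedom (R's Box.test(x, lag, type = "Ljung-Box", fitdf) handles the general case).

```python
import math

def acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag (divide-by-n estimator)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def chi2_sf_even(x, df):
    """P(X > x) for X ~ chi-square(df); this closed form holds only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive, even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):       # sum_{i=0}^{df/2-1} (x/2)^i / i!
        term *= half / i
        total += term
    return math.exp(-half) * total

def ljung_box(residuals, lags, fitdf=0):
    """Ljung-Box Q over lags 1..lags and its p-value with lags - fitdf d.o.f."""
    n = len(residuals)
    r = acf(residuals, lags)
    q = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    return q, chi2_sf_even(q, lags - fitdf)

# Hypothetical, strongly autocorrelated "residuals": the test should reject
q, p = ljung_box([1.0, -1.0] * 50, lags=10)
print(f"Q = {q:.1f}, p = {p:.3g}")
```

For residuals of ARIMA(0,0,2)(2,1,0)12, which has four ARMA coefficients (θ1, θ2, Φ1, Φ2), one would set fitdf=4; with lags=24 that leaves 20 degrees of freedom, which the even-df shortcut supports. Odd degrees of freedom need a general χ² routine.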
Measurement of model's performance
Model performance was evaluated with several error statistics, including the absolute error, relative (percentage) error, root-mean-square error, mean absolute error, correlation coefficient, and Nash–Sutcliffe coefficient, to assess how well the models reproduce the observed rainfall (Dabral & Murry 2017).
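The text names these statistics without defining them. The standard definitions below are an addition, with $O_i$ and $P_i$ the observed and predicted values and $n$ the number of pairs. They are consistent with the values in the results table; in particular, the tabulated relative errors are reproduced by dividing the absolute error by the predicted value (for January, 24.64/47.9 × 100 ≈ 51.44%).

```latex
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|O_i - P_i\right|,\\
r &= \frac{\sum_{i=1}^{n}(O_i-\bar{O})(P_i-\bar{P})}{\sqrt{\sum_{i=1}^{n}(O_i-\bar{O})^2\,\sum_{i=1}^{n}(P_i-\bar{P})^2}},\\
\mathrm{NSE} &= 1 - \frac{\sum_{i=1}^{n}(O_i-P_i)^2}{\sum_{i=1}^{n}(O_i-\bar{O})^2}, \qquad
\mathrm{RE}_i = 100\,\frac{\left|O_i - P_i\right|}{P_i}.
\end{aligned}
```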
RESULTS AND DISCUSSION
ARIMA modelling of monthly rainfall
Review of monthly rainfall time series using the developed ARIMA model
Month | Mean observed (mm) | Mean predicted (mm) | Absolute error (mm) | Relative error (%) | Root-mean-square error (mm) | Mean absolute error (mm) | Correlation coefficient | Nash–Sutcliffe coefficient |
---|---|---|---|---|---|---|---|---|
January | 72.54 | 47.9 | 24.64 | 51.44 | 33.58 | 24.64 | 0.93 | 0.85 |
February | 60.64 | 68.82 | 8.17 | 11.88 | | | | |
March | 86.88 | 99.05 | 12.17 | 12.29 | | | | |
April | 52.47 | 48.44 | 4.03 | 8.32 | | | | |
May | 58.54 | 16.39 | 42.14 | 257.06 | | | | |
June | 103.23 | 133.62 | 30.39 | 22.74 | | | | |
July | 257.34 | 302.97 | 45.63 | 15.06 | | | | |
August | 294 | 239.98 | 54.02 | 22.51 | | | | |
September | 107.97 | 172.34 | 64.37 | 37.35 | | | | |
October | 5.27 | 14.25 | 8.98 | 62.99 | | | | |
November | 31.11 | 3.54 | 27.57 | 779.65 | | | | |
December | 21.49 | 23.73 | 2.24 | 9.44 | | | | |
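As an arithmetic cross-check, the summary statistics can be recomputed from the table's twelve pairs of mean monthly values. The sketch assumes (the text does not say so explicitly) that RMSE, MAE, r, and NSE were computed over these twelve monthly means. Under that assumption it reproduces the reported RMSE (≈ 33.58 mm), correlation coefficient (≈ 0.93), Nash–Sutcliffe coefficient (≈ 0.85), and the January relative error (≈ 51.44%); the mean of the same twelve absolute errors comes out at about 27.03 mm.

```python
import math

# Mean monthly values (mm), January..December, copied from the table above
observed = [72.54, 60.64, 86.88, 52.47, 58.54, 103.23, 257.34, 294.0, 107.97, 5.27, 31.11, 21.49]
predicted = [47.9, 68.82, 99.05, 48.44, 16.39, 133.62, 302.97, 239.98, 172.34, 14.25, 3.54, 23.73]

def error_statistics(obs, pred):
    """RMSE, MAE, Pearson r, Nash-Sutcliffe efficiency and relative errors (%)."""
    n = len(obs)
    err = [o - p for o, p in zip(obs, pred)]
    mo, mp = sum(obs) / n, sum(pred) / n
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sse = sum(e * e for e in err)
    return {
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in err) / n,
        "r": sxy / math.sqrt(sxx * syy),
        "nse": 1.0 - sse / sxx,
        # relative error taken with respect to the predicted value, as in the table
        "relative_error_pct": [100.0 * abs(e) / p for e, p in zip(err, pred)],
    }

stats = error_statistics(observed, predicted)
print({k: round(v, 2) for k, v in stats.items() if k != "relative_error_pct"})
```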
Forecasting
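The forecasts for 2021–2030 were generated from the fitted ARIMA models. As a hedged illustration of how such multi-step forecasts are produced, the sketch below handles the simpler ARIMA(0,0,0)(1,1,0)12 structure. Expanding (1 − Φ B^12)(1 − B^12) y_t = ε_t gives y_t = y_{t−12} + Φ(y_{t−12} − y_{t−24}) + ε_t; point forecasts follow by setting future ε_t to zero and feeding earlier forecasts back in. Φ is estimated here by conditional least squares on the seasonally differenced series with no drift term; both choices are assumptions, and an R fit would generally give a somewhat different Φ. ARIMA(0,0,2)(2,1,0)12 would additionally need a second seasonal AR lag and two MA terms. The input is a hypothetical, perfectly regular monthly pattern, not study data.

```python
def seasonal_diff(y, s=12):
    """w_t = y_t - y_{t-s}: the (1 - B^s) operator."""
    return [y[t] - y[t - s] for t in range(s, len(y))]

def fit_sar1_css(w, s=12):
    """Conditional least squares for Phi in w_t = Phi * w_{t-s} + e_t (no constant)."""
    num = sum(w[t] * w[t - s] for t in range(s, len(w)))
    den = sum(w[t - s] ** 2 for t in range(s, len(w)))
    return num / den

def forecast_sar1_seasonal_diff(y, horizon, s=12):
    """Point forecasts for ARIMA(0,0,0)(1,1,0)[s] without drift.

    Requires len(y) > 2 * s so that Phi can be estimated.
    """
    phi = fit_sar1_css(seasonal_diff(y, s), s)
    path = list(y)
    for _ in range(horizon):
        t = len(path)
        # y_t = y_{t-s} + Phi * (y_{t-s} - y_{t-2s}); future shocks set to zero
        path.append(path[t - s] + phi * (path[t - s] - path[t - 2 * s]))
    return phi, path[len(y):]

# Hypothetical monthly pattern (mm) rising by 2 mm per year over 10 years
base = [20, 25, 30, 15, 10, 60, 150, 180, 90, 10, 5, 12]
history = [base[t % 12] + 2 * (t // 12) for t in range(120)]
phi, fc = forecast_sar1_seasonal_diff(history, horizon=24)
print(phi, fc[:12])
```

Because every seasonal difference of this toy series equals 2, the estimate is Φ = 1 and the forecasts simply continue the year-on-year increase; real rainfall series give |Φ| < 1 and forecasts that settle into a repeating seasonal pattern.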
CONCLUSIONS
Forecasting and modelling rainfall is a challenging task. This study used ARIMA to build forecasting models for monthly average rainfall in the Mandi district of Himachal Pradesh. The ADF test indicated that the monthly rainfall time series were stationary, and ARIMA models were therefore developed for eight distinct locations in the Mandi district. In diagnostic checking, the Ljung–Box test indicated white-noise residuals for all selected models, and the residuals were normally distributed. The performance statistics obtained during model building and validation also indicated good prediction accuracy, so the selected models can be used for rainfall prediction. The developed ARIMA models were used to predict monthly rainfall for the 10-year period 2021–2030. Researchers can apply the findings of this study to other hydrological and water resource management studies. Because irrigation infrastructure is inadequate, farmers depend heavily on the weather. Abrupt weather changes call for better long-term planning and administration, the effects of climate change should be considered before a disaster preparedness strategy is chosen, and policymakers should prioritise water demand management.
Although the models reproduced most rainfall patterns accurately, precise modelling of extreme rainfall remains difficult. In addition, the models did not capture several minor seasonal variations in the rainfall data. Further research is needed to improve the models, possibly in the following directions:
To incorporate models developed specifically for predicting extreme rainfall.
To investigate models capable of fitting time series with multiple seasonal patterns, such as the exponential smoothing state space model with Box-Cox transformation, ARMA errors, trend, and seasonal components (BATS) and its trigonometric variant (TBATS).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare no conflict of interest.