Rescaled range analysis and conditional probability-based probe into the intrinsic pattern of rainfall over North Mountainous India

In the work reported here, we explore rainfall over North Mountainous India for the pre-monsoon (MAM), Indian summer monsoon (JJAS) and post-monsoon (OND) seasons and for the year as a whole. The dependence of JJAS on MAM and of OND on JJAS is explored through conditional probabilities computed from frequency distributions. The autocorrelation structure shows that a low lag-1 autocorrelation coefficient characterizes all the time series. We then apply rescaled range analysis. Through the Hurst exponent (H) and the fractal dimension (D), we observe that the MAM time series of rainfall over North Mountainous India has a smooth trend and low volatility. For MAM and JJAS, H > 0.5 and D is closer to 1 than to 2. For OND and annual rainfall over North Mountainous India, however, H < 0.5 and D ≈ 2. Therefore, these two time series are characterized by high volatility and randomness.


INTRODUCTION
The India Meteorological Department (IMD) follows the international standard of four seasons: winter, from January to February; summer or pre-monsoon, from March to May (MAM); monsoon, from June to September (JJAS); and post-monsoon or autumn, from October to December (OND). The onset of the southwest monsoon over south India (Kerala) marks the beginning of the principal monsoon season for the Indian subcontinent. The Indian Summer Monsoon Rainfall (ISMR) contributes about 70-90% of the yearly mean rainfall across the country (Pothapakula et al. 2020). The majority of the country's population depends on agriculture as its primary source of livelihood, and Indian agriculture is, in turn, heavily dependent on ISMR. It is estimated that about 58% of the Indian population is entirely dependent on agriculture as its primary source of income and livelihood (Agriculture Today 2019), and agriculture is reported to be one of the most significant contributors to the country's gross domestic product (GDP) (Bharti 2018). Rainfall over India in the four months of the summer monsoon, June-September (hereafter JJAS), is therefore central to the country's economy, and a rainfall deficiency during this period can have a severe impact on it. Hence, predicting the ISMR with adequate lead time has long been an important area of research. Such prediction matters because ISMR strongly influences the country's water resources, hydrological planning, agricultural practices and the lives of the people who depend on agriculture for their livelihood. A delayed arrival of the monsoon can cause drought, which may result in famine, whereas an early arrival may cause floods. In either case, the impact on Indian agricultural practices and on the people who depend on them is enormous. Irregularities in ISMR affect the country's water resources, agriculture, health and power sectors, and GDP. Many studies conducted by meteorologists worldwide have emphasized the predictability of the arrival and withdrawal of monsoon rainfall (Greatbatch et al. 2013; Hall & Roy 1994; Joseph et al. 1994; Kakade & Kulkarni 2016). ISMR displays a prominent year-to-year variability called the tropospheric biennial oscillation.
Forecasting rainfall has been an area of research for meteorologists all over the globe (Scaife et al. 2019; Aghelpour et al. 2020; Konduru & Takahashi 2020; Moron & Robertson 2013; Mukherjee 2017; Parthasarathy & Pant 1985; Patra et al. 2005; Sahai et al. 2003). A period of deficient monsoon rainfall from June to September in India is followed by warm sea surface temperature (SST) anomalies over the tropical Indian Ocean and cold SST anomalies over the western Pacific Ocean. These anomalies persist until the following monsoon, which yields average or excess rainfall. During the 1980s, a strong connection with the El Niño-Southern Oscillation (ENSO) was established, showing an increased propensity for droughts during El Niño and increased rainfall during La Niña, the opposite phase of El Niño (Sikka 1980; Pant & Parthasarathy 1981; Rasmusson & Carpenter 1983; Rasmusson & Wallace 1983). Later analysis, however, showed that the link between the ISMR and ENSO had weakened in recent years (Kumar et al. 1999). As a result, it was concluded that the link between the monsoon and ENSO is not yet fully understood.
As this study concentrates on the monsoon over North Mountainous India, an overview of the meteorological features associated with it is needed. The monsoon over North Mountainous India differs from the monsoon over other parts of India. The motivation of this study is to understand the behavior of the Indian summer monsoon over North Mountainous India in a univariate framework and to understand its self-similarity through rescaled range analysis (Krishnamurti et al. n.d.; Kulish & Horák 2016; Mandelbrot 2004; Mandelbrot & Wallis 1969) and Hurst exponents (Rao & Bhattacharya 1999; Tatli 2015). Earlier studies have reported that local physiographic conditions lead to complex rainfall patterns over this region. Also, the spatial differences in rainfall events, trends and variability in the Himalayan mountains are associated with the large elevation range (Mal 2012; Palazzi et al. 2013; Singh & Mal 2014). These complex relationships remain poorly understood and offer immense scope for exploration. Given this background on the significance of rainfall studies in the Himalaya, the present study was undertaken.
In the present work, we endeavor to carry out a thorough statistical analysis in the univariate framework. In this connection, we study the autocorrelation function (ACF) (Popovici & Thiran n.d.) and successive autoregressive modeling. Subsequently, we carry out a self-similarity analysis by estimating Hurst exponents through rescaled range analysis. In this way, we try to understand whether the time series associated with ISMR over North Mountainous India exhibits any self-similarity.

Data and materials
In this section, we describe the ISMR data for the period 1845-2006, obtained from the website of the Indian Institute of Tropical Meteorology (IITM), Pune, India. The weblink is https://www.tropmet.res.in/DataArchival-51-Page. The IITM is an autonomous institute of the Ministry of Earth Sciences, Government of India. The details of the data preparation and the map of the study zone are available in Sontakke et al. (2008). The present paper uses the homogenized rainfall data available in the IITM data archive. The monthly, seasonal and annual rainfall values given there are in tenths of a millimeter (mm), and the descriptive and inferential parts of the current study use the data on the same scale as provided by the IITM (Pal et al. 2020). The data are homogenized by the IITM itself using the procedure developed in Parthasarathy et al. (1993). The methodology is also described in a document copyrighted by the IITM, Homi Bhabha Road, Pune 411008, India, which is available at the website mentioned above.
The ISMR corresponds to the months from June to September, abbreviated as JJAS. In addition, we have analyzed the pre-monsoon (MAM), post-monsoon (OND) and annual rainfall over North Mountainous India. Before the autocorrelation analysis, we carried out a descriptive statistical study by computing the mean and standard deviation of the data. Figure 1 displays boxplots for MAM, JJAS, OND and Annual rainfall. A boxplot represents the spread of a data set by dividing it into quartiles (Acharya et al. 2013). It consists of a 'box' lying between the first quartile Q1 (25th percentile) and the third quartile Q3 (75th percentile). The second quartile Q2 (50th percentile), i.e., the median of the data, is represented by a vertical line within the box (Acharya et al. 2013). It may be noted that MAM and OND have much less spread than JJAS and Annual, whose spreads are close to each other. Also, in MAM and OND, the first and fourth quartiles differ notably, whereas for JJAS and Annual they are apparently equal. In general, symmetry is notable in JJAS and Annual, which implies that the behavior of the annual rainfall is influenced by the JJAS rainfall.
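The quartiles that define a boxplot can be computed directly. The following is a minimal sketch using NumPy; the function name `five_number_summary` and the sample values are hypothetical and are not taken from the IITM data.

```python
import numpy as np


def five_number_summary(x):
    """Return the quantities a boxplot is drawn from."""
    x = np.asarray(x, dtype=float)
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    return {
        "min": float(x.min()),
        "Q1": float(q1),        # 25th percentile, lower edge of the box
        "median": float(q2),    # 50th percentile, line inside the box
        "Q3": float(q3),        # 75th percentile, upper edge of the box
        "max": float(x.max()),
        "IQR": float(q3 - q1),  # width of the box, a measure of spread
    }


# Hypothetical illustrative values, not actual rainfall data.
summary = five_number_summary([1, 2, 3, 4, 5])
print(summary)
```

A box whose median sits midway between Q1 and Q3, with whiskers of similar length, indicates the kind of symmetry discussed for the boxplots in Figure 1.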

Autocorrelation function
In the current study, the rainfall data of ISMR over North Mountainous India for 1845-2006 were collected from IITM, Pune, India. Initially, an autocorrelation analysis was performed on the seasonal and annual rainfall time series, i.e., MAM, JJAS, OND and Annual. The autocorrelation coefficient at lag k is computed as follows (Chattopadhyay 2007; Chattopadhyay & Chattopadhyay 2010):

$$ r_k = \frac{\mathrm{Cov}\left(x_{\mathrm{first}\,(n-k)},\; x_{\mathrm{last}\,(n-k)}\right)}{\sqrt{\mathrm{Var}\left(x_{\mathrm{first}\,(n-k)}\right)}\;\sqrt{\mathrm{Var}\left(x_{\mathrm{last}\,(n-k)}\right)}} \tag{1} $$

where $r_k$ represents the autocorrelation of order k, $x_{\mathrm{first}\,(n-k)}$ and $x_{\mathrm{last}\,(n-k)}$ denote the first (n - k) and last (n - k) data values and k ranges from 1 to n. The set of autocorrelation coefficients calculated from a time series is called the ACF. In the current work, autocorrelation coefficients are calculated up to 20 lags, and the corresponding graphs are plotted for MAM, JJAS, OND and Annual.
In all the cases, the lag-1 autocorrelation coefficient is far below 1. Since a lag-1 autocorrelation coefficient of 1 implies a perfect positive linear association between each value and the value one time step earlier, the low values of the autocorrelation coefficients imply a significant departure of the time series from persistence, and hence a persistence forecast does not appear feasible in the cases under consideration. Moreover, the ACF has no specific sinusoidal pattern. Therefore, we understand that all the time series are characterized by low persistence. The results are displayed in Figures 2-5 for MAM, JJAS, OND and Annual, respectively, where UCL and LCL denote the upper and lower control limits. Descriptive statistics of the data are presented in Table 3.
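The lag-k coefficient described above, i.e., the correlation between the first (n - k) and the last (n - k) values of the series, can be sketched as follows. This is an illustrative sketch only: the series `rain` is synthetic, the function names are hypothetical, and the ±1.96/√n limits are an assumed stand-in because the text does not state how the UCL and LCL were obtained.

```python
import numpy as np


def autocorrelation(x, k):
    """Lag-k autocorrelation: Pearson correlation between the first (n - k)
    and the last (n - k) values of the series."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 1 <= k <= n - 2:
        raise ValueError("lag k must satisfy 1 <= k <= n - 2")
    first, last = x[:n - k], x[k:]
    cov = np.mean((first - first.mean()) * (last - last.mean()))
    return cov / np.sqrt(first.var() * last.var())


def acf(x, max_lag=20):
    """Autocorrelation coefficients for lags 1..max_lag."""
    return np.array([autocorrelation(x, k) for k in range(1, max_lag + 1)])


# Synthetic stand-in for a 162-value seasonal series (not the IITM data).
rng = np.random.default_rng(0)
rain = rng.gamma(shape=4.0, scale=50.0, size=162)

r = acf(rain, max_lag=20)
# Assumed approximate 95% limits for an uncorrelated series.
ucl = 1.96 / np.sqrt(rain.size)
lcl = -ucl
print("lag-1 autocorrelation:", round(float(r[0]), 3))
print("lags outside the limits:", int(np.sum((r > ucl) | (r < lcl))))
```

For an uncorrelated series such as this synthetic one, most coefficients are expected to stay within the limits, which is the pattern the text describes as low persistence.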

Computation of conditional probabilities
Conditional probability (Soraisam et al. 2018; Sunusi 2019) is one of the conventional concepts of probability theory. It gives the probability of occurrence of an event given the knowledge that some other event has already occurred. It is important to note that a conditional probability does not necessarily imply a relationship between the two events, and it does not indicate that the two events occur simultaneously. The concept is fundamentally related to one of the most influential results in statistics, the Bayes theorem. The conditional probability of an event A given that an event B has already occurred is defined as follows (Wilks 2006):

$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \tag{2} $$

where $P(A \mid B)$ is the conditional probability of event A occurring, given that event B has already occurred, $P(A \cap B)$ is the joint probability of events A and B, i.e., the probability that both events occur, and $P(B)$ is the probability of event B.
In the present case, we have divided the entire rainfall range into four classes, assigned the categories very low, low, high and very high. To obtain a conditional probabilistic view based on the samples under consideration, we have computed the conditional probabilities of the various ranges of JJAS rainfall with MAM rainfall as the conditioning event. The maximum conditional probabilities occur for high JJAS rainfall given that MAM rainfall is very low, and for low JJAS rainfall given that MAM rainfall is low. This gives a probabilistic overview of the dependence of ISMR on pre-monsoon rainfall, and an inverse relationship is apparent based on the given samples. All such conditional probabilities are listed in Table 1. Afterward, we considered the conditional probabilities of the post-monsoon rainfall categories, given the different categories of ISMR. First, none of the conditional probabilities listed in Table 2 is very high, which indicates a lower degree of statistical dependence of post-monsoon rainfall on the ISMR. However, very low, low and high amounts of post-monsoon rainfall tend to occur under the corresponding categories of ISMR. Therefore, it may be interpreted that a positive association, although less prominent than in the previous case, exists between the rainfall during the summer monsoon and the post-monsoon rainfall in India.
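A table of conditional probabilities of this kind can be built from class frequencies as P(A | B) = count(A and B) / count(B). The sketch below is illustrative only: the text does not state the class boundaries, so sample quartiles are assumed here, and the series `mam` and `jjas` are synthetic stand-ins with hypothetical names.

```python
import numpy as np

LABELS = ["very low", "low", "high", "very high"]


def classify(x):
    """Assign each value to one of four classes (0..3).
    Class boundaries are the sample quartiles, which is an assumption."""
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, [0.25, 0.5, 0.75])
    return np.digitize(x, edges)


def conditional_table(event, given):
    """table[i, j] = P(event in class j | given in class i),
    estimated as joint frequency divided by the frequency of class i."""
    a, b = classify(event), classify(given)
    joint = np.zeros((4, 4))
    np.add.at(joint, (b, a), 1)          # count co-occurrences year by year
    return joint / joint.sum(axis=1, keepdims=True)


# Synthetic stand-ins for 162 years of seasonal totals (not the IITM data).
rng = np.random.default_rng(1)
mam = rng.gamma(4.0, 30.0, size=162)
jjas = rng.gamma(6.0, 80.0, size=162)

table = conditional_table(event=jjas, given=mam)
print("P(JJAS high | MAM very low) =", round(float(table[0, 2]), 3))
```

Each row of `table` sums to 1, so a row with one clearly dominant entry would indicate strong dependence of the event on that conditioning class, whereas nearly uniform rows indicate weak dependence.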

Rescaled range analysis
The methodology and the computation rules used in the current study largely follow Mittall & Bhardwaj (2011). The Hurst exponent is a statistical measure used to understand the properties of a time series without making assumptions about its statistical restrictions. Many methods have been proposed in the literature for calculating Hurst exponents (Geweke & Porter 1983; Kendziorski et al. 1999; López-Lambraño et al. 2018; Chandrasekaran et al. 2019; Sarker & Mali 2021). Among these, the method of rescaled range analysis is used in the current study. In 1951, Hurst stated that the distance covered by a random particle increases with the square root of time for shorter time series:

$$ R = T^{0.5} \tag{3} $$

where R is the distance covered (range) and T represents the time index. Dividing R by the standard deviation S of the time series gives the dimensionless rescaled range R/S. The intensity of a trend and the level of noise within a time series can be assessed from how R/S scales with time (Hurst 1951), that is, from how far the H value lies above or below 0.5. To understand how R/S is computed, let us consider a time series $x_i$ (i = 1, 2, ..., N). Initially, the time series is centered by subtracting the sample mean $\bar{x}$ from $x_i$:

$$ y_i = x_i - \bar{x}, \quad i = 1, 2, \ldots, N \tag{4} $$

The series $y_i$ is called the mean-deviated series and has a mean of zero. The cumulative time series is computed as follows:

$$ z_i = \sum_{j=1}^{i} y_j, \quad i = 1, 2, \ldots, N \tag{5} $$

The range $R_N$ is computed as follows:

$$ R_N = \max(z_1, z_2, \ldots, z_N) - \min(z_1, z_2, \ldots, z_N) \tag{6} $$

It was stated by Hurst (1951) that if N = T, then applying Equation (3) shows that the time series $x_i$ (i = 1, 2, ..., N) is independent for increasing values of N. Since Equation (3) holds only for time series in Brownian motion, Hurst (1951) also proposed a more general form:

$$ (R/S)_N = c\,N^{H} \tag{7} $$

where the subscript N represents the duration of the individual short time series and c is a constant.
The Hurst exponent is simply the slope of the line obtained by plotting $\log (R/S)_N$ against $\log N$, that is,

$$ \log (R/S)_N = \log c + H \log N \tag{8} $$

If the H value is close to 0.5, the series behaves as an independent random process, as in Brownian motion. An H value above 0.5 suggests that every observation carries a persistence of all the observations preceding it, and an H value below 0.5 indicates anti-persistence or short-time persistence. Moreover, the impact of the present on the future can be expressed through the long-range autocorrelation r, which is related to H (Mittall & Bhardwaj 2011) as follows:

$$ r = 2^{2H-1} - 1 \tag{9} $$

Similarly, the fractal dimension D of a one-dimensional time series can be obtained from H (Mittall & Bhardwaj 2011):

$$ D = 2 - H \tag{10} $$

For Brownian motion, the value of D is equal to 1.5. Another index, the predictability index (PI), can also be calculated from the H values (Mittall & Bhardwaj 2011):

$$ \mathrm{PI} = 2\,\lvert D - 1.5 \rvert \tag{11} $$

where $\lvert \cdot \rvert$ represents the absolute value of the argument.
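The rescaled range procedure described in this section can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the choice of window lengths (doubling from `min_window`), the averaging of R/S over non-overlapping windows, and all function names are assumptions, and the input series is synthetic white noise.

```python
import numpy as np


def rescaled_range(x):
    """R/S of one window: range of the cumulative mean-deviated series
    divided by the standard deviation of the window."""
    x = np.asarray(x, dtype=float)
    y = x - x.mean()              # mean-deviated series
    z = np.cumsum(y)              # cumulative series
    return (z.max() - z.min()) / x.std()


def hurst_exponent(x, min_window=8):
    """Slope of log(R/S)_N against log(N).
    Window lengths N double from min_window up to n // 2 (assumption)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    log_n, log_rs = [], []
    w = min_window
    while w <= n // 2:
        chunks = [x[i:i + w] for i in range(0, n - w + 1, w)]
        log_n.append(np.log(w))
        log_rs.append(np.log(np.mean([rescaled_range(c) for c in chunks])))
        w *= 2
    slope, _intercept = np.polyfit(log_n, log_rs, 1)
    return float(slope)


def fractal_dimension(h):
    return 2.0 - h


def predictability_index(h):
    return 2.0 * abs(fractal_dimension(h) - 1.5)


def long_range_correlation(h):
    return 2.0 ** (2.0 * h - 1.0) - 1.0


# Synthetic white noise with 162 values (not the IITM data).
rng = np.random.default_rng(2)
noise = rng.normal(size=162)
h = hurst_exponent(noise)
print("H =", round(h, 2), " D =", round(fractal_dimension(h), 2),
      " PI =", round(predictability_index(h), 2))
```

With short series, R/S estimates of H are known to be biased, so values obtained from a sketch like this are indicative rather than exact; the result also depends on the assumed window lengths.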

RESULTS AND DISCUSSION
In this section, we discuss the outcomes of the analyses. We have already shown that the time series MAM, JJAS, OND and Annual are not characterized by any specific pattern of ACF, as depicted in Figures 2-5. Hence, in none of the cases can we interpret anything regarding the time series apart from its complex behavior. In all the cases, the lag-1 autocorrelation coefficients are much less than 1, so a persistence forecast is not possible for these time series. We have also considered the conditional probabilities in Tables 1 and 2 to understand the dependence of JJAS on MAM and of OND on JJAS, choosing the events of interest and the conditioning events accordingly, with the entire rainfall range divided into four classes (very low, low, high and very high). With MAM rainfall as the conditioning event, the maximum conditional probabilities occur for high JJAS rainfall given that MAM rainfall is very low and for low JJAS rainfall given that MAM rainfall is low (Table 1). This gives a probabilistic overview of the dependence of ISMR on pre-monsoon rainfall, and an inverse relationship is apparent based on the given samples. For the post-monsoon rainfall categories given the different categories of ISMR, none of the conditional probabilities listed in Table 2 is very high, which indicates a lower degree of statistical dependence of post-monsoon rainfall on the ISMR. However, very low, low and high amounts of post-monsoon rainfall tend to occur under the corresponding categories of ISMR. Therefore, it may be interpreted that a positive association, although less prominent than in the previous case, exists between the rainfall during the summer monsoon and the post-monsoon rainfall in India.
Now we discuss the results of the rescaled range analysis presented in Figures 6-9. The entire computation has been carried out using Equations (3)-(11). In Figure 6, the slope of the line fitted to the log-log scatter plot is 0.73, and hence the Hurst exponent is H = 0.73. Since 0.5 < H < 1, we conclude that the time series behaves differently from a random walk; that is, it is not generated by a stochastic process in which the nth value is independent of all the values before it. This scenario is referred to as long memory with positive linear autocorrelation. We have also computed the fractal dimension. It may be noted that the Hurst exponent is related to the autocorrelations of a time series and to the rate at which they decay as the lag between pairs of values increases. The Hurst exponent H is directly related to the fractal dimension D, which is a measure of the randomness of a data series. Table 4 shows that for the first case, i.e., MAM, we have 1 < D < 2 with D = 1.27. As D is closer to 1 than to 2, we understand that the MAM time series of rainfall over North Mountainous India has a smooth trend and low volatility. Table 3 displays that for MAM and JJAS, we have H > 0.5 and D is closer to 1 than to 2. Hence, for MAM and JJAS, we have the same interpretation for Hurst exponent and fractal dimension. However, from Table 3, we further observe that for OND and Annual rainfall over North Mountainous India, H < 0.5 and D ≈ 2. Therefore, these two time series are characterized by high volatility and randomness.
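As a brief arithmetic check, the value H = 0.73 reported above can be substituted into the standard relations D = 2 - H, PI = 2|D - 1.5| and r = 2^(2H-1) - 1. The value of D reproduces the reported D = 1.27; the PI and r values are derived here for illustration only and are not reported in the text.

```latex
\begin{align*}
D &= 2 - H = 2 - 0.73 = 1.27,\\
\mathrm{PI} &= 2\,\lvert D - 1.5 \rvert = 2\,\lvert 1.27 - 1.5 \rvert = 0.46,\\
r &= 2^{2H-1} - 1 = 2^{0.46} - 1 \approx 0.38.
\end{align*}
```

A positive r is consistent with the interpretation of H > 0.5 as positive long-range autocorrelation.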

CONCLUSION
In this section, we summarize the outcomes of the study as follows:
1. We have considered the seasonal as well as annual rainfall over North Mountainous India.
2. The time series corresponding to MAM, JJAS, OND and Annual rainfall are characterized by a low lag-1 autocorrelation coefficient.
3. The time series do not have any sinusoidal form of ACF.
4. All four time series have been subjected to rescaled range analysis.
5. The rescaled range analysis has shown that the Hurst exponent is greater than 0.5 for MAM and JJAS and less than 0.5 for OND and Annual rainfall.
6. The fractal dimension has been found to be close to 1 for MAM and JJAS and close to 2 for OND and Annual.
7. Finally, MAM and JJAS share the same interpretation for Hurst exponent and fractal dimension. However, from Table 4, we further observe that for OND and Annual rainfall over North Mountainous India, H < 0.5 and D ≈ 2. Therefore, these two time series are characterized by high volatility and randomness.
While concluding, let us note that in the present study we have explored the behavior of Seasonal and Annual rainfall time series over North Mountainous India. The intrinsic pattern has been demonstrated through Hurst exponent and fractal dimension. In future work, we propose to develop univariate predictive models depending upon the outcomes of this study.