## Abstract

The theoretical and practical understanding of projected changes in rainfall is desirable for planning and adapting to climate change. In this study, finite size Lyapunov exponents (FSLE) are used to study error growth rates of the system at different timescales. This is done to quantify the impact of enhanced anthropogenic greenhouse gas emissions on the predictability of fast and slow varying components of central Indian rainfall (CIR). The CIR time series for this purpose is constructed using the daily gridded high-resolution India Meteorological Department (IMD) dataset and Coupled Model Inter-comparison Project phase 5 (CMIP5) output for historical run and three representative concentration pathways (RCP2.6, RCP4.5, and RCP8.5) from the HadGEM2-ES, IPSL-CM5A-LR, CCSM4, BCC-CSM1.1, and MPI-ESM-LR models. The analyzed CIR dataset reveals a low dimensional chaotic attractor, suggesting that CIR requires a minimum of 5 and maximum of 11 variables to describe the state of the system. FSLE analysis shows a rapid decrease in the Lyapunov exponent with increasing timescales. This analysis suggests a predictability of about 2–3 weeks for fast varying components at short timescale of the CIR and about 5–9 years for slow varying components at long timescales.

## INTRODUCTION

The predictability study of rainfall or another atmospheric variable has its own importance. The projection from the CMIP5-based model outputs for historical as well as future scenarios has enabled us to utilize these data to study the predictability of the system for historical as well as future scenarios. This quantification of the impact of the increase in greenhouse forcing into the atmospheric predictability can be possible with these long-term data. The influence of anticipated changes in greenhouse gases on climate predictability over timescales of several decades to centuries can be explored using global models (Meehl *et al.* 2009).

The Lyapunov exponent of a dynamical system measures the average rate at which nearby trajectories of a system diverge. The positive exponent provides evidence that the underlying dynamical system is chaotic and there is a limit on the predictability of the dynamical system, i.e., the dynamical system may be predictable over some finite time periods. The nonlinear dynamical method – finite size Lyapunov exponent (FSLE) – is used for quantifying the predictability. It has several advantages over the traditional methods, since nonlinear dynamical methods like largest Lyapunov exponent (LLE) (Rosenstein *et al.* 1993; Kantz 1994) and Kolmogorov–Sinai entropy quantify the error growth rate exhibited by a dynamical system (Boffetta *et al.* 2002). The predictability analysis based on LLE holds for infinitesimal perturbations, which are linked with errors in the smallest, fastest-evolving scales of a dynamical system. However, the smallest errors, if finite, soon grow to nonlinear saturation, thereafter not playing any role in the error growth (Boffetta *et al.* 1998b; Cencini *et al.* 1999; Siqueira & Kirtman 2014,**)** of fast scale dynamics. On the other hand, relatively larger errors have slower error growth rates on larger timescales. Thus, LLE cannot estimate the predictability on longer timescales (Toth & Kalnay 1993; Boffetta *et al.* 1998a, 1998b). This advantage of FSLE over LLE motivated us to use the method for quantifying the predictability of both long and short timescales.

Aurell *et al.* (1997) introduced the FSLE as a measure of the divergence of nearby trajectories, to quantify predictability of processes developing on different timescales. Boffetta *et al.* (1998a) further advanced the concept of FSLE and investigated its applicability to detect large-scale (slow varying) dynamical properties of a measured time series characterized by different timescales. They showed that the large-scale error growth rate due to slow varying components of a dynamical system can be estimated by FSLE, even though data points are limited and embedding dimension is moderate. The FSLE has been used to estimate predictability on both fast and slow timescales (Boffetta *et al.* 1998a; Cencini & Vulpiani 2013; Siqueira & Kirtman 2014). The climate system is supposed to be a system with many characteristic timescales. In a study by Boffetta *et al.* (1998a), dynamics of a system with two different timescales were analyzed using the FSLE method. It was found that the system provides two plateaus, one for short-scale perturbation, which reveals FSLE of fast varying component *λ _{f}* and the second plateau at large perturbation, which gives FSLE of slow varying component

*λ*. They also investigated the role of short-scale parameterization by replacing the true dynamics of the fast-varying component with the stochastic process and found that even though the fast dynamics is not resolved it does not have any effect on correct computation of FSLE of slow varying components. It is also pointed out in their study that, even with a smaller number of data points and also with a moderate embedding dimension, the dynamics of slow varying components can be practically captured. Siqueira & Kirtman (2014) investigated the predictability of ENSO using FSLE on the fast timescale. On the other hand, Mitchell & Gottwald (2012) suggested that FSLE can well approximate the predictability at the slow timescale. It is suggested in their study that predictability at the slower timescale can be much higher than the local LLE

_{s}*λ*, since short-scale instabilities grow faster but become nonlinearly saturated at a much smaller time than large-scale instabilities. Thus, in the present study, the strength of FSLE over LLE to efficiently extract predictability at both fast and slow timescales motivated us to quantify the predictability of the CIR on both fast and slow time. The scale at small perturbation is referred to as short timescale and the scale at large perturbation, where the value of

_{max}**saturated (plateau) is referred to as long timescale.**

*λ*_{s}Studies about the impact of global warming on the Indian monsoon at different timescales are very limited. Mani *et al.* (2009) studied the impact of global warming over the medium range prediction. They showed that the medium range prediction of Indian monsoon will be less predictable due to global warming (Mani *et al.* 2009). They addressed the impact of climate change over medium range prediction but did not study its impact over long timescales. The short-scale predictability of CIR is about 2–3 weeks, but the clear limit on the predictability and role of climate change on long timescales is not well established. Therefore, in the present study, an attempt is made to quantify the predictability of CIR for historical as well as future projections using the theory of nonlinear dynamics on both short and long timescales.

There are several other studies, both on toy model and real-world scenarios, which suggest that the inclusion of forcing increases the predictability of the system (Lorenz model – Palmer 1993; Dwivedi *et al.* 2007; Singh *et al.* 2015). Kirtman *et al.* (2013a) have pointed out that external forcing like aerosol concentration, greenhouse gases, volcanic eruption, etc. provides predictability to the system, ranging from sub-seasonal to decadal timescales. Latif (2013) states that, on longer lead times of several decades, external forcing will cause weakening in Atlantic meridional overturning circulation, which will give rise to predictability. Luo *et al.* (2011) inferred that the influence of global warming results in an increase in predictability of the system on seasonal to inter-annual timescales.

The climate system is a combination of several timescales and the longer timescale components play an important role in climate change (Bigg *et al.* 2003). Climatic variables like rainfall also have variability from sub-seasonal to multidecadal timescales. Climate patterns such as the El-Niño Southern Oscillation (ENSO), Pacific Decadal Oscillation (PDO), etc. reveal climate variability (Stocker 2013) and the Indian monsoon is found to be influenced by ENSO and the decadal oscillations like PDO and AMO (Krishnamurthy & Krishnamurthy 2016, 2017). The ocean surface variables play a vital role in the climate variability on shorter timescales, for example, SST changes leading to El Niño and La Niña events. On the other hand, the slow varying nature of deep ocean currents produces little influence on climate variability at shorter timescales (Kirtman *et al.* 2013a). However, on longer timescales, these deep ocean currents may play a significant, even a dominant role (Bigg *et al.* 2003; Marshall & Zanna 2014).

The fast-varying components of the atmosphere effect predictability at short timescales and the slow varying components of the atmosphere effect predictability at long timescales. The study of predictability at slow timescale is important because future emission scenarios have less influence over near-term predictions than long-term projections (Kirtman *et al.* 2013b). Over longer periods, these effects can accumulate to cause a critical change (Doblas-Reyes *et al.* 2006). For example, subtle yearly changes in the thickness of ocean ice in polar locales (influencing the surface temperature of the ice and its albedo) may cause long-term cumulative impacts driving climate change. Therefore, slow processes, which do not influence variability on seasonal to inter-annual timescales, may become important in influencing long-term climate variability and hence predictability (Hurrell *et al.* 2010; Solomon *et al.* 2011). At long timescales, the predictability of the system is primarily determined by the ocean, because it has large thermal capacity and longer time for adjustment (Slingo & Palmer 2011).

The predictability horizon depends on the initial condition and on the variable chosen (Anderson *et al.* 2009). The predictability of fast-varying components is generally governed by the internally generated atmospheric variability, i.e. weather, which loses its memory in around a month (Lorenz 1963; Shukla 1981). On the other hand, the predictability of slow-varying components varies at decadal timescales (Tang *et al.* 2008). However, the absolute limit on the predictability of slow-varying components is not well established. The Hasselmann mechanism suggests that the ‘scale of predictability is restricted to the order of the autocorrelation time of the slower component of the system, usually the ocean’ (NRC 1998). That is, the long-term memory of the ocean offers some hope for initialized decadal-scale prediction and predictability (Li *et al.* 2017). There are a large number of other studies which support the presence of predictability of atmospheric variations on multiyear timescale (Griffies & Bryan 1997a, 1997b; Collins *et al.* 2006; Boer & Lambert 2008; Meehl *et al.* 2009, 2014; DelSole *et al.* 2013; Ding *et al.* 2016; Wu *et al.* 2016).

The rainfall observation data from IMD and the data from IPCC CMIP5 model output of five models have been used. The main aim of this study is to assess the impact of increase in anthropogenic forcing on the predictability of short and long timescales of CIR using FSLE. Our result for the short timescale is in agreement with the result of Mani *et al.* (2009). The predictability knowledge of rainfall under future emission scenarios will empower preparedness for optimal water resource usage and disaster management. This paper is organized as follows. Below, the study region and datasets used are described, followed by the methodology adopted. Then, the next section shows the results of the calculated embedding dimensions and FSLE for both fast and slow timescales of IMD observation and CMIP5 model data. The final section explains the conclusions drawn from the results.

## STUDY REGION AND DATASETS

Time series of the daily central Indian rainfall (CIR) is constructed by area (spatial) averaging the daily gridded (0.25 × 0.25) IMD dataset over the monsoon trough region, (Figure 1) [15N–30N; 70E–90E]. A lowpass filter of 10 days is applied in each case in order to filter high frequency components having frequency up to 10 days. The choice of 10-day filter is made in order to preserve the information with the smallest loss and produce clearer information from the data, so that the high frequency noise component from the data is filtered. On the other hand, without applying the filter, due to the presence of high frequency components in the data, correlation exponent vs embedding dimension (*m*) is increasing with the increase in *m*, which is a behavior of random noise. Therfore, choice of the filtering high frequency components from the data is justified. The time series are also created using similar methodologies for Historical and RCP Scenarios of the IPCC AR5 CMIP5 models (Taylor *et al*. 2012), namely, HadGEM2-ES, IPSL-CM5A-LR, CCSM4, BCC-CSM1.1, and MPI-ESM-LR. Historical simulations are based on solar and volcanic forcing, land use, observed concentrations of greenhouse gases, and reconstructed aerosol emissions. Future projections are based on the three representative concentration pathways (RCPs) (Moss *et al*. 2010). In the RCP2.6 pathway the peak in radiative forcing reaches 3 W m^{−2} before the year 2100 and afterwards it declines to 2.6 W m^{−2} in the year 2300. RCP4.5 represents the pathways for which radiative forcing reaches 4.5 W m^{−2} in the year 2300. Similarly, RCP8.5 is the pathway for which radiative forcing reaches 8.5 W m^{−2} by the year 2300. Most of the GCMs are unable to simulate the mean, intra-seasonal variability of summer monsoon rainfall. According to Ashfaq *et al.* (2017), among the GCMs under study, only the MPI-ESM-LR model is able to simulate both rainfall maxima in July and less than 25% bias in precipitation for the south Asian summer monsoon during the historical time 1970–1999. However, the model selection is done on the basis of availability of long-term data to get the required data for time series reconstruction. Tsonis *et al.* (1993) have shown that 10^{(2+0.4D)} number of data points are sufficient to correctly estimate the correlation dimension, therefore, the number of data points listed in Table 1 are adequate for faithful computation on correlation dimension. Therefore the data up to the year 2300 is used, which is a combination of RCP up to 2100 and after that, extended RCP up to 2300; for simplicity it is termed RCP throughout this paper. Table 1 lists the number of data points and durations that have been used and Table 2 lists the model description.

Scenario | Duration | Years | Data points |
---|---|---|---|

Historical | 1860–2005 | 146 years | 53,290 |

RCP2.6 | 2006–2300 | 295 years | 107,675 |

RCP4.5 | 2006–2300 | 295 years | 107,675 |

RCP8.5 | 2006–2300 | 295 years | 107,675 |

IMD observation | 1901–2013 | 113 years | 41,273 |

Scenario | Duration | Years | Data points |
---|---|---|---|

Historical | 1860–2005 | 146 years | 53,290 |

RCP2.6 | 2006–2300 | 295 years | 107,675 |

RCP4.5 | 2006–2300 | 295 years | 107,675 |

RCP8.5 | 2006–2300 | 295 years | 107,675 |

IMD observation | 1901–2013 | 113 years | 41,273 |

Modeling center (or group) | Model |
---|---|

Met Office Hadley Centre | HadGEM2-ES |

Institute Pierre-Simon Laplace | IPSL-CM5A-LR |

National Center for Atmospheric Research | CCSM4 |

Beijing Climate Center, China Meteorological Administration | BCC-CSM1.1 |

Max Planck Institute for Meteorology | MPI-ESM-LR |

Indian Metrological Department | IMD observation |

Modeling center (or group) | Model |
---|---|

Met Office Hadley Centre | HadGEM2-ES |

Institute Pierre-Simon Laplace | IPSL-CM5A-LR |

National Center for Atmospheric Research | CCSM4 |

Beijing Climate Center, China Meteorological Administration | BCC-CSM1.1 |

Max Planck Institute for Meteorology | MPI-ESM-LR |

Indian Metrological Department | IMD observation |

## METHODOLOGY

From a measured time series, phase space reconstruction is done by using the method of time delay as described in the section immediately below**.** The time delay embedding was done for the total appended time series. Appropriate time delay is estimated as described in the section Mutual information. The method used for determining the embedding dimension (*m*) that efficiently accommodates the attractor is then described. This is followed by a section describing Determination of finite size Lyapunov exponents. Figure 2 shows the work flow of the methods shown below.

### Phase space reconstruction

*et al.*2014). The reconstructed trajectory,

*X*, can be expressed by a matrix, where each row is a phase space vector (Rosenstein

*et al.*1993), i.e.: where

*X*is the state of the system at discrete time

_{t}*t*. Let

*x*

_{t}, t*=*0,1, 2, ….,

*N*represent the single variable time series of precipitation data. An

*m*-dimensional phase space may be reconstructed from this time series using the method of time delay (Packard

*et al.*1980; Takens 1981). Let:

### Mutual information

*(d)*(Fraser & Swinney 1986; Siqueira & Kirtman 2014). The mutual information is calculated by using the formula: where

*P*is the probability that is in interval

_{i}*i*and

*P*is the joint probability that the time series is found to be in interval

_{ij}*i*and after time is found in interval

*j*(Battiti 1994; Sprott 2003). Figure 3 shows mutual information

*I(t)*vs steps

*(k)*for IMD rainfall time series over central India, and the inset panel shows

*d*

*=*72 (first minima of

*I(t)*).

### Embedding dimension

The calculation of embedding dimension *m* to acquire the minimum number of variables sufficient to model the dynamics of the system is done by using two methods, those of Grassberger & Procaccia (1983) (termed GP1983) and Cao (1997) (termed Cao 1997). Details about both methods are given in the following two subsections, respectively.

#### Embedding dimension using GP1983

The minimum number of variables required to map the system dynamics is estimated by using correlation dimension. Consider a single variable time series *x _{i}, i*

*=*

*1, 2,*

*…, N*of length

*N*. The method described earlier is used for phase space reconstruction for different values of

*m*. The correlation sum

*C(r)*in an m-dimensional phase space is defined as the ratio of the number of pairs of embedded data points having distance less than

*r*to the total number of pairs (GP1983).

*θ(x)*denotes the Heaviside step function, which is equal to zero for negative values and one for other values. This method counts the number of pairs (

*X*) having distance less than

_{i}, X_{j}*r*and is the total number of pairs for

*N*data points. The limit ensures the characterization of the entire attractor. The correlation dimension is defined by:

This mathematical definition cannot be used for experimental data. However, it is often found that *C(r)* estimated from a reasonably large dataset of size *N* satisfies a scaling law for a range of *r*. The scaling exponent is called the correlation exponent and for a particular embedding dimension *m* can be calculated from slope of the scaling region of the vs graph.

The correlation exponent is computed for different embedding dimensions *m*. A graph of vs *m* is plotted. If the value of saturates to a constant value as *m* is increased, the time series is considered to be chaotic. In this case, the dynamics of the system can be said to reside on a low-dimension attractor of correlation dimension *D**=*. On the other hand, if increases without saturation, as *m* is increased, the analyzed time series is stochastic because a stochastic process embedded in an *m* dimensional space always fills that space (Tsonis & Elsner 1990). The data size, sampling frequency, and noise present in the data are the main factors that affect estimation accuracy of the correlation dimension.

The smallest integer which is greater than the correlation dimension *D* gives the smallest number of suitably chosen variables that can map the dynamics. The value of the embedding dimension *m _{s}* at which the value of the correlation exponent begins to saturate gives the minimum number of dimensions in which the attractor can be embedded completely (Fraedrich 1986). Abarbanel

*et al.*(1990) has pointed out that although Taken's reconstruction theorem requires

*D*

*>*

*2D*

_{s}*+*

*1*to ensure faithful representation of the

*D*dimensional attractor as seen in D-dimensional embedding space, in practice,

_{s}*D*

*>*

*D*is usually sufficient.

_{s}#### Minimum embedding dimension using Cao 1997

*i*

*=*

*1,2,3………M*, and ||…|| is measure of Euclidean distance and mostly chosen to be maximum norm, which can be given as:

*X*

_{i}(m*+*

*1)*is the

*i*th reconstructed vector in the

*m*

*+*

*1*embedding dimension and

*n(i.m)*is an integer in the range 1 ≤

*n(i,m)*≤

*N-m*

*×*

*d,*such that

*X*is the nearest neighbor of

_{n(i,m)}(m)*X*in the

_{i}(m)*m*dimensional reconstructed phase space. The integer

*n(i,m)*in Equation (7) is the same as that of the denominator. The second nearest neighbor is used, if

*X*(m) ≈

_{n(i,m)}*X*. The mean value of all

_{i}(m)*a(i,m)*is denoted by

*a*quantity

*E(m)*, which is defined as:

*et al.*1992). This is due to the fact that the quantity

*E(m)*depends only on the dimension

*m*and lag

*d*. The variation

*E*of this quantity

_{1}*E*from

*m*to

*m*

*+*

*1*, can be given by: If

*E*saturates at any

_{1}(m)*m*then

_{0}*m*

_{0}*+*

*1*is the minimum embedding dimension, which can efficiently accommodate the attractor. To distinguish deterministic signals from stochastic signals the following quantity is calculated: here,

*n(i,m)*has the same meaning defined above for Equation (7). Its variation from

*m*to

*m*

*+*

*1*can be given as:

In the case of the time series from some random process, the *E _{1}(m)* will not saturate as

*m*is increased, which is due to the limited data samples. Thus, practically, it may be difficult to determine whether

*E*is changing slowly or has stopped changing. In such a situation,

_{1}(m)*E*will be useful because for random data future values are independent of the past values and

_{2}(m)*E*will be equal to

_{2}(m)*1*for any

*m*, whereas for deterministic signals there exists some

*m*such that

*E*≠

_{2}(m)*1*.

### Finite size Lyapunov exponent (FSLE)

*λ(δ)*is based on error growth time

*τ*, which is the smallest time it takes for a perturbation of initial size

_{r}(δ)*δ*to grow by a factor

*r*(>1, Cencini

*et al.*1999,

*)*. The ratio

*r*should not be too large, in order to avoid the growth through different scales. In this study

*r*= 2 is chosen for each case, and the error doubling time is denoted by

*τ*. The finite size Lyapunov exponent is then defined by: where is the time in which the separation between neighboring trajectories grows from

*δ*to 2

*δ*and is the average over many realizations, so that: The direction of the initial perturbation must align with the most unstable direction in the phase space. This can be ensured by considering two neighboring trajectories that have separation very much less than

*δ*. The two trajectories are followed until at time denoted by

*t*

*=*

*0*the separation is

*δ*and at time the separation is . It can be shown (Aurell

*et al.*1997) that: where denotes the natural measure along the trajectory by putting in

The FSLE *λ(δ)* were computed from the time series datasets using the method described by Boffetta *et al.* (1998a, 1998b). The computation of FSLE from measured time series dataset is done by sequentially applying subsequent steps. First, a phase space reconstruction is performed using the method described earlier and the time delay is calculated with the method described in the section Mutual information. Second, a threshold series of perturbation scales is taken. A value *δ _{min}* ≪

*δ*is chosen so that perturbations of size less than

_{0}*δ*align with the most unstable direction in the phase space. Then, for each reconstructed vector

_{min}*X*, its nearest neighbors are obtained. Lastly, from these nearest neighbors, only the nearest neighbors (

_{n}*X*), which have separation , smaller than

_{m}*δ*, are chosen. The trajectories starting from reconstructed vector

_{min}*X*and its nearest neighbor

_{n}*X*are used to calculate the error doubling time by measuring the growth time of perturbation from

_{m}*δ*to

_{n}*δ*. The growth of perturbation is followed from initial size less than

_{n+1}*δ*to the maximum value

_{min}*δ*in the threshold series. Finally, The FSLE

_{max}*λ(δ*at perturbation size

_{n})*δ*obtained by averaging the error doubling time

_{n}is*τ(δ*for several nearest neighbors

_{n})*X*and using Equation (13). A measure of the predictability of a system is the error doubling time. It is thus proportional to the inverse of the FSLE (Mitchell & Gottwald 2012).

_{m}## RESULTS

The nonlinear time series analysis of different datasets – CIR time series for IMD observation, historical, and various RCP scenarios of CMIP5 model outputs over central India – is performed using the method described in the section Methodology. The method described in Equations (5) and (6) (GP1983) is used for calculation of the correlation dimension *D* to obtain embedding dimension (*m)* for each of the datasets. In Figure 4(a), the plot of log *C*(*r*) vs log(*r*) is shown for IMD CIR data with the embedding dimension *m* ranging from 1 to 20 (from left to right). The slope of the straight line within the appropriately chosen scaling region for each *m* gives the correlation exponent *v* corresponding to that *m*. In Figure 4(b), the plot between correlation exponent *v* and embedding dimension *m* shows that *D _{s}* saturates to a value of 4.46 at about

*m*= 11. Thus, the CIR data reveal the existence of a low dimensional chaotic attractor having saturation dimension

_{s}*D*

_{s}*=*4.46. Hence, the next positive integer above

*D*, i.e., 5, is the minimum number of variables sufficient to describe the dynamics of the system (Fraedrich 1986). On the other hand, to confirm the deterministic features in data the surrogate data test is performed (Theiler

_{s}*et al.*1992). The four ensembles of surrogate time series were constructed by shuffling randomly the indices of HadGEM2-ES model historical time series data and then the method of correlation dimension on ensemble surrogate time series is applied. There is no saturation found in these cases, as shown in Figure 5. We also performed surrogate data tests on the rest of the data and similar results were found (figure not shown). Thus, the convergence of

*ν*with

*m*is a genuine characteristic of the data and not an artifact of the method used, confirming the deterministic features of the data. The GP1983 method is also chosen for calculation of correlation dimensions of all the other CMIP5 models listed in Table 2. Figure 6 shows the plot of the correlation exponent vs embedding dimension (

*m*) for historical as well as all the RCP scenarios for the HadGEM2-ES model of CMIP5 (figures for the rest of the models are not shown). The fractal dimension of the historical and RCP scenarios for different models of CMIP5 have been calculated and listed in Table 3. From Table 3, it is clear that similar results have been obtained for all IPCC AR5 CMIP5 models. In the present study,

*D*from Table 3 is found to be 5 for all the models. To validate the correct computation of minimum dimension that is sufficient to represent the dynamics of the system the Cao1997 method is used. Figure 7 shows

*E*and

_{1}*E*quantities and

_{2}*E*gives the number of minimum variables sufficient to represent the dynamics of the system and is found to be 5 (saturation value,

_{1}*m*

_{0}*+*

*1*) for the HadGEM2-ES model. It is similar to embedding dimension

*m*calculated from the GP1983 method (Table 3). The

*E*quantity is also calculated, as mentioned in the Cao1997 method (represented by diamonds in Figure 7) to check whether the datasets are the outcome of the deterministic or stochastic processes.

_{2}*E*is also found to be less than 1 for each dimension

_{2}*m*, which ensures the dataset under study results from a deterministic process. To check whether the Cao1997 method is able to capture stochasticity in the surrogate dataset, the method is also applied on the surrogate dataset obtained from the method described above for the HadGEM2-ES RCP2.6 scenario. Figure 8 clearly shows that the

*E*≥ 1 as

_{2}*m*increases, so the method is able to capture stochasticity of the surrogate dataset and it is also evident from the figure that the

*E*also do not saturate, which is expected for the stochastic dataset. The embedding dimension for the rest of the models listed in Table 2 is also calculated using the Cao1997 method and similar results are evident as in the GP1983 method, i.e., embedding dimension

_{1}*m*

*=*5 and the

*E*quantities for these models also suggest the deterministic nature of datasets since

_{2}*E*≤ 1 for each case (figure not shown).

_{2}Scenario | Correlation dimension | |||||||||
---|---|---|---|---|---|---|---|---|---|---|

HadGEM2-ES | IPSL-CM5A-LR | CCSM4 | BCC-CSM1 | MPI-ESM-LR | ||||||

D _{s} | m _{s} | D _{s} | m _{s} | D _{s} | m _{s} | D _{s} | m _{s} | D _{s} | m _{s} | |

Historical | 4.29 | 11 | 4.47 | 11 | 4.51 | 11 | 4.32 | 11 | 4.27 | 11 |

RCP2.6 | 4.78 | 11 | 4.58 | 11 | 4.71 | 11 | 4.82 | 11 | 4.75 | 11 |

RCP4.5 | 4.82 | 11 | 4.65 | 11 | 4.81 | 11 | 4.87 | 11 | 4.82 | 11 |

RCP8.5 | 4.54 | 11 | 4.68 | 11 | 4.85 | 11 | 4.85 | 11 | 4.91 | 11 |

IMD | 4.46 | 11 |

Scenario | Correlation dimension | |||||||||
---|---|---|---|---|---|---|---|---|---|---|

HadGEM2-ES | IPSL-CM5A-LR | CCSM4 | BCC-CSM1 | MPI-ESM-LR | ||||||

D _{s} | m _{s} | D _{s} | m _{s} | D _{s} | m _{s} | D _{s} | m _{s} | D _{s} | m _{s} | |

Historical | 4.29 | 11 | 4.47 | 11 | 4.51 | 11 | 4.32 | 11 | 4.27 | 11 |

RCP2.6 | 4.78 | 11 | 4.58 | 11 | 4.71 | 11 | 4.82 | 11 | 4.75 | 11 |

RCP4.5 | 4.82 | 11 | 4.65 | 11 | 4.81 | 11 | 4.87 | 11 | 4.82 | 11 |

RCP8.5 | 4.54 | 11 | 4.68 | 11 | 4.85 | 11 | 4.85 | 11 | 4.91 | 11 |

IMD | 4.46 | 11 |

The calculation of the FSLE *λ(δ)* is done by using Equation (13) to be on the safe side embedding dimension equal to 2*D* + 1, which is 11 for all datasets (as mentioned before, *D* is the next positive integer above saturation dimension *D _{s}*), is used as the embedding parameter for the calculation of FSLE. The FSLE can give information about how a finite initial uncertainty

*δ*grows with time. The perturbation threshold

*δ*= 0.07 is selected. The

_{min}*k*-nearest neighbor method is used for the computation of nearest neighbors and the Euclidian distance is used as a distance parameter. The 20 nearest neighbors corresponding to each point in the reconstructed phase space were calculated and the nearest neighbors which have distance less than

*δ*are chosen for calculation of FSLE (for instance, 23,305 nearest neighbors found below

_{min}*δ*in the case of HadGEM2-ES historical scenario). It is also verified that increasing the number of nearest neighbors greater than ten, corresponding to each point in the reconstructed phase space, does not contribute to an increase in the number of nearest neighbors below

_{min}*δ*, thus it is the optimal choice for the current analysis. The FSLE

_{min}*λ(δ)*, as a function of

*δ*for the IMD rainfall observed data, is shown in Figure 9. The FSLE

*λ(δ)*for the short scale, i.e., fast-varying component (

**) of IMD rainfall observed data saturates to a value 0.0343 (Table 4). The error doubling time obtained by using Equation (13) can be calculated from the formula , and is around 20.02 days which is in agreement with the weather predictability found by Dwivedi (2012). This quantifies the predictability time for fast variables. The FSLE**

*λ*_{f}*λ(δ)*for large perturbations, that evolve according to slow-varying components of IMD rainfall dynamics, saturate to a value of 3.34 × 10

^{−4}(Table 5), which indicates a predictability time of 5.7 years (Table 7) by using formula . Our analysis of the rainfall data suggests that the data can be the outcome of two weakly coupled dynamical systems, one having a fast timescale and the other having a slow timescale. The FSLE

*λ(δ)*for the historical and all RCP scenarios from all five models have been calculated by using the same method as described above for IMD rainfall observed data and, for brevity, only one figure for computation of FSLE

*λ(δ*) for the HadGEM2-ES model is shown in Figure 10. (Figures for the calculation of FSLE for the rest of the models are shown in Figures S1 to S4.)

Scenarios | Finite size Lyapunov (λ) exponent for small perturbations_{f} | ||||
---|---|---|---|---|---|

HadGEM2-ES | IPSL-CM5A-LR | CCSM4 | BCC-CSM | MPI-ESM-LR | |

Historical | 0.0352 | 0.0461 | 0.0373 | 0.0359 | 0.0324 |

RCP2.6 | 0.0373 | 0.0425 | 0.0411 | 0.0345 | 0.0484 |

RCP4.5 | 0.0360 | 0.0393 | 0.0445 | 0.0351 | 0.0480 |

RCP8.5 | 0.0315 | 0.0338 | 0.0324 | 0.0352 | 0.0398 |

IMD | 0.0343 |

Scenarios | Finite size Lyapunov (λ) exponent for small perturbations_{f} | ||||
---|---|---|---|---|---|

HadGEM2-ES | IPSL-CM5A-LR | CCSM4 | BCC-CSM | MPI-ESM-LR | |

Historical | 0.0352 | 0.0461 | 0.0373 | 0.0359 | 0.0324 |

RCP2.6 | 0.0373 | 0.0425 | 0.0411 | 0.0345 | 0.0484 |

RCP4.5 | 0.0360 | 0.0393 | 0.0445 | 0.0351 | 0.0480 |

RCP8.5 | 0.0315 | 0.0338 | 0.0324 | 0.0352 | 0.0398 |

IMD | 0.0343 |

Scenarios | Finite size Lyapunov (λ) exponent at large perturbations (×10_{s}^{−4}) | ||||
---|---|---|---|---|---|

HadGEM2-ES | IPSL-CM5A-LR | CCSM4 | BCC-CSM1 | MPI-ESM-LR | |

Historical | 3.78 | 3.60 | 3.67 | 3.41 | 3.62 |

RCP2.6 | 3.61 | 3.00 | 3.12 | 2.73 | 3.38 |

RCP4.5 | 3.11 | 2.47 | 3.14 | 2.58 | 2.43 |

RCP8.5 | 2.81 | 2.12 | 2.96 | 2.32 | 2.47 |

IMD | 3.34 |

Scenarios | Finite size Lyapunov (λ) exponent at large perturbations (×10_{s}^{−4}) | ||||
---|---|---|---|---|---|

HadGEM2-ES | IPSL-CM5A-LR | CCSM4 | BCC-CSM1 | MPI-ESM-LR | |

Historical | 3.78 | 3.60 | 3.67 | 3.41 | 3.62 |

RCP2.6 | 3.61 | 3.00 | 3.12 | 2.73 | 3.38 |

RCP4.5 | 3.11 | 2.47 | 3.14 | 2.58 | 2.43 |

RCP8.5 | 2.81 | 2.12 | 2.96 | 2.32 | 2.47 |

IMD | 3.34 |

The FSLE of historical and of future RCP projections for HadGEM2-ES, IPSL-CM5A-LR, CCSM4, BCC-CSM1.1, and MPI-ESM-LR models at small perturbations are shown in Table 4. The FSLE results for all the models and IMD observation for larger perturbations are listed in Table 5. The positive values of the FSLE confirm the chaotic nature of CIR time series (Hilborn 2000; Sprott 2003). Tables 6 and 7 show the estimated predictability times for different CIR datasets on fast and slow timescales, respectively. The predictability of the fast-varying component of CIR for all the datasets examined was found to be between 14 and 22 days. On the other hand, the predictability for slow-varying component of CIR for IMD observation and all historical scenarios was found to be around five to six years while the predictability of the CIR under the RCP scenarios was around six to nine years.

Scenarios | Predictability (in days) of the fast-varying component | ||||
---|---|---|---|---|---|

HadGEM2-ES | IPSL-CM5A-LR | CCSM4 | BCC-CSM1 | MPI-ESM-LR | |

Historical | 19.7 | 15.0 | 18.6 | 19.3 | 21.4 |

RCP2.6 | 18.6 | 16.3 | 16.9 | 17.0 | 14.3 |

RCP4.5 | 19.3 | 17.6 | 15.6 | 18.6 | 14.4 |

RCP8.5 | 22.0 | 20.5 | 21.4 | 16.0 | 17.4 |

IMD | 20.2 |

Scenarios | Predictability (in days) of the fast-varying component | ||||
---|---|---|---|---|---|

HadGEM2-ES | IPSL-CM5A-LR | CCSM4 | BCC-CSM1 | MPI-ESM-LR | |

Historical | 19.7 | 15.0 | 18.6 | 19.3 | 21.4 |

RCP2.6 | 18.6 | 16.3 | 16.9 | 17.0 | 14.3 |

RCP4.5 | 19.3 | 17.6 | 15.6 | 18.6 | 14.4 |

RCP8.5 | 22.0 | 20.5 | 21.4 | 16.0 | 17.4 |

IMD | 20.2 |

Scenarios | Predictability (in years) of the slow-varying component | ||||
---|---|---|---|---|---|

HadGEM2-ES | IPSL-CM5A-LR | CCSM4 | BCC-CSM1 | MPI-ESM-LR | |

Historical | 5.0 | 5.3 | 5.2 | 5.6 | 5.2 |

RCP2.6 | 5.3 | 6.3 | 6.1 | 7.0 | 5.6 |

RCP4.5 | 6.1 | 7.7 | 6.0 | 7.4 | 7.8 |

RCP8.5 | 6.8 | 9.0 | 6.4 | 8.2 | 7.7 |

IMD | 5.7 |

Scenarios | Predictability (in years) of the slow-varying component | ||||
---|---|---|---|---|---|

HadGEM2-ES | IPSL-CM5A-LR | CCSM4 | BCC-CSM1 | MPI-ESM-LR | |

Historical | 5.0 | 5.3 | 5.2 | 5.6 | 5.2 |

RCP2.6 | 5.3 | 6.3 | 6.1 | 7.0 | 5.6 |

RCP4.5 | 6.1 | 7.7 | 6.0 | 7.4 | 7.8 |

RCP8.5 | 6.8 | 9.0 | 6.4 | 8.2 | 7.7 |

IMD | 5.7 |

## DISCUSSION AND CONCLUSIONS

India is an agroeconomic country. Most of the areas in the central Indian region are rainfed. Therefore, production and yield of Kharif (April to September (summer)) and Rabi season (October to March (winter)) crops (like rice, wheat, etc.) majorly depend on the CIR. Nowadays, groundwater is heavily used in several places in central India for irrigation, but too much exploitation of groundwater resources has caused diminution of this finite resource. The increase in population, food demand, urbanization, and industrialization have caused some serious concerns over fresh water availability (Mall *et al.* 2007), and fresh water resources are likely to decrease due to climate change (Gosain *et al.* 2006). Climate change is supposed to increase extreme events, which could potentially alter the rainfall distribution in terms of time and space. The increase in aridity due to climate warming is likely to add to irrigation water demand (Hanasaki *et al.* 2013; Leng *et al.* 2015). These factors will cause serious concerns about water-related disasters over central Indian regions. Therefore, revealing the potential impact of climate change on the predictability of CIR has great significance.

The correlation dimensions for the different datasets of the study are determined by the technique described in the section Embedding dimensions. It is evident that a low dimensional chaotic attractor exists for all the datasets and a minimum of 5 and a maximum of 11 variables will be able to simulate the system (Table 3). Similarly, low dimensional attractors from rainfall data have been found by several investigations in the range of 4–9 (Dhanya & Kumar 2010; Jothiprakash & Fathima 2013). The results from Table 3 do not indicate any clear increasing or decreasing trend in the values of correlation dimensions as a result of climate change. In other words, the correlation dimensions obtained for different RCP scenarios of a particular CMIP5 model do not present any clear influence of increase in radiative forcing on the correlation dimension. It is clear from the table that IMD, historical as well as all future scenarios can be considered as residing on a strange attractor of non-integer dimensions (Mandelbrot 1982; Tsonis & Elsner 1989; Hilborn 2000).

The FSLE for small perturbations is substantially in the same range for all the datasets (Table 4). The limit on predictability for the fast-varying component, obtained from the growth rate of small perturbations, is found to be about 2–3 weeks for IMD rainfall, historical scenarios and the RCP scenarios of all the CMIP5 models (Table 6). The predictability estimates for the fast component are similar to those reported in the literature in the range of 2–3 weeks (Goswami & Xavier 2003; Waliser *et al.* 2003; Webster & Hoyos 2004; Dwivedi *et al.* 2006; Rai & Krishnamurthy 2011; Dwivedi 2012; Abhilash *et al.* 2014). Figure 11 (top panel) shows the potential predictability of the fast-varying components of CIR in the historical scenario and IMD observation is greater than the RCPs. This implies a short timescale will be less predictable due to global warming and this result is consistent with the result of Mani *et al.* (2009), although some models suggest increased predictability of a few days in RCP8.5 for the fast-varying components. A clear reason for this could not be established.

The FSLE obtained for the slow-varying component of CIR (Table 5) for all the historical data and the IMD observation data are significantly larger than the FSLE obtained for each of the RCP scenarios. Most of the models suggest a reduction in the value of ** λ_{s}** with the increase in radiative forcing. It can be concluded from the results that the predictability of slow-varying component of CIR for RCP scenarios is in the range of six to nine years (Table 7), whereas for historical data and for IMD observational data, the range is between five and six years. Thus, the lower panel of Figure 11 shows that the increase in radiative forcing from 2.6 W m

^{−2}to 8.5 W m

^{−2}will increase the predictability of CIR. Therefore, the CIR under the RCPs will be more predictable than the historical scenarios in long timescales for central India. This increase in the predictability of slow-varying component under RCPs might be because of the increase in the CO

_{2}concentration from RCP2.6 to RCP8.5. This will lead to increased anthropogenic forcing, which could be the possible source of the potential predictability as suggested by Kirtman

*et al.*(2013a), that external forcing like aerosol concentration, greenhouse gases, volcanic eruption, etc. provide predictability to the system, ranging from sub-seasonal to decadal timescales. Using a conceptual forced Lorenz model, Palmer (1993) demonstrated that inclusion of forcing results in increase in the predictability, in the regime residency time of the Lorenz attractor. This means that the dynamical system is confined to a specific region in phase space for a longer time, leading to increased predictability and suggesting that the climate response will be more predictable even in the presence of weak forcing. Dwivedi

*et al.*(2007) and Singh

*et al.*(2015) also used a conceptual forced Lorenz model and found that the forcing causes increase in predictability. This result from conceptual models is supported by studies on real-world data which suggest that external forcing such as aerosols and greenhouse warming may contribute to the long-term predictability (Meehl

*et al.*2010; Jia & DelSole 2011; Kirtman

*et al.*2013a). Beyond the monthly timescales, the atmospheric predictability arises from the forcing induced by slowly varying boundary conditions like SST, soil moisture, etc. (Wu

*et al.*2016).

However, these results should be taken with the caveat that the climate models are still in their evolutionary stage and substantial improvement is still required for better climate change information and improved ability to track the time evolution of the inherent long-scale variability in combination with the forced response.

## ACKNOWLEDGEMENTS

The first author is grateful to CSIR, Govt, of India for providing senior research fellowship letter no. *09/001/0399/2016-EMR-I*. We acknowledge the World Climate Research Programme's Working Group on Coupled Modeling, which is responsible for CMIP, and we thank the climate modeling groups (listed in Table 1) for producing and making freely available their model output. The authors are grateful to J. Clint Sprott for the discussion on nonlinear dynamical methods. Thanks are also due to IMD, Govt. of India for producing high-resolution gridded rainfall observations.