## Abstract

Global climate models (GCMs), developed by the numerical simulation of physical processes in the atmosphere, ocean, and land, are useful tools for climate prediction studies. However, these models involve parameterizations and assumptions for the simulation of complex phenomena, which lead to random and structural errors called biases. The GCM outputs therefore need to be bias-corrected with respect to observed data before they are used for future climate prediction. This study develops a statistical bias correction approach using a four-layer feedforward radial basis neural network, a generalized regression neural network (GRNN), to reduce the biases of the near-surface temperature data in the Indian mainland. The input to the network is the CNRM-CM5 model output gridded data of near-surface temperature for the period 1951–2005, and the target used for bias correcting the input data is the gridded near-surface temperature developed by the Indian Meteorological Department for the same period. Results show that the trained GRNN model can reduce the inherent biases of the GCM output with significant accuracy, and a good correlation is seen between the test statistics of observed and bias-corrected data for both the training and testing periods. The trained GRNN model is then used for bias correction of CNRM-CM5 projected near-surface temperature for 2006–2100 corresponding to the RCP4.5 and RCP8.5 emission scenarios. It is observed that the model can adapt well to the nature of unseen future temperature data and correct the biases of future data, assuming quasi-stationarity of future temperature data for both emission scenarios. The model appreciably captures the seasonal variation in near-surface temperature over the Indian mainland, which has diverse topography, and this is evident from the bias-corrected output.

## HIGHLIGHTS

Analysed the performance of the GRNN in correcting the inherent biases of GCM output near-surface temperature (NST) gridded data.

The results show that the method has better accuracy in correcting the biases as compared to the prevalent techniques, considering the diverse topography and seasonal variations in the study area.

The method is also successful in correcting the biases in future NST gridded data with decent accuracy.

## INTRODUCTION

Global climate models (GCMs), which numerically simulate the historical and future dynamics of the earth's climate, are now used as a reliable data source for predicting rainfall and temperature patterns on a finer scale. The highly non-linear and complex interactions among the land–ocean–atmosphere are simplified in the GCMs using various assumptions, approximations, and parameterizations. As a result, biases are produced in the GCM output and become an inherent property of the model output (Dai 2001a, 2001b). These biases need to be corrected before using the data for reliable climate studies. Bias correction can be done using dynamic or statistical approaches. In the dynamic method, the GCMs can be embedded in a data assimilation algorithm to reduce the inherent biases in the data. Levy *et al.* (2014) used the warping technique to correct biases of 21 GCMs. The results showed 44% improvements in root mean square error (RMSE) in intensity and integrated precipitation. Xu & Yang (2012) developed an improved dynamical bias correction technique to correct the biases of GCM variables in North America, and this method provides a more realistic probability distribution function for the variables. Recent advances include providing bias-corrected near-surface temperature (NST) data for a short term in the future period by data assimilation of NCEP-gridded data with observed 2 m NST using a new forward operator approach (Lee *et al.* 2011). However, the dynamical bias correction method is computationally expensive, which hampers its use for data covering large spatial and temporal domains. Statistical bias correction is an alternative for efficient bias correction of large-scale climate data.

In statistical bias correction, a statistical relationship between the modelled and the observed variables is established over the historical period, and the constructed relationship is then used for bias correction of the modelled projection. The observation statistics, namely mean, variance, and skewness, are used to detect and correct the biases in the GCM predicted output. The primary assumption in this method is that the data are stationary on a temporal scale for the model projections, so that the transfer function can be applied for bias correction of these future model projections. Numerous researchers have developed and adopted various statistical bias correction methods over time for bias correction of GCM output parameters. Ines & Hansen (2006) used a simple multiplicative shift for correcting the GCM-simulated monthly rainfall in semi-arid Kenya. This method, called the delta change method, shifted the mean of the predicted period based on the difference of means in the historical period. The frequency and intensity of rainfall were not corrected by this method. The bias-corrected precipitation was used for crop simulation. Sheffield *et al.* (2006) constructed a 50-year bias-corrected 3-hourly and 1° meteorological forcing dataset to drive land surface modelling. The precipitation and temperature of the National Centre for Environmental Prediction (NCEP) dataset were corrected with the delta change method, using the Climate Research Unit (CRU) dataset as observations. They used the same multiplicative shift for correction of precipitation, whereas an additive scheme was used for bias correction of temperature. Schmidli *et al.* (2006) modified the linear scaling method to correct the biases of the precipitation not only in the mean but also in the frequencies and intensities of the wet days. 
This method, called the local intensity scaling method, has two steps, namely finding the threshold parameter and correcting the wet day frequency with precipitation more than the threshold. Leander & Buishand (2007) used a power transformation to correct the mean and coefficient of variation of the precipitation. The delta change method and the power transformation did not provide satisfactory results for projected future data as they assume stationarity in the adjustment factors. Maurer & Hidalgo (2008) developed the cumulative distribution function (CDF) method for correcting biases in rainfall data by mapping the CDF of the biased model outputs onto the distribution of observations. This method corrected not only the mean but also the variance and distribution of the data. However, this method assumed that the future modelled outputs have the same distribution as that of historical model simulations. Piani *et al.* (2010) used the quantile-based mapping approach to remove the biases of the daily precipitation in a regional climate model (DMI-HIRHAM version 5) over Europe. Li *et al.* (2010) developed the equidistant CDF (EDCDF) matching technique for bias correction of monthly precipitation and temperature data from AR4 models for the projection period by adjusting the CDF based on the difference between the model and observed data for the baseline period. The distribution-based methods outperformed the mean-based methods in many situations, but in some instances where there is low temporal consistency between the time series of modelled and observed values, the EDCDF and CDF methods were not able to correct biases in data, as shown by Chen *et al.* (2013). Acharya *et al.* (2013) compared six methods for bias correction of GCM output for Indian summer monsoon rainfall (ISMR), namely mean bias removal method, multiplicative shift, standardized reconstruction technique, regression method, quantile mapping method, and principal component regression. 
The standardized reconstruction method and quantile mapping provide better results compared to the other methods in simulating ISMR. Li *et al.* (2019) compared five statistical methods for bias correction of wind speed simulated by the regional climate model downscaled from CNRM-CM5 data. The quantile mapping approach showed better results compared to the other methods. Dimri (2020) showed the viability of empirical quantile mapping in correcting biases in basin-scale data comprising precipitation and temperature in the Upper Ganges and Satluj River basins. Enayati *et al.* (2020) analysed the performance of non-parametric quantile mapping methods in bias correction of GCM-simulated rainfall and temperature in the Karkheh River basin in Iran with diverse topographical and climatic conditions. Results showed that in certain conditions, the performance of quantile mapping was not up to the mark.
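The difference between the mean-based and distribution-based families reviewed above can be illustrated with a minimal sketch. The function names are hypothetical, the additive shift is the temperature-style variant of a mean correction, and the quantile mapping shown is a plain empirical CDF matching rather than the exact procedure of any cited study. Future values outside the historical model range are clamped to the observed extremes, which is a simplification assumed here.

```python
from bisect import bisect_right


def additive_shift(model_hist, obs_hist, model_future):
    """Mean-based correction: remove the historical mean bias from future values."""
    bias = sum(model_hist) / len(model_hist) - sum(obs_hist) / len(obs_hist)
    return [x - bias for x in model_future]


def fractional_rank(sorted_values, x):
    """Cumulative probability of x in a sorted sample, linearly interpolated."""
    if x <= sorted_values[0]:
        return 0.0
    if x >= sorted_values[-1]:
        return 1.0
    i = bisect_right(sorted_values, x) - 1  # sorted_values[i] <= x < sorted_values[i + 1]
    frac = (x - sorted_values[i]) / (sorted_values[i + 1] - sorted_values[i])
    return (i + frac) / (len(sorted_values) - 1)


def empirical_quantile(sorted_values, p):
    """Value at cumulative probability p (0 <= p <= 1), linearly interpolated."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def quantile_map(model_hist, obs_hist, model_future):
    """Distribution-based correction: match the historical model CDF to the observed CDF."""
    m_sorted = sorted(model_hist)
    o_sorted = sorted(obs_hist)
    return [empirical_quantile(o_sorted, fractional_rank(m_sorted, x)) for x in model_future]
```

The additive shift changes only the mean, whereas the quantile mapping also reshapes the spread and higher moments, which is why the distribution-based methods can correct variance as well as mean.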

Machine learning is the field of study that allows the computer to learn or train to obtain certain outputs without being explicitly programmed. Regression models in machine learning are efficient statistical models that are used in diverse fields for the prediction and forecast of data. A regression model is typically used to establish a linear or non-linear relationship between the predictors and the predictand. A study conducted by Hewitson & Crane (1996) shows that non-linear models outperform linear models when the relationship between the input and the output is complex. Artificial neural networks (ANNs), inspired by biological neural networks, are sophisticated non-linear regression models that can approximate almost any continuous input and output mapping (Hornik 1991). Maier & Dandy (2000) and Maier *et al.* (2010) reviewed the papers that used ANN models for flow parameter estimation in water resource systems. The results showed that the prediction was made using feedforward neural networks in most cases, which provided better results than simple regression models. Marzban (2003) used the ANN for calibrating and improving forecast systems of temperature using nine climate variables directly or indirectly affecting the temperature. In recent decades, the ANN has gained significant momentum in variable prediction and parameter estimation in climate and water resources research. Acharya *et al.* (2014) used a type of ANN named extreme learning machine for developing a multi-model ensemble scheme for the northeastern monsoon rainfall from GCM products in the southern Indian peninsula during the winter months of October–November–December. The method performed reasonably well in the prediction of the seasonal mean rainfall of the northeastern monsoon. Moghim & Bras (2017) used the ANN for bias correction of climate variables. 
They developed a three-layer feedforward ANN model for bias correction of CCSM3 modelled air temperature and precipitation in the northern part of South America. This method proved to be effective in improving all the metrics, even when the correlation between the modelled and the observed dataset is low. Jeong & Lee (2018) used a generalized linear model for bias correction of weather research and forecasting (WRF)-projected near-surface temperature (NST) and wind speed. Results showed improved numerical weather prediction accuracy after bias correction.

A generalized regression neural network (GRNN) is a type of probabilistic neural network that was developed by Specht (1991). It is a memory-based network that gives an estimation of continuous variables and converges to the underlying regression surface. Specht showed that the algorithm provides smooth transitions from one observed value to another even with sparse data in a multi-dimensional measurement space. Ustaoglu *et al.* (2008) predicted the daily mean, maximum, and minimum temperature from station data in the southeast of Turkey using the GRNN. The time-series prediction gave promising results compared to a simple backpropagation neural network. Wang & Sheng (2010) used GRNNs to predict annual rainfall in the Zhengzhou region of China. The results showed better consistency than the traditional ANN with respect to capturing the underlying temporal and spatial variations and predicting the data for the future. In this study, a GRNN model is proposed for bias correction of the CNRM-CM5 modelled gridded NST for the Indian mainland, which is available at a resolution of 1.4° × 1.4°, with reference to the Indian Meteorological Department (IMD) gridded NST available at 1° × 1°. The study domain and datasets are discussed in Section 2. The methodology and network architecture are discussed in Section 3, followed by Results and Discussion in Section 4. The conclusions are presented in Section 5, followed by the acknowledgment and the data availability statement.

## STUDY DOMAIN AND DATASETS

We have selected the landmass of India, excluding the Bay of Bengal and the Arabian Sea, extending from 67.5°E to 97.5°E longitude and from 7.5°N to 37.5°N latitude for the current study. The region exhibits different landforms and land uses such as forests, valleys, deserts, hills, mountains, and even marshy areas. The Himalayas are located on the northern border of India, covering a length of 2,400 km, thereby separating the Indian mainland from China. The Himalayas play a vital role in Indian climatology in different seasons. The central, western, and eastern parts of India are mostly plain regions. In contrast, the northern and northeastern parts of the country are conglomerates of hills, valleys, and plain landforms with high variations in climate patterns. The southern peninsular region of India is mainly a plateau region with plain coastal areas. The Western Ghats in the southern peninsular region, with an elevation of 500–1,000 m above mean sea level, also play a vital role in the climatology of south India. The northernmost and northeastern hilly parts of the country, lying in the foothills of the Himalayas, are at an elevation in the range of 2,000–3,500 m, and a slightly colder climate is expected in this region. The topographical map of India is shown in Figure 1.

NST is a primary climate variable used for climate change studies. This paper uses the new approach of bias correction for correcting the biases in the CNRM-CM5 model output NST developed by the National Center for Meteorological Research (France) under the Coupled Model Intercomparison Project 5 (CMIP5). CNRM-CM5 is a coupled earth system model that explicitly models the movement of carbon through the earth system and numerically simulates the land–atmosphere–ocean circulation. The gridded data used here are available at a resolution of 1.4° × 1.4° (longitude × latitude). The historical GCM output data used for training the network are available daily from 1951 to 2005, extending from 6.3°N to 38.5°N (latitude) and 67.5°E to 98.4°E (longitude). The future GCM output data that need to be bias-corrected are the daily NST data from the CNRM-CM5 model corresponding to the RCP4.5 and RCP8.5 scenarios for 2006–2100. The data used as a reference for bias correction are the IMD-gridded NST data available at a resolution of 1° × 1° for the historical period of 1951–2005 (Bhaskar Rao & Ratna 2009). These data are developed by interpolation of IMD station data after a proper quality check. A total of 395 temperature recording stations are used for developing the gridded data using interpolation. Then, the data are filtered to get rid of outliers and erroneous values on certain days. Further, the IMD data have no recorded value for some days at some grid points covering the northern hilly regions of Jammu and Kashmir and Ladakh due to the unavailability of station data on these days or a sparse number of stations in these areas. These missing values are replaced by linear interpolation from adjacent days when the days with no data record are isolated and less than 5% of the total number of days of record. This is not expected to affect the training of the model, as the temperature on two to three consecutive days does not vary much within a season. 
The grid points having missing data points greater than 5% of the total number of data points with consecutive days with missing values are discarded for the study, as this will cause erroneous training of the GRNN model. For our work, the IMD-gridded daily data for NST are used for the region delimited by the latitudes 7.5°N–37.5°N and longitudes 67.5°E–97.5°E, and the daily average value is taken from the daily maximum and minimum values for the study. To carry out the bias correction, the GCM data are re-gridded to the scale of IMD-gridded temperature data (1° × 1°) using bilinear interpolation for the entire time period. Bilinear interpolation is an interpolation technique for interpolating data on a 2D-mesh grid from one resolution to another. In this process, first linear interpolation is carried out in one direction and then again in the other direction. The final interpolation is not a linear process as a whole, but it is rather a quadratic process.
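The two-step re-gridding described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the row/column layout of `field`, and the assumption that every target point lies inside the source grid are choices made for this example.

```python
from bisect import bisect_right


def bilinear(lons, lats, field, lon, lat):
    """Interpolate field[j][i] (row j at lats[j], column i at lons[i]) to (lon, lat).

    lons and lats must be strictly increasing, and the target point is
    assumed to lie inside the source grid.
    """
    i = min(max(bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    j = min(max(bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    tx = (lon - lons[i]) / (lons[i + 1] - lons[i])
    ty = (lat - lats[j]) / (lats[j + 1] - lats[j])
    # Step 1: linear interpolation along longitude on the two bracketing rows.
    south = field[j][i] * (1 - tx) + field[j][i + 1] * tx
    north = field[j + 1][i] * (1 - tx) + field[j + 1][i + 1] * tx
    # Step 2: linear interpolation along latitude between the two results.
    return south * (1 - ty) + north * ty


def regrid(lons, lats, field, new_lons, new_lats):
    """Evaluate the bilinear interpolant on every node of a target grid."""
    return [[bilinear(lons, lats, field, lon, lat) for lon in new_lons]
            for lat in new_lats]
```

Expanding the two steps produces a term proportional to `tx * ty`, which is why the combined interpolant is linear along each axis but not linear as a whole.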

## METHODOLOGY

Bias correction is the process of removing the systematic biases in the GCM modelled output parameters with reference to the observed data for the historical period and then applying the developed model or the method for bias correcting the future data. The basic idea is about finding a sufficiently flexible and adaptive approach that can learn from the available data and develop a predictive function that performs well for the projection period. Here, in this study, a GRNN model, a type of ANN, is used to learn the bias structure at each grid point of the historical data for the study domain. The trained network is then used to make a bias-corrected prediction of the future data. The future projections are obtained from the same GCM for the time period 2006–2100 for the emission scenario of RCP4.5 and RCP8.5. The details of the method are discussed in subsequent sections.

### Basic structure of the ANN

A machine learning algorithm consists of two parts, namely training and testing, where training is the process of learning through experiences, examples, and adaptation, and testing is the process of checking the capability of the model to adapt to unseen data. The structure of an ANN is composed of artificial neurons or nodes in multiple layers. The input layer consists of the input nodes connected to the input variables, which pass the information to the hidden layers consisting of the hidden nodes, and finally to the output layer consisting of output nodes. The nodes in each layer are connected to those of the subsequent layer by weights, and the output of each node is computed using an activation function. The model should self-organize its structure with iterations to react appropriately to unseen inputs and increase its generalization capability. The weights determine the effective amount of information that is passed between nodes. The hidden layers provide the parallel and non-linear structure of the ANN. The activation function can be linear or non-linear, and it is chosen based on the complexity and type of the problem.

### Architecture of the GRNN

A GRNN is a form of neural network with a radial basis function (RBF) as the activation function that can be used for any regression problem with a complex non-linear relationship between input and target variables. It works well to converge to the optimum function value even when the available training data are insufficient to train an ANN. A GRNN follows a one-pass learning algorithm with a highly parallel structure and does not follow an iterative procedure. Hence, it proves to be more efficient than the backpropagation neural network in terms of iterations and time consumption. The GRNN can be used with even sparse data, and it is not sensitive to randomly initialized weights, making it better than the traditional feedforward backpropagation neural network (Specht 1991). The major difference between a GRNN and a traditional RBF neural network is the way the weights are determined. Instead of training weights, the GRNN assigns the target value directly to the weights from the training set associated with the input vector and the corresponding output vector, making it suitable for prediction, modelling, mapping, and interpolation. The advantages of the GRNN include its low training time and high accuracy, and its disadvantages include a hidden layer that grows with the size of the training set. The advantages make it a suitable technique for statistical bias correction of climate variables, as discussed in the study. The basic concept of generalized regression and the model development of the GRNN are shown in subsequent sections.

#### Generalized regression

The regression of a dependent variable *Y* on the independent variable *X* is the computation of the most probable value of *Y* for each value of *X*, where *X* is the vector input, and *Y* is the vector output of the network. If the joint pdf of *X* and *Y* is calculated, then the conditional pdf and expected value of *Y* can be calculated from *X*. The resulting regression equation can be implemented in a parallel neural network-like structure. If *f*(*X*, *y*) is the joint continuous probability density function of random variables *X* and *Y* estimated from the sample of observations, then the conditional mean of *y* given *X* is

$$E[y \mid X] = \frac{\int_{-\infty}^{\infty} y\, f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy}$$

Equation (3), which involves summation over the observation, is directly applicable to the numerical data. The smoothing parameter *σ* estimates the density of the distribution function; large *σ* (≈1) makes the density smooth and becomes a multivariate Gaussian, whereas small *σ* (≈0) indicates the density is taking non-Gaussian shapes.
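For reference, the sample-based estimator given by Specht (1991), which sums over the stored observations and depends on the smoothing parameter *σ*, can be written as below. The notation is an assumption of this restatement: *n* is the number of training samples, *X_i* and *Y_i* are the *i*-th stored input and target, and *D_i* is the Euclidean distance between the query *X* and *X_i*.

$$\hat{Y}(X) = \frac{\sum_{i=1}^{n} Y_i \exp\left(-\dfrac{D_i^2}{2\sigma^2}\right)}{\sum_{i=1}^{n} \exp\left(-\dfrac{D_i^2}{2\sigma^2}\right)}, \qquad D_i^2 = (X - X_i)^{\mathsf{T}} (X - X_i)$$

In this form, a very large *σ* drives all the weights towards equality so that the estimate approaches the sample mean of the *Y_i*, whereas a very small *σ* makes the estimate approach the target of the nearest stored sample.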

#### Generalized regression neural network

A GRNN has a four-layer structure, namely the input layer, pattern layer, summation layer, and output layer. The input layer provides the scaled measurement variables to the second layer, the pattern units. Each pattern unit is dedicated to one cluster centre. When a new vector is entered into the network, it is subtracted from the stored vector representing each cluster centre. The absolute values of the differences are fed into the exponential activation function, and the output is passed to the summation layer. The summation units perform a dot product between the weight vector and the vector composed of the outputs of the pattern units and generate an estimate of the joint probability density function of the output vector from the pattern neurons. The output layer finds the value of the required variable by taking the ratio of the outputs from the summation units. The best fit network for the data is found by changing the smoothing parameter on a trial-and-error basis. The smoothing parameter alters the degree of generalization of the network. A high smoothing parameter approaching 1 increases the network's ability to generalize but can degrade prediction accuracy. Conversely, a low smoothing parameter degrades the generalization capability of the model. The architecture of the GRNN model developed for the study is shown in Figure 2.
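The flow through the pattern, summation, and output layers can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name is hypothetical, every training sample is treated as its own pattern unit, targets are scalar, and the squared Euclidean distance with a Gaussian kernel from Specht (1991) is used in place of absolute differences.

```python
import math


def grnn_predict(train_x, train_y, query, sigma):
    """Estimate the target at `query` as a Gaussian-weighted mean of train_y.

    train_x: list of input vectors (lists of floats), one per pattern unit
    train_y: list of scalar targets stored with the pattern units
    sigma:   smoothing parameter, must be positive
    """
    # Pattern layer: squared Euclidean distance to every stored sample.
    d2 = [sum((q - x) ** 2 for q, x in zip(query, xi)) for xi in train_x]
    # Subtracting the smallest distance leaves the final ratio unchanged but
    # avoids a 0/0 result when every kernel value would underflow.
    d2_min = min(d2)
    weights = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
    # Summation layer: weighted sum of the targets and plain sum of the weights.
    numerator = sum(w * y for w, y in zip(weights, train_y))
    denominator = sum(weights)
    # Output layer: ratio of the two summation units.
    return numerator / denominator
```

Because the stored samples act as the weights, there is no iterative training step; only `sigma` has to be chosen, which is what makes the trial-and-error search over the smoothing parameter practical.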

### Model setup of the GRNN for the study

The CNRM-CM5 modelled NST-gridded data and IMD-gridded NST data available for the period 1951–2005 are divided into four parts or seasons, namely March–April–May (pre-monsoon), June–July–August–September (monsoon), October–November (post-monsoon), and December–January–February (winter). The season-wise division is required for the study since the simulation ability of the GCM varies with seasonal effects. The historical data of both the GCM and IMD observed data corresponding to 334 grid points over India for each season are now separated into two parts, namely training set and testing set with data from 1951 to 1990 representing the training set and the data from 1991 to 2005 representing the testing dataset.

The GRNN model is now trained using the gridded data of 334 grid points over India from CNRM-CM5 modelled data input and the IMD observed temperature as the target for the training period corresponding to each season. The developed model is then used to make predictions from the testing dataset using the GCM data as input. The mean square error and the regression coefficient between the GRNN model predictions and observed data for the test period are used as metrics to select the optimum value of the smoothing parameter for the dataset and determine the best fit model. Table 1 shows the smoothing parameter used in each season for the model.

| Season | Smoothing parameter (σ) |
|---|---|
| Pre-monsoon | 0.75 |
| Monsoon | 0.8 |
| Post-monsoon | 0.85 |
| Winter | 0.8 |


Having developed the GRNN model for each season, the trained network is now used for bias correction of the future dataset of NST from the CNRM-CM5 model for the emission scenarios RCP4.5 and RCP8.5.
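The seasonal grouping, the 1951–1990 / 1991–2005 split, and the trial-and-error choice of the smoothing parameter described in this section can be sketched as follows. The record layout `(year, gcm_value, observed_value)`, the function names, and the single-input kernel estimate are assumptions of this example. The sketch ranks candidates by mean square error only, whereas the study also considers the regression coefficient.

```python
import math

SEASONS = {
    "pre-monsoon": (3, 4, 5),
    "monsoon": (6, 7, 8, 9),
    "post-monsoon": (10, 11),
    "winter": (12, 1, 2),
}


def season_of(month):
    """Return the season name for a calendar month (1-12)."""
    for name, months in SEASONS.items():
        if month in months:
            return name
    raise ValueError(f"invalid month: {month}")


def split_by_year(records, last_training_year=1990):
    """Split (year, gcm_value, observed_value) records into train and test sets."""
    train = [r for r in records if r[0] <= last_training_year]
    test = [r for r in records if r[0] > last_training_year]
    return train, test


def kernel_estimate(train, x, sigma):
    """Single-input GRNN-style estimate: Gaussian-weighted mean of observed values."""
    d2 = [(x - gcm) ** 2 for _, gcm, _ in train]
    d2_min = min(d2)  # shift for numerical stability, the ratio is unchanged
    w = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
    return sum(wi * obs for wi, (_, _, obs) in zip(w, train)) / sum(w)


def select_sigma(train, test, candidates):
    """Return the candidate sigma with the lowest mean square error on the test set."""
    def mse(sigma):
        errors = [(kernel_estimate(train, gcm, sigma) - obs) ** 2
                  for _, gcm, obs in test]
        return sum(errors) / len(errors)
    return min(candidates, key=mse)
```

In practice this search would be repeated once per season, giving one smoothing parameter per seasonal model.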

## RESULTS AND DISCUSSION

### Analysis of temperature time series at a grid point in the test period (1991–2005)

To validate the model performance in bias correcting the GCM outputs, a regression plot of the predicted output against the target data for the test period (1991–2005) is drawn for each season, as shown in Figure 3. The CDF plot, which depicts the cumulative probabilities of the data values, is also an important metric for checking a model's viability in capturing trends in data. The CDF plots for observed data, GCM output data, and bias-corrected data by the EDCDF and GRNN methods at a grid point covering the Indian capital region, i.e. Delhi (28.5°N latitude, 77.5°E longitude), for all the seasons are shown in Figure 4. From the subplots in Figure 3, it can be inferred that the GRNN model performs decently in capturing the climatology pattern in all four seasons in India. However, the model's adaptability and capability in learning the trend and pattern of IMD observed temperature data are better in the monsoon and post-monsoon seasons than in the pre-monsoon and winter seasons, as depicted by the regression coefficient values in Figure 3(a)–3(d). The CDF plots in Figure 4 also show that the GRNN model performs well in shifting the CDF of the GCM output data to match the CDF of observed data for all the seasons. The model performs significantly better than the EDCDF method in correcting the CDF in the monsoon and winter seasons, shown in Figure 4(b) and 4(d). Figure 4(a) and 4(c) show that the empirical CDF plots of the GCM, observed, and predicted data in the pre-monsoon and post-monsoon seasons overlap and are in close proximity to each other. This shows that the bias in the data in these two seasons is less than that in the monsoon (Figure 4(b)) and winter (Figure 4(d)) seasons, where there is a warm bias and a cold bias, respectively, in the CNRM-CM5 simulated data.

The slight underperformance of the model in the pre-monsoon season is probably due to greater variability of NST in different regions of India during this season, with northern and central India experiencing extreme annual temperatures and eastern, northeastern, and south-western India experiencing relatively mild temperatures. This can also be inferred from the CDF plot in Figure 4(a). There is also greater intra-seasonal variability in the data in winter, as observed from the CDF plot of Figure 4(d). The northern and central parts of India experience extremely cold temperatures, whereas the western, southern, and eastern parts of the country have a relatively warmer climate during this season, resulting in a greater variety of climate patterns across the country. However, in the monsoon and post-monsoon seasons, the variability of NST across the country is relatively less, with a majority of the regions experiencing moderate temperatures during these seasons, as depicted by Figure 4(b) and 4(c). Further, the amount of data available per season for training the model also affects the model's learning capability. The quality of station data recorded in the winter season in the regions covering the foothills of the Himalayas may sometimes be poor because stations are damaged and are difficult to repair in time under extreme meteorological conditions. In the pre-monsoon season, tropical cyclones hit the eastern coast of the country almost every year, which might damage stations and result in faulty data records.

The GRNN model is also checked for its efficacy with respect to the primarily used EDCDF and ANN methods of bias correction at a grid point covering India's capital Delhi. The Kolmogorov–Smirnov (KS) statistic, correlation coefficient, and the RMSE between the model predictions and the observed values are compared for the test period between methods for checking the efficacy of the model.
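The three comparison metrics can be computed as in the following sketch. The function names are hypothetical, and the KS statistic is implemented as the standard two-sample form (the largest gap between two empirical CDFs), which is assumed here to be the variant used.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    i = j = 0
    d_max = 0.0
    while i < len(a_sorted) and j < len(b_sorted):
        x = min(a_sorted[i], b_sorted[j])
        # Step both CDFs past every sample equal to x before comparing them.
        while i < len(a_sorted) and a_sorted[i] == x:
            i += 1
        while j < len(b_sorted) and b_sorted[j] == x:
            j += 1
        d_max = max(d_max, abs(i / len(a_sorted) - j / len(b_sorted)))
    return d_max


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rmse(predicted, observed):
    """Root mean square error between predictions and observations."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
```

The KS statistic compares the two distributions regardless of time ordering, while the correlation coefficient and RMSE compare the series day by day, so the three metrics are complementary.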

The KS test statistic from the GRNN model is compared with the results from bias correction of the data by the EDCDF method and the traditional feedforward ANN. The corresponding correlation coefficient and the root mean square error between the bias-corrected and reference IMD observed datasets are also calculated for the test period (1991–2005) for all three methods at the above-mentioned location (28.5°N latitude, 77.5°E longitude). The performance of the GRNN-based regression model relative to the distribution-based EDCDF method and the feedforward ANN is shown in Table 2.

| Season | Model | KS | ρ | RMSE |
|---|---|---|---|---|
| March–April–May | EDCDF | 0.19 | 0.65 | 3.30 |
| March–April–May | ANN | 0.15 | 0.64 | 3.09 |
| March–April–May | GRNN | 0.09 | 0.67 | 3.10 |
| June–July–August–September | EDCDF | 0.19 | 0.57 | 3.17 |
| June–July–August–September | ANN | 0.12 | 0.61 | 2.71 |
| June–July–August–September | GRNN | 0.07 | 0.67 | 2.67 |
| October–November | EDCDF | 0.41 | 0.55 | 2.81 |
| October–November | ANN | 0.15 | 0.54 | 2.69 |
| October–November | GRNN | 0.14 | 0.54 | 2.68 |
| December–January–February | EDCDF | 0.17 | 0.06 | 4.10 |
| December–January–February | ANN | 0.12 | 0.09 | 4.03 |
| December–January–February | GRNN | 0.08 | 0.11 | 3.87 |


As seen from Table 2, the KS statistic between the bias-corrected and observed IMD datasets is lower for the GRNN-based model than for the EDCDF and ANN-based prediction methods in all four seasons, although the ANN method performs almost comparably to the GRNN method in improving the KS statistic values. The low values of the KS statistic between the bias-corrected and observed time-series data at the grid point show that there is a negligible bias between the two datasets. Further, the correlation between the bias-corrected and observed datasets is improved by the GRNN and ANN models as compared to the EDCDF method for the test period. However, the values of the correlation coefficient in the pre-monsoon and post-monsoon seasons are almost the same across the methods. In contrast, the GRNN performs much better in the monsoon and winter seasons in improving the correlation between the datasets. The similar performance in the pre-monsoon and post-monsoon seasons is probably because the distribution of the GCM dataset is in close proximity to the distribution of observed data, as seen from Figure 3, and the EDCDF method performs well when there is some correlation in the data. The RMSE of the data predicted by the GRNN method is only marginally lower than that of the data predicted by the ANN and EDCDF methods. Thus, all three methods perform similarly in terms of decreasing the error between the datasets, with the GRNN giving slightly better results. The GRNN model also has slightly better computational efficiency than the feedforward ANN method used in the analysis. Based on this analysis, it can be inferred that the proposed method provides promising results in terms of bias correction of GCM-simulated data at a grid point. In the subsequent sections, the model's viability in correcting biases in the entire study domain is analysed in detail for each season.

### Evaluation of test statistics for the validation of the model in bias correction

The temporal mean, the first moment of the data distribution about the origin, is the primary statistic describing the central tendency of the data. The spatial variation of the mean at each grid point in the study area must therefore be analysed to check the viability of the regression model for bias correction of the GCM data. The variance is the second moment about the mean, and its positive square root, the standard deviation, defines the spread of the data. To be accepted as a good bias correction technique, the methodology adopted must correct the standard deviation at each grid point accurately enough to match the standard deviation of the observed data. In the following sections, the ability of the model to capture the temporal and spatial variations of these statistics, namely the temporal mean and standard deviation of the observed data, and thereby correct the biases in the input CNRM-CM5 NST data, is evaluated for each season.
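The two test statistics can be sketched as follows for a small gridded dataset. This is only an illustration: the dictionary layout keyed by (latitude, longitude), the sample values, and the use of the sample standard deviation (divisor n − 1) are assumptions, not details taken from the study.

```python
import math


def temporal_mean(series):
    """First moment about the origin of one grid point's time series."""
    return sum(series) / len(series)


def temporal_std(series):
    """Sample standard deviation (divisor n - 1, an assumption)."""
    m = temporal_mean(series)
    return math.sqrt(sum((v - m) ** 2 for v in series) / (len(series) - 1))


def grid_statistics(field):
    """Map each grid point to the (mean, std) of its time series."""
    return {pt: (temporal_mean(ts), temporal_std(ts)) for pt, ts in field.items()}


# Hypothetical NST series (degrees C) at two grid points.
gcm = {(26.0, 91.5): [20.0, 22.0, 24.0], (15.5, 75.0): [30.0, 31.0, 32.0]}
obs = {(26.0, 91.5): [23.0, 24.0, 25.0], (15.5, 75.0): [30.0, 31.0, 32.0]}

gcm_stats, obs_stats = grid_statistics(gcm), grid_statistics(obs)

# Negative mean bias = cold bias; positive std bias = overestimated spread.
bias = {
    pt: (gcm_stats[pt][0] - obs_stats[pt][0], gcm_stats[pt][1] - obs_stats[pt][1])
    for pt in gcm
}
print(bias)
```

In this toy case the first grid point has a cold bias in the mean and a positive bias in the standard deviation, while the second grid point is unbiased in both statistics.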

#### Analysis of the NST climatology in the pre-monsoon season in the study region

In Figure 5(a)–5(c), the mean NST is predicted well by the GCM for northern India and the Western Ghats but has a cold bias in most of the southern, central, and eastern parts of India. The state of Karnataka and some parts of Maharashtra in the Western Ghats experience slightly colder NST than the other regions in the peninsula in this season because of their higher elevation above mean sea level. There is also a cold bias in the GCM data compared to the IMD observed mean temperature in the northeastern part of India. Figure 5(d)–5(f) shows that the spread of the GCM data about the mean is almost comparable to that of the observed data in the southern and central parts of India. However, the bias in the variance of the data in the western, eastern, and northeastern parts of India is positive compared to the observed data. The higher variance and standard deviation of the data in the northwestern part of the country can be attributed to the drastic intra-season changes in temperature, with the weather shifting from cold to warm over the course of the pre-monsoon season. Furthermore, the GCM predicts a higher deviation of daily NST from the mean in the northeastern part of India, and this is duly corrected by the GRNN model to match the standard deviation of the IMD observed data.

The GRNN model significantly reduces the biases in the NST data from the GCM, which is confirmed by the regression plots between the respective means and standard deviations of the observed and bias-corrected data for the pre-monsoon season, as shown in Figure 6(a) and 6(b). Even though the regression coefficient between the observed and bias-corrected data is marginally lower in this season than in the monsoon season, the lower variability of the GCM data from the observed mean data in most parts of the country eases the bias correction operation for this season. As illustrated by Figure 7(a) and 7(c), the mean NST for the future period (2006–2100) predicted by the GCM has a significant cold bias in most parts of the country, except the northern hilly region and some parts of the southeastern region, for both scenarios. From the spatial variation plot of standard deviation in Figure 7(e) and 7(g), it can be inferred that the deviation in the GCM modelled NST data has a spatial pattern similar to that of the historical GCM data in most parts of the country, with slightly increased values for both the RCP4.5 and RCP8.5 scenarios. The trained GRNN model is capable of correcting this cold bias in the future GCM data so that the bias-corrected data correlate with the spatial variation of the temporal mean and standard deviation of the observed historical NST. This inference is clearly depicted by Figure 7(b), 7(d), 7(f), and 7(h).

#### Analysis of the NST climatology in the monsoon season in the study region

The temporal mean of GCM output NST deviates significantly from the observed mean temperature for the monsoon season. From Figure 8(a) and 8(b), it can be inferred that the temporal mean of GCM output NST has a warm bias compared to the observed data in the western, central, and southeastern parts of the country and a cold bias in the northern and northeastern regions comprising hills and plateaus. Most parts of southern and south-central India experience lower temperatures in this season than in the pre-monsoon season due to the effect of the southwest monsoon that hits these areas every year, causing moderate to heavy rainfall. Further, the regions in the Western Ghats are at a slightly higher elevation than those in the central and eastern parts of the country, thereby adding to the effect of colder mean NST in this season.

The spatial plot of standard deviation in Figure 8(d) and 8(e) shows that the GCM output NST has a spatial variation of standard deviation similar to that of the observed data in most parts of the country in the monsoon season. However, in some parts of eastern, northern, and central India, the deviation from mean temperature is overestimated by the GCM compared to the observed standard deviation. The higher variation in the NST in central India during the monsoon, as observed from Figure 8(e), is probably due to a warmer average temperature in this region and the occurrence of scattered rainfall on certain days, creating a negative anomaly from the mean temperature. The spatial plots of the mean (Figure 8(c)) and standard deviation (Figure 8(f)) of the bias-corrected NST data show that the model captures the spatial variation of the observed data remarkably well and corrects the biases in the GCM data. The model performance can also be inferred from the regression plots in Figure 6(c) and 6(d), with the regression coefficients between the observed and predicted means and standard deviations being 0.99 and 0.98, respectively. However, the regression plot in Figure 6(d) also depicts a weaker correlation for the higher values of standard deviation, which come from data points in the regions covering central and western India. The model lags slightly in capturing the spread of the data in these regions with high precision, likely because the number of grid points exhibiting this pattern is relatively insufficient for training the model.
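The regression plots referred to above compare one statistic of the observed data with the same statistic of the bias-corrected data across grid points. The sketch below fits such a line by ordinary least squares and reports the coefficient R. It assumes that the reported regression coefficient is the Pearson R of this linear fit, which is not defined in this section, and the sample values are hypothetical.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept, plus Pearson R."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical seasonal-mean NST (degrees C) at five grid points.
observed_mean = [18.0, 22.5, 25.0, 27.5, 30.0]
corrected_mean = [18.4, 22.3, 25.1, 27.2, 29.8]

slope, intercept, r = linear_fit(observed_mean, corrected_mean)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R={r:.3f}")
```

A slope near 1, an intercept near 0, and R near 1 would indicate that the bias-corrected statistic tracks the observed one closely across the domain; scatter at the upper end of the axis would correspond to the weaker fit described for high standard deviations.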

For the future period 2006–2100, Figure 9(a), 9(c), 9(e), and 9(g) shows a pattern in the biases of the GCM output mean and standard deviation of NST similar to that of the historical GCM data for both the RCP4.5 and RCP8.5 scenarios. The GCM-simulated future data corresponding to the RCP8.5 scenario project higher mean and standard deviation values of NST than the RCP4.5 scenario in the central, southeastern, and western parts of the country. The trained GRNN model corrects the biases in the mean of the GCM-projected temperature for both scenarios with reasonable accuracy to match the spatial pattern of the observed data, as evident from Figure 9(b) and 9(d). The bias in the variance of the projected future data corresponding to both scenarios is also corrected by the GRNN model, as shown in Figure 9(f) and 9(h). However, the precision of the bias correction of the GCM output data is slightly lower in India's western and central regions, as in the bias correction of the historical data. The bias-corrected output shows that the deviation of the projected data from the temporal mean is comparable to that of the historical data.

#### Analysis of the NST climatology in the post-monsoon season in the study region

The CNRM-CM5 model predicts the NST in the post-monsoon season with a cold bias in the mean for the eastern and northeastern parts of the country and a positive bias in the standard deviation for the western and northern regions. Figure 10(a) and 10(b) shows that the mean NST is predicted well by the GCM, apart from the cold bias in the northeastern and eastern parts. Figure 10(b) also illustrates the effect of topography on the mean NST, with the higher-elevation regions in the Western Ghats and the northern and northeastern hills experiencing slightly colder temperatures. Figure 10(d) and 10(e) shows that the NST predicted by the GCM has a higher deviation from the mean for the western, northern, and northeastern parts of the country. Figure 10(c) and 10(f) shows that the biases in the mean and standard deviation of the GCM-simulated data are corrected by the GRNN model with significant accuracy. The regression plot between the bias-corrected and observed mean temperature in Figure 6(e), with a regression coefficient of 0.99, shows that the GRNN model corrects the biases in the mean for the warm regions with reasonable accuracy but is relatively less accurate in the cold regions because of the relative sparsity of data. Furthermore, the regression plot in Figure 6(f) shows that the GRNN model corrects the biases in the standard deviation of the GCM output data with relatively lower accuracy for regions with higher variability in the western and central parts. The range of variability of the NST is lower in the post-monsoon season than in the other seasons. This also accounts for the better results from the model in bias correction of the GCM data, even though the temporal range of the data is shorter than in the other seasons.

For the future GCM-simulated data, the GRNN model projects the spatial pattern of the bias-corrected mean NST data to match the pattern of the observed historical data, as shown in Figure 11(a)–11(d) for both the RCP4.5 and RCP8.5 scenarios. However, the future projected mean and standard deviation of NST have higher values than the observed historical data. From the spatial variation plot of standard deviation in Figure 11(e) and 11(g), it can be inferred that the standard deviation of the future GCM data does not vary much from that of the historical GCM data. The GRNN model corrects the standard deviation of the data to match the spatial pattern of the observed data, but with slightly lower precision in the central and western regions for both scenarios, as observed from Figure 11(f) and 11(h). The deviation of values from the temporal mean of the bias-corrected data in the future period is comparable to that of the historical period, with slightly higher temporal mean values.

#### Analysis of NST climatology in the winter season in the study region

In the winter season, the southern part of the country experiences comparatively warmer mean NST than the northern parts because of its proximity to the sea and the equator. Figure 12(b) shows the spatial pattern of the temporal mean of NST in this season, in which the effects of topography on the NST can also be observed. Figure 12(a) shows that the CNRM-CM5 model has a cold bias in the mean NST for India's northern, eastern, and northeastern parts. Figure 12(d) shows that the standard deviation is also not well projected by the GCM in the same regions, where it has a high positive bias. The GRNN model can correct the bias in the mean temperature of the GCM data, as shown in Figure 12(c), which is also verified by Figure 6(g), where the regression coefficient between the observed and predicted mean values is 0.99. The spatial variation of standard deviation is also well preserved by the GRNN model, as observed from Figure 12(e) and 12(f). However, the GRNN model has slightly lower precision in correcting the standard deviation, especially for regions with higher variance in the western and northwestern parts of the country, which can also be inferred from the regression plot in Figure 6(h). The higher variability of temperature in winter in western India is probably due to cold waves blowing from the Himalayas into these regions, resulting in extremely cold temperatures on certain days in December. There is also a significant diurnal variation of temperature in this region, with warmer days and colder nights, as most of it is covered by desert.

The same pattern of bias is observed in the GCM-simulated NST data for the future for both the RCP4.5 and RCP8.5 scenarios, as shown in Figure 13(a), 13(c), 13(e), and 13(g). The GRNN model corrects the biases in the mean and standard deviation to preserve the spatial pattern of the observed mean and standard deviation of NST, as seen from the plots of bias-corrected data in Figure 13(b), 13(d), 13(f), and 13(h). The mean and standard deviation plots for the future bias-corrected data show that the mean and the standard deviation of NST in winter do not vary much from their historical values for both the RCP4.5 and RCP8.5 scenarios. However, we cannot confidently say that the variance of the data is bias-corrected with high accuracy, because the model underperforms slightly in capturing the variability of the historical data in the western parts of India.

## CONCLUSIONS

From the heat maps representing the spatial variation of the test statistics, it can be inferred that the CNRM-CM5 model gives a biased output of NST in India, with values depending on the season. In particular, the GCM lags in simulating northeast India's temperature pattern, giving biased output for all the seasons. The bias correction methodology adopted in this study using the GRNN is an efficient and effective alternative method for bias correction of GCM-simulated NST data in India. The model performs reasonably well in capturing the spatial and temporal variability of the observed data, with the regression coefficient between the observed and bias-corrected data ranging from 0.79 to 0.91, depending on the season. The regression coefficient values (>0.98) between the bias-corrected and observed mean NST and between the bias-corrected and observed standard deviation of temperature also demonstrate the efficacy of the GRNN model. However, the model lags slightly in precision in capturing the temperature variation when the data are sparse or when the region experiences drastic temperature changes due to local factors. This bias correction method based on the GRNN model can also be readily applied to other climate model outputs for historical as well as future climate scenarios, provided the data are quasi-stationary. The good performance of the model over the Indian mainland, which comprises varied topography and seasonal variations, shows that the model is also well capable of capturing the spatial diversity in the data. Thus, the model can be checked for its viability in other parts of the world with more drastic changes in climatic pattern and topography. A disadvantage of the method is that the number of hidden neurons becomes very high when the dataset is large, which reduces its efficiency. Also, the model sometimes lags slightly in learning the pattern of the data if the input data have a very high variance about the mean NST.

Future studies can involve developing an ensemble model for simultaneous bias correction of various GCM-simulated parameters and applying it to correct the bias in precipitation, which is generally quite sparse. Also, recently developed near-surface data assimilation techniques can be employed as a short-term strategy to obtain bias-corrected NST from the GCM in the near future, and the results can be compared with those of the current method.
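The dependence of the number of hidden neurons on the data size, noted above as a disadvantage, follows from the standard GRNN formulation, in which every training sample becomes one pattern-layer neuron. The sketch below shows this for a scalar input with a Gaussian kernel. It is a minimal illustration, not the configuration used in this study: the spread `sigma`, the scalar input, and the sample values are all assumptions.

```python
import math


def grnn_predict(x, train_x, train_y, sigma):
    """GRNN output for a scalar input x.

    Each training sample acts as one pattern-layer neuron with a Gaussian
    kernel of width `sigma`; the output is the kernel-weighted mean target.
    """
    weights = [math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2)) for xi in train_x]
    total = sum(weights)
    if total == 0.0:
        # All kernels underflowed: fall back to the nearest training sample.
        nearest = min(range(len(train_x)), key=lambda i: abs(x - train_x[i]))
        return train_y[nearest]
    return sum(w * yi for w, yi in zip(weights, train_y)) / total


# Hypothetical training pairs: GCM NST (input) and observed NST (target), deg C.
gcm_train = [20.0, 22.0, 24.0, 26.0, 28.0]
obs_train = [22.5, 24.0, 25.5, 27.5, 29.0]

corrected = [grnn_predict(x, gcm_train, obs_train, sigma=1.0) for x in [21.0, 27.0]]
hidden_neurons = len(gcm_train)  # grows one-for-one with the training set
print(corrected, hidden_neurons)
```

Because each prediction sums over all stored samples, both memory use and prediction cost scale linearly with the number of training samples, which is consistent with the efficiency concern raised for large datasets.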

## ACKNOWLEDGEMENTS

I would like to acknowledge the Indian Meteorological Department for developing the quality-controlled high-resolution temperature data from station data. I also acknowledge the National Centre for Meteorological Research, France, for developing the CNRM-CM data series with good spatial resolution and making it easily accessible to researchers. This work was carried out at the Indian Institute of Technology, Guwahati, and I acknowledge the institute for providing the resources necessary for the research.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.