Abstract
A novel approach for qualitative seasonal forecasting of precipitation at the basin scale is presented as a significant enhancement over seasonal forecasts at regional and country scales in India. The process utilizes empirical, typically lagged relationships between the target variable of interest, namely precipitation at the basin level, and various large-scale climate predictors (LSCPs). A total of 14 LSCPs have been considered for the seasonal forecast of precipitation with lead times of 1, 2, and 3 months in the Kosi Basin, India. Random split training and testing were conducted on seven machine-learning (ML) models using a potential predictor dataset for model selection. The Logistic Regression (LR) model was adopted since it had the highest mean accuracy score among the seven ML models. The LR model has been optimized by testing it on all possible combinations of potential predictors using a Leave-One-Out Cross-Validation (CV) scheme. The resulting Seasonal Prediction Model (SPM) provides the probability of each tercile category: Above Normal (AN), Normal (N), and Below Normal (BN). The model has been evaluated using various metrics.
HIGHLIGHTS
A basin-scale approach is presented instead of a larger country scale.
Use of a large number of large-scale climate predictors for the development of an ML-based categorical forecast model.
The methodology is generic in nature and can be applied to any other basins.
Directions for further research are suggested for the generation of weather ensembles, automatic climate predictors, and for model operationalization.
INTRODUCTION
The availability and demand of water vary spatially and temporally making water management a daunting task for the decision-makers. Reservoir operations, water allocation strategies, and drought and flood management require seasonal forecasting of precipitation at the basin scale. There is a wide range of methodologies that have been developed for seasonal forecasting that vary from pure dynamical modelling to pure statistical modelling or a combination of both. Statistical models exploit empirical relationships between a target variable of interest and one or more predictor variables. Such models are developed from historical data, and their performance depends upon the quality of the historical oceanic, atmospheric, and hydro-meteorological data (Anderson et al. 1999; Lavers 2011). Statistical models are less costly to develop and run compared with the dynamical models (Barnston et al. 1994; Bierkens & Van Beek 2009). In addition, statistical models are relatively flexible in model construction and have the potential to improve the forecast lead time and predictability depending on the accumulation of observations and related data, although their predictability is unstable compared to dynamic models.
In the present work, several Large-Scale Climate Predictors (LSCPs) have been investigated for seasonal forecasting of precipitation over the Kosi Basin, India. A number of studies in the past have emphasized the importance of understanding the influence of large-scale climatic patterns on precipitation for improving forecast accuracy (Feng et al. 2020), and therefore many studies have analyzed the relationship between precipitation and climatic patterns in India. This research has shown that the relevant patterns are the Indian Ocean Dipole (IOD) (Behera et al. 1999; Krishnan & Sugi 2003; Goswami et al. 2006), the El Niño–Southern Oscillation (ENSO) (Kumar et al. 2006; Mokhov et al. 2012; Feliks et al. 2013), the North Atlantic Oscillation (NAO) (Bharath & Srinivas 2015), the Pacific Decadal Oscillation (PDO) (Dong 2016), and the Atlantic Multidecadal Oscillation (AMO) (Krishnamurthy & Krishnamurthy 2016). These large-scale indices have been used to forecast seasonal precipitation and stream flows (Lavers 2011; Rasouli et al. 2012; Arnal et al. 2017; Apel et al. 2018).
Previous studies of Indian South West Monsoon Rainfall (ISMR) forecasting (Rajeevan et al. 2007; Kumar et al. 2012) have demonstrated the use of multiple LSCPs. Substantial variation across India and across timescales has been found (Kurths et al. 2019). In particular, ENSO and IOD influence precipitation in the southeast at interannual and decadal scales, respectively. The NAO has a strong connection to precipitation, particularly in the northern regions. The effect of the PDO stretches across the whole country, whereas the AMO influences precipitation particularly in the central arid and semi-arid regions.
Over the years, linkages between climatic patterns and precipitation have been investigated using a range of statistical methods, such as correlation (Abid et al. 2018), principal component analysis, and empirical orthogonal functions (Hannachi et al. 2007), among others. Linear regression, auto-regressive moving average (ARMA), auto-regressive integrated moving average (ARIMA), and multiple linear regression (MLR) are other commonly implemented statistical techniques (Eldaw et al. 2003; Archer & Fowler 2008; Barlow & Tippett 2008; Gámiz-Fortis et al. 2010; Kirono Dewi et al. 2010; Purdie & Bardsley 2010). Kashid & Maity (2012) applied genetic programming (GP) to forecast ISMR based on LSCPs. The GP approach was found to adequately capture the complex relationship between the monthly ISMR and LSCPs.
During the last decade, machine-learning (ML) algorithms have received wide attention in both classification and regression tasks (Colomo et al. 2019). ML algorithms are capable of investigating hierarchical and non-linear relationships between the response variable and predictor variables, based on ensemble learning approaches (Shalev-Shwartz & Ben-David 2014). An artificial neural network (ANN)-based model was used to forecast summer rainfall in the Yangtze River basin, using LSCPs including SOI and the Scandinavia Pattern. Vathsala & Koolagudi (2017) presented an algorithm by integrating data mining and statistical techniques. The proposed technique predicted the rainfall in five different categories such as flood, excess, normal, deficit, and drought. Mishra et al. (2018) presented an ANN model for the forecast of rainfall with lead times of 1 and 2 months. The efficiency of the model was demonstrated through application to a dataset from multiple stations in north India. The performance of the ANN model was evaluated by using regression analysis, mean square error, and magnitude of relative error.
A study by Praveen et al. (2020) applied Artificial Neural Network-Multilayer Perceptron (ANN-MLP) to analyze and forecast the long-term spatio-temporal changes in rainfall using the data from 1901 to 2015 across India at the meteorological divisional level. The results of the analysis showed that the rainfall for the next 15 years exhibited a significant decline. Saha et al. (2021) employed a feature reduction approach based on non-linear deep learning to identify effective predictors for monsoon rainfall using climatic variables from different regions worldwide. The study found that certain predictors, such as sea surface temperature (SST) and zonal wind (ZW), were capable of forecasting the Indian summer monsoon 1 month in advance, while sea level pressure (SLP) could predict the season 10 months in advance. Additionally, the authors demonstrated that combining multiple climatic variables to derive predictors resulted in superior performance compared to using predictors derived from individual variables. A Gaussian Process Regression (GPR) approach, one of the ML methods, was employed by Subrahmanyam et al. (2021) on long time-series rainfall data for the determination of heavy and light rainfall days. Sharma & Goyal (2015) reported the application of the Bayesian network model for the forecast of rainfall at 21 stations in Assam, India. The efficiency of the forecast was found to be above 85% for most of the cases.
Although a few studies have been carried out at the national level, no significant research demonstrating seasonal forecast skill for precipitation at the basin scale in India has been reported in the literature. Currently, the forecast of ISMR is made available before every monsoon. However, such forecasts have limited use due to significant variations in monsoonal rainfall over the country. The objective of the present research is to develop and evaluate a seasonal forecast model for forecasting ISMR at a basin scale using an ML model. The present research is novel in several ways: (1) a basin-scale approach is presented, as the focus is on water management at the basin scale, (2) a large number of LSCPs are used and analyzed for developing robust ML-based forecast models, and (3) the results are indicative of teleconnections between various LSCPs and seasonal precipitation in the Kosi Basin. A distinct practical advantage of the methodology employed herein is that it is fairly generic in nature and can be applied to other basins with minimal changes to the model.
KOSI BASIN
The part in Nepal has a great elevation drop from 8,848 m (Mt Everest) above mean sea level to 60 m above mean sea level. Precipitation is unevenly distributed throughout the basin. The annual precipitation is about 300–400 mm in the northern Himalayan region, 1,000–1,500 mm in the subtropical and tropical region, and 1,500–2,500 mm in the temperate region. Owing to such unique physiographical and topographical distribution, the basin features a variety of climates that range from the tropical savannah over the southern plains to the polar frost in the northern mountains.
The Kosi River Basin also has a severe drought problem characterized by vast affected areas. The drought mainly affects the middle Kosi River Basin, the Tibetan Himalayas, and the downstream plain. The Kosi Basin has several other water management issues, which include flood control, irrigation, hydroelectric generation, embankment management, and rise in riverbed level due to heavy siltation and barrage operation. The water resources of the Kosi Basin are largely untapped. There are 11 proposed development projects for hydroelectric generation and water storage (Chinnasamy et al. 2015). Furthermore, Nepal receives 225 km³ of surface water annually from the basin. However, less than 7% of water is utilized. Water is in surplus during monsoon months in Nepal's Tarai region and India's Bihar region, whereas there is shortage of water in pre- and post-monsoon seasons (Bharati et al. 2014). Seasonal precipitation forecasting for water management in the Kosi Basin will, therefore, facilitate better water management in both India and Nepal.
DATA
In total, 16 LSCPs relevant to ISMR have been adopted for the seasonal forecast of precipitation for the Kosi Basin. These predictors along with their spatial locations are shown in Table 1.
S. No. | Predictor/target variable | Spatial domain | Data source |
---|---|---|---|
1 | North Atlantic Sea Surface Temperature (NASST) | 20N–30N, 100W–80W | NOAA Optimum Interpolation (OI) Sea Surface Temperature (SST) V2 (https://psl.noaa.gov/data/gridded/data.noaa.oisst.v2.html) |
2 | Equatorial Southeast Indian Ocean Sea Surface Temperature (ESEIOSST) | 20S–10S, 100E–120E | |
3 | Arabian Sea Surface Temperature (ASST) | 5N–20N, 50E–80E | |
4 | NINO3.4 Sea Surface Temperature (NINO3.4) | 5S–5N, 170W–120W | https://www.cpc.ncep.noaa.gov/data/indices/sstoi.indices |
5 | NINO3 Sea Surface Temperature (NINO3) | 5N–5S, 150W–90W | |
6 | NINO4 Sea Surface Temperature (NINO4) | 5N–5S, 160E–150W | |
7 | NINO1 + 2 Sea Surface Temperature (NINO1 + 2) | 0–10S, 90W–80W | |
8 | East Asia Sea Level Pressure (EASLP) | 35N–45N, 120E–130E | NCEP Reanalysis Derived data (https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.derived.surface.html) |
9 | Northwest Europe Sea Level Pressure (NWESLP) | 65N–75N, 20E–40E | |
10 | North Atlantic Sea Level Pressure (NASLP) | 35N–45N, 30W–10W | |
11 | North Central Pacific Zonal Wind (NCPZW) | 5N–15N, 180E–150W | |
12 | Warm Water Volume (WWV) | 5S–5N, 120E–80W | https://www.pmel.noaa.gov/tao/wwv/data/wwv.dat |
13 | Precipitation including Accumulated Non-Monsoon Precipitation (ANMP) | Monthly and daily in basin/sub-basin | https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGM_06/summary |
14 | South Indian Ocean SST Index (IOD) | | https://psl.noaa.gov/gcos_wgsp/Timeseries/Data/dmi.had.long.data |
15 | North Atlantic Oscillation (NAO) | | https://psl.noaa.gov/data/correlation/nao.data |
16 | Pacific Decadal Oscillation (PDO) | | https://psl.noaa.gov/data/correlation/pdo.data |
17 | Atlantic Multidecadal Oscillation (AMO) | | https://psl.noaa.gov/data/correlation/amon.us.data |
The LSCPs described in Table 1 have been adopted from the National Oceanic and Atmospheric Administration (NOAA) for the period from 1991 to 2020. In addition, accumulated non-monsoon precipitation (ANMP) from October to May at the basin level has been included as a predictor to represent the local conditions before the onset of the monsoon. Because actual precipitation observations are not available for more than two-thirds of the basin, which lies in Nepal and China, satellite-based precipitation estimates from the Global Precipitation Measurement (GPM) mission, available from 2000 onwards, have been used. Two temporal products of GPM were used, one at a monthly scale and one at a 30-min scale. Both products were obtained from the Google Earth Engine repository. The 30-min GPM data were resampled to daily and monthly scales. The monthly GPM product was compared month-wise with the monthly resampled GPM data to determine a correction factor for every month. The factor for each month was then applied uniformly to all the daily resampled GPM values in that month.
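The month-wise correction described above can be sketched as follows. This is only an illustration under stated assumptions: the factor is taken to be a simple ratio (monthly product divided by the monthly total of the resampled series), the function names `monthly_correction_factors` and `apply_correction` are hypothetical, and the numbers are synthetic rather than GPM data.

```python
import numpy as np
import pandas as pd

def monthly_correction_factors(monthly_product, daily_resampled):
    """Ratio of the monthly product to the monthly sum of the resampled daily series.

    Both inputs are pandas Series with a DatetimeIndex. A multiplicative factor
    is an assumption; the text only states that correction factors were derived.
    """
    resampled_monthly = daily_resampled.groupby(daily_resampled.index.to_period("M")).sum()
    product = monthly_product.copy()
    product.index = product.index.to_period("M")
    # Avoid division by zero in completely dry months; fall back to a neutral factor of 1.
    factors = product / resampled_monthly.replace(0.0, np.nan)
    return factors.fillna(1.0)

def apply_correction(daily_resampled, factors):
    """Scale every daily value by the factor of the month it falls in."""
    months = daily_resampled.index.to_period("M")
    return daily_resampled * factors.reindex(months).to_numpy()

# Synthetic two-month example (values are made up for illustration).
days = pd.date_range("2001-01-01", "2001-02-28", freq="D")
daily = pd.Series(1.0, index=days)  # sums to 31 in January and 28 in February
monthly = pd.Series([34.1, 25.2], index=pd.to_datetime(["2001-01-01", "2001-02-01"]))

factors = monthly_correction_factors(monthly, daily)
corrected = apply_correction(daily, factors)
corrected_monthly = corrected.groupby(corrected.index.to_period("M")).sum()
```

After the correction, the monthly totals of the daily series agree with the monthly product by construction, which is the property the procedure in the text appears to aim for.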
METHODS
The analysis of precipitation data over the basin has been carried out to develop a statistical model for the seasonal forecast of precipitation. Monthly anomalies of LSCPs, which are perceived to impact the region of interest, have been derived. Subsequently, the linear relationship of monthly anomalies of predictors with the aggregated precipitation over the monsoon season for which the forecast is to be made has been ascertained using the Pearson correlation coefficient. The significant predictors have been adopted in lagging months in the non-monsoon season (October to May) for the forecast of monsoonal (June–September) rainfall with a lead time of 1 month (forecast by the end of May), 2 months (forecast by end April), and 3 months (forecast by end March).
Precipitation data
The precipitation data over the basin were analyzed for the monsoon season (June–September) and the non-monsoon season (October–May). GPM rainfall data were processed for determining monsoon and pre-monsoon rainfall at the basin level and for the six sub-basins for the period 2000–2020. In general, the quantity of precipitation in the monsoon period is four times and the variability is three times that during the non-monsoon season. The length of the non-monsoon period is twice that of the monsoon period, but the precipitation in the non-monsoon season is around 22% of the annual precipitation. This indicates that seasonal forecast of monsoon precipitation is critical for long-term water management in the basin.
Monthly LSCP anomalies
Correlation analysis
The prevailing conditions of climate predictors in the non-monsoon period (October–May) are considered the key indicators for early forecasting of seasonal precipitation over the basin during the monsoon period (June–September). To assess the strength of the relationship between each predictor in the non-monsoon months and the aggregated seasonal precipitation of the monsoon season (June–September), Pearson correlation coefficients were calculated for all predictors, comprising monthly anomalies, means of monthly anomalies over 2 consecutive months, and means of monthly anomalies over 3 consecutive months, using the respective time series from 2000 to 2020.
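As an illustration of the three predictor forms mentioned above, the sketch below computes monthly anomalies and their means over 2 or 3 consecutive months. The anomaly definition (departure from the long-term mean of the same calendar month) is an assumption, since the definition is not reproduced in this section, and the function names and the synthetic series are hypothetical.

```python
import pandas as pd

def monthly_anomalies(series):
    """Departure of each monthly value from the long-term mean of that calendar month."""
    climatology = series.groupby(series.index.month).transform("mean")
    return series - climatology

def trailing_mean(anomalies, n_months):
    """Mean of the anomalies over n consecutive months ending at each time step."""
    return anomalies.rolling(window=n_months, min_periods=n_months).mean()

# Synthetic two-year monthly series: a seasonal cycle plus a one-unit step in year two.
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
values = pd.Series([i % 12 + 1 + i // 12 for i in range(24)], index=idx, dtype=float)

anom = monthly_anomalies(values)   # -0.5 in the first year, +0.5 in the second
mean2 = trailing_mean(anom, 2)     # mean over 2 consecutive months
mean3 = trailing_mean(anom, 3)     # mean over 3 consecutive months
```

For example, the value of `mean3` at March is the mean of the January, February and March anomalies, which corresponds to a predictor such as "mean of monthly anomaly over 3 consecutive months from Jan to March".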
The associated significance level (p-value) between the variables used in the correlation analysis was assessed using the scipy.stats.pearsonr function in Python. A threshold significance level of 0.15 has been adopted in view of the conditional relationship between predictors, the limited availability of data, and the relatively small size of the basin. The predictors having p-values lower than 0.15 have been identified as statistically significant LSCPs.
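A minimal sketch of this screening step, using `scipy.stats.pearsonr` and the 0.15 threshold named in the text, is given below. The helper `screen_predictors`, the predictor names and the synthetic series are illustrative assumptions, not the data used in the study.

```python
import numpy as np
from scipy.stats import pearsonr

def screen_predictors(candidates, target, alpha=0.15):
    """Keep the candidates whose Pearson correlation with the target has p < alpha."""
    selected = {}
    for name, values in candidates.items():
        r, p = pearsonr(values, target)
        if p < alpha:
            selected[name] = (float(r), float(p))
    return selected

# Synthetic 21-value series standing in for 2000-2020 (made up for illustration).
years = np.arange(21, dtype=float)
target = years
candidates = {
    "related": -years + np.sin(years),   # strongly, negatively correlated with the target
    "unrelated": (years - 10.0) ** 2,    # symmetric about the mid-point, so r is about 0
}
selected = screen_predictors(candidates, target)
```

With only 21 years of data, a threshold of 0.15 admits more candidates than the conventional 0.05, so it may be worth recording the p-value of each retained predictor alongside its correlation, as the returned dictionary does.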
Classification-based ML modelling
The precipitation totals in the monsoon season for the period from 2000 to 2020 were categorized into three equal groups, AN, N, and BN, based on quantile analysis. Seven ML-based classification models, namely logistic regression (LR), naïve Bayes (NB), k-nearest neighbours (KNN), decision tree (DT), random forest (RF), gradient boosting (GB), and support vector machine (SVM), were selected initially based on their ability to classify using either linear or non-linear techniques, including ensemble approaches. The scikit-learn package in Python has been used to carry out the analysis for these seven ML classification algorithms (Pedregosa et al. 2011).
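The tercile categorization can be illustrated with `pandas.qcut`, which splits a series into equal-sized quantile groups. The seasonal totals below are synthetic and the function name `tercile_categories` is hypothetical; the sketch only shows how 21 seasonal totals fall into three groups of seven.

```python
import numpy as np
import pandas as pd

def tercile_categories(seasonal_totals):
    """Label each seasonal total as BN, N or AN using the terciles of the series."""
    return pd.qcut(seasonal_totals, q=3, labels=["BN", "N", "AN"])

# Synthetic monsoon totals for 2000-2020 (made-up values in mm).
rng = np.random.default_rng(42)
totals = pd.Series(rng.permutation(np.linspace(900.0, 1900.0, 21)), index=range(2000, 2021))

categories = tercile_categories(totals)
counts = categories.value_counts().to_dict()
```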
The dataset containing the potential predictors for a lead time of 1 month was split randomly in a ratio of 80:20 to obtain training and test datasets. Each ML model was fitted to the training set, and the test dataset was used to evaluate each model's performance based on the accuracy score as defined in Equation (2). This process was repeated 2,000 times, which is about 10% of the total number of possible splits. The model with the highest mean test accuracy was selected for further analysis. The same ML model was adopted for lead times of 2 and 3 months, as many predictors for a lead time of 1 month are common with those for lead times of 2 and 3 months. The selected model was then run on all possible combinations of potential predictors for lead times of 1, 2 and 3 months using a Leave-One-Out CV scheme on the complete dataset of 21 years. The number of combinations run was 32,767 for a lead time of 1 month and 1,023 for lead times of 2 and 3 months, that is, every non-empty subset of 15 and 10 potential predictors, respectively. The combination with the highest CV score based on mean accuracy was adopted as the optimal combination of predictors for each lead time.
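The two selection steps described above (repeated random 80:20 splits to compare classifiers, then an exhaustive Leave-One-Out search over predictor subsets) can be sketched with scikit-learn as follows. This is a sketch under assumptions: the specific estimator classes and their settings (for example `GaussianNB` for naïve Bayes and 50 trees for the ensembles) are illustrative choices, the helper names are hypothetical, the data are synthetic, and the number of repeats is kept far below the 2,000 used in the text so that the example runs quickly.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def candidate_models():
    """Fresh instances of seven classifier families (settings are illustrative)."""
    return {
        "LR": LogisticRegression(max_iter=1000),
        "NB": GaussianNB(),
        "KNN": KNeighborsClassifier(),
        "DT": DecisionTreeClassifier(random_state=0),
        "RF": RandomForestClassifier(n_estimators=50, random_state=0),
        "GB": GradientBoostingClassifier(n_estimators=50, random_state=0),
        "SVM": SVC(),
    }

def mean_split_accuracy(X, y, n_repeats, seed=0):
    """Mean test accuracy of each model over repeated random 80:20 splits."""
    rng = np.random.RandomState(seed)
    scores = {name: [] for name in candidate_models()}
    for _ in range(n_repeats):
        split_seed = rng.randint(0, 2**31 - 1)
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=split_seed
        )
        for name, model in candidate_models().items():
            scores[name].append(model.fit(X_tr, y_tr).score(X_te, y_te))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}

def best_subset_loo(X, y, names):
    """Evaluate every non-empty predictor subset with Leave-One-Out CV accuracy."""
    best_acc, best_names, n_subsets = -1.0, (), 0
    for k in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), k):
            n_subsets += 1
            acc = cross_val_score(
                LogisticRegression(max_iter=1000), X[:, list(cols)], y, cv=LeaveOneOut()
            ).mean()
            if acc > best_acc:
                best_acc, best_names = float(acc), tuple(names[c] for c in cols)
    return best_acc, best_names, n_subsets

# Synthetic data: 21 "years", 4 hypothetical predictors, 3 tercile classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 7)
X = rng.normal(size=(21, 4)) + y[:, None]
names = ["p1", "p2", "p3", "p4"]

split_scores = mean_split_accuracy(X, y, n_repeats=5)
best_acc, best_names, n_subsets = best_subset_loo(X, y, names)
```

The subset count grows as 2^n − 1, which gives 15 for the four hypothetical predictors here and matches the 32,767 and 1,023 combinations quoted in the text for 15 and 10 predictors.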
The choice of hyperparameters for the ML model greatly affects the separability of the classes and the performance of the algorithm. The hyperparameters of the ML model with the highest accuracy score were fine-tuned using the random search optimization technique (Pedregosa et al. 2011). The mean accuracy score using a Leave-One-Out CV scheme was evaluated over 2,000 iterations using different sets of hyperparameters. The hyperparameter set that provided the highest mean accuracy score was adopted for each lead time.
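A random search scored by Leave-One-Out accuracy can be set up with scikit-learn's `RandomizedSearchCV`, as sketched below. The hyperparameters that were actually searched are not listed in this section, so the search space here (the regularization strength `C` and `class_weight`) is an assumption, the data are synthetic, and `n_iter` is set to 20 rather than 2,000 to keep the example fast.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, RandomizedSearchCV

# Synthetic data: 21 samples, 3 hypothetical predictors, 3 tercile classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 7)
X = rng.normal(size=(21, 3)) + y[:, None]

search = RandomizedSearchCV(
    estimator=LogisticRegression(max_iter=2000),
    # Assumed search space; the study's own hyperparameter ranges are not given here.
    param_distributions={
        "C": loguniform(1e-3, 1e3),
        "class_weight": [None, "balanced"],
    },
    n_iter=20,            # the text reports 2,000 iterations
    scoring="accuracy",
    cv=LeaveOneOut(),     # each fold holds out a single year
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
best_loo_accuracy = search.best_score_
```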
Model skill assessment
The MBS is a metric that measures the accuracy of a probabilistic forecast. The best possible MBS is 0, indicating total accuracy, and the worst possible score is 1, which means the forecast was completely inaccurate. Scores closer to zero indicate better forecasts. Scores in the middle can be hard to interpret as ‘good’ or ‘bad’, so they are sometimes converted to the BSS. The MBS indicates how accurate a forecast was, but not how accurate it is compared to other forecasts. The BSS relates the MBS to a benchmark forecast based on climatology; it measures the relative skill of a probabilistic forecast over that of climatology.
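The behaviour described above can be illustrated with a Brier-type score for three-category probability forecasts. The exact definition of the MBS is not reproduced in this section, so the sketch assumes a squared-error score summed over categories and halved so that it ranges from 0 to 1, and it assumes an equal-odds (1/3 per tercile) climatological reference for the skill score. The function names are hypothetical.

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and one-hot outcomes.

    Halved so that 0 is a perfect forecast and 1 is a confident, entirely
    wrong forecast (an assumed normalization).
    """
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=int)
    onehot = np.eye(probs.shape[1])[outcomes]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)) / 2.0)

def brier_skill_score(probs, outcomes):
    """Skill relative to an equal-odds climatological forecast (1 = perfect, 0 = no skill)."""
    probs = np.asarray(probs, dtype=float)
    climatology = np.full(probs.shape, 1.0 / probs.shape[1])
    reference = brier_score(climatology, outcomes)
    return 1.0 - brier_score(probs, outcomes) / reference

# Three toy forecasts for observed categories BN (0), N (1) and AN (2).
outcomes = [0, 1, 2]
perfect = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
wrong = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
equal_odds = [[1 / 3, 1 / 3, 1 / 3]] * 3

score_perfect = brier_score(perfect, outcomes)
score_wrong = brier_score(wrong, outcomes)
skill_equal_odds = brier_skill_score(equal_odds, outcomes)
skill_perfect = brier_skill_score(perfect, outcomes)
```

Under these assumptions a forecast that always issues equal odds scores 1/3 and has zero skill, so a positive skill score indicates an improvement over climatology.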
RESULTS AND DISCUSSION
Out of 15 adopted significant predictors, 11 predictors showed a negative correlation ranging from −0.35 to −0.6, whereas four predictors showed a positive correlation ranging from 0.35 to 0.4. Based on the correlation between the anomalies of predictors and aggregated precipitation during the monsoon season, 14 potential predictors out of 16 affecting the ISMR were found to be statistically significant. The p-values for nine potential predictors were found to be less than 0.05, and five potential predictors had p-values between 0.05 and 0.15. When segregated further by lead time, the number of potential predictors is 15 for a lead time of 1 month and 10 each for lead times of 2 and 3 months. Eight potential predictors are common to lead times of 1, 2 and 3 months, including one local predictor, ANMP. The segregated potential predictors with lead times of 1, 2 and 3 months for the seasonal forecast are shown in Table 2.
Lead time: 1 month (May) | Lead time: 2 months (April) | Lead time: 3 months (March) |
---|---|---|
NINO3.4: Monthly anomaly in May | NCPZW: Mean of monthly anomaly over three consecutive months from Feb to April | NCPZW: Mean of monthly anomaly over two consecutive months from Feb to March |
NCPZW: Mean of monthly anomaly over three consecutive months from April to May | WWV: Monthly anomaly in April | WWV: Monthly anomaly in March |
WWV: Monthly anomaly in May | | |
EASLP: Monthly anomaly in May | | |
NINO1 + 2: Monthly anomaly in May | | |
NINO3: Monthly anomaly in May | | |
NINO4: Monthly anomaly in May | | |
NWESLP: Mean of monthly anomaly over 3 consecutive months from Jan to March | | |
NASST: Monthly anomaly in Feb | | |
NASLP: Monthly anomaly in Dec (previous year) | | |
NAO: Monthly anomaly in Jan | | |
AMO: Mean of monthly anomaly over 2 consecutive months from Feb to March | | |
ASST: Mean of monthly anomaly over 3 consecutive months from Nov (previous year) to Jan | | |
IOD: Monthly anomaly in March | | |
ANMP from Oct (previous year) to Feb | | |
For lead times of 2 and 3 months, the mean accuracy score improved further from 0.57 to 0.62 through optimization of the hyperparameters of the LR model using a random search optimization technique. However, no improvement could be achieved in the mean CV score of 0.714 for a lead time of 1 month.
The seasonal forecasts of precipitation in qualitative terms (AN, N, and BN) for lead times of 1, 2, and 3 months for the period 2000–2020 are shown in Table 3, together with the actual category of observed precipitation. A comparison of the seasonal forecasts with the observed precipitation indicates good agreement between the two. The range of Precision, Recall and F1 score is wider (0.57–0.86) for the 2- and 3-month lead times than for the 1-month lead time (0.71–0.86). The accuracy is higher (0.81) for a lead time of 1 month than for lead times of 2 and 3 months (0.71). For the 2- and 3-month lead times, the AN category has the highest precision, recall and F1 score (0.86), compared with 0.71 for the BN category and 0.57 for the N category. This indicates that, at lead times of 2 and 3 months, the SPM replicates the AN category better than the other two categories. For a lead time of 1 month, the performance of the SPM is more even across all three categories. The results are shown in Table 4.
Category | Precision | Recall | F1 score | Overall accuracy |
---|---|---|---|---|
For a lead time of 1 month | | | | |
BN | 0.86 | 0.86 | 0.86 | 0.81 |
N | 0.75 | 0.86 | 0.80 | |
AN | 0.83 | 0.71 | 0.77 | |
For a lead time of 2 and 3 months | | | | |
BN | 0.71 | 0.71 | 0.71 | 0.71 |
N | 0.57 | 0.57 | 0.57 | |
AN | 0.86 | 0.86 | 0.86 | |
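To show how per-category precision, recall, F1 score and overall accuracy of this kind are derived, the sketch below computes them from a 3 × 3 confusion matrix. The matrix is hypothetical: it was chosen only because it is consistent with the scores reported for a lead time of 1 month, and it is not taken from the study (the reported scores do not determine the matrix uniquely).

```python
import numpy as np

# Hypothetical confusion matrix for 21 years; rows are observed categories and
# columns are forecast categories, both in the order BN, N, AN.
confusion = np.array([
    [6, 1, 0],   # observed BN
    [0, 6, 1],   # observed N
    [1, 1, 5],   # observed AN
])

correct = np.diag(confusion)
precision = correct / confusion.sum(axis=0)   # correct / number of forecasts per category
recall = correct / confusion.sum(axis=1)      # correct / number of observations per category
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / confusion.sum()

for name, p, r, f in zip(["BN", "N", "AN"], precision, recall, f1):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"overall accuracy={accuracy:.2f}")
```

With seven years in each observed category, every recall value is a multiple of 1/7, which is why values such as 0.71 (5/7) and 0.86 (6/7) recur in the table.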
CONCLUSIONS
Probabilistic forecasting of seasonal precipitation for a basin using information on large-scale circulation is an important issue that has been adequately addressed in the present work. For a lead time of 1 month, five climate predictors, namely NINO3 and NINO1 + 2 SST over the Pacific, NCPZW, ASST and EASLP, were found to impact precipitation over the Kosi Basin. In particular, the ENSO conditions in the month of May, captured using SST in the NINO3 and NINO1 + 2 regions, along with the intensity of the trade winds over the region indicated by NCPZW, appeared to play a crucial role in enhancing or attenuating precipitation over the Kosi Basin during the monsoon season. The relationship between the SST predictors and the seasonal precipitation over the basin was found to be inverse. Additionally, it could be inferred that the conditions in the Arabian Sea and East Asia Sea act as an important catalyst in concentrating the monsoon winds over the basin. The results of this research should prove useful in addressing water management issues in the Kosi Basin, especially during the monsoon season.
There are several potential advantages of the model developed in the present research. Based on seasonal forecasts, it would be possible to undertake crop planning and estimate irrigation demands. If the forecast indicates AN rainfall, farmers can plant crops that require more water without worrying about water shortages. Water resource managers can use seasonal precipitation forecasts to develop strategies for meeting water demands in the case of BN precipitation forecasts. Seasonal precipitation forecasting can help communities and emergency services prepare for potential disasters such as floods and droughts. With the model proposed herein, businesses could plan for seasonal fluctuations in demand for products and services that are related to precipitation. An important potential benefit of the research presented herein is that it could assist water resource managers in developing climate change mitigation and adaptation strategies based on seasonal precipitation forecasts. LSCPs can provide prognostic information for seasonal forecasting. Each predictor is likely to have its own affected zones and seasons. The significance of creating basin and sub-basin climate indices based on SST, SSP has been shown by Lamb (2010) for the Colorado River basin, where a new SST region, identified as the Hondo region, was compared to established LSCPs such as SOI, PDO, NAO and AMO. The test results demonstrated that Hondo performs better at longer lead times than the existing LSCPs. For future research, a statistically based seasonal precipitation forecast model that automatically identifies suitable predictors from globally gridded SST and climate variables could be developed. A statistical modelling approach that utilizes ML techniques for categorical forecasting of precipitation in terciles (BN, N, and AN) in combination with a conditional stochastic weather generator (CSWG) could also be developed.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories listed in Table 1.
CONFLICT OF INTEREST
The authors declare there is no conflict.