## Abstract

Reliable drought prediction plays a significant role in drought management. Machine learning (ML) models have become popular in drought prediction in recent years; however, stand-alone models cannot sufficiently capture the feature information of drought index time series, even though their general performance is acceptable. Researchers have therefore used signal decomposition algorithms as data pre-processing tools, coupling them with stand-alone models to build ‘decomposition-prediction’ models with improved performance. Considering the limitations of a single decomposition algorithm, this study proposes an ‘integration-prediction’ model construction method that deeply combines the results of multiple decomposition algorithms. The models are tested at three meteorological stations in Guanzhong, Shaanxi Province, China, where short-term meteorological drought from 1960 to 2019 is predicted using the Standardized Precipitation Index on a 12-month time scale (SPI-12). Compared with stand-alone models and ‘decomposition-prediction’ models, the ‘integration-prediction’ models present higher prediction accuracy, smaller prediction errors and better stability in the results. This new ‘integration-prediction’ model provides attractive value for drought risk management in arid regions.

## HIGHLIGHTS

Machine learning models have great value in short-term meteorological drought prediction.

Signal decomposition algorithms used as data pre-processing tools can significantly improve the prediction performance of machine learning models.

Deeply combining the results of multiple decomposition algorithms could achieve higher prediction accuracy.

The ‘integration-prediction’ model provides a new way for drought prediction in arid regions.

## INTRODUCTION

Climate change leads to a high incidence of natural disasters; in 2022 in particular, the world experienced extreme high temperatures rarely seen in decades. Drought disasters have negatively affected many countries and regions to varying degrees, including China, Europe and North America. Studies indicate that the severity and duration of drought are expected to trend upward in the future as human activities and climate change intensify (Li *et al.* 2021). The essence of a drought disaster is that reduced precipitation lowers crop yields and river levels, ultimately causing losses to human society and the economy. After years of exploration, scholars have classified drought according to the affected objects, and the four types recognized by the academic community are meteorological drought, agricultural drought, hydrological drought and socio-economic drought (Mishra & Singh 2010; Zhang & Jia 2013). Among them, meteorological drought is always the first to occur, so strengthening meteorological drought prediction is not only the best way to eliminate or reduce the negative effects of drought in advance, but also a core link in constructing efficient water safety management.

Drought is a long-lasting process. Although it is difficult to define precisely when a drought begins and ends, researchers evaluate drought through meteorological and hydrological parameters, such as precipitation, soil moisture content, temperature and runoff, and have developed many drought indices for quantitative calculation of drought severity based on these parameters (Dai *et al.* 2020; Song *et al.* 2020). Applying drought indices makes it possible to analyze the temporal and spatial changes of historical drought characteristics. In addition, the main task of meteorological drought prediction is to use mathematical models to predict the time series of drought indices over some future period, thus providing an important basis for decision makers to judge drought trends (Wang *et al.* 2020). Widely used meteorological drought indices include the ‘Rainfall Anomaly Index’ (RAI) (Van-Rooy 1965), ‘Standardized Precipitation Index’ (SPI) (McKee *et al.* 1993), ‘Comprehensive Index of Meteorological Drought’ (CI) (He *et al.* 2014), ‘Palmer Drought Severity Index’ (PDSI) (Palmer 1965) and ‘Standardized Precipitation Evapotranspiration Index’ (SPEI) (Vicente-Serrano *et al.* 2010). Yu *et al.* (2013) collected data from 16 meteorological stations in Yunnan Province, China from 1956 to 2010, and adopted CI to analyze the frequency, scope and severity of meteorological drought in the province. Fang *et al.* (2018) studied the occurrence and evolution of meteorological drought in Ningxia, China during 1960–2016, and analyzed the interannual characteristics of drought using SPEI at different time scales. Mehta & Yadav (2021) calculated RAI and SPI from precipitation data for 1901–2002 in the Barmer District of Rajasthan State to assess local meteorological drought characteristics.
The evolution of drought characteristics in a region over recent decades can thus be evaluated from historical time series of meteorological drought indices. However, this is not enough to deal with the unknown challenges that intensifying climate change poses to the formulation of drought relief policies. In this sense, the importance of meteorological drought prediction is growing.

With the rapid development of artificial intelligence technology, drought prediction models have gradually shifted from physical models to data-driven machine learning (ML) models. Compared with physical models, ML models offer faster computation, lower resource consumption and reliable accuracy (Abbot & Marohasy 2014). In the existing literature, a large number of meteorological drought prediction models based on ML have been developed with good results. These models can be roughly divided into two categories: stand-alone models and hybrid models. Stand-alone models bring drought index time series directly into an ML model for training. For example, Deo *et al.* (2018) proposed a Support Vector Regression (SVR) model to predict SPEI at nine stations in Australia and demonstrated that SVR is very effective in predicting drought characteristics. Achour *et al.* (2020) explored the potential of an artificial neural network (ANN) model with SPI to predict drought in the plains of northwestern Algeria, with satisfactory results. Almikaeel *et al.* (2022) used stand-alone ANN and SVR models to predict the hydrological drought index of the Gidra River and achieved excellent results beyond expectations. These examples indicate that stand-alone models are feasible and have great potential in drought prediction. Scholars then proposed various improvements on the basis of stand-alone models, evolving another category of hybrid models with better performance.

There are many ways to compose hybrid models, but they can be divided into two groups according to the improvement idea. The first group improves the model itself, combining an optimization algorithm with a stand-alone model or directly combining multiple models. For example, Malik *et al.* (2021) proved that SVR models based on Particle Swarm Optimization (PSO) and Harris Hawks Optimization (HHO) algorithms outperformed the plain SVR model in predicting the Effective Drought Index (EDI) in Uttarakhand, India. Danandeh Mehr *et al.* (2022) combined the Long Short-Term Memory (LSTM) network with a Convolutional Neural Network (CNN) to form a hybrid model for predicting SPEI-3 and SPEI-6 in Ankara province, Turkey; experimental results show that the CNN-LSTM model is more successful than the single benchmark models. This kind of hybrid model successfully improves prediction accuracy, but the margin of improvement is sometimes not ideal. Meteorological drought index time series are nonlinear, non-stationary and multi-scale; when stand-alone models make autoregressive predictions, the feature information they can capture from the input time series is limited and not prominent, which reduces model performance to a certain extent (Djerbouai & Souag-Gamane 2016). To solve this problem, some scholars introduce signal decomposition algorithms into stand-alone models to build the second group, ‘decomposition-prediction’ hybrid models (Belayneh *et al.* 2014). Common signal decomposition algorithms include Wavelet Decomposition (WD), Empirical Mode Decomposition (EMD), Ensemble Empirical Mode Decomposition (EEMD) and Variational Mode Decomposition (VMD) (Zuo *et al.* 2020). During modeling, decomposing the input time series extracts the more predictable parts of the sequence for the model and reduces the adverse impact of noise (Adarsh & Janga Reddy 2019).
For instance, Khan *et al.* (2020) developed a hybrid model combining WD, the Autoregressive Integrated Moving Average (ARIMA) and ANN to predict future SPI in Malaysia's Langat River Basin, and the results show that the hybrid WD-ARIMA-ANN model performs better than the stand-alone model. Roushangar *et al.* (2021) used WD and EEMD to decompose the SPI time series input to ML models, improving model performance by 40% when predicting drought in northwest Iran. Citakoglu & Coşkun (2022) introduced VMD, WD and EMD as data pre-processing tools to build hybrid ML models, and predicted short-term meteorological drought in northwest Turkey very accurately. In general, hybrid models clearly outperform stand-alone models, and hybrid models using data pre-processing tools are currently among the best performing and most popular. However, different signal decomposition algorithms have their own advantages and disadvantages when extracting time series features. How to integrate the advantages of different algorithms to reduce the limitations of using a single algorithm is a scientific problem worth studying.

Therefore, considering the diversity of signal decomposition algorithms and their importance to short-term meteorological drought prediction models, this study proposes a new ‘integration-prediction’ model construction method that deeply combines the results of multiple decomposition algorithms. Taking the Guanzhong region of Shaanxi Province, China as the study area, we adopt the Gated Recurrent Unit (GRU) and Light Gradient Boosting Machine (LGBM) as stand-alone prediction models, and combine them with different decomposition algorithms (EMD, EEMD and VMD) to construct both the ‘decomposition-prediction’ models commonly described in the literature and the ‘integration-prediction’ models proposed in this paper. All models are compared extensively using several evaluation indicators. This study supplements current short-term meteorological drought prediction work; it explores for the first time whether the ‘integration-prediction’ model can reduce the limitations of single decomposition in drought prediction and further improve model performance.

## STUDY AREA AND DATA

The Guanzhong region is located in the central part of Shaanxi Province, China. The terrain is high in the southwest and low in the northeast, and the central part is a flat, wide plain. The annual average temperature is 12.9 °C, and the annual average precipitation is 580 mm. Figure 1 shows a general view of the study area. The Guanzhong region is an important agricultural production area in China, but it often faces drought, so drought investigation and research are essential. Such research helps us understand the climate and water resources of the Guanzhong region, enhance drought event management and provide a scientific basis for formulating drought countermeasures.

As shown in Figures 1 and 2, the three meteorological stations are evenly distributed across the Guanzhong region. Although all three belong to the warm temperate continental monsoon climate, annual average precipitation differs noticeably among them: 511.3 mm at Pucheng station, 636.8 mm at Fengxiang station and 590.9 mm at Wugong station. Annual average temperature also differs slightly: 13.8 °C at Pucheng, 11.8 °C at Fengxiang and 13.3 °C at Wugong.

## METHODOLOGY

### Standardized Precipitation Index (SPI)

The probability density of accumulated precipitation *x* with respect to the gamma function is shown in Equation (1):

$$g(x)=\frac{1}{\beta^{\alpha}\Gamma(\alpha)}\,x^{\alpha-1}e^{-x/\beta},\qquad x>0 \quad (1)$$

where $\alpha$ and $\beta$ represent the shape and scale parameters of the gamma function, and *x* represents the accumulated precipitation. The SPI calculation formula after standardized normal processing is shown in Equation (2):

$$\mathrm{SPI}=k\left(t-\frac{c_{0}+c_{1}t+c_{2}t^{2}}{1+d_{1}t+d_{2}t^{2}+d_{3}t^{3}}\right) \quad (2)$$

where *k* is the positive and negative coefficient of probability density: when the cumulative probability $G(x)\le 0.5$, $k=-1$ and $t=\sqrt{\ln\left(1/G(x)^{2}\right)}$; when $G(x)>0.5$, $k=1$ and $t=\sqrt{\ln\left(1/\left(1-G(x)\right)^{2}\right)}$. In addition, $c_{0}=2.515517$, $c_{1}=0.802853$, $c_{2}=0.010328$, $d_{1}=1.432788$, $d_{2}=0.189269$ and $d_{3}=0.001308$. Detailed SPI calculation principles and drought classification tables can be found in the literature of Kisi *et al.* (2019).
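
As a numeric illustration (not the authors' code), the standardization step in Equation (2) can be sketched with the standard constants; `spi_from_probability` is a hypothetical helper name:

```python
import math

# Constants c0..d3 of the rational approximation in Equation (2).
C0, C1, C2 = 2.515517, 0.802853, 0.010328
D1, D2, D3 = 1.432788, 0.189269, 0.001308

def spi_from_probability(G):
    """Map a cumulative gamma probability G in (0, 1) to an SPI value."""
    # k is the sign coefficient: -1 for the dry half, +1 for the wet half.
    if G <= 0.5:
        k, p = -1.0, G
    else:
        k, p = 1.0, 1.0 - G
    t = math.sqrt(math.log(1.0 / p**2))
    return k * (t - (C0 + C1 * t + C2 * t**2)
                / (1.0 + D1 * t + D2 * t**2 + D3 * t**3))

# A cumulative probability of 0.975 maps to SPI close to the normal quantile 1.96.
print(round(spi_from_probability(0.975), 2))  # → 1.96
```

Because the approximation mirrors the standard normal quantile, probabilities below 0.5 yield negative (dry) SPI values and 0.5 maps to approximately zero.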

### Signal decomposition algorithms

#### Empirical Mode Decomposition (EMD)

EMD was proposed by Huang *et al.* (1998), and can decompose the original time series into several near-periodic Intrinsic Mode Functions (IMF) and a trend item. An IMF is a random oscillating function with different amplitudes and frequencies; it must meet the following two conditions: (1) the number of IMF extreme points must be equal to the number of zero crossings, or differ by at most 1; (2) the upper and lower envelopes composed of IMF local maxima and minima have a mean value of 0 at any time. EMD is especially suitable for processing nonlinear and non-stationary signals with noise. Through EMD, the original time series can be expressed as the sum of several IMF components and one residual component, as shown in Equation (3):

$$x(t)=\sum_{i=1}^{n}\mathrm{IMF}_{i}(t)+r_{n}(t) \quad (3)$$

where the residual $r_{n}(t)$ is a constant or a monotone function.
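
The first IMF condition can be checked numerically. The sketch below (illustrative helpers, not part of the study's code) counts extrema and zero crossings of a sine wave, which satisfies the IMF definition:

```python
import numpy as np

def count_zero_crossings(x):
    """Number of sign changes in the series (zeros of the signal)."""
    s = np.sign(x)
    s = s[s != 0]                      # ignore exact zeros
    return int(np.sum(s[:-1] != s[1:]))

def count_extrema(x):
    """Number of strict local maxima and minima."""
    d = np.diff(x)
    return int(np.sum(np.sign(d[:-1]) != np.sign(d[1:])))

# A pure sine is a valid IMF: extrema and zero crossings differ by at most 1.
t = np.linspace(0, 4 * np.pi, 1000)
imf = np.sin(t)
assert abs(count_extrema(imf) - count_zero_crossings(imf)) <= 1
```

A trend item such as a straight line, by contrast, has no interior extrema, which is why EMD separates it out as the residual.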

#### Ensemble Empirical Mode Decomposition (EEMD)

With the deepening of research, scholars found that the EMD algorithm is prone to mode mixing phenomenon, which will lead to multiple IMF components containing duplicate information. To address this shortcoming, Wu & Huang (2009) proposed the EEMD algorithm in 2009, which improved the EMD algorithm by adding auxiliary white noise to the original sequence many times. The specific steps are:

- (1) A new time series $x_{m}(t)=x(t)+\varepsilon w_{m}(t)$ is generated by adding white noise $w_{m}(t)$ with an amplitude of $\varepsilon$ to the original time series $x(t)$.

- (2) The EMD method is used to decompose $x_{m}(t)$, yielding a set of components $\mathrm{IMF}_{i,m}(t)$ and a residual $r_{m}(t)$.

- (3) The above two steps are repeated *M* times to get *M* groups of components and residuals. The EEMD decomposition results are obtained by calculating the arithmetic average of the *M* groups of components and residuals, respectively, as shown in Equation (4):

$$\mathrm{IMF}_{i}(t)=\frac{1}{M}\sum_{m=1}^{M}\mathrm{IMF}_{i,m}(t),\qquad r(t)=\frac{1}{M}\sum_{m=1}^{M}r_{m}(t) \quad (4)$$

EEMD is a recursive algorithm, and it needs two parameters: the white noise amplitude $\varepsilon$ and the total number of noise additions *M*. The amplitude $\varepsilon$ should not be too small; otherwise, it may not produce the extreme-value changes required by EMD. Additionally, increasing *M* reduces the effect of the white noise to a negligible level. This paper sets *M* and $\varepsilon$ to 100 and 0.2, respectively.
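
The choice of M = 100 can be motivated numerically: the ensemble mean of M independent white-noise series retains a residual amplitude of roughly ε/√M. A small check, assuming Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(0)
eps, M, n = 0.2, 100, 512          # noise amplitude, ensemble size, series length

# Residual noise left in the ensemble mean of M white-noise realizations.
ensemble = eps * rng.standard_normal((M, n))
residual_std = ensemble.mean(axis=0).std()

# Theory: eps / sqrt(M) = 0.02; the empirical value should be close to it.
assert abs(residual_std - eps / np.sqrt(M)) < 0.005
```

With ε = 0.2 and M = 100, the noise contaminating the averaged IMFs is thus on the order of 0.02, small relative to typical SPI magnitudes.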

#### Variational Mode Decomposition (VMD)

VMD decomposes a signal into a preset number of band-limited mode components by solving a constrained variational problem. The specific steps are:

- (1) A variational problem is constructed. The constraint is that the sum of the components obtained by decomposition equals the original signal, as shown in Equation (5):

$$\min_{\{u_{k}\},\{\omega_{k}\}}\left\{\sum_{k=1}^{K}\left\|\partial_{t}\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_{k}(t)\right]e^{-j\omega_{k}t}\right\|_{2}^{2}\right\},\qquad \mathrm{s.t.}\ \sum_{k=1}^{K}u_{k}(t)=f(t) \quad (5)$$

where *f* is the original signal, *K* is the number of decomposition modes determined in advance, $u_{k}(t)$ and $\omega_{k}$ are the *k*th mode component after decomposition and its corresponding central frequency, *t* is the time variable, *j* is the imaginary number symbol, $*$ is the convolution operation, $\delta(t)$ is the Dirac function, $e^{-j\omega_{k}t}$ is the exponential signal that shifts each mode's spectrum to its central frequency and $\partial_{t}$ is the partial derivative operation.

- (2) The variational problem is solved. A Lagrange multiplication operator $\lambda(t)$ and a quadratic penalty factor $\alpha$ are introduced, and the augmented Lagrangian expression obtained is shown in Equation (6):

$$L\left(\{u_{k}\},\{\omega_{k}\},\lambda\right)=\alpha\sum_{k=1}^{K}\left\|\partial_{t}\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_{k}(t)\right]e^{-j\omega_{k}t}\right\|_{2}^{2}+\left\|f(t)-\sum_{k=1}^{K}u_{k}(t)\right\|_{2}^{2}+\left\langle \lambda(t),\,f(t)-\sum_{k=1}^{K}u_{k}(t)\right\rangle \quad (6)$$

where $\|\cdot\|_{2}$ denotes the *L*2 norm.

- (3) Each mode component and center frequency are found. The saddle point of the augmented Lagrangian function is searched, and the optimal results of $u_{k}$, $\omega_{k}$ and *λ* are found alternately after iteration.

Compared with EMD and EEMD, VMD has stronger mathematical theoretical support, and can decompose complex non-stationary time series into more stable subsequences while reducing mode mixing.

### Stand-alone machine learning model

#### Gated Recurrent Unit (GRU)

GRU is a variant of the Recurrent Neural Network (RNN). Similar to LSTM, it was proposed to solve problems such as the failure of long-term memory in RNN and the vanishing gradient in backpropagation. The internal structure of the LSTM model is complex and training is time-consuming. As a simplified variant of LSTM, GRU improves training efficiency by reducing the structural parameters of the gated neural network. The structure of GRU neurons is shown in Figure 3. There are only two gates, the reset gate $r_{t}$ and the update gate $z_{t}$, which can still ensure high accuracy. The mathematical operation of GRU is shown in Equation (7):

$$\begin{aligned} r_{t}&=\sigma\left(W_{r}\left[h_{t-1},x_{t}\right]\right)\\ z_{t}&=\sigma\left(W_{z}\left[h_{t-1},x_{t}\right]\right)\\ \tilde{h}_{t}&=\tanh\left(W\left[r_{t}\odot h_{t-1},x_{t}\right]\right)\\ h_{t}&=\left(1-z_{t}\right)\odot h_{t-1}+z_{t}\odot\tilde{h}_{t} \end{aligned} \quad (7)$$

where $x_{t}$ is the input variable at time *t*, $h_{t-1}$ is the output result of the previous hidden layer, $h_{t}$ is the output result of the hidden layer of this unit, $\tilde{h}_{t}$ is the candidate state of the hidden layer, $[\cdot,\cdot]$ is the connection (concatenation) between vectors, $W_{r}$, $W_{z}$ and $W$ are the trainable parameter matrices, $\odot$ is the Hadamard product, $\sigma$ is the gated activation function sigmoid and $\tanh$ is the activation function used when candidate memories are generated.
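
The GRU update can be sketched as a single NumPy step; `gru_step` and the weight shapes are illustrative only, not the study's implementation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wr, Wz, Wh):
    """One GRU update; [h, x] denotes vector concatenation."""
    hx = np.concatenate([h_prev, x_t])
    r = sigmoid(Wr @ hx)                                      # reset gate
    z = sigmoid(Wz @ hx)                                      # update gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))  # candidate state
    return (1.0 - z) * h_prev + z * h_cand                    # new hidden state

rng = np.random.default_rng(1)
n_in, n_hid = 4, 8
Wr, Wz, Wh = (rng.standard_normal((n_hid, n_hid + n_in)) for _ in range(3))
h = np.zeros(n_hid)
h = gru_step(rng.standard_normal(n_in), h, Wr, Wz, Wh)
assert h.shape == (n_hid,)
```

Because the new state interpolates between the previous state and a tanh-bounded candidate, the hidden values stay inside (−1, 1), which keeps the recurrence numerically stable.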

#### Light Gradient Boosting Machine (LGBM)

LGBM is an efficient implementation of the Gradient Boosting Decision Tree (GBDT) proposed by Ke *et al.* (2017). It resolves shortcomings of the GBDT algorithm, such as long training time and large memory consumption, while ensuring high accuracy. The main improvements of LGBM are reflected in four aspects: (1) Gradient-based One-Side Sampling (GOSS), which keeps the distribution of sample data unchanged and gives more attention to data with large gradients when tree nodes are split; (2) Exclusive Feature Bundling (EFB), which bundles mutually exclusive features together to reduce the number of features and improve training speed; (3) a histogram-based decision tree algorithm, which speeds up tree growth during iteration without traversing all data multiple times, saving time and space; (4) a leaf-wise growth strategy with depth limit, as shown in Figure 4, which avoids the over-fitting caused by growing the decision tree too deep and achieves higher accuracy.

### Model evaluation index

Three evaluation indices are adopted: the Nash–Sutcliffe efficiency (NSE), the mean square error (MSE) and the root mean square error (RMSE):

$$\mathrm{NSE}=1-\frac{\sum_{i=1}^{N}\left(y_{i}-\hat{y}_{i}\right)^{2}}{\sum_{i=1}^{N}\left(y_{i}-\bar{y}\right)^{2}},\qquad \mathrm{MSE}=\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-\hat{y}_{i}\right)^{2},\qquad \mathrm{RMSE}=\sqrt{\mathrm{MSE}}$$

where *N* is the total number of observed samples, $\hat{y}_{i}$ and $y_{i}$ are the predicted and observed values, and $\bar{y}$ is the mean of the observed values. NSE quantifies the prediction accuracy of the simulation model; its value range is (−∞, 1], and the closer the value is to 1, the better the prediction performance. MSE and RMSE evaluate the average error of the model's predicted values; their value range is [0, +∞), and the closer the value is to 0, the better the prediction performance.
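
The three indices are straightforward to compute; a minimal sketch with hypothetical helper names:

```python
import numpy as np

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

def mse(obs, pred):
    return float(np.mean((np.asarray(obs, float) - np.asarray(pred, float)) ** 2))

def rmse(obs, pred):
    return float(np.sqrt(mse(obs, pred)))

obs = np.array([0.2, -1.1, 0.7, 1.5, -0.4])
assert nse(obs, obs) == 1.0 and mse(obs, obs) == 0.0
# Predicting the mean of the observations gives NSE = 0, the usual baseline.
assert abs(nse(obs, np.full_like(obs, obs.mean()))) < 1e-12
```

An NSE of 0 thus marks the "predict the mean" baseline: models in Table 3 with NSE near 0.9 explain most of the observed variance.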

## MODEL DEVELOPMENT

### Prediction method framework

Prediction models generally achieve higher accuracy on SPI series at long time scales, because long-scale SPI values are smoother and their feature information is richer. Therefore, to ensure high accuracy of the prediction models, the long time scale SPI-12 is selected as the short-term meteorological drought prediction index in this study, and the framework of the prediction method is shown in Figure 5.

A total of two stand-alone models and eight hybrid models are developed in this research; see Table 1 for details. The sample sets of all models are divided into a training set, a validation set and a testing set with a ratio of about 60%:20%:20%. The hybrid models' prediction strategy is to predict the subsequences separately and sum them to get the final prediction result.
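
The 60%:20%:20% chronological split can be sketched as follows; `chrono_split` is a hypothetical helper, and 720 months corresponds to the 1960–2019 study period:

```python
import numpy as np

def chrono_split(series, ratios=(0.6, 0.2, 0.2)):
    """Split a time series chronologically into train/validation/test sets."""
    n = len(series)
    i = int(n * ratios[0])
    j = i + int(n * ratios[1])
    return series[:i], series[i:j], series[j:]

spi = np.arange(720)                 # stand-in for 60 years of monthly SPI-12
train, val, test = chrono_split(spi)
assert len(train) == 432 and len(val) == 144 and len(test) == 144
assert train[-1] < val[0] < test[0]  # order preserved: no shuffling across time
```

Keeping the split chronological (rather than random) matters for autoregressive prediction, since shuffling would leak future values into training.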

| Experiment setup | Model type | Model | Quantity |
|---|---|---|---|
| 1 | Stand-alone | GRU | 2 |
| | Stand-alone | LGBM | |
| 2 | Hybrid (decomposition) | EMD-GRU | 6 |
| | Hybrid (decomposition) | EEMD-GRU | |
| | Hybrid (decomposition) | VMD-GRU | |
| | Hybrid (decomposition) | EMD-LGBM | |
| | Hybrid (decomposition) | EEMD-LGBM | |
| | Hybrid (decomposition) | VMD-LGBM | |
| 3 | Hybrid (integration) | INT-GRU | 2 |
| | Hybrid (integration) | INT-LGBM | |


### Model hyper-parameters optimization

In the process of ML, the internal configuration variables summarized by the model through training data are called model parameters. External variables that are set manually before learning begins are called model hyper-parameters. Hyper-parameters are not obtained by system learning; a reasonable value range must be specified for them based on existing experience, and the optimal values searched for iteratively to achieve the best model performance. This process is called hyper-parameter optimization (Tran *et al.* 2020).

In this study, the hyper-parameters of the GRU and LGBM models are optimized with the Bayesian Optimization (BO) method under the Optuna framework (Akiba *et al.* 2019). BO is a sequential model-based optimization method, usually applied to black-box objective functions whose true distribution is unknown or very difficult to solve. Both the objective function and the loss function are MSE, and the total number of optimization iterations is 50. The hyper-parameters to be optimized for the GRU and LGBM models and their search ranges are shown in Table 2.
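
The study itself uses Optuna's Bayesian Optimization; as a library-free sketch of the same 50-trial loop, plain random search over Table 2-style ranges is shown below, with a toy quadratic standing in for the real MSE objective:

```python
import random

random.seed(42)

# Toy surrogate for the validation MSE; the real objective would train a model.
def objective(params):
    return (params["learning_rate"] - 0.1) ** 2 + (params["max_depth"] - 5) ** 2 * 1e-4

best, best_loss = None, float("inf")
for _ in range(50):                                    # 50 trials, as in the study
    params = {
        "max_depth": random.randint(1, 9),             # range [1: 1: 9]
        "learning_rate": 10 ** random.uniform(-8, 0),  # log-sampled in [1e-8, 1]
    }
    loss = objective(params)
    if loss < best_loss:
        best, best_loss = params, loss

assert 1 <= best["max_depth"] <= 9
```

The log-uniform sampling of `learning_rate` mirrors the logarithmic sampling noted in Table 2's footnote; BO differs only in that it proposes each trial from a model of past results instead of at random.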

| Model | Hyper-parameter | Value range |
|---|---|---|
| GRU | number of hidden layer nodes | [16: 32: 256]^{a} |
| | number of hidden layers | [1: 1: 2] |
| | dropout | [0: 0.02: 0.2] |
| LGBM | max_depth | [1: 1: 9] |
| | gamma | [1×10^{−8}, 1]^{b} |
| | learning_rate | [1×10^{−8}, 1] |
| | alpha | [1×10^{−9}, 1×10^{−3}] |
| | lambda | [1×10^{−5}, 1] |
| | booster | [‘gbtree’, ‘gblinear’, ‘dart’] |


^{a}A value range of [16: 32: 256] means that the lower limit of the parameter is 16, the search step is 32 and the upper limit is 256; the same notation applies to the first three rows.

^{b}A value range of [1×10^{−8}, 1] shows only the lower and upper limits of the parameter, since the sampling method is logarithmic.

### Model input optimization

The model input variables are lagged values (*t*, *t* − 1, *t* − 2, *t* − 3, …) of SPI-12, and the output variable is set to the 1-month lead time (*t* + 1) according to the research purpose. The input lag length is one of the important parameters for sample generation before prediction, and the partial autocorrelation function (PACF) is usually used to determine it (Poornima & Pushpalatha 2019). However, PACF involves subjective sifting, and it is difficult to achieve optimal model performance with it alone. In this paper, the lag length is therefore brought into the Optuna framework as one of the model hyper-parameters for iterative optimization; the search range is set to a minimum of 4, a maximum of 64 and a step of 4. The prediction accuracy of GRU and LGBM under different lag lengths is shown in Figure 7.
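
Sample generation from lagged SPI-12 values can be sketched as follows (`make_lagged_samples` is a hypothetical helper; a sine series stands in for real SPI-12 data):

```python
import numpy as np

def make_lagged_samples(series, lag):
    """Turn a series into (X, y) pairs: `lag` past values -> the next value (t+1)."""
    X = np.array([series[i - lag:i] for i in range(lag, len(series))])
    y = np.asarray(series[lag:])
    return X, y

spi = np.sin(np.linspace(0, 20, 200))   # stand-in for an SPI-12 series
X, y = make_lagged_samples(spi, lag=12)
assert X.shape == (188, 12) and y.shape == (188,)
assert np.allclose(X[0], spi[:12]) and y[0] == spi[12]
```

Each candidate lag length in the Optuna search simply regenerates these (X, y) pairs with a different window width before training.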

From Figure 7, there are obvious differences in the optimal lag length between the two models. The optimal lag length of the GRU model is concentrated around 50: the longer the input lag, the more information the GRU can capture for prediction, so accuracy rises; once the lag exceeds 50, accuracy gradually decreases because of information redundancy. The optimal lag length of the LGBM model is concentrated around 25; as the lag increases further, the effective prediction information the model learns decreases instead, lowering accuracy. The response of GRU and LGBM to the optimal input lag is thus inconsistent. Using PACF alone to determine the model input does not guarantee suitability for every model, whereas iterative optimization solves this problem.

### Development environment

Python 3.7 is used for programming in this study. The construction of the ML models is completed based on the sklearn and XGBoost libraries. The three signal decomposition algorithms adopt the third-party libraries PyEMD, PyEEMD and PyVMD, respectively. Furthermore, when determining the decomposition mode number *K* of VMD, the enumeration method is applied; the decomposition effect is best, with no mode mixing, when *K* = 8.

## RESULTS AND DISCUSSION

| Station | Model | Lead time | NSE | MSE | RMSE |
|---|---|---|---|---|---|
| Pucheng | GRU | t + 1 | 0.711 | 0.217 | 0.466 |
| | EMD-GRU | t + 1 | 0.815 | 0.138 | 0.372 |
| | EEMD-GRU | t + 1 | 0.846 | 0.116 | 0.341 |
| | VMD-GRU | t + 1 | 0.801 | 0.149 | 0.386 |
| | INT-GRU | t + 1 | 0.824 | 0.132 | 0.364 |
| | LGBM | t + 1 | 0.767 | 0.175 | 0.418 |
| | EMD-LGBM | t + 1 | 0.844 | 0.116 | 0.342 |
| | EEMD-LGBM | t + 1 | 0.904 | 0.071 | 0.268 |
| | VMD-LGBM | t + 1 | 0.819 | 0.135 | 0.368 |
| | INT-LGBM | t + 1 | 0.919 | 0.06 | 0.246 |
| Fengxiang | GRU | t + 1 | 0.833 | 0.186 | 0.432 |
| | EMD-GRU | t + 1 | 0.887 | 0.125 | 0.354 |
| | EEMD-GRU | t + 1 | 0.906 | 0.105 | 0.324 |
| | VMD-GRU | t + 1 | 0.878 | 0.136 | 0.369 |
| | INT-GRU | t + 1 | 0.912 | 0.098 | 0.314 |
| | LGBM | t + 1 | 0.835 | 0.184 | 0.429 |
| | EMD-LGBM | t + 1 | 0.872 | 0.142 | 0.378 |
| | EEMD-LGBM | t + 1 | 0.942 | 0.064 | 0.254 |
| | VMD-LGBM | t + 1 | 0.862 | 0.154 | 0.392 |
| | INT-LGBM | t + 1 | 0.953 | 0.052 | 0.228 |
| Wugong | GRU | t + 1 | 0.698 | 0.218 | 0.467 |
| | EMD-GRU | t + 1 | 0.812 | 0.135 | 0.368 |
| | EEMD-GRU | t + 1 | 0.825 | 0.125 | 0.354 |
| | VMD-GRU | t + 1 | 0.781 | 0.157 | 0.397 |
| | INT-GRU | t + 1 | 0.832 | 0.121 | 0.347 |
| | LGBM | t + 1 | 0.732 | 0.193 | 0.439 |
| | EMD-LGBM | t + 1 | 0.859 | 0.101 | 0.318 |
| | EEMD-LGBM | t + 1 | 0.874 | 0.091 | 0.301 |
| | VMD-LGBM | t + 1 | 0.771 | 0.165 | 0.406 |
| | INT-LGBM | t + 1 | 0.901 | 0.072 | 0.268 |


From Table 3, all stand-alone models show the lowest prediction performance on the testing sets of the three stations, and the GRU model performs worse than the LGBM model. The minimum NSE values of the GRU and LGBM models appear at Wugong station: 0.698 and 0.732, respectively.

By combining the EMD, EEMD and VMD methods with the stand-alone models, the ‘decomposition-prediction’ models are constructed and compared with the stand-alone models. Among them, the models based on EEMD show the best prediction performance, followed by EMD, while VMD improves the models the least. The maximum NSE values of EEMD-GRU and EEMD-LGBM appear at Fengxiang station, both above 0.9, indicating that decomposition pre-processing can significantly improve prediction performance.

The INT-ML models proposed in this paper achieve satisfactory experimental results; in particular, INT-LGBM not only has the best prediction performance among all models at the three stations, but also the smallest error statistics. At Pucheng station, the INT-LGBM model produces an NSE of 0.919, 20% higher than the LGBM model, while the MSE and RMSE are 66% and 41% lower. At Fengxiang station, the INT-LGBM model produces an NSE of 0.953, 14% higher than the LGBM model, while the MSE and RMSE are 72% and 47% lower. At Wugong station, the INT-LGBM model produces an NSE of 0.901, 23% higher than the LGBM model, while the MSE and RMSE are 63% and 39% lower. As for the INT-GRU model, its improvement over the GRU model is also at the highest level except at Pucheng station.

In general, the prediction accuracy of the ML models after data pre-processing is improved to different degrees, and the newly developed INT-ML models outperform the models constructed with a single decomposition algorithm.

From Figures 8–10, the fitting effect of the stand-alone models is at the lowest level; especially when predicting the valley values of the time series, nearly half of the prediction results show obvious errors with varying degrees of time shift. After decomposition pre-processing, the fitting degree of the hybrid models is greatly improved. The INT-LGBM model has the highest accuracy in predicting peak and valley values and shows the lowest time-shift error. These results confirm the importance of decomposition pre-processing for ML model training, and show that the ‘integration-prediction’ models learn the feature information in SPI-12 time series most adequately.

Based on the above analysis of the experimental results, we can conclude that the GRU and LGBM stand-alone models have good potential in predicting short-term meteorological drought in the Guanzhong region, and that LGBM has better comprehensive prediction ability than GRU. This result further confirms that ML models have advantages over linear models in capturing the nonlinear relationships of drought index time series, and also indicates that different models have different learning abilities (Lima *et al.* 2013; Xu *et al.* 2020). In practical applications, at least two stand-alone models should be established for selection (Khosravi *et al.* 2017). In addition, the prediction performance of stand-alone models improves significantly after decomposition pre-processing of the input data, consistent with the conclusions of current scholars that ‘decomposition-prediction’ hybrid models can obtain higher prediction accuracy (Başakın *et al.* 2021; Wang *et al.* 2021). However, hybrid models constructed with a single decomposition algorithm have certain limitations. To reduce the adverse effects of these limitations, a new ‘integration-prediction’ hybrid model construction method is proposed in this study. After comprehensive comparison with the developed stand-alone and ‘decomposition-prediction’ models, we find that the ‘integration-prediction’ model performs best among all models, because it fuses more valuable feature information, enhances the learning ability of the ML models, further improves the simulation effect and provides a new way forward for drought prediction research.

## CONCLUSION

ML technology has been widely used in hydrological modeling, especially for predicting complex nonlinear and non-stationary drought index time series. In this research, a new type of ‘integration-prediction’ model is proposed, which deeply combines signal decomposition algorithms with GRU and LGBM models and applies BO to tune the model hyper-parameters. The developed INT-GRU and INT-LGBM models predict SPI-12 one month ahead at three stations in the Guanzhong region, and are compared with stand-alone models and ‘decomposition-prediction’ models. The following conclusions can be drawn:

- (1)
Both GRU and LGBM successfully predict the SPI-12 time series, and LGBM has better prediction ability than GRU. The experimental results also verify that signal decomposition algorithms used as data pre-processing tools can greatly improve the prediction performance of ML models, with EEMD providing the largest improvement, followed by EMD and VMD.

- (2)
When there are multiple decomposition algorithms, besides comparing and choosing the best between them, we can integrate all decomposition results to reduce the limitations generated by a single decomposition algorithm.

- (3)
An ‘integration-prediction’ model construction method is proposed. Compared with existing stand-alone models and ‘decomposition-prediction’ models, the ‘integration-prediction’ model INT-LGBM produces higher prediction accuracy, smaller prediction error and better stability.

- (4)
This study provides a new idea for drought index prediction in the Guanzhong region. In future work, we will select more arid regions to further prove the universal adaptability of ‘integration-prediction’ models, and to offer a reliable reference for decision makers.

## FUNDING

This work is supported by the National Natural Science Foundation of China (No. 52209035) and Xi'an University of Technology (No. 256082016).

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.