## Abstract

Rainfall is a precious water resource, especially for Shenzhen, where local water resources are scarce. An effective rainfall prediction model is therefore essential for improving water supply efficiency and water resources planning in Shenzhen. In this study, a deep learning model based on a zero sum game (ZSG) was proposed to predict ten-day rainfall, regular models were constructed for comparison, and cross-validation was performed to further compare the generalization ability of the models. Meanwhile, the sliding window mechanism, differential evolution genetic algorithm, and discrete wavelet transform were developed to address data non-stationarity, local optimal solutions, and noise filtration, respectively. The k-means clustering algorithm was used to discover the potential laws of the dataset and provide a reference for the sliding window. Mean square error (MSE), Nash–Sutcliffe efficiency coefficient (NSE), and mean absolute error (MAE) were applied for model evaluation. The results indicated that ZSG could better optimize the parameter adjustment process of the models and improve their generalization ability. The generalization ability of the bidirectional model was superior to that of the unidirectional model. The ZSG-based models outperformed the regular models and provided the lowest MSE (1.29%) and MAE (7.5%) and the highest NSE (21.75%) in the ten-day rainfall prediction.

## HIGHLIGHTS

Proposing a deep learning model based on zero sum game.

Improving unidirectional propagation model into bidirectional propagation model.

Introducing sliding window mechanism to solve the problem of data non-stationarity.

Designing the differential evolution genetic algorithm to solve local optimal solutions.

Using the discrete wavelet transform to filter out noise.

## INTRODUCTION

Shenzhen is densely populated and built up, and highly modernized. As a result, the city faces a serious shortage of local water resources, and its water supply depends heavily on water diversion. Rainfall is a precious water resource that can be used in urban water supply scheduling, and it can effectively alleviate the water supply pressure. However, uncertainty in rainfall depth is an important cause of inappropriate water diversion plans for the following year. Hence, accurate rainfall prediction is crucial to water resources planning and the water diversion plan. Mechanistic models depend, to a certain extent, on the external physical environment and require a large amount of measured data. The measured data often contain missing values and outliers, the external physical environment keeps changing, and mechanistic models take a long time to build. These problems strongly affect the predictions of mechanistic models and may weaken the effectiveness of the predicted results. By contrast, a large number of studies show that data-driven models achieve high accuracy in rainfall prediction (Bagirov *et al.* 2017; Ni *et al.* 2020; Ridwan *et al.* 2021).

Hydrological data, such as rainfall, often have potential long-term trends and periodic patterns, which can be scientifically predicted with the appropriate tools. In recent years, data-driven methods, such as regression analysis (Danandeh Mehr *et al.* 2019; Ali *et al.* 2020), artificial neural networks (ANN) (Jaddi & Abdullah 2018; Sulaiman & Wahab 2018; Liu *et al.* 2019), time series models (Le *et al.* 2019; Poornima & Pushpalatha 2019), and deep learning models (Yen *et al.* 2019), have been used to predict rainfall. Data-driven models can discover the potential quantitative relationships between data through self-learning, and autonomously learn previously unknown knowledge. The modeling speed is fast, the accuracy is high, and the model is not affected by changes in the external physical environment. Ramana *et al.* (2013) applied the wavelet transform and ANN to predict monthly rainfall effectively and showed that wavelet neural network models are more effective than ANN models. Liu & Shi (2019) improved the prediction of monthly rainfall using genetic programming. In addition, many similar studies also show that data-driven models perform well in monthly rainfall prediction.

However, Shenzhen has a large floating population, water supply dispatch is dominated by short-term dispatching, and the dispatching period is generally one week to ten days. Meanwhile, reservoirs are the main means of water storage, and the water supply reservoirs are mainly medium-scale and small-scale reservoirs. The time span of monthly rainfall is too long to support water supply dispatching. As a result, rainfall cannot be used effectively, and a great deal of water tends to be abandoned from the reservoirs in the flood season. Therefore, it is of great importance to accurately predict ten-day rainfall.

In the process of ten-day rainfall prediction, it is found that the stationarity (Ng *et al.* 2020) of ten-day rainfall data is worse than that of monthly rainfall data. Due to the non-stationarity of the data, the generalization ability of data-driven models tends to deteriorate. The ANN model tends to overfit (Sari *et al.* 2017; Brodeur *et al.* 2020), and regression models easily generate a spurious regression equation. Although Estévez *et al.* (2020) used a wavelet neural network model to address the non-stationarity of local weather situations in monthly rainfall, the non-stationarity of ten-day rainfall data is more severe. In addition, the learning ability of ANN is insufficient, which leads to poor predicted results. Most importantly, there is no standard for evaluating the advantages and disadvantages of the modeling process.

The purpose of this study is to solve the above problems and accurately predict ten-day rainfall in Shenzhen. A deep learning model based on a zero sum game (ZSG) (Dahmani *et al.* 2020), coupling bidirectional (Chen *et al.* 2014) long short-term memory (BiLSTM) and a support vector machine (SVM), was proposed for ten-day rainfall prediction. The sliding window (SW) mechanism and the k-means clustering algorithm (KCA) were developed to carry out the data pre-processing, and adaptive moment estimation (AME), the differential evolutionary genetic algorithm (DEGA), and least squares (LS) were used to solve the model. The accuracy of the ZSG-based model was compared with that of the regular models in terms of the mean square error (MSE), Nash–Sutcliffe efficiency coefficient (NSE), and mean absolute error (MAE), and cross-validation (CV) was performed to further compare the prediction accuracy of the models. The influence of SW and the discrete wavelet transform (DWT) on prediction accuracy was further compared and discussed. The methods in this study can not only provide guidance for hydrologic forecasting in other fields, but also provide a reference for solving the problems of non-stationarity and noise filtration.

## STUDY AREA AND DATA

Shenzhen (Figure 1), located on the shore of the South China Sea and adjacent to Hong Kong, is a special economic zone of China. It is the first city in China to be fully urbanized. Shenzhen has a subtropical marine climate, and the annual average temperature is 22.3 °C. The rapid growth of the population and economy has increased water demand, which poses a huge challenge for water supply in Shenzhen. However, Shenzhen has abundant rainfall, with an average annual amount of 1,830 mm. The rainfall is concentrated in April to September, which accounts for more than 80% of the annual rainfall. The abundant rainfall in the flood season increases the amount of abandoned water from reservoirs, leading to serious waste of water resources. In this study, the daily average rainfall for the 40-year period from January 1981 to December 2020 in Shenzhen was used; the data are from the Meteorological Bureau of Shenzhen Municipality. The daily data are aggregated into ten-day data, and each year consists of 36 ten-day periods. The 40 years of daily data therefore yield 1,440 ten-day values, so the length of the dataset in this study is 1,440.

## METHODS

### Model development

… *et al.* 2020; Xu *et al.* 2020) is composed of a four-layer neural network, and the model can maintain a memory unit. LSTM is a time series model with high accuracy, but it only has the ability of forward propagation. Considering the limitations of unidirectional propagation, the BiLSTM model (Cheng *et al.* 2019; Yin *et al.* 2020) has been developed (Equations (1)–(6)) to improve the generalization ability of the LSTM model through bidirectional propagation. The BiLSTM model is composed of forward LSTM (LSTM_{F}) and backward LSTM (LSTM_{B}).

where *a*, *h*, *x*, *W*, and *I* are the attenuation factor, the output of the hidden layer, the input of the model, the weight, and the intercept, respectively; *S* and *T* are the activation functions that increase the nonlinearity factor; … is the memory learned at time *t*; *m*_{t} is the final memory at time *t*; *o* is the model output. Since the dimension of the output data has doubled, the dimension of the final output needs to be restored through linear rectification.

… *et al.* 2020) instead of the nonlinear mapping (Gao *et al.* 2021) to the high-dimensional space is adopted in SVM (Norouzi *et al.* 2019; Adnan *et al.* 2020), which has better classification efficiency. The process by which SVM searches for the optimal solution is a process of finding the shortest linear distance between the points and the plane (Equation (7)). The constraint conditions are shown in Equation (8).

where *dist* is the distance from the point (*x*_{i}, *y*_{i}) to the plane; *w* and *b* are the weight and bias; *K* is the radial basis kernel function.

First, the measured data (*Y*) and noise data (*Noise*) are imported into the SVM model for training, so that the SVM model labels the measured data as 1 and all other data as 0. The purpose is that only the measured data are recognized as 1. If the error between the predicted data generated by the BiLSTM model and the measured data is large, the discriminant result of the SVM model is 0. For the predicted data to be discriminated as 1, they must have a small error with respect to the measured data, which makes the optimization direction clear for the BiLSTM model. Therefore, the smaller the error between the measured data and the predicted data, the closer the discriminant result is to 1. Then, the input data (*X*) are imported into the BiLSTM model to generate the predicted data. After the generated data are discriminated by the SVM model, the results are fed back to the BiLSTM model, and the parameter adjustment process of modeling is better guided through the ZSG between the two models. Therefore, the entire modeling process is the ZSG process of the SVM-BiLSTM (SBiLSTM) model.

… *et al.* 2018). Considering that the number of hidden layers and neurons is large, in order to reduce the modeling time, this study no longer uses a traditional data structure to calculate each element by iteration. The data structure used in this study is the tensor of deep learning, which simplifies the calculation. A one-dimensional tensor is equivalent to a vector, and a two-dimensional tensor is equivalent to a matrix. In addition, in order to better find the optimal solution, AME (Equations (10)–(12)) and LS (Equation (13)) are used to solve the model. The final parameters of the model are the parameters when the error between the measured data and the predicted data is minimum and the discriminant result of the SVM model is maximum.

where *Softmax* is the activation function that converts the output value into a probability value; *y* is the model output; *s* and *r* are the first-order momentum and second-order momentum; *ρ*_{1} and *ρ*_{2} are rectification factors less than 1 for *s* and *r*; *grad*, *t*, *θ*, and *η* are the gradient, number of iterations, model parameter, and learning rate, respectively; ***Y*** and ***P*** are the measured data tensor and predicted data tensor; *ε* is 10^{−8}; *ρ*_{1} and *ρ*_{2} are 0.9 and 0.999 in this study. As *t* gets bigger and bigger, *ρ*_{1} and *ρ*_{2} become less and less important.

… *t* have the best correlation relationship with the data of 6 lag times (*t* − 1 to *t* − 6). Therefore, the four models were constructed with 6 lag times, and the iteration number of model training is 5,000. All models and algorithms are developed in Python3.

where *y*_{min} and *y*_{max} are the minimum and maximum values of the dataset; *y*_{i} is the value at time *i*; *n* is the length of the series.
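
The AME update used to solve the model can be illustrated with a minimal sketch. Equations (10)–(12) are not reproduced in the text, so the code below assumes the standard adaptive moment estimation formulation, using the symbols defined above (*s*, *r*, *ρ*_{1}, *ρ*_{2}, *grad*, *t*, *θ*, *η*, *ε*) and the values stated in this study (0.9, 0.999, 10^{−8}, and a learning rate of 0.01). The function names `ame_step` and `minimize_quadratic`, and the toy quadratic objective, are hypothetical and serve only as an illustration.

```python
import math


def ame_step(theta, grad, s, r, t, eta=0.01, rho1=0.9, rho2=0.999, eps=1e-8):
    """One update of a scalar parameter (standard formulation, assumed here)."""
    s = rho1 * s + (1.0 - rho1) * grad          # first-order momentum
    r = rho2 * r + (1.0 - rho2) * grad * grad   # second-order momentum
    s_hat = s / (1.0 - rho1 ** t)               # corrections fade as t grows
    r_hat = r / (1.0 - rho2 ** t)
    theta = theta - eta * s_hat / (math.sqrt(r_hat) + eps)
    return theta, s, r


def minimize_quadratic(start=3.0, steps=5000):
    """Toy objective f(theta) = (theta - 1)^2 with gradient 2 * (theta - 1)."""
    theta, s, r = start, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (theta - 1.0)
        theta, s, r = ame_step(theta, grad, s, r, t)
    return theta
```

Because the correction terms divide by 1 − *ρ*^{t}, their effect shrinks as *t* increases, which is one way to read the remark that *ρ*_{1} and *ρ*_{2} become less important for large *t*.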

### SW

*FSW* and *SSW* are the first-order SW series and second-order SW series; *x* and *f* are the values of the original series and the first-order SW series; *s*_{1}, *s*_{2}, *t*, *n*, and *l* are the window size of the first-order SW, the window size of the second-order SW, the time, the length of the original series, and the length of *FSW*, respectively.
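
A minimal sketch of a forward and reverse SW follows. The SW equations are not reproduced in the text, so the sketch assumes that each SW value is the mean of a window of consecutive values and that the leading values of the series are known when the window is reversed. The function names `sliding_window` and `reverse_sliding_window` are hypothetical.

```python
def sliding_window(x, s):
    """Forward SW: mean of each run of s consecutive values (length n - s + 1)."""
    return [sum(x[t:t + s]) / s for t in range(len(x) - s + 1)]


def reverse_sliding_window(f, head, s):
    """Reverse SW: rebuild a series from window means f and its first s - 1 values."""
    x = list(head)                       # the s - 1 known leading values
    for value in f:
        window_sum = value * s
        # the newest value is whatever completes the current window sum
        x.append(window_sum - sum(x[len(x) - (s - 1):]))
    return x


# Two-stage use on a short illustrative series (not the study's data)
x = [3.0, 0.0, 5.0, 2.0, 8.0, 1.0]
fsw = sliding_window(x, 3)               # first-order SW series
ssw = sliding_window(fsw, 2)             # second-order SW series
fsw_back = reverse_sliding_window(ssw, fsw[:1], 2)
x_back = reverse_sliding_window(fsw_back, x[:2], 3)
```

Under this assumption, an error *e* in one window mean becomes *s*·*e* in the recovered value, so the reverse step amplifies noise rather than averaging it out.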

### Optimizer

The genetic algorithm (GA) is a commonly used heuristic algorithm. GA (Delgoda *et al.* 2017; Sotomayor *et al.* 2018) can help individuals in the population to evolve, but the population may also degrade. Sometimes this degradation is pronounced and leaves individuals with poor fitness; even after continuous evolution for many generations, the individuals in the population do not improve. If a self-adaptively decaying mutation probability is set, the probability value that decreases with the iterations cannot give the individuals higher fitness, so the GA cannot converge. Therefore, DEGA is developed in this study. Mutation is generated from the difference between parent individuals, and new individuals are generated by crossing with the parent individuals, which resolves the degradation phenomenon.
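
The idea of difference-based mutation followed by crossover with the parent can be sketched with classic differential evolution. This is not the authors' DEGA, whose details are not given in the text: the greedy selection step, the parameter values (`weight`, `cross_rate`, `pop_size`), and the toy `sphere` fitness function are all assumptions made for illustration.

```python
import random


def differential_evolution(fitness, dim, bounds, pop_size=20, generations=200,
                           weight=0.5, cross_rate=0.9, seed=0):
    """Minimize fitness over a box; returns (best individual, best score)."""
    rng = random.Random(seed)
    low, high = bounds
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # three distinct parents other than the target individual
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)   # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cross_rate:
                    # mutation built from the difference of two parents
                    gene = pop[a][d] + weight * (pop[b][d] - pop[c][d])
                    gene = min(max(gene, low), high)
                else:
                    gene = pop[i][d]      # crossover keeps the parent's gene
                trial.append(gene)
            trial_score = fitness(trial)
            if trial_score <= scores[i]:  # a worse child never replaces its parent
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(v):
    """Toy fitness: sum of squares, minimized at the origin."""
    return sum(x * x for x in v)
```

Because a trial individual replaces its parent only when it is at least as fit, the best fitness in the population cannot get worse from one generation to the next, which is the property that counters degradation in this sketch.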

… *et al.* 2017; Mouatadid *et al.* 2019) is an effective method of time series analysis (Equations (17) and (18)). DWT can filter out high-frequency or low-frequency components by decomposing, filtering, and reconstructing the time series. Experimental findings show that most of the valid data are low-frequency components, while noises are high-frequency components. Based on these findings, the soft threshold method of DWT is applied in this study to filter out the high-frequency components in the process of reverse SW.

where *DWT*_{f}, *a*, *t*, *τ*, and *ψ* are the wavelet transform coefficient, scale, time, deviation, and wavelet base, respectively. The Daubechies wavelet base is used in the study.
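
Soft-threshold denoising can be sketched with a one-level transform. For brevity the sketch uses the Haar wavelet, which is the simplest member of the Daubechies family; the order of the Daubechies base and the threshold used in the study are not stated, so the function names and the threshold value passed to `denoise` are hypothetical. The input is assumed to have an even length.

```python
import math


def haar_dwt(signal):
    """One-level Haar transform: returns (approximation, detail) coefficients."""
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed pairs."""
    root2 = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / root2, (a - d) / root2])
    return signal


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by threshold; small ones become 0."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(signal, threshold):
    """Keep the low-frequency part and shrink the high-frequency detail."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, threshold))
```

With a threshold larger than every detail coefficient, each pair of neighbouring values is replaced by its mean, which shows how the high-frequency component is removed while the low-frequency trend is kept.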

In addition, in order to reduce the fluctuation of the training process, an adaptive learning rate is set in this study. The initial learning rate is set to 0.01 so that the model can converge fast. If the training error continues to decrease, the learning rate does not change. When the error rebounds, a monitor starts to track the following 100 rounds of training. If the errors over these 100 rounds continue to decrease, the learning rate remains unchanged; otherwise the learning rate is multiplied by 0.9 to reduce the step size of parameter adjustment. Then, the monitor is turned off and must wait for 100 rounds of training before it can be turned on again. When the error rebounds to more than twice the current error, the model is considered to have jumped out of the local optimal solution, so the learning rate is reset to 0.01.
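
The adaptive learning rate rule can be sketched as a small state machine. The description leaves some details open, so this sketch makes two assumptions: a "rebound" means the error of a round exceeds that of the previous round, and the reset to 0.01 is triggered when the error exceeds twice the previous round's error. The class name `AdaptiveLearningRate` and its attributes are hypothetical.

```python
class AdaptiveLearningRate:
    def __init__(self, initial=0.01, decay=0.9, window=100):
        self.initial, self.decay, self.window = initial, decay, window
        self.rate = initial
        self.previous = None
        self.state = "idle"            # "idle", "tracking", or "waiting"
        self.rounds = 0
        self.kept_decreasing = True

    def update(self, error):
        """Feed one round's training error; return the rate for the next round."""
        prev = self.previous
        self.previous = error
        if prev is None:
            return self.rate
        rebound = error > prev
        if self.state == "tracking":
            self.rounds += 1
            self.kept_decreasing = self.kept_decreasing and not rebound
            if self.rounds == self.window:
                if not self.kept_decreasing:
                    self.rate *= self.decay      # shrink the step size
                self.state, self.rounds = "waiting", 0
        elif self.state == "waiting":
            self.rounds += 1                     # monitor is off for `window` rounds
            if self.rounds == self.window:
                self.state = "idle"
        elif rebound:
            # the first rebound switches the monitor on
            self.state, self.rounds, self.kept_decreasing = "tracking", 0, True
        if error > 2.0 * prev:
            self.rate = self.initial             # assumed escape from a local optimum
        return self.rate
```

A training loop would call `update` once per round with the current training error and use the returned value as the learning rate for the next round.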

### Exploratory data analysis

Before modeling, exploratory data analysis (Xiao *et al.* 2012) is carried out to find the potential laws of the dataset. In this study, KCA (Kim & Parnichkun 2017; Hamid 2019) is applied to data of different time spans, and the results show an obvious pattern in the clustering results of the annual data. Data normalization is carried out to present the clustering results clearly.

According to Figure 4, the annual rainfall data from 1981 to 2020 are divided into two categories. The annual data are split at four breakpoints: 1992, 2001, 2008, and 2017. Therefore, the annual rainfall has periodic patterns, which can provide a reference for SW. Based on these patterns, *s*_{1} is set to 36 and *s*_{2} is set to 2 to smooth the first-order SW series. As can be seen from Figure 5, the pattern of the second-order SW series is simple, and the trend is relatively obvious. This makes it easier to construct data-driven models. The second-order SW series is used for modeling in this study, and the predicted results of the original time series can be obtained by reverse SW.
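
Clustering annual values into two categories and reading off the breakpoints can be sketched as follows. The sketch assumes a plain one-dimensional k-means (Lloyd's algorithm) with a deterministic start; the function names `kmeans_1d` and `breakpoints` are hypothetical, and the years and normalized values are made up for illustration and are not the study's data.

```python
def kmeans_1d(values, k=2, iterations=100):
    """Lloyd's algorithm on scalar data; returns (labels, centers)."""
    ordered = sorted(values)
    # spread the initial centers across the sorted data (deterministic start)
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:           # assignments have stabilized
            break
        centers = updated
    return labels, centers


def breakpoints(years, labels):
    """Years at which the cluster label differs from that of the previous year."""
    return [years[i] for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Illustrative normalized annual values (hypothetical, not the Shenzhen record)
years = list(range(2000, 2008))
values = [0.2, 0.25, 0.8, 0.9, 0.85, 0.3, 0.2, 0.75]
labels, centers = kmeans_1d(values)
changes = breakpoints(years, labels)
```

The years returned by `breakpoints` mark where the series moves from one category to the other, which is the kind of information used here as a reference for the window sizes.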

### Model evaluation

***Y*** and ***P*** are the measured value tensor and the predicted value tensor; *var* is the variance.
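
The three evaluation standards can be sketched in a few lines. The evaluation equations are not reproduced in the text, so the sketch assumes the usual definitions, with NSE written as one minus the ratio of the MSE to the variance of the measured values. The function names and the sample values are hypothetical.

```python
def mse(measured, predicted):
    """Mean square error between measured and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(measured, predicted)) / len(measured)


def mae(measured, predicted):
    """Mean absolute error between measured and predicted values."""
    return sum(abs(y - p) for y, p in zip(measured, predicted)) / len(measured)


def variance(values):
    """Population variance of a series."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def nse(measured, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    return 1.0 - mse(measured, predicted) / variance(measured)


# Illustrative values only
measured = [0.0, 0.2, 0.4, 0.6]
predicted = [0.1, 0.2, 0.3, 0.6]
scores = (mse(measured, predicted), nse(measured, predicted), mae(measured, predicted))
```

Under this definition, a model that always predicts the mean of the measured values has an NSE of 0, and any model with a larger MSE than that has a negative NSE.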

## RESULTS AND DISCUSSION

### Training and validation results

The training results are shown in Table 1. All the models converge well in the training set. Except for the LSTM model, the evaluation standards of the other three models have little difference. The training results of the BiLSTM model are close to the SLSTM model, so the fitting ability of the BiLSTM model is close to that of the SLSTM model. According to evaluation standards, the error, fitting degree, and average deviation degree of the SBiLSTM model are superior to those of the other three models. Therefore, the fitting ability of the SBiLSTM model is the strongest, and the fitting ability of the LSTM model is the worst. The fitting ability of BiLSTM and SLSTM models is superior to that of the LSTM model.

| Model | MSE/10^{−4} | NSE/% | MAE/% |
|---|---|---|---|
| LSTM | 0.54 | 89.00 | 0.52 |
| BiLSTM | 0.17 | 96.65 | 0.27 |
| SLSTM | 0.17 | 96.49 | 0.27 |
| SBiLSTM | 0.1 | 98.03 | 0.2 |


Table 2 clearly shows the evaluation standards of the four models on the validation set. The four models show good validation results, which highlight the effectiveness of the deep learning models. The SBiLSTM model has optimal MSE, NSE, and MAE, while the LSTM model has the worst MSE, NSE, and MAE. The validation results of the BiLSTM model are still very close to the SLSTM model, so the learning ability of the BiLSTM model is close to that of the SLSTM model. The validation results of the BiLSTM model are superior to that of the LSTM model, and the validation results of the SLSTM model are superior to that of the LSTM model. Therefore, the learning ability of the BiLSTM model is superior to that of the LSTM model, and the learning ability of the SLSTM model is superior to that of the LSTM model.

| Model | MSE/10^{−4} | NSE/% | MAE/% |
|---|---|---|---|
| LSTM | 0.15 | 94.26 | 0.31 |
| BiLSTM | 0.11 | 95.80 | 0.24 |
| SLSTM | 0.12 | 95.68 | 0.24 |
| SBiLSTM | 0.07 | 97.47 | 0.19 |


### Test results

Table 3 presents the test results of the four models. On the test set, the MSE and MAE of the four models are sorted in ascending order: SBiLSTM, BiLSTM, SLSTM, and LSTM. The NSE is sorted from largest to smallest: SBiLSTM, BiLSTM, SLSTM, and LSTM. Therefore, the SBiLSTM model has the minimum error and average deviation degree, and the highest fitting degree. The generalization ability of the SBiLSTM model is strongest, and this model is closest to the unbiased prediction. Compared with BiLSTM, the MSE of the SBiLSTM decreases by 40%, the NSE increases by 1.46%, and the MAE decreases by 18.18%. It can be seen that the prediction accuracy of the SBiLSTM model is superior to that of the BiLSTM model. Similarly, the prediction accuracy of the SLSTM model is superior to that of the LSTM model. Thus, the generalization ability of the SBiLSTM model is superior to that of the BiLSTM model, and the generalization ability of the SLSTM model is superior to that of the LSTM model. This shows that ZSG can improve the generalization ability of models and increase the prediction accuracy.

| Model | MSE/10^{−4} | NSE/% | MAE/% |
|---|---|---|---|
| LSTM | 0.24 | 90.27 | 0.38 |
| BiLSTM | 0.1 | 96.05 | 0.22 |
| SLSTM | 0.1 | 95.90 | 0.23 |
| SBiLSTM | 0.06 | 97.45 | 0.18 |


Compared with the LSTM model, the MSE of the BiLSTM model decreases by 58.33%, the NSE increases by 6.4%, and the MAE decreases by 42.11%. Therefore, the prediction accuracy of the BiLSTM model is superior to that of the LSTM model. Similarly, the prediction accuracy of the SBiLSTM model is superior to that of the SLSTM model. These results show that the generalization ability of the bidirectional model is superior to that of the unidirectional model. Since the BiLSTM model has a relatively strong generalization ability, the prediction accuracy of the SBiLSTM model is not improved much. However, the prediction accuracy of the SLSTM model is much better than that of the LSTM model, and the prediction accuracy of the BiLSTM model is slightly superior to that of the SLSTM model. The generalization ability of the BiLSTM model is superior to that of the SLSTM model, indicating bidirectional propagation is essential for improvement of generalization ability.
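
The relative changes quoted for the test set can be checked directly from the Table 3 values. The short sketch below does this; the helper names `relative_change` and `compare` are hypothetical, and the numbers are copied from Table 3.

```python
def relative_change(baseline, value):
    """Signed change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# model: (MSE/1e-4, NSE/%, MAE/%) on the test set, from Table 3
test_set = {
    "LSTM": (0.24, 90.27, 0.38),
    "BiLSTM": (0.10, 96.05, 0.22),
    "SBiLSTM": (0.06, 97.45, 0.18),
}


def compare(baseline, model):
    """Percent change in (MSE, NSE, MAE) when moving from baseline to model."""
    return tuple(round(relative_change(b, v), 2)
                 for b, v in zip(test_set[baseline], test_set[model]))


zsg_gain = compare("BiLSTM", "SBiLSTM")
bidirectional_gain = compare("LSTM", "BiLSTM")
```

Negative values indicate a decrease. The results agree with the figures in the text: a 40% lower MSE, 1.46% higher NSE, and 18.18% lower MAE for SBiLSTM relative to BiLSTM, and a 58.33% lower MSE, 6.4% higher NSE, and 42.11% lower MAE for BiLSTM relative to LSTM.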

| Model | CIUL/% | UQ/% | Median/% | LQ/% | IQR/% | CILL/% |
|---|---|---|---|---|---|---|
| LSTM | 11.79 | 5.66 | 3.13 | 1.57 | 4.09 | 0 |
| BiLSTM | 6.98 | 3.16 | 1.39 | 0.62 | 2.54 | 0 |
| SLSTM | 7.37 | 3.34 | 1.51 | 0.65 | 2.69 | 0 |
| SBiLSTM | 5.71 | 2.58 | 1.17 | 0.49 | 2.09 | 0 |


The CIUL, UQ, LQ, IQR, and CILL are confidence interval upper limit, upper quartile, lower quartile, interquartile range, and confidence interval lower limit, respectively: IQR = UQ − LQ; confidence interval (CI) = CIUL − CILL.

The violin plot represents the relative error distribution of the predicted data. The wider the violin plot within a certain interval, the higher the density of relative errors in that interval. The greater the height of the CI, the more dispersed the relative error distribution. If the height of the violin is large, the range of relative error between the measured values and predicted values is large, which reveals that the model stability is poor.

According to Table 4 and Figure 7, although the CILL of the four models are all 0, the CIUL is quite different. The CI height of the SBiLSTM model is the smallest, and the relative error in the widest part of the violin is the smallest. The relative error of the SBiLSTM model is mostly distributed between 0 and 5.71%, and the SBiLSTM model has the optimal CIUL, UQ, median, LQ, IQR, and CILL. These results indicate relative error distribution of the SBiLSTM model is the best. Therefore, the stability of the SBiLSTM model is the strongest.

Compared with the SBiLSTM model, the SLSTM model has larger violin parameters, revealing that the SLSTM model has a more discrete relative error distribution. Similarly, the violin parameters of the BiLSTM model are superior to that of the LSTM model, and the violin parameters of the BiLSTM model are also superior to that of the SLSTM model. These results indicate relative error distribution of the BiLSTM model is superior to that of the LSTM model and SLSTM model. These results reveal that bidirectional propagation is also critical to the stability of the model.

During the process of reverse SW, the errors accumulate, and the prediction accuracy of the second-order SW series determines the prediction accuracy of the first-order SW series. Therefore, we can infer that the first-order SW series of the SBiLSTM model has the minimum error. The first-order SW series is obtained by the reverse SW (Table 5). The SBiLSTM model still has the smallest MSE and MAE, and the highest NSE. The ranking of prediction accuracy from high to low is: SBiLSTM, BiLSTM, SLSTM, and LSTM. The MSE, NSE, and MAE of the LSTM model are the worst. The fitting degree of the LSTM model decreased markedly, and the error and average deviation degree between measured values and predicted values increased markedly.

| Model | MSE/10^{−4} | NSE/% | MAE/% |
|---|---|---|---|
| LSTM | 0.44 | 82.45 | 0.53 |
| BiLSTM | 0.22 | 91.19 | 0.37 |
| SLSTM | 0.23 | 90.81 | 0.38 |
| SBiLSTM | 0.12 | 95.06 | 0.26 |


Figure 8 presents the absolute error bar plot of the second-order SW series and the first-order SW series. The SBiLSTM model has the shortest error bar, while the error bars of the other three models all become longer. This is because the error can accumulate during the process of reverse SW. Finally, after twice reversing the SW, the predicted results of the original series can be obtained (Table 6). The error of the predicted values is small in the initial stage. However, as the error accumulates, the predicted values of the models begin to deviate from the measured values. The LSTM model greatly deviates from the measured values, while the other three models slightly deviate from the measured values (Figure 9).

| Model | MSE/% | NSE/% | MAE/% |
|---|---|---|---|
| LSTM | 1.93 | −10.68 | 9.89 |
| BiLSTM | 1.50 | 8.95 | 7.94 |
| SLSTM | 1.52 | 7.65 | 8.0 |
| SBiLSTM | 1.29 | 21.75 | 7.5 |


According to Table 6, the MSE, NSE, and MAE of the four models are sorted from best to worst: SBiLSTM, BiLSTM, SLSTM, and LSTM. Since the measured values include many 0 values, the NSE of the LSTM model becomes negative, but the NSE of the other three models still remains positive. The SBiLSTM model has the smallest error and average deviation degree. The MSE and MAE of the BiLSTM model are superior to that of LSTM model and SLSTM model.

Compared with the BiLSTM and SLSTM models, the SBiLSTM model has higher prediction accuracy, indicating that ZSG can better improve the generalization ability of the model, and the generalization ability of the bidirectional model is superior to that of the unidirectional model.

### CV

In order to further compare the prediction accuracy of the four models, cross-validation is performed. The predicted results of the original series are presented in Table 7 and Figure 10.

| Model | MSE/% | NSE/% | MAE/% |
|---|---|---|---|
| LSTM | 1.93 | −4.23 | 9.87 |
| BiLSTM | 1.79 | −0.81 | 8.47 |
| SLSTM | 1.75 | 1.49 | 8.38 |
| SBiLSTM | 1.46 | 17.77 | 7.63 |


According to Table 7 and Figure 10, the prediction accuracy of the four models under CV is good and close to the results in Table 6. This reveals that the generalization ability of the deep learning models is very strong. Compared with the other three models, the SBiLSTM model has the optimal MSE, MAE, and NSE, indicating that the error, fitting degree, and average deviation degree between the measured values and predicted values are optimal. The prediction accuracy of the SBiLSTM model is superior to that of the SLSTM model, and the prediction accuracy of the BiLSTM model is superior to that of the LSTM model. Therefore, the prediction accuracy of ZSG-based deep learning models is superior to that of regular deep learning models. These results also reveal that ZSG can improve the generalization ability of models and that the bidirectional models have stronger generalization ability. Unlike the results in Table 6, the prediction accuracy of the SLSTM model under CV is superior to that of the BiLSTM model, but the gap is not large. This is because the error of the second-order SW series of the SLSTM model is smaller.

## DISCUSSION

In order to compare the influence of DWT on the prediction accuracy, the SBiLSTM model with the highest prediction accuracy is selected to present the predicted results. The predicted results without DWT of the first-order SW series and original series are shown in Tables 8 and 9, respectively. The prediction accuracy without DWT is obviously different from that with DWT, and MSE, NSE, and MAE of the predicted results without DWT were obviously worse.

| Model | MSE/10^{−4} (without DWT) | MSE/10^{−4} (DWT) | NSE/% (without DWT) | NSE/% (DWT) | MAE/% (without DWT) | MAE/% (DWT) |
|---|---|---|---|---|---|---|
| SBiLSTM | 1.20 | 0.12 | 52.00 | 95.06 | 1.00 | 0.26 |
| SBiLSTM-CV | 0.32 | 0.13 | 91.85 | 96.58 | 0.43 | 0.25 |


| Model | MSE (without DWT) | MSE/% (DWT) | NSE (without DWT) | NSE/% (DWT) | MAE (without DWT) | MAE/% (DWT) |
|---|---|---|---|---|---|---|
| SBiLSTM | 2.82 | 1.29 | −160.71 | 21.75 | 1.02 | 7.5 |
| SBiLSTM-CV | 0.96 | 1.46 | −50.72 | 17.77 | 0.61 | 7.63 |


For the predicted results without DWT of the first-order SW series, the evaluation standards of the model all deteriorated to a certain extent, leading to larger errors during the process of reverse SW. As the error accumulates, predicted values of the original series deteriorate seriously and predicted results are completely distorted. A fitting plot of predicted values is presented in Figure 11 to clearly show the influence of DWT.

Noise filtration is therefore essential for reverse SW. When the forward SW is carried out, the influence of noise is weakened to some extent. During the reverse SW, however, the influence of noise is amplified. If the noise is not filtered out, the effective data are gradually swamped by the amplified noise. According to Figure 11(a), the error of the predicted results without DWT increases markedly, while the error of the predicted results with DWT is small. As can be seen from Figure 11(b), the predicted results without DWT have a small error in the initial stage. Due to the accumulation of error, all of these results are eventually amplified into noise, while the predicted results with DWT still maintain a small error, indicating that DWT can effectively filter out noise during the process of reverse SW.

Meanwhile, in order to compare the influence of SW on prediction accuracy, the predicted results of the SBiLSTM model without SW are compared with those with SW in terms of the violin plot (Figure 12 and Table 10).

| Model | CIUL | UQ | Median | LQ | IQR | CILL |
|---|---|---|---|---|---|---|
| SBiLSTM-SW | 2.01 | 1.01 | 0.74 | 0.34 | 0.67 | 0 |
| SBiLSTM-without SW | 4.66 | 2.12 | 0.78 | 0.43 | 1.69 | 0 |
| SBiLSTM-SW-CV | 2.58 | 1.24 | 0.74 | 0.35 | 0.89 | 0 |
| SBiLSTM-without SW-CV | 4.48 | 2.04 | 0.77 | 0.42 | 1.62 | 0 |


According to the violin plot, the predicted results with SW are obviously superior to those without SW. Compared with the violin parameters without SW, the violin parameters with SW are better, and IQR and CI are smaller.

### Prediction results in 2022

Based on the data from 1981 to 2020, the rainfall of especially wet year, wet year, normal year, dry year, and especially dry year in Shenzhen can be calculated (Figure 13). The predicted result of rainfall in 2022 is 1,821.14 mm, which belongs to normal year (Figure 14).

In 2022, the rainfall from April to September accounts for 85.26% of the annual rainfall, and this period is a period that has a large amount of water supply in Shenzhen. The ten-day rainfall can provide strong support for the water diversion plan of next year and water resources programming. Based on the predicted results of ten-day rainfall, more utilization of rainfall can relieve the water supply pressure of water diversion and reduce the abandoned water from the reservoirs, so as to improve the water supply efficiency.

## CONCLUSIONS

In this study, a ZSG-based deep learning model coupling SVM and BiLSTM is proposed to predict the ten-day rainfall. The predicted results are compared with regular deep learning model, and CV is performed. The AME and LS are used to solve the model, and DEGA and DWT are used to solve the local optimal solutions and noise problems. SW mechanism is introduced to solve the problem of non-stationarity, and KCA is developed to find the potential law of this dataset. MSE, NSE, and MAE are used for model evaluation.

The ten-day data are obtained by summing the daily data. Through the ZSG between the SVM and BiLSTM models, the BiLSTM model can generate predicted data with high accuracy, and the predicted results of the original series can be obtained by reverse SW. The results show that the prediction accuracy of the model is high, and the SBiLSTM model is the closest to the unbiased prediction. These results show the effectiveness of the methods proposed in the study. Based on the above experimental results, the proposed methods possess three advantages:

The KCA can discover period breakpoints in long time series, which provides reference for window size of SW.

The SW mechanism can solve the non-stationarity problem of the time series to a large extent, and improve the prediction accuracy of the model. During the process of reverse SW, DWT can effectively filter out noise.

The ZSG can help the BiLSTM model optimize the process of parameter adjustment, find the optimal solution more accurately, and improve the generalization ability of models. Meanwhile, the bidirectional models have stronger generalization ability than the unidirectional models.

## ACKNOWLEDGEMENTS

This study is supported by the Scientific Research Projects of IWHR (01882103, 01882104), China Three Gorges Corporation Research Project (Contract No: 202103044), National Natural Science Foundation of China (51679089), and Innovation Foundation of North China University of Water Resources and Electric Power for PhD graduates.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.