Accurate prediction of dam inflows is essential for effective water resources management in terms of both water quantity and quality. This study aims to develop a Long Short-Term Memory (LSTM) deep learning-based monthly dam inflow prediction model using large-scale climate indices. Six climate indices for the period 1981–2020, namely the Atlantic multidecadal oscillation (AMO), El Niño–Southern Oscillation (ENSO), North Atlantic oscillation (NAO), Pacific decadal oscillation (PDO), Niño 3.4, and Southern Oscillation Index (SOI), were used as input variables of the model. The proposed model was trained with 29 years of data (1981–2009) and tested with 12 years of data (2009–2020). We investigated 29 input data combinations to evaluate how predictive performance depends on the input dataset. Average values of the correlation coefficient (CC) ranged from 0.5 to 0.6, and average values of the root mean square error (RMSE) ranged from 40 to 80 cm at the three dams. Prediction performance decreased as the lead time increased. In addition, each dam showed different prediction results for different seasons. For example, the Soyangriver and Daecheong dams were predicted more accurately in the wet season than in the dry season, whereas the Andong dam was predicted more accurately in the dry season. These findings can support more efficient dam management using a data-driven approach.

  • A dam inflow prediction model was developed using the LSTM-based deep learning method and climate indices.

  • Six climate indices including AMO, ENSO, NAO, PDO, Niño 3.4, and SOI are considered as input variables.

  • The proposed model tests 29 different input combinations to find the best combinations for prediction.

  • The proposed model is applicable to predicting dam inflow variability at different locations.

Graphical Abstract


Climate change and global warming have made hydrological cycles more complex and are a major source of uncertainty in water resource planning. South Korea's topography and climate add to the difficulty: much of the country is mountainous, rivers have steep slopes, and 60–70% of total annual precipitation falls in the rainy season (June–August). Water resources therefore need to be managed efficiently according to seasonal water demand. To overcome these unfavorable geographical and climatic constraints, the Korean government secures water resources by constructing and operating dams for various purposes, such as hydroelectric power generation, flood damage mitigation, and agricultural water supply (Lee et al. 2019).

According to the sixth assessment report of the IPCC, published in 2021, the frequency and severity of floods and droughts around the world are projected to increase, and the variability of precipitation patterns and surface water flow is projected to intensify if global warming continues. Similarly, many studies in South Korea have shown that seasonal variability in precipitation and runoff intensifies because of climate change (Bae et al. 2008; Jung et al. 2013) and El Niño and La Niña (Jung & Kim 2022). Previous work also projected that the average annual runoff would increase due to increased precipitation and runoff during the rainy season. Thus, it is essential to develop an efficient water resource management plan that accounts for climate variability in dam operations to mitigate natural disasters such as floods and droughts in the future.

In South Korea, where precipitation is concentrated in summer due to the monsoon climate, dams are operated so that water is stored during the rainy season and discharged during the dry season. The management plan is therefore to lower the water level to the flood level before the rainy season and then refill the reservoir before the dry season. However, if the dam inflow is not accurately predicted, there is a risk of drought due to insufficient storage and of flood damage due to inappropriate discharge. Recently, the severity of droughts and floods in the dry and rainy seasons has increased due to climate change. Thus, accurate dam inflow prediction is essential to manage and operate multiple dams in South Korea efficiently.

Many studies have predicted dam inflow using diverse statistical methods (Kim et al. 2017; Bae et al. 2019; Hong et al. 2020; Han et al. 2021). For example, Awan & Bae (2014) developed an Adaptive Neuro-Fuzzy Inference System (ANFIS)-based model for dam inflow prediction for six major dams, including the Andong dam and Chungju dam in South Korea, using monthly precipitation, relative humidity, and air temperature. Eom & Jung (2019) conducted multiple linear regression analyses based on time series data for the Seomjin River dam. They found that the prediction accuracy for short-term inflow and peak inflow was high when hourly precipitation, discharge, and antecedent inflow rate were used as predictors. Noorbeh et al. (2020) used Bayesian Networks (BN) for monthly and annual inflow prediction. By integrating continuous BN, discrete BN, and K-means cluster analysis, they predicted the range of inflow better than with the traditional statistical method. Hong et al. (2020) used six machine learning-based algorithms to predict the inflow of the Soyangriver dam in South Korea. Their analysis identified the multilayer perceptron (MLP) as the best single algorithm. Still, no single algorithm predicted dam inflows accurately enough, and the best result was obtained when several algorithms were used together.

It is essential to investigate the correlation between dam inflow and climate factors. Tao et al. (2011) conducted a trend analysis between various climatic factors and dam inflow rates in the Tarim River basin in China. The results showed that the inflow rate increases under climatic factors such as relative humidity, air pressure, and surface temperature. Dorjsuren et al. (2018) investigated how precipitation and air temperature variability affect dam inflow in the Selenga basin. They reported that precipitation and air temperature increased, while the inflow rate showed a markedly decreasing trend. These trends contributed to understanding the relationship between various climate factors and dam inflow rates. Lu et al. (2019) analyzed the trend between inflow and climatic factors in the Yangtze River basin in China. Their analysis suggested that the watershed would face a severe drought crisis, because climate factors showed increasing trends while flow showed a decreasing trend. Identifying and quantifying natural climate change through the correlation between climate factors, rainfall, and flow is becoming increasingly important (Martel et al. 2018; Asakereh 2020).

In addition to regional climate features, teleconnection patterns of oceanic–atmospheric oscillations are closely linked to hydrologic variables such as streamflow, precipitation, and soil moisture in specific regions around the world (Kalra et al. 2013a; Niu et al. 2014; Lee et al. 2018; Jung & Kim 2022). These previous studies revealed that teleconnection indices such as the Pacific decadal oscillation (PDO), Southern Oscillation Index (SOI), El Niño–Southern Oscillation (ENSO), North Atlantic oscillation (NAO), and Atlantic multidecadal oscillation (AMO) are closely linked to hydrological factors around the world. To improve the efficiency of water resource management, it is necessary to plan a management strategy based on accurate forecasts with short- and long-term lead times.

Machine learning (ML) refers to computer algorithms that automatically find patterns in a given dataset without being explicitly programmed (Shavlik et al. 1990; Koza et al. 1996). With the improvement of observation systems and computing algorithms, ML models have been used to predict hydrological variables based on relationships between multiple variables (Kim et al. 2020). Moreover, ML-based models have shown higher predictive performance than physics-based models while requiring fewer parameters and less computation time, and they have been demonstrated to be an effective alternative to physics-based models (Behzad et al. 2009; Han et al. 2021; Kim et al. 2022).

Many previous studies have focused on predicting streamflow using an ML-based model and various large-scale climate indices. For example, Panahi et al. (2021) applied the MLP model to predict streamflow on the east coast of Peninsular Malaysia, using four indices (PDO, ENSO, NAO, and SOI) as inputs of the model. They showed that the ML-based model driven by large-scale climate indices had a high ability to predict monthly streamflow. Kalra et al. (2013b) developed a support vector machine-based streamflow prediction model incorporating large-scale climate indices such as PDO, NAO, AMO, and ENSO in the North Platte River Basin. They evaluated the prediction performance at lead times of 3, 6, and 9 months and found the best performance at a 6-month lead time. Moreover, Kalra & Ahmad (2009) presented a support vector machine-based streamflow prediction model for long lead time forecasting using PDO, NAO, AMO, and ENSO in the Upper Colorado River Basin. They tested the proposed model to generate streamflow volumes with a three-year lead time and showed that the predictions agree with measured streamflow volumes.

Therefore, this study proposes an ML-based model incorporating six large-scale climate indices to predict monthly inflow at three main dams in South Korea. Inflow predictions are made 1 to 6 months in advance for the three dams. In addition, we investigated the effect of input data combinations on predictive performance by applying 29 input datasets built from various combinations of the six climate indices (i.e., AMO, NAO, ENSO, PDO, Niño 3.4, and SOI).

Study area and weather data

Three major dams located in different watersheds in South Korea (i.e., Soyangriver dam, Daecheong dam, and Andong dam) were selected as the target points. Three gauging stations, one at each watershed outlet, provide monthly dam inflow and precipitation. The monthly dam inflow and precipitation data for 40 years (1981–2020) were obtained from the Korean Water Resources Management Information System (http://www.wamis.go.kr/) at the three stations (black triangles in Figure 1). Over the 40 years, the average monthly inflows at the Soyangriver, Daecheong, and Andong dams were 69.5 cm (max = 756.2 cm), 81.1 cm (max = 810.8 cm), and 30.5 cm (max = 335.7 cm), respectively. In addition, the average monthly precipitation at the three dams was 3.4 mm (max = 26.9 mm), 3.1 mm (max = 25.4 mm), and 3.2 mm (max = 19.9 mm), respectively. The climate of South Korea can be mainly divided into two seasons: the wet season (July–September) and the dry season (October–June). The wet season has a high monthly inflow, with an average of 158.2 cm for the Soyangriver dam, 172.1 cm for the Daecheong dam, and 64.9 cm for the Andong dam. During the wet season, the average dam inflow is approximately 2–2.5 times the overall average.
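As a quick arithmetic check of the ratio quoted above, the wet-season averages can be divided by the 40-year averages reported in this section. The dictionary layout and variable names below are illustrative only and are not part of the study's code.

```python
# Average monthly inflows quoted in this section (units as reported in the text).
overall_mean = {"Soyangriver": 69.5, "Daecheong": 81.1, "Andong": 30.5}
wet_season_mean = {"Soyangriver": 158.2, "Daecheong": 172.1, "Andong": 64.9}

# Ratio of wet-season average inflow to the overall average for each dam.
wet_to_overall = {
    dam: wet_season_mean[dam] / overall_mean[dam] for dam in overall_mean
}

for dam, ratio in wet_to_overall.items():
    print(f"{dam}: {ratio:.2f}")
```

All three ratios fall between about 2.1 and 2.3, which is consistent with the stated range of 2–2.5.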
Figure 1

Location of three dams in South Korea.


Climate variability data

In this study, six climate indices, including AMO (Enfield et al. 2001), ENSO (Curtis & Adler 2000), NAO (Barnston & Livezey 1987; van den Dool et al. 2000; Chen & Van den Dool 2003), PDO (Deser et al. 2016), Niño 3.4 (Rasmusson & Carpenter 1982), and SOI (Ropelewski & Jones 1987), were obtained from the National Oceanic and Atmospheric Administration (NOAA) Climate Prediction Center (CPC) (https://www.cpc.ncep.noaa.gov/data/indices/) and were used as input variables of the prediction model (Figure 2).
Figure 2

Time series of six large-scale climate indices from 1984 to 2020.


The AMO index is based on sea surface temperature (SST) anomalies in the North Atlantic region. It is a concise and straightforward index that provides information on multidecadal climate variability in the North Atlantic area. ENSO is the primary predictor for global climate disruptions and describes the anomalous state of tropical Pacific coupled ocean-atmosphere conditions. ENSO information is important for societal responses concerning water supply and public safety. NAO, a combination of the East Atlantic and West Atlantic teleconnection patterns, is one of the most prominent modes of climate variability. Positive and negative phases of the NAO are associated with various temperature and precipitation patterns worldwide. PDO is known as an El Niño-like pattern of Pacific climate variability. The positive and negative phases of the PDO have been classified as being either warm or cool, as defined by ocean temperature anomalies in the northeast and the tropical Pacific Ocean. Niño 3.4 is based on SST anomalies averaged across the Pacific and South American coast regions and is used as an index to define El Niño and La Niña events. SOI is based on sea level pressure (SLP) differences between Tahiti and Darwin, Australia. It measures the large-scale fluctuations in air pressure between the western and eastern tropical Pacific during El Niño and La Niña events.

LSTM model

The Long Short-Term Memory (LSTM) network, presented by Hochreiter & Schmidhuber (1997), is a recurrent neural network (RNN)-based algorithm. The LSTM model was introduced to solve the vanishing gradient and optimization problems that occur in the RNN model and to capture long-term dependencies across time steps in sequential time series data (Han & Morrison 2022; Roy et al. 2022). The LSTM model is an extensible method for sequence data tasks such as speech recognition and language translation. In the hydrologic field, it has been widely used to predict various hydrological factors, such as precipitation and runoff (Kratzert et al. 2018; Han & Morrison 2022; Roy et al. 2022).

A diagram of the structure of the LSTM cell is shown in Figure 3. In the figure, Ct and ht are the cell state and hidden state of the LSTM cell at time step t, and Ct−1 and ht−1 are the corresponding states at time step t − 1. Xt is the input vector at time step t. The LSTM cell contains three non-linear gates that regulate the data flow: a forget gate (ft), an input gate (it), and an output gate (Ot). These gates maintain and adjust the cell state (Ct) and hidden state (ht). The forget gate determines which information will be retained from the cell state (Ct−1) coming from the previous LSTM cell. The input gate determines which new information will be stored in the cell state (Ct). The output gate determines the final output value from the information in the cell state (Ct). The mathematical formulas of the three gates, cell state, and hidden state are as follows:
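$$f_t = \sigma\left(W_f \cdot [h_{t-1}, X_t] + b_f\right) \tag{1}$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, X_t] + b_i\right) \tag{2}$$
$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, X_t] + b_c\right) \tag{3}$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4}$$
$$O_t = \sigma\left(W_o \cdot [h_{t-1}, X_t] + b_o\right) \tag{5}$$
$$h_t = O_t \odot \tanh\left(C_t\right) \tag{6}$$
where $\sigma$ is the sigmoid activation function and $\odot$ denotes element-wise multiplication. tanh() is the hyperbolic tangent function. $\tilde{C}_t$ is a vector with values between −1 and 1. $W_f$, $W_i$, $W_c$, and $W_o$ are adjustable weight values of each gate and cell state. $b_f$, $b_i$, $b_c$, and $b_o$ are adjustable bias vectors. In this study, we used Python 3.6.13 with the Scikit-Learn 0.23.2 and Keras 2.3.1 libraries to perform the algorithm.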
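To make the gate updates concrete, the following is a minimal pure-Python sketch of one forward step of a standard LSTM cell with forget, input, and output gates. It is not the Keras implementation used in the study. The function names, the parameter dictionary layout, the random initialization, and the hidden size of 4 are all assumptions made for illustration, and the mapping of code lines to Equations (1)–(6) assumes the standard ordering of those equations.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W and vectors b, v."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM cell update (standard formulation, assumed Eqs (1)-(6))."""
    z = h_prev + x_t  # concatenation [h_{t-1}, X_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]      # forget gate
    i_t = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]      # input gate
    c_hat = [math.tanh(a) for a in affine(params["W_c"], params["b_c"], z)]  # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]     # new cell state
    o_t = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]      # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                       # new hidden state
    return h_t, c_t


def init_params(n_in, n_hidden, seed=0):
    """Hypothetical initializer: small uniform weights, zero biases."""
    rng = random.Random(seed)
    params = {}
    for gate in "fico":
        params[f"W_{gate}"] = [
            [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + n_in)]
            for _ in range(n_hidden)
        ]
        params[f"b_{gate}"] = [0.0] * n_hidden
    return params


# Example: 7 inputs (precipitation + six indices), 4 hidden units, 4 time steps
# of synthetic standard-normal inputs (not real data).
params = init_params(n_in=7, n_hidden=4)
h, c = [0.0] * 4, [0.0] * 4
data_rng = random.Random(1)
for _ in range(4):
    x = [data_rng.gauss(0.0, 1.0) for _ in range(7)]
    h, c = lstm_step(x, h, c, params)
```

Because the hidden state is the product of a sigmoid output and a tanh output, every element of `h` stays strictly between −1 and 1, regardless of the input scale.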
Figure 3

Conceptual diagram of the LSTM model.


Model development

The monthly observed precipitation and six climate variability indices (AMO, ENSO, NAO, PDO, Niño 3.4, and SOI) at time steps from ‘t − 3’ to ‘t’ are used to predict monthly inflow from ‘t + 1’ to ‘t + 6’ (where ‘t’ is in months) for the three dams in South Korea. To investigate the influence of different input variables, we developed 29 models with different combinations of inputs. For example, model 1 uses only precipitation data as input. In models 2–7, each of the six climate indices is used individually as the model input. Table 1 lists the 29 models with their combinations of input variables. In this study, monthly data from 1981 to 2009 (29 years) are used for model learning, and data from 2009 to 2020 (12 years) are used for model validation.
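The input and output layout described above (inputs at t − 3 to t, targets at t + 1 to t + 6) can be sketched as a sliding-window transformation. This is an illustrative sketch, not the study's preprocessing code; the function name `make_windows` and its arguments are hypothetical.

```python
def make_windows(features, inflow, n_lags=4, n_leads=6):
    """Build (input window, target vector) pairs from monthly series.

    features: list of per-month feature rows (e.g. precipitation + indices).
    inflow:   list of monthly inflow values, same length as features.
    Inputs cover months t-3..t (n_lags=4); targets cover t+1..t+6 (n_leads=6).
    """
    if len(features) != len(inflow):
        raise ValueError("features and inflow must have the same length")
    X, y = [], []
    # t is the index of the last input month in each sample.
    for t in range(n_lags - 1, len(inflow) - n_leads):
        X.append(features[t - n_lags + 1 : t + 1])
        y.append(inflow[t + 1 : t + 1 + n_leads])
    return X, y


# Example with placeholder values: 480 months correspond to 40 years.
features = [[m] for m in range(480)]
inflow = list(range(480))
X, y = make_windows(features, inflow)
```

With 480 months, 4 input months, and 6 target months per sample, this yields 471 samples. The first 3 months and the last 6 months cannot serve as the reference month t.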

Table 1

Twenty-nine models with different combinations of input variables

Models 1–15

| Indices  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 |
|----------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| PCP (a)  | ●  |    |    |    |    |    |    | ●  | ●  | ●  | ●  | ●  | ●  | ●  | ●  |
| ENSO     |    | ●  |    |    |    |    |    | ●  | ●  | ●  | ●  | ●  |    | ●  | ●  |
| NAO      |    |    | ●  |    |    |    |    | ●  | ●  | ●  | ●  |    | ●  | ●  | ●  |
| PDO      |    |    |    | ●  |    |    |    | ●  | ●  | ●  |    | ●  | ●  | ●  | ●  |
| Niño 3.4 |    |    |    |    | ●  |    |    | ●  | ●  |    | ●  | ●  | ●  | ●  |    |
| AMO      |    |    |    |    |    | ●  |    | ●  |    | ●  | ●  | ●  | ●  |    | ●  |
| SOI      |    |    |    |    |    |    | ●  |    | ●  | ●  | ●  | ●  | ●  |    |    |

Models 16–29

| Indices  | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 |
|----------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| PCP (a)  | ●  | ●  | ●  | ●  | ●  | ●  | ●  | ●  | ●  | ●  | ●  | ●  | ●  | ●  |
| ENSO     | ●  | ●  |    | ●  | ●  | ●  |    | ●  | ●  |    | ●  |    |    | ●  |
| NAO      | ●  |    | ●  | ●  | ●  |    | ●  | ●  |    | ●  |    | ●  |    | ●  |
| PDO      |    | ●  | ●  | ●  |    | ●  | ●  |    | ●  | ●  |    |    | ●  | ●  |
| Niño 3.4 | ●  | ●  | ●  |    | ●  | ●  | ●  |    |    |    | ●  | ●  | ●  | ●  |
| AMO      | ●  | ●  | ●  |    |    |    |    | ●  | ●  | ●  | ●  | ●  | ●  | ●  |
| SOI      |    |    |    | ●  | ●  | ●  | ●  | ●  | ●  | ●  | ●  | ●  | ●  | ●  |

Note: models 8–13 = drop one index; models 14–28 = drop two indices; model 29 = no drop.

(a) PCP = observed precipitation.
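The grouping described in the note to Table 1 can be reproduced by enumeration. The sketch below is only an illustration of that counting; the function name `build_input_sets` and the string labels are hypothetical, and the order of models within each group follows `itertools.combinations` rather than the exact column order of Table 1.

```python
from itertools import combinations

INDICES = ["ENSO", "NAO", "PDO", "Nino3.4", "AMO", "SOI"]


def build_input_sets(indices=INDICES):
    """Enumerate 29 input sets following the grouping in the note to Table 1."""
    sets = [["PCP"]]                          # model 1: precipitation only
    sets += [[name] for name in indices]      # models 2-7: one index each
    for n_drop in (1, 2):                     # models 8-13, then models 14-28
        for dropped in combinations(indices, n_drop):
            sets.append(["PCP"] + [n for n in indices if n not in dropped])
    sets.append(["PCP"] + list(indices))      # model 29: no index dropped
    return sets


input_sets = build_input_sets()
```

The count works out as 1 + 6 + 6 + 15 + 1 = 29, where 6 and 15 are the numbers of ways to drop one or two of the six indices.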

Evaluation strategy

This study used two statistical metrics, root mean square error (RMSE) and correlation coefficient (CC), to evaluate modeling performance. The RMSE measures the prediction error's standard deviation (SD), indicating a difference between predicted and actual values. The CC estimates how well the model predicts the outcomes, ranging from −1 to 1. The CC value of 0 indicates no correlation between observed and predicted values. The mathematical equations of the two metrics are as follows:
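$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{e,i} - y_{o,i}\right)^{2}} \tag{7}$$
$$\mathrm{CC} = \frac{\sum_{i=1}^{n}\left(y_{e,i} - \bar{y}_e\right)\left(y_{o,i} - \bar{y}_o\right)}{\sqrt{\sum_{i=1}^{n}\left(y_{e,i} - \bar{y}_e\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(y_{o,i} - \bar{y}_o\right)^{2}}} \tag{8}$$
where $y_e$ and $y_o$ denote the predicted and actual observation values, $\bar{y}_e$ and $\bar{y}_o$ are their mean values, and $n$ indicates the number of data samples used in the evaluation.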
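The two metrics can be written in a few lines of plain Python. This is a minimal sketch that assumes CC is the Pearson correlation coefficient; the function names are illustrative and the sample values are made up.

```python
import math


def rmse(y_pred, y_obs):
    """Root mean square error between predicted and observed values."""
    n = len(y_obs)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(y_pred, y_obs)) / n)


def correlation_coefficient(y_pred, y_obs):
    """Pearson correlation coefficient between predicted and observed values."""
    n = len(y_obs)
    mean_p = sum(y_pred) / n
    mean_o = sum(y_obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(y_pred, y_obs))
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    var_o = sum((o - mean_o) ** 2 for o in y_obs)
    return cov / math.sqrt(var_p * var_o)


# Made-up example: predictions with exactly half the observed amplitude.
observed = [10.0, 20.0, 30.0]
predicted = [5.0, 10.0, 15.0]
cc_half = correlation_coefficient(predicted, observed)
rmse_half = rmse(predicted, observed)
```

In this example the CC equals 1 even though the RMSE is clearly non-zero, so the two metrics are complementary: CC rewards matching the pattern, while RMSE penalizes errors in magnitude.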

Model training

The LSTM model was evaluated for its predictive ability after adjusting its hyperparameters. During model training, three LSTM hyperparameters (the number of previous time steps of input data, the number of LSTM neuron cells in each layer, and the batch size) were adjusted to obtain optimal performance of the model (Xiang et al. 2020; Han & Morrison 2022). Figure 4 shows the optimization results for the three parameters at the three dams. To evaluate the model's sensitivity to the number of previous time steps of input data, time steps from 1 to 12 months were assessed for the three dams. As shown in the figure, the best model performance was achieved for the Soyangriver dam with 3 months of time steps (average CC and RMSE values of 0.35 and 106.7 cm) and for the Daecheong and Andong dams with 1 month of time steps (average CC and RMSE values of 0.29 and 111.7 cm for the Daecheong dam and 0.33 and 33.7 cm for the Andong dam). We also tested the model's sensitivity to batch size and to the number of LSTM cells in each layer, each varied between 32 and 512. The best model performance was achieved with a batch size of 32 at all three dams and with 512, 64, and 512 cells for the Soyangriver, Daecheong, and Andong dams, respectively.
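One way to organize such a sensitivity test is an exhaustive grid search, sketched below. The text does not state whether the parameters were varied jointly or one at a time, so the full grid is an assumption, as are the intermediate values 64, 128, and 256 and the names `grid_search` and `dummy_evaluate`. The scoring function here is a stand-in that trains nothing; in practice it would train an LSTM and return a validation RMSE.

```python
from itertools import product

TIME_STEPS = range(1, 13)                 # 1 to 12 previous time steps
UNIT_OPTIONS = (32, 64, 128, 256, 512)    # assumed grid between 32 and 512
BATCH_OPTIONS = (32, 64, 128, 256, 512)   # assumed grid between 32 and 512


def grid_search(evaluate, time_steps=TIME_STEPS,
                units=UNIT_OPTIONS, batch_sizes=BATCH_OPTIONS):
    """Return (best_config, best_rmse) over the full grid.

    evaluate(n_steps, n_units, batch_size) must train a model and
    return its validation RMSE; it is supplied by the caller.
    """
    best_config, best_rmse = None, float("inf")
    for config in product(time_steps, units, batch_sizes):
        score = evaluate(*config)
        if score < best_rmse:
            best_config, best_rmse = config, score
    return best_config, best_rmse


# Stand-in scorer so the sketch runs without training anything.
# Its minimum is placed at (3, 512, 32) purely for demonstration.
def dummy_evaluate(n_steps, n_units, batch_size):
    return abs(n_steps - 3) + abs(n_units - 512) / 512 + abs(batch_size - 32) / 32


best_config, best_rmse = grid_search(dummy_evaluate)
```

Selecting on RMSE alone is a simplification; the study reports both CC and RMSE, and a combined criterion could be substituted in `evaluate`.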
Figure 4

Boxplots of test results with different parameters in terms of CC and RMSE. Each test includes 29 model runs. The ‘x’ marks inside the boxes denote the mean values, and dots represent outliers.


Dam inflow prediction using machine learning techniques

The model's predictive performance for dam inflow is evaluated graphically using the Taylor diagram (Taylor 2001). A Taylor diagram summarizes the differences between reference and simulated datasets and provides visual indicators of CC, RMSE, and SD in a single plot. Figure 5 shows the Taylor diagrams of monthly dam inflow predicted by the 29 models for the three stations with lead times from 1 to 6 months. As shown in Figure 5, the ranges of CC, RMSE, and SD for predictions with a lead time of 1 month are 0.6–0.7, 40–80 cm, and 25–100 cm for the Soyangriver dam; 0.4–0.6, 40–60 cm, and 45–60 cm for the Daecheong dam; and 0.4–0.5, 40–50 cm, and 15–30 cm for the Andong dam. At all three dams, the prediction performance shown in the Taylor diagrams tended to deteriorate as the lead time increased.
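The three statistics on a Taylor diagram are linked by a law-of-cosines relation (Taylor 2001): the squared centred RMS difference equals the sum of the two variances minus twice the product of the two SDs and the CC. The sketch below computes the statistics and checks this relation on made-up values. The function name is illustrative, and whether the RMSE values reported here are centred (bias removed) is not stated in the text, so that remains an assumption.

```python
import math


def taylor_statistics(y_pred, y_obs):
    """Return (sd_pred, sd_obs, cc, centred_rmse) for a Taylor diagram."""
    n = len(y_obs)
    mean_p = sum(y_pred) / n
    mean_o = sum(y_obs) / n
    dev_p = [p - mean_p for p in y_pred]   # anomalies of the prediction
    dev_o = [o - mean_o for o in y_obs]    # anomalies of the observation
    sd_p = math.sqrt(sum(d * d for d in dev_p) / n)
    sd_o = math.sqrt(sum(d * d for d in dev_o) / n)
    cc = sum(a * b for a, b in zip(dev_p, dev_o)) / (n * sd_p * sd_o)
    crmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(dev_p, dev_o)) / n)
    return sd_p, sd_o, cc, crmse


# Made-up values, only to check the geometric identity.
obs = [10.0, 20.0, 30.0, 40.0, 50.0]
pred = [12.0, 18.0, 33.0, 37.0, 45.0]
sd_p, sd_o, cc, crmse = taylor_statistics(pred, obs)
identity_gap = crmse ** 2 - (sd_p ** 2 + sd_o ** 2 - 2.0 * sd_p * sd_o * cc)
```

`identity_gap` is zero up to floating-point error, which is why a single point on the diagram can encode all three statistics at once.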
Figure 5

Taylor diagrams indicating predicted dam inflow at three stations with lead time from 1 to 6 months.

Figures 6 and 7 show the time series and boxplots of dam inflow (1- and 6-month lead times) predicted by the four models with the best prediction performance among the 29 models. At the Soyangriver dam, four models (i.e., M14, M15, M18, and M19) have the best predictive ability, with average CC and RMSE values of 0.67 and 79.3 cm for a lead time of 1 month and 0.56 and 89.4 cm for 6 months. At the Daecheong dam, four models (i.e., M01, M20, M23, and M27) show the best performance, with average CC and RMSE values of 0.57 and 90.8 cm for a lead time of 1 month and 0.52 and 102.4 cm for 6 months. In addition, at the Andong dam, four models (i.e., M01, M09, M23, and M27) show the best performance, with average CC and RMSE values of 0.51 and 30.9 cm for a lead time of 1 month and 0.52 and 31.7 cm for 6 months.
Figure 6

Time series and boxplots of observed and predicted dam inflows with a lead time of 1 month at three stations.

Figure 7

Time series and boxplots of observed and predicted dam inflows with a lead time of 6 months at three stations.


Consistent with the Taylor diagrams (Figure 5), the time series and boxplots also show that the predictive ability of the models decreased as the lead time increased. Among the three dams, models for the Soyangriver dam showed the highest predictive performance, and models for the Daecheong and Andong dams performed similarly to each other. Regardless of lead time, all models reproduced the observed inflow patterns well but underestimated the magnitude of high inflows.

Seasonal evaluation

The purpose of dam operation differs by season: water is discharged for flood management during the wet season and secured for drought prevention during the dry season. Thus, a seasonal evaluation of the dam inflow is required for efficient dam management. The performances of the four models selected in Section 3.2 were evaluated for inflow prediction during the wet and dry seasons (June–September for the wet season and October–May for the dry season).

The three dams showed different accuracy in inflow prediction for the two seasons (Figure 8). For example, the Soyangriver and Daecheong dams were predicted more accurately in the wet season than in the dry season, whereas the Andong dam was predicted more accurately in the dry season. The highest CC values are 0.59, 0.44, and 0.45 at the Soyangriver, Daecheong, and Andong dams, respectively. As mentioned in the previous section, the predicted inflow during the wet season was lower than the observations, which may cause the low CC values at each dam. More specifically, the observed inflows range from 0 to 700 cm at the Soyangriver and Daecheong dams and from 0 to 300 cm at the Andong dam. However, the range of inflows predicted by the model was only about half that of the observed values at each dam.
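A seasonal evaluation of this kind amounts to splitting the prediction and observation pairs by calendar month before computing the metrics. The sketch below uses the June–September wet season defined in this section; the function name, the dictionary layout, and the sample values are hypothetical.

```python
WET_MONTHS = {6, 7, 8, 9}  # June-September, as defined in this section


def split_by_season(months, y_pred, y_obs, wet_months=WET_MONTHS):
    """Group (predicted, observed) pairs into wet and dry seasons.

    months: calendar month (1-12) of each sample.
    Returns {"wet": (preds, obs), "dry": (preds, obs)}.
    """
    seasons = {"wet": ([], []), "dry": ([], [])}
    for month, pred, obs in zip(months, y_pred, y_obs):
        key = "wet" if month in wet_months else "dry"
        seasons[key][0].append(pred)
        seasons[key][1].append(obs)
    return seasons


# Placeholder data for one year: the sample value is just the month index.
months = list(range(1, 13))
predicted = [float(m) for m in range(12)]
observed = [float(m) for m in range(12)]
seasons = split_by_season(months, predicted, observed)
wet_pred, wet_obs = seasons["wet"]
dry_pred, dry_obs = seasons["dry"]
```

Each seasonal subset can then be passed to the same CC and RMSE calculations used for the full series, which makes the seasonal scores directly comparable with the overall ones.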
Figure 8

Scatter plots of predicted and observed dam inflows at three dams during wet (blue) and dry (red) seasons. Please refer to the online version of this paper to see this figure in colour: http://dx.doi.org/10.2166/ws.2023.012.


Impact of large-scale climate indices on dam inflow variability

Previous studies have shown that various large-scale climate indices influence hydrological processes such as precipitation and runoff globally. The application in this study indicated that large-scale climate indices are also correlated with dam inflow (i.e., runoff) in South Korea. Our models showed acceptable accuracy in predicting monthly inflows up to 6 months ahead, with CC of 0.5–0.7 and RMSE of 40–80 cm. Studies focused on other regions have reported predictive performance similar to that of our models. For example, Kalra & Ahmad (2009) presented a machine learning-based streamflow model for the Colorado River basin, US. They predicted annual streamflow with CC ranging from 0.5 to 0.8 and Nash–Sutcliffe Efficiency (NSE) ranging from 0.2 to 0.3. In addition, Lee et al. (2020) investigated the applicability of ML algorithms to predict monthly dam inflow with a 3-month lead time in South Korea. The suggested models achieved CC ranging from 0.6 to 0.9 and NSE values over 0.5. Although the prediction performance differs slightly by region and model, these studies indicate that large-scale climate variability is significantly correlated with regional hydrological processes. Thus, we suggest that including large-scale climate indices, along with regional climate and topographical characteristics, as predictors in data-driven models would help in analyzing and understanding hydrological processes.

Limitation of data scarcity when using a deep learning approach

Data-driven approaches such as deep learning and ML require a large amount of data. Thus, there is a limit to applying data-driven methods to areas with insufficient data: data scarcity lowers the model's accuracy and may also produce inappropriate prediction results. Large-scale climate indices can help to address the problem of regional data scarcity. As inputs of a data-driven model, the indices carry enough information to identify patterns of hydrological factors. However, input data with a low correlation with the target variable can instead degrade predictive performance, which is why this study tested various combinations of the six indices as inputs. In addition, further investigation and understanding of other climate indices not considered in this study are required for better prediction performance.

The model proposed in this paper uses the monthly large-scale climate indices to predict monthly dam inflow with a lead time from 1 to 6 months in the future for three dams in South Korea. Different combinations of indices were considered as input datasets of the model. The main results are as follows:

  • In this study, three LSTM hyperparameters (the number of previous time steps of input data, the number of LSTM neuron cells in each layer, and the batch size) were adjusted to obtain optimal performance of the model. We achieved the best model performance with 1–3 months of previous time steps, a batch size of 32, and 64–512 cells for the three dams.

  • The ranges of CC and RMSE for prediction results from the 29 models with a lead time of 1 month are 0.4–0.7 and 40–80 cm for the three dams. The prediction performance decreases as the lead time increases at all three dams.

  • The three dams showed different prediction results for the wet and dry seasons. The Soyangriver and Daecheong dams were predicted more accurately in the wet season than in the dry season, whereas the Andong dam was predicted more accurately in the dry season. Therefore, the seasonal performance of the models presented in this study should be taken into account when using the predicted dam inflows.

This study demonstrates that the LSTM model has sufficient abilities to predict the inflow of dams using large-scale climate indices in South Korea. Although the prediction performance may be slightly lower compared to when regional data are used, the model presented in this study has sufficient value to be used for regions with temporal and spatial limitations of ground-based observatories. Even though only one data-driven model and six climate indices were used in this study, we intend to apply other types of data-driven models and indices to improve the accuracy of hydrological prediction in the future.

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (Grant Number 2022R1A6A3A01086229).

All authors contributed to the study conception and design. H.H. and H.S.K. conceptualized the study; H.H., D.K., and W.W. prepared the methodology; wrote and prepared the original draft; H.H. and D.K. did formal analysis; H.H. and W.W. prepared the theoretical background and helped in visualization; H.S.K. wrote, reviewed, and edited the article. All authors read and approved the final manuscript.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Asakereh
H.
2020
Decadal variation in precipitation regime in northwest of Iran
.
Theoretical and Applied Climatology
139
(
1
),
461
471
.
https://doi.org/10.1007/s00704-019-02984-9
.
Awan
J. A.
&
Bae
D. H.
2014
Improving ANFIS based model for long-term dam inflow prediction by incorporating monthly rainfall forecasts
.
Water Resources Management
28
(
5
),
1185
1199
.
https://doi.org/10.1007/s11269-014-0512-7
.
Bae
D. H.
,
Jung
I. W.
&
Chang
H.
2008
Potential changes in Korean water resources estimated by high-resolution climate simulation
.
Climate Research
35
(
3
),
213
226
.
https://doi.org/10.3354/cr00704
.
Bae
Y.
,
Kim
J.
,
Wang
W.
,
Yoo
Y.
,
Jung
J.
&
Kim
H. S.
2019
Monthly inflow forecasting of Soyang River dam using VARMA and machine learning models
.
Journal of Climate Research
14
(
3
),
183
198
.
Barnston
A. G.
&
Livezey
R. E.
1987
Classification, seasonality and persistence of low-frequency atmospheric circulation patterns
.
Monthly Weather Review
115
(
6
),
1083
1126
.
https://doi.org/10.1175/1520-0493(1987)115<1083:CSAPOL > 2.0.CO;2
.
Behzad
M.
,
Asghari
K.
,
Eazi
M.
&
Palhang
M.
2009
Generalization performance of support vector machines and neural networks in runoff modeling
.
Expert Systems with Applications
36
(
4
),
7624
7629
.
https://doi.org/10.1016/j.eswa.2008.09.053
.
Chen
W. Y.
&
Van den Dool
H.
2003
Sensitivity of teleconnection patterns to the sign of their primary action center
.
Monthly Weather Review
131
(
11
),
2885
2899
.
https://doi.org/10.1175/1520-0493(2003)131<2885:SOTPTT > 2.0.CO;2
.
Curtis
S.
&
Adler
R.
2000
ENSO indices based on patterns of satellite-derived precipitation
.
Journal of Climate
13
(
15
),
2786
2793
.
https://doi.org/10.1175/1520-0442(2000)013<2786:EIBOPO > 2.0.CO;2
.
Deser
C.
,
Trenberth
K.
&
Staff
N. C. F. A. R.
2016
The Climate Data Guide: Pacific Decadal Oscillation(PDO): Definition and Indices
.
NC f. A. Research. Ed
.
Dorjsuren, B., Yan, D., Wang, H., Chonokhuu, S., Enkhbold, A., Yiran, X., Girma, A., Gedefaw, M. & Abiyu, A. 2018 Observed trends of climate and river discharge in Mongolia's Selenga sub-basin of the lake Baikal basin. Water 10 (10), 1436. https://doi.org/10.3390/w10101436.
Enfield, D. B., Mestas-Nuñez, A. M. & Trimble, P. J. 2001 The Atlantic multidecadal oscillation and its relation to rainfall and river flows in the continental US. Geophysical Research Letters 28 (10), 2077–2080. https://doi.org/10.1029/2000GL012745.
Eom, J. & Jung, K. 2019 Estimation of hourly dam inflow using time series data. Journal of the Korean Society of Hazard Mitigation 19 (2), 163–168.
Han, H. & Morrison, R. R. 2022 Improved runoff forecasting performance through error predictions using a deep-learning approach. Journal of Hydrology 608, 127653. https://doi.org/10.1016/j.jhydrol.2022.127653.
Han, H., Choi, C., Jung, J. & Kim, H. S. 2021 Application of sequence to sequence learning based LSTM model (LSTM-s2s) for forecasting dam inflow. Journal of Korea Water Resources Association 54 (3), 157–166. https://doi.org/10.3741/JKWRA.2021.54.3.157.
Hochreiter, S. & Schmidhuber, J. 1997 Long short-term memory. Neural Computation 9 (8), 1735–1780.
Hong, J., Lee, S., Bae, J. H., Lee, J., Park, W. J., Lee, D., Kim, J. & Lim, K. J. 2020 Development and evaluation of the combined machine learning models for the prediction of dam inflow. Water 12 (10), 2927. https://doi.org/10.3390/w12102927.
Jung, J. & Kim, H. S. 2022 Predicting temperature and precipitation during the flood season based on teleconnection. Geoscience Letters 9 (1), 1–37. https://doi.org/10.1186/s40562-022-00212-3.
Jung, I. W., Bae, D. H. & Lee, B. J. 2013 Possible change in Korean streamflow seasonality based on multi-model climate projections. Hydrological Processes 27 (7), 1033–1045. https://doi.org/10.1002/hyp.9215.
Kalra, A. & Ahmad, S. 2009 Using oceanic-atmospheric oscillations for long lead time streamflow forecasting. Water Resources Research 45 (3). https://doi.org/10.1029/2008WR006855.
Kalra, A., Ahmad, S. & Nayak, A. 2013a Increasing streamflow forecast lead time for snowmelt-driven catchment based on large-scale climate patterns. Advances in Water Resources 53, 150–162. https://doi.org/10.1016/j.advwatres.2012.11.003.
Kalra, A., Li, L., Li, X. & Ahmad, S. 2013b Improving streamflow forecast lead time using oceanic-atmospheric oscillations for Kaidu river basin, Xinjiang, China. Journal of Hydrologic Engineering 18 (8), 1031–1040. http://dx.doi.org/10.1061/(ASCE)HE.1943-5584.0000707.
Kim, S. H., So, J. M., Kang, S. U. & Bae, D. H. 2017 Development and evaluation of dam inflow prediction method based on Bayesian method. Journal of Korea Water Resources Association 50 (7), 489–502.
Kim, D., Kim, J., Kwak, J., Necesito, I. V., Kim, J. & Kim, H. S. 2020 Development of water level prediction models using deep neural network in mountain wetlands. Journal of Wetlands Research 22 (2), 106–112. https://doi.org/10.17663/JWR.2020.22.2.106.
Kim, D., Lee, J., Kim, J., Lee, M., Wang, W. & Kim, H. S. 2022 Comparative analysis of long short-term memory and storage function model for flood water level forecasting of Bokha stream in NamHan River, Korea. Journal of Hydrology 606, 127415. https://doi.org/10.1016/j.jhydrol.2021.127415.
Koza, J. R., Bennett, F. H., Andre, D. & Keane, M. A. 1996 Automated design of both the topology and sizing of analog electrical circuits using genetic programming. In: Artificial Intelligence in Design '96. Springer, Dordrecht, pp. 151–170. https://doi.org/10.1007/978-94-009-0279-4_9.
Kratzert, F., Klotz, D., Brenner, C., Schulz, K. & Herrnegger, M. 2018 Rainfall–runoff modelling using long short-term memory (LSTM) networks. Hydrology and Earth System Sciences 22 (11), 6005–6022. https://doi.org/10.5194/hess-22-6005-2018.
Lee, J. H., Lee, J. H. & Julien, P. Y. 2018 Global climate teleconnection with rainfall erosivity in South Korea. Catena 167, 28–43. https://doi.org/10.1016/j.catena.2018.03.008.
Lee, M. H., Im, E. S. & Bae, D. H. 2019 Future projection in inflow of major multi-purpose dams in South Korea. Journal of Wetlands Research 21 (spc), 107–116. https://doi.org/10.17663/JWR.2019.21.s-1.107.
Lee, D., Kim, H., Jung, I. & Yoon, J. 2020 Monthly reservoir inflow forecasting for dry period using teleconnection indices: a statistical ensemble approach. Applied Sciences 10 (10), 3470. https://doi.org/10.3390/app10103470.
Lu, J., Wang, G., Gong, T., Hagan, D. F. T., Wang, Y., Jiang, T. & Su, B. 2019 Changes of actual evapotranspiration and its components in the Yangtze River valley during 1980–2014 from satellite assimilation product. Theoretical and Applied Climatology 138 (3), 1493–1510. https://doi.org/10.1007/s00704-019-02913-w.
Martel, J. L., Mailhot, A., Brissette, F. & Caya, D. 2018 Role of natural climate variability in the detection of anthropogenic climate change signal for mean and extreme precipitation at local and regional scales. Journal of Climate 31 (11), 4241–4263. https://doi.org/10.1175/JCLI-D-17-0282.1.
Niu, J., Chen, J. & Sivakumar, B. 2014 Teleconnection analysis of runoff and soil moisture over the Pearl River basin in southern China. Hydrology and Earth System Sciences 18 (4), 1475–1492. https://doi.org/10.5194/hess-18-1475-2014.
Noorbeh, P., Roozbahani, A. & Kardan Moghaddam, H. 2020 Annual and monthly dam inflow prediction using Bayesian networks. Water Resources Management 34 (9), 2933–2951. https://doi.org/10.1007/s11269-020-02591-8.
Panahi, F., Ehteram, M., Ahmed, A. N., Huang, Y. F., Mosavi, A. & El-Shafie, A. 2021 Streamflow prediction with large climate indices using several hybrid multilayer perceptrons and copula Bayesian model averaging. Ecological Indicators 133, 108285. https://doi.org/10.1016/j.ecolind.2021.108285.
Rasmusson, E. M. & Carpenter, T. H. 1982 Variations in tropical sea surface temperature and surface wind fields associated with the Southern Oscillation/El Niño. Monthly Weather Review 110 (5), 354–384. https://doi.org/10.1175/1520-0493(1982)110<0354:VITSST>2.0.CO;2.
Ropelewski, C. F. & Jones, P. D. 1987 An extension of the Tahiti-Darwin southern oscillation index. Monthly Weather Review 115 (9), 2161–2165.
Roy, D. K., Sarkar, T. K., Kamar, S. S. A., Goswami, T., Muktadir, M. A., Al-Ghobari, H. M., Alataway, A., Dewidar, A. Z., El-Shafei, A. A. & Mattar, M. A. 2022 Daily prediction and multi-step forward forecasting of reference evapotranspiration using LSTM and Bi-LSTM models. Agronomy 12 (3), 594. https://doi.org/10.3390/agronomy12030594.
Shavlik, J. W. & Dietterich, T. G. 1990 Readings in Machine Learning. Morgan Kaufmann, California.
Tao, H., Gemmer, M., Bai, Y., Su, B. & Mao, W. 2011 Trends of streamflow in the Tarim River Basin during the past 50 years: human impact or climate change? Journal of Hydrology 400 (1–2), 1–9. https://doi.org/10.1016/j.jhydrol.2011.01.016.
Taylor, K. E. 2001 Summarizing multiple aspects of model performance in a single diagram. Journal of Geophysical Research: Atmospheres 106 (D7), 7183–7192. https://doi.org/10.1029/2000JD900719.
Van den Dool, H. M., Saha, S. & Johansson, A. 2000 Empirical orthogonal teleconnections. Journal of Climate 13 (8), 1421–1435. https://doi.org/10.1175/1520-0442(2000)013<1421:EOT>2.0.CO;2.
Xiang, Z., Yan, J. & Demir, I. 2020 A rainfall-runoff model with LSTM-based sequence-to-sequence learning. Water Resources Research 56 (1), e2019WR025326. https://doi.org/10.1029/2019WR025326.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).