Abstract
Accurate prediction of dam inflows is essential for effective water resources management in terms of both water quantity and quality. This study develops a Long Short-Term Memory (LSTM) deep learning-based monthly dam inflow prediction model using large-scale climate indices. Six climate indices for the period 1981–2020 were used as input variables of the model: Atlantic multidecadal oscillation (AMO), El Niño–southern oscillations (ENSO), North Atlantic oscillation (NAO), Pacific decadal oscillation (PDO), Niño 3.4, and Southern Oscillation Index (SOI). The proposed model was trained with 29 years of data (1981–2009) and tested with 12 years of data (2009–2020). We investigated 29 input data combinations to evaluate how predictive performance varies with the input dataset. Average metric values at the three dams ranged from 0.5 to 0.6 for the correlation coefficient (CC) and from 40 to 80 cm for the root mean square error (RMSE). Predictive performance decreased as the lead time increased. Each dam also showed different prediction results for different seasons. For example, the Soyangriver and Daecheong dams are predicted more accurately in the wet season than in the dry season, whereas the Andong dam is predicted well during the dry season. These findings can support more efficient dam management using a data-driven approach.
HIGHLIGHTS
A dam inflow prediction model was developed using the LSTM-based deep learning method and climate indices.
Six climate indices including AMO, ENSO, NAO, PDO, Niño 3.4, and SOI are considered as input variables.
The proposed model tests 29 different input combinations to find the best combinations for prediction.
The proposed model is applicable to predicting dam inflow variability at different locations.
INTRODUCTION
Climate change and global warming have made hydrological cycles more complex and play a critical role in causing uncertainty in water resource planning. South Korea's topographical and meteorological characteristics include high mountainous terrain, steep river slopes, and the concentration of 60–70% of total annual precipitation in the rainy season (June–August). Therefore, it is necessary to manage water resources efficiently according to seasonal water demand. To overcome these unfavorable geographical and climatic constraints on water resource management, the Korean government secures water resources by constructing and operating dams for various purposes such as the generation of hydroelectric power, flood damage mitigation, and water supply for the agricultural system (Lee et al. 2019).
In fact, according to the Sixth Assessment Report of the IPCC published in 2021, the frequency and severity of floods and droughts around the world are predicted to increase, and the variability of precipitation patterns and surface water flow to intensify, under scenarios in which global warming continues. Similarly, in South Korea, many studies have shown that seasonal variability in precipitation and runoff intensifies because of climate change (Bae et al. 2008; Jung et al. 2013) and El Niño and La Niña (Jung & Kim 2022). Previous studies projected that the average annual runoff would increase due to increased precipitation and runoff during the rainy season. Thus, it is essential to present an efficient water resource management plan that accounts for climate variability in dam operations to mitigate natural disasters such as floods and droughts in the future.
In South Korea, where precipitation is generally concentrated in summer due to the monsoon climate, dams are operated so that water is stored during the rainy season and discharged during the dry season. Therefore, the management plan is to lower the water level to the flood level before the rainy season and then refill the reservoir before the dry season. However, there is a risk of drought due to insufficient storage and of flood damage due to inaccurate discharge if the dam inflow rate is not accurately predicted. Recently, the severity of droughts and floods in the dry and rainy seasons has increased due to climate change. Thus, accurate dam inflow prediction is essential to manage and operate multiple dams in South Korea efficiently.
Numerous studies have addressed dam inflow prediction using diverse statistical methods (Kim et al. 2017; Bae et al. 2019; Hong et al. 2020; Han et al. 2021). For example, Awan & Bae (2014) developed an Adaptive Neuro-Fuzzy Inference System (ANFIS)-based model for dam inflow prediction for six major dams, including the Andong dam and Chungju dam in South Korea, using monthly precipitation, relative humidity, and air temperature. Eom & Jung (2019) conducted multiple linear regression analyses based on time series data for the Seomjin River dam. They found that the accuracy of short-term inflow and peak inflow predictions was high when hourly precipitation, discharge, and antecedent inflow rate were used as predictors. Noorbeh et al. (2020) used Bayesian Networks (BN) for monthly and annual inflow prediction. By integrating continuous BN, discrete BN, and K-means cluster analysis, they predicted the range of inflow better than with traditional statistical methods. Hong et al. (2020) used six machine learning-based algorithms to predict the inflow of the Soyangriver dam in South Korea. Their analysis identified the multilayer perceptron (MLP) as the optimal single algorithm. Still, it was limited in accurately predicting dam inflows, and the best result was obtained when several algorithms were used together.
It is essential to investigate the correlation between dam inflow and climate factors. Tao et al. (2011) conducted a trend analysis between various climatic factors and dam inflow rates in the Tarim River basin in China. The results showed that the inflow rate increases under climatic factors such as relative humidity, air pressure, and surface temperature. Dorjsuren et al. (2018) investigated how precipitation and air temperature variability affect dam inflow in the Selenga basin. They reported that precipitation and air temperature increased, while the inflow rate showed a remarkably decreasing trend. These trends contributed to understanding the relationship between various climate factors and dam inflow rates. Lu et al. (2019) analyzed the trend between inflow and climatic factors in the Yangtze River basin in China. Their analysis suggested that the watershed would face a severe drought crisis, as both an increasing trend in climate factors and a decreasing trend in flow were found. The importance of identifying and quantifying natural climate change through the correlation between climate factors and rainfall and flow is increasingly recognized (Martel et al. 2018; Asakereh 2020).
In addition to regional climate features, teleconnection patterns of oceanic–atmospheric oscillations are closely linked to hydrologic variables such as streamflow, precipitation, and soil moisture in specific regions around the world (Kalra et al. 2013a; Niu et al. 2014; Lee et al. 2018; Jung & Kim 2022). These studies identified the relevant teleconnection indices as Pacific decadal oscillation (PDO), Southern Oscillation Index (SOI), El Niño–southern oscillations (ENSO), North Atlantic oscillation (NAO), and Atlantic multidecadal oscillation (AMO). To improve the efficiency of water resource management, it is necessary to plan a management strategy based on accurate forecasts with short- and long-term lead times.
Machine learning (ML) refers to computer algorithms that automatically find patterns in a given dataset without being explicitly programmed (Shavlik et al. 1990; Koza et al. 1996). With the improvement of observation systems and computing algorithms, ML models have been used to predict hydrological variables based on relationships between multiple variables (Kim et al. 2020). Moreover, ML-based models show higher predictive performance than physics-based models with fewer parameters and low computation time, and they have been demonstrated to be an effective alternative to physics-based models (Behzad et al. 2009; Han et al. 2021; Kim et al. 2022).
Many previous studies have focused on predicting streamflow using ML-based models and various large-scale climate indices. For example, Panahi et al. (2021) applied the MLP model to predict streamflow on the east coast of Peninsular Malaysia, using four indices (PDO, ENSO, NAO, and SOI) as model inputs. They showed that the ML-based model with large-scale climate indices had a high ability to predict monthly streamflow. Kalra et al. (2013b) developed a support vector machine-based streamflow prediction model incorporating large-scale climate indices (PDO, NAO, AMO, and ENSO) in the North Platte River Basin. They evaluated the prediction performance with lead times of 3, 6, and 9 months and found the best performance at a 6-month lead time. Moreover, Kalra & Ahmad (2009) presented a support vector machine-based streamflow prediction model for long lead time forecasting using PDO, NAO, AMO, and ENSO in the Upper Colorado River Basin. They tested the proposed model to generate streamflow volumes with a three-year lead time and showed that the predictions agree with measured streamflow volumes.
Therefore, this study proposes an ML-based model incorporating six large-scale climate indices to predict monthly inflow at three main dams in South Korea. Inflow predictions are made 1 to 6 months in advance for the three dams. In addition, we investigated the effect of input data combinations on predictive performance by applying 29 input datasets built from various combinations of the six climate indices (i.e., AMO, NAO, ENSO, PDO, Niño 3.4, and SOI).
MATERIALS AND METHODS
Study area and weather data
Climate variability data
The AMO index is based on sea surface temperature (SST) anomalies in the North Atlantic region. It is a concise and straightforward index that provides information on multidecadal climate variability in the North Atlantic area. ENSO is the primary predictor of global climate disruptions and describes the anomalous state of coupled ocean–atmosphere conditions in the tropical Pacific. ENSO is an important factor informing societal responses for water supply and public safety. NAO, a combination of the East Atlantic and West Atlantic teleconnection patterns, is one of the most critical modes of climate variability. Positive and negative phases of the NAO are associated with various temperature and precipitation patterns worldwide. PDO is known as an El Niño-like pattern of Pacific climate variability. The positive and negative phases of the PDO have been classified as being either warm or cool, as defined by ocean temperature anomalies in the northeast and the tropical Pacific Ocean. Niño 3.4 is based on SST anomalies averaged over the central equatorial Pacific (5°N–5°S, 170°W–120°W) and is used as an index to define El Niño and La Niña events. SOI is based on sea level pressure (SLP) differences between two locations (conventionally Tahiti and Darwin, Australia). It measures the large-scale fluctuations in air pressure between the western and eastern tropical Pacific during El Niño and La Niña events.
LSTM model
The Long Short-Term Memory (LSTM) network, presented by Hochreiter & Schmidhuber (1997), is a recurrent neural network (RNN)-based algorithm. The LSTM model was introduced to address the vanishing-gradient problem and the optimization errors that occur in the RNN model, and to capture long-term dependence across many steps of sequential time series data (Han & Morrison 2022; Roy et al. 2022). The LSTM model is an extensible method for dealing with sequential data in tasks such as speech recognition and language translation. In the hydrologic field specifically, it has been widely used to predict various hydrological factors, such as precipitation and runoff (Kratzert et al. 2018; Han & Morrison 2022; Roy et al. 2022).
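For reference, the gating mechanism of an LSTM cell is commonly written in the textbook form below. This is the standard formulation, not necessarily the exact notation of the original article: the symbols (input x_t, hidden state h_t, cell state c_t, weight matrices W and U, biases b, logistic sigmoid σ, and element-wise product ⊙) are assumptions introduced here for illustration.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

The additive update of c_t is what lets gradients propagate over many time steps, which is the property the text refers to as capturing long-term dependence.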
Model development
Monthly observed precipitation and six climate variability indices (AMO, ENSO, NAO, PDO, Niño 3.4, and SOI) at time steps from 't − 3' to 't' are used to predict monthly inflow from 't + 1' to 't + 6' (where 't' is in months) for the three dams in South Korea. To investigate the influence of different input variables, we developed 29 models with different combinations of inputs. For example, model 1 uses only precipitation data as input, and models 2–7 each use one of the six climate indices. Table 1 lists the 29 models with their combinations of input variables. In this study, monthly data from 1981 to 2009 (29 years) are used for model training, and data from 2009 to 2020 (12 years) are used for model validation.
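The input and target layout described above can be sketched as a sliding window over the monthly series. This is a minimal illustration under stated assumptions, not the authors' code: the function name `make_windows`, the array shapes, and the synthetic placeholder data are hypothetical. Only the lag range (t − 3 to t) and the lead range (t + 1 to t + 6) come from the text.

```python
import numpy as np

def make_windows(features, inflow, n_lags=3, max_lead=6):
    """Build LSTM-style inputs (samples, n_lags + 1, n_features)
    and targets (samples, max_lead) from monthly series."""
    features = np.asarray(features, dtype=float)
    inflow = np.asarray(inflow, dtype=float)
    X, y = [], []
    # t needs n_lags earlier months and max_lead later months available
    for t in range(n_lags, len(inflow) - max_lead):
        X.append(features[t - n_lags : t + 1])       # months t-3 .. t
        y.append(inflow[t + 1 : t + 1 + max_lead])   # months t+1 .. t+6
    return np.array(X), np.array(y)

# Synthetic placeholder data (NOT the study data): 480 months, 7 input series
rng = np.random.default_rng(0)
features = rng.normal(size=(480, 7))
inflow = rng.gamma(shape=2.0, scale=50.0, size=480)

X, y = make_windows(features, inflow)
print(X.shape, y.shape)
```

Each column of `y` corresponds to one lead time, so performance can be evaluated separately for leads of 1 to 6 months.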
Table 1. Twenty-nine models with different combinations of input variables
. | Models . | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Indices . | 1 . | 2 . | 3 . | 4 . | 5 . | 6 . | 7 . | 8 . | 9 . | 10 . | 11 . | 12 . | 13 . | 14 . | 15 . |
PCPa | ● | ● | ● | ● | ● | ● | ● | ● | ● | ||||||
ENSO | ● | ● | ● | ● | ● | ● | ● | ● | |||||||
NAO | ● | ● | ● | ● | ● | ● | ● | ● | |||||||
PDO | ● | ● | ● | ● | ● | ● | ● | ● | |||||||
Niño 3.4 | ● | ● | ● | ● | ● | ● | ● | ||||||||
AMO | ● | ● | ● | ● | ● | ● | ● | ||||||||
SOI | ● | ● | ● | ● | ● | ● | |||||||||
. | Models . | ||||||||||||||
Indices . | 16 . | 17 . | 18 . | 19 . | 20 . | 21 . | 22 . | 23 . | 24 . | 25 . | 26 . | 27 . | 28 . | 29 . | - . |
PCPa | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | |
ENSO | ● | ● | ● | ● | ● | ● | ● | ● | ● | ||||||
NAO | ● | ● | ● | ● | ● | ● | ● | ● | ● | ||||||
PDO | ● | ● | ● | ● | ● | ● | ● | ● | ● | ||||||
Niño 3.4 | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | |||||
AMO | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | |||||
SOI | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● |
Note: models 8–13 = drop one index; models 14–28 = drop two indices; model 29 = no drop.
aPCP = observed precipitation.
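The grouping in the table note can be enumerated programmatically. The sketch below is an illustration under assumptions: the ordering of models within each group is hypothetical, and the choice to include precipitation in models 8–29 but not in models 2–7 is inferred from the mark counts in Table 1 (23 marks for PCP, 17 for each index) rather than stated in the text.

```python
from itertools import combinations

INDICES = ["ENSO", "NAO", "PDO", "Nino3.4", "AMO", "SOI"]

def build_input_sets(indices=INDICES, pcp="PCP"):
    """Return 29 lists of input variable names, one per model."""
    sets = [[pcp]]                              # model 1: precipitation only
    sets += [[idx] for idx in indices]          # models 2-7: one index each
    for n_drop in (1, 2):                       # models 8-13, then 14-28
        for dropped in combinations(indices, n_drop):
            kept = [i for i in indices if i not in dropped]
            sets.append([pcp] + kept)
    sets.append([pcp] + list(indices))          # model 29: no drop
    return sets

input_sets = build_input_sets()
print(len(input_sets))
```

Dropping one of six indices gives 6 combinations and dropping two gives 15, which matches the ranges 8–13 and 14–28 in the note.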
Evaluation strategy
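The results in this paper are reported as CC and RMSE. A minimal sketch of the two metrics follows, assuming CC denotes the Pearson correlation coefficient between observed and predicted inflow. The function names are hypothetical, and the definitions are the standard ones rather than equations quoted from the article.

```python
import numpy as np

def correlation_coefficient(obs, pred):
    """Pearson correlation between observed and predicted series."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    obs_anom = obs - obs.mean()
    pred_anom = pred - pred.mean()
    denom = np.sqrt((obs_anom ** 2).sum() * (pred_anom ** 2).sum())
    return float((obs_anom * pred_anom).sum() / denom)

def rmse(obs, pred):
    """Root mean square error, in the same unit as the inflow data."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))
```

CC measures how well the predicted pattern follows the observed one regardless of magnitude, while RMSE penalizes magnitude errors, so a model can score well on CC and still underestimate peak inflows.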
RESULTS
Model training
Boxplots of test results with different parameters in terms of CC and RMSE. Each test includes 29 model runs. The 'x' marks inside the boxes denote mean values and dots represent outliers.
Dam inflow prediction using machine learning techniques
Taylor diagrams indicating predicted dam inflow at three stations with lead time from 1 to 6 months.
Time series and boxplots of observed and predicted dam inflows with a lead time of 1 month at three stations.
Time series and boxplots of observed and predicted dam inflows with a lead time of 6 months at three stations.
As in the Taylor diagram (Figure 5), the time series and box plots also indicate that the prediction ability of the models decreased as the lead time increased. Among the three dams, models for the Soyangriver dam showed the highest predictive performance, and models for the Daecheong and Andong dams performed similarly. Regardless of lead time, all models reproduced the observed inflow patterns well but tended to underestimate the magnitude of high inflows.
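A Taylor diagram summarizes three statistics per model: the standard deviations of the observed and predicted series, their correlation, and the centered root mean square difference. The sketch below computes these quantities. It is an illustration with a hypothetical function name, not the plotting code used for the figures.

```python
import numpy as np

def taylor_statistics(obs, pred):
    """Statistics displayed on a Taylor diagram (population std, ddof=0)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    std_obs = float(obs.std())
    std_pred = float(pred.std())
    cc = float(np.corrcoef(obs, pred)[0, 1])
    # Centered RMS difference: means are removed before differencing
    diff = (pred - pred.mean()) - (obs - obs.mean())
    crmsd = float(np.sqrt(np.mean(diff ** 2)))
    return {"std_obs": std_obs, "std_pred": std_pred, "cc": cc, "crmsd": crmsd}
```

The three statistics satisfy crmsd² = std_obs² + std_pred² − 2 · std_obs · std_pred · cc, which is why they can be shown on a single two-dimensional diagram.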
Seasonal evaluation
Dam operation serves a different purpose in each season. For example, water is discharged for flood management during the wet season and secured for drought prevention during the dry season. Thus, a seasonal evaluation of dam inflow prediction is required for efficient dam management. The performances of the four models selected in Section 3.2 were evaluated for inflow prediction during the wet and dry seasons (June–September for the wet season and October–May for the dry season).
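The seasonal split defined above can be expressed as a month mask. This is a minimal sketch: the function name `split_by_season` and the example arrays are hypothetical, and only the month ranges (June–September wet, October–May dry) are taken from the text.

```python
import numpy as np

WET_MONTHS = (6, 7, 8, 9)  # June-September; all other months are dry

def split_by_season(months, values):
    """Split a monthly series into (wet, dry) arrays by calendar month."""
    months = np.asarray(months)
    values = np.asarray(values, dtype=float)
    wet_mask = np.isin(months, WET_MONTHS)
    return values[wet_mask], values[~wet_mask]

# Two placeholder years of monthly values (NOT study data)
months = np.tile(np.arange(1, 13), 2)
values = np.arange(24, dtype=float)
wet, dry = split_by_season(months, values)
print(len(wet), len(dry))
```

Applying the same mask to both observed and predicted inflow allows any metric to be computed separately for each season.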
Scatter plots of predicted and observed dam inflows at three dams during wet (blue) and dry (red) seasons. Please refer to the online version of this paper to see this figure in colour: http://dx.doi.org/10.2166/ws.2023.012.
DISCUSSIONS
Impact of large-scale climate indices on dam inflow variability
Previous studies show that various large-scale climate indices influence hydrological processes such as precipitation and runoff globally. The application in this study showed that large-scale climate indices are also correlated with dam inflow (i.e., runoff) in South Korea. Our model achieved acceptable accuracy in predicting monthly inflows up to 6 months ahead, with CC of 0.5–0.7 and RMSE of 40–80 cm. Studies of other regions report predictive performance similar to that of our models. For example, Kalra & Ahmad (2009) presented a machine learning-based streamflow model in the Colorado River basin, US. They reported high accuracy in predicting annual streamflow, with CC ranging from 0.5 to 0.8 and Nash–Sutcliffe Efficiency (NSE) ranging from 0.2 to 0.3. In addition, Lee et al. (2020) investigated the applicability of ML algorithms to predict monthly dam inflow with a 3-month lead time in South Korea. The suggested models achieved CC ranging from 0.6 to 0.9 and NSE values over 0.5. Although the prediction performance differs slightly by region and model, these studies indicate that large-scale climate variability is significantly correlated with regional hydrological processes. Thus, we suggest that including large-scale climate indices, along with regional climate and topographical characteristics, as predictors in data-driven models is an adequate way to analyze and understand hydrological processes.
Limitation of data scarcity when using a deep learning approach
A data-driven approach such as deep learning and ML requires a large amount of data. Thus, there is a limit to applying data-driven methods to areas with insufficient data: data scarcity lowers the model's accuracy and may produce inappropriate predictions. Large-scale climate indices can help address regional data scarcity. As inputs to a data-driven model, the indices carry enough information to identify patterns in hydrological factors. However, inputs with a low correlation with the target variable can instead degrade predictive performance. For this reason, this study tested various combinations of six indices as inputs. In addition, further investigation and understanding of other climate indices not considered in this study are required for better prediction performance.
CONCLUSIONS
The model proposed in this paper uses monthly large-scale climate indices to predict monthly dam inflow with lead times of 1 to 6 months for three dams in South Korea. Different combinations of indices were considered as input datasets of the model. The main results are as follows:
In this study, three LSTM parameters (the number of previous time steps of input data, the number of LSTM cells in each layer, and the batch size) were adjusted to obtain optimal model performance. We achieved the best performance for the three dams with 1–3 previous time steps (months), a batch size of 32, and 64–512 cells.
The ranges of CC and RMSE for prediction results from the 29 models with a lead time of 1 month are 0.4–0.7 and 40–80 cm, respectively, for the three dams. The prediction performance decreases as the lead time increases at all three dams.
The three dams showed different prediction results for the wet and dry seasons. The Soyangriver and Daecheong dams are predicted more accurately in the wet season than in the dry season, whereas the Andong dam is predicted well during the dry season. Therefore, dam inflow predictions should be used with the seasonal performance of the models presented in this study in mind.
This study demonstrates that the LSTM model is capable of predicting dam inflow using large-scale climate indices in South Korea. Although the prediction performance may be slightly lower than when regional data are used, the model presented in this study is valuable for regions where ground-based observations are limited in time and space. Only one data-driven model and six climate indices were used in this study. In the future, we intend to apply other types of data-driven models and indices to improve the accuracy of hydrological prediction.
ACKNOWLEDGEMENTS
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (Grant Number 2022R1A6A3A01086229).
AUTHOR CONTRIBUTION
All authors contributed to the study conception and design. H.H. and H.S.K. conceptualized the study; H.H., D.K., and W.W. prepared the methodology; wrote and prepared the original draft; H.H. and D.K. did formal analysis; H.H. and W.W. prepared the theoretical background and helped in visualization; H.S.K. wrote, reviewed, and edited the article. All authors read and approved the final manuscript.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.