This paper evaluates two deep learning algorithms, the convolutional neural network (CNN) and long short-term memory (LSTM) network, for predicting the hydrological regime of the 3S River Basin under various climate change scenarios. Climate models CMCC-CMS, HadGEM-AO2, and MIROC5 were used to predict future climate and streamflow for three future periods: near-future (2020–2050), mid-future (2050–2080), and far-future (2080–2100) under two Representative Concentration Pathways (RCPs), 4.5 and 8.5. The future projection shows an increase in mean annual temperature from 0.08 to 4.3 °C by CMCC-CMS, from 0.13 to 4.4 °C by HadGEM-AO2, and from −0.07 to 4.2 °C by MIROC5. Similarly, the annual precipitation is projected to change by 13.3 to 62.5% by CMCC-CMS, by −12.4 to 26.1% by HadGEM-AO2, and by 6.9 to 49% by MIROC5. Streamflow is expected to show an increasing trend in the Srepok and Sesan Rivers, while the Sekong is projected to have reduced streamflow. The ML models predicted increasing flood risk in the Sekong and Sesan catchments, indicated by an increase in the Q5 index in the future, but a decrease in the Srepok.

  • Machine learning (ML) models were used to predict the hydrological changes in the 3S River Basin.

  • The 3S River Basin is expected to become warmer and to experience fluctuating rainfall patterns in the future.

  • The Srepok and Sesan Rivers are expected to have an increasing trend of streamflow while the Sekong is projected to have reduced streamflow.

  • ML models predicted the increasing flood risk in the Sekong and Sesan catchments.

The significance of precise streamflow modeling cannot be overstated in hydrological modeling, as it holds crucial implications for water resource planning, including dam construction, water resource allocation, catchment area management, and flood control. However, the inherent complexity of the hydrological system, marked by dynamic variables, inconsistent data quality, and a web of sensitive parameters in non-linear relationships, poses a substantial challenge (Akhtar et al. 2009; Bourdin et al. 2012; Cui & Singh 2015; Zaghloul et al. 2022). To the best of our knowledge, there is not yet a single, universally superior model that performs optimally under all conditions and catchment characteristics. Current research emphasizes the development of robust and flexible models that can yield improved performance based on historical data (Mohammadi 2021).

Despite these challenges, the research community continues to explore solutions. Machine learning (ML) approaches have shown promise for simulating complex hydrological processes (Adnan et al. 2021). ML algorithms have been widely used in health care (Tran et al. 2021; Widiputra 2021), 3D modeling analysis (Pepe et al. 2021; Khayyal et al. 2022), streamflow prediction (Ghimire et al. 2021; Lin et al. 2021; Singh et al. 2023), urban flood susceptibility prediction (Ekwueme 2022), weather forecasting (Chen et al. 2022), and climate change projection (Obahoundje et al. 2022; Prodhan et al. 2022; Nguyen et al. 2023). In streamflow modeling, ML algorithms bypass the explicit representation of complex physical processes and simulate rainfall-runoff with fewer variables. These data-driven models mimic hydrological processes by assimilating measured inputs (Chang & Chang 2006; Mosavi et al. 2018; Liu et al. 2020). Convolutional neural networks (CNNs) and long short-term memory (LSTM) networks are preferred for streamflow prediction due to their ability to handle time series data and express non-linear relationships (Chang & Chang 2006; Bourdin et al. 2012; Nguyen et al. 2015; Abadi et al. 2016; Assem et al. 2017; Nguyen 2022). These models learn the non-linearity of streamflow solely from historical data, and their ease of use makes them popular and quick to develop (Mosavi et al. 2018). Several studies have applied ML and artificial intelligence (AI) in this way to predict future streamflow, demonstrating the advantage of the CNN and LSTM architectures, among others (Nguyen et al. 2015; Assem et al. 2017; Duan et al. 2020; Liu et al. 2020; Ghimire et al. 2021). In addition, Jabbari & Bae (2018) and He et al. (2021) demonstrated the capability of ML techniques in capturing the complex relationships between different elements of hydrometeorological systems, and Duan et al. (2020) used a CNN to simulate precipitation, temperature, solar radiation, and streamflow.

Although ML and AI are recognized as advanced technologies in streamflow forecasting, most research has focused on short-term predictions, such as 6 h (Hu et al. 2018), 1 day (Duan et al. 2020; Le et al. 2021), 5 days (Nguyen et al. 2015), up to 9 months (Ghimire et al. 2021), and 2 years (Liu et al. 2020). Long-term prediction applications remain relatively unexplored, presenting a considerable research gap (Assem et al. 2017). In addition, as alluded to previously, streamflow modeling significantly varies across different basins due to their unique characteristics. These distinguishing features include topography, regional climate, and variations in land use and land cover. Consequently, it becomes imperative to develop and apply customized streamflow models tailored to the specific conditions of each basin.

Locally, the 3S River Basin holds a prominent position in hydrological research due to its significant contribution to the Lower Mekong River flow and the reservoir development initiatives it supports (Ngo et al. 2018). Numerous studies have been conducted in this area, focusing on assessing and projecting hydrological changes within the region. For instance, Trang et al. (2017) utilized the Soil and Water Assessment Tool (SWAT) to investigate the climate change impacts on hydrology in the 3S River Basin across five different general circulation models (GCMs). Similarly, Pradhan et al. (2022) employed SWAT and CROPWAT software to evaluate the influence of human activities and climate change on water resources in the Srepok sub-basin. Ty et al. (2012) implemented the hydrological model (HEC-HMS) to project future hydrological variables under the ECHAM4 GCM.

Despite the extensive research conducted in the area, there is a noticeable lack of studies applying ML to streamflow modeling in this basin. Because ML technologies have been successfully applied in various other hydrological contexts, this absence represents a significant research gap. Thus, creating novel ML prediction models to capture hydrological variations under different scenarios is an essential advancement. Such models can provide valuable insights to decision-makers, enhancing their capacity to make more informed, future-proof choices for water resources management (Evers & Pathirana 2018; Liu et al. 2020).

This study aims to (i) create climate change scenarios for the 3S River Basin based on GCMs; (ii) develop ML models (CNN and LSTM) to predict daily streamflow using future rainfall and temperature; and (iii) analyze the future streamflow in the 3S (Sekong, Sesan, and Srepok) River Basin under climate change scenarios using these ML models. An ensemble of three GCMs is used for climate change projection over three future periods, near-future (2020–2050), mid-future (2050–2080), and far-future (2080–2100), compared to the baseline period (1980–2005).

The Sekong, Sesan, and Srepok Rivers, collectively known as the 3S River Basin, flow through Vietnam, Lao-PDR, and Cambodia and are the most significant tributaries of the Lower Mekong River Basin (Figure 1). The 3S River Basin, which spans 78,529 km², accounts for around 10% of the Mekong River Basin and contributes approximately one-quarter of the mean annual flow and nearly 15% of the suspended sediment. Among the three tributaries, the Sesan and Srepok Rivers originate from the central highlands of Vietnam. In contrast, the Sekong River originates from Lao-PDR, with a negligible portion of the basin located in Vietnam. Besides supporting 3.4 million people within the basin, the 3S contributes significantly to the flow, fish migration to Tonle Sap Lake, and sediment transport in the Mekong Delta (IUCN 2016). The International Union for Conservation of Nature (IUCN) highlights the economic benefits of the 3S, indicating that fish and aquatic animals contribute 10 million USD in annual revenue to Cambodia alone (Constable 2015).
Figure 1

The elevation variation in the study area (3S River Basin), along with the hydrometeorological stations.


The climate of the 3S River Basin is dictated by its topography and the seasonal monsoon, with a strong orographic effect due to the Annamite Mountains along the eastern boundary. Owing to the elevation difference, ranging from 43 to 2,409 m above mean sea level (masl), the average annual precipitation within the basin varies from 2,800 mm in the upper Sekong to around 2,500 mm in the lowland of the Srepok River Basin. Similarly, the average daily temperature in the basin ranges from 7.8 to 32.8 °C, with the Sekong being slightly cooler than the Sesan and Srepok. Streamflow in the 3S River Basin is highly seasonal unless controlled by upstream reservoir operation. The observed discharge at three selected hydrological stations, Attapeu, Kon Tum, and Bandon, indicates the typical flow regime of the Sekong, Sesan, and Srepok, respectively. The average annual flow at these stations is 425.6, 95.8, and 277.9 m³/s, respectively, with the Sekong being much larger, followed by the Srepok and Sesan. The peak flow in all three tributaries occurs in August, September, and October.

This study used 21 years (1985–2005) of observed daily mean temperature and precipitation data for climate change projection, together with 21 years (1985–2005) of observed daily discharge data from three selected stations: Attapeu, Kon Tum, and Bandon (Figure 1). All observed hydrometeorological data were acquired from the Mekong River Commission (MRC). To improve the quality of the training and testing (validation) data, the records were cleaned by removing errors and duplicates and by removing or filling in missing values. The mean of nearby stations was used to fill in the missing data if the missing data was higher than 10% of the total sample (Yang et al. 2017). For climate change projection, three GCMs from the Coupled Model Intercomparison Project Phase 5 (CMIP5) were selected (Table 1): the Centro Euro-Mediterraneo sui Cambiamenti Climatici Climate Model (CMCC-CMS), the Model for Interdisciplinary Research on Climate (MIROC5), and the Hadley Centre Global Environmental Model (HadGEM-AO2). They were chosen for their ability to simulate the most realistic onset timing, as suggested in the literature (Hasson et al. 2016; Ruan et al. 2018; Ruan et al. 2019). For each GCM, two RCPs, RCP4.5 and RCP8.5, were chosen, corresponding to medium (4.5 W/m²) and high (8.5 W/m²) radiative forcing scenarios.

Table 1

Schematic of the observed hydrometeorological and climate model data used

| Category | Data | Spatio-temporal resolution | Duration | Source/Developer |
|---|---|---|---|---|
| Hydrometeorological data | Temperature | Point/Daily | 1980–2005 | MRC |
| Hydrometeorological data | Rainfall | Point/Daily | 1981–2005 | MRC |
| Hydrometeorological data | Discharge | Point/Daily | 1985–2005 | MRC |
| GCMs' data (Historical and Future: RCP4.5 and 8.5) | CMCC-CMS | 0.5°/Daily | 1980–2100 | CMCC-CM |
| GCMs' data (Historical and Future: RCP4.5 and 8.5) | MIROC5 | 1.41°/Daily | 1980–2100 | CCSR, NIES, JAMSTEC-FRCGC |
| GCMs' data (Historical and Future: RCP4.5 and 8.5) | HadGEM2-AO | 1.9° × 1.25°/Daily | 1980–2100 | Hadley Center, UKMO |

Note: MRC, Mekong River Commission, Lao-PDR; CMCC-CM, Centro Euro-Mediterraneo sui Cambiamenti Climatici Climate Model, Italy; CCSR, Center for Climate System Research, University of Tokyo; NIES, National Institute for Environmental Studies, Japan; JAMSTEC-FRCGC, Japan Agency for Marine-Earth Science and Technology Frontier Research Center for Global Change, Japan.

Beyond temperature and rainfall, factors such as human activities and land use and land cover (LULC) play significant roles in climate change processes. Human activities, including urbanization, deforestation, and varied agricultural practices, can profoundly influence the greenhouse gas balance and the albedo effect, directly impacting global and regional climate patterns (Pradhan et al. 2022). Moreover, LULC changes can affect local microclimates and contribute to or mitigate broader climate change, depending on the specifics of those changes (Ghaderpour et al. 2023). However, due to the complexities and difficulties associated with data collection and interpretation, these critical elements are frequently not adequately accounted for in current climate models. Future research endeavors should include these variables for more accurate and comprehensive climate change assessments.

Figure 2 shows the overall methodological framework used in this study. Firstly, the future climate scenarios were projected by downscaling meteorological data from chosen GCMs. Then, we predicted the future streamflow from future climate scenarios of meteorological data using ML models. We employed observed river discharge for training and testing the efficiency of ML models. Finally, we assessed the change in future hydrology in the study area using two ML models. Detailed explanations of each method are given in subsequent sections.
Figure 2

Overall methodological framework (left panel). The top and bottom right panels show a schematic of CNN and LSTM models.


Climate change projection

Before climate change projection, the data from all selected GCMs were bias-corrected using a simple linear scaling technique (Equations (1)–(4); Shrestha et al. 2017). Then, all the climate variables were projected for three future periods (near-, mid-, and far-future) and compared to the baseline period. This time frame supports long-term water resources planning and adaptation. Precipitation and temperature were analyzed for mean annual change during the different future periods, while the hydrological projection under climate change was also analyzed for high (Q5) and low (Q95) flows.
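$$P^{*}_{his}(d) = P_{his}(d) \cdot \frac{\mu_m\{P_{obs}(d)\}}{\mu_m\{P_{his}(d)\}} \tag{1}$$

$$P^{*}_{fut}(d) = P_{fut}(d) \cdot \frac{\mu_m\{P_{obs}(d)\}}{\mu_m\{P_{his}(d)\}} \tag{2}$$

$$T^{*}_{his}(d) = T_{his}(d) + \mu_m\{T_{obs}(d)\} - \mu_m\{T_{his}(d)\} \tag{3}$$

$$T^{*}_{fut}(d) = T_{fut}(d) + \mu_m\{T_{obs}(d)\} - \mu_m\{T_{his}(d)\} \tag{4}$$

where P and T represent precipitation and temperature, and μm represents the long-term monthly mean. An asterisk (*) as a superscript denotes bias-corrected values. The subscripts obs, his, and fut denote observed ground station data, historical raw GCM data, and future raw GCM data, respectively, and d denotes the corresponding value at the daily time-step.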
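The linear scaling step can be illustrated with a minimal sketch. It assumes the usual form of the method (a multiplicative monthly factor for precipitation and an additive monthly offset for temperature); the function names, the toy data, and the handling of months with zero GCM precipitation are illustrative assumptions, not details taken from this study.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(months, values):
    """Long-term mean of `values` for each calendar month (1-12)."""
    buckets = defaultdict(list)
    for month, value in zip(months, values):
        buckets[month].append(value)
    return {month: mean(vals) for month, vals in buckets.items()}


def linear_scaling(obs, his, target, months, target_months, kind):
    """Bias-correct a daily GCM series (`target`) with monthly linear scaling.

    obs, his : observed and raw historical GCM series on the same days (`months`)
    kind     : "precip" scales by mu_obs / mu_his; "temp" shifts by mu_obs - mu_his
    """
    mu_obs = monthly_means(months, obs)
    mu_his = monthly_means(months, his)
    corrected = []
    for month, value in zip(target_months, target):
        if kind == "precip":
            # Assumption: leave values unchanged when the GCM monthly mean is zero.
            factor = mu_obs[month] / mu_his[month] if mu_his[month] > 0 else 1.0
            corrected.append(value * factor)
        elif kind == "temp":
            corrected.append(value + (mu_obs[month] - mu_his[month]))
        else:
            raise ValueError("kind must be 'precip' or 'temp'")
    return corrected


# Toy data: two January days and two July days.
months = [1, 1, 7, 7]
obs_p, his_p, fut_p = [2.0, 4.0, 10.0, 14.0], [1.0, 2.0, 8.0, 8.0], [1.0, 3.0, 6.0, 10.0]
obs_t, his_t, fut_t = [20.0, 22.0, 27.0, 29.0], [18.0, 20.0, 27.0, 27.0], [21.0, 22.0, 30.0, 31.0]

corr_p = linear_scaling(obs_p, his_p, fut_p, months, months, "precip")
corr_t = linear_scaling(obs_t, his_t, fut_t, months, months, "temp")
print(corr_p)  # [2.0, 6.0, 9.0, 15.0]
print(corr_t)  # [23.0, 24.0, 31.0, 32.0]
```

Passing the historical GCM series as `target` gives the bias-corrected historical values; passing a future series gives the bias-corrected projection with the same monthly factors.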

ML models

Convolutional neural network (CNN)

CNN is a deep learning algorithm inspired by the human brain's neural architecture, which can recognize patterns and their interaction with the environment. A typical convolutional neural network consists of three types of layers: (1) the convolutional layer (Conv layer), which extracts the features from the input; (2) the pooling layer, which is responsible for down-sampling along the spatial dimension; and (3) the fully connected layer (dense layer), which uses the output of the Conv and pooling layers to generate the final output. The Conv layer applies a convolution operation to the input. For instance, for an input image X and a filter f, the output is defined as Z = X * f. Similarly, the pooling layer applies a function such as the maximum, which returns the largest value among all the elements covered by the filter window. Within the fully connected layer, different transformation functions, either linear (Equation (5)) or non-linear such as the sigmoid function (Equation (6)), transform the input vector from the previous layer to the next one.

$$Z = W^{T} X + b \tag{5}$$

$$f(x) = \frac{1}{1 + e^{-x}} \tag{6}$$

where X is the input from the previous layer, W is the weight matrix, and b is the constant bias; e is Euler's number, and x is the value passed to the sigmoid function.
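To make the three building blocks concrete, the sketch below runs a tiny input through a convolution, a max-pooling step, and a dense neuron with a sigmoid. It is a one-dimensional simplification (the study itself uses 2D convolutional layers), and the input values, filter, and weights are made up for illustration.

```python
import math


def conv1d_valid(x, f):
    """Slide filter f over x without padding (cross-correlation, as in deep learning libraries)."""
    k = len(f)
    return [sum(x[i + j] * f[j] for j in range(k)) for i in range(len(x) - k + 1)]


def max_pool1d(x, size):
    """Non-overlapping max pooling: keep the largest value in each window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def dense(x, w, b):
    """Linear transform z = w . x + b for a single output neuron."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sigmoid(z):
    """Squash any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


x = [1.0, 2.0, 0.0, 3.0, 1.0]
features = conv1d_valid(x, [1.0, -1.0])   # differences between neighbouring values
pooled = max_pool1d(features, 2)          # down-sampled feature map
y = sigmoid(dense(pooled, [0.5, -0.5], 0.0))

print(features)  # [-1.0, 2.0, -3.0, 2.0]
print(pooled)    # [2.0, 2.0]
print(y)         # 0.5
```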

Long short-term memories (LSTM) networks

LSTM is a kind of recurrent neural network (RNN) proposed by Hochreiter & Schmidhuber (1997), which overcomes the drawback of traditional feed-forward neural networks. It allows data to cycle through the network more stably and efficiently to deal with non-linear, long-range, time-varying problems. A gate mechanism allows the network to store a summary of past input sequences, enabling the model to learn long-term dependencies by adding three types of gates to the memory unit of the RNN: the input gate, the forget gate, and the output gate. Since LSTM is derived from the RNN, it maintains the connections between the hidden layers and enhances the quality of the connection between the back and front nodes through the gates. The theory behind the LSTM is described by Equations (7)–(12):
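$$f_t = \sigma\left(W_f \cdot [H_{t-1}, x_t] + b_f\right) \tag{7}$$

$$I_t = \sigma\left(W_i \cdot [H_{t-1}, x_t] + b_i\right) \tag{8}$$

$$\tilde{C}_t = \tanh\left(W_c \cdot [H_{t-1}, x_t] + b_c\right) \tag{9}$$

$$C_t = f_t * C_{t-1} + I_t * \tilde{C}_t \tag{10}$$

$$O_t = \sigma\left(W_o \cdot [H_{t-1}, x_t] + b_o\right) \tag{11}$$

$$H_t = O_t * \tanh(C_t) \tag{12}$$

where ft represents the forget gate, It represents the input gate, C̃t represents the candidate cell state, Ct represents the cell state (memory), Ot represents the output gate, and xt represents the input at the current time-step (t). Ht−1 and Ht represent the outputs of the previous and current LSTM blocks, respectively. In addition, Wx and bx represent the weights and biases of the respective gate (x) neurons, while σ represents the sigmoid function.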

In the LSTM model, the forget gate ft determines how much of the older memory state Ct−1 is retained (Equation (7)). The input gate It combines xt and Ht−1 and passes them through the sigmoid function, as shown in Equation (8). Then, an activation function such as the hyperbolic tangent (tanh) function (Equation (9)) is used to create a candidate vector, which is added to the memory to update the older memory state Ct−1, as shown in Equation (10). Next, the sigmoid layer decides which part of the memory state will be output (Equation (11)). Finally, the memory unit calculates the final output using Equation (12), which is a filtered version of the cell state. In summary, the weight matrices W(f; i; c; o) and bias vectors b(f; i; c; o) in Equations (7)–(12) are updated iteratively in the LSTM network through the Backpropagation Through Time (BPTT) algorithm. Because the network actively chooses useful information to store and rejects uninformative information, LSTM provides a better solution to the exploding and vanishing gradient problems that RNNs face.
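The gate sequence can be traced with a minimal sketch of one LSTM time step. For readability it assumes a single hidden unit and a scalar input, so each gate has just a recurrent weight, an input weight, and a bias; the parameter layout and values are hypothetical and only illustrate the standard LSTM update.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step for a single hidden unit and a scalar input.

    params maps each gate name ("f", "i", "c", "o") to (w_h, w_x, b).
    """
    def gate(name, activation):
        w_h, w_x, b = params[name]
        return activation(w_h * h_prev + w_x * x_t + b)

    f_t = gate("f", sigmoid)            # forget gate: how much old memory to keep
    i_t = gate("i", sigmoid)            # input gate: how much new candidate to admit
    c_hat = gate("c", math.tanh)        # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat    # updated memory
    o_t = gate("o", sigmoid)            # output gate
    h_t = o_t * math.tanh(c_t)          # filtered version of the cell state
    return h_t, c_t


# With all weights and biases at zero, every gate equals 0.5 and the candidate is 0.
zero_params = {name: (0.0, 0.0, 0.0) for name in "fico"}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
print(c)  # 1.0, i.e. half of the previous memory is kept

# Processing a sequence simply carries (h, c) from one step to the next.
h_seq, c_seq = 0.0, 0.0
for x in [0.2, 0.5, 0.1]:
    h_seq, c_seq = lstm_step(x, h_seq, c_seq, {name: (0.1, 0.3, 0.0) for name in "fico"})
```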

Model development

This study used TensorFlow, a deep learning library for Python, to develop both the CNN and LSTM models. The performance of these models depends on the number of layers, the number of nodes in each layer, the input size, the normalization method, the optimizer, the batch size, the learning rate, and the number of epochs chosen during model development. These factors were defined here based on experiments and the available computational resources. First, the observed dataset was divided into two sets, training (1985–2000) and testing (2000–2006), at a ratio of 80–20%. Temperature and precipitation were then fed into the models to simulate the streamflow.
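For time series, such a split is normally made chronologically rather than at random, so that the test period lies strictly after the training period. A minimal sketch follows, assuming the 80–20% ratio is applied to the ordered daily records; the helper name is hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split an ordered series into an earlier training part and a later testing part."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


daily_records = list(range(10))          # stand-in for ten consecutive days
train, test = chronological_split(daily_records)
print(train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9]
```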

Training any neural network involves two critical processes: forward and backward propagation. Forward propagation receives the input data, processes the information, and generates the output. During this phase, the weights, biases, and filters are randomly initialized and treated as parameters by the convolutional neural network algorithm. During backward propagation, the model calculates the error and updates the parameters using the gradient descent technique. The predicted discharge is compared with the observed discharge to measure the goodness of fit (loss), and a good model is expected to have a minimal loss during both training and testing. After the training process, a set of weights and biases is obtained for all the layers, which is later used for further testing and for simulating future flow regimes.

The CNN architecture in this study consists of two 2D convolutional layers, one flatten layer, and three fully connected (dense) layers. The first convolutional layer uses eight filters with a kernel size of 10 × 2, and the second uses four filters with a kernel size of 5 × 2. The flatten layer then transforms the 2D output of the second convolutional layer into a single-column vector, which is fed to the dense layers for processing. The three dense layers contain 60, 30, and 1 nodes (neurons), respectively. Batch normalization was used in each layer to reduce data redundancy and eliminate undesirable characteristics. Another vital component of building a CNN model is the choice of activation function, which influences how the weights and biases are adjusted during training. Here, we chose the leaky rectified linear unit (Leaky ReLU) (Maas et al. 2013; Goodfellow et al. 2016) over the standard rectified linear unit (ReLU) because of its better gradient propagation and its ability to avoid the vanishing gradient problem of the standard ReLU. Leaky ReLU also learns faster and offers better performance and generalization in deep learning than the sigmoid and tanh functions (LeCun et al. 2015; Goodfellow et al. 2016; Sharma 2017). To help the model converge without overfitting, we chose a learning rate of 10⁻⁵ and a momentum of 0.9, and we used the mean absolute error (MAE) to measure the loss at each epoch.
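The difference between the two activation functions is easiest to see in a small sketch. The negative-side slope `alpha=0.01` is an assumed illustrative value; the slope used in this study is not stated.

```python
def relu(x):
    """Standard ReLU: negative inputs are set to zero."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: negative inputs keep a small slope `alpha` (assumed value)."""
    return x if x > 0 else alpha * x


def relu_grad(x):
    """Gradient of ReLU: zero for negative inputs, so those neurons stop updating."""
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x, alpha=0.01):
    """Gradient of Leaky ReLU: never exactly zero, so some signal always flows back."""
    return 1.0 if x > 0 else alpha


for value in (-2.0, 3.0):
    print(value, relu(value), leaky_relu(value), relu_grad(value), leaky_relu_grad(value))
```

For a negative input the ReLU gradient is exactly zero, whereas the Leaky ReLU gradient stays at `alpha`, which is the property behind the better gradient propagation mentioned above.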

Other components affecting the performance of the CNN model include the optimizer, batch size, and number of epochs. The choice of optimizer affects how effectively the loss is reduced and therefore how accurate the outcomes are. Popular optimizers include gradient descent (GD), stochastic gradient descent (SGD), Nesterov Accelerated Gradient (NAG), Adaptive Gradient (AdaGrad), and Adam. Among these, Adam helps the model converge faster and generalize better to the test data (Kingma & Ba 2014; Dogo et al. 2018; Llugsi et al. 2021). Batch size is another hyperparameter to tune in modern deep learning systems. A larger batch size speeds up computation through parallel processing on graphical processing units (GPUs); however, it can lead to poor generalization.

On the other hand, a smaller batch size considerably slows down training. Based on our available computational resources, the batch size was set to 28 for both the CNN and LSTM models. The number of epochs is the number of times the model passes through the whole training dataset; more epochs usually lead to more accurate results but lengthen the training process. Based on the observed model losses, we set the number of epochs to 500 for both the CNN and LSTM architectures.

The LSTM model was developed using two LSTM layers, which capture the essential features of the input data. The input size (number of cells) of the first LSTM layer depends on the length of the input window (the number of training days), and the size of the second LSTM layer is set to 30. Each LSTM layer uses batch normalization and the Leaky ReLU activation function. After passing through the LSTM layers, the data are processed by two fully connected layers containing 10 nodes and 1 node (neuron), respectively. All the hyperparameters, including the optimizer, learning rate, batch size, and number of epochs, were kept the same as in the CNN model.

Model performance evaluation

This study used observations over four different window lengths (number of days), 30, 60, 180, and 365 (1 year), to predict the next day's streamflow using the CNN and LSTM models. The performance of the proposed models was assessed through five statistical indicators, based on the models' ability to predict the observed streamflow at three stations: Attapeu (Sekong River Basin), Kon Tum (Sesan River Basin), and Bandon (Srepok River Basin). The statistical indicators used are the Pearson correlation coefficient (R) (Legates & McCabe 1999), Nash–Sutcliffe efficiency (NSE) (Nash & Sutcliffe 1970), root mean square error (RMSE) (Zhu et al. 2020), MAE (Legates & McCabe 1999), and percent bias (PBIAS) (Gupta et al. 1999).
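The windowing scheme can be sketched as follows. The sketch assumes that each sample consists of the (precipitation, temperature) pairs of `window` consecutive days and that the target is the discharge on the following day; the exact alignment used in this study is not specified, and the function name and toy values are hypothetical.

```python
def make_windows(features, target, window):
    """Build supervised samples from daily series.

    features : list of per-day tuples, e.g. (precipitation, temperature)
    target   : list of daily discharge values, same length as `features`
    Returns (X, y), where X[k] holds days k .. k+window-1 and y[k] is the
    discharge on day k+window (the next day).
    """
    X, y = [], []
    for start in range(len(features) - window):
        X.append(features[start:start + window])
        y.append(target[start + window])
    return X, y


# Toy series of six days; a real run would use window = 30, 60, 180, or 365.
days = [(float(d), 25.0 + d) for d in range(6)]   # (precipitation, temperature)
flow = [100.0 + 10.0 * d for d in range(6)]       # discharge
X, y = make_windows(days, flow, window=3)

print(len(X))  # 3
print(y)       # [130.0, 140.0, 150.0]
```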

R measures the strength of the linear correlation between two sets of data, here the observed and predicted discharge. It varies from −1 to 1, where −1 denotes a perfect negative correlation, 0 means no correlation, and 1 expresses a perfect positive correlation, which is unrealistic in practice (Equation (13)). NSE (Equation (14)) is another commonly used indicator of a model's predictive power and describes the model's accuracy quantitatively. Its value ranges from −∞ to 1, with 1 being the perfect model, whereas an efficiency less than 0 means the observed mean is a better predictor than the model. Similarly, RMSE measures the spread of the residuals around the line of best fit; it is the standard deviation of the residuals (prediction errors) (Equation (15)).

MAE (Equation (16)) measures the average magnitude of the errors in a set of predictions without considering their direction. It is the average of the absolute differences between prediction and actual observation over the test sample, where all individual differences have equal weight. RMSE and MAE range from 0 to ∞, where values closer to 0 suggest better prediction, with 0 being perfect. Lastly, PBIAS (Equation (17)) measures the average tendency of the simulated data to be larger or smaller than their observed counterparts (Gupta et al. 1999). The optimal value of PBIAS is 0.0, with positive values indicating under-estimation bias and negative values indicating over-estimation bias.
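$$R = \frac{\sum_{t=1}^{n} (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t=1}^{n} (x_t - \bar{x})^2 \, \sum_{t=1}^{n} (y_t - \bar{y})^2}} \tag{13}$$

$$NSE = 1 - \frac{\sum_{t=1}^{n} (x_t - y_t)^2}{\sum_{t=1}^{n} (x_t - \bar{x})^2} \tag{14}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (x_t - y_t)^2} \tag{15}$$

$$MAE = \frac{1}{n} \sum_{t=1}^{n} \left| x_t - y_t \right| \tag{16}$$

$$PBIAS = 100 \times \frac{\sum_{t=1}^{n} (x_t - y_t)}{\sum_{t=1}^{n} x_t} \tag{17}$$

where x and y are the observed and predicted discharge, with subscript t representing the time step and an overbar denoting the mean over all time steps, and n is the total number of data points.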
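The five indicators can be computed together in a few lines. The sketch below uses the standard definitions, with PBIAS taken as observed minus predicted so that positive values indicate under-estimation, as described above; the sample values are made up.

```python
import math


def evaluate(obs, pred):
    """Return R, NSE, RMSE, MAE, and PBIAS for paired observed/predicted series."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    cov = sum((x - mean_obs) * (y - mean_pred) for x, y in zip(obs, pred))
    ss_obs = sum((x - mean_obs) ** 2 for x in obs)
    ss_pred = sum((y - mean_pred) ** 2 for y in pred)
    ss_err = sum((x - y) ** 2 for x, y in zip(obs, pred))
    return {
        "R": cov / math.sqrt(ss_obs * ss_pred),
        "NSE": 1.0 - ss_err / ss_obs,
        "RMSE": math.sqrt(ss_err / n),
        "MAE": sum(abs(x - y) for x, y in zip(obs, pred)) / n,
        "PBIAS": 100.0 * sum(x - y for x, y in zip(obs, pred)) / sum(obs),
    }


observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
scores = evaluate(observed, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the over- and under-predictions cancel, so PBIAS is zero even though MAE is 2.5, which illustrates why the bias and error indices are reported side by side.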

Future projected climate

The climate projections from all GCMs indicate that the 3S River Basin will grow warmer, with spatio-temporal variation, as shown in Figure 3. The performance of the linear scaling method of bias correction used across the 3S River Basin is provided in Supplementary Table S1. The temperature in the basin is expected to increase by 0.08–4.2 °C, with the most significant change in the northwest area (Sekong catchment), followed by the southeast region (Srepok catchment), and a minor change in the central area. The temperature in the basin varies from 24 °C in the central area to around 25–26 °C in the slightly hotter southeast, reaching a maximum of 28–29 °C in the northwest area. This result is consistent with past studies, which show similar spatio-temporal variation (Hoan et al. 2018; Ruan et al. 2019).
Figure 3

The change in average annual temperature of the study area under RCP4.5 and 8.5 scenarios by 2100 under different GCMs.

The average temperature in the study area is projected to increase over time under the different scenarios (Figure 4 and Supplementary Figure S1). The magnitude of the change varies across periods: the near-future (2020–2050) shows the least change, followed by the mid-future (2050–2080), while the far-future (2080–2100) is expected to see the most significant increase. Station 150504 is expected to see the highest change in annual average temperature, about 4 °C, in all three periods under RCP4.5 and 8.5 (Supplementary Figure S2). It is crucial to monitor these trends and take the necessary measures to mitigate the potential impacts of rising temperatures on the environment and human activities.
Figure 4

Change in mean annual temperature in the 3S River Basin for the future periods: near-future (2020–2050), mid-future (2050–2080), and far-future (2080–2100) under RCP4.5 and 8.5.


The GCMs indicate that the temporal distribution of temperature in the basin remains unchanged except at station 140704 (Pleiku, Sesan catchment): the hot period lasts from around March to June, and cooling begins in July (Supplementary Figure S3). Although the hot period at station 140703 still spans March to June, its hottest month is delayed from March to April. The results indicate that the basin must prepare for extreme heat events, since the hot-season temperature increases significantly at most of the studied locations.

The GCM projections show a significant fluctuation in future precipitation in the 3S River Basin under the climate change scenarios (Figure 5 and Supplementary Figure S4). The change in precipitation varies from −12.0 to +40.6% across the basin. Notably, the HadGEM-AO2 model predicts a decreasing pattern in the southern and northern regions of the basin under RCP8.5. Spatially, the southern regions have the largest precipitation variation, followed by the northwest area, while the central parts are likely to have the smallest change. The projected change in rainfall at a given location also differs between models. The CMCC-CMS model projects the most significant future change for the 3S River Basin, followed by MIROC5, whereas HadGEM-AO2 predicts a decreasing precipitation trend along the basin's boundary and a slight increase in the interior.
Figure 5

The percentage change in mean annual precipitation of the study area under RCP4.5 and 8.5 scenarios by 2100.

Both CMCC-CMS and MIROC5 reveal a significant rise in precipitation for the whole study area in the mid- to far-future (2050–2100). HadGEM-AO2 has a contrasting trend that shows a considerable rise in the immediate future followed by a decline in the change in the mid- and far-futures (Figure 6). In particular, a drought trend is predicted for the north between 2050 and 2100.
Figure 6

Change in annual precipitation over different periods: near-future (2020–2050), mid-future (2050–2080), and far-future (2080–2100) under RCP4.5 and 8.5. See locations in Figure 1.


Besides the change in total rainfall, the GCMs also predict a shift in the temporal distribution of precipitation in this basin. The rainy season is still expected to occur between May and October, similar to the historical record. The Sekong catchment will receive the most rainfall in July instead of August, according to four of the six GCM projections. The Sesan catchment shows the smallest change in annual precipitation amount and temporal pattern compared to the others; however, all studied GCMs project an increase in its rainfall during the wet period. The three investigated stations in the Srepok watershed show a significant increase in monthly precipitation during the rainy season, suggesting a higher risk of flooding due to precipitation seasonality. In this sub-basin, August and September usually see the highest volume of river discharge, but the peak is projected to shift to September and October.

Performances of ML models in hydrologic simulation

We used both statistical evaluation indices (dimensionless and error indices; Table 2 and Supplementary Table S2) and graphical techniques (Figure 7) to evaluate the efficiency of the models. The results show that the model predictions are highly correlated with the observed data, with R-values of 0.8, 0.84, and 0.85 at Kon Tum, Bandon, and Attapeu stations, respectively. In addition, the NSE, which indicates how well the plot of observed versus simulated data fits the 1:1 line, ranges from 0.58 to 0.7 for these stations. According to Moriasi et al. (2007), NSE values between 0 and 1 are acceptable performance levels. The best model was selected based on the following criteria: the smallest error indices (MAE, RMSE), the largest dimensionless indices (R, NSE), and the average flow closest to the observed mean flow. By these criteria, the LSTM model proved to be the more efficient predictor of streamflow from temperature and rainfall. Figure 7 compares the model predictions with the observed streamflow at the three representative hydrological stations.
Table 2

Performance of the machine learning models in simulating the hydrological regime on a daily time scale at three stations across the 3S River Basin

Location          R      NSE    MAE     RMSE    PBIAS (%)
Attapeu-Sekong    0.85   0.7    176.7   367.5   17.9
Bandon-Srepok     0.84   0.7    64.3    125.9   3.2
Kon Tum-Sesan     0.8    0.58   29.6    61.6    −5.95
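
The indices reported in Table 2 can be computed from paired daily observed and simulated flows. The following is a minimal sketch rather than the authors' code: the function name `evaluate` is hypothetical, and the PBIAS sign convention (positive values indicating underestimation) is assumed from Moriasi et al. (2007), since the text does not state which convention was used.

```python
import math


def evaluate(obs, sim):
    """Return R, NSE, MAE, RMSE and PBIAS for paired observed/simulated flows."""
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("obs and sim must have the same length (at least 2)")
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    # Sums of squares and cross-products around the means
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_sim = sum((s - mean_sim) ** 2 for s in sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    # Sum of squared errors between observation and simulation
    ss_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return {
        "R": cov / math.sqrt(ss_obs * ss_sim),   # Pearson correlation
        "NSE": 1.0 - ss_err / ss_obs,            # Nash-Sutcliffe efficiency
        "MAE": sum(abs(o - s) for o, s in zip(obs, sim)) / n,
        "RMSE": math.sqrt(ss_err / n),
        # Assumed convention: positive PBIAS means the model underestimates
        "PBIAS": 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs),
    }
```

Calling `evaluate(observed, simulated)` on the testing-period series of one station would return one row of such a table; MAE and RMSE carry the units of the flow data.
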
Figure 7

Daily streamflow plots at Attapeu, Kon Tum, and Bandon stations during the testing period (2002–2006).

Figure 8 depicts the change in average streamflow between 2020 and 2100 for three discrete periods. The CMCC-CMS model forecasts a decline in the Sekong River streamflow at Attapeu station, with the largest decline in the 2020–2050 period. Overall, according to the CMCC-CMS model, the streamflow of the Sekong River could decrease by 15.8% under RCP4.5 and by 17% under RCP8.5. However, the other models predict that flow at Attapeu station will increase. According to the HadGEM-AO2 model, this river's annual streamflow will increase by 50.7% under RCP4.5 and by 35.5% under RCP8.5. MIROC5 estimates that the average streamflow of the Sekong River will double in the mid- to long-term. The models also project a minor change in the streamflow of the Srepok River, represented by Bandon station, ranging from −8 to 25%. This river's streamflow is predicted to increase in the far-future (2080–2100) after a minor drop in the near-future (2020–2050) and the mid-future (2050–2080). In addition, the models anticipate that future Sesan streamflow (Kon Tum station) will exhibit an upward trend. The changes range between 7.7 and 35%, with CMCC-CMS predicting the smallest change and MIROC5 the largest. The largest change in this river's flow appears in the near-future (2020–2050).
Figure 8

Relative change (%) in the average flow of study stations in near-future (2020–2050), mid-future (2050–2080), and far-future (2080–2100).
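
The relative change shown in Figure 8 is, in essence, the percentage difference between a future-period mean flow and the baseline mean flow. The sketch below illustrates that arithmetic under stated assumptions: the function names and the year-to-flow dictionary are hypothetical, and because the reported periods share boundary years (2050 and 2080), each boundary year is assigned here to the later period.

```python
# Inclusive year ranges; assigning 2050 and 2080 to the later period is an assumption
PERIODS = {
    "near-future": (2020, 2049),
    "mid-future": (2050, 2079),
    "far-future": (2080, 2100),
}


def relative_change(baseline, future):
    """Percent change of the future mean flow relative to the baseline mean flow."""
    if not baseline or not future:
        raise ValueError("both series must be non-empty")
    base_mean = sum(baseline) / len(baseline)
    future_mean = sum(future) / len(future)
    return 100.0 * (future_mean - base_mean) / base_mean


def change_by_period(baseline, annual_flow):
    """annual_flow maps year -> mean annual flow; returns period name -> % change."""
    result = {}
    for name, (start, end) in PERIODS.items():
        values = [q for year, q in annual_flow.items() if start <= year <= end]
        result[name] = relative_change(baseline, values)
    return result
```

Here `baseline` would hold the 1985–2005 annual mean flows at one station, and `annual_flow` the projected series for one GCM and RCP combination.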

As shown in Table 3, the GCMs project an increase in flow at all three stations in the dry period (December–May). Most scenarios also indicate an upward trend in wet-period streamflow for the Sesan River (Kon Tum station) and the Sekong River (Attapeu station). At Bandon station (Srepok River), however, all scenarios predict a minor to moderate downward trend during the wet period. The shift of this station's flood season from June–November to August–February (Figure 9) partially explains this downward trend. The temporal distribution at the other study locations remains unchanged in the future: the dry period runs from December to May, and the wet period from June to November.
Table 3

Relative change (%) in the average flow of study stations in the dry (December–May) and wet (June–November) period

                                        RCP4.5                             RCP8.5
Period                       Station    CMCC-CMS  HadGEM-AO2  MIROC5       CMCC-CMS  HadGEM-AO2  MIROC5
Wet period (June–November)   Kon Tum    6.03      −3.57       34.41        5.86      23.68       34.33
                             Bandon     −10.33    −5.82       −13.78       −11.78    −18.67      −7.95
                             Attapeu    −24.18    52.45       117.86       −24.75    35.64       114.69
Dry period (December–May)    Kon Tum    17.81     22.56       36.98        14.38     16.87       33.12
                             Bandon     53.98     50.35       146.22       30.10     48.65       174.49
                             Attapeu    42.32     82.60       149.01       49.69     67.91       145.27
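
The seasonal split behind Table 3 can be illustrated by grouping monthly flows into the wet (June–November) and dry (December–May) periods before comparing means. This is a sketch under assumptions, not the authors' procedure: the function names and the `(month, flow)` input layout are hypothetical, and the month grouping is fixed to the definition given in the table title.

```python
# Wet period as defined in the table title; all other months count as dry
WET_MONTHS = {6, 7, 8, 9, 10, 11}


def seasonal_means(monthly_flow):
    """monthly_flow: iterable of (month, flow) pairs; returns mean flow per season."""
    groups = {"wet": [], "dry": []}
    for month, flow in monthly_flow:
        season = "wet" if month in WET_MONTHS else "dry"
        groups[season].append(flow)
    if not groups["wet"] or not groups["dry"]:
        raise ValueError("need at least one wet-period and one dry-period value")
    return {season: sum(v) / len(v) for season, v in groups.items()}


def seasonal_change(baseline, future):
    """Percent change in wet- and dry-period mean flow relative to the baseline."""
    base = seasonal_means(baseline)
    fut = seasonal_means(future)
    return {s: 100.0 * (fut[s] - base[s]) / base[s] for s in base}
```

A fixed month grouping like this would not capture a shifted flood season such as the one described for Bandon, which is one reason a wet-period decline can appear even when annual flow changes little.
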
Figure 9

Historical (1985–2005) and projected (2030–2100) monthly river discharge at Sekong, Sesan, and Srepok gauge stations of the 3S River Basin.

Together with the increasing trend in streamflow of the Sekong and Sesan Rivers, the ML model predicts a significant increase in flood risk in these catchments, indicated by an increase in the Q5 index (Figure 10). The HadGEM-AO2 and MIROC5 models forecast that the Q5 flow at Attapeu station will increase by 11 to 58%, while CMCC-CMS predicts that this flow will be reduced by about a third compared to the 1985–2005 period. Similar to the Sekong catchment, the Sesan River is also projected to face increased flood risk. The magnitude of change in this catchment is around −11 to 15%, with CMCC-CMS projecting a decrease and the other scenarios an increase. Over time, the Q5 flow in the Sekong and Sesan Rivers follows the average flow trend: it fluctuates significantly during the near- and mid-future (2020–2080) and then changes less intensely in the 2080–2100 period.
Figure 10

Q95 and Q5 index at study stations in 1985–2005 and 2020–2100 under different climate change scenarios.
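
The Q5 and Q95 indices are read here as flow-duration values: the flow equalled or exceeded 5% of the time (a high-flow indicator) and 95% of the time (a low-flow indicator). The sketch below shows one common way to extract them from a flow series. The function name and the linear-interpolation plotting position are assumptions; the text does not specify how the indices were computed.

```python
import math


def exceedance_flow(flows, exceed_pct):
    """Flow equalled or exceeded `exceed_pct` percent of the time.

    Uses linear interpolation between ranked flows (an assumed plotting position).
    """
    if not flows:
        raise ValueError("flows must be non-empty")
    if not 0 <= exceed_pct <= 100:
        raise ValueError("exceed_pct must be between 0 and 100")
    ordered = sorted(flows, reverse=True)  # largest flow first
    pos = exceed_pct * (len(ordered) - 1) / 100.0
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def q5_q95(flows):
    """Return the (Q5, Q95) pair for a flow series."""
    return exceedance_flow(flows, 5), exceedance_flow(flows, 95)
```

Comparing `q5_q95` of a projected series with that of the 1985–2005 baseline gives the kind of relative change in high and low flows discussed in the text.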

On the other hand, most GCMs project that the Srepok River will have a downward trend in high flow, with a decrease of around 1–26%; the exception is the MIROC5 model, which projects a minor increase of 5%. The GCMs also predict that the 2050–2080 period will see the largest shift at this location, followed by the 2020–2050 period, while the far-future (2080–2100) will see the least change.

This study demonstrates that the selected ML algorithms, CNN and LSTM, can predict streamflow effectively using only daily mean temperature and precipitation data. Of the two, LSTM performs better, with strong correlation and NSE scores, provided that the river has natural flow conditions. This study finds that precipitation, temperature, and river discharge in the 3S River Basin will change significantly in the future. The temperature in the basin will increase by 0.13–4.2 °C, varying by location and scenario. The northwest region will have the most significant temperature change, followed by the south region, with the least change in the central region. GCMs also predict that precipitation in the 3S River Basin will increase by up to 40.6%, with the southeast region seeing the most significant shift in rainfall. The streamflow in the basin will significantly increase in the mid- and far-future (2050–2100). The ML models also suggest increasing flood risk in the Sekong and Sesan catchments, with an increase in the Q5 index in the future, whereas high flow in the Srepok is projected to decline by around 1–26%. The HadGEM-AO2 and MIROC5 models forecast that the Q5 flow at Attapeu station will increase by 11 to 58%, while CMCC-CMS predicts that this flow will be reduced by about a third compared to the 1985–2005 period. In addition to affecting the amount of river discharge, climate change alters the temporal streamflow pattern in the study area by delaying the wet season in most research areas.

The authors would like to express their sincere gratitude to the Silver Anniversary Scholarships for the financial support of this research. The authors would also like to thank the USAID-funded PEER project ‘Connecting climate change, hydrology, and fisheries for energy and food security in Lower Mekong Basin’ (Thailand and Cambodia – Project 6-436), carried out in co-operation with the Asian Institute of Technology, Stockholm Environmental Institute, Inland Fisheries Research and Development Institute, and Arizona State University, for their support in data acquisition.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Abadi M., Barham P., Chen J., Chen Z., Davis A., Dean J., Devin M., Ghemawat S., Irving G., Isard M., Kudlur M., Levenberg J., Monga R., Moore S., Murray D. G., Steiner B., Tucker P., Vasudevan V., Warden P., Wicke M., Yu Y. & Zheng X. 2016 Tensorflow: a system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, pp. 265–283.
Adnan R. M., Petroselli A., Heddam S., Santos C. A. G. & Kisi G. 2021 Comparison of different methodologies for rainfall-runoff modeling: machine learning vs conceptual approach. Natural Hazards 105 (3), 2987–3011. doi:10.1007/s11069-020-04438-2.
Assem H., Ghariba S., Makrai G., Johnston P., Gill L. & Pilla F. 2017 Urban water flow and water level prediction based on deep learning. In: Machine Learning and Knowledge Discovery in Databases (Altun, Y., ed.). ECML PKDD 2017. Lecture Notes in Computer Science. Springer, Cham, p. 10536.
Bourdin D. R., Fleming S. W. & Stull R. B. 2012 Streamflow modelling: a primer on applications, approaches and challenges. Atmosphere-Ocean 50 (4), 507–536.
Chang F. J. & Chang Y. T. 2006 Adaptive neuro-fuzzy inference system for prediction of water level in reservoir. Advances in Water Resources 29 (1), 1–10.
Chen H., Zhang Q. & Birkelund Y. 2022 Machine learning forecasts of Scandinavian numerical weather prediction wind model residuals with control theory for wind energy. Energy Reports 8 (13), 661–668. https://doi.org/10.1016/j.egyr.2022.08.105.
Constable D. 2015 Atlas of the 3S Basins. IUCN, Bangkok, Thailand, p. 85.
Cui H. & Singh V. P. 2015 Configurational entropy theory for streamflow forecasting. Journal of Hydrology 521, 1–17.
Dogo E. M., Afolabi O. J., Nwulu N. I., Twala B. & Aigbavboa C. O. 2018 A comparative analysis of gradient descent-based optimization algorithms on convolutional neural networks. In: The 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), Belgaum, India, pp. 92–99. doi:10.1109/CTEMS.2018.8769211.
Duan S., Ullrich P. & Shu L. 2020 Using convolutional neural networks for streamflow projection in California. Frontiers in Water 2 (28). doi:10.3389/frwa.2020.00028.
Ekwueme B. N. 2022 Machine learning based prediction of urban flood susceptibility from selected rivers in a tropical catchment area. Civil Engineering Journal 8 (9). doi:10.28991/CEJ-2022-08-09-08.
Evers J. & Pathirana A. 2018 Adaptation to climate change in the Mekong river basin: introduction to the special issue. Climatic Change 149, 1–11. https://doi.org/10.1007/s10584-018-2242-y.
Ghaderpour E., Mazzanti P., Mugnozza G. S. & Bozzano F. 2023 Coherency and phase delay analyses between land cover and climate across Italy via the least-squares wavelet software. International Journal of Applied Earth Observation and Geoinformation 118, 103241.
Ghimire S., Yaseen Z. M., Farooque A. A., Deo R. C., Zhang J. & Tao X. 2021 Streamflow prediction using an integrated methodology based on convolutional neural network and long short-term memory networks. Scientific Reports 11, 17497. https://doi.org/10.1038/s41598-021-96751-4.
Goodfellow I., Bengio Y., Courville A. & Bengio Y. 2016 Deep Learning, Vol. 1. MIT Press, Cambridge.
Gupta H. V., Sorooshian S. & Yapo P. O. 1999 Status of automatic calibration for hydrologic models: comparison with multilevel expert calibration. Journal of Hydrologic Engineering 4, 135–143.
Hasson S., Pascalle S., Lucarini V. & Bohner J. 2016 Seasonal cycle of precipitation over major river basins in South and Southeast Asia: a review of the CMIP5 climate models data for present climate and future climate projections. Atmospheric Research 180, 42–63. ISSN 0169-8095, https://doi.org/10.1016/j.atmosres.2016.05.008.
He S., Gu L., Tian J., Deng L., Yin J., Liao Z., Zeng Z., Shen Y. & Hui Y. 2021 Machine learning improvement of streamflow simulation by utilizing remote sensing data and potential application in guiding reservoir operation. Sustainability 13, 3645. https://doi.org/10.3390/su13073645.
Hochreiter S. & Schmidhuber J. 1997 Long short-term memory. Neural Computation 9 (8), 1735–1780.
IUCN. 2016 Freshwater Health Index: Sekong, Sesan and Srepok Basin: An Assessment of Fresh Water Ecosystem Health in the Lower Mekong.
Khayyal H. K., Zeidan Z. M. & Beshr A. A. 2022 Creation and spatial analysis of 3D city modeling based on GIS data. Civil Engineering Journal 8 (1), 105.
Kingma D. & Ba J. 2014 Adam: a method for stochastic optimization. In International Conference on Learning Representations.
Le X. H., Nguyen D. H., Jung S., Yeon M. & Lee G. 2021 Comparison of deep learning techniques for river streamflow forecasting. IEEE Access 9, 71805–71820.
LeCun Y., Bengio Y. & Hinton G. 2015 Deep learning. Nature 521, 436–444. https://doi.org/10.1038/nature14539.
Legates D. R. & McCabe G. J. Jr 1999 Evaluating the use of ‘goodness-of-fit’ measures in hydrologic and hydroclimatic model validation. Water Resources Research 35 (1999), 233–241.
Lin Y., Wang D., Wang G., Qiu J., Long K., Du Y., Xie H., Wei Z., Shangguan W. & Dai Y. 2021 A hybrid deep learning algorithm and its application to streamflow prediction. Journal of Hydrology 601, 126636. https://doi.org/10.1016/j.jhydrol.2021.126636.
Llugsi R., Yacoubi S. E., Fontaine A. & Lupera P. 2021 Comparison between Adam, AdaMax and Adam W optimizers to implement a Weather Forecast based on Neural Networks for the Andean city of Quito. In: 2021 IEEE Fifth Ecuador Technical Chapters Meeting (ETCM), Cuenca, Ecuador, pp. 1–6. doi:10.1109/ETCM53643.2021.9590681.
Maas A. L., Hannun A. Y. & Ng A. Y. 2013 Rectifier non-linearities improve neural network acoustic models. In: Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, Vol. 28.
Mohammadi B. 2021 A review on the applications of machine learning for runoff modeling. Sustainable Water Resources Management 7 (6), 98.
Moriasi D., Arnold J., Van Liew M., Bingner R., Harmel R. D. & Veith T. 2007 Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Transactions of the ASABE 50. doi:10.13031/2013.23153.
Nash J. E. & Sutcliffe J. V. 1970 River flow forecasting through conceptual models part I – a discussion of principles. Journal of Hydrology 10 (1970), 282–290.
Ngo L. A., Masih I., Jiang Y. & Douven W. 2018 Impact of reservoir operation and climate change on the hydrological regime of the Sesan and Srepok Rivers in the Lower Mekong Basin. Climatic Change 149, 107–119. https://doi.org/10.1007/s10584-016-1875-y.
Nguyen T. T., Huu Q. N. & Li M. J. 2015 Forecasting time series water levels on Mekong river using machine learning models. In: 2015 Seventh International Conference on Knowledge and Systems Engineering (KSE). IEEE, pp. 292–297.
Nguyen D. T., Ashraf S., Le M., Trung L. Q. & Ali M. 2023 Projection of climate variables by general circulation and deep learning model for Lahore, Pakistan. Ecological Informatics 75, 102077. https://doi.org/10.1016/j.ecoinf.2023.102077.
Obahoundje S., Diedhiou A., Dubus L., Alamou E. A., Amoussou E. K. & Ofosu E. A. 2022 Modeling climate change impact on inflow and hydropower generation of Nangbeto dam in West Africa using multi-model CORDEX ensemble and ensemble machine learning. Applied Energy 325, 119795. https://doi.org/10.1016/j.apenergy.2022.119795.
Pepe M., Costantino D., Alfio V. S., Vozza G. & Cartellino E. 2021 A novel method based on deep learning, GIS and geomatics software for building a 3D city model from VHR Satellite Stereo Imagery. ISPRS International Journal of Geo-Information 10 (10), 697. https://doi.org/10.3390/ijgi10100697.
Pradhan P., Pham T. T. H., Shrestha S., Loc H. H. & Park E. 2022 Projecting the impact of human activities and climate change on water resources in the transboundary Sre Pok River Basin. Climatic Change 172 (3–4), 26.
Prodhan F. A., Zhang J., Sharma T. P. P., Nanzad L., Zhang D., Seka A. M., Ahmed N., Hasan S. S., Hoque M. Z. & Mohana H. P. 2022 Projection of future drought and its impact on simulated crop yield over South Asia using ensemble machine learning approach. Science of The Total Environment 807 (3), 151029. https://doi.org/10.1016/j.scitotenv.2021.151029.
Sharma S. 2017 Activation functions in neural networks. Towards Data Science 6. https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6.
Shrestha M., Acharya S. C. & Shrestha P. K. 2017 Bias correction of climate models for hydrological modelling – are simple methods still useful? Meteorological Applications 24 (3), 531–539.
Singh D., Vardhan M., Sahu R., Chatterjee D., Chauhan P. & Liu S. 2023 Machine-learning- and deep-learning-based streamflow prediction in a hilly catchment for future scenarios using CMIP6 GCM data. Hydrology and Earth System Sciences 27, 1047–1075. https://doi.org/10.5194/hess-27-1047-2023.
Tran B., Nguyen Q., Shrestha S. & Nguyen T. 2021 scIDS: single-cell Imputation by combining deep autoencoder neural networks and subspace regression. In: The 13th International Conference on Knowledge and Systems Engineering (KSE), Bangkok, Thailand, pp. 1–8. doi:10.1109/KSE53942.2021.9648664.
Yang J. H., Cheng C. H. & Chan C. P. 2017 A time-series water level forecasting model based on imputation and variable selection method. Computational Intelligence and Neuroscience 2017, 8734214, 11.
Zaghloul M. S., Ghaderpour E., Dastour H., Farjad B., Gupta A., Eum H., Achari G. & Hassan Q. K. 2022 Long term trend analysis of river flow and climate in Northern Canada. Hydrology 9, 197. https://doi.org/10.3390/hydrology9110197.
Zhu S., Luo X., Yuan X. & Xu Z. 2020 An improved long short-term memory network for streamflow forecasting in the upper Yangtze river. Stochastic Environmental Research and Risk Assessment 34 (9), 1313–1329.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Supplementary data