ABSTRACT
Landslides are a significant natural hazard, especially in water-rich environments where the presence of water can drastically influence slope stability and deformation behavior. Accurate analysis and prediction of landslide deformation in such locations are critical for risk assessment and mitigation. This paper focuses on the analysis and prediction of landslides in water environments using machine learning techniques applied to the hydrological data of the geological location. The study employs the Elman Neural Network (ENN) to build a predictive model that forecasts future deformation trends from hydrological data by identifying patterns in soil water. The performance of the model is evaluated using metrics such as accuracy, precision, and recall, and validated against real-world data. The results show that the F1 score of the developed prediction system is 85%, which indicates the effectiveness of machine learning in predicting landslide deformation from hydrological data and provides a reliable tool for early warning systems in landslide-prone areas. The developed machine learning-based landslide risk assessment model not only predicts landslides but can also predict groundwater level and water quality, which is helpful for emergency risk assessment and for enhancing the safety and resilience of communities in landslide-prone zones.
HIGHLIGHTS
The study employs machine learning to analyze and predict landslide deformation based on hydrological data.
The study uses only the hydrological data of specific geological locations to develop an effective risk modeling system that accurately predicts deformation.
The study is helpful for emergency risk assessment and provides practical solutions to enhance the safety and resilience of communities in landslide-prone zones.
INTRODUCTION
Landslides are a predominant and destructive natural hazard that can result in substantial economic losses, environmental damage, and loss of human life (Fidan et al. 2024). Their occurrence and impact are particularly noticeable in water-rich environments, where hydrological factors such as rainfall, soil moisture, and groundwater levels significantly influence slope stability and deformation behavior (Chen et al. 2023). Therefore, accurate analysis and prediction of landslide deformation in these settings are vital for effective risk assessment and mitigation efforts (Amarasinghe et al. 2024). Past research on landslide prediction has often relied on empirical models and historical data (Sotiriadis et al. 2024). However, such approaches are often inadequate for capturing the non-linear relationships between the hydrological and geological factors that determine landslide behavior (Nowicki Jessee et al. 2018). Furthermore, these traditional methods may fail to incorporate the temporal dependencies within landslide processes (Intrieri et al. 2019).
Several studies have sought to predict landslides from historical records, geographical location, and hydrological data. Huang et al. (2016) discussed a chaos theory-based discrete wavelet transform-extreme learning machine (DWT-ELM) model that shows good results in predicting landslide displacement. However, the complexity and computational overhead of integrating chaos theory with the DWT could limit its scalability for real-time applications, and further evaluation of its computational efficiency and robustness in diverse geographical settings would strengthen its applicability to broader landslide prediction scenarios. Zhou et al. (2018) integrated the wavelet transform (WT), the artificial bee colony (ABC) algorithm, and the kernelized extreme learning machine (KELM) and improved the accuracy of displacement prediction, but the method's complexity and computational demands could hinder its implementation in resource-constrained environments or real-time applications. Its scalability and efficiency need further investigation before widespread adoption in early warning systems. Liu et al. (2020) showcased the potential of long short-term memory (LSTM) and gated recurrent unit (GRU) algorithms for predicting landslide displacement, although more comprehensive validation across diverse geological and environmental conditions is needed to ensure the robustness of these models. Addressing real-time data integration and model scalability will also be critical for practical implementation in early warning systems aimed at mitigating landslide risks.
Amarasinghe et al. (2024) comprehensively discussed rainfall-induced landslides in tropical areas, emphasizing the role of rainfall in triggering events and identifying key risk factors. While the study offers valuable risk mitigation strategies, challenges such as sparse data and dynamic land use complicate quantitative risk assessment in these areas. Fidan et al. (2024) explored the spatial patterns of landslides, distinguishing natural ones in high, minimally disturbed mountainous areas from anthropogenic ones at lower elevations with gentler slopes and higher human impact, and emphasized human activities as critical factors in landslide risk dynamics.
Recent advancements in machine learning (ML) and the Internet of Things (IoT) have opened new avenues for enhancing landslide prediction accuracy and reliability (Sreelakshmi et al. 2022). ML techniques can process large volumes of data and identify intricate patterns and relationships that may not be discernible through traditional methods. This capability makes ML particularly well-suited for analyzing and predicting landslide deformation in water-rich environments, where multiple interdependent variables are at play (Abdalzaher et al. 2023; Cai et al. 2024; Ge et al. 2024; Lu et al. 2024). Marino et al. (2023) developed a low-cost IoT-based sensor network, built on ESP32 boards and the ThingSpeak platform, that shows promise for the hydrological monitoring of landslide-prone areas. By expanding soil moisture data collection and enabling remote visualization, it offers a basis for early warning systems. Kitterød et al. (2022) explored the hydrology and groundwater quality of different countries. Their approach can be adopted to analyze the hydrological data of landslide-prone zones, so that the hydrological characteristics of those zones can be obtained and used for risk assessment. Jia et al. (2023) developed an optimization-based ML technique to predict landslide displacement, using a least-squares support vector machine optimized with Particle Swarm Optimization (PSO). The method can give high accuracy, but it is time-consuming and requires larger datasets. Song et al. (2024) developed a model for predicting step-like displacement in slow-moving landslides in the Three Gorges Reservoir area using Empirical Mode Decomposition (EMD) and IPSO-optimized LSTM neural networks. It effectively links displacement with environmental factors, aiding risk reduction and providing insights into instability mechanisms.
This study aims to develop and apply an ML-based system for the analysis and prediction of landslide deformation in water environments. The focus is on leveraging hydrological data, including soil moisture content, precipitation, groundwater levels, and other relevant factors, to inform the predictive model. Specifically, we have developed an Elman Neural Network (ENN), a type of recurrent neural network known for its ability to capture time-dependent data. ENNs trained with backpropagation are well suited to time series data, which makes them a natural choice for predicting events that evolve over time, such as landslide deformation (Gao et al. 2020). The study involves a comprehensive analysis of variables such as soil moisture, precipitation patterns, and groundwater levels, which are critical determinants of slope stability in water-rich environments. The performance of the model is evaluated using metrics such as accuracy, precision, and recall, and validated against real-world data. The study is based on hydrological data from Chongqing province, China. The results demonstrate the efficacy of ML in predicting landslide deformation from hydrological data, providing a reliable tool for early warning systems in landslide-prone zones. Early and accurate predictions of landslide deformation can significantly enhance risk assessment and mitigation strategies, ultimately contributing to the safety and resilience of communities in landslide-prone areas.
LANDSLIDE OVERVIEW
A landslide is a geological phenomenon involving the movement of rock, earth, or debris down a slope due to gravity. Factors such as water saturation from heavy rainfall, earthquakes, volcanic activity, or human activities can trigger landslides, which can cause significant damage to property and loss of life.
Causes of landslide
Landslides are caused by a combination of natural and human-induced factors that destabilize slopes. Hydrological factors play a significant role, with intense or prolonged rainfall, rapid snowmelt, and rising groundwater levels saturating the soil and reducing its stability. When water infiltrates the soil, it increases pore water pressure, which in turn diminishes the soil's shear strength, making landslides more likely (Barnard et al. 2001). Geological factors are also crucial, as the type of soil and rock, along with their structure and composition, significantly influence slope stability. Weak, loose, or fractured materials are particularly prone to failure. Additionally, human activities such as deforestation, construction, and mining can exacerbate landslide risks by altering the natural landscape and drainage patterns. The removal of vegetation, which stabilizes the soil with its root systems, further increases vulnerability. Earthquakes and volcanic activity are also natural triggers that can induce landslides by shaking the ground and causing slope materials to lose cohesion. Overall, landslides result from a complex interplay of hydrological, geological, and human factors that collectively undermine the stability of slopes (Persichillo et al. 2018).
LANDSLIDE INFLUENCED BY HYDROLOGY
Figure 1. (a) Landslide due to heavy rainfall; (b) landslide due to river water creating persistent wet conditions and chronic soil saturation.
Moreover, riverbank erosion caused by rivers and streams undercutting their banks during flood events frequently leads to landslides. The filling and fluctuation of water levels in reservoirs can induce seismic activity and alter groundwater pressure, potentially triggering landslides, a phenomenon known as reservoir-induced seismicity (Gupta 1992). Figure 1(b) shows the landslides due to river water. The persistent wet conditions due to river water can result in chronic soil saturation, which weakens slope materials over time and increases the likelihood of landslides. Additionally, poor drainage or alterations in natural drainage patterns due to construction or other human activities can exacerbate slope vulnerability. Understanding these hydrological processes is essential for predicting landslide occurrences and implementing effective mitigation strategies, particularly in regions prone to heavy rainfall or significant hydrological changes.
MATERIALS AND METHODS
This section outlines the materials and methods used for investigating landslide prediction in Chongqing province, China. It then details landslide prediction using hydrological data and discusses the development of ML-based predictive modeling.
Geological location
Figure 3. (a) Landslide due to hydrological interactions in the Three Gorges Reservoir, Chongqing, China; (b) comprehensive analysis of the landslide triggered by hydrological interactions.
Figure 3(b) provides a comprehensive analysis of the landslide triggered by hydrological interactions. The figure clearly indicates that the soil beds have weakened due to the presence of a water body. This weakening is evident in the area where the slide has occurred, specifically in the downhill direction, as indicated by the arrow. The interaction between the water and soil has compromised the soil cohesion, leading to the observed landslide.
Landslide prediction using hydrological data
Hydrological data for this study were collected from various sources, including rain gauges, groundwater monitoring wells, and soil moisture sensors strategically placed throughout Chongqing province (Yin et al. 2016). These instruments provided continuous measurements of rainfall intensity, groundwater levels, and soil moisture content, which are essential for understanding the hydrological conditions preceding landslide events. Table 1 shows the average monthly data for a specific location in Chongqing province, including rainfall, soil moisture content, groundwater level, reservoir level, and streamflow. The data were measured using wireless sensors and IoT devices. Table 2 shows the annual rainfall data, collected from the meteorological department of Chongqing. During data collection and model implementation, we encountered several practical challenges, including ensuring data quality, managing computational resources, and integrating with existing systems. These were addressed through rigorous data cleaning, cloud computing, flexible APIs, and comprehensive user training to ensure effective deployment and adoption.
Table 1. Average monthly measured hydrological data
Year-month | Total rainfall (mm) | Avg soil moisture (%) | Min groundwater level (m) | Max groundwater level (m) | Avg reservoir level (m) | Max streamflow (m³/s) |
---|---|---|---|---|---|---|
2024-Jan | 150 | 25 | 2.8 | 3.4 | 45.5 | 50 |
2024-Feb | 120 | 22 | 2.7 | 3.3 | 45.4 | 40 |
2024-Mar | 180 | 28 | 3.0 | 3.6 | 45.2 | 55 |
2024-Apr | 200 | 30 | 3.1 | 3.8 | 45.0 | 60 |
2024-May | 250 | 35 | 3.5 | 4.0 | 44.8 | 70 |
Table 2. Annual rainfall data (monthly values in mm)
Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Annual normal rainfall (mm) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2014 | 16 | 27 | 19 | 19 | 21 | 200 | 340 | 344 | 212 | 61 | 14 | 4 | 1,277 |
2015 | 5 | 15 | 17 | 51 | 66 | 212 | 398 | 381 | 246 | 116 | 24 | 4 | 1,535 |
2016 | 12 | 20 | 23 | 15 | 17 | 194 | 392 | 394 | 249 | 62 | 8 | 2 | 1,388 |
2017 | 17 | 30 | 22 | 17 | 18 | 191 | 381 | 387 | 194 | 52 | 13 | 5 | 1,327 |
2018 | 18 | 16 | 21 | 8 | 14 | 168 | 475 | 508 | 341 | 57 | 1 | 1 | 1,628 |
2019 | 22 | 22 | 21 | 9 | 15 | 143 | 435 | 465 | 224 | 47 | 1 | 2 | 1,406 |
2020 | 26 | 22 | 15 | 9 | 15 | 170 | 505 | 401 | 212 | 50 | 17 | 5 | 1,447 |
2021 | 20 | 35 | 25 | 19 | 14 | 191 | 508 | 450 | 126 | 61 | 16 | 9 | 1,474 |
2022 | 23 | 2 | 13 | 13 | 13 | 146 | 319 | 283 | 188 | 69 | 22 | 3 | 1,094 |
2023 | 28 | 18 | 13 | 10 | 15 | 177 | 567 | 474 | 292 | 66 | 24 | 5 | 1,690 |
When analyzed together, these datasets provide comprehensive insights into the hydrological conditions that can lead to landslides. Researchers can identify patterns and triggers associated with landslide events by examining rainfall intensity, soil moisture levels, groundwater fluctuations, and river discharge rates. This integrated analysis allows for a more precise understanding of how different hydrological factors interact to destabilize slopes. The data can be utilized to train advanced ML models, enabling the prediction of landslide occurrences based on historical and real-time hydrological conditions.
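As a small illustration of the kind of summary that can be derived from Table 2, the sketch below sums the monthly values for two of the tabulated years and computes the share of annual rainfall that falls between June and September. The function names `annual_total` and `wet_season_share` and the June-to-September window are assumptions chosen for illustration. They are not part of the study's method.

```python
# Monthly rainfall (mm), January to December, copied from Table 2.
RAINFALL_MM = {
    2014: [16, 27, 19, 19, 21, 200, 340, 344, 212, 61, 14, 4],
    2015: [5, 15, 17, 51, 66, 212, 398, 381, 246, 116, 24, 4],
}


def annual_total(monthly):
    """Sum of the twelve monthly rainfall values."""
    return sum(monthly)


def wet_season_share(monthly, start=5, end=9):
    """Fraction of annual rainfall in months [start, end), 0-based.

    The default window (indices 5..8) covers June to September; this
    window is an illustrative assumption, not taken from the paper.
    """
    return sum(monthly[start:end]) / sum(monthly)


if __name__ == "__main__":
    for year, monthly in RAINFALL_MM.items():
        print(year, annual_total(monthly), round(wet_season_share(monthly), 3))
```

For the 2014 row, the four wet-season months account for 1,096 mm of the 1,277 mm annual total, roughly 86%, which shows how strongly the rainfall input is concentrated in a few months.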
Development and deployment of predictive model
In this study, an ENN is developed. ENNs are a type of recurrent ML model designed to handle time-dependent data. Unlike traditional feedforward neural networks, ENNs have connections that form directed cycles, allowing them to maintain a state and model sequential data effectively. The ENN model effectively captured the non-linear relationships between hydrological factors and landslide events (Jia et al. 2019).
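To make the idea of a maintained state concrete, the following is a minimal NumPy sketch of an Elman-style forward pass, in which the hidden activations of one time step are copied into context units and fed back at the next step. The class name `ElmanNetwork`, the layer sizes, the weight scale, and the linear output layer are illustrative assumptions. The paper does not report its architecture in this detail, and its own experiments were run in MATLAB.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ElmanNetwork:
    """Forward pass only; all sizes and initial scales are hypothetical."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))       # input -> hidden
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.b_h = np.zeros(n_hidden)
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))     # hidden -> output
        self.b_out = np.zeros(n_out)

    def forward(self, sequence):
        """sequence has shape (T, n_in); returns outputs of shape (T, n_out)."""
        context = np.zeros(self.W_ctx.shape[0])  # context units start at zero
        outputs = []
        for x_t in sequence:
            hidden = sigmoid(self.W_in @ x_t + self.W_ctx @ context + self.b_h)
            outputs.append(self.W_out @ hidden + self.b_out)
            context = hidden  # copy hidden state into the context units
        return np.array(outputs)


if __name__ == "__main__":
    # Four hypothetical features per step: rainfall, soil moisture,
    # groundwater level, streamflow (already scaled to comparable ranges).
    net = ElmanNetwork(n_in=4, n_hidden=8, n_out=1)
    print(net.forward(np.ones((5, 4))).ravel())
```

Feeding the same input vector at every step still produces different outputs over time, because the context units carry information from earlier steps. A plain feedforward network would return the same value each time.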
Figure 6. (a) Google Earth image of the location where the investigation was carried out; (b) measurement zones.
The sigmoid function, $S(x) = \frac{1}{1 + e^{-x}}$, is used in neural networks to map input values to a range between 0 and 1, which can represent probabilities. It is differentiable and has a simple derivative, $S'(x) = S(x)\,(1 - S(x))$.
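For completeness, the derivative identity follows directly from the chain rule applied to the definition of the sigmoid, $S(x) = 1/(1 + e^{-x})$:

```latex
\begin{aligned}
S'(x) &= \frac{d}{dx}\left(1 + e^{-x}\right)^{-1}
       = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} \\
      &= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
       = S(x)\,\bigl(1 - S(x)\bigr),
\end{aligned}
\qquad \text{since } 1 - S(x) = \frac{e^{-x}}{1 + e^{-x}}.
```

The derivative reaches its maximum value of $1/4$ at $x = 0$ and decays towards zero for large $|x|$, which is why saturated sigmoid units contribute little to the weight updates during training.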

In the weight update step, the training rate of the input layer scales each weight correction.
The developed ENN model is trained using backpropagation through time, an extension of the standard backpropagation algorithm used in feedforward neural networks. It accounts for temporal dependencies by unfolding the network over time and adjusting the weights accordingly. Propagation through these layers allows the ENN to continuously refine its predictions, leveraging historical and real-time hydrological data to provide accurate and reliable forecasts of landslide events.
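The unfolding described above can be sketched as follows for a small Elman-style network with a sigmoid hidden layer, a scalar linear output, and a squared-error loss. This is an illustrative sketch under those assumptions, not the authors' MATLAB implementation. The function names, the full (untruncated) unrolling, and the learning rate are hypothetical choices.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward(params, xs):
    """Run the network over the sequence xs of shape (T, n_in)."""
    W_in, W_ctx, b_h, w_out, b_out = params
    h = np.zeros(W_ctx.shape[0])
    hs, ys = [h], []              # hs[0] is the initial (zero) context
    for x in xs:
        h = sigmoid(W_in @ x + W_ctx @ h + b_h)
        hs.append(h)
        ys.append(w_out @ h + b_out)
    return np.array(hs), np.array(ys)


def loss(params, xs, targets):
    _, ys = forward(params, xs)
    return 0.5 * np.sum((ys - targets) ** 2)


def bptt_gradients(params, xs, targets):
    """Gradients of the loss, obtained by walking the unrolled network backwards."""
    W_in, W_ctx, b_h, w_out, b_out = params
    hs, ys = forward(params, xs)
    g_W_in, g_W_ctx = np.zeros_like(W_in), np.zeros_like(W_ctx)
    g_b_h, g_w_out, g_b_out = np.zeros_like(b_h), np.zeros_like(w_out), 0.0
    carry = np.zeros_like(b_h)    # gradient arriving from step t + 1
    for t in reversed(range(len(xs))):
        err = ys[t] - targets[t]
        h_t, h_prev = hs[t + 1], hs[t]
        g_w_out += err * h_t
        g_b_out += err
        d_h = err * w_out + carry
        d_pre = d_h * h_t * (1.0 - h_t)   # sigmoid derivative S(1 - S)
        g_W_in += np.outer(d_pre, xs[t])
        g_W_ctx += np.outer(d_pre, h_prev)
        g_b_h += d_pre
        carry = W_ctx.T @ d_pre           # pass gradient to the previous step
    return [g_W_in, g_W_ctx, g_b_h, g_w_out, g_b_out]


def train(params, xs, targets, lr=0.05, epochs=300):
    """Plain gradient descent; lr and epochs are illustrative values."""
    for _ in range(epochs):
        grads = bptt_gradients(params, xs, targets)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params
```

A finite-difference check of a single weight is a convenient way to confirm that gradients computed this way match the loss function before trusting any training run.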
RESULTS AND DISCUSSION
To investigate the performance of the developed ENN-based ML model in predicting landslide deformation, an extensive simulation analysis was carried out in MATLAB. Figure 6(a) shows the Google Earth image of the investigation area. Figure 6(b) highlights the measurement zones, labeled M1, M2, … , M5, where data were collected. The measurements included rainfall, soil moisture, groundwater level, and river streamflow values. These data were gathered using wireless sensors mounted at the specified locations, with data transmission facilitated by IoT devices. The collected data were then preprocessed to handle missing values, normalized to ensure consistent scaling, and used for training and validation of the model.
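A minimal sketch of the preprocessing steps mentioned here is given below, assuming that missing sensor readings are stored as NaN and that min-max scaling is used for normalization. The paper does not state which normalization was applied, so that choice and the function names are assumptions. The example series reuses the monthly rainfall totals from Table 1 with one value blanked out purely for illustration.

```python
import numpy as np


def interpolate_gaps(series):
    """Fill NaN gaps by linear interpolation between valid neighbours."""
    series = np.asarray(series, dtype=float)
    valid = ~np.isnan(series)
    idx = np.arange(series.size)
    return np.interp(idx, idx[valid], series[valid])


def mean_substitute(series):
    """Replace NaN entries with the mean of the observed values."""
    series = np.asarray(series, dtype=float)
    return np.where(np.isnan(series), np.nanmean(series), series)


def min_max_normalize(column):
    """Scale a column to [0, 1]; a constant column maps to zeros."""
    column = np.asarray(column, dtype=float)
    lo, hi = column.min(), column.max()
    if hi == lo:
        return np.zeros_like(column)
    return (column - lo) / (hi - lo)


if __name__ == "__main__":
    # Table 1 rainfall totals (mm) with the March value removed for illustration.
    rainfall = [150, 120, np.nan, 200, 250]
    filled = interpolate_gaps(rainfall)
    print(filled)
    print(mean_substitute(rainfall))
    print(min_max_normalize(filled))
```

Interpolation preserves local trends in a time series, whereas mean substitution ignores the ordering of the samples, so the two techniques can give noticeably different values for the same gap.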
Before the predictive analysis, the datasets were preprocessed. Missing data points were addressed using interpolation and mean substitution to ensure the completeness of the dataset, and the data were normalized so that all features contributed equally to model training. The dataset was split into training and testing sets in a 70:30 ratio so that the model's performance could be evaluated on unseen data. The training process parameters are listed in Table 3. The number of hidden neurons was selected based on cross-validation results to balance complexity and performance. Input weights and biases were initialized from a random distribution and kept fixed during training, which simplifies the training process. The output of the hidden layer was calculated using an activation function, such as the sigmoid or ReLU function. The output weights were computed by minimizing the difference between the predicted and actual displacement values using a least-squares method.
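The last steps of this paragraph (a 70:30 split, fixed random input weights, a sigmoid hidden layer, and least-squares output weights) can be sketched as below. For brevity the sketch uses a single non-recurrent hidden layer and a chronological split. Both are simplifying assumptions, as are the function names, the hidden-layer size, and the synthetic data, none of which come from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def chronological_split(X, y, train_fraction=0.7):
    """Keep the first 70% of samples for training and the rest for testing."""
    cut = int(round(train_fraction * len(X)))
    return X[:cut], X[cut:], y[:cut], y[cut:]


def hidden_features(X, W, b):
    """Hidden-layer output with an appended column of ones for the output bias."""
    H = sigmoid(X @ W + b)
    return np.hstack([H, np.ones((len(H), 1))])


def fit_output_weights(X, y, n_hidden=10, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, kept fixed
    b = rng.normal(size=n_hidden)                # random biases, kept fixed
    beta, *_ = np.linalg.lstsq(hidden_features(X, W, b), y, rcond=None)
    return W, b, beta


def predict(X, W, b, beta):
    return hidden_features(X, W, b) @ beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3))                # synthetic stand-in features
    y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.normal(size=100)
    X_tr, X_te, y_tr, y_te = chronological_split(X, y)
    W, b, beta = fit_output_weights(X_tr, y_tr)
    print("test MSE:", np.mean((predict(X_te, W, b, beta) - y_te) ** 2))
```

Because only the output weights are solved for, fitting reduces to one linear least-squares problem, which is what makes this style of training fast compared with iterative weight updates.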
Table 3. ENN predictive training process
Parameter | Initial value | Stopped value | Target value |
---|---|---|---|
Epoch | 0 | 175 | 1,000 |
Elapsed time (s) | – | 3.5 | |
Performance | 0.103 | 0.000216 | 0 |
Gradient | 0.545 | 0.00435 | 1 × 10⁻⁵ |
Validation checks | 0 | 6 | 6 |
Figure 13 shows the cumulative displacement of the landslide mass, integrating both periodic and residual movements over the observed period. This figure provides a comprehensive view of the overall progression of the landslide predicted using the developed ENN model.
The model was validated on the testing set to evaluate its predictive accuracy. Key performance metrics, namely root mean squared error (RMSE), mean absolute error (MAE), R-squared (R²), F1-score, and the area under the receiver operating characteristic (ROC) curve, were calculated. The metrics used to validate the predictive accuracy are listed in Table 4.
Table 4. Accuracy assessment of the developed predictive models
Model | RMSE | MAE | R² | F1-score | Area under the ROC curve (AUC) |
---|---|---|---|---|---|
ENN | 6.432 | 5.1468 | 0.6667 | 85.2% | 0.92 |
BPNN | 9.7345 | 12.5430 | 0.2333 | 78.7% | 0.81 |
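The regression and classification metrics reported in Table 4 can be computed along the following lines. The sketch assumes that displacement predictions are compared with measured values for RMSE, MAE, and R², and that the F1-score is computed on binary landslide/no-landslide labels. How those labels were derived is not specified in the paper, so the labels and all sample values below are purely illustrative.

```python
import numpy as np


def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2_score(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def f1_score(labels, predicted):
    """F1 for binary labels (1 = landslide, 0 = no landslide)."""
    labels, predicted = np.asarray(labels), np.asarray(predicted)
    tp = np.sum((labels == 1) & (predicted == 1))
    fp = np.sum((labels == 0) & (predicted == 1))
    fn = np.sum((labels == 1) & (predicted == 0))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0]    # illustrative values only
    predicted = [1.0, 2.0, 3.0, 6.0]
    print(rmse(measured, predicted), mae(measured, predicted), r2_score(measured, predicted))
    print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

For any set of errors, the MAE can never exceed the RMSE, which offers a quick sanity check when reading or reproducing a table of such metrics.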
CONCLUSION
The analysis presented in this study advances the understanding of landslide dynamics by integrating a range of environmental factors, including rainfall, soil moisture, groundwater levels, and river streamflow. The use of advanced data collection techniques and ML algorithms has provided a nuanced view of the variables influencing landslides. Specifically, the ENN model has proven to be a robust tool for predicting landslide deformation, as demonstrated by extensive simulation analyses. The performance of the ENN-based model was evaluated using key metrics: an RMSE of 6.432, an MAE of 5.1468, and an R² value of 0.6667, which indicate the model's accuracy and fit. Notably, the F1-score of 85% highlights the model's balance between precision and recall, emphasizing its effectiveness in correctly identifying landslides while minimizing false predictions. This evaluation indicates that the model can reliably predict landslide occurrences based on hydrological data. The successful application of this model supports the development of effective mitigation strategies and enhances preparedness in regions prone to landslide hazards.
ACKNOWLEDGEMENT
This research is supported by Chongqing Natural Science Foundation (CSTB2023NSCQ-MSX0907) ‘Research on landslide hazard Risk Assessment and Informatization Monitoring and early warning in Wushan Section of Three Gorges Reservoir Area’, The National Natural Science Foundation of China (U22A20600) and Chongqing Graduate Tutor Team Building Project (JDDSTD2022009).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.