Abstract
Drought disasters have serious impacts on the social economy and the ecological environment, and these impacts are being continuously exacerbated by climate warming and other factors. Drought disaster management usually involves processing large amounts of isolated data from many fields, expressed in different terminologies and formats. These heterogeneous data, or so-called data silos, greatly hinder drought disaster management in an information-rich manner. Establishing a drought disaster knowledge graph can facilitate the reuse of these heterogeneous data and provide references for drought disaster management; ontology design and named entity recognition are the two major challenges in doing so. In this study, we first designed a drought disaster ontology by recognizing the major concepts in the drought disaster field and their relationships, and implemented it with an ontology modeling language. We next constructed a drought disaster corpus and an entity recognition model that integrates multiple deep learning methods. Finally, we applied the integrated entity recognition model to extract information from the CNKI literature database. The integrated model shows satisfactory results in drought disaster named entity recognition. We thus conclude that combining ontology and deep learning technology toward establishing a knowledge graph for drought disasters is promising.
HIGHLIGHTS
Ontology was used to construct the schema for drought disaster knowledge graphs.
A corpus of drought disasters was constructed with unstructured documents.
Automatic drought disaster named entity recognition was achieved using deep learning methods.
INTRODUCTION
Drought disasters are among the most serious natural disasters because of their wide-ranging, long-lasting and profound impacts on the social economy and the ecological environment. They have caused a series of problems, such as reduced agricultural production, forest fires, land desertification and even social unrest and the demise of civilizations (Zhang et al. 2020; Khiabani et al. 2021). Affected by global climate warming and other factors, the frequency of drought disasters, especially severe or extremely severe ones, has increased dramatically (Sheffield & Wood 2008; Luo et al. 2018). Today, the impacts of drought disasters are becoming increasingly serious worldwide (Sheffield & Wood 2008; Zhang et al. 2019; Wu et al. 2022). Drought disaster management research, including research on drought disaster evolution, temporal and spatial drought characteristics, early warning and forecast systems, risk assessment, and management measures, has been a research hotspot in the hydrological community in recent years (Li et al. 2019; Wu et al. 2021). However, drought disasters involve many internal factors, such as disaster-inducing factors, hazard-inducing environments, hazard-affected bodies, and disaster prevention and mitigation abilities. The relationships among these factors are complex and highly uncertain, which poses severe challenges to drought disaster management (Li et al. 2019; Israel et al. 2021).
One of the most crucial factors contributing to the drought disaster management dilemma is the lack of comprehensive drought disaster knowledge. Drought disasters involve large amounts of data from different fields, including meteorology, hydrology, agriculture, forestry, ecology, the social economy and many others (Sheffield & Wood 2008; Luo et al. 2018). The terminologies and formats used to describe and store this information differ in many ways. These multisource heterogeneous data form data silos (Luo et al. 2018), which hinder a comprehensive understanding of drought disasters and inevitably hamper drought disaster management in an information-rich manner.
A knowledge graph is a formal framework for describing general semantic knowledge (Liu et al. 2018) that was first proposed by Google in 2012. Knowledge graphs use visualization technology to describe, mine, analyze, construct and display knowledge and its interrelations. They provide an efficient method for organizing, managing and analyzing massive data, making knowledge easier to acquire (Abu-Salih 2021). To date, knowledge graphs have been successfully applied in many fields and have produced fruitful research results (Weng et al. 2017; Liang et al. 2018; Zhu et al. 2019; Díaz & Vilches-Blázquez 2022; Ge et al. 2022). For instance, Ge et al. (2022) proposed a knowledge graph for disaster prediction by integrating remote sensing information, relevant geographic information and expert knowledge in the disaster analysis field. Weng et al. (2017) used a contextualized knowledge graph embedding model to construct a traditional Chinese medicine knowledge graph from Chinese physicians and applied it to the decision-making process. Establishing a drought disaster knowledge graph can facilitate the reuse of these heterogeneous data and provide references for drought disaster management (Fan et al. 2020). However, applying knowledge graphs to drought disaster management remains challenging because of the complexity of drought disasters and the data silos between different fields. To our knowledge, no studies have yet applied the knowledge graph method to drought disaster management.
Ontology design and named entity recognition are the two major challenges that must be overcome to establish a knowledge graph (Zheng et al. 2020). Because ontologies have strong semantic expressiveness and can effectively eliminate semantic discrepancies between data (Matos et al. 2010), some scholars have successfully constructed flood disaster ontologies (Wu et al. 2020; Son et al. 2021). For instance, Son et al. (2021) proposed an ontology for flooding disasters to resolve the heterogeneity among various disaster data and provide interoperability among domains. Hence, in this study, we used an ontology to define concepts, relationships, properties and individuals and designed the drought disaster ontology accordingly. The other major challenge is automatically extracting information from large quantities of unstructured documents through drought disaster named entity recognition, which is one of the fundamental and key tasks in knowledge graph construction (Zheng et al. 2020). In general, entity recognition methods can be divided into three categories: rule-based methods, machine learning methods and deep learning methods (Mu et al. 2020). Because deep learning methods have shown great potential in natural language processing and have been applied successfully to derive knowledge from massive unstructured document resources, they have been widely used for named entity recognition (Fan et al. 2020; Mu et al. 2020; Zheng et al. 2020). For instance, Fan et al. (2020) proposed a deep learning-based named entity recognition model to facilitate geological hazard literature reuse and provide a reference for geological hazard governance.
In this study, we aimed to evaluate a method that combines ontology design and named entity recognition toward establishing a drought disaster knowledge graph. First, we designed a drought disaster ontology by recognizing the major concepts in the drought disaster field and their relationships, and implemented the ontology with an ontology modeling language. Second, we constructed a drought disaster corpus and an integrated model combining bidirectional encoder representations from transformers (BERT), bidirectional long short-term memory (BiLSTM) and a conditional random field (CRF) to recognize named entities in a large number of unstructured drought disaster documents.
The main contributions of this paper are as follows:
- (1) The concepts of the drought disaster ontology and their relationships were designed, and the ontology was implemented with an ontology modeling language.
- (2) A drought disaster corpus was constructed from unstructured documents and is freely available to the public.
- (3) Deep learning methods were adopted to automatically recognize drought disaster named entities.
METHODS
Drought disaster knowledge graph construction
Drought disaster ontology design
Design the hierarchical structure of concepts
Identify the relationships between concepts
There are complex relationships among the concepts in the drought disaster ontology, but they can be broadly categorized as temporal, spatial and semantic relationships.
- (1) Temporal relationships
Subevents can occur simultaneously or successively in the formation of a drought disaster. Sometimes the temporal relationships between these events can be very difficult to discern, and thus, a comprehensive and accurate definition of temporal relationships is critical for establishing drought disaster ontologies. In this study, we employed six relationships to depict the temporal relationships between drought disaster subevents (Table 1).
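The six relations in Table 1 can be read as predicates over time intervals. The sketch below is a minimal illustration, not the ontology's actual implementation: it assumes each subevent occupies a closed interval [start, end], reads "A occurs before B" as "A ends before B starts", and uses hypothetical event names and dates. Under these readings the relations are not mutually exclusive (Before and After also imply Disjoint), and Overlap is directional.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SubEvent:
    """A drought subevent occupying the closed time interval [start, end]."""
    name: str
    start: date
    end: date


def before(a: SubEvent, b: SubEvent) -> bool:
    # Assumed reading of "A occurs before B": A has ended before B starts.
    return a.end < b.start


def after(a: SubEvent, b: SubEvent) -> bool:
    # Inverse of Before.
    return before(b, a)


def during(a: SubEvent, b: SubEvent) -> bool:
    # Table 1: B starts later than A and ends earlier than A.
    return a.start < b.start and b.end < a.end


def meet(a: SubEvent, b: SubEvent) -> bool:
    # B starts exactly when A ends.
    return a.end == b.start


def overlap(a: SubEvent, b: SubEvent) -> bool:
    # A starts first, B starts before A ends, and B is still ongoing when A ends.
    return a.start < b.start < a.end < b.end


def disjoint(a: SubEvent, b: SubEvent) -> bool:
    # The two intervals share no time point.
    return a.end < b.start or b.end < a.start


TEMPORAL_RELATIONS = {
    "Before": before, "After": after, "During": during,
    "Meet": meet, "Overlap": overlap, "Disjoint": disjoint,
}


def temporal_relations(a: SubEvent, b: SubEvent) -> list[str]:
    """Names of all Table 1 relations that hold for the ordered pair (A, B)."""
    return [name for name, holds in TEMPORAL_RELATIONS.items() if holds(a, b)]


# Hypothetical subevents of one drought episode.
rain_deficit = SubEvent("precipitation deficit", date(2022, 6, 1), date(2022, 9, 30))
yield_loss = SubEvent("crop yield loss", date(2022, 8, 1), date(2022, 10, 31))
print(temporal_relations(rain_deficit, yield_loss))  # ['Overlap']
```

Keeping each relation as a separate predicate mirrors the table row by row, so a definition can be changed (for example, treating Meet as a special case of Before) without touching the others.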
Categories | Expressions | Descriptions |
---|---|---|
Before | Before (A,B) | A occurs before B |
After | After (A,B) | A occurs after B |
During | During (A,B) | B starts later than A and ends earlier than A |
Meet | Meet (A,B) | B starts when A ends |
Overlap | Overlap (A,B) | A starts before B, and B has not yet ended when A ends |
Disjoint | Disjoint (A,B) | The occurrence periods of A and B do not intersect |
- (2) Spatial relationships
- (3) Semantic relationships
In addition to the temporal and spatial relationships, all remaining relationships, whether among concepts, among instances or between concepts and instances, were defined as semantic relationships (Table 2).
Categories | Expressions | Descriptions |
---|---|---|
hasSubclass | hasSubclass (A,B) | A has subclass B |
hasIndividual | hasIndividual (A,B) | Class A has individual B |
isPartOf | isPartOf (A,B) | B is part of A |
Homologous | Homologous (A,B) | A and B are homologous |
Amplify | Amplify (A,B) | The occurrence of A and B simultaneously leads to the occurrence of another disaster |
Cause | Cause (A,B) | Disaster A causes hazard-affected body B |
Induce | Induce (A,B) | A induces B |
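Relations such as those in Table 2 are naturally stored as (subject, predicate, object) triples, which is also the form they take in a knowledge graph. The sketch below is an illustrative in-memory store, not the authors' implementation: it rejects predicates outside Table 2, supports simple pattern queries and follows hasSubclass edges transitively. DroughtDisaster, DisasterPreventionMitigation and NonEngineeringMeasures are class names from the ontology; Methods is our shorthand for its methods subclass, and the Induce fact is a hypothetical example.

```python
from typing import NamedTuple, Optional

# Predicate vocabulary from Table 2.
SEMANTIC_RELATIONS = {"hasSubclass", "hasIndividual", "isPartOf", "Homologous",
                      "Amplify", "Cause", "Induce"}


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """A tiny in-memory triple store that only accepts Table 2 predicates."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        if predicate not in SEMANTIC_RELATIONS:
            raise ValueError(f"unknown semantic relation: {predicate}")
        self.triples.add(Triple(subject, predicate, obj))

    def query(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> list[Triple]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(t for t in self.triples
                      if (subject is None or t.subject == subject)
                      and (predicate is None or t.predicate == predicate)
                      and (obj is None or t.obj == obj))

    def descendants(self, cls: str) -> set[str]:
        """All direct and indirect subclasses reachable through hasSubclass."""
        found: set[str] = set()
        stack = [cls]
        while stack:
            for t in self.query(subject=stack.pop(), predicate="hasSubclass"):
                if t.obj not in found:
                    found.add(t.obj)
                    stack.append(t.obj)
        return found


store = TripleStore()
store.add("DroughtDisaster", "hasSubclass", "DisasterPreventionMitigation")
store.add("DisasterPreventionMitigation", "hasSubclass", "NonEngineeringMeasures")
store.add("NonEngineeringMeasures", "hasSubclass", "Methods")
store.add("MeteorologicalDrought", "Induce", "ForestFire")  # hypothetical fact
print(sorted(store.descendants("DroughtDisaster")))
# ['DisasterPreventionMitigation', 'Methods', 'NonEngineeringMeasures']
```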
Implementation of the drought disaster ontology
Web Ontology Language (OWL) is an ontology implementation language recommended by the World Wide Web Consortium (W3C) that provides rich semantic elements and strong semantic expressiveness. In this study, OWL was used as the implementation language, and Protégé 5.5, a widely used visual ontology construction tool, was adopted as the development tool to implement the drought disaster ontology. The main implementation steps are described as follows.
- (1) Define classes and their hierarchy
Based on the concept design of the drought disaster ontology, a basic schema consisting of drought disaster concepts and their relationships was established. This schema includes four top-level classes and their corresponding subclasses, which can be further divided into subordinate subclasses. The inheritance relationships between classes and subclasses are represented by the class hierarchy. For example, the top-level class DroughtDisaster is inherited by the DisasterPreventionMitigation class, which in turn is the parent class of NonEngineeringMeasures; NonEngineeringMeasures is further inherited by the methods, data and region subclasses. These inheritance relationships form a multilevel inheritance tree.
- (2) Define class properties and individuals
The class properties in the ontology include object and data properties. Object properties represent the relationships between classes. Based on the ontology design of drought disasters, object properties were defined by setting the domain and range to classes involved in the relationships. Data properties were used to represent the internal data characteristics of the class, which were determined by the properties of the class in the drought disaster ontology. Individuals were used to describe the members of a class and represent the instance objects of actual interest in the study field.
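Protégé saves the ontology in an OWL serialization. As a rough, dependency-free illustration of what steps (1) and (2) produce, the sketch below writes classes, an object property, a data property and individuals in OWL's Turtle syntax. Only the DroughtDisaster, DisasterPreventionMitigation and NonEngineeringMeasures chain comes from the text; the namespace http://example.org/drought#, the subclass names Methods, Data and Region, the properties usesData and hasAreaKm2, and the individuals SPEI and Yunnan are hypothetical.

```python
# Hypothetical namespace: the paper does not publish its ontology IRI.
PREFIXES = "\n".join([
    "@prefix :     <http://example.org/drought#> .",
    "@prefix owl:  <http://www.w3.org/2002/07/owl#> .",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .",
])

# child -> parent (rdfs:subClassOf); leaf names are illustrative.
SUBCLASS_OF = {
    "DisasterPreventionMitigation": "DroughtDisaster",
    "NonEngineeringMeasures": "DisasterPreventionMitigation",
    "Methods": "NonEngineeringMeasures",
    "Data": "NonEngineeringMeasures",
    "Region": "NonEngineeringMeasures",
}
OBJECT_PROPERTIES = {"usesData": ("Methods", "Data")}    # name -> (domain, range)
DATA_PROPERTIES = {"hasAreaKm2": ("Region", "double")}   # name -> (domain, xsd type)
INDIVIDUALS = {"SPEI": "Methods", "Yunnan": "Region"}    # individual -> class


def ancestors(cls: str) -> list[str]:
    """Walk up the inheritance tree from cls to the root class."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def to_turtle() -> str:
    """Serialize the toy ontology as Turtle text."""
    lines = [PREFIXES, ""]
    for cls in sorted(set(SUBCLASS_OF) | set(SUBCLASS_OF.values())):
        lines.append(f":{cls} a owl:Class .")
    for child, parent in SUBCLASS_OF.items():
        lines.append(f":{child} rdfs:subClassOf :{parent} .")
    for prop, (domain, rng) in OBJECT_PROPERTIES.items():
        lines.append(f":{prop} a owl:ObjectProperty ; rdfs:domain :{domain} ; rdfs:range :{rng} .")
    for prop, (domain, xsd_type) in DATA_PROPERTIES.items():
        lines.append(f":{prop} a owl:DatatypeProperty ; rdfs:domain :{domain} ; rdfs:range xsd:{xsd_type} .")
    for name, cls in INDIVIDUALS.items():
        lines.append(f":{name} a owl:NamedIndividual , :{cls} .")
    return "\n".join(lines) + "\n"


print(ancestors("Methods"))  # ['NonEngineeringMeasures', 'DisasterPreventionMitigation', 'DroughtDisaster']
print(to_turtle())
```

Saved as a .ttl file, output of this kind should be loadable in Protégé or another OWL tool that reads Turtle; in practice the ontology would be edited in Protégé directly, as described above.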
Drought disaster named entity recognition
Bidirectional encoder representations from transformers
BERT is a pretrained language model based on the transformer encoder, proposed by Google in 2018 (Nguyen et al. 2022). BERT is trained in two stages: pretraining and fine-tuning. In the pretraining stage, the input is composed of token embeddings, segment embeddings and position embeddings, and the transformer encoders are pretrained with two tasks, the masked language model (MLM) and next sentence prediction (NSP), so that they generate token representations with rich semantic features. In the fine-tuning stage, the model is further trained on the specific task, which strengthens the ability of the token representations to capture the text context of that task. In this study, BERT was used to encode each character of the drought disaster text into a token embedding sequence for each sentence; these context-dependent embeddings form the token embedding matrix and are fine-tuned for the task. Unlike traditional token embedding methods, which assign each token a fixed vector regardless of context, BERT produces embeddings that adapt to the context of the specific task.
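To make the input side concrete, the sketch below builds the three index sequences for one sentence at the character level ([CLS] c1 … cn [SEP], truncated and padded to the sequence length of 300 reported in the experimental setup) and sums the corresponding embedding rows, as BERT's input layer does before layer normalization and dropout. The vocabulary and the tiny two-dimensional embedding tables are toy assumptions; a real Chinese BERT checkpoint ships its own vocabulary and learned tables.

```python
from typing import NamedTuple


class BertInput(NamedTuple):
    input_ids: list[int]       # rows of the token embedding table
    segment_ids: list[int]     # rows of the segment embedding table
    position_ids: list[int]    # rows of the position embedding table
    attention_mask: list[int]  # 1 for real tokens, 0 for padding


def encode_sentence(text: str, vocab: dict[str, int], max_len: int = 300) -> BertInput:
    """Character-level input for one sentence: [CLS] c1 ... cn [SEP], then padding."""
    chars = list(text)[: max_len - 2]  # leave room for [CLS] and [SEP]
    ids = [vocab["[CLS]"]] + [vocab.get(ch, vocab["[UNK]"]) for ch in chars] + [vocab["[SEP]"]]
    n_pad = max_len - len(ids)
    return BertInput(
        input_ids=ids + [vocab["[PAD]"]] * n_pad,
        segment_ids=[0] * max_len,  # a single sentence is entirely segment 0
        position_ids=list(range(max_len)),
        attention_mask=[1] * len(ids) + [0] * n_pad,
    )


def embed(inp: BertInput, token_table, segment_table, position_table) -> list[list[float]]:
    """Element-wise sum of token, segment and position embeddings at every position."""
    return [
        [t + s + p for t, s, p in zip(token_table[i], segment_table[g], position_table[j])]
        for i, g, j in zip(inp.input_ids, inp.segment_ids, inp.position_ids)
    ]


# Toy vocabulary and 2-dimensional embedding tables (illustrative only).
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "干": 4, "旱": 5}
inp = encode_sentence("干旱灾害", VOCAB, max_len=8)  # 灾 and 害 are not in VOCAB -> [UNK]
token_table = [[i, 0] for i in range(len(VOCAB))]
segment_table = [[0, 1]]
position_table = [[0, j] for j in range(8)]
vectors = embed(inp, token_table, segment_table, position_table)
print(inp.input_ids)  # [2, 4, 5, 1, 1, 3, 0, 0]
```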
Bidirectional long short-term memory
Long short-term memory (LSTM) is a widely used type of recurrent neural network that is good at discovering correlations between characters and capturing long-term contextual information in a sequence, while retaining a neural network's ability to fit nonlinear relationships (Zanfei et al. 2022). It uses gated units to realize long-term memory, which alleviates the vanishing and exploding gradient problems encountered when training standard recurrent neural networks. The memory cell state is controlled by an input gate, a forget gate and an output gate: the input gate determines which new information is written to the memory cell at the current time step, the forget gate determines how much of the previous memory cell state is retained, and the output gate controls which part of the current memory cell state is output. The disadvantage of a unidirectional LSTM is that it captures only forward (preceding) context and cannot obtain backward (following) context. BiLSTM overcomes this shortcoming by combining a forward LSTM and a backward LSTM, so that forward and backward contextual information is obtained simultaneously. In this study, BiLSTM was used to capture long-distance contextual information in the drought disaster text; reading each sentence in both directions improves named entity recognition performance.
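For reference, the standard LSTM update at character position t, and the BiLSTM output formed by concatenating both directions, can be written as follows. The notation is ours, since the paper gives no equations: x_t is the BERT embedding of the t-th character, σ is the logistic sigmoid, ⊙ is element-wise multiplication, and W, U and b are learned parameters.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate memory} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{memory cell update} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden output} \\
h_t^{\mathrm{Bi}} &= \big[\,\overrightarrow{h}_t \,;\, \overleftarrow{h}_t\,\big] && \text{BiLSTM output}
\end{aligned}
```

The forward LSTM applies these equations from the first character to the last (using h_{t-1}), while the backward LSTM applies the same form with its own parameters from the last character to the first (using h_{t+1}); their hidden states are concatenated at each position.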
Conditional random field
CRF is an undirected probabilistic graphical model: given the input random variables, it computes the conditional probability distribution of the output random variables (Hiroyuki & Hitoshi 1994). The advantage of the CRF is that it takes into account the dependencies between adjacent tags in a sentence, learning constraints on adjacent tags from the training data and selecting the optimal tag sequence as a whole. BiLSTM alone predicts the tag of each character independently and can therefore output invalid sequences, such as an I-MD tag directly following an O tag; adding a CRF layer compensates for this shortcoming of BiLSTM. The combined model thus has the long-term memory of BiLSTM and also considers the local dependencies among adjacent tags. In this study, CRF was adopted to predict the entity tags of drought disasters. The model was trained by maximizing the log-likelihood of the correct tag sequence; at prediction time, the tag sequence with the highest overall probability was decoded and output as the drought disaster entity recognition result.
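The sketch below illustrates both CRF computations on toy numbers: the log-likelihood of a gold tag sequence (its path score minus the log-partition function obtained with the forward algorithm) and decoding of the best sequence with the Viterbi algorithm, the standard decoder for linear-chain CRFs (the paper does not name its decoding routine). The emission scores stand in for BiLSTM outputs, the transition scores are hand-set with a large negative score forbidding O → I-MD, and all numbers are illustrative.

```python
import itertools
import math

NEG = -1e4  # large negative score used to forbid a transition


def path_score(emissions, transitions, start, tags):
    """Unnormalized score of one tag sequence: start + emission + transition scores."""
    score = start[tags[0]] + emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emissions, transitions, start):
    """Forward algorithm: log of the summed exp(score) over all tag sequences."""
    n = len(start)
    alpha = [start[j] + emissions[0][j] for j in range(n)]
    for t in range(1, len(emissions)):
        alpha = [log_sum_exp([alpha[i] + transitions[i][j] for i in range(n)]) + emissions[t][j]
                 for j in range(n)]
    return log_sum_exp(alpha)


def log_likelihood(emissions, transitions, start, gold):
    """CRF training objective for one sentence (maximized during training)."""
    return path_score(emissions, transitions, start, gold) - log_partition(emissions, transitions, start)


def viterbi(emissions, transitions, start):
    """Highest-scoring tag sequence, returned as a list of tag indices."""
    n = len(start)
    delta = [start[j] + emissions[0][j] for j in range(n)]
    backpointers = []
    for t in range(1, len(emissions)):
        prev = [max(range(n), key=lambda i: delta[i] + transitions[i][j]) for j in range(n)]
        delta = [delta[prev[j]] + transitions[prev[j]][j] + emissions[t][j] for j in range(n)]
        backpointers.append(prev)
    best = [max(range(n), key=lambda j: delta[j])]
    for prev in reversed(backpointers):
        best.append(prev[best[-1]])
    return best[::-1]


TAGS = ["O", "B-MD", "I-MD"]
START = [0.0, 0.0, NEG]          # a sentence cannot start with I-MD
TRANSITIONS = [[0.0, 0.0, NEG],  # O -> I-MD is forbidden in BIO
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
EMISSIONS = [[1.0, 0.8, 0.0],    # toy per-character scores (stand-in for BiLSTM output)
             [0.0, 0.0, 2.0],
             [0.0, 0.0, 2.0]]

best = viterbi(EMISSIONS, TRANSITIONS, START)
print([TAGS[i] for i in best])  # ['B-MD', 'I-MD', 'I-MD']

# Brute-force check over all 3**3 tag sequences.
paths = list(itertools.product(range(len(TAGS)), repeat=len(EMISSIONS)))
brute_best = max(paths, key=lambda p: path_score(EMISSIONS, TRANSITIONS, START, p))
brute_log_z = log_sum_exp([path_score(EMISSIONS, TRANSITIONS, START, p) for p in paths])
```

Choosing the best tag for each character independently would give O, I-MD, I-MD, which violates BIO; the transition scores steer Viterbi to the valid B-MD, I-MD, I-MD instead.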
Integrated model
Experimental setup
Corpus construction
Web crawler technology was used for data acquisition in this study. A total of 498 studies were retrieved from the China National Knowledge Infrastructure (CNKI) database with the search criterion 'title = Drought disaster'. BeautifulSoup (https://www.crummy.com/software/BeautifulSoup/), a Python library for parsing HTML and XML documents, was used to extract content from the retrieved pages, and the results were saved as '.txt' files. Irrelevant or invalid studies were filtered out, and the remaining 422 studies were retained as the raw material for corpus construction. The abstracts of these studies were then extracted from the raw material; specifically, the abstract is the text within the span element whose ID is 'ChDivSummary'. For each abstract, auxiliary words that appeared repeatedly but had little or no value for textual analysis were removed to facilitate the subsequent extraction of named entities.
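A minimal sketch of the abstract-extraction step, assuming each saved page contains the abstract inside a span element with the ID 'ChDivSummary'. To stay dependency-free it uses Python's standard html.parser rather than BeautifulSoup; with BeautifulSoup the equivalent lookup is BeautifulSoup(html, "html.parser").find("span", id="ChDivSummary").get_text(). The sample HTML is made up.

```python
from html.parser import HTMLParser


class SummaryExtractor(HTMLParser):
    """Collect the text inside <span id="ChDivSummary">, including nested markup."""

    def __init__(self, target_id: str = "ChDivSummary") -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while inside the target span
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "span":
                self.depth += 1  # a span nested inside the abstract
        elif tag == "span" and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "span":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_abstract(html: str) -> str:
    """Return the abstract text, or an empty string if the span is absent."""
    parser = SummaryExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


page = '<div><span id="ChDivSummary">Drought <b>disaster</b> abstract.</span><span>other</span></div>'
print(extract_abstract(page))  # Drought disaster abstract.
```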
In this study, supervised deep learning was adopted for named entity recognition, which requires the input texts of the named entity recognition model to be labeled. In the research literature on drought disasters (Yang 2018), most studies address three basic elements: the proposed methods, the data used and the study region. These three kinds of entities are critically important for readers to understand the drought disaster event studied. Therefore, in this study, we took the subclasses of NonEngineeringMeasures (i.e., the methods, data and region subclasses) as an example to demonstrate whether combining ontology and deep learning methods is feasible for establishing knowledge graphs for drought disasters. For named entity recognition, the BIO annotation method is generally used for labeling (Zheng et al. 2020), where 'B', 'I' and 'O' denote that a text segment is at the beginning of, inside or outside an entity, respectively. The 'B' label is necessary because two entities may be directly adjacent to each other, and 'I' and 'O' labels alone cannot mark where one entity ends and the next begins. In this paper, the BIO annotation method was adopted to annotate the three types of named entities (i.e., methods, data and region entities; Table 3).
Entity type | Method | Data | Region | Nonentity |
---|---|---|---|---|
Tags | B-MD, I-MD | B-DT, I-DT | B-RG, I-RG | O |
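The following sketch shows character-level BIO tagging with the tag set of Table 3 and why the B label matters: two directly adjacent entities remain separable. The sample string and spans are hypothetical, and an I- tag that does not continue an open entity of the same type is simply ignored when decoding, which is one of several possible conventions.

```python
def spans_to_bio(text: str, spans: list[tuple[int, int, str]]) -> list[str]:
    """Character-level BIO tags; spans are (start, end_exclusive, type), type in MD/DT/RG."""
    tags = ["O"] * len(text)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def bio_to_spans(tags: list[str]) -> list[tuple[int, int, str]]:
    """Recover entity spans: an entity starts at B-X and continues over following I-X tags."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes an entity at the end
        continues = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


# A method entity immediately followed by a data entity (hypothetical).
tags = spans_to_bio("SPEIMODIS", [(0, 4, "MD"), (4, 9, "DT")])
print(tags)
print(bio_to_spans(tags))  # [(0, 4, 'MD'), (4, 9, 'DT')]
```

With only I and O labels, the nine characters above would form one unbroken run and the boundary between the two entities would be lost.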
In the raw corpus, the named entities of drought disasters follow certain patterns because the abstracts share similar syntax. Matching rules designed from these patterns help extract drought disaster named entities and can effectively reduce the workload of manual annotation. Following Fan et al. (2020), we used regular expressions as matching rules to obtain candidate named entities. These regular expressions are shown in Table 4.
Entity type | Regular expressions |
---|---|
Methods | `.*(provide\|apply\|improve\|utilize\|using\|put forward\|design\|invent\|set up\|construct\|achieve\|according to\|take\|base on\|produce\|combine\|adopt\|by) ([\S]+) (method\|model).*` |
Data | `.*(provide\|apply\|utilize\|using\|put forward\|design\|invent\|set up\|construct\|according to\|take\|base on\|produce\|combine\|adopt\|by\|collect) ([\S]+) (data\|material\|data set).*` |
Region | `.*(located in\|in\|from\|taking) ([\S]+) (area\|region\|mountain area\|river basin\|zone\|province\|city\|county\|as the research object).*` |
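The Table 4 rules can be applied with Python's re module roughly as follows. This is an illustrative sketch, not the authors' code: the rules were translated from Chinese, so they only fire on the literal order "trigger, name, head noun" and capture a single token as the name. The word boundaries (\b), the longest-first ordering of alternatives and the use of finditer to collect every match are our additions, and the sample sentences are invented. Matches are only candidates and still require manual checking.

```python
import re

# Trigger words and head nouns from Table 4 (duplicates removed).
METHOD_TRIGGERS = ["provide", "apply", "improve", "utilize", "using", "put forward", "design",
                   "invent", "set up", "construct", "achieve", "according to", "take",
                   "base on", "produce", "combine", "adopt", "by"]
DATA_TRIGGERS = ["provide", "apply", "utilize", "using", "put forward", "design", "invent",
                 "set up", "construct", "according to", "take", "base on", "produce",
                 "combine", "adopt", "by", "collect"]
REGION_TRIGGERS = ["located in", "in", "from", "taking"]
METHOD_HEADS = ["method", "model"]
DATA_HEADS = ["data", "material", "data set"]
REGION_HEADS = ["area", "region", "mountain area", "river basin", "zone", "province",
                "city", "county", "as the research object"]


def alternation(words):
    # Longest alternatives first, so that e.g. "mountain area" is tried before "area".
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


def build_rule(triggers, heads):
    # <trigger> <single-token name> <head noun>; \b keeps "in" from matching inside "combine".
    return re.compile(rf"\b({alternation(triggers)}) (\S+) ({alternation(heads)})\b")


RULES = {
    "Method": build_rule(METHOD_TRIGGERS, METHOD_HEADS),
    "Data": build_rule(DATA_TRIGGERS, DATA_HEADS),
    "Region": build_rule(REGION_TRIGGERS, REGION_HEADS),
}


def pre_annotate(sentence):
    """Candidate entity names per type; every candidate still needs manual checking."""
    return {etype: [m.group(2) for m in rule.finditer(sentence)] for etype, rule in RULES.items()}


print(pre_annotate("We combine SPEI model and collect MODIS data in Yunnan province."))
# {'Method': ['SPEI'], 'Data': ['MODIS'], 'Region': ['Yunnan']}
```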
After the raw corpus was annotated with the regular expressions, the results were manually checked and corrected. The final annotated corpus contains 17,353 tags; the statistics for each tag are shown in Table 5.
Tags | B-MD | I-MD | B-DT | I-DT | B-RG | I-RG | O |
---|---|---|---|---|---|---|---|
Number of tags | 494 | 2,916 | 435 | 1,900 | 351 | 839 | 10,418 |
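A quick arithmetic check on Table 5: the seven counts sum to the reported 17,353 tags, and because every BIO entity begins with exactly one B- tag, the B- counts give the number of entities of each type (1,280 in total). The count_tags helper is a hypothetical utility for recomputing such statistics from tagged sentences.

```python
from collections import Counter

# Tag counts copied from Table 5.
TABLE5 = Counter({"B-MD": 494, "I-MD": 2916, "B-DT": 435, "I-DT": 1900,
                  "B-RG": 351, "I-RG": 839, "O": 10418})

total_tags = sum(TABLE5.values())
# In BIO, each entity has exactly one B- tag, so B- counts equal entity counts.
entities = {tag[2:]: n for tag, n in TABLE5.items() if tag.startswith("B-")}
# Average entity length in tags: (B + I) / B for each entity type.
mean_length = {t: (TABLE5[f"B-{t}"] + TABLE5[f"I-{t}"]) / n for t, n in entities.items()}


def count_tags(tagged_sentences):
    """Recompute tag statistics from a list of per-sentence tag lists."""
    return Counter(tag for tags in tagged_sentences for tag in tags)


print(total_tags, sum(entities.values()))  # 17353 1280
```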
Experimental environment and parameter setting
The experiment was performed on a workstation equipped with an RTX 2080 Ti GPU, an Intel(R) Core i7-8700K CPU and two memory modules with 32 GB of memory capacity, running a 64-bit Windows 10 operating system. Python 3.7 served as the programming environment, and PyTorch served as the deep learning framework for training and running the integrated model. To train and validate the integrated model, the experimental dataset was divided into a training set and a test set at a ratio of approximately 8:2.
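The paper does not specify how the 8:2 split was drawn. One reasonable sketch, assuming whole sentences (or whole abstracts) are shuffled with a fixed seed so that no sentence is divided between the two sets, is shown below; the seed value is arbitrary.

```python
import random


def train_test_split(sentences, test_ratio=0.2, seed=2022):
    """Shuffle whole sentences with a fixed seed and split them about 8:2."""
    items = list(sentences)             # copy so the caller's list is not reordered
    random.Random(seed).shuffle(items)  # local RNG: reproducible, no global state
    n_test = round(len(items) * test_ratio)
    return items[n_test:], items[:n_test]


train, test = train_test_split([f"sentence {i}" for i in range(10)])
print(len(train), len(test))  # 8 2
```

Splitting at the abstract level instead of the sentence level would further prevent near-duplicate sentences from the same study appearing on both sides.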
The integrated BERT, BiLSTM and CRF model has a large number of parameters. To improve the efficiency of parameter calibration, some insensitive parameters were set directly to values taken from the relevant literature (Tang et al. 2022): the number of neurons in the BiLSTM was set to 256, the number of transformer layers to 12 and the length of the text sequence to 300. The remaining parameters, which were sensitive and had remarkable impacts on the experimental results, were set as follows: the model was trained with the Adam optimizer at a learning rate of 0.0005 for 20 epochs, with a batch size of 32 and a dropout rate of 0.5.
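The reported settings can be gathered in one configuration object, as sketched below. The field names are ours, and reading the 256 BiLSTM neurons as the hidden size per direction is an assumption; the commented line shows where the learning rate would be passed to PyTorch's torch.optim.Adam.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values reported in the text; field names are illustrative.
    bert_layers: int = 12        # transformer encoder layers
    lstm_hidden: int = 256       # BiLSTM neurons (assumed: per direction)
    max_seq_len: int = 300       # characters per input sequence
    learning_rate: float = 5e-4  # Adam learning rate
    epochs: int = 20
    batch_size: int = 32
    dropout: float = 0.5

    def steps_per_epoch(self, n_train_sentences: int) -> int:
        # Ceiling division: a final partial batch still counts as one step.
        return -(-n_train_sentences // self.batch_size)


cfg = TrainingConfig()
# With PyTorch, the optimizer would be created roughly as:
#   optimizer = torch.optim.Adam(model.parameters(), lr=cfg.learning_rate)
print(asdict(cfg))
```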
Evaluation indices
Precision, recall and F1 values were used to evaluate model performance: Precision = TP/(TP + FP), Recall = TP/(TP + FN) and F1 = 2 × Precision × Recall/(Precision + Recall), where TP is the number of correctly identified entities in the test set, FP is the number of incorrectly identified entities in the test set and FN is the number of entities in the test set that were not recognized. Higher precision, recall and F1 values indicate better prediction performance.
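A minimal sketch of these metrics at the entity level, assuming exact matching: a predicted entity counts as a TP only if its sentence, start, end and type all match a gold entity. This is the usual convention but is not stated in the paper. Entities are given as hypothetical (sentence_id, start, end, type) tuples, and the per-type scores follow the layout of Table 7.

```python
def prf1(gold: set, pred: set) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1, in percent."""
    tp = len(gold & pred)  # predicted entities that match a gold entity exactly
    fp = len(pred - gold)  # predicted entities that are wrong (boundary or type)
    fn = len(gold - pred)  # gold entities the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return 100 * precision, 100 * recall, 100 * f1


def prf1_by_type(gold: set, pred: set) -> dict:
    """Per-entity-type scores, as reported in Table 7."""
    types = sorted({e[3] for e in gold | pred})
    return {t: prf1({e for e in gold if e[3] == t}, {e for e in pred if e[3] == t})
            for t in types}


# Hypothetical entities: (sentence_id, start, end_exclusive, type).
gold = {(0, 0, 4, "MD"), (0, 4, 9, "DT"), (1, 2, 5, "RG")}
pred = {(0, 0, 4, "MD"), (0, 4, 8, "DT"), (1, 2, 5, "RG"), (1, 6, 8, "MD")}
print(prf1(gold, pred))
print(prf1_by_type(gold, pred))
```

Here the data entity with a wrong end boundary counts as both an FP and an FN, which is why exact matching is stricter than character-level tag accuracy.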
RESULTS
Implementation results of the drought disaster ontology
Named entity recognition performance
The CRF model, the integrated BiLSTM and CRF model (BiLSTM–CRF) and the integrated BERT, BiLSTM and CRF model (BERT–BiLSTM–CRF) were evaluated on the corpus described above. Table 6 shows the performance of the three models.
Model | Precision (%) | Recall (%) | F1 (%) |
---|---|---|---|
CRF | 62.66 | 68.55 | 65.33 |
BiLSTM–CRF | 76.59 | 81.11 | 78.69 |
BERT–BiLSTM–CRF | 89.83 | 92.95 | 91.21 |
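The F1 gains between the models in Table 6 are absolute differences, i.e., percentage points; the relative improvement is larger. The short calculation below makes the distinction explicit using only the Table 6 figures.

```python
# Overall F1 values (%) from Table 6.
F1 = {"CRF": 65.33, "BiLSTM-CRF": 78.69, "BERT-BiLSTM-CRF": 91.21}


def gain(new: str, old: str) -> tuple[float, float]:
    """Absolute gain in percentage points and relative gain in percent."""
    points = F1[new] - F1[old]
    return round(points, 2), round(100 * points / F1[old], 2)


print(gain("BiLSTM-CRF", "CRF"))              # (13.36, 20.45)
print(gain("BERT-BiLSTM-CRF", "BiLSTM-CRF"))  # (12.52, 15.91)
```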
Further experiments were conducted on each of the three entity types (i.e., methods, data and region entities) to evaluate the recognition performance for different entities. The results are shown in Table 7.
Model | Entity type | Precision (%) | Recall (%) | F1 (%) |
---|---|---|---|---|
CRF | Method | 49.61 | 60.69 | 54.59 |
CRF | Data | 76.06 | 76.73 | 76.40 |
CRF | Region | 73.79 | 75.00 | 74.39 |
BiLSTM–CRF | Method | 67.70 | 77.27 | 72.17 |
BiLSTM–CRF | Data | 89.02 | 88.60 | 88.81 |
BiLSTM–CRF | Region | 77.65 | 77.41 | 77.53 |
BERT–BiLSTM–CRF | Method | 83.36 | 93.69 | 88.22 |
BERT–BiLSTM–CRF | Data | 97.18 | 91.87 | 94.45 |
BERT–BiLSTM–CRF | Region | 93.92 | 92.96 | 93.44 |
DISCUSSION
In this study, we proposed a schema for establishing a knowledge graph for drought disaster management by integrating ontology design and named entity recognition. The ontology design was used to depict high-level concepts and their internal relationships, which are relatively easy to recognize by consulting experts and/or drought disaster management directives (e.g., the National Drought Management Policy Guidelines of China). Named entity recognition, on the other hand, was employed to derive low-level information (instances of high-level concepts) from various unstructured data sources. However, the named entity recognition approach we adopted is a supervised learning algorithm, which means that a significant amount of manual labor is required to establish the corpus for model training and validation (we did not find any readily available datasets). Owing to limited manual labor, the established corpus does not yet cover all the elements recognized in the ontology design. Therefore, in this study, we mainly focused on evaluating the efficiency and effectiveness of the proposed schema rather than on creating a fully functional knowledge graph for drought disaster management. Nevertheless, while establishing a more comprehensive corpus for named entity recognition (still ongoing), we found that a knowledge graph for drought disaster management, even in its primitive form, can be very useful for guiding the construction of the corpus (model training and validation datasets) and for establishing linkages among recognized entities using the relationships between the concepts in the ontology and the annotations defined in the corpus.
An ablation study was conducted to compare the CRF, BiLSTM–CRF and BERT–BiLSTM–CRF models. Compared with the CRF model, the BiLSTM–CRF model increased the F1 value by 13.36 percentage points. This is because the CRF model is weak at capturing long-distance dependencies, whereas BiLSTM can effectively capture long-distance text information and compensate for this deficiency of the CRF model. The BERT–BiLSTM–CRF model outperformed the BiLSTM–CRF model, with a further F1 increase of 12.52 percentage points. This is because BERT takes the specific context of the task into account by generating dynamic word embeddings tailored to that context. Consistent results can be observed in both the overall performance of these models (Table 6) and their performance on individual entity types (Table 7). Similar to the method adopted in this paper, the BERT–BiLSTM–CRF model has been used for named entity recognition in other fields (Liu et al. 2020; Tang et al. 2022; Xu et al. 2023). For example, Liu et al. (2020) adapted the BERT–BiLSTM–CRF model to improve the accuracy of entity information extracted from customer voice consultation questions, achieving an F1 value of 91.53%. Tang et al. (2022) developed an entity recognition method based on the same integrated model to extract the participants of an autonomous transportation system, achieving an F1 value of 86.81%. The F1 values reported in these studies are close to those of our study. We thus believe that the BERT–BiLSTM–CRF model, trained on a well-established corpus, is feasible for extracting low-level entities to establish a knowledge graph for drought disaster management.
To the best of our knowledge, no knowledge graphs for drought disaster management have been established to date, and this study proposed a schema to fill this gap. Although this study, at its primitive stage, does not establish a full-fledged drought disaster management knowledge graph, it contributes toward this goal in several ways. First, we demonstrated that ontology design can be very useful for recognizing the main concepts and the various relationships between them, and that it can provide important guidelines for establishing a training and validation corpus for the proposed named entity recognition model. Second, our experiments showed that the BERT–BiLSTM–CRF model is effective at extracting instances of the concepts outlined in the ontology. Third, the combined results of these two processes indicate that combining ontology and deep learning technology toward establishing a knowledge graph for drought disaster management is feasible. In addition, we have published our corpus in a public repository, which can save considerable time for those wanting to train similar models for extracting entities from various unstructured data sources.
CONCLUSIONS AND FUTURE WORKS
In this study, we designed a drought disaster ontology by recognizing the major concepts and their relationships, and then implemented the ontology with an ontology modeling language. We next prepared a corpus by extracting abstracts from the CNKI literature database and annotating the desired entities in the raw material. Finally, we established an entity recognition model integrating multiple deep learning methods, namely, BERT, BiLSTM and CRF, and evaluated its named entity recognition performance on the prepared corpus by comparison with other integrated or individual models. The BERT–BiLSTM–CRF model showed satisfactory results in recognizing drought disaster named entities, with the best precision, recall and F1 values of 89.83, 92.95 and 91.21%, respectively. We thus conclude that combining ontology and deep learning technology toward establishing a knowledge graph for drought disasters is promising.
This study mainly focused on schema layer construction and named entity recognition, two fundamental and key tasks in knowledge graph construction. In the future, we will continue research on other aspects of data layer construction (e.g., relationship extraction, knowledge fusion and storage).
ACKNOWLEDGEMENTS
This study was financially supported by the Natural Science Foundation of Fujian Province (grant numbers 2020J01319 and 2021J011189) and the Science and Technology Project of Quanzhou (grant number 2021N179S).
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories (https://github.com/gisland/Drought-Disaster-Management/).
CONFLICT OF INTEREST
The authors declare there is no conflict.