Drought disasters have caused serious impacts on the social economy and ecological environment, and these impacts are being increasingly exacerbated by climate warming and other factors. Drought disaster management usually involves processing a mass of isolated data from many fields expressed in different terminologies and formats. These heterogeneous data, or so-called data silos, have greatly hindered information-rich drought disaster management. Establishing a drought disaster knowledge graph can facilitate the reuse of these heterogeneous data and provide references for drought disaster management; ontology design and named entity recognition are the two major challenges in doing so. Therefore, in this study, we first designed a drought disaster ontology by recognizing the major concepts in the drought disaster field and their relationships, and implemented it with an ontology modeling language. We next constructed a drought disaster corpus and an integrated entity recognition model that combines multiple deep learning methods. Finally, we applied the integrated entity recognition model to extract information from the CNKI literature database. The integrated model shows satisfactory results in drought disaster named entity recognition. We thus conclude that combining ontology and deep learning technology toward establishing a knowledge graph for drought disasters is promising.

  • Ontology was used to construct the schema for drought disaster knowledge graphs.

  • A corpus of drought disasters was constructed with unstructured documents.

  • Automatic drought disaster named entity recognition was achieved by the deep learning method.

Drought disasters are among the most serious natural disasters owing to their wide-ranging, long-lasting and profound impacts on the social economy and the ecological environment. They have caused a series of problems, such as reduced agricultural production, forest fires, land desertification and even social unrest and the demise of civilizations (Zhang et al. 2020; Khiabani et al. 2021). Affected by global climate warming and other factors, the frequency of drought disasters, especially severe or extremely severe drought disasters, has increased dramatically (Sheffield & Wood 2008; Luo et al. 2018). Today, the impacts of drought disasters are becoming increasingly serious worldwide (Sheffield & Wood 2008; Zhang et al. 2019; Wu et al. 2022). Drought disaster management research, including research on drought disaster evolution, temporal and spatial drought characteristics, early warning and forecast systems, risk assessment, and management measures, has been a research hotspot in the hydrological community in recent years (Li et al. 2019; Wu et al. 2021). However, drought disasters involve many internal factors, such as disaster-inducing factors, hazard-inducing environments, hazard-affected bodies, and disaster prevention and mitigation abilities. The relationships between these factors are complex and highly uncertain, which poses severe challenges to drought disaster management (Li et al. 2019; Israel et al. 2021).

One of the most crucial factors contributing to the drought disaster management dilemma is the lack of comprehensive drought disaster knowledge. Drought disasters involve a mass of data from different fields, including meteorology, hydrology, agriculture, forestry, ecology, social economy, and many others (Sheffield & Wood 2008; Luo et al. 2018). The terminologies and formats used to describe and store this information differ in many ways. These multisource heterogeneous data form data silos (Luo et al. 2018), which hinder a comprehensive understanding of drought disasters and inevitably impair information-rich drought disaster management.

A knowledge graph is a formal description framework of general semantic knowledge (Liu et al. 2018) that Google first proposed in 2012. Knowledge graphs use visualization technology to describe, mine, analyze, construct and display knowledge and its interrelations. They provide an efficient method for organizing, managing and analyzing massive data, making knowledge easier to acquire (Abu-Salih 2021). To date, knowledge graphs have been successfully applied in many fields with fruitful results (Weng et al. 2017; Liang et al. 2018; Zhu et al. 2019; Díaz & Vilches-Blázquez 2022; Ge et al. 2022). For instance, Ge et al. (2022) proposed a disaster prediction knowledge graph that integrates remote sensing information, relevant geographic information and expert knowledge in the disaster analysis field. Weng et al. (2017) used a contextualized knowledge graph embedding model to construct a traditional Chinese medicine knowledge graph from Chinese physicians and applied it to the decision-making process. Establishing a drought disaster knowledge graph can facilitate the reuse of these heterogeneous data and provide references for drought disaster management (Fan et al. 2020). However, the use of knowledge graphs in the drought disaster management field is still challenging due to the complexity of drought disasters and the data silos between different fields. To our knowledge, no study has yet applied the knowledge graph method to drought disaster management.

Ontology design and named entity recognition are the two major challenges that must be overcome to establish a knowledge graph (Zheng et al. 2020). Since an ontology has a strong semantic expression ability and can effectively eliminate semantic discrepancies between data (Matos et al. 2010), some scholars have successfully used this approach to construct flood disaster ontologies (Wu et al. 2020; Son et al. 2021). For instance, Son et al. (2021) proposed an ontology for flooding disasters to resolve the heterogeneity among various disaster data and to provide interoperability among domains. Hence, in this study, we adopted an ontology to define concepts, relationships, properties and individuals and designed the drought disaster ontology accordingly. The other major challenge is automatically extracting information from a large quantity of unstructured documents through named entity recognition of drought disasters. Named entity recognition is one of the fundamental and key tasks for knowledge graph construction (Zheng et al. 2020). In general, entity recognition methods can be divided into three categories: rule-based methods, machine learning methods and deep learning methods (Mu et al. 2020). Since deep learning methods have shown great potential in natural language processing and have been applied successfully to derive knowledge from massive unstructured document resources, they have been widely used to implement named entity recognition (Fan et al. 2020; Mu et al. 2020; Zheng et al. 2020). For instance, Fan et al. (2020) proposed a deep learning-based named entity recognition model to facilitate geological hazard literature reuse and provide a reference for geological hazard governance.

In this study, we aimed to evaluate a method that combines ontology design and named entity recognition toward establishing a drought disaster knowledge graph. First, we designed a drought disaster ontology by recognizing the major concepts in the drought disaster field and their relationships. Then, we implemented the ontology with an ontology modeling language. Second, we constructed a drought disaster corpus and an integrated model based on a combination of bidirectional encoder representations from transformers (BERT), bidirectional long short-term memory (BiLSTM) and conditional random field (CRF) to recognize named entities from a large number of unstructured drought disaster documents.

The main contributions of this paper are as follows:

  • (1) The concepts of the drought disaster ontology and their relationships were designed, and the ontology was implemented with an ontology modeling language.

  • (2) A drought disaster corpus was constructed from unstructured documents and is freely available to the public.

  • (3) Deep learning methods were adopted to automatically recognize named entities of drought disasters.

Drought disaster knowledge graph construction

A drought disaster knowledge graph is established in two phases, i.e., constructing the schema layer and the data layer (Figure 1). The schema layer is generally constructed top-down. The first task in this phase includes defining hierarchical concepts, attributes of a concept, and relationships among concepts after full consideration of the disaster-inducing factors, hazard-inducing environment, hazard-affected body, disaster prevention and mitigation ability. The second task is the ontology implementation for drought disasters based on an ontology modeling language. In contrast, constructing the data layer employs a bottom-up method. In this phase, machine learning or deep learning methods are usually employed to recognize named entities from various data sources (e.g., basic data, journal literature, internet resources and social media). The relationships of different entities are recognized, and finally, knowledge is fused and stored. In this study, we mainly focused on the schema layer construction process with ontology and named entity recognition in the data layer using deep learning methods. The following sections will further elucidate these methods.
Figure 1

Construction process of drought disaster knowledge graphs.


Drought disaster ontology design

Design the hierarchical structure of concepts

Similar to a tree, an ontology adopts a hierarchical structure to depict the relationships between concepts. A concept is equivalent to a tree node, and the relationships between concepts correspond to the edges of the tree, which allows the hierarchy to be represented flexibly and intuitively. The hierarchical structure of the drought disaster ontology was designed by referring to various materials pertaining to drought disasters. According to the drought disaster formation process, drought disaster concepts were divided into four categories: disaster-inducing factors, hazard-inducing environment, hazard-affected body, and disaster prevention and mitigation ability. A drought disaster ontology conceptual structure with four levels was defined, as shown in Figure 2.
Figure 2

Conceptual hierarchical structure diagram of the drought disaster ontology.


Identify the relationships between concepts

There are complex relationships among the concepts in the drought disaster ontology. However, these relationships can be roughly categorized as temporal, spatial and semantic relationships.

  • (1)

    Temporal relationships

Subevents can occur simultaneously or successively in the formation of a drought disaster. Sometimes the temporal relationships between these events can be very difficult to discern, and thus, a comprehensive and accurate definition of temporal relationships is critical for establishing drought disaster ontologies. In this study, we employed six relationships to depict the temporal relationships between drought disaster subevents (Table 1).
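The six temporal relationships listed in Table 1 can be read as predicates over the start and end times of two subevents. The following Python sketch is one possible reading, not the implementation used in this study. It assumes that each subevent is an interval with numeric start and end times, that 'occurs' refers to the start time, and that Overlap additionally requires B to start before A ends. The `Event` type, the `relations` helper and the two example subevents are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Set


class Event(NamedTuple):
    """A drought subevent as a time interval (assumed representation)."""
    start: float
    end: float


def relations(a: Event, b: Event) -> Set[str]:
    """Return the names of all Table 1 relationships that hold for (A, B)."""
    found = set()
    if a.start < b.start:                    # A occurs (starts) before B
        found.add("Before")
    if a.start > b.start:                    # A occurs (starts) after B
        found.add("After")
    if a.start < b.start and b.end < a.end:  # B starts later and ends earlier than A
        found.add("During")
    if a.end == b.start:                     # A ends when B occurs
        found.add("Meet")
    if a.start < b.start < a.end < b.end:    # B starts inside A and outlasts it (assumed)
        found.add("Overlap")
    if a.end < b.start or b.end < a.start:   # a gap separates the two events
        found.add("Disjoint")
    return found


# Hypothetical subevents, with times given as day numbers.
rainfall_deficit = Event(0, 60)
soil_moisture_loss = Event(20, 90)
print(sorted(relations(rainfall_deficit, soil_moisture_loss)))  # ['Before', 'Overlap']
```

Because the descriptions in Table 1 are not mutually exclusive, the sketch returns a set of relationships rather than a single label.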

Table 1

Temporal relationships of the drought disaster ontology

| Categories | Expressions | Descriptions |
|---|---|---|
| Before | Before (A,B) | A occurs before B |
| After | After (A,B) | A occurs after B |
| During | During (A,B) | B occurred later than A but ended earlier than A |
| Meet | Meet (A,B) | A ends when B occurs |
| Overlap | Overlap (A,B) | When A occurs, B has not yet occurred, while B was not yet over when A ended |
| Disjoint | Disjoint (A,B) | The occurrence time of A and B is discontinuous |

  • (2)

    Spatial relationships

According to spatial characteristics, the spatial relationships were defined from three aspects, i.e., relative distance, topology and direction (Figure 3). It was noted that the three relationship categories were used to depict not only the relationships between subevents in a drought disaster but also the relationships between different drought disasters.

Figure 3

Spatial relationships of the drought disaster ontology.

  • (3)

    Semantic relationships

In addition to the temporal and spatial relationships, the remaining relationships within concepts or instances and relationships between concepts and instances were defined as semantic relationships (Table 2).

Table 2

Semantic relationships of the drought disaster ontology

| Categories | Expressions | Descriptions |
|---|---|---|
| hasSubclass | hasSubclass (A,B) | A has subclass B |
| hasIndividual | hasIndividual (A,B) | Class A has individual B |
| isPartOf | isPartOf (A,B) | B is part of A |
| Homologous | Homologous (A,B) | A and B are homologous |
| Amplify | Amplify (A,B) | The occurrence of A and B simultaneously leads to the occurrence of another disaster |
| Cause | Cause (A,B) | Disaster A causes hazard-affected body B |
| Induce | Induce (A,B) | A induces B |

Implementation of the drought disaster ontology

The Web Ontology Language (OWL) is an ontology implementation language recommended by the World Wide Web Consortium (W3C) that provides rich semantic elements and has strong semantic expression ability. In this study, OWL was used as the implementation language, and Protégé 5.5, a widely used visual tool for ontology construction, was adopted as the development tool to implement the drought disaster ontology. The main implementation steps are described as follows.

  • (1)

    Define classes and their hierarchy

Based on the concept design of the drought disaster ontology, the basic schema consisting of drought disaster concepts and their relationships was established. This schema includes four top-level classes and their corresponding subclasses. The subclasses can be further divided into subordinate subclasses. The inheritance relationships between a class and its subclasses are represented by the class hierarchy. For example, the top-level class DroughtDisaster is inherited by the DisasterPreventionMitigation class, and the DisasterPreventionMitigation class, in turn, is the parent class of NonEngineeringMeasures, which is further inherited by the methods, data and region subclasses. These inheritance relationships between classes form a multilevel inheritance tree.

  • (2)

    Define class properties and individuals

The class properties in the ontology include object and data properties. Object properties represent the relationships between classes. Based on the ontology design of drought disasters, object properties were defined by setting the domain and range to classes involved in the relationships. Data properties were used to represent the internal data characteristics of the class, which were determined by the properties of the class in the drought disaster ontology. Individuals were used to describe the members of a class and represent the instance objects of actual interest in the study field.
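The two implementation steps above can be illustrated outside of OWL with a small in-memory model. The following Python sketch is only an analogy for how classes, object properties and individuals fit together; it is not the OWL file produced in this study. The class names DroughtDisaster, DisasterPreventionMitigation and NonEngineeringMeasures come from the text, whereas the capitalized subclass names, the `usesData` property and the `PrecipitationRecords` individual are hypothetical. Note also that OWL treats domain and range as axioms used for inference, so the explicit check below is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OntClass:
    name: str
    parent: Optional["OntClass"] = None

    def ancestors(self) -> List[str]:
        """Walk up the inheritance tree and list the superclass names."""
        names, node = [], self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def is_a(self, other: "OntClass") -> bool:
        return self is other or other.name in self.ancestors()


@dataclass
class ObjectProperty:
    name: str
    domain: OntClass   # class of the subject
    range: OntClass    # class of the object


@dataclass
class Individual:
    name: str
    cls: OntClass
    links: Dict[str, "Individual"] = field(default_factory=dict)

    def relate(self, prop: ObjectProperty, target: "Individual") -> None:
        """Attach target via prop, checking the declared domain and range."""
        if not self.cls.is_a(prop.domain) or not target.cls.is_a(prop.range):
            raise ValueError(f"{prop.name}: domain or range violated")
        self.links[prop.name] = target


# Step (1): classes and their hierarchy.
drought = OntClass("DroughtDisaster")
prevention = OntClass("DisasterPreventionMitigation", drought)
non_engineering = OntClass("NonEngineeringMeasures", prevention)
methods = OntClass("Methods", non_engineering)
data = OntClass("Data", non_engineering)
region = OntClass("Region", non_engineering)

# Step (2): an object property (hypothetical) and two individuals.
uses_data = ObjectProperty("usesData", domain=methods, range=data)
model = Individual("PanelDataModel", methods)
dataset = Individual("PrecipitationRecords", data)  # hypothetical individual
model.relate(uses_data, dataset)

print(methods.ancestors())
```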

Drought disaster named entity recognition

Bidirectional encoder representations from transformers

BERT is a pretrained language model based on the transformer encoder that was proposed by Google in 2018 (Nguyen et al. 2022). The BERT training process includes two stages: the pretraining stage and the dynamic fine-tuning stage. In the pretraining stage, the inputs are composed of token embeddings, segment embeddings and position embeddings. The masked language model (MLM) and next sentence prediction (NSP) tasks are used to pretrain the transformer encoders to generate token embedding representations with rich semantic features. In the dynamic fine-tuning stage, the token embeddings are dynamically fine-tuned according to the specific task, which further strengthens their ability to represent the text context of the task. In this study, BERT was adopted to encode each character of the drought disaster text to generate the token embedding sequences of the sentences and to dynamically fine-tune the token embeddings according to the context to generate the token embedding matrix. BERT thereby addresses the problem that traditional token embedding generation methods cannot adapt effectively to the context of specific tasks.
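As a small illustration of the input composition described above, the sketch below builds the BERT input representation as the element-wise sum of a token embedding, a segment embedding and a position embedding. The three-dimensional lookup tables are toy values chosen for readability; in BERT these tables are learned and far larger. The `embed` helper and all numeric values are assumptions made for this example.

```python
from typing import Dict, List

Vector = List[float]

# Toy lookup tables (assumed values); real BERT learns these during pretraining.
TOKEN_EMB: Dict[str, Vector] = {
    "[CLS]": [0.1, 0.0, 0.0],
    "面": [0.0, 0.2, 0.0],
    "板": [0.0, 0.0, 0.3],
    "[SEP]": [0.1, 0.1, 0.1],
}
SEGMENT_EMB: List[Vector] = [[0.01, 0.01, 0.01], [0.02, 0.02, 0.02]]
POSITION_EMB: List[Vector] = [[0.001 * p] * 3 for p in range(8)]


def embed(tokens: List[str], segment_ids: List[int]) -> List[Vector]:
    """Sum token, segment and position embeddings for each input position."""
    inputs = []
    for pos, (tok, seg) in enumerate(zip(tokens, segment_ids)):
        parts = (TOKEN_EMB[tok], SEGMENT_EMB[seg], POSITION_EMB[pos])
        inputs.append([sum(dims) for dims in zip(*parts)])
    return inputs


# The first two characters of the example phrase in Figure 4, one token per character.
vectors = embed(["[CLS]", "面", "板", "[SEP]"], [0, 0, 0, 0])
for row in vectors:
    print([round(v, 3) for v in row])
```

Because the position embedding differs at every index, the same character receives a different input vector when it appears at a different position in the sentence.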

Bidirectional long short-term memory

Long short-term memory (LSTM) is a typical recurrent neural network that is good at discovering correlations between characters and capturing long-term contextual sequence information in a corpus, while retaining the ability of neural networks to fit nonlinear relationships (Zanfei et al. 2022). It uses gated units to realize long-term memory and alleviates the vanishing and exploding gradient problems that arise when training recurrent neural networks. It controls the memory unit state using input gates, forget gates and output gates. The input gates determine which input data are saved to the memory units at the present moment, the forget gates determine which memory units from the previous moment are retained at the present moment, and the output gates control which of the current memory units are output. The disadvantage of LSTM is that it captures only forward information and cannot obtain backward information. BiLSTM overcomes this shortcoming because it combines a forward and a backward LSTM and can thus obtain forward and backward contextual information simultaneously. In this study, BiLSTM was used to capture the long-distance contextual information in the drought disaster text. The entities in the drought disaster text were recognized from both the forward and backward directions, which effectively improved the named entity recognition performance.
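The gate mechanism and the bidirectional combination can be sketched with scalars in plain Python. The example below uses the standard LSTM gate equations with a hidden size of one and hand-picked weights; it is meant only to show how the forward and backward passes are paired, and it is unrelated to the trained model in this study. The weight names and values are assumptions.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float,
              w: Dict[str, float]) -> Tuple[float, float]:
    """One LSTM step for scalar input, hidden state and memory cell."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate memory
    c = f * c_prev + i * g    # keep part of the old memory, add part of the new
    h = o * math.tanh(c)      # expose part of the memory as output
    return h, c


def run_lstm(xs: List[float], w: Dict[str, float]) -> List[float]:
    h, c, outputs = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, w)
        outputs.append(h)
    return outputs


def run_bilstm(xs: List[float], w_fwd: Dict[str, float],
               w_bwd: Dict[str, float]) -> List[Tuple[float, float]]:
    """Pair each position's forward state with its backward state."""
    forward = run_lstm(xs, w_fwd)
    backward = run_lstm(xs[::-1], w_bwd)[::-1]
    return list(zip(forward, backward))


# Hand-picked weights (assumed), shared by both directions for brevity.
KEYS = ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")
WEIGHTS = {k: 0.5 for k in KEYS}

states = run_bilstm([1.0, -0.5, 0.3], WEIGHTS, WEIGHTS)
changed = run_bilstm([1.0, -0.5, 0.9], WEIGHTS, WEIGHTS)
print(states[0], changed[0])
```

Changing only the last input leaves the forward state at the first position unchanged but alters the backward state there, which is the extra context that the backward pass contributes.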

Conditional random field

CRF is an undirected probabilistic graphical model. Given the input random variables, it can calculate the conditional probability distribution of the output random variables (Hiroyuki & Hitoshi 1994). The advantage of the CRF is that it can fully consider the local features of adjacent tags in a sentence, learn the constraints between adjacent tags and obtain the optimal tag sequence through training. Therefore, combining BiLSTM with CRF can compensate for the shortcoming of BiLSTM, which does not model the dependence between adjacent tags. The combined model not only has the advantage of long-term memory but also considers the local dependence among adjacent tags. In this study, CRF was adopted to predict the entity tags of drought disasters. The log-likelihood of the tag sequence was maximized during training, the tag sequence with the highest overall probability was decoded, and the drought disaster entity recognition result was output.
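Decoding the tag sequence with the highest overall score is commonly done with the Viterbi algorithm. The sketch below is a minimal pure-Python version for a three-tag BIO scheme, assuming hand-written emission scores (standing in for BiLSTM outputs) and a transition table in which moving from O to I-MD is heavily penalized. All scores are hypothetical; a trained CRF would learn the transition scores from data.

```python
from typing import Dict, List

NEG = -1e4  # large penalty that effectively forbids a transition


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]],
            start: Dict[str, float],
            tags: List[str]) -> List[str]:
    """Return the tag sequence with the highest total score."""
    # best[t][tag] is the best score of any path that ends in tag at step t.
    best = [{tag: start[tag] + emissions[0][tag] for tag in tags}]
    back = []
    for em in emissions[1:]:
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + em[cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


TAGS = ["O", "B-MD", "I-MD"]
TRANSITIONS = {p: {c: 0.0 for c in TAGS} for p in TAGS}
TRANSITIONS["O"]["I-MD"] = NEG                 # an entity cannot continue after O
START = {"O": 0.0, "B-MD": 0.0, "I-MD": NEG}   # a sentence cannot start inside an entity

# Hypothetical per-token scores, standing in for BiLSTM outputs.
EMISSIONS = [
    {"O": 2.0, "B-MD": 0.5, "I-MD": 0.1},
    {"O": 0.2, "B-MD": 1.0, "I-MD": 1.5},
    {"O": 0.1, "B-MD": 0.2, "I-MD": 2.0},
]

greedy = [max(TAGS, key=lambda t: em[t]) for em in EMISSIONS]
decoded = viterbi(EMISSIONS, TRANSITIONS, START, TAGS)
print(greedy)   # ['O', 'I-MD', 'I-MD']
print(decoded)  # ['O', 'B-MD', 'I-MD']
```

Choosing the highest-scoring tag per token independently yields an invalid sequence in which I-MD follows O, whereas decoding with the transition constraints returns a well-formed entity.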

Integrated model

To combine the strengths of BERT, BiLSTM and CRF and offset their individual weaknesses, the three models were chained into an integrated model (Figure 4). The workflow of the integrated model is as follows. First, the word vectors were calculated from the input drought disaster texts by the pretrained BERT model. Second, the word vectors were passed to the BiLSTM layer, which further extracted the contextual features of the drought disaster texts and output the tag scores. Finally, the relationships between the tags were constrained by the CRF layer to obtain the optimal tag sequence, and the corresponding tag for each drought disaster entity was calculated.
Figure 4

Structure diagram of the integrated model. (Note: the input words ‘面板数据模型’ mean ‘panel data model’, which is a method for studying drought disasters; x1, x2, x3, x4, x5 and x6 are the input word vectors; y1, y2, y3, y4, y5 and y6 are the sequence vectors output by the BiLSTM; c1, c2, c3, c4, c5 and c6 are the label sequence representations output by the CRF; MD stands for methods; and ‘B’ and ‘I’ denote whether a text segment is at the beginning or inside of an entity, respectively.)


Experimental setup

Corpus construction

Web crawler technology was used for data acquisition in this study. A total of 498 studies were retrieved from the China National Knowledge Infrastructure (CNKI) database with the search criterion ‘title = Drought disaster’. BeautifulSoup (https://www.crummy.com/software/BeautifulSoup/), a Python library for parsing HTML and XML documents, was used to extract the data from the retrieved pages, and the acquisition results were saved as ‘.txt’ files. Irrelevant or invalid studies were filtered out, and 422 studies were retained as the raw materials for the corpus construction. The abstracts of these studies were then extracted from the raw materials. Specifically, the abstract residing within the span element with an ID of ‘ChDivSummary’ was extracted. For each abstract, auxiliary words that appeared repeatedly but had little or no value for textual analysis were removed to facilitate the subsequent extraction of named entities.
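The abstract extraction step can be sketched as follows. To keep the example free of third-party dependencies, it uses `html.parser` from the Python standard library rather than BeautifulSoup, so it is a substitute for, not a copy of, the tooling described above. The `AbstractExtractor` class and the sample page are hypothetical; only the span ID ‘ChDivSummary’ comes from the text.

```python
from html.parser import HTMLParser


class AbstractExtractor(HTMLParser):
    """Collect the text inside the span whose id matches target_id."""

    def __init__(self, target_id: str = "ChDivSummary") -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0    # greater than zero while inside the target span
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "span":   # track nested spans so the closing tags match up
                self.depth += 1
        elif tag == "span" and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "span":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_abstract(html: str) -> str:
    parser = AbstractExtractor()
    parser.feed(html)
    return "".join(parser.chunks).strip()


# Hypothetical page fragment for illustration.
sample_page = (
    '<div><span id="title">Example title</span>'
    '<span id="ChDivSummary">Drought <b>risk</b> assessment</span></div>'
)
print(extract_abstract(sample_page))  # Drought risk assessment
```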

In this study, supervised learning based on deep learning methods was adopted for named entity recognition, which requires the input texts for the named entity recognition model to be labeled. In the research literature on drought disasters (Yang 2018), most studies address three basic elements: the proposed methods, the data used and the study region. These three entities are critically important for readers to understand the studied drought disaster event. Therefore, in this study, we took the subclasses of NonEngineeringMeasures (i.e., the methods, data and region subordinate classes) as an example to demonstrate whether combining ontology and deep learning methods is feasible for establishing knowledge graphs for drought disasters. For named entity recognition, the BIO annotation method is generally used for labeling (Zheng et al. 2020), where ‘B’, ‘I’ and ‘O’ denote whether a text segment is at the beginning, inside or outside of an entity, respectively. In some cases, two entities may reside right next to each other, and the ‘I’ and ‘O’ labels alone are not sufficient to separate them; the additional label ‘B’ is therefore introduced to mark where each entity starts. In this paper, the BIO annotation method was adopted to annotate the three named entities (i.e., methods, data and region entities; Table 3).
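The role of the ‘B’ label becomes clear when a tag sequence is converted back into entities. The following Python sketch groups character-level BIO tags into entity spans; the `decode_bio` helper is a hypothetical name, and the second example uses placeholder characters to show two adjacent entities of the same type.

```python
from typing import List, Tuple


def decode_bio(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group character-level BIO tags into (entity text, entity type) pairs."""
    entities, current, label = [], [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current:                    # close the entity that was open
                entities.append(("".join(current), label))
            current, label = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(ch)             # continue the open entity
        else:                              # 'O', or an 'I-' tag with no matching 'B-'
            if current:
                entities.append(("".join(current), label))
            current, label = [], None
    if current:
        entities.append(("".join(current), label))
    return entities


# The example phrase from Figure 4 ('panel data model'), tagged as a method.
phrase = list("面板数据模型")
print(decode_bio(phrase, ["B-MD"] + ["I-MD"] * 5))  # [('面板数据模型', 'MD')]

# Two adjacent entities of the same type stay separate thanks to the second 'B'.
print(decode_bio(list("abcd"), ["B-MD", "I-MD", "B-MD", "I-MD"]))  # [('ab', 'MD'), ('cd', 'MD')]
```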

Table 3

Tags for named entities (Note: MD, DT and RG are short for methods, data and region, respectively; and ‘B’, ‘I’ and ‘O’ denote whether a text segment is at the beginning, inside or outside of an entity, respectively.)

| Entity type | Method | Data | Region | Nonentity |
|---|---|---|---|---|
| Tags | B-MD, I-MD | B-DT, I-DT | B-RG, I-RG | O |

In the raw corpus, the named entities of drought disasters follow certain patterns because the abstracts share a similar syntax. Designing matching rules from these patterns helps extract drought disaster named entities and effectively reduces the workload of manual annotation. Following Fan et al. (2020), regular expressions were adopted as matching rules to obtain named entities. These regular expressions are shown in Table 4.

Table 4

Regular expressions used for automatic annotation

Entity type: Regular expression

Methods: '.*(provide |apply |improve |utilize |using |put forward |design |invent |set up| construct |achieve |according to |take |base on |construct |produce |combine |adopt |adopt |by |construct) ([\S]+) (method |model).*'

Data: '.*(provide |apply |utilize |using |put forward |design | invent |set up |construct |according to |take |base on |construct |produce |combine |adopt |adopt |by |construct |collect) ([\S]+) (data |material |data set).*'

Region: '.*(located in |in |form |taking) ([\S]+) (area |region |mountain area |river basin |zone | province |city |county |as the research object).*'

After applying regular expressions to annotate the raw corpus, the results were further manually checked and corrected. The total number of final annotations was 17,353, and the statistics for each tag are shown in Table 5.
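The rule-based pre-annotation step can be illustrated with Python's `re` module. The sketch below uses a shortened English version of the Methods rule from Table 4 and tags whitespace-separated words, whereas the corpus in this study consists of Chinese text labeled per character. The simplified pattern, the `annotate` helper and the example sentence are assumptions made for this example, and its output would still need the manual check described above.

```python
import re
from typing import List, Tuple

# Shortened stand-in for the Methods rule: trigger word, entity text, suffix word.
# The captured group includes the suffix, so 'panel data model' is tagged as a whole.
METHOD_RULE = re.compile(r"\b(?:using|apply|adopt|utilize) (.+? (?:method|model))\b")


def annotate(text: str, rule: "re.Pattern[str]" = METHOD_RULE,
             label: str = "MD") -> List[Tuple[str, str]]:
    """Tag each word as B-<label>, I-<label> or O based on the rule's matches."""
    spans = [m.span(1) for m in rule.finditer(text)]
    tagged = []
    for token in re.finditer(r"\S+", text):
        tag = "O"
        for begin, end in spans:
            if begin <= token.start() < end:
                tag = ("B-" if token.start() == begin else "I-") + label
        tagged.append((token.group(), tag))
    return tagged


for word, tag in annotate("we assess drought risk using panel data model"):
    print(word, tag)
```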

Table 5

Statistics of the tags in the corpus (Note: MD, DT and RG are short for methods, data and region, respectively; and ‘B’, ‘I’ and ‘O’ denote whether a text segment is at the beginning, inside or outside of an entity, respectively.)

| Tags | B-MD | I-MD | B-DT | I-DT | B-RG | I-RG | O |
|---|---|---|---|---|---|---|---|
| Number of tags | 494 | 2,916 | 435 | 1,900 | 351 | 839 | 10,418 |

Experimental environment and parameter setting

The experiment was performed on a workstation equipped with an RTX 2080Ti GPU, an Intel(R) Core i7-8700K CPU, and two memory modules with 32 GB memory capacity. The workstation ran a Windows 10 64-bit operating system. Python 3.7 served as the programming environment, and PyTorch served as the deep learning framework to support training and running the integrated model. To properly train and validate the integrated model, the experimental dataset was divided into a training set and a test set with a ratio of approximately 8:2.

The BERT, BiLSTM and CRF integrated model has a large number of parameters. To improve the parameter calibration efficiency, some insensitive parameters were directly set to values derived from the relevant literature (Tang et al. 2022). For instance, the number of neurons in the BiLSTM was set to 256, the number of transformer layers was set to 12 and the length of the text sequence was set to 300. The other parameters, which were sensitive and had remarkable impacts on the experimental results, were optimized with the Adam optimizer. The learning rate was set to 0.0005, the number of epochs was 20, the batch size was 32 and the dropout was 0.5 when carrying out the training and validation.
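For reference, the settings reported above can be collected in a single configuration, together with a simple 8:2 split. The sketch below is only illustrative: the dictionary keys, the random seed and the choice of splitting at the level of the 422 abstracts are assumptions, since the text does not state the unit on which the split was made.

```python
import random
from typing import List, Sequence, Tuple

# Values as reported in the text; the key names are hypothetical.
HYPERPARAMS = {
    "lstm_hidden_size": 256,
    "transformer_layers": 12,
    "max_seq_length": 300,
    "learning_rate": 0.0005,
    "epochs": 20,
    "batch_size": 32,
    "dropout": 0.5,
}


def split_corpus(samples: Sequence, train_ratio: float = 0.8,
                 seed: int = 42) -> Tuple[List, List]:
    """Shuffle the samples reproducibly and cut them into train and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Assumed unit of splitting: one sample per retained abstract.
train_ids, test_ids = split_corpus(range(422))
print(len(train_ids), len(test_ids))  # 337 85
```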

Evaluation indices

Three indices, precision, recall and F1, are commonly adopted for evaluating the performance of named entity recognition (Fan et al. 2020; Mu et al. 2020; Zheng et al. 2020). Therefore, we also used them to assess effectiveness in this paper. The formulas for these indices are given as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

In the above formulas, TP represents the number of correctly identified entities in the test set, FP represents the number of incorrectly identified entities in the test set and FN represents the number of unrecognized entities in the test set. The higher the precision, recall and F1 values are, the better the prediction performance of the model, and vice versa.
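As a worked example of these indices, the following Python sketch computes them from sets of gold and predicted entities. It assumes entity-level exact matching, in which a prediction counts as correct only if its span and type both agree with the gold annotation; the text does not state the matching criterion used in the experiments. The entity tuples are hypothetical.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start offset, end offset, entity type)


def precision_recall_f1(gold: Set[Entity],
                        predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact matching (assumed)."""
    tp = len(gold & predicted)   # correctly identified entities
    fp = len(predicted - gold)   # incorrectly identified entities
    fn = len(gold - predicted)   # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical annotations: two of four predictions are correct, one gold entity is missed.
gold = {(0, 3, "MD"), (5, 7, "DT"), (9, 11, "RG")}
predicted = {(0, 3, "MD"), (5, 7, "RG"), (9, 11, "RG"), (12, 13, "DT")}
p, r, f1 = precision_recall_f1(gold, predicted)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.5 0.667 0.571
```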

Implementation results of the drought disaster ontology

The implementation results of the drought disaster ontology are shown in Figures 5 and 6. The ontology was saved to a file with the ‘.owl’ suffix, which is essentially an Extensible Markup Language (XML) file and is therefore convenient for data sharing and data exchange. The drought disaster ontology can visually display drought disaster domain knowledge from multiple dimensions, such as class, relationship, property and individual, and provides reliable data support for the subsequent named entity recognition of drought disasters.
Figure 5

Drought disaster ontology in the OWL format (only a part of the file is shown for clarity).

Figure 6

Visual display of the drought disaster ontology.


Named entity recognition performance

The CRF model, the BiLSTM and CRF integrated model (BiLSTM–CRF) and the BERT, BiLSTM and CRF integrated model (BERT–BiLSTM–CRF) were evaluated against the aforementioned corpus. Table 6 shows the performance achieved by these three models.

Table 6

Comparison of the experimental results of different models

| Model | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|
| CRF | 62.66 | 68.55 | 65.33 |
| BiLSTM–CRF | 76.59 | 81.11 | 78.69 |
| BERT–BiLSTM–CRF | 89.83 | 92.95 | 91.21 |

More experiments were conducted on three different named entities (i.e., the methods, data and region entities) to further evaluate the recognition performance of different entities. The experimental results are shown in Table 7.

Table 7

Comparison of experimental results of different annotations

| Model | Entity | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|
| CRF | Method | 49.61 | 60.69 | 54.59 |
| CRF | Data | 76.06 | 76.73 | 76.40 |
| CRF | Region | 73.79 | 75.00 | 74.39 |
| BiLSTM–CRF | Method | 67.70 | 77.27 | 72.17 |
| BiLSTM–CRF | Data | 89.02 | 88.60 | 88.81 |
| BiLSTM–CRF | Region | 77.65 | 77.41 | 77.53 |
| BERT–BiLSTM–CRF | Method | 83.36 | 93.69 | 88.22 |
| BERT–BiLSTM–CRF | Data | 97.18 | 91.87 | 94.45 |
| BERT–BiLSTM–CRF | Region | 93.92 | 92.96 | 93.44 |

In this study, we proposed a schema to establish a knowledge graph for drought disaster management by integrating ontology design and named entity recognition. The ontology design was used to depict high-level concepts and their internal relationships, which are relatively easy to recognize by consulting experts and/or drought disaster management directives (e.g., the National Drought Management Policy Guidelines of China). Named entity recognition, on the other hand, was employed to derive low-level information (instances of high-level concepts) from various unstructured data sources. However, the named entity recognition approach we adopted is a supervised learning algorithm, which means that a significant amount of manual labor is required to establish the corpus for model training and validation (we did not find any readily available datasets). Owing to limited manual labor, the established corpus does not yet cover all the elements recognized in the ontology design. Therefore, in this study, we mainly focused on evaluating the efficiency and effectiveness of the proposed schema, not on creating a fully functional knowledge graph for drought disaster management. Nevertheless, while working toward a more comprehensive corpus for named entity recognition (still ongoing), we found that a knowledge graph for drought disaster management, even in its primitive form, can be very useful for guiding the construction of the corpus (model training and validation datasets) and for establishing linkages among recognized entities using the relationships between the concepts in the ontology and the annotations defined in the corpus.

An ablation study was conducted to evaluate the performance of the models, i.e., the CRF model, the BiLSTM–CRF model and the BERT–BiLSTM–CRF model. Compared with that of the CRF model, the F1 value of the BiLSTM–CRF model increased by 13.36%. This is because the CRF model is weak in terms of capturing long-distance dependencies, and BiLSTM can effectively capture long-distance text information and compensate for the deficiencies of the CRF model. The experimental results of the BERT–BiLSTM–CRF model were superior to those of the BiLSTM–CRF model with a 12.52% F1 score increase. This is because BERT can take the specific context of the task into account by generating dynamic word embeddings that are tailored to this context. Consistent results can be observed from the overall performance of these models (Table 6) and the individual performance achieved by these models for specific annotations (Table 7). Similar to the method adopted in this paper, the BERT–BiLSTM–CRF model has been used to realize named entity recognition in other fields (Liu et al. 2020; Tang et al. 2022; Xu et al. 2023). For example, Liu et al. (2020) adapted the BERT–BiLSTM–CRF model to improve the accuracy of the entity information extracted from customer voice consultation questions, with an F1 value of 91.53%. Tang et al. (2022) developed an entity recognition method based on the same integrated model to extract the participants of an autonomous transportation system, with an F1 value of 86.81%. The F1 values of the experimental results in these studies were close to the results of our study. We thus believe that the BERT–BiLSTM–CRF model, with a well-established corpus, is feasible for extracting low-level entities to establish a knowledge graph for drought disaster management.
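As a quick consistency check on the per-annotation results, the F1 value can be recomputed from precision and recall as their harmonic mean, F1 = 2PR / (P + R). The sketch below does this for the nine rows of the per-annotation table. The names `f1_score`, `PER_ANNOTATION_ROWS` and `max_deviation` are introduced here for illustration only, and small deviations are expected because the reported precision and recall are themselves rounded to two decimals.

```python
from typing import List, Tuple

Row = Tuple[str, str, float, float, float]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, annotation, precision %, recall %, reported F1 %),
# copied from the per-annotation results table.
PER_ANNOTATION_ROWS: List[Row] = [
    ("CRF", "Method", 49.61, 60.69, 54.59),
    ("CRF", "Data", 76.06, 76.73, 76.40),
    ("CRF", "Region", 73.79, 75.00, 74.39),
    ("BiLSTM-CRF", "Method", 67.70, 77.27, 72.17),
    ("BiLSTM-CRF", "Data", 89.02, 88.60, 88.81),
    ("BiLSTM-CRF", "Region", 77.65, 77.41, 77.53),
    ("BERT-BiLSTM-CRF", "Method", 83.36, 93.69, 88.22),
    ("BERT-BiLSTM-CRF", "Data", 97.18, 91.87, 94.45),
    ("BERT-BiLSTM-CRF", "Region", 93.92, 92.96, 93.44),
]


def max_deviation(rows: List[Row]) -> float:
    """Largest absolute gap between recomputed and reported F1 values."""
    return max(abs(f1_score(p, r) - reported) for _, _, p, r, reported in rows)


if __name__ == "__main__":
    for model, label, p, r, reported in PER_ANNOTATION_ROWS:
        print(f"{model:16s} {label:7s} "
              f"recomputed={f1_score(p, r):.2f} reported={reported:.2f}")
    print(f"max deviation: {max_deviation(PER_ANNOTATION_ROWS):.4f}")
```

The recomputed values agree with the reported ones to within rounding, which suggests that the per-annotation F1 column is the harmonic mean of the corresponding precision and recall.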

To the best of our knowledge, no knowledge graph for drought disaster management has been established to date, and this study proposed a schema to fill this gap. Although the study, at its current preliminary stage, does not establish a full-fledged drought disaster management knowledge graph, it contributes toward this goal in several ways. First, we demonstrated that ontology design can be very useful for recognizing the main concepts and the various relationships between them, and that it can provide important guidelines for establishing a training and validation corpus for the proposed named entity recognition model. Second, our experiments showed the BERT–BiLSTM–CRF model to be efficient and effective in extracting instances of the concepts outlined in the ontology. Third, the merged results of these two processes indicated that combining ontology and deep learning technology toward establishing a knowledge graph for drought disaster management is feasible. We also published our corpus in a public repository, a nontrivial contribution because it can save considerable time for those wanting to train similar models for extracting entities from various unstructured data sources.

In this study, we designed a drought disaster ontology by recognizing the major concepts and their relationships. The ontology was then implemented with an ontology modeling language. We next prepared a corpus by extracting abstracts from the CNKI literature database and annotating the desired entities in the raw materials. Finally, we established an integrated entity recognition model by combining multiple deep learning methods, namely BERT, BiLSTM and CRF, and evaluated its named entity recognition performance on the prepared corpus by comparison with other integrated or individual models. The BERT–BiLSTM–CRF model showed satisfactory results in recognizing drought disaster named entities, with optimal precision, recall and F1 indices of 89.83, 92.95 and 91.21%, respectively. We thus conclude that combining ontology and deep learning technology toward establishing a knowledge graph for drought disasters is promising.

This study mainly focuses on schema layer construction and named entity recognition, which are the two fundamental tasks in knowledge graph construction. In future work, we will continue to study the other aspects of data layer construction (e.g., relationship extraction, knowledge fusion and storage).

This study was financially supported by the Natural Science Foundation of Fujian Province (grant numbers 2020J01319 and 2021J011189) and the Science and Technology Project of Quanzhou (grant number 2021N179S).

All relevant data are available from an online repository or repositories (https://github.com/gisland/Drought-Disaster-Management/).

The authors declare there is no conflict.

Abu-Salih B. 2021 Domain-specific knowledge graphs: A survey. Journal of Network & Computer Applications 185, 103076. doi:10.1016/j.jnca.2021.103076.
Díaz J. D. R. & Vilches-Blázquez L. M. 2022 Characterizing water quality datasets through multi-dimensional knowledge graphs: A case study of the Bogota river basin. Journal of Hydroinformatics 24 (2), 295–313. doi:10.2166/hydro.2022.070.
Fan R., Wang L., Yan J., Song W., Zhu Y. & Chen X. 2020 Deep learning-based named entity recognition and knowledge graph construction for geological hazards. ISPRS International Journal of Geo-Information 9 (1), 15. doi:10.3390/ijgi9010015.
Ge X., Yang Y., Chen J., Li W., Huang Z., Zhang W. & Peng L. 2022 Disaster prediction knowledge graph based on multi-source spatio-temporal information. Remote Sensing 14, 1214. doi:10.3390/rs14051214.
Hiroyuki K. & Hitoshi M. 1994 Conditioned stochastic processes for conditional random fields. Journal of Engineering Mechanics 120 (4), 855–875. doi:10.1061/(ASCE)0733-9399(1994)120:4(855).
Israel R. O., Johanes A. B. & Olusola O. O. 2021 Drought disaster monitoring using MODIS derived index for drought years: A space-based information for ecosystems and environmental conservation. Journal of Environmental Management 284, 112028. doi:10.1016/j.jenvman.2021.112028.
Khiabani M. Y., Shahdany S., Hassani Y. & Maestre J. M. 2021 Introducing an economic agricultural water distribution in a hyper-arid region: A case study in Iran. Journal of Hydroinformatics 23 (3), 548–566. doi:10.2166/hydro.2021.008.
Li Y. H., Yuan X., Zhang H. S., Wang R. Y., Wang C. H., Meng X. H., Zhang Z. Q., Wang S. S., Yang Y. & Han B. 2019 Mechanisms and early warning of drought disasters: Experimental drought meteorology research over China. Bulletin of the American Meteorological Society 100 (4), 673–687. doi:10.1175/BAMS-D-17-0029.1.
Liang Y., Xu F., Zhang S. H., Lai Y. K. & Mu T. 2018 Knowledge graph construction with structure and parameter learning for indoor scene design. Computational Visual Media 4 (2), 123–137.
Liu W., Liu J., Wu M., Abbas S., Hu W., Wei B. & Zheng Q. 2018 Representation learning over multiple knowledge graphs for knowledge graphs alignment. Neurocomputing 320, 12–24. doi:10.1016/j.neucom.2018.08.070.
Liu J., Sun C. & Yuan Y. 2020 The BERT-BiLSTM-CRF question event information extraction method. In: 2020 IEEE 3rd International Conference on Electronic Information and Communication Technology (ICEICT). IEEE. doi:10.1109/ICEICT51264.2020.9334197.
Luo D., Ye L. L., Zhai Y. L., Zhu H. Y. & Qian Q. C. 2018 Hazard assessment of drought disaster using a grey projection incidence model for the heterogeneous panel data. Grey Systems: Theory and Application 8 (4), 509–526. doi:10.1108/GS-05-2018-0020.
Matos E. E., Campos F., Braga R. & Palazzi D. 2010 CelOWS: An ontology based framework for the provision of semantic web services related to biological models. Journal of Biomedical Informatics 43 (1), 125–136. doi:10.1016/j.jbi.2009.08.008.
Mu X. F., Wang W. & Xu A. P. 2020 Incorporating token-level dictionary feature into neural model for named entity recognition. Neurocomputing 375, 43–50. doi:10.1016/j.neucom.2019.09.005.
Nguyen Q. K. L., Quang-Thai H., Van-Nui N. & Jung-Su C. 2022 BERT-Promoter: An improved sequence-based predictor of DNA promoter using BERT pre-trained model and SHAP feature selection. Computational Biology and Chemistry 99, 107732. doi:10.1016/j.compbiolchem.2022.107732.
Son J., Chul-Su L. & Shim H. S. 2021 Development of knowledge graph for data management related to flooding disasters using open data. Future Internet 13 (5), 124. doi:10.3390/fi13050124.
Tang J. J., Tuo H. N., Liu Y. & Fu Q. 2022 A method for identifying the participants of autonomous transportation system based on a BERT-Bi-LSTM-CRF model. Journal of Transport Information and Safety 40 (5), 80–90. doi:10.3963/j.jssn.1674-4861.2022.05.009 (in Chinese).
Weng H., Liu Z., Yan S., Fan M., Ou A., Chen D. & Hao T. 2017 A framework for automated knowledge graph construction towards traditional Chinese medicine. Health Information Science 10594, 170–181.
Wu Z. N., Shen Y. X., Wang H. L. & Wu M. M. 2020 An ontology-based framework for heterogeneous data management and its application for urban flood disasters. Earth Science Informatics 13 (2), 377–390. doi:10.1007/s12145-019-00439-3.
Wu J. F., Chen X. H., Yuan X., Yao H. X., Zhao Y. X. & AghaKouchak A. 2021 The interactions between hydrological drought evolution and precipitation-stream flow relationship. Journal of Hydrology 597, 126210. doi:10.1016/j.jhydrol.2021.126210.
Wu J. F., Yao H. X., Yuan X. & Lin B. Q. 2022 Dissolved organic carbon response to hydrological drought characteristics: Based on long-term measurements of headwater streams. Water Research 215, 115252. doi:10.1016/j.watres.2022.118252.
Xu H., Fan G., Kuang G. & Wang C. 2023 Exploring the potential of BERT-BiLSTM-CRF and the attention mechanism in building a tourism knowledge graph. Electronics 12, 1010. doi:10.3390/electronics12041010.
Yang X. Q. 2018 Influence of drought disaster on agricultural products price change: An empirical study based on the panel data of 16 prefectures in Yunnan. Chinese Agricultural Science Bulletin 34 (24), 95–100 (in Chinese).
Zanfei A., Brentan B. M., Menapace A. & Righetti M. 2022 A short-term water demand forecasting model using multivariate long short-term memory with meteorological data. Journal of Hydroinformatics 24 (5), 295–313. doi:10.2166/hydro.2022.055.
Zhang X., Cheng N., Hao S., Ip C., Yang L., Chen Y. Z., Sang Z., Tadesse T., Tpy L. & Rajabifard A. 2019 Urban drought challenge to 2030 sustainable development goals. Science of the Total Environment 693 (25), 133536. doi:10.1016/j.scitotenv.2019.07.342.
Zhang Q., Yao Y., Yao H. L., Huang J., Ma Z. G., Wang Z., Wang S., Wang Y. & Zhang Y. 2020 Causes and changes of drought in China: Research progress and prospects. Journal of Meteorological Research 34 (3), 460–481. doi:10.1007/s13351-020-9829-8.
Zheng X., Wang B., Zhao Y., Mao S. & Tang Y. 2020 A knowledge graph method for hazardous chemical management: Ontology design and entity identification. Neurocomputing 430, 104–111. doi:10.1016/j.neucom.2020.10.095.
Zhu P., Zhong W. & Yao X. 2019 Auto-construction of course knowledge graph based on course knowledge. International Journal of Performability Engineering 15 (8), 2228–2236. doi:10.23940/ijpe.19.08.p23.22282236.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).