BERT-BiLSTM-LSTM Joint Extraction
https://doi.org/10.1007/s00521-021-05815-z
Abstract
In recent years, the knowledge graph has achieved significant results in many specific fields and has become one of the core driving forces for the development of the internet and artificial intelligence. However, there is no mature knowledge graph in the field of agriculture, so it is of great significance to study the construction technology of an agricultural knowledge graph. Named entity recognition and relation extraction are key steps in the construction of a knowledge graph. In this paper, we introduce the BERT pre-training language model into the joint extraction model LSTM-LSTM-Bias and propose an agricultural entity-relation joint extraction model, BERT-BILSTM-LSTM, which is applied to the standard data set NYT and the self-built agricultural data set AgriRelation. Experimental results show that the model can effectively extract the relationships between agricultural entities.
Keywords Agricultural knowledge graph · Named entity recognition · Relation extraction · Joint extraction · BERT
We replace the static word embedding [1] in the LSTM-LSTM-Bias model proposed by Zheng et al. [2] with a dynamic fine-tuning method [3] to solve downstream tasks. Our model effectively solves the problem that the original model cannot model polysemous words.

The main contributions of this paper are as follows:

1. We improved the joint extraction model of Zheng et al. [2], which currently achieves excellent results. We introduced the pre-training language model BERT [4] on the basis of their model and proposed the joint extraction model BERT-BILSTM-LSTM. The model achieved an F1 score of 55.9% on the NYT standard data set, which is 3.9 percentage points higher than the result of Zheng et al.
2. We constructed the agricultural data set AgriRelation and used the BERT-BILSTM-LSTM model to extract relations, obtaining an F1 score of 57.6%. This verifies that the model can also extract entity relations when the sample data set is small.

The rest of this paper is organized as follows: Sect. 2 briefly introduces related work, Sect. 3 presents the BERT-BILSTM-LSTM model, and Sect. 4 describes the environment, data, parameter settings and results of the experiments with the model. Finally, the conclusion based on the above work is given in Sect. 5.

2 Related work

2.1 Named entity recognition

An entity is an important language unit that carries information in text. A fundamental semantic expression can be represented by the entities it contains and the associations and interactions among these entities. Entities are also the core units of a knowledge graph, which is usually a huge knowledge network with entities as nodes. Named entity recognition (NER) refers to the task of recognizing named entities in text and classifying them into designated categories, which is the basis for understanding the meaning of text. NER technology can detect new entities in text and add them to an existing database, and it is a core technology of knowledge graph construction.

Since the 1990s, statistical models have been the mainstream method of entity recognition. Many statistical methods have been used to extract entities from text, such as hidden Markov models [5, 6], Maximum Entropy models [7, 8] and Support Vector Machines [9]. However, traditional statistical models require a large amount of annotated corpus to learn, which becomes a bottleneck when constructing information extraction systems for open-domain or Web environments. With the popularity of deep learning in different fields, more and more deep learning models have been proposed to solve entity recognition problems [10–14].

2.2 Relation extraction

An entity relation describes an association between existing things; it is defined as a certain connection between two or more entities and is the basis for the automatic construction of knowledge graphs and for natural language understanding. Relation extraction automatically detects and identifies semantic relationships between entities in text. It systematically processes various unstructured and semi-structured text inputs (such as news pages, product pages, Weibo and forum pages), using a variety of technologies to identify and discover relationships of both predefined and open categories, which has important theoretical significance and broad application prospects as a supporting technique for a variety of applications.

Relation extraction has been studied continuously over the past two decades. Feature engineering [15], kernel methods [16, 17] and graph models [18] have been widely used, and some results have been achieved. With the advent of the deep learning era, neural network models have brought new breakthroughs in relation extraction. In 2014, Zeng et al. [19] improved the accuracy of relation extraction by extracting word-level and sentence-level features with a CNN and classifying the relationship by combining a hidden layer and a softmax layer. Nguyen and Grishman [20] improved on Zeng's work by adding multi-size convolution kernels and extracting sentence-level features. Santos et al. modified the loss function used in Zeng's model into a new pairwise ranking loss function [21]. Considering the unsatisfactory modelling effect of CNNs on long-distance text sequences, Socher et al. took the lead in using RNNs for entity relation extraction [22]. Zhou et al. [23] combined attention and BiLSTM for relation classification experiments. Lin et al. [24] proposed a self-training framework and built a recursive neural network embedded with multiple semantic isomeric elements within the framework. Zhang et al. [25] proposed an extended graph convolutional neural network, which can effectively process arbitrary dependency structures in parallel and facilitate the extraction of entity relations. Zhu et al. [26] proposed a method to generate graph neural network parameters from natural language statements to enable the neural network to perform relational reasoning on unstructured text input. In addition, BERT is being used in more and more relation extraction models for pre-training. Shi and Lin [27] proposed a simple model based on BERT, which can be used for relation extraction and semantic role labeling.
Shen et al. [28] used BERT to extract relationships between characters, reducing the impact of noisy data on the relation extraction model.

2.3 Joint extraction

Joint learning is not a term that has appeared only recently. In the field of natural language processing, researchers have long used joint models based on traditional machine learning to jointly learn closely related natural language processing tasks. Early joint learning methods for entity and relation extraction mostly used structured systems based on feature engineering [29, 30], which required complex feature engineering, relied strongly on natural language processing tools, and still suffered from error propagation. In 2016, the end-to-end model proposed by Miwa and Bansal [31] laid the foundation for the various efficient neural-network-based joint extraction models of recent years, but they used a NN structure to predict entity labels, thus ignoring long-distance dependencies between entity tags. Zheng et al. [32] performed joint learning by sharing the underlying representations of neural networks. Li et al. [33] applied the same method to the extraction of entities and relations from biomedical texts, but the parameter-sharing method still has two subtasks, which interact only through the shared parameters. The training process still identifies entities first and then performs pair-wise matching based on their prediction information to classify relationships, so redundant information is still generated for entities that have no relationship. Zheng et al. [2] proposed a new labelling strategy in 2017. This strategy turns relation extraction, which otherwise involves both sequence labelling and classification tasks, into a pure sequence labelling task and uses an end-to-end neural network model to directly obtain entity-relation triples. Our work focuses on improving this model, whose architecture is shown in Fig. 1 and mainly includes the input, embedding, encoding, decoding and output layers.

3 The BERT-BILSTM-LSTM model

3.1 Label mode

The BERT-BILSTM-LSTM model adopts the label mode consistent with the LSTM-LSTM-Bias model. This mode is composed of three parts: the location information, the relation type information and the role information of the entities. The B, I, E in the labels represent the starting words, internal words and ending words of the entities, and S represents entities that contain only one word. The numbers 1 and 2 in the label indicate the order in which the entities appear in the relationship: 1 indicates the entity that appears first in the relation, and 2 indicates the entity that appears later. For example, the starting word of the entity that appears first in the Country-President relationship can be expressed as "B-CP-1". In addition, all other irrelevant words are marked as "O".

3.2 Model structure

The BERT-BILSTM-LSTM model contains a BERT layer, an encoding layer, a decoding layer and a softmax layer. The structure of the model is shown in Fig. 2.

3.2.1 BERT layer

The BERT layer learns the semantic information of words through two steps: pre-training and fine-tuning. First, other large corpora are used to pre-train the BERT model, and then the joint extraction problem is solved through fine-tuning. We use the access method shown in Fig. 3 to add the BERT model to the joint extraction model. In Fig. 3, E represents the input embedding, Ti is the contextual representation of word i, and [CLS] is a special symbol for classification output. [CLS] is ignored during joint extraction and marked as "O". When a sentence of length n is input into BERT, a "[CLS]" symbol is added to the beginning of the sentence, so the sentence length becomes n + 1, and a corresponding "O" is added to the output labels, so the label sequence length also becomes n + 1.
… the previous moment and the output vector of the BERT layer at the current moment. The structure of each LSTM cell is shown in Fig. 4. The specific calculation formulas are as follows:

$$i^{(t)} = \sigma\left(W_{ix}\, x^{(t)} + W_{ih}\, h^{(t-1)} + b_i\right) \tag{1}$$

$$f^{(t)} = \sigma\left(W_{fx}\, x^{(t)} + W_{fh}\, h^{(t-1)} + b_f\right) \tag{2}$$
where $W_{fx}$ represents the weight matrix from the BERT layer to the forget gate, $W_{fh}$ represents the weight matrix from the hidden state to the forget gate, and $b_f$ is the bias term of the forget gate. $c$ is the cell memory and $o$ is the output gate. Formula (6) gives the output value of the memory cell: $h^{(t)}$ is the product of the cell memory $c^{(t)}$ and the output gate $o^{(t)}$.
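For readers who want the cell computation in one place, the NumPy sketch below implements one LSTM step consistent with formulas (1) and (2) and with the description of formula (6) above; the candidate-memory, cell-update and output-gate steps (presumably the paper's formulas (3)-(5), which were not recovered here) follow the common LSTM formulation and are our assumption, not the paper's exact equations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: x_t is the BERT output vector at the current moment,
    h_prev the hidden state of the previous moment, c_prev the previous cell memory."""
    i_t = sigmoid(W["ix"] @ x_t + W["ih"] @ h_prev + b["i"])  # input gate, formula (1)
    f_t = sigmoid(W["fx"] @ x_t + W["fh"] @ h_prev + b["f"])  # forget gate, formula (2)
    g_t = np.tanh(W["gx"] @ x_t + W["gh"] @ h_prev + b["g"])  # candidate memory (assumed form)
    c_t = f_t * c_prev + i_t * g_t                            # cell memory update (assumed form)
    o_t = sigmoid(W["ox"] @ x_t + W["oh"] @ h_prev + b["o"])  # output gate (assumed form)
    h_t = o_t * c_t   # formula (6) as described: product of output gate and cell memory
                      # (the common LSTM variant uses o_t * tanh(c_t) here)
    return h_t, c_t

# Tiny usage example with random weights (hidden size 4, BERT output size 8).
rng = np.random.default_rng(0)
d_h, d_x = 4, 8
W = {k: 0.1 * rng.standard_normal((d_h, d_x if k.endswith("x") else d_h))
     for k in ("ix", "ih", "fx", "fh", "gx", "gh", "ox", "oh")}
b = {k: np.zeros(d_h) for k in ("i", "f", "g", "o")}
h_t, c_t = lstm_cell_step(rng.standard_normal(d_x), np.zeros(d_h), np.zeros(d_h), W, b)
print(h_t.shape, c_t.shape)  # (4,) (4,)
```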
2. Filter text data that contain "geographic location". Select all thesauri of the geographic and administrative districts under the category of "China" in the Agricultural Thesaurus, and then parse the text part of the div block whose class value is para in the pages of fruit crops obtained in the previous step to extract the sentences containing China's geographical and administrative districts. At the same time, in order to increase the number of positive samples, we extracted sentences containing words such as "origin" and "producing area".
3. Process the data and complete the triples. By manually complementing sentences that do not contain complete triples, we obtain the data set AgriRelation for relation extraction. AgriRelation contains two parts: a training set and a test set. The training set contains 1348 sentences and the test set contains 187 sentences.
4. Annotate data. Manual data annotation is performed on the obtained data set. In this paper, we use entity location information, relation type information, and entity role information to label the entities in the triples. For example, the sentence "Baishui County is recognized by experts at home and abroad as one of the best producing areas for apples" contains the two entities "Baishui County" and "Apple" and their "producing area" relationship. Baishui County is the first entity, so it is labelled "E1", and "Apple" is the second entity, so it is labelled "E2". "Baishui" in "Baishui County" is the start position of the entity and "County" is the end position of the entity, so they are marked as "E1B" and "E1L" respectively. In the same way, "Apple" is marked as "E2S".

4.2.2 NYT

In order to be consistent with the experiments of the LSTM-LSTM-Bias model proposed by Zheng et al. [2], we use the public NYT data set to verify the experimental results. The download address of the NYT data set is: https://github.com/INK-USC/DS-RelationExtraction. The data set has 24 types of relationships and includes a training set and a test set. There are 235,982 sentences in the training set and 395 sentences in the test set. Each sentence in the training set consists of 4 parts, "sentText", "articleId", "relationMentions", and "entityMentions":

• "sentText": "But that spasm of irritation by …"
• "articleId": "/m/vinci8/data1/riedel/projects/relation/kb/nyt1/docstore/nyt-2005–2006.backup/1677367.xml.pb"
• "relationMentions": [{"em1Text":"Bobby Fischer","em2Text":"Iceland", "label":"/people/person/nationality"},……]
• "entityMentions": [{"start": 0, "label":"PERSON", "text":"Bobby Fischer"}, ……]

Among them, sentText is the original sentence, articleId is the source of the sentence, and relationMentions is the description of all entity relationships in the sentence. In relationMentions, em1Text represents entity 1, em2Text represents entity 2, and label represents the relationship category. entityMentions is a description of all entities in the sentence: start represents the entity position number, label represents the entity category, and text represents the entity content.

In order to ensure quality, the test set is manually annotated. The test set contains 24 relation types and 47 entity types. In order to facilitate the comparison of results, we downloaded the data set labelled by Zheng et al. [2] for model training. Since the statements at the end of the training set contain few relationships and most of the corresponding output tags are "O", we keep the first 66,339 sentences as the training set; this truncated training set has 162 tags (including the label "O"). In order to access the BERT pre-training model, in addition to the original 162 tags, we added "X", "[CLS]" and "[SEP]", resulting in a total of 165 tags. In order to avoid the vanishing gradient problem when a sentence is too long, we follow the experiments of Zheng et al. [2] and set a maximum sentence length: when the sentence length exceeds 50, only the first 50 words are kept as sentence input.

4.3 Parameter settings

The experiments use the BPTT algorithm to update the parameters of the model and use AdamWeightDecayOptimizer for optimization. The num_layers of the encoding layer is 300, the num_layers of the decoding layer is 600, the learning_rate is 5e-5, the batch_size is 64, the warmup_proportion is 0.1, and the sentence truncation length is 50. The number of epochs on the agricultural data set is 300, and the number of epochs on the NYT data set is 50. This paper uses the public word vectors trained from the Baidu Encyclopedia Corpus by SGNS to represent the Chinese sentences. The download address is: https://github.com/Embedding/Chinese-Word-Vectors. The Chinese word vectors have 300 dimensions.

4.4 Evaluation indicators

In order to evaluate the effect of relation extraction, as in other work, we use precision, recall, and F1 to evaluate the experimental results. The formulas are defined as follows:
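Stated in their usual form, with TP the number of correctly extracted entity-relation triples, FP the number of extracted triples that are wrong, and FN the number of gold triples that were missed:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$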
… the bias function to the BERT-BILSTM-LSTM model, which enhances the relationship between related entity pairs and reduces the influence of invalid entity tags. The experimental results show that the F1 value of BERT-BILSTM-LSTM-Bias is not much better than that of the BERT-BILSTM-LSTM model.

4.5.2 The experimental results using NYT

In order to verify the effectiveness of the BERT-BILSTM-LSTM model, we also conducted experiments using the standard data set NYT. The experimental results of all models on the standard data set NYT are shown in Tables 3 and 4. The results show that the F1 value of the BERT-BILSTM-LSTM model is increased by 3.9 percentage points compared with the best results of the other models on the NYT standard data set, indicating that the BERT-BILSTM-LSTM model can effectively improve the effect of relation extraction on the standard data set. Moreover, the Recall has also been significantly improved in relation extraction, that is to say, the model can identify more entity-relation triples. In addition, we also tested the bias model of BERT-BILSTM-LSTM using the NYT data set. The experimental results showed that the F1 value of the BERT-BILSTM-LSTM-Bias model was close to that of the BERT-BILSTM-LSTM model.

5 Conclusion

In this paper, we have improved the LSTM-LSTM-Bias joint extraction model and proposed a joint model for agricultural entity and relation extraction based on the BERT model. By using the characteristics of BERT, different meanings of the same word can be learned according to the context information. In the experiments, we used the BERT model to replace the commonly used Word2vec model and realized the modelling of polysemous words through pre-training and fine-tuning. It can be seen from Tables 2 and 4 that the F1 value of the BERT-BILSTM-LSTM model is improved compared with LSTM-LSTM-Bias on the two data sets, which indicates that BERT-BILSTM-LSTM is an effective relation extraction model. However, the Recall in Tables 2 and 4 increases while the Precision decreases, indicating that although the model recognizes more entity relations, some of them are wrong. As can be seen from Tables 1 and 3, the F1 value of the proposed model for entity recognition is also improved. On the NYT data set, the entity recognition results likewise show Recall increasing while Precision decreases. But on the data set AgriRelation, the Precision and Recall of entity recognition are both improved, which indicates that the model is also applicable to small-sample data sets. We also compared the experimental results with those of the BERT-BILSTM-LSTM-Bias model. The experimental results show that adding the bias function to the BERT-BILSTM-LSTM model does not significantly improve the extraction efficiency.

Acknowledgements Our work has received significant help and support from the Natural Science Foundation of Hunan Province of China (Grant No. 2019JJ40133), the Natural Science Foundation of Hunan Province of China (Grant No. 2019JJ50239), the Scientific Research Fund of Hunan Provincial Education Department of China (Grant No. 20A249), and the Key Research and Development Program of Hunan Province of China (Grant No. 2020NK2033).

Declarations

Conflict of interest The authors declare that they have no conflict of interest.

References

1. Goldberg Y, Levy O (2014) word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv:1402.3722
2. Zheng S, Wang F, Bao H, Hao Y, Zhou P, Xu B (2017) Joint extraction of entities and relations based on a novel tagging scheme. arXiv:1706.05075
3. Zhou Z, Shin J, Zhang L, Gurudu S, Gotway M, Liang J (2017) Fine-tuning convolutional neural networks for biomedical image analysis: actively and incrementally. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7340–7351
4. Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805
5. Bikel DM, Schwartz R, Weischedel RM (1999) An algorithm that learns what's in a name. Mach Learn 34(1–3):211–231
6. Fu G, Luke K-K (2005) Chinese named entity recognition using lexicalized HMMs. ACM SIGKDD Explor Newsl 7(1):19–25
7. Chieu HL, Ng HT (2002) Named entity recognition: a maximum entropy approach using global information. In: COLING 2002: the 19th international conference on computational linguistics
8. Uchimoto K, Ma Q, Murata M, Ozaku H, Isahara H (2000) Named entity extraction based on a maximum entropy model and transformation rules. In: Proceedings of the 38th annual meeting of the association for computational linguistics, pp 326–335
9. Isozaki H, Kazawa H (2002) Efficient support vector classifiers for named entity recognition. In: COLING 2002: the 19th international conference on computational linguistics
10. Chiu JP, Nichols E (2015) Named entity recognition with bidirectional LSTM-CNNs. arXiv:1511.08308
11. Huang Z, Xu W, Yu K (2015) Bidirectional LSTM-CRF models for sequence tagging. arXiv:1508.01991
12. Ma X, Hovy E (2016) End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. arXiv:1603.01354
13. Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C (2016) Neural architectures for named entity recognition. arXiv:1603.01360
14. Wu H, Lu L, Yu B (2019) Chinese named entity recognition based on transfer learning and BiLSTM-CRF. Small Micro Comput Syst 40:1142–1147
15. Mintz M, Bills S, Snow R, Jurafsky D (2009) Distant supervision for relation extraction without labeled data. In: Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP: volume 2. Association for Computational Linguistics, pp 1003–1011
16. Zelenko D, Aone C, Richardella A (2003) Kernel methods for relation extraction. J Mach Learn Res 3(Feb):1083–1106
17. Zhou G, Zhang M, Ji D, Zhu Q (2007) Tree kernel-based relation extraction with context-sensitive structured parse tree information. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), pp 728–736
18. Yao L, Riedel S, McCallum A (2010) Collective cross-document relation extraction without labelled data. In: Proceedings of the 2010 conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 1013–1023
19. Zeng D, Liu K, Lai S, Zhou G, Zhao J (2014) Relation classification via convolutional deep neural network. In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: technical papers, pp 2335–2344
20. Nguyen TH, Grishman R (2015) Relation extraction: perspective from convolutional neural networks. In: Proceedings of the 1st workshop on vector space modeling for natural language processing, pp 39–48
21. dos Santos CN, Xiang B, Zhou B (2015) Classifying relations by ranking with convolutional neural networks. arXiv:1504.06580
22. Socher R, Huval B, Manning CD, Ng AY (2012) Semantic compositionality through recursive matrix-vector spaces. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning, pp 1201–1211
23. Zhou P, Shi W, Tian J, Qi Z, Li B, Hao H, Xu B (2016) Attention-based bidirectional long short-term memory networks for relation classification. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 2: short papers), pp 207–212
24. Lin C, Miller T, Dligach D, Amiri H, Bethard S, Savova G (2018) Self-training improves recurrent neural networks performance for temporal relation extraction. In: Proceedings of the ninth international workshop on health text mining and information analysis, pp 165–176
25. Zhang Y, Qi P, Manning CD (2018) Graph convolution over pruned dependency trees improves relation extraction. arXiv:1809.10185
26. Zhu H, Lin Y, Liu Z, Fu J, Chua T-S, Sun M (2019) Graph neural networks with generated parameters for relation extraction. arXiv:1902.00756
27. Shi P, Lin J (2019) Simple BERT models for relation extraction and semantic role labeling. arXiv:1904.05255
28. Shen T, Wang D, Feng S, Zhang Y (2019) BERT-based denoising and reconstructing data of distant supervision for relation extraction. In: CCKS 2019 shared task
29. Li Q, Ji H (2014) Incremental joint extraction of entity mentions and relations. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 1: long papers), vol 1, pp 402–412
30. Miwa M, Sasaki Y (2014) Modeling joint entity and relation extraction with table representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1858–1869
31. Miwa M, Bansal M (2016) End-to-end relation extraction using LSTMs on sequences and tree structures. arXiv:1601.00770
32. Zheng S, Hao Y, Lu D, Bao H, Xu J, Hao H, Xu B (2017) Joint entity and relation extraction based on a hybrid neural network. Neurocomputing 257:59–66
33. Li F, Zhang M, Fu G, Ji D (2017) A neural joint model for entity and relation extraction from biomedical text. BMC Bioinform 18(1):198
34. Fang LM et al (1994) Agricultural thesaurus (the third volume). China Agriculture Press, Beijing, pp 191–192

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.