BERT-BiLSTM-LSTM Joint Extraction
https://doi.org/10.1007/s00521-021-05815-z
Abstract
In recent years, the knowledge graph has achieved significant results in many specific fields and has become one of the core driving forces for the development of the internet and artificial intelligence. However, there is no mature knowledge graph in the field of agriculture, so it is of great significance to study the construction technology of an agricultural knowledge graph. Named entity recognition and relation extraction are key steps in the construction of a knowledge graph. In this paper, we introduce the BERT pre-training language model into the joint extraction model LSTM-LSTM-Bias and propose an agricultural entity-relation joint extraction model, BERT-BILSTM-LSTM, which is applied to the standard data set NYT and the self-built agricultural data set AgriRelation. Experimental results show that the model can effectively extract the relationships between agricultural entities.
Keywords Agricultural knowledge graph · Named entity recognition · Relation extraction · Joint extraction · BERT
We replace the static word embedding [1] in the LSTM-LSTM-Bias model proposed by Zheng et al. [2] with a dynamic fine-tuning method [3] to solve downstream tasks. Our model effectively solves the problem that the original model cannot model polysemous words.

The main contributions of this paper are as follows:

1. We improved the joint extraction model of Zheng et al. [2], which currently achieves excellent results. We introduced the pre-training language model BERT [4] on the basis of their model and proposed the joint extraction model BERT-BILSTM-LSTM. The model achieved an F1 score of 55.9% on the NYT standard data set, which is 3.9 percentage points higher than the result of Zheng et al.
2. We constructed the agricultural data set AgriRelation and used the BERT-BILSTM-LSTM model to extract relations, obtaining an F1 score of 57.6%. This verifies that the model can also extract entity relations when the sample data set is small.

The rest of this paper is organized as follows: Sect. 2 briefly introduces related work, Sect. 3 presents the BERT-BILSTM-LSTM model, and Sect. 4 describes the environment, data, parameter settings and results of the experiments with the model. Finally, the conclusion based on the above work is given in Sect. 5.

2 Related work

2.1 Named entity recognition

An entity is an important language unit that carries information in text. A fundamental semantic expression can be represented by the entities it contains and the associations and interactions among these entities. Entities are also the core units of a knowledge graph, which is usually a huge knowledge network with entities as nodes. Named entity recognition (NER) refers to the task of recognizing named entities in text and classifying them into designated categories, which is the basis for understanding the meaning of text. NER technology can detect new entities in text and add them to an existing database, and it is a core technology of knowledge graph construction.

Since the 1990s, statistical models have been the mainstream method of entity recognition. Many statistical methods have been used to extract entities from text, such as hidden Markov models [5, 6], Maximum Entropy models [7, 8] and Support Vector Machines [9]. However, traditional statistical models require a large amount of annotated corpus to learn, which becomes a bottleneck when constructing information extraction systems for open-domain or Web environments. With the popularity of deep learning in different fields, more and more deep learning models have been proposed to solve entity recognition problems [10–14].

2.2 Relation extraction

An entity relation describes an association between existing things; it is defined as a certain connection between two or more entities and is the basis for the automatic construction of knowledge graphs and for natural language understanding. Relation extraction automatically detects and identifies semantic relationships between entities in text. It systematically processes various unstructured and semi-structured text inputs (such as news pages, product pages, Weibo and forum pages), using a variety of technologies to identify and discover relationships of both predefined and open categories, which has important theoretical significance and broad application prospects as a supporting technique for a variety of applications.

Relation extraction has been studied continuously over the past two decades. Feature engineering [15], kernel methods [16, 17] and graph models [18] have been widely used, and some results have been achieved. With the advent of the deep learning era, neural network models have brought new breakthroughs in relation extraction. In 2014, Zeng et al. [19] improved the accuracy of relation extraction by extracting word-level and sentence-level features with a CNN and classifying the relationship by combining a hidden layer and a softmax layer. Nguyen and Grishman [20] improved on Zeng's work by adding multi-size convolution kernels and extracting sentence-level features. Santos et al. modified the loss function used in Zeng's model into a new pairwise ranking loss function [21]. Considering the unsatisfactory modelling effect of CNNs on long-distance text sequences, Socher et al. took the lead in using RNNs for entity relation extraction [22]. Zhou et al. [23] combined attention and BiLSTM for relation classification experiments. Lin et al. [24] proposed a self-training framework and built a recursive neural network embedded with multiple semantic isomeric elements within the framework. Zhang et al. [25] proposed an extended graph convolutional neural network, which can effectively process arbitrary dependency structures in parallel and facilitate the extraction of entity relations. Zhu et al. [26] proposed a method to generate graph neural network parameters from natural language statements to enable the neural network to perform relational reasoning on unstructured text input. In addition, BERT is being used in more and more relation extraction models for pre-training. Shi and Lin [27] proposed a simple model based on BERT, which can be used for relation extraction and semantic role labeling.
Shen et al. [28] used BERT to extract relationships between characters, reducing the impact of noisy data on the relation extraction model.

2.3 Joint extraction

Joint learning is not a term that has appeared only recently. In the field of natural language processing, researchers have long used joint models based on traditional machine learning to jointly learn closely related natural language processing tasks. Early joint learning methods for entity and relation extraction mostly used structured systems based on feature engineering [29, 30], which required complex feature engineering, relied strongly on natural language processing tools, and still suffered from error propagation. In 2016, the end-to-end model proposed by Miwa and Bansal [31] laid the foundation for the various efficient neural-network-based joint extraction models of recent years, but they used a NN structure to predict entity labels, thus ignoring long-distance dependencies between entity tags. Zheng et al. [32] performed joint learning by sharing the underlying representations of neural networks. Li et al. [33] applied the same method to the extraction of entities and relations from biomedical texts, but the parameter-sharing method still has two subtasks, which interact only through the shared parameters. The training process still identifies entities first and then performs pair-wise matching based on their prediction information to classify relationships, so redundant information is still generated for entities that have no relationship. Zheng et al. [2] proposed a new labelling strategy in 2017. This strategy turns relation extraction, which otherwise involves both sequence labelling and classification tasks, into a pure sequence labelling task and uses an end-to-end neural network model to directly obtain entity-relation triples. Our work focuses on improving this model, whose architecture is shown in Fig. 1 and mainly includes the input, embedding, encoding, decoding and output layers.

3 The BERT-BILSTM-LSTM model

3.1 Label mode

The BERT-BILSTM-LSTM model adopts the label mode consistent with the LSTM-LSTM-Bias model. This mode is composed of three parts: the location information, the relation type information and the role information of the entities. The B, I, E in the labels represent the starting words, internal words and ending words of the entities, and S represents entities that contain only one word. The numbers 1 and 2 in the label indicate the order in which the entities appear in the relationship: 1 indicates the entity that appears first in the relation, and 2 indicates the entity that appears later. For example, the starting word of the entity that appears first in the Country-President relationship can be expressed as "B-CP-1". In addition, all other irrelevant words are marked as "O".

3.2 Model structure

The BERT-BILSTM-LSTM model contains a BERT layer, an encoding layer, a decoding layer and a softmax layer. The structure of the model is shown in Fig. 2.

3.2.1 BERT layer

The BERT layer learns the semantic information of words through two steps: pre-training and fine-tuning. First, other large corpora are used to pre-train the BERT model, and then the joint extraction problem is solved through fine-tuning. We use the access method shown in Fig. 3 to add the BERT model to the joint extraction model. In Fig. 3, E represents the input embedding, Ti is the contextual representation of word i, and [CLS] is a special symbol for classification output. [CLS] is ignored during joint extraction and marked as "O". When a sentence of length n is input into BERT, a "[CLS]" symbol is added to the beginning of the sentence, so the sentence length becomes n + 1, and a corresponding "O" is added to the output labels, so the label sequence length also becomes n + 1.
… the previous moment and the output vector of the BERT layer at the current moment. The structure of each LSTM cell is shown in Fig. 4. The specific calculation formulas are as follows:

$$i^{(t)} = \sigma\left(W_{ix}\, x^{(t)} + W_{ih}\, h^{(t-1)} + b_i\right) \tag{1}$$

$$f^{(t)} = \sigma\left(W_{fx}\, x^{(t)} + W_{fh}\, h^{(t-1)} + b_f\right) \tag{2}$$
where $W_{fx}$ represents the weight matrix from the BERT layer to the forget gate, $W_{fh}$ represents the weight matrix from the hidden state to the forget gate, and $b_f$ is the bias term of the forget gate. $c$ is the cell memory and $o$ is the output gate. Formula (6) gives the output value of the memory cell: $h^{(t)}$ is the product of the cell memory $c^{(t)}$ and the output gate $o^{(t)}$.
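For readers who want the cell computation in one place, the NumPy sketch below implements one LSTM step consistent with formulas (1) and (2) and with the description of formula (6) above; the candidate-memory, cell-update and output-gate steps (presumably the paper's formulas (3)-(5), which were not recovered here) follow the common LSTM formulation and are our assumption, not the paper's exact equations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: x_t is the BERT output vector at the current moment,
    h_prev the hidden state of the previous moment, c_prev the previous cell memory."""
    i_t = sigmoid(W["ix"] @ x_t + W["ih"] @ h_prev + b["i"])  # input gate, formula (1)
    f_t = sigmoid(W["fx"] @ x_t + W["fh"] @ h_prev + b["f"])  # forget gate, formula (2)
    g_t = np.tanh(W["gx"] @ x_t + W["gh"] @ h_prev + b["g"])  # candidate memory (assumed form)
    c_t = f_t * c_prev + i_t * g_t                            # cell memory update (assumed form)
    o_t = sigmoid(W["ox"] @ x_t + W["oh"] @ h_prev + b["o"])  # output gate (assumed form)
    h_t = o_t * c_t   # formula (6) as described: product of output gate and cell memory
                      # (the common LSTM variant uses o_t * tanh(c_t) here)
    return h_t, c_t

# Tiny usage example with random weights (hidden size 4, BERT output size 8).
rng = np.random.default_rng(0)
d_h, d_x = 4, 8
W = {k: 0.1 * rng.standard_normal((d_h, d_x if k.endswith("x") else d_h))
     for k in ("ix", "ih", "fx", "fh", "gx", "gh", "ox", "oh")}
b = {k: np.zeros(d_h) for k in ("i", "f", "g", "o")}
h_t, c_t = lstm_cell_step(rng.standard_normal(d_x), np.zeros(d_h), np.zeros(d_h), W, b)
print(h_t.shape, c_t.shape)  # (4,) (4,)
```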
2. Filter text data that contain "geographic location". Select all thesauri of the geographic and administrative districts under the category of "China" in the Agricultural Thesaurus, and then parse the text part of the div block whose class value is para in the pages of fruit crops obtained in the previous step to extract the sentences containing China's geographical and administrative districts. At the same time, in order to increase the number of positive samples, we extracted sentences containing words such as "origin" and "producing area".
3. Process the data and complete the triples. By manually complementing sentences that do not contain complete triples, we obtain the data set AgriRelation for relation extraction. AgriRelation contains two parts: a training set and a test set. The training set contains 1348 sentences and the test set contains 187 sentences.
4. Annotate data. Manual data annotation is performed on the obtained data set. In this paper, we use entity location information, relation type information, and entity role information to label the entities in the triples. For example, the sentence "Baishui County is recognized by experts at home and abroad as one of the best producing areas for apples" contains the two entities "Baishui County" and "Apple" and their "producing area" relationship. Baishui County is the first entity, so it is labelled "E1", and "Apple" is the second entity, so it is labelled "E2". "Baishui" in "Baishui County" is the start position of the entity and "County" is the end position of the entity, so they are marked as "E1B" and "E1L" respectively. In the same way, "Apple" is marked as "E2S".

4.2.2 NYT

In order to be consistent with the experiments of the LSTM-LSTM-Bias model proposed by Zheng et al. [2], we use the public NYT data set to verify the experimental results. The download address of the NYT data set is: https://github.com/INK-USC/DS-RelationExtraction. The data set has 24 types of relationships and includes a training set and a test set. There are 235,982 sentences in the training set and 395 sentences in the test set. Each sentence in the training set consists of 4 parts, "sentText", "articleId", "relationMentions", and "entityMentions":

• "sentText": "But that spasm of irritation by …"
• "articleId": "/m/vinci8/data1/riedel/projects/relation/kb/nyt1/docstore/nyt-2005–2006.backup/1677367.xml.pb"
• "relationMentions": [{"em1Text":"Bobby Fischer","em2Text":"Iceland", "label":"/people/person/nationality"},……]
• "entityMentions": [{"start": 0, "label":"PERSON", "text":"Bobby Fischer"}, ……]

Among them, sentText is the original sentence, articleId is the source of the sentence, and relationMentions is the description of all entity relationships in the sentence. In relationMentions, em1Text represents entity 1, em2Text represents entity 2, and label represents the relationship category. entityMentions is a description of all entities in the sentence: start represents the entity position number, label represents the entity category, and text represents the entity content.

In order to ensure quality, the test set is manually annotated. The test set contains 24 relation types and 47 entity types. In order to facilitate the comparison of results, we downloaded the data set labelled by Zheng et al. [2] for model training. Since the statements at the end of the training set contain few relationships and most of the corresponding output tags are "O", we keep the first 66,339 sentences as the training set; this truncated training set has 162 tags (including the label "O"). In order to access the BERT pre-training model, in addition to the original 162 tags, we added "X", "[CLS]" and "[SEP]", resulting in a total of 165 tags. In order to avoid the vanishing gradient problem when a sentence is too long, we follow the experiments of Zheng et al. [2] and set a maximum sentence length: when the sentence length exceeds 50, only the first 50 words are kept as sentence input.

4.3 Parameter settings

The experiments use the BPTT algorithm to update the parameters of the model and use AdamWeightDecayOptimizer for optimization. The num_layers of the encoding layer is 300, the num_layers of the decoding layer is 600, the learning_rate is 5e-5, the batch_size is 64, the warmup_proportion is 0.1, and the sentence truncation length is 50. The number of epochs on the agricultural data set is 300, and the number of epochs on the NYT data set is 50. This paper uses the public word vectors trained from the Baidu Encyclopedia Corpus by SGNS to represent the Chinese sentences. The download address is: https://github.com/Embedding/Chinese-Word-Vectors. The Chinese word vectors have 300 dimensions.

4.4 Evaluation indicators

In order to evaluate the effect of relation extraction, as in other work, we use precision, recall, and F1 to evaluate the experimental results. The formulas are defined as follows:
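Stated in their usual form, with TP the number of correctly extracted entity-relation triples, FP the number of extracted triples that are wrong, and FN the number of gold triples that were missed:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$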
… the bias function to the BERT-BILSTM-LSTM model, which enhances the relationship between related entity pairs and reduces the influence of invalid entity tags. The experimental results show that the F1 value of BERT-BILSTM-LSTM-Bias is not much better than that of the BERT-BILSTM-LSTM model.

4.5.2 The experimental results using NYT

In order to verify the effectiveness of the BERT-BILSTM-LSTM model, we also conducted experiments using the standard data set NYT. The experimental results of all models on the standard data set NYT are shown in Tables 3 and 4. The results show that the F1 value of the BERT-BILSTM-LSTM model is increased by 3.9 percentage points compared with the best results of the other models on the NYT standard data set, indicating that the BERT-BILSTM-LSTM model can effectively improve the effect of relation extraction on the standard data set. Moreover, the Recall has also been significantly improved in relation extraction, that is to say, the model can identify more entity-relation triples. In addition, we also tested the bias model of BERT-BILSTM-LSTM using the NYT data set. The experimental results showed that the F1 value of the BERT-BILSTM-LSTM-Bias model was close to that of the BERT-BILSTM-LSTM model.

5 Conclusion

In this paper, we have improved the LSTM-LSTM-Bias joint extraction model and proposed a joint model for agricultural entity and relation extraction based on the BERT model. By using the characteristics of BERT, different meanings of the same word can be learned according to the context information. In the experiments, we used the BERT model to replace the commonly used Word2vec model and realized the modelling of polysemous words through pre-training and fine-tuning. It can be seen from Tables 2 and 4 that the F1 value of the BERT-BILSTM-LSTM model is improved compared with LSTM-LSTM-Bias on the two data sets, which indicates that BERT-BILSTM-LSTM is an effective relation extraction model. However, the Recall in Tables 2 and 4 increases while the Precision decreases, indicating that although the model recognizes more entity relations, some of them are wrong. As can be seen from Tables 1 and 3, the F1 value of the proposed model for entity recognition is also improved. On the NYT data set, the entity recognition results likewise show Recall increasing while Precision decreases. But on the data set AgriRelation, the Precision and Recall of entity recognition are both improved, which indicates that the model is also applicable to small-sample data sets. We also compared the experimental results with those of the BERT-BILSTM-LSTM-Bias model. The experimental results show that adding the bias function to the BERT-BILSTM-LSTM model does not significantly improve the extraction efficiency.

Acknowledgements Our work has received significant help and support from the Natural Science Foundation of Hunan Province of China (Grant No. 2019JJ40133), the Natural Science Foundation of Hunan Province of China (Grant No. 2019JJ50239), the Scientific Research Fund of Hunan Provincial Education Department of China (Grant No. 20A249), and the Key Research and Development Program of Hunan Province of China (Grant No. 2020NK2033).

Declarations

Conflict of interest The authors declare that they have no conflict of interest.

References

1. Goldberg Y, Levy O (2014) word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv:1402.3722
2. Zheng S, Wang F, Bao H, Hao Y, Zhou P, Xu B (2017) Joint extraction of entities and relations based on a novel tagging scheme. arXiv:1706.05075
3. Zhou Z, Shin J, Zhang L, Gurudu S, Gotway M, Liang J (2017) Fine-tuning convolutional neural networks for biomedical image analysis: actively and incrementally. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7340–7351
4. Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805
5. Bikel DM, Schwartz R, Weischedel RM (1999) An algorithm that learns what's in a name. Mach Learn 34(1–3):211–231
6. Fu G, Luke K-K (2005) Chinese named entity recognition using lexicalized HMMs. ACM SIGKDD Explor Newsl 7(1):19–25
7. Chieu HL, Ng HT (2002) Named entity recognition: a maximum entropy approach using global information. In: COLING 2002: the 19th international conference on computational linguistics
8. Uchimoto K, Ma Q, Murata M, Ozaku H, Isahara H (2000) Named entity extraction based on a maximum entropy model and transformation rules. In: Proceedings of the 38th annual meeting of the association for computational linguistics, pp 326–335
9. Isozaki H, Kazawa H (2002) Efficient support vector classifiers for named entity recognition. In: COLING 2002: the 19th international conference on computational linguistics
10. Chiu JP, Nichols E (2015) Named entity recognition with bidirectional LSTM-CNNs. arXiv:1511.08308
11. Huang Z, Xu W, Yu K (2015) Bidirectional LSTM-CRF models for sequence tagging. arXiv:1508.01991
12. Ma X, Hovy E (2016) End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. arXiv:1603.01354
13. Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C (2016) Neural architectures for named entity recognition. arXiv:1603.01360
14. Wu H, Lu L, Yu B (2019) Chinese named entity recognition based on transfer learning and BiLSTM-CRF. Small Micro Comput Syst 40:1142–1147
15. Mintz M, Bills S, Snow R, Jurafsky D (2009) Distant supervision for relation extraction without labeled data. In: Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP: volume 2. Association for Computational Linguistics, pp 1003–1011
16. Zelenko D, Aone C, Richardella A (2003) Kernel methods for relation extraction. J Mach Learn Res 3(Feb):1083–1106
17. Zhou G, Zhang M, Ji D, Zhu Q (2007) Tree kernel-based relation extraction with context-sensitive structured parse tree information. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), pp 728–736
18. Yao L, Riedel S, McCallum A (2010) Collective cross-document relation extraction without labelled data. In: Proceedings of the 2010 conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 1013–1023
19. Zeng D, Liu K, Lai S, Zhou G, Zhao J (2014) Relation classification via convolutional deep neural network. In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: technical papers, pp 2335–2344
20. Nguyen TH, Grishman R (2015) Relation extraction: perspective from convolutional neural networks. In: Proceedings of the 1st workshop on vector space modeling for natural language processing, pp 39–48
21. dos Santos CN, Xiang B, Zhou B (2015) Classifying relations by ranking with convolutional neural networks. arXiv:1504.06580
22. Socher R, Huval B, Manning CD, Ng AY (2012) Semantic compositionality through recursive matrix-vector spaces. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning, pp 1201–1211
23. Zhou P, Shi W, Tian J, Qi Z, Li B, Hao H, Xu B (2016) Attention-based bidirectional long short-term memory networks for relation classification. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 2: short papers), pp 207–212
24. Lin C, Miller T, Dligach D, Amiri H, Bethard S, Savova G (2018) Self-training improves recurrent neural networks performance for temporal relation extraction. In: Proceedings of the ninth international workshop on health text mining and information analysis, pp 165–176
25. Zhang Y, Qi P, Manning CD (2018) Graph convolution over pruned dependency trees improves relation extraction. arXiv:1809.10185
26. Zhu H, Lin Y, Liu Z, Fu J, Chua T-S, Sun M (2019) Graph neural networks with generated parameters for relation extraction. arXiv:1902.00756
27. Shi P, Lin J (2019) Simple BERT models for relation extraction and semantic role labeling. arXiv:1904.05255
28. Shen T, Wang D, Feng S, Zhang Y (2019) BERT-based denoising and reconstructing data of distant supervision for relation extraction. In: CCKS 2019 shared task
29. Li Q, Ji H (2014) Incremental joint extraction of entity mentions and relations. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 1: long papers), vol 1, pp 402–412
30. Miwa M, Sasaki Y (2014) Modeling joint entity and relation extraction with table representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1858–1869
31. Miwa M, Bansal M (2016) End-to-end relation extraction using LSTMs on sequences and tree structures. arXiv:1601.00770
32. Zheng S, Hao Y, Lu D, Bao H, Xu J, Hao H, Xu B (2017) Joint entity and relation extraction based on a hybrid neural network. Neurocomputing 257:59–66
33. Li F, Zhang M, Fu G, Ji D (2017) A neural joint model for entity and relation extraction from biomedical text. BMC Bioinform 18(1):198
34. Fang LM et al (1994) Agricultural thesaurus (the third volume). China Agriculture Press, Beijing, pp 191–192

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.