Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences

Li, Yikuan; Wehbe, Ramsey M.; Ahmad, Faraz S.; Wang, Hanyin; Luo, Yuan

arXiv:2201.11838 (cs)

[Submitted on 27 Jan 2022 (v1), last revised 15 Apr 2022 (this version, v3)]

Abstract: Transformer-based models, such as BERT, have dramatically improved performance on various natural language processing tasks. The clinical-knowledge-enriched model ClinicalBERT has also achieved state-of-the-art results on clinical named entity recognition and natural language inference tasks. One of the core limitations of these transformers is the substantial memory consumption of their full self-attention mechanism. To overcome this, long-sequence transformer models, e.g. Longformer and BigBird, were proposed; they use a sparse attention mechanism to reduce memory usage from quadratic in the sequence length to linear. These models extended the maximum input sequence length from 512 to 4096 tokens, which enhanced their ability to model long-term dependencies and consequently led to improved results on a variety of tasks. Inspired by the success of these long-sequence transformer models, we introduce two domain-enriched language models, Clinical-Longformer and Clinical-BigBird, which are pre-trained on large-scale clinical corpora. We evaluate both pre-trained models on 10 baseline tasks, including named entity recognition, question answering, and document classification. The results demonstrate that Clinical-Longformer and Clinical-BigBird consistently and significantly outperform ClinicalBERT as well as other short-sequence transformers in all downstream tasks. We have made our source code available at [this https URL] and the pre-trained models available for public download at: [this https URL].
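
The quadratic-versus-linear memory claim in the abstract can be illustrated with a rough count of attention-score entries. The sketch below is not taken from the paper or its code release; it assumes a plain sliding-window pattern (each token attends only to neighbours within a fixed window) and ignores the global tokens used by Longformer and the random blocks used by BigBird. The function names and the window size of 512 are illustrative assumptions.

```python
def full_attention_entries(seq_len: int) -> int:
    """Number of attention scores when every token attends to every token."""
    return seq_len * seq_len


def sliding_window_entries(seq_len: int, window: int) -> int:
    """Number of attention scores when each token attends only to tokens
    within window // 2 positions on either side (itself included).

    Assumption: a simplified local-attention pattern, without global
    or random attention.
    """
    half = window // 2
    total = 0
    for i in range(seq_len):
        lo = max(0, i - half)            # clip the window at the left edge
        hi = min(seq_len - 1, i + half)  # clip the window at the right edge
        total += hi - lo + 1
    return total


if __name__ == "__main__":
    for n in (512, 4096, 8192):
        full = full_attention_entries(n)
        sparse = sliding_window_entries(n, window=512)
        print(f"seq_len={n:5d}  full={full:10d}  sliding_window={sparse:9d}")
```

Under these assumptions, doubling the sequence length quadruples the full-attention count, while the sliding-window count grows by a fixed amount per added token, which is the linear behaviour the abstract refers to.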

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2201.11838 [cs.CL]
	(or arXiv:2201.11838v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2201.11838

Submission history

From: Yikuan Li
[v1] Thu, 27 Jan 2022 22:51:58 UTC (20 KB)
[v2] Sat, 12 Feb 2022 19:14:02 UTC (20 KB)
[v3] Fri, 15 Apr 2022 05:46:23 UTC (20 KB)
