Currently submitted to: JMIR Medical Informatics

Date Submitted: Mar 27, 2025
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Validation of Large Language Models for Adverse Events Mapping

  • Jan Noel Molon

ABSTRACT

Background:

Post-market surveillance (PMS) is essential for medical device safety, requiring systematic mapping of adverse events from scientific literature to standardized terminologies like the International Medical Device Regulators Forum (IMDRF) Adverse Event (AE) Terminology. This process faces challenges in maintaining semantic interoperability across data sources.

Objective:

This study evaluates whether large language models (LLMs) can effectively automate the mapping of adverse events from orthopedic literature to IMDRF terminology.

Methods:

A validation approach assessed LLM performance using 309 randomly selected adverse events (23.6% of 1,251 unique events) from orthopedic literature published between 2010 and 2023. The events had previously been mapped by the Harms Mapping Working Group (HMWG), consisting of six Safety Clinicians and seven Safety Coders with extensive clinical and industry experience. Structured prompts were developed following established prompt engineering principles. Accuracy was conservatively measured as correct identification of both the appropriate IMDRF term and its code.
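The conservative accuracy criterion described above can be sketched in a few lines: a mapping counts as correct only when both the IMDRF term and the IMDRF code agree with the expert (HMWG) reference. The function names, field handling, and example codes below are illustrative assumptions, not taken from the study's materials.

```python
# Hedged sketch of the conservative scoring rule in Methods:
# a mapping is correct only if BOTH term and code match the reference.
# All identifiers and example codes here are hypothetical.

def is_correct(llm_term: str, llm_code: str,
               ref_term: str, ref_code: str) -> bool:
    """Conservative criterion: term AND code must both match
    (compared case-insensitively, ignoring surrounding whitespace)."""
    return (llm_term.strip().lower() == ref_term.strip().lower()
            and llm_code.strip().upper() == ref_code.strip().upper())

def accuracy(mappings) -> float:
    """Fraction of events where both term and code were correct.
    Each item is (llm_term, llm_code, ref_term, ref_code)."""
    correct = sum(is_correct(*m) for m in mappings)
    return correct / len(mappings)

# Hypothetical example: a code mismatch fails even when the term matches.
sample = [
    ("Infection", "A0101", "Infection", "A0101"),  # both match -> correct
    ("Infection", "A0102", "Infection", "A0101"),  # code differs -> wrong
]
print(accuracy(sample))  # 0.5
```

Under this rule, the reported result of 255 correct mappings out of 309 events corresponds to an accuracy of 255/309 ≈ 82.52%.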

Results:

LLMs achieved an accuracy of 82.52% (255/309 events correctly mapped). Error analysis revealed challenges with AEs lacking sufficient context, gaps in specialized clinical knowledge, and occasional inferential overreach. Concordance between the independent Safety Clinician evaluators was complete.

Conclusions:

While LLMs show promise as assistive tools for AE mapping, they require expert oversight. The findings support a two-stage workflow where LLMs provide initial mapping followed by clinician verification, potentially improving efficiency without compromising quality. Future research should explore enhanced prompt engineering, expanded dictionary integration, and more sophisticated models to address identified limitations.
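The two-stage workflow proposed above can be sketched as follows. This is a minimal illustration under stated assumptions: the LLM call is stubbed out, and all type and function names (`Mapping`, `llm_propose`, `clinician_review`) are hypothetical, not from the study.

```python
# Hedged sketch of the two-stage workflow from Conclusions:
# Stage 1 - an LLM proposes an initial IMDRF mapping;
# Stage 2 - a Safety Clinician verifies or corrects it.
# The LLM is stubbed; every identifier here is illustrative.

from dataclasses import dataclass

@dataclass
class Mapping:
    event_text: str
    imdrf_term: str
    imdrf_code: str
    verified: bool = False  # True only after clinician review

def llm_propose(event_text: str) -> Mapping:
    """Stage 1 (stub): in practice, a structured prompt would ask an LLM
    for a candidate IMDRF term and code for the adverse event."""
    return Mapping(event_text,
                   imdrf_term="<candidate term>",
                   imdrf_code="<candidate code>")

def clinician_review(proposal: Mapping,
                     approved_term: str,
                     approved_code: str) -> Mapping:
    """Stage 2: a Safety Clinician confirms or overrides the proposal;
    only the reviewed mapping is marked verified."""
    return Mapping(proposal.event_text, approved_term,
                   approved_code, verified=True)
```

For example, `clinician_review(llm_propose("deep surgical site infection"), "Infection", "A01")` would yield a verified mapping, while unreviewed proposals remain flagged as unverified, preserving expert oversight.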


Citation

Please cite as:

Molon JN

Validation of Large Language Models for Adverse Events Mapping

JMIR Preprints. 27/03/2025:74140

URL: https://preprints.jmir.org/preprint/74140
