MedPAIR stands for "Medical Dataset Comparing Physician Trainees and AI Relevance Estimation and Question Answering". We designed MedPAIR to compare the reasoning processes of LLMs with those of physician trainees, and to help future research focus on relevant features. MedPAIR is the first benchmark that matches the relevance labels annotated by clinical professionals against the relevance estimated by LLMs. Its goal is to ensure that what an LLM finds relevant in a clinical case closely matches what a physician trainee finds relevant.
This work was supported in part by an award from the Hasso Plattner Foundation, a National Science Foundation (NSF) CAREER Award (#2339381), and an AI2050 Early Career Fellowship (G-25-68042).
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
