RF-DETR: Neural Architecture Search for Real-Time Detection Transformers

Robinson, Isaac; Robicheaux, Peter; Popov, Matvei; Ramanan, Deva; Peri, Neehar

Computer Science > Computer Vision and Pattern Recognition

arXiv:2511.09554 (cs)

[Submitted on 12 Nov 2025]

Abstract: Open-vocabulary detectors achieve impressive performance on COCO, but often fail to generalize to real-world datasets with out-of-distribution classes not typically found in their pre-training. Rather than simply fine-tuning a heavy-weight vision-language model (VLM) for new domains, we introduce RF-DETR, a light-weight specialist detection transformer that discovers accuracy-latency Pareto curves for any target dataset with weight-sharing neural architecture search (NAS). Our approach fine-tunes a pre-trained base network on a target dataset and evaluates thousands of network configurations with different accuracy-latency tradeoffs without re-training. Further, we revisit the "tunable knobs" for NAS to improve the transferability of DETRs to diverse target domains. Notably, RF-DETR significantly improves on prior state-of-the-art real-time methods on COCO and Roboflow100-VL. RF-DETR (nano) achieves 48.0 AP on COCO, beating D-FINE (nano) by 5.3 AP at similar latency, and RF-DETR (2x-large) outperforms GroundingDINO (tiny) by 1.2 AP on Roboflow100-VL while running 20x as fast. To the best of our knowledge, RF-DETR (2x-large) is the first real-time detector to surpass 60 AP on COCO. Our code is at this https URL
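
The abstract describes evaluating many network configurations and reading off an accuracy-latency Pareto curve. As a rough illustration of that last step only, the sketch below extracts the non-dominated points from a list of already-measured (latency, AP) pairs. It is not the authors' implementation: the `Candidate` type, the `pareto_front` function, the configuration names, and every number in the example are hypothetical placeholders, not results from the paper.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One evaluated configuration (hypothetical structure, for illustration)."""
    name: str
    latency_ms: float  # measured inference latency, lower is better
    ap: float          # measured average precision, higher is better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate beats on both axes.

    Sweep in order of increasing latency and keep a candidate only if its AP
    is strictly higher than every AP seen so far at lower or equal latency.
    """
    front: List[Candidate] = []
    best_ap = float("-inf")
    # Ties in latency are ordered by descending AP so the best one is seen first.
    for c in sorted(candidates, key=lambda c: (c.latency_ms, -c.ap)):
        if c.ap > best_ap:
            front.append(c)
            best_ap = c.ap
    return front


# Placeholder measurements (made up for this example, not from the paper).
candidates = [
    Candidate("cfg_a", 2.0, 40.0),
    Candidate("cfg_b", 3.0, 39.0),  # slower and less accurate than cfg_a
    Candidate("cfg_c", 4.0, 45.0),
    Candidate("cfg_d", 4.0, 44.0),  # same latency as cfg_c, lower AP
    Candidate("cfg_e", 6.0, 50.0),
]

front = pareto_front(candidates)
for c in front:
    print(f"{c.name}: {c.latency_ms:.1f} ms, {c.ap:.1f} AP")
```

With these placeholder values the front consists of `cfg_a`, `cfg_c`, and `cfg_e`. The sort-and-sweep approach costs O(n log n), which stays cheap even for thousands of evaluated configurations.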

Comments:	Project Page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2511.09554 [cs.CV]
	(or arXiv:2511.09554v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2511.09554

Submission history

From: Neehar Peri
[v1] Wed, 12 Nov 2025 18:58:39 UTC (11,009 KB)
