🌐 A Systematic Survey of Deep Research

We will continuously update this repo.

If you like our project, please give us a star ⭐ on GitHub for the latest update.

Overview

This repository contains a systematic collection of research papers on Deep Research (DR). We organize papers across key categories including four key components in DR, widely-used training paradigms in DR and relevant benchmark & resource.

For more details, please check our survey! Our survey collection presents a comprehensive and systematic overview of deep research systems, including a clear roadmap, foundational components, practical implementation techniques, important challenges, and future directions. As the field of deep research continues to evolve rapidly, we are committed to continuously updating this survey to reflect the latest progress in this area

📣 Latest News

[2025.11.25] 🎉🎉🎉 We release our survey Deep Research: A systematic Survey. Thanks to my awesome co-authors🤩. Feel free to contact me if you are interested in this topic and want to discuss me.

🎬 Table of Content

🌟 Overview
📊 Latest News
📚 Reading List
Acknowledgement
Contact
Citation

📚 Reading List

will be updated as soon as possible!

To get started with Deep Research, we recommend the representative and often seminal papers listed below. Reviewing this selection will provide a solid overview of the field.

Query Planning

Venue	Date	Paper Title	URL
ICLR 2023	21 May 2022	Least-to-Most Prompting Enables Complex Reasoning in Large Language Models	https://arxiv.org/abs/2205.10625
NeurIPS 2023	17 May 2023	Tree of Thoughts: Deliberate Problem Solving with Large Language Models	https://arxiv.org/abs/2305.10601
ACL 2024	21 Jun 2024	Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering	https://aclanthology.org/2024.acl-long.397/
Arxiv	20 Sep 2023	Chain-of-Verification Reduces Hallucination in Large Language Models	https://arxiv.org/abs/2309.11495
EMNLP 2023	23 May 2023	Query Rewriting for Retrieval-Augmented Large Language Models	https://aclanthology.org/2023.emnlp-main.322/
COLM 2025	28 Feb 2025	DeepRetrieval: Hacking Real Search Engines and Retrievers with LLMs via RL	https://arxiv.org/abs/2503.00223
Arxiv	11 Oct 2025	CardRewriter: Leveraging Knowledge Cards for Long-Tail Query Rewriting on Short-Video Platforms	https://arxiv.org/abs/2510.10095
NeurIPS 2025	25 Jan 2025	Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning	https://arxiv.org/abs/2501.15228
NAACL 2024	14 Nov 2023	LLatrieval: LLM-Verified Retrieval for Verifiable Generation	https://aclanthology.org/2024.naacl-long.305/
ACL 2024	—	DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of LLMs	https://aclanthology.org/2024.acl-long.702/
WWW 2025	18 Jul 2024	Retrieve, Summarize, Plan: Advancing Multi-hop QA with an Iterative Approach	https://dl.acm.org/doi/10.1145/3701716.3716889
Arxiv	10 Jun 2025	RAISE: Enhancing Scientific Reasoning in LLMs via Step-by-Step Retrieval	https://arxiv.org/abs/2506.08625
Arxiv	20 May 2025	s3: You Don’t Need That Much Data to Train a Search Agent via RL	https://arxiv.org/abs/2505.14146
Arxiv	28 Aug 2025	AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective RL	https://arxiv.org/abs/2508.20368
COLM 2025	12 Mar 2025	Search-r1: Training LLMs to Reason and Leverage Search Engines with RL	https://arxiv.org/abs/2503.09516
Arxiv	7 Mar 2025	R1-Searcher: Incentivizing the Search Capability in LLMs via RL	https://arxiv.org/abs/2503.05592
Arxiv	22 May 2025	R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via RL	https://arxiv.org/abs/2505.17005
NAACL 2025	17 Dec 2024	RAG-Star: Enhancing Deliberative Reasoning with Retrieval-Augmented Verification and Refinement	https://aclanthology.org/2025.naacl-long.361/
ACL 2025	21 Jan 2025	Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation	https://aclanthology.org/2025.acl-long.1401/
Arxiv	29 Jul 2025	DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router	https://arxiv.org/abs/2507.22050
Arxiv	3 Feb 2025	DeepRAG: Thinking to Retrieve Step by Step for Large Language Models	https://arxiv.org/abs/2502.01142
Arxiv	1 Aug 2025	MAO-ARAG: Multi-Agent Orchestration for Adaptive Retrieval-Augmented Generation	https://arxiv.org/abs/2508.01005

Information Acquisition

Knowledge Boundary

Venue	Date	Paper Title	URL
ICML 2017	14 Jun 2017	On Calibration of Modern Neural Networks	https://arxiv.org/abs/1706.04599
EMNLP 2020	17 May 2020	Calibration of Pre-trained Transformers	https://arxiv.org/pdf/2003.07892
TACL 2021	2 Dec 2020	How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering	https://arxiv.org/abs/2012.00955
Anthropic	11 Jul 2022	Language Models (Mostly) Know What They Know	https://arxiv.org/abs/2207.05221
ACL 2024	3 Jul 2023	Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models	https://arxiv.org/abs/2307.01379
ICLR 2023	19 June 2024	Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation	https://arxiv.org/abs/2302.09664
EMNLP 2023	15 Mar 2023	Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models	https://arxiv.org/abs/2303.08896
EMNLP 2023	3 Nov 2023	SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency	https://arxiv.org/abs/2311.01740
EMNLP 2023	26 Apr 2023	The Internal State of an LLM Knows When It's Lying	https://arxiv.org/pdf/2304.13734
ACL 2025	17 Feb 2025	Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception	https://www.arxiv.org/abs/2502.11677
TMLR 2022	28 May 2022	Teaching Models to Express Their Uncertainty in Words	https://arxiv.org/abs/2205.14334
EMNLP 2023	24 May 2023	Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback	https://arxiv.org/abs/2305.14975
ICLR 2024	22 Jun 2023	Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs	https://arxiv.org/abs/2306.13063
NAACL 2024	7 Jun 2024	R-Tuning: Instructing Large Language Models to Say ‘I Don’t Know’	https://arxiv.org/pdf/2311.09677
NeurIPS 2023	12 Dec 2023	Alignment for Honesty	https://arxiv.org/abs/2312.07000
Arxiv	20 Oct 2025	Annotation-Efficient Universal Honesty Alignment	https://arxiv.org/abs/2510.17509

Retrieval Timing

Venue	Date	Paper Title	URL
EMNLP 2023	11 May 2023	Active Retrieval Augmented Generation	https://arxiv.org/abs/2305.06983
ACL 2024	18 Feb 2024	When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation	https://aclanthology.org/2024.findings-acl.675/
ACL 2024	12 Mar 2024	DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models	https://arxiv.org/pdf/2403.10081
SIGIR-AP 2025	16 Feb 2024	Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models	https://arxiv.org/abs/2402.10612
ACL 2025	29 May 2024	CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control	https://arxiv.org/abs/2405.18727
EMNLP 2024	18 Jun 2024	Unified active retrieval for retrieval augmented generation	https://arxiv.org/abs/2406.12534
ICLR 2023	6 Oct 2022	ReAct: Synergizing Reasoning and Acting in Language Models	https://arxiv.org/abs/2210.03629
ACL 2023	20 Dec 2022	Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions	https://arxiv.org/abs/2212.10509
ICLR 2024	17 Oct 2023	Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection	https://arxiv.org/abs/2310.11511
EMNLP 2025	9 Jan 2025	Search-o1: Agentic Search-Enhanced Large Reasoning Models	https://arxiv.org/abs/2501.05366
COLM 2025	12 Mar 2025	Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning	https://arxiv.org/abs/2503.09516
EMNLP 2025	22 May 2025	Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty	https://arxiv.org/abs/2505.17281
EMNLP 2025	21 May 2025	StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization	https://arxiv.org/abs/2505.15107

Information Filtering

Venue	Date	Paper Title	URL
EMNLP 2024	19 Apr 2023	Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents	https://arxiv.org/abs/2304.09542
NAACL 2024 Findings	30 Jun 2023	Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting	https://arxiv.org/abs/2306.17563
ICLR 2024	13 Jul 2023	In-context Autoencoder for Context Compression in a Large Language Model	https://arxiv.org/abs/2307.06945
ICLR 2024	06 Oct 2023	RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation	https://arxiv.org/abs/2310.04408
ICLR 2024	17 Oct 2023	Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection	https://arxiv.org/abs/2310.11511
EMNLP 2024	14 Nov 2023	Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models	https://arxiv.org/abs/2311.09210
ACL 2024 Findings	19 Feb 2024	BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence	https://arxiv.org/abs/2402.12174
ACL 2024	24 Feb 2024	ListT5: Listwise Reranking with Fusion-in-Decoder Improves Zero-shot Retrieval	https://arxiv.org/abs/2402.15838
NeurIPS 2024	22 May 2024	xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token	https://arxiv.org/abs/2405.13792
ACL 2024	03 Jun 2024	An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation	https://arxiv.org/abs/2406.01549
ACL 2024	04 Jun 2024	Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs	https://arxiv.org/abs/2406.02376
WWW 2025	17 Jun 2024	TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy	https://arxiv.org/abs/2406.11678
ICLR 2025	19 Jun 2024	InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales	https://arxiv.org/abs/2406.13629
WWW 2025	26 Jun 2024	Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation	https://arxiv.org/abs/2406.18676
NeurIPS 2024	02 Jul 2024	RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs	https://arxiv.org/abs/2407.02485
WSDM 2025	12 Jul 2024	Context Embeddings for Efficient Answer Generation in RAG	https://arxiv.org/abs/2407.09252
NeurIPS 2024	07 Oct 2024	TableRAG: Million-Token Table Understanding with Language Model	https://arxiv.org/abs/2410.04739
WWW 2024	05 Nov 2024	HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems	https://arxiv.org/abs/2411.02959
ACL 2025	25 Feb 2025	RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts	https://arxiv.org/abs/2502.17888
EMNLP 2025	08 Mar 2025	Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning	https://arxiv.org/abs/2503.06034
EMNLP 2025 Findings	24 Jul 2025	Dynamic Context Compression for Efficient RAG	https://arxiv.org/abs/2507.22931v2

Memory Management

Venue	Date	Paper Title	URL
ACL 2025	01 Jul 2025	In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents	https://aclanthology.org/2025.acl-long.413/
arXiv preprint	13 Aug 2025	MemGuide: Intent-Driven Memory Selection for Goal-Oriented Multi-Session LLM Agents	https://arxiv.org/abs/2505.20231
arXiv preprint	10 Jul 2025	MIRIX: Multi-Agent Memory System for LLM-Based Agents	https://arxiv.org/abs/2507.07957
arXiv preprint	06 Jun 2025	PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time	https://arxiv.org/abs/2506.06254
arXiv preprint	29 Apr 2025	PaRT: Enhancing Proactive Social Chatbots with Personalized Real-Time Retrieval	https://arxiv.org/abs/2504.20624
ACL 2025	01 Jul 2025	Recursive Question Understanding for Complex Question Answering over Heterogeneous Personal Data	https://arxiv.org/abs/2505.11900
arXiv preprint	23 Jul 2025	H-MEM: Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents	https://arxiv.org/abs/2507.22925
arXiv preprint	28 Apr 2025	MemO: Building Production-Ready AI Agents with Scalable Long-Term Memory	https://arxiv.org/abs/2504.19413
ACL 2025	01 Jul 2025	Memory-augmented Query Reconstruction for LLM-based Knowledge Graph Reasoning	https://aclanthology.org/2025.findings-acl.1234/
arXiv preprint	27 Aug 2025	Nemori: Self-Organizing Agent Memory Inspired by Cognitive Science	https://arxiv.org/abs/2508.03341
arXiv preprint	20 Jan 2025	Zep: A Temporal Knowledge Graph Architecture for Agent Memory	https://arxiv.org/abs/2501.13956
NeurIPS 2025	08 Oct 2025	Mem: Agentic Memory for LLM Agents	https://arxiv.org/abs/2502.12110
arXiv preprint	15 Nov 2023	Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory	https://arxiv.org/abs/2311.08719
arXiv preprint	09 Oct 2025	Multiple Memory Systems for Enhancing the Long-term Memory of Agents	https://arxiv.org/abs/2508.15294
EMNLP 2025	01 Nov 2025	Coarse-to-Fine Grounded Memory for LLM Agent Planning	https://aclanthology.org/2025.emnlp-main.659/
AAAI 2026	12 Nov 2025	ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning	https://arxiv.org/abs/2508.10419
arXiv preprint	12 Feb 2024	MemGPT: Towards LLMs as Operating Systems	https://arxiv.org/abs/2310.08560
arXiv preprint	03 Jul 2025	MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent	https://arxiv.org/abs/2507.02259
arXiv preprint	17 Jul 2025	MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents	https://arxiv.org/abs/2506.15841
arXiv preprint	09 Oct 2025	Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory	https://arxiv.org/abs/2508.09736
arXiv preprint	25 Aug 2025	Memento: Fine-tuning LLM Agents without Fine-tuning LLMs	https://arxiv.org/abs/2508.16153
arXiv preprint	08 Oct 2025	Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via RL	https://arxiv.org/abs/2508.19828
arXiv preprint	15 Aug 2025	Learn to Memorize: Optimizing LLM-based Agents with Adaptive Memory Framework	https://arxiv.org/abs/2508.16629
arXiv preprint	23 Oct 2025	MLP Memory: A Retriever-Pretrained Memory for Large Language Models	https://arxiv.org/abs/2508.01832v3
arXiv preprint	23 Oct 2025	Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models	https://arxiv.org/abs/2508.09874

Answer Generation

Venue	Date	Paper Title	URL
TPAMI	13 Mar 2022	Towards Visual-Prompt Temporal Answer Grounding in Instructional Video	https://ieeexplore.ieee.org/document/10552074
ACL 2023	06 Mar 2023	LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics Using LLMs	https://aclanthology.org/2023.acl-demo.11/
NeurIPS 2023	19 May 2023	Any-to-Any Generation via Composable Diffusion	NeurIPS paper Link
TVCG	03 Nov 2023	ChartGPT: Leveraging LLMs to Generate Charts from Abstract Natural Language	https://ieeexplore.ieee.org/document/10443572
ICLR 2025	13 Aug 2024	LongWriter: Unleashing 10,000+ Word Generation from Long-Context LLMs	https://openreview.net/forum?id=kQ5s9Yh0WI
EMNLP 2025	07 Jan 2025	PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides	https://aclanthology.org/2025.emnlp-main.728/
NeurIPS 2025	27 May 2025	Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers	https://arxiv.org/abs/2505.21497
arXiv	04 Jun 2025	SuperWriter: Reflection-Driven Long-Form Generation with LLMs	https://arxiv.org/abs/2506.04180
EMNLP 2025	05 Jul 2025	PresentAgent: Multimodal Agent for Presentation Video Generation	https://aclanthology.org/2025.emnlp-demos.58/
arXiv	24 Aug 2025	PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs	https://arxiv.org/abs/2508.17188
arXiv	06 Oct 2025	Paper2Video: Automatic Video Generation from Scientific Papers	https://arxiv.org/abs/2510.05096

Training Paradigm

Supervised Fine-tuning

Most work below focuses on data synthesis, i.e., designing scalable approaches or frameworks to construct high-quality, large-scale training datasets to train LLM-based agents.

Venue	Date	Paper Title	URL
NeurIPS 2025	2025.05.28	WebDancer: Towards Autonomous Information Seeking Agency	https://arxiv.org/abs/2505.22648
arXiv preprint	2025.07.03	WebSailor: Navigating Super-human Reasoning for Web Agent	https://arxiv.org/abs/2507.02592
arXiv preprint	2025.07.20	WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization	https://arxiv.org/abs/2507.15061
NeurIPS 2025	2025.04.30	WebThinker: Empowering Large Reasoning Models with Deep Research Capability	https://arxiv.org/abs/2504.21776
arXiv preprint	2025.07.06	WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis	https://arxiv.org/abs/2507.04370
arXiv preprint	2025.05.26	MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability	https://arxiv.org/abs/2505.20285
arXiv preprint	2025.08.06	Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL	https://arxiv.org/abs/2508.13167
Findings of EMNLP 2025	2025.05.26	WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback	https://aclanthology.org/2025.findings-emnlp.276/
ACL 2025	2024.10.18	Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation	https://aclanthology.org/2025.acl-long.1136/
arXiv preprint	2024.06.28	Scaling Synthetic Data Creation with 1,000,000,000 Personas	https://arxiv.org/abs/2406.20094
NeurIPS 2025	2025.05.26	Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers	https://arxiv.org/abs/2505.20128
EMNLP 2025	2025.05.28	EvolveSearch: An Iterative Self-Evolving Search Agent	https://arxiv.org/abs/2505.22501
NeurIPS 2025	2025.05.06	Absolute Zero: Reinforced Self-Play Reasoning with Zero Data	https://arxiv.org/abs/2505.03335

Agentic End-to-End Reinforcement Learning

Venue	Date	Paper Title	URL
Arxiv	30 Sep 2025	Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs	https://arxiv.org/abs/2509.25779
Arxiv	21 May 2025	ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via RL	https://arxiv.org/abs/2505.15776
AAAI 2026	22 Aug 2025	OPERA: A RL-Enhanced Orchestrated Planner-Executor Architecture for Multi-Hop Retrieval	https://arxiv.org/abs/2508.16438
Arxiv	1 Aug 2025	MAO-ARAG: Multi-Agent Orchestration for Adaptive Retrieval-Augmented Generation	https://arxiv.org/abs/2508.01005
Arxiv	28 Aug 2025	AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective RL	https://arxiv.org/abs/2508.20368
Arxiv	7 Mar 2025	R1-Searcher: Incentivizing the Search Capability in LLMs via RL	https://arxiv.org/abs/2503.05592
Arxiv	22 May 2025	R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via RL	https://arxiv.org/abs/2505.17005
Arxiv	4 Apr 2025	DeepResearcher: Scaling Deep Research via RL in Real-world Environments	https://arxiv.org/abs/2504.03160
Arxiv	25 Jun 2025	MMSearch-R1: Incentivizing LMMs to Search	https://arxiv.org/abs/2506.20670
COLM 2025	12 Mar 2025	Search-r1: Training LLMs to Reason and Leverage Search Engines with RL	https://arxiv.org/abs/2503.09516
Arxiv	21 May 2025	An Empirical Study on RL for Reasoning-Search Interleaved LLM Agents	https://arxiv.org/abs/2505.15117
Arxiv	4 Jun 2025	R-Search: Empowering LLM Reasoning with Search via Multi-Reward RL	https://arxiv.org/abs/2506.04185
Arxiv	7 May 2025	ZeroSearch: Incentivize the Search Capability of LLMs without Searching	https://arxiv.org/abs/2505.04588
Arxiv	22 May 2025	O2-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended QA	https://arxiv.org/abs/2505.16582
Arxiv	11 Aug 2025	HierSearch: A Hierarchical Enterprise Deep Search Framework	https://arxiv.org/abs/2508.08088
Arxiv	29 Jul 2025	Graph-R1: Towards Agentic GraphRAG Framework via End-to-end RL	https://arxiv.org/abs/2507.21892
Arxiv	23 Jul 2025	DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward RL	https://arxiv.org/abs/2507.17365
Arxiv	28 May 2025	WebDancer: Towards Autonomous Information Seeking Agency	https://arxiv.org/abs/2505.22648
Arxiv	16 Sep 2025	WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data & RL	https://arxiv.org/abs/2509.13305
Arxiv	28 Jul 2025	Kimi k2: Open Agentic Intelligence	https://arxiv.org/abs/2507.20534
Arxiv	11 Aug 2025	Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Async RL	https://arxiv.org/abs/2508.07976
Arxiv	30 May 2025	Pangu DeepDiver: Adaptive Search Intensity Scaling via Open-Web RL	https://arxiv.org/html/2505.24332v1
Arxiv	6 Aug 2025	Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation & RL	https://arxiv.org/abs/2508.13167
Arxiv	22 May 2025	Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via RL	https://arxiv.org/abs/2505.16410
Arxiv	26 Jul 2025	Agentic Reinforced Policy Optimization	https://arxiv.org/abs/2507.19849
Arxiv	16 Oct 2025	Agentic Entropy-Balanced Policy Optimization	https://arxiv.org/abs/2510.14545

Datasets & Benchmarks

Venue	Date	Paper Title	Paper URL	Dataset/Code/Leaderboard URL
NAACL 2025	19 Sep 2024	Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation	Paper	Huggingface
arXiv	21 May 2025	InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation	Paper	Huggingface
EMNLP 2024	22 Jul 2024	AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?	Paper	Huggingface
NeurIPS 2023	09 Jun 2023	Mind2Web: Towards a Generalist Agent for the Web	Paper	Huggingface
NeurIPS 2025	26 Jun 2025	Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge	Paper	Huggingface
arXiv	06 May 2025	Deep Research Bench: Evaluating AI Web Research Agents	Paper	Website
arXiv	25 May 2025	DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research	Paper	Website
ICLR 2024	25 Jul 2023	WebArena: A Realistic Web Environment for Building Autonomous Agents	Paper	GitHub
arXiv	13 Jan 2025	WebWalker: Benchmarking LLMs in Web Traversal	Paper	Huggingface
arXiv	11 Aug 2025	WideSearch: Benchmarking Agentic Broad Info-Seeking	Paper	Huggingface
ACL 2025 Findings	15 Apr 2024	MMInA: Benchmarking Multihop Multimodal Internet Agents	Paper	Huggingface
NeurIPS 2024	10 Jun 2024	AutoSurvey: Large Language Models Can Automatically Write Surveys	Paper	GitHub
arXiv	14 Aug 2025	ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks	Paper	Huggingface
EMNLP 2025	25 Aug 2025	SurveyGen: Quality-Aware Scientific Survey Generation with Large Language Models	Paper	Huggingface
arXiv	07 Jul 2025	Deep Research Comparator: A Platform for Fine-Grained Human Annotations of Deep Research Agents	Paper	GitHub
arXiv	22 Jul 2025	ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry	Paper	GitHub
arXiv	29 Sep 2025	Towards Personalized Deep Research: Benchmarks and Evaluations	Paper	Huggingface
arXiv	06 Aug 2025	Characterizing Deep Research: A Benchmark and Formal Definition	Paper	GitHub
ACL 2024	26 Jan 2024	ProxyQA: An Alternative Framework for Evaluating Long-Form Text Generation with LLMs	Paper	GitHub
arXiv	21 Nov 2024	OpenScholar: Synthesizing Scientific Literature with Retrieval-Augmented Language Models	Paper	GitHub
arXiv	27 May 2025	Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers	Paper	Huggingface
arXiv	24 Aug 2025	PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs	Paper	GitHub
arXiv	21 May 2025	P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark	Paper	Github
AAAI 2022	28 Jan 2021	DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents	Paper	Website
CVPR 2025	01 Jan 2025	AutoPresent: Designing Structured Visuals from Scratch	Paper	GitHub
arXiv	07 Jan 2025	PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides	Paper	Huggingface
arXiv	16 May 2025	Talk to Your Slides: Language-Driven Agents for Efficient Slide Editing	Paper	GitHub
arXiv	19 Apr 2025	AI Idea Bench 2025: AI Research Idea Generation Benchmark	Paper	GitHub
arXiv	24 May 2025	AI-Researcher: Autonomous Scientific Innovation	Paper	GitHub
arXiv	02 Apr 2025	PaperBench: Evaluating AI’s Ability to Replicate AI Research	Paper	Huggingface
JAIR 2022	30 Jan 2021	Can We Automate Scientific Reviewing?	Paper	Website
ICLR 2025	11 Mar 2025	DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process	Paper	GitHub
ICLR 2024	10 Oct 2023	SWE-bench: Can Language Models Resolve Real-World GitHub Issues?	Paper	Huggingface
ACL 2024	10 Jul 2024	Can Language Models Serve as Text-Based World Simulators?	Paper	GitHub
EMNLP 2022	14 Mar 2022	ScienceWorld: Is your Agent Smarter than a 5th Grader?	Paper	GitHub
NeurIPS 2024	10 Jun 2024	DiscoveryWorld: A Virtual Environment for Scientific Discovery Agents	Paper	GitHub
arXiv	17 Sep 2024	CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark	Paper	GitHub
ICLR 2024	09 Oct 2024	MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering	Paper	GitHub
ICML 2025	22 Nov 2024	RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts	Paper	GitHub
ICLR 2025	11 Sep 2024	DSBench: How Far Are Data Science Agents from Becoming Data Science Experts?	Paper	GitHub
NeurIPS 2024	15 Jul 2024	Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?	Paper	Huggingface
ACL 2025	27 Feb 2024	Benchmarking Data Science Agents	Paper	GitHub
arXiv	16 Apr 2025	UnivEARTH: Towards LLM Agents for Earth Observation	Paper	Website
arXiv	02 Dec 2024	Commit0: Library Generation from Scratch	Paper	GitHub

❤️ Acknowledgement

This project benefits from deepresearch, Tongyi-DeepResearch, Search Agent, and Knowledge-Boundary Thanks for their wonderful works and collective efforts.

📞 Contact

Feel free to contact us if there are any problems: [email protected]; [email protected]

🥳 Citation

If you find this work useful, please cite:

@misc{shi2025deepresearch,
  title = {Deep Research: A Systematic Survey},
  author = {Shi, Zhengliang and Chen, Yiqun and Li, Haitao and Sun, Weiwei and Ni, Shiyu and Lyu, Yougang and Fan, Run-Ze and Jin, Bowen and Weng, Yixuan and Zhu, Minjun and Xie, Qiujie and Guo, Xinyu and Yang, Qu and Wu, Jiayi and Zhao, Jujia and Tang, Xiaqiang and Ma, Xinbei and Wang, Cunxiang and Mao, Jiaxin and Ai, Qingyao and Huang, Jen-Tse and Wang, Wenxuan and Zhang, Yue and Yang, Yiming and Tu, Zhaopeng and Ren, Zhaochun},
  year = {2025},
  howpublished = {\url{https://github.com/mangopy/Deep-Research-Survey}},
  note = {Accessed: 2025-11-22}
}

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
asset		asset
Deep-Research-Survey.pdf		Deep-Research-Survey.pdf
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🌐 A Systematic Survey of Deep Research

Overview

📣 Latest News

🎬 Table of Content

📚 Reading List

Query Planning

Information Acquisition

Knowledge Boundary

Retrieval Timing

Information Filtering

Memory Management

Answer Generation

Training Paradigm

Supervised Fine-tuning

Agentic End-to-End Reinforcement Learning

Datasets & Benchmarks

❤️ Acknowledgement

📞 Contact

🥳 Citation

About

Uh oh!

Releases

Packages

mangopy/Deep-Research-Survey

Folders and files

Latest commit

History

Repository files navigation

🌐 A Systematic Survey of Deep Research

Overview

📣 Latest News

🎬 Table of Content

📚 Reading List

Query Planning

Information Acquisition

Knowledge Boundary

Retrieval Timing

Information Filtering

Memory Management

Answer Generation

Training Paradigm

Supervised Fine-tuning

Agentic End-to-End Reinforcement Learning

Datasets & Benchmarks

❤️ Acknowledgement

📞 Contact

🥳 Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages