Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 6 days ago • 50
Tool-Retrieval Collection • The first large-scale and diverse tool retrieval benchmark. See our homepage for more details: https://github.com/mangopy/tool-retrieval-benchmark • 8 items • Updated Jun 26, 2025 • 4
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published 19 days ago • 104
Many-Shot CoT-ICL: Making In-Context Learning Truly Learn Paper • 2605.13511 • Published 25 days ago • 32
δ-mem: Efficient Online Memory for Large Language Models Paper • 2605.12357 • Published 26 days ago • 125
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis Paper • 2603.20278 • Published Mar 17 • 99
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction Paper • 2605.05242 • Published May 3 • 122
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex Paper • 2605.06139 • Published about 1 month ago • 69
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13, 2025 • 193
HumanNet: Scaling Human-centric Video Learning to One Million Hours Paper • 2605.06747 • Published about 1 month ago • 52
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories Paper • 2605.04036 • Published May 5 • 69
MiA-Signature: Approximating Global Activation for Long-Context Understanding Paper • 2605.06416 • Published about 1 month ago • 56
Query-focused and Memory-aware Reranker for Long Context Processing Paper • 2602.12192 • Published Feb 12 • 58
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling Paper • 2512.23959 • Published Dec 30, 2025 • 111