Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 8 days ago • 50
TVIR: Building Deep Research Agents Towards Text–Visual Interleaved Report Generation Paper • 2606.02320 • Published 8 days ago • 13
MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills? Paper • 2606.01993 • Published 8 days ago • 13
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps Paper • 2605.16928 • Published 24 days ago • 93
OProver: A Unified Framework for Agentic Formal Theorem Proving Paper • 2605.17283 • Published 23 days ago • 31
Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution Paper • 2605.15301 • Published 26 days ago • 22
DR³-Eval: Towards Realistic and Reproducible Deep Research Evaluation Paper • 2604.14683 • Published Apr 16 • 36
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models Paper • 2604.18224 • Published Apr 20 • 22
MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues Paper • 2510.17722 • Published Oct 20, 2025 • 20
IF-VidCap: Can Video Caption Models Follow Instructions? Paper • 2510.18726 • Published Oct 21, 2025 • 27
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published Mar 11, 2025 • 73