LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Paper • 2605.08083 • Published 17 days ago • 66
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios Paper • 2604.07413 • Published Apr 8 • 95
ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget Paper • 2604.01195 • Published Apr 1 • 4
Watch Before You Answer: Learning from Visually Grounded Post-Training Paper • 2604.05117 • Published Apr 6 • 36
SWE-Next: Scalable Real-World Software Engineering Tasks for Agents Paper • 2603.20691 • Published Mar 21 • 10
Context Forcing: Consistent Autoregressive Video Generation with Long Context Paper • 2602.06028 • Published Feb 5 • 36