A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks Paper • 2605.28556 • Published 13 days ago • 64
OCC-RAG: Optimal Cognitive Core for Faithful Question Answering Paper • 2606.00683 • Published 10 days ago • 88
GrepSeek: Training Search Agents for Direct Corpus Interaction Paper • 2605.29307 • Published 12 days ago • 104
The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence Paper • 2605.26494 • Published 14 days ago • 39
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources Paper • 2605.29250 • Published 12 days ago • 77
The Curious Case of Analogies: Investigating Analogical Reasoning in Large Language Models Paper • 2511.20344 • Published Nov 25, 2025 • 14
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training Paper • 2509.25758 • Published Sep 30, 2025 • 25
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Paper • 2605.09063 • Published May 9 • 80
InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation? Paper • 2604.27419 • Published Apr 30 • 13
ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack Paper • 2509.25843 • Published Apr 14 • 19