OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond Paper • 2605.19660 • Published 5 days ago • 39 • 3
$δ$-mem: Efficient Online Memory for Large Language Models Paper • 2605.12357 • Published 12 days ago • 120 • 5
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 12 days ago • 189 • 4
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published 11 days ago • 156 • 4
Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion Paper • 2605.12825 • Published 12 days ago • 12 • 2
Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs Paper • 2605.12460 • Published 12 days ago • 17 • 2
PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks Paper • 2605.10977 • Published 15 days ago • 10 • 2
LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models Paper • 2605.11011 • Published 14 days ago • 9 • 2
Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States Paper • 2605.07579 • Published 16 days ago • 16 • 3
SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting Paper • 2605.07243 • Published 16 days ago • 4 • 3
Large Language Models Explore by Latent Distilling Paper • 2604.24927 • Published 27 days ago • 74 • 7
SWE-chat: Coding Agent Interactions From Real Users in the Wild Paper • 2604.20779 • Published Apr 22 • 15 • 5
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation Paper • 2604.18486 • Published Apr 20 • 94 • 4