PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning Paper • 2601.05593 • Published 14 days ago • 79
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 14 days ago • 203
LLaDA2.0: Scaling Up Diffusion Language Models to 100B Paper • 2512.15745 • Published Dec 10, 2025 • 80