FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents Paper • 2602.01566 • Published 1 day ago • 38
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration Paper • 2601.22674 • Published 4 days ago • 3
Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry Paper • 2601.22588 • Published 4 days ago • 3
Toward Cognitive Supersensing in Multimodal Large Language Model Paper • 2602.01541 • Published 1 day ago • 14
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss Paper • 2602.02493 • Published about 22 hours ago • 27
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 5 days ago • 123
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots Paper • 2602.00919 • Published 3 days ago • 159
Drive-JEPA: Video JEPA Meets Multimodal Trajectory Distillation for End-to-End Driving Paper • 2601.22032 • Published 5 days ago • 3
DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding Paper • 2601.23161 • Published 4 days ago • 10
NativeTok: Native Visual Tokenization for Improved Image Generation Paper • 2601.22837 • Published 4 days ago • 9
DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning Paper • 2601.21716 • Published 5 days ago • 12
DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment Paper • 2601.20218 • Published 7 days ago • 14
FourierSampler: Unlocking Non-Autoregressive Potential in Diffusion Language Models via Frequency-Guided Generation Paper • 2601.23182 • Published 4 days ago • 18
ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas Paper • 2601.21558 • Published 5 days ago • 53
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation Paper • 2601.21420 • Published 5 days ago • 40