PRL-Bench: A Comprehensive Benchmark Evaluating LLMs' Capabilities in Frontier Physics Research Paper • 2604.15411 • Published 5 days ago • 1
VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects Paper • 2604.16272 • Published 4 days ago
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems Paper • 2604.14228 • Published 7 days ago • 18
LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories Paper • 2604.15311 • Published 5 days ago • 10
DR^{3}-Eval: Towards Realistic and Reproducible Deep Research Evaluation Paper • 2604.14683 • Published 5 days ago • 31
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published 6 days ago • 99
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation Paper • 2604.15309 • Published 5 days ago • 6
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published 13 days ago • 112
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments Paper • 2604.14144 • Published 6 days ago • 62
UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding Paper • 2604.14113 • Published 6 days ago • 10
InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis Paper • 2604.13201 • Published 7 days ago • 2
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 6 days ago • 145