Interleaving Reasoning for Better Text-to-Image Generation Paper • 2509.06945 • Published Sep 8, 2025
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping Paper • 2510.08457 • Published Oct 9, 2025
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks Paper • 2604.08539 • Published 15 days ago
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence Paper • 2512.10863 • Published Dec 11, 2025
MotionEdit: Benchmarking and Learning Motion-Centric Image Editing Paper • 2512.10284 • Published Dec 11, 2025
G^2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning Paper • 2511.21688 • Published Nov 26, 2025
Model Extrapolation Expedites Alignment Collection • Better aligned models obtained by model extrapolation (ExPO) • 23 items • Updated Mar 2