Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings Paper • 2605.22391 • Published 12 days ago • 37
Tarsier: Recipes for Training and Evaluating Large Video Description Models Paper • 2407.00634 • Published Jun 30, 2024 • 2
Fine-grained Video-Text Retrieval: A New Benchmark and Method Paper • 2501.00513 • Published Dec 31, 2024 • 2
TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs Paper • 2512.14698 • Published Dec 16, 2025 • 25
AVA-AVD: Audio-Visual Speaker Diarization in the Wild Paper • 2111.14448 • Published Nov 29, 2021 • 1
PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing Paper • 2601.21957 • Published Jan 29 • 23
Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation Paper • 2411.02293 • Published Nov 4, 2024 • 3
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation Paper • 2501.12202 • Published Jan 21, 2025 • 51
Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material Paper • 2506.15442 • Published Jun 18, 2025 • 18
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated Mar 2 • 82
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning Paper • 2306.03310 • Published Jun 5, 2023 • 3
Article Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action nvidia • 1 day ago • 53
Data Science and Technology Towards AGI Part I: Tiered Data Management Paper • 2602.09003 • Published Feb 9 • 8