MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning Paper • 2602.10575 • Published 26 days ago • 4
II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models Paper • 2406.05862 • Published Jun 9, 2024 • 4
Post: The #1 trending AI/ML dataset today. Massive scale, diversity and end-to-end potential from nvidia! nvidia/PhysicalAI-Autonomous-Vehicles
Post: The new King has arrived! Moonshot AI is now the top model on Hugging Face: moonshotai/Kimi-K2-Thinking
Post: You don't need 100 GPUs to train something amazing! Our Smol Training Playbook teaches you a better path to world-class LLMs, for free! Check out the #1 trending space on Hugging Face: HuggingFaceTB/smol-training-playbook
RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies Paper • 2510.17950 • Published Oct 20, 2025 • 9
Post: Cool stuff these past weeks on huggingface!
• Trackio, local-first W&B alternative: https://github.com/gradio-app/trackio/issues
• EmbeddingGemma, 300M-param, multilingual embeddings, on-device: https://huggingface.co/blog/embeddinggemma
• Open LLMs in VS Code (Inference Providers): https://x.com/reach_vb/status/1966185427582497171
• Smol2Operator GUI agents: https://huggingface.co/blog/smol2operator
• Gradio visible watermarking: https://huggingface.co/blog/watermarking-with-gradio
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework Paper • 2505.17019 • Published May 22, 2025 • 4
LMDrive: Closed-Loop End-to-End Driving with Large Language Models Paper • 2312.07488 • Published Dec 12, 2023
Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models Paper • 2403.16999 • Published Mar 25, 2024 • 5
MoVA: Adapting Mixture of Vision Experts to Multimodal Context Paper • 2404.13046 • Published Apr 19, 2024 • 1
VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping Paper • 2412.11279 • Published Dec 15, 2024 • 13