OpenGVLab

community

https://github.com/opengvlab


AI & ML interests

Computer Vision

Recent Activity

qishisuren submitted a paper 3 days ago

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

yanziang authored a paper 7 days ago

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Eurayka authored a paper 7 days ago

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning


Papers

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

RIVER: A Real-Time Interaction Benchmark for Video LLMs


KingNish

posted an update 1 day ago


We trained an open-source, Mythos-like cybersecurity LLM for the Build Small Hackathon. Meet OpenMythos!

Trained in two stages: SFT on ~1.84K filtered arXiv cs.CR papers plus real CVE data, then RLVR on GitHub repos paired with their past vulnerabilities, with a verifier model checking outputs against ground truth.

Trained on: H100s from Modal

The RLVR stage made the biggest difference: responses became more precise and less prone to confusing similar vulnerability classes.
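RLVR depends on a reward that can be checked automatically. As a minimal sketch, assuming each training example carries a ground-truth CWE identifier, a rule-based check like the one below could serve as that reward. Note that it swaps the verifier model described in the post for simple string matching, and `extract_cwe` and `verifiable_reward` are hypothetical names, not part of OpenMythos.

```python
import re
from typing import Optional

# Assumption: ground-truth labels are CWE identifiers such as "CWE-79" (XSS) or "CWE-89" (SQL injection).
CWE_PATTERN = re.compile(r"CWE-(\d+)", re.IGNORECASE)


def extract_cwe(text: str) -> Optional[str]:
    """Return the first CWE identifier in `text`, normalized to 'CWE-<n>', or None if absent."""
    match = CWE_PATTERN.search(text)
    return f"CWE-{int(match.group(1))}" if match else None


def verifiable_reward(response: str, ground_truth: str) -> float:
    """Binary reward: 1.0 when the predicted vulnerability class matches ground truth, else 0.0."""
    predicted = extract_cwe(response)
    expected = extract_cwe(ground_truth)
    return 1.0 if predicted is not None and predicted == expected else 0.0


if __name__ == "__main__":
    # Any mismatch, including a confusion between related classes, earns zero reward.
    print(verifiable_reward("Stored XSS in the comment form (CWE-79).", "CWE-79"))  # 1.0
    print(verifiable_reward("Looks like SQL injection, cwe-89.", "CWE-79"))  # 0.0
```

Because every mismatch scores zero, a response that names a related but wrong class gets no credit, which is the kind of pressure that could steer a model away from confusing similar vulnerability classes.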

Everything is open:
🤖 Demo → build-small-hackathon/OpenMythos
🧠 Model → build-small-hackathon/OpenMythos
📦 CVE Dataset → build-small-hackathon/CVE_Vulnerailities_Detailed
📄 ArXiv Dataset → himanshu17HF/ArvixImport-Filtered-Final

Try it out and let us know where it breaks 🙏

Abhaykoul

posted an update 2 days ago


Shipped v0.1.2 of vtx — a minimalist coding agent for the terminal.

Most agentic CLIs ship 10k+ token system prompts. Vtx is ~2,200. Less prompt overhead means more room for your code in the model's context window.
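To put the prompt-size difference in concrete terms, here is a quick calculation assuming a hypothetical 128K-token context window (actual windows vary by model and provider); `remaining_tokens` is an illustrative helper, not part of vtx.

```python
# Assumption: a 128K-token context window; real limits vary by model and provider.
CONTEXT_WINDOW = 128_000


def remaining_tokens(system_prompt_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for code, conversation history, and tool output after the system prompt."""
    return window - system_prompt_tokens


for label, prompt_tokens in [("10k-token system prompt", 10_000), ("vtx (~2,200 tokens)", 2_200)]:
    share = prompt_tokens / CONTEXT_WINDOW * 100
    print(f"{label}: {remaining_tokens(prompt_tokens):,} tokens free, {share:.1f}% spent on the prompt")
```

On that assumed window, the smaller prompt leaves 7,800 more tokens, about 6% of the context, for code and tool output.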

Vtx is a from-scratch Python implementation of the design philosophy behind pi-mono — same principles, pure Python, no transpiled runtime.

What ships out of the box:

→ Textual TUI + headless CLI (vtx -p "fix the failing test")
→ 49 LLM provider gateways, all declared in a single provider.yaml
→ 5 core tools (read / edit / write / bash / find) plus web search and fetch
→ Session tree with compaction, handoff, and resume
→ AGENTS.md / CLAUDE.md auto-discovery
→ Skills system — drop SKILL.md files in .agents/skills/ and they become slash commands
→ Two OAuth flows (GitHub Copilot device flow, OpenAI Codex PKCE)
→ Two-mode permissions: prompt (default) or auto, with a safe-command allowlist
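The post does not spell out how the allowlist interacts with the two modes. The sketch below assumes one plausible reading: in prompt mode, commands whose executable is on the allowlist run without confirmation, and auto mode never asks. `SAFE_COMMANDS`, `SHELL_OPERATORS`, and `needs_approval` are hypothetical names, not vtx's code.

```python
import shlex

# Hypothetical allowlist of read-only commands; vtx's actual list and rules may differ.
SAFE_COMMANDS = frozenset({"ls", "pwd", "cat", "head", "tail", "wc", "grep"})
# Any shell operator could chain or redirect into an unlisted command, so it forces a prompt.
SHELL_OPERATORS = (";", "&", "|", ">", "<", "`", "$(", "\n")


def needs_approval(command: str, mode: str = "prompt") -> bool:
    """Return True if the user must confirm `command` before it runs."""
    if mode == "auto":
        return False  # assumption: auto mode runs everything without asking
    if any(op in command for op in SHELL_OPERATORS):
        return True
    try:
        argv = shlex.split(command)
    except ValueError:
        return True  # e.g. unbalanced quotes: refuse to guess
    # Empty commands and anything not on the allowlist require confirmation.
    return not argv or argv[0] not in SAFE_COMMANDS
```

Rejecting any command that contains a shell operator is deliberately conservative: `cat notes.txt | sh` starts with an allowlisted program but pipes into one that is not.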

This release adds a proper extension system. Register new LLM-callable tools, intercept tool calls, hook lifecycle events, and add slash commands from a single register(api) function in a Python file under ~/.vtx/agent/extensions/. Extensions can override built-in tools by name and chain handler logic across subscribers.
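The extension model described above (register tools, override built-ins by name, chain tool-call handlers across subscribers) can be illustrated with a small in-process mock. Everything here is hypothetical: `ExtensionAPI`, `register_tool`, `on_tool_call`, and `call_tool` are invented names meant to show the pattern, not vtx's actual extension API, so check the vtx repository for the real interface.

```python
from typing import Callable, Dict, List

ToolFn = Callable[[dict], str]
ToolCallHandler = Callable[[str, dict], dict]


class ExtensionAPI:
    """Hypothetical stand-in for the `api` object passed to register(api); not vtx's real interface."""

    def __init__(self) -> None:
        self.tools: Dict[str, ToolFn] = {}
        self.tool_call_handlers: List[ToolCallHandler] = []

    def register_tool(self, name: str, fn: ToolFn) -> None:
        # Registering under an existing name overrides that tool.
        self.tools[name] = fn

    def on_tool_call(self, handler: ToolCallHandler) -> None:
        # Subscribers run in registration order.
        self.tool_call_handlers.append(handler)

    def call_tool(self, name: str, args: dict) -> str:
        # Chain: each handler receives the arguments returned by the previous one.
        for handler in self.tool_call_handlers:
            args = handler(name, args)
        return self.tools[name](args)


def register(api: ExtensionAPI) -> None:
    """Example extension: override `read` and normalize `path` arguments in two chained steps."""

    def read_tool(args: dict) -> str:
        return f"read {args['path']}"

    def strip_whitespace(name: str, args: dict) -> dict:
        return {**args, "path": args["path"].strip()} if "path" in args else args

    def drop_dot_slash(name: str, args: dict) -> dict:
        path = args.get("path")
        return {**args, "path": path[2:]} if path and path.startswith("./") else args

    api.register_tool("read", read_tool)
    api.on_tool_call(strip_whitespace)
    api.on_tool_call(drop_dot_slash)


if __name__ == "__main__":
    api = ExtensionAPI()
    api.register_tool("read", lambda args: "built-in read")  # simulated built-in tool
    register(api)
    print(api.call_tool("read", {"path": "  ./src/main.py "}))  # read src/main.py
```

Passing each handler's output to the next subscriber is what lets several extensions compose without knowing about one another.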

Apache 2.0. Run `uv tool install vtx-coding-agent` and you're up and running.

GitHub: https://github.com/OEvortex/vtx-coding-agent
PyPI: https://pypi.org/project/vtx-coding-agent

Built in the open. Feedback, extensions, and PRs welcome.

prithivMLmods

posted an update 2 days ago


Wan2.2-I2V-Fast with highly upscaled sequential frame sampling is now available as a Spaces demo, built using Wan2.2-I2V and FLUX.2-Klein. Try the demo using the links below.👇

➠ wan2.2-i2v-fast : prithivMLmods/wan2.2-i2v-fast
➠ github: https://github.com/prithivsakthiur/wan2.2-i2v-fast
➠ collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

⤷ To learn more, visit the app page or the respective model pages.

qishisuren

submitted a paper to Daily Papers 3 days ago

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

Paper • 2605.30789 • Published 16 days ago • 24

yanziang

authored a paper 7 days ago

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Paper • 2606.12195 • Published 8 days ago • 22

Eurayka

authored a paper 7 days ago

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Paper • 2606.12195 • Published 8 days ago • 22

linghan199

authored a paper 7 days ago

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Paper • 2606.12195 • Published 8 days ago • 22

wzk1015

in OpenGVLab/Mono-InternVL-2B 11 days ago

Fix remaining Transformers v5 crash: guard llm_config and to_dict() for None (follow-up to `e980c02`)

#13 opened 11 days ago by KBayoud

Fix KeyError in __init__ when vision_config is empty (Transformers v5 compatibility)

#12 opened 11 days ago by KBayoud
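The two pull requests above both address missing sub-configs under Transformers v5. The sketch below is a hypothetical, self-contained illustration of that defensive pattern: `ChatConfig`, `SubConfig`, and the `hidden_size` default are invented for the example, only the field names `vision_config` and `llm_config` come from the PR titles, and none of it is Mono-InternVL-2B's actual configuration code.

```python
import copy
from typing import Any, Optional


class SubConfig:
    """Hypothetical minimal sub-config standing in for a vision or language-model config."""

    def __init__(self, **kwargs: Any) -> None:
        self.__dict__.update(kwargs)

    def to_dict(self) -> dict:
        return copy.deepcopy(self.__dict__)


class ChatConfig:
    """Hypothetical composite config; only the field names come from the PR titles."""

    def __init__(self, vision_config: Optional[dict] = None, llm_config: Optional[dict] = None) -> None:
        # An empty or missing vision_config falls back to defaults instead of raising KeyError.
        vision_config = vision_config or {}
        self.vision_config = SubConfig(hidden_size=vision_config.get("hidden_size", 1024))
        # A missing or empty llm_config stays None rather than being dereferenced.
        self.llm_config = SubConfig(**llm_config) if llm_config else None

    def to_dict(self) -> dict:
        return {
            "vision_config": self.vision_config.to_dict(),
            # Guard: call to_dict() only on sub-configs that exist.
            "llm_config": self.llm_config.to_dict() if self.llm_config is not None else None,
        }
```

With these guards, `ChatConfig(vision_config={}).to_dict()` returns `{"vision_config": {"hidden_size": 1024}, "llm_config": None}` instead of raising.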

Eurayka

authored a paper 12 days ago

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

Paper • 2606.05769 • Published 14 days ago • 6

Eurayka

submitted a paper to Daily Papers 13 days ago

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

Paper • 2606.05769 • Published 14 days ago • 6

prithivMLmods

posted an update 18 days ago


Dropping the collection of Qwen 3.5/3.6 MTP GGUF quants. 🤗

🔗 Collection 1: https://huggingface.co/collections/prithivMLmods/mtp-qwen-35-36-moe-stable

🔗 Collection 2: https://huggingface.co/collections/prithivMLmods/mtp-qwen-35-36-stable

> To learn more, visit the respective model pages.

heroding77

submitted a paper to Daily Papers 20 days ago

Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning

Paper • 2605.28424 • Published 22 days ago • 32

prithivMLmods

posted an update 20 days ago


PiD (Pixel Diffusion Decoder) Image Edit Upscale and Image Generation Upscale, an all-in-one demo, is now live on Spaces! FLUX.2-Klein powers more realistic image generation and editing, image generation is also paired with Z-Image, and upscaling is enabled by default!

🤗 Space: prithivMLmods/PiD-Image-Upscaler
🔗 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

🤗 > To learn more, visit the app page or the respective model pages.

Kaining

authored a paper 22 days ago

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

Paper • 2605.25874 • Published 24 days ago • 102

Kaining

submitted a paper to Daily Papers 23 days ago

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

Paper • 2605.25874 • Published 24 days ago • 102

prithivMLmods

posted an update 27 days ago


I've made 8 Spaces in the Qwen-Image-Edit series, and 5 of them reached “Space of the Week”! A few are still topping the list many months later.

Cumulatively, the series has passed 8.2 million ZeroGPU runs and nearly 4 million visitors.

Thanks for all the community support! 🤗❤️

🔗 Spaces: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection


prithivMLmods

posted an update about 2 months ago


Multimodal-Edge Demo, a node-based inference canvas, is now live on Spaces. It runs fast Transformers inference across 10+ edge-device multimodal models from the Hub, all within a single Space. The lineup includes models from the Qwen3.5, Qwen3-VL, Gemma 4, and LFM 2.5 VL series, with support for reasoning and grounding tasks.

🤗 Demo: prithivMLmods/Multimodal-Edge-Node
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/Multimodal-Edge-Node
✅ Multimodal Apps Collections: https://huggingface.co/collections/prithivMLmods/hall-of-multimodal-apps

🤗 > To learn more, visit the app page or the respective model pages.

heroding77

authored 2 papers about 2 months ago

OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

Paper • 2604.15093 • Published Apr 16 • 30

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published Mar 26 • 133


Team members 118
