Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions
Paper • 2602.13013 • Published • 4
Video Understanding, Audio-Visual Learning, Multimodal LLMs, Video Captioning, Instruction Tuning, Dataset Curation