ASID-Caption

community

AI & ML interests

Video Understanding, Audio-Visual Learning, Multimodal LLMs, Video Captioning, Instruction Tuning, Dataset Curation

Recent Activity

lyhisme submitted a paper about 6 hours ago

Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions

lyhisme updated a dataset about 6 hours ago

AudioVisual-Caption/ASID-1M

lyhisme updated a model about 6 hours ago

AudioVisual-Caption/ASID-Captioner-7B

Papers

Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions

lyhisme

submitted a paper to Daily Papers about 6 hours ago

Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions

Paper • 2602.13013 • Published 3 days ago • 4

lyhisme

updated a dataset about 6 hours ago

AudioVisual-Caption/ASID-1M

Viewer • Updated about 6 hours ago • 241k • 67 • 3

lyhisme

updated 2 models about 6 hours ago

AudioVisual-Caption/ASID-Captioner-7B

Image-Text-to-Text • 9B • Updated about 6 hours ago • 13 • 1

AudioVisual-Caption/ASID-Captioner-3B

Image-Text-to-Text • 5B • Updated about 6 hours ago • 17 • 1

lyhisme

published 2 models 2 days ago

AudioVisual-Caption/ASID-Captioner-3B

Image-Text-to-Text • 5B • Updated about 6 hours ago • 17 • 1

AudioVisual-Caption/ASID-Captioner-7B

Image-Text-to-Text • 9B • Updated about 6 hours ago • 13 • 1

lyhisme

updated a Space 5 days ago

ASID-Caption

🦉

lyhisme

published a Space 5 days ago

ASID-Caption

🦉

lyhisme

published a dataset 6 days ago

AudioVisual-Caption/ASID-1M

Viewer • Updated about 6 hours ago • 241k • 67 • 3

lyhisme

authored 5 papers 5 months ago

TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs

Paper • 2509.18056 • Published Sep 22, 2025 • 27

Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

Paper • 2406.00670 • Published Jun 2, 2024

Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction

Paper • 2412.06244 • Published Dec 9, 2024

A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models

Paper • 2508.01548 • Published Aug 3, 2025 • 14

Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment

Paper • 2508.08811 • Published Aug 12, 2025 • 2

Team members 1
