SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models Paper • 2510.16917 • Published Oct 19, 2025 • 20
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations Paper • 2510.16893 • Published Oct 19, 2025 • 18
Extending Automatic Machine Translation Evaluation to Book-Length Documents Paper • 2509.17249 • Published Sep 21, 2025
Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale Paper • 2511.05705 • Published Nov 7, 2025 • 10
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning Paper • 2510.12000 • Published Oct 13, 2025 • 1
ESPnet-SpeechLM: An Open Speech Language Model Toolkit Paper • 2502.15218 • Published Feb 21, 2025
PRiSM: Benchmarking Phone Realization in Speech Models Paper • 2601.14046 • Published Jan 20, 2026 • 7
TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models Paper • 2509.24803 • Published Sep 29, 2025
An Investigation of Incorporating Mamba for Speech Enhancement Paper • 2405.06573 • Published May 10, 2024
How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation Paper • 2603.19195 • Published 29 days ago • 4
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music Paper • 2604.10905 • Published 4 days ago • 26
VIOLA: Towards Video In-Context Learning with Minimal Annotations Paper • 2601.15549 • Published Jan 22, 2026 • 4
Masking Teacher and Reinforcing Student for Distilling Vision-Language Models Paper • 2512.22238 • Published Dec 23, 2025 • 30
Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in Paper • 2512.14273 • Published Dec 16, 2025 • 10