AI & ML interests

Multimodal AI, Document Understanding, Reading Systems.

Organization Card

Vision, Language, and Reading Group

Based at the Computer Vision Center (CVC) in Barcelona, Spain.

The VLR research team conducts fundamental research and technology transfer at the intersection of vision, language, and reading systems. We devise reading systems for text in the wild and incorporate scene text semantics into a range of computer vision tasks, such as captioning, visual question answering, cross-modal retrieval, and fine-grained classification. In parallel, we advance document understanding, with a special interest in end-to-end approaches to Document Visual Question Answering.