arxiv:2603.05181

Mario: Multimodal Graph Reasoning with Large Language Models

Published on Mar 5

Submitted by Yuanfu Sun on Mar 9

New York University


Authors:

Yuanfu Sun

Abstract

Mario is a unified framework that enables large language model-based reasoning on multimodal graphs by addressing cross-modal consistency and heterogeneous modality preferences through graph-conditioned vision-language modeling and modality-adaptive instruction tuning.

AI-generated summary

Recent advances in large language models (LLMs) have opened new avenues for multimodal reasoning. Yet most existing methods still rely on pretrained vision-language models (VLMs) to encode image-text pairs in isolation, ignoring the relational structure that real-world multimodal data naturally form. This motivates reasoning on multimodal graphs (MMGs), where each node has textual and visual attributes and edges provide structural cues. Enabling LLM-based reasoning on such heterogeneous multimodal signals while preserving graph topology introduces two key challenges: resolving weak cross-modal consistency and handling heterogeneous modality preference. To address these challenges, we propose Mario, a unified framework that resolves both simultaneously and enables effective LLM-based reasoning over MMGs. Mario consists of two innovative stages. The first is a graph-conditioned VLM design that jointly refines textual and visual features through fine-grained cross-modal contrastive learning guided by graph topology. The second is a modality-adaptive graph instruction tuning mechanism that organizes aligned multimodal features into graph-aware instruction views and employs a learnable router to surface, for each node and its neighborhood, the most informative modality configuration to the LLM. Extensive experiments across diverse MMG benchmarks demonstrate that Mario consistently outperforms state-of-the-art graph models in both supervised and zero-shot scenarios for node classification and link prediction. The code will be made available at https://github.com/sunyuanfu/Mario.
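
To make the two stages named in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation, which had not been released when this page was captured. It assumes an InfoNCE-style text-to-image loss in which a node's positives are its own image plus the images of its graph neighbors, and a router that is a plain softmax over per-view scores. The names `topology_contrastive_loss`, `route_modality`, the `temperature` value, and the view labels are all hypothetical.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def topology_contrastive_loss(text_emb, image_emb, adjacency, temperature=0.1):
    """Assumed InfoNCE-style loss: for node i, the positives are its own image
    and the images of its graph neighbors; every other image is a negative."""
    n = len(text_emb)
    text = [_normalize(v) for v in text_emb]
    image = [_normalize(v) for v in image_emb]
    total = 0.0
    for i in range(n):
        logits = [_dot(text[i], image[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(z - peak) for z in logits]
        positives = {i} | set(adjacency.get(i, ()))
        positive_mass = sum(weights[j] for j in positives)
        total += -math.log(positive_mass / sum(weights))
    return total / n


def route_modality(view_scores):
    """Hypothetical router: softmax over per-view scores for one node.
    In a trained system the scores would come from a learned module."""
    peak = max(view_scores.values())
    weights = {k: math.exp(v - peak) for k, v in view_scores.items()}
    norm = sum(weights.values())
    probs = {k: w / norm for k, w in weights.items()}
    return max(probs, key=probs.get), probs


# Toy usage: two nodes with 2-d embeddings and no edges.
text = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = topology_contrastive_loss(text, aligned, adjacency={})
loss_swapped = topology_contrastive_loss(text, swapped, adjacency={})
best_view, view_probs = route_modality({"text": 0.2, "image": 1.5, "text+image": 0.9})
```

In this toy setup the aligned embeddings give a much lower loss than the swapped ones, and the router picks the view with the highest score. How Mario actually defines positives, instruction views, and router inputs is specified in the paper, not here.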


Community

Travissun

Paper author Paper submitter 3 minutes ago

[CVPR 2026]. We are actively organizing and refining our codebase to make it clean, stable, and easy to reproduce. Due to our current busy schedule, we plan to gradually release the code starting in April. Thank you for your interest in our work. We truly appreciate your attention and support💗!

