ReMatch: Boosting Representation through Matching for Multimodal Retrieval

This repository contains the official implementation of ReMatch, accepted to CVPR 2026.

ReMatch turns a multimodal large language model into a stronger multimodal retriever by adding a chat-style generative matching objective during training. The same MLLM learns to judge query-document relevance from both raw multimodal inputs and projected embeddings, complementing standard contrastive learning with instance-wise supervision on hard negatives. ReMatch also augments each input with multiple learnable representation tokens and fuses them into an efficient single-vector embedding for retrieval.

👥 Authors

Qianying Liu*, Xiao Liang*, Zhiqiang Zhang#, Yibo Chen, Xu Tang, Zhongfei Qing, Fengfan Zhou, Yao Hu, Paul Henderson

University of Glasgow, Xiaohongshu Inc., Huazhong University of Science and Technology

* Equal contribution. # Project leader.

🔍 Method

ReMatch is built around two core ideas:

Query-Document Matching: an additional autoregressive matching stage that predicts relevance from the query, document, and their projected embeddings.
Learnable Multi-Token Embeddings: multiple learnable tokens capture fine-grained contextual signals; an orthogonality regularizer encourages complementary representations, and the fused output remains a standard dense embedding.
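
The multi-token idea above can be illustrated with a small, dependency-free sketch. This is not the repository's implementation: the function names are hypothetical, the orthogonality term is assumed to be the mean squared cosine similarity over distinct token pairs, and "residual average fusion" is assumed to mean adding the average of the learnable-token embeddings to a pooled base embedding before normalizing.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def orthogonality_penalty(tokens):
    """Assumed regularizer: mean squared cosine similarity over token pairs.

    Only pairs (i, j) with i < j are used, i.e. the upper triangle of the
    token-token similarity matrix. The value is 0 when all tokens are
    mutually orthogonal and 1 when they all point in the same direction.
    """
    unit = [l2_normalize(t) for t in tokens]
    sims = [
        sum(a * b for a, b in zip(unit[i], unit[j]))
        for i in range(len(unit))
        for j in range(i + 1, len(unit))
    ]
    return sum(s * s for s in sims) / len(sims) if sims else 0.0


def residual_average_fusion(base, tokens):
    """Assumed fusion: base embedding plus the mean token embedding, normalized."""
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(len(base))]
    return l2_normalize([b + m for b, m in zip(base, mean)])


# Toy example with two 3-dimensional "learnable token" embeddings.
tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
penalty = orthogonality_penalty(tokens)            # orthogonal tokens give 0.0
embedding = residual_average_fusion([0.0, 0.0, 1.0], tokens)
```

Whatever the exact formulation in the released code, the fused result is still one vector per input, so standard dense-retrieval indexes apply unchanged.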

🔥 News

2026-05: ReMatch code, the ReMatch-3B checkpoint, and evaluation scripts are released.
2026-02: ReMatch is accepted to CVPR 2026.
2025-11: The ReMatch technical report is available on arXiv.

🛠️ Installation

conda create -n rematch python=3.10 -y
conda activate rematch
pip install -r requirements.txt

flash-attn is sensitive to CUDA, PyTorch, and compiler versions. If its installation fails, install the wheel that matches your environment by following the official FlashAttention release instructions, then rerun pip install -r requirements.txt to install the remaining dependencies.

🤗 Checkpoints

We release ReMatch-3B, a Qwen2.5-VL-3B based checkpoint trained with the ReMatch recipe:

FireRedTeam/ReMatch-3B

For local checkpoints, pass the base model through --model_name and the adapter/full checkpoint through --checkpoint_path when evaluating.

🚀 Training

The public ReMatch-3B training entry point is:

bash experiments/public/rematch/train-rematch-itm.sh

Before training, download the mmE5 hard-negative MMEB training data from Hugging Face:

intfloat/mmE5-MMEB-hardneg
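
One possible way to fetch the dataset is through the `huggingface_hub` Python package, sketched below. The helper names are hypothetical and the package is assumed to be installed; the target directory follows the `DATASET_BASE_PATH/mmE5/mmE5-MMEB-hardneg/` layout that the training config expects.

```python
from pathlib import Path

REPO_ID = "intfloat/mmE5-MMEB-hardneg"


def mme5_target_dir(dataset_base_path):
    """Return <dataset_base_path>/mmE5/mmE5-MMEB-hardneg."""
    return Path(dataset_base_path) / "mmE5" / REPO_ID.split("/")[-1]


def download_mme5(dataset_base_path):
    """Download the dataset repository into the expected directory."""
    # Imported lazily so the path helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    target = mme5_target_dir(dataset_base_path)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, repo_type="dataset", local_dir=str(target))
    return target


if __name__ == "__main__":
    # /path/to/datasets is a placeholder; use your own DATASET_BASE_PATH.
    print(download_mme5("/path/to/datasets"))
```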

In addition to mmE5, please follow the original VLM2Vec data preparation instructions to download the corresponding MMEB training and evaluation data used by the public configs in this repository.

Then edit experiments/public/rematch/train_image_mme5_hardneg.yaml and replace every DATASET_BASE_PATH with the directory that contains your mmE5/ folder. The expected layout is:

DATASET_BASE_PATH/
└── mmE5/
    └── mmE5-MMEB-hardneg/
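
If you prefer not to edit the YAML by hand, the placeholder substitution can be scripted. The sketch below is a plain text replacement that assumes `DATASET_BASE_PATH` appears literally in the file; the function names and the YAML keys in the example are hypothetical.

```python
from pathlib import Path

PLACEHOLDER = "DATASET_BASE_PATH"


def fill_base_path(config_text, dataset_base_path):
    """Replace every placeholder occurrence with the real base directory."""
    base = str(dataset_base_path).rstrip("/")  # avoid a double slash in paths
    return config_text.replace(PLACEHOLDER, base)


def patch_config(config_path, dataset_base_path):
    """Rewrite the config in place and return how many placeholders it had."""
    path = Path(config_path)
    original = path.read_text(encoding="utf-8")
    path.write_text(fill_base_path(original, dataset_base_path), encoding="utf-8")
    return original.count(PLACEHOLDER)


# Example on an in-memory snippet (the keys shown are made up):
snippet = "image_dir: DATASET_BASE_PATH/mmE5/mmE5-MMEB-hardneg/images"
patched = fill_base_path(snippet, "/data/")
```

To patch the real file, call `patch_config("experiments/public/rematch/train_image_mme5_hardneg.yaml", "/your/base/dir")` and confirm that the returned count is non-zero.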

The default script trains a Qwen2.5-VL-3B based ReMatch model with LoRA, 16 learnable query tokens, residual average fusion, orthogonal regularization, and the matching objective enabled. You can override common paths without editing the script:

EXP_DIR=/path/to/output \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
bash experiments/public/rematch/train-rematch-itm.sh

📊 Evaluation

Evaluation configs are under experiments/public/eval:

image.yaml

Please prepare the MMEB evaluation data following the original VLM2Vec instructions, then set DATA_BASEDIR to the directory containing the downloaded evaluation files.

Note: Evaluation scores may vary slightly across environments, as different PyTorch, CUDA, and flash-attn versions can introduce small numerical differences.

For checkpoints produced by this repository, we recommend eval_all.py. It parses the experiment name and selects the matching ReMatch evaluation configuration: backbone type, target-side instruction prefix, chat template, learnable query tokens, and residual embedding fusion. For example, an experiment name containing Qwen2.5vl, TgtInstruction, Queries16, ResidualAvg, and ChatTemplate is evaluated with the qwen2_5_vl backbone, the target instruction prefix, 16 learnable tokens, average residual fusion, and the chat template enabled.
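
To make the name-to-configuration mapping concrete, here is a minimal sketch of how such parsing could work. It is an illustration only, not the code in eval_all.py: the function name is hypothetical, and the dictionary keys simply mirror the eval.py flags documented in this README.

```python
import re


def settings_from_experiment_name(name):
    """Derive evaluation settings from markers embedded in an experiment name."""
    settings = {}
    if "Qwen2.5vl" in name:
        settings["model_backbone"] = "qwen2_5_vl"
    settings["tgt_prefix_instruction"] = "TgtInstruction" in name
    settings["enable_chat_template"] = "ChatTemplate" in name

    # "Queries16" -> learnable queries enabled, with 16 tokens.
    queries = re.search(r"Queries(\d+)", name)
    settings["learnable_queries"] = queries is not None
    if queries:
        settings["num_queries"] = int(queries.group(1))

    # "ResidualAvg" -> residual embedding fusion with method "avg".
    residual = re.search(r"Residual([A-Za-z]+)", name)
    settings["residual_embedding"] = residual is not None
    if residual:
        settings["residual_embedding_method"] = residual.group(1).lower()
    return settings


example = settings_from_experiment_name(
    "Rematch_Qwen2.5vl_3B.image.TgtInstruction.Queries16.ResidualAvg.ChatTemplate"
)
```

Substring and regex checks are used instead of splitting on `.` because markers such as `Qwen2.5vl` contain dots themselves.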

Evaluate one experiment checkpoint:

DATA_BASEDIR=/path/to/vlm2vec_eval \
MODEL_BASEDIR=/path/to/training/outputs \
OUTPUT_BASEDIR=/path/to/eval/outputs \
MODALITIES="image" \
python eval_all.py \
  --model_name Rematch_Qwen2.5vl_3B.image.autoresize.lora32.loraAlpha64.BS1024.IB64.GCq32p32NormTemp002.lr1e4.step3kwarm100.lrCosine.TgtInstruction.mmE5H1.Queries16.ResidualAvg.OrthTriu0.2.ChatTemplate.ITM.V1.Ratio0.1 \
  --checkpoint_name checkpoint-2200

If no arguments are provided, eval_all.py scans outputs/<model_name>/<checkpoint_name>/, evaluates every checkpoint directory, and writes summaries to:

outputs/evals/<model_name>/<checkpoint_name>/final_results.json
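
To gather these summaries across experiments, a short script along the following lines may help. It relies only on the path pattern above; the function name is hypothetical, and no assumption is made about the contents of final_results.json beyond it being valid JSON.

```python
import json
from pathlib import Path


def collect_summaries(evals_root="outputs/evals"):
    """Map (model_name, checkpoint_name) to the parsed final_results.json."""
    summaries = {}
    for path in sorted(Path(evals_root).glob("*/*/final_results.json")):
        checkpoint_name = path.parent.name
        model_name = path.parent.parent.name
        with path.open(encoding="utf-8") as handle:
            summaries[(model_name, checkpoint_name)] = json.load(handle)
    return summaries


if __name__ == "__main__":
    for (model, checkpoint), results in collect_summaries().items():
        print(model, checkpoint, results)
```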

For the released ReMatch-3B checkpoint, use eval.py directly and pass the matching ReMatch configuration explicitly:

torchrun --nproc_per_node=8 --master_port=2277 eval.py \
  --lora True \
  --pooling eos \
  --normalize true \
  --tgt_prefix_instruction True \
  --learnable_queries True \
  --residual_embedding True \
  --residual_embedding_method avg \
  --enable_chat_template True \
  --num_queries 16 \
  --per_device_eval_batch_size 16 \
  --model_backbone qwen2_5_vl \
  --model_name ReMatch-3B-PATH \
  --checkpoint_path ReMatch-3B-PATH \
  --dataset_config experiments/public/eval/image.yaml \
  --encode_output_path outputs/evals/ReMatch-3B/image \
  --data_basedir /path/to/MMEB

🙏 Acknowledgements

This codebase is built on top of VLM2Vec. We sincerely thank the VLM2Vec authors for releasing their training and evaluation infrastructure for massive multimodal embedding tasks.

We also thank the authors of Qwen2.5-VL, MMEB, and mmE5 for their open models, benchmarks, and data resources.

📚 Citation

@article{liu2025rematch,
  title={ReMatch: Boosting Representation through Matching for Multimodal Retrieval},
  author={Liu, Qianying and Liang, Xiao and Zhang, Zhiqiang and Chen, Yibo and Tang, Xu and Qing, Zhongfei and Zhou, Fengfan and Hu, Yao and Henderson, Paul},
  journal={arXiv preprint arXiv:2511.19278},
  year={2025}
}

Paper: ReMatch: Boosting Representation through Matching for Multimodal Retrieval (arXiv:2511.19278, published Nov 24, 2025)