ReMatch: Boosting Representation through Matching for Multimodal Retrieval
This repository contains the official implementation of ReMatch, accepted to CVPR 2026.
ReMatch turns a multimodal large language model into a stronger multimodal retriever by adding a chat-style generative matching objective during training. The same MLLM learns to judge query-document relevance from both raw multimodal inputs and projected embeddings, complementing standard contrastive learning with instance-wise supervision on hard negatives. ReMatch also augments each input with multiple learnable representation tokens and fuses them into an efficient single-vector embedding for retrieval.
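As a rough illustration of how a contrastive term and a matching term can be combined, the sketch below computes an InfoNCE-style loss over similarity scores and a binary cross-entropy loss over relevance judgments. It is not the repository's training code: the function names, the temperature, the `match_weight` value, and the reduction of the generative matching output to a single probability of relevance are all assumptions made for illustration.

```python
import math


def info_nce(sim_row, positive_index, temperature=0.02):
    """Contrastive loss for one query: softmax cross-entropy over its similarities."""
    logits = [s / temperature for s in sim_row]
    shift = max(logits)  # subtract the max for numerical stability
    log_z = shift + math.log(sum(math.exp(l - shift) for l in logits))
    return log_z - logits[positive_index]


def matching_loss(p_relevant, is_relevant):
    """Binary cross-entropy on an assumed probability that the pair is relevant."""
    eps = 1e-12
    p = min(max(p_relevant, eps), 1.0 - eps)
    return -math.log(p) if is_relevant else -math.log(1.0 - p)


def combined_loss(sim_row, positive_index, match_probs, match_labels,
                  temperature=0.02, match_weight=0.1):
    """Contrastive loss plus a weighted mean matching loss (weights are illustrative)."""
    contrastive = info_nce(sim_row, positive_index, temperature)
    matching = sum(
        matching_loss(p, y) for p, y in zip(match_probs, match_labels)
    ) / len(match_probs)
    return contrastive + match_weight * matching
```

Here `sim_row` holds one query's similarities to its positive and negative documents, while `match_probs` and `match_labels` cover the individual query-document pairs (including hard negatives) that receive instance-wise supervision.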
Authors
Qianying Liu*, Xiao Liang*, Zhiqiang Zhang#, Yibo Chen, Xu Tang, Zhongfei Qing, Fengfan Zhou, Yao Hu, Paul Henderson
University of Glasgow, Xiaohongshu Inc., Huazhong University of Science and Technology
* Equal contribution. # Project leader.
Method
ReMatch is built around two core ideas:
- Query-Document Matching: an additional autoregressive matching stage that predicts relevance from the query, document, and their projected embeddings.
- Learnable Multi-Token Embeddings: multiple learnable tokens capture fine-grained contextual signals; an orthogonality regularizer encourages complementary representations, and the fused output remains a standard dense embedding.
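The multi-token idea above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the released implementation: it assumes the orthogonality regularizer penalizes squared cosine similarity over the strict upper triangle of the tokens' Gram matrix, and that residual average fusion adds the mean of the token vectors to a base embedding before L2 normalization. The function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def orthogonality_penalty(tokens):
    """Mean squared cosine similarity over all distinct token pairs (i < j)."""
    unit = [l2_normalize(t) for t in tokens]
    total, pairs = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            total += cos * cos
            pairs += 1
    return total / pairs if pairs else 0.0


def residual_average_fusion(base, tokens):
    """Add the mean token vector to the base embedding, then L2-normalize."""
    dim = len(base)
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return l2_normalize([b + m for b, m in zip(base, mean)])
```

The penalty is zero when the tokens are mutually orthogonal and grows as they become redundant, while the fused output stays a single dense vector of the same dimension as the base embedding.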
News
- 2026-05: ReMatch code, the ReMatch-3B checkpoint, and evaluation scripts are released.
- 2026-02: ReMatch is accepted to CVPR 2026.
- 2025-11: The ReMatch technical report is available on arXiv.
Installation
conda create -n rematch python=3.10 -y
conda activate rematch
pip install -r requirements.txt
flash-attn can be sensitive to CUDA, PyTorch, and compiler versions. If installation fails, install the wheel matching your environment from the official FlashAttention release instructions, then rerun the remaining dependencies.
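When diagnosing an installation problem, it can help to record which versions are actually installed. The helper below is a small sketch using only the standard library; the function name is hypothetical, and the distribution names `torch` and `flash-attn` are assumed to match what `requirements.txt` installs.

```python
from importlib import metadata


def installed_versions(packages=("torch", "flash-attn")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

The CUDA version PyTorch was built against is not part of the package metadata; once PyTorch is installed it can be read from `torch.version.cuda`.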
Checkpoints
We release ReMatch-3B, a Qwen2.5-VL-3B based checkpoint trained with the ReMatch recipe:
For local checkpoints, pass the base model through --model_name and the adapter/full checkpoint through --checkpoint_path when evaluating.
Training
The public ReMatch-3B training entry point is:
bash experiments/public/rematch/train-rematch-itm.sh
Before training, download the mmE5 hard-negative MMEB training data from Hugging Face:
In addition to mmE5, please follow the original VLM2Vec data preparation instructions to download the corresponding MMEB training and evaluation data used by the public configs in this repository.
Then edit experiments/public/rematch/train_image_mme5_hardneg.yaml and replace every DATASET_BASE_PATH with the directory that contains your mmE5/ folder. The expected layout is:
DATASET_BASE_PATH/
└── mmE5/
    └── mmE5-MMEB-hardneg/
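Editing the config by hand works, but the substitution can also be scripted. The sketch below treats the YAML file as plain text and replaces the literal `DATASET_BASE_PATH` placeholder; the helper names are hypothetical, and the layout check assumes `mmE5-MMEB-hardneg/` sits inside `mmE5/`.

```python
from pathlib import Path


def fill_dataset_base_path(config_path, dataset_root, placeholder="DATASET_BASE_PATH"):
    """Replace every occurrence of the placeholder in the config; return the count."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8")
    count = text.count(placeholder)
    if count:
        path.write_text(text.replace(placeholder, str(dataset_root)), encoding="utf-8")
    return count


def has_expected_layout(dataset_root):
    """Check that <dataset_root>/mmE5/mmE5-MMEB-hardneg/ exists (assumed nesting)."""
    return (Path(dataset_root) / "mmE5" / "mmE5-MMEB-hardneg").is_dir()
```

For example, `fill_dataset_base_path("experiments/public/rematch/train_image_mme5_hardneg.yaml", "/path/to/data")` rewrites the file in place, so keep a copy of the original config if you may need the placeholder again.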
The default script trains a Qwen2.5-VL-3B based ReMatch model with LoRA, 16 learnable query tokens, residual average fusion, orthogonal regularization, and the matching objective enabled. You can override common paths without editing the script:
EXP_DIR=/path/to/output \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
bash experiments/public/rematch/train-rematch-itm.sh
Evaluation
Evaluation configs are under experiments/public/eval, for example experiments/public/eval/image.yaml.
Please prepare the MMEB evaluation data following the original VLM2Vec instructions, then set DATA_BASEDIR to the directory containing the downloaded evaluation files.
Note: Evaluation scores may vary slightly across environments, as different PyTorch, CUDA, and flash-attn versions can introduce small numerical differences.
For checkpoints produced by this repository, we recommend using eval_all.py. It reads the experiment name and automatically matches the evaluation configuration used by ReMatch, including backbone type, target-side instruction prefix, chat template, learnable query tokens, and residual embedding fusion. For example, an experiment name containing Qwen2.5vl, TgtInstruction, Queries16, ResidualAvg, and ChatTemplate will be evaluated with the corresponding qwen2_5_vl, target instruction, 16 learnable tokens, average residual fusion, and chat-template settings.
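The name-to-configuration mapping described above might look roughly like the following. This is a hypothetical sketch, not the logic in eval_all.py: the function name and matching rules are assumptions, and the dictionary keys simply reuse the eval.py flag names shown later in this README.

```python
import re


def parse_experiment_name(name):
    """Infer evaluation settings from marker substrings in an experiment name."""
    settings = {
        "model_backbone": "qwen2_5_vl" if "qwen2.5vl" in name.lower() else None,
        "tgt_prefix_instruction": "TgtInstruction" in name,
        "enable_chat_template": "ChatTemplate" in name,
    }
    # "Queries16" -> 16 learnable query tokens
    queries = re.search(r"Queries(\d+)", name)
    settings["learnable_queries"] = queries is not None
    settings["num_queries"] = int(queries.group(1)) if queries else 0
    # "ResidualAvg" -> residual fusion with method "avg"
    residual = re.search(r"Residual([A-Za-z]+)", name)
    settings["residual_embedding"] = residual is not None
    settings["residual_embedding_method"] = residual.group(1).lower() if residual else None
    return settings
```

Substring and regex matching is used instead of splitting on `.` because markers such as `Qwen2.5vl` contain dots themselves.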
Evaluate one experiment checkpoint:
DATA_BASEDIR=/path/to/vlm2vec_eval \
MODEL_BASEDIR=/path/to/training/outputs \
OUTPUT_BASEDIR=/path/to/eval/outputs \
MODALITIES="image" \
python eval_all.py \
--model_name Rematch_Qwen2.5vl_3B.image.autoresize.lora32.loraAlpha64.BS1024.IB64.GCq32p32NormTemp002.lr1e4.step3kwarm100.lrCosine.TgtInstruction.mmE5H1.Queries16.ResidualAvg.OrthTriu0.2.ChatTemplate.ITM.V1.Ratio0.1 \
--checkpoint_name checkpoint-2200
If no arguments are provided, eval_all.py scans outputs/<model_name>/<checkpoint_name>/, evaluates every checkpoint directory, and writes summaries to:
outputs/evals/<model_name>/<checkpoint_name>/final_results.json
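To compare several checkpoints, the summary files can be gathered with a few lines of Python. The sketch below relies only on the directory layout stated above; it makes no assumption about the keys inside final_results.json and returns each file's parsed JSON as-is. The function name is hypothetical.

```python
import json
from pathlib import Path


def collect_results(evals_root="outputs/evals"):
    """Map (model_name, checkpoint_name) to the parsed final_results.json content."""
    results = {}
    for path in sorted(Path(evals_root).glob("*/*/final_results.json")):
        model_name, checkpoint_name = path.parts[-3], path.parts[-2]
        with path.open(encoding="utf-8") as handle:
            results[(model_name, checkpoint_name)] = json.load(handle)
    return results
```

Calling `collect_results()` from the repository root returns one entry per evaluated checkpoint, which can then be printed or tabulated as needed.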
For the released ReMatch-3B checkpoint, use eval.py directly and pass the matching ReMatch configuration explicitly:
torchrun --nproc_per_node=8 --master_port=2277 eval.py \
--lora True \
--pooling eos \
--normalize true \
--tgt_prefix_instruction True \
--learnable_queries True \
--residual_embedding True \
--residual_embedding_method avg \
--enable_chat_template True \
--num_queries 16 \
--per_device_eval_batch_size 16 \
--model_backbone qwen2_5_vl \
--model_name ReMatch-3B-PATH \
--checkpoint_path ReMatch-3B-PATH \
--dataset_config experiments/public/eval/image.yaml \
--encode_output_path outputs/evals/ReMatch-3B/image \
--data_basedir /path/to/MMEB
Acknowledgements
This codebase is built on top of VLM2Vec. We sincerely thank the VLM2Vec authors for releasing their training and evaluation infrastructure for massive multimodal embedding tasks.
We also thank the authors of Qwen2.5-VL, MMEB, and mmE5 for their open models, benchmarks, and data resources.
Citation
@article{liu2025rematch,
title={ReMatch: Boosting Representation through Matching for Multimodal Retrieval},
author={Liu, Qianying and Liang, Xiao and Zhang, Zhiqiang and Chen, Yibo and Tang, Xu and Qing, Zhongfei and Zhou, Fengfan and Hu, Yao and Henderson, Paul},
journal={arXiv preprint arXiv:2511.19278},
year={2025}
}
