GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates
This repository contains the weights for GLAD, a vision-language tracking model introduced in the paper GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates.
Overview
GLAD (Generative Language-AssisteD tracking) is a vision-language tracking model that employs diffusion models for generative multi-modal fusion of text descriptions and template images.
Current vision-language trackers often struggle with "low-semantic" templates (e.g., heavily blurred or low-resolution images) because traditional discriminative fusion paradigms are limited in bridging the gap between text and degraded visual features. GLAD instead leverages the reconstruction capability of generative models to align language with degraded images, enriching the template's semantic content for more robust tracking.
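To make the fusion idea concrete, here is a minimal PyTorch sketch of one way a feature-space diffusion model can be conditioned on language to reconstruct a degraded template feature. This is an illustration, not GLAD's actual architecture: the names `TextConditionedDenoiser` and `diffusion_train_step`, the concatenative conditioning, and the DDPM-style noise schedule are all assumptions made for exposition; see the paper and repository for the real design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Standard DDPM-style noise schedule (assumed values, for illustration only).
STEPS = 1000
betas = torch.linspace(1e-4, 0.02, STEPS)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class TextConditionedDenoiser(nn.Module):
    """Hypothetical denoiser over template features, conditioned on a
    text embedding and a diffusion timestep. Names and layer choices
    are illustrative, not GLAD's implementation."""
    def __init__(self, dim=256):
        super().__init__()
        self.text_proj = nn.Linear(dim, dim)
        self.time_emb = nn.Embedding(STEPS, dim)
        self.net = nn.Sequential(
            nn.Linear(dim * 3, dim * 2),
            nn.GELU(),
            nn.Linear(dim * 2, dim),
        )

    def forward(self, z_t, text_emb, t):
        # Concatenate the noisy template feature with language and
        # timestep conditioning, then predict the added noise.
        cond = torch.cat([z_t, self.text_proj(text_emb), self.time_emb(t)], dim=-1)
        return self.net(cond)

def diffusion_train_step(denoiser, z0, text_emb):
    """One DDPM-style training step: corrupt the clean template feature
    z0 with Gaussian noise, then regress that noise given the text."""
    t = torch.randint(0, STEPS, (z0.size(0),))
    eps = torch.randn_like(z0)
    a_bar = alphas_bar[t].unsqueeze(-1)
    z_t = a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * eps
    return F.mse_loss(denoiser(z_t, text_emb, t), eps)

# Toy usage: a batch of 4 template features and matching text embeddings.
model = TextConditionedDenoiser(dim=256)
z0 = torch.randn(4, 256)    # stand-in for encoded template patches
text = torch.randn(4, 256)  # stand-in for encoded language descriptions
loss = diffusion_train_step(model, z0, text)
loss.backward()
```

At inference time, the same denoiser could be run in reverse from a degraded template feature to produce a language-enhanced one before matching against the search region; that step is likewise sketched here conceptually and is defined precisely in the paper.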
Resources
- Paper: GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates (https://arxiv.org/abs/2602.00570)
- GitHub Repository: https://github.com/Confetti-lxy/GLAD
Citation
If you find this work useful in your research, please cite:
@article{luo2026glad,
  title={GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates},
  author={Luo, Xingyu and Cai, Yidong and Liu, Jie and Tang, Jie and Wu, Gangshan and Wang, Limin},
  journal={arXiv preprint arXiv:2602.00570},
  year={2026}
}