GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates

This repository contains the weights for GLAD, a vision-language tracking model introduced in the paper GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates.

Overview

GLAD (Generative Language-AssisteD tracking) is a pioneering tracker that uses diffusion models for generative multi-modal fusion of text descriptions and template images.

Current vision-language trackers often struggle with "low-semantic" images (such as those with significant blur or low resolution) because traditional discriminative fusion paradigms have limited effectiveness in bridging the gap between text and degraded visual features. GLAD addresses this by leveraging the reconstruction capabilities of generative models to bolster compatibility between language and images, effectively enhancing the semantic information of the template for more robust tracking.
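The paper's exact architecture is not reproduced here, but the core idea can be sketched: a denoising network, conditioned on a text embedding, iteratively refines degraded template features so that the language description "fills in" missing semantics. Below is a minimal, hypothetical PyTorch sketch of that idea. All names (TextConditionedDenoiser, enhance_template), the fusion-by-bias conditioning, and the simplified residual denoising loop are illustrative assumptions, not GLAD's released API or its actual diffusion schedule.

import torch
import torch.nn as nn

class TextConditionedDenoiser(nn.Module):
    """Predicts residual noise in a template feature map, conditioned on a text embedding.
    (Hypothetical stand-in for a diffusion denoiser; not the paper's architecture.)"""
    def __init__(self, feat_dim=256, text_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, feat_dim)
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
        )

    def forward(self, noisy_feat, text_emb):
        # Inject language as a per-channel bias (one simple conditioning choice).
        cond = self.text_proj(text_emb)[:, :, None, None]
        return self.net(noisy_feat + cond)

@torch.no_grad()
def enhance_template(template_feat, text_emb, denoiser, steps=4):
    """Iteratively denoise low-semantic template features, guided by the text description."""
    x = template_feat
    for _ in range(steps):
        # Simplified residual update, not a full DDPM sampler.
        x = x - denoiser(x, text_emb)
    return x

# Toy shapes: batch of 1, 256-dim features on an 8x8 template grid.
denoiser = TextConditionedDenoiser()
template_feat = torch.randn(1, 256, 8, 8)   # e.g. features of a blurry template crop
text_emb = torch.randn(1, 256)              # pooled embedding of the text description
enhanced = enhance_template(template_feat, text_emb, denoiser)
print(enhanced.shape)  # torch.Size([1, 256, 8, 8])

The enhanced features would then replace the raw template features in the matching stage of the tracker, which is where the generative reconstruction pays off for degraded inputs.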

Citation

If you find this work useful in your research, please cite:

@article{luo2026glad,
  title={GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates},
  author={Luo, Xingyu and Cai, Yidong and Liu, Jie and Tang, Jie and Wu, Gangshan and Wang, Limin},
  journal={arXiv preprint arXiv:2602.00570},
  year={2026}
}