AD-Copilot

A vision-language assistant for industrial anomaly detection via visual in-context comparison.

Model Details

Model Description

  • Developed by: Xi Jiang, Yue Guo, Jian Li, Yong Liu, Bin-Bin Gao, Hanqiu Deng, Jun Liu, Heng Zhao, Chengjie Wang, Feng Zheng
  • Model type: Vision-Language Model (VLM)
  • Language(s): English and Chinese
  • License: Apache 2.0
  • Finetuned from: Qwen/Qwen2.5-VL-7B-Instruct

Model Sources

Uses

Direct Use

AD-Copilot can be used for:

  • Industrial anomaly detection and localization
  • Natural language question answering about product defects
  • Visual comparison between normal reference images and query images
  • General visual question answering
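
The reference-versus-query comparison listed above can be sketched as a small helper that assembles the chat message. This is a minimal sketch, not part of the released code: `build_comparison_messages` is a hypothetical name, and both the multi-reference layout and the wording of the multi-reference prompt are assumptions. Only the single-reference prompt comes from this card.

```python
from typing import Dict, List, Optional, Sequence


def build_comparison_messages(
    reference_paths: Sequence[str],
    query_path: str,
    question: Optional[str] = None,
) -> List[Dict]:
    """Build a one-turn chat: reference image(s) first, query image last.

    Hypothetical helper; the message layout mirrors the quick-start example.
    """
    if not reference_paths:
        raise ValueError("at least one normal reference image is required")

    # Reference images come first so the prompt can refer to them by position.
    content = [{"type": "image", "image": str(p)} for p in reference_paths]
    content.append({"type": "image", "image": str(query_path)})

    if question is None:
        if len(reference_paths) == 1:
            # Prompt taken from the quick-start example in this card.
            question = (
                "The first image is a normal reference. Is there any anomaly "
                "in the second image? If so, describe it."
            )
        else:
            # Assumed wording for several references; not from the authors.
            question = (
                f"The first {len(reference_paths)} images are normal references. "
                "Is there any anomaly in the last image? If so, describe it."
            )

    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]
```

The returned list has the same shape as the `messages` variable in the quick-start code, so it can be passed to `processor.apply_chat_template` in its place.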

How to Get Started with the Model

# Requires the `transformers` and `qwen-vl-utils` packages.
from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the model weights and the matching processor from the Hub.
model = AutoModelForImageTextToText.from_pretrained(
    "jiang-cc/AD-Copilot",
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("jiang-cc/AD-Copilot")

# One user turn: the normal reference image first, then the query image,
# then the question about the query.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "<path_to_reference_image>"},
            {"type": "image", "image": "<path_to_query_image>"},
            {"type": "text", "text": "The first image is a normal reference. Is there any anomaly in the second image? If so, describe it."},
        ],
    }
]

# Render the chat template to text and extract the image inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    return_tensors="pt"
).to(model.device)

# Generate, then decode only the newly generated tokens (the prompt is sliced off).
output_ids = model.generate(**inputs, max_new_tokens=512)
response = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:],
    skip_special_tokens=True
)[0]
print(response)
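
The slice `output_ids[:, inputs.input_ids.shape[1]:]` is there because `generate` returns the prompt tokens followed by the continuation for decoder-only models, so the prompt must be cut off before decoding. A toy illustration with plain Python lists follows; the token ids are made up and `strip_prompt_tokens` is a hypothetical helper, not part of `transformers`.

```python
from typing import List


def strip_prompt_tokens(sequences: List[List[int]], prompt_length: int) -> List[List[int]]:
    """Drop the first `prompt_length` ids from every generated sequence."""
    return [seq[prompt_length:] for seq in sequences]


# Made-up ids: four prompt tokens followed by three generated tokens.
prompt_ids = [101, 7, 8, 9]
generated = [prompt_ids + [42, 43, 2]]

new_tokens = strip_prompt_tokens(generated, len(prompt_ids))
print(new_tokens)  # [[42, 43, 2]]
```

Without this step the decoded string would begin with the rendered chat prompt rather than the model's answer.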

Citation

BibTeX:

@article{jiang2026ad,
  title   = {AD-Copilot: A Vision-Language Assistant for Industrial Anomaly Detection via Visual In-context Comparison},
  author  = {Jiang, Xi and Guo, Yue and Li, Jian and Liu, Yong and Gao, Bin-Bin and Deng, Hanqiu and Liu, Jun and Zhao, Heng and Wang, Chengjie and Zheng, Feng},
  journal = {arXiv preprint arXiv:2603.13779},
  year    = {2026}
}

Safetensors

  • Model size: 8B params
  • Tensor type: BF16