Compact VLM Filter: a filtration-oriented Qwen2-VL model for image-caption data

This model is a fine-tuned version of Qwen/Qwen2-VL-2B-Instruct, trained on our custom dataset to perform filtration-oriented image-text evaluation.

πŸ” Intended Use

The model is designed to:

  • Evaluate the alignment between an image and its caption
  • Provide alignment scores and textual justifications for noisy web-scale data
  • Support local deployment for cost-efficient training-data filtration

πŸ‹οΈ Training Details

  • Base model: Qwen/Qwen2-VL-2B-Instruct
  • Fine-tuning objective: in-context evaluation of alignment, quality, and safety
  • Dataset: ~4.8K samples, each with a score, justification, caption, and image
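For illustration, a single training record might look like the following. The field names are taken from the dataset description above, but the exact schema, score scale, and image representation are assumptions:

```python
# Hedged sketch of one training sample; the score scale (here 1-5) and
# storage format are assumptions, not the dataset's documented schema.
sample = {
    "image": "images/000123.jpg",  # path or URL to the image
    "caption": "A brown dog catching a frisbee in a park.",
    "score": 4,                    # alignment/quality score
    "justification": "The caption matches the main subject and action.",
}
```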

🀝 Acknowledgements

Thanks to the Qwen team for open-sourcing their VLM models, which serve as the foundation for our filtration-oriented model.

πŸ“œ License

Licensed under the Apache License 2.0.
