AI & ML interests

VLM, Geo Reasoning, Visual Retrieval


Organization Card

TLV R&D VLMs for Image Retrieval and Visual Reasoning

Vision-Language Retrieval Models

| Model Name | Model Type | Base Model | Training Set | Owner | Link | Frozen Parameters |
|---|---|---|---|---|---|---|
| ImiClip | CLIP | openai/clip-vit-base-patch32 | DM | Etzion | TLVLM/ImiClip | Vision Encoder |
| ImiClip_v2 | CLIP | openai/clip-vit-base-patch32 | DM + RSICD | Etzion | TLVLM/ImiClip_v2 | Vision Encoder |
| ImiClip_v3 | CLIP | openai/clip-vit-base-patch32 | DM + RSICD | Etzion | TLVLM/ImiClip_v3 | |
| ImiGlip | SigLIP | google/siglip-so400m-patch14-384 | DM | Etzion | TLVLM/ImiGlip | Vision Encoder |
| ImiGlip_V2 | SigLIP | google/siglip-so400m-patch14-384 | DM + RSICD | Etzion | TLVLM/ImiGlip_V2 | Vision Encoder |
| ImiGlip_V3 | SigLIP | google/siglip-so400m-patch14-384 | DM + RSICD | Etzion | TLVLM/ImiGlip_V3 | |
| ImiGlip2 | SigLIP2 | google/siglip2-so400m-patch14-384 | DM + RSICD | Etzion | TLVLM/ImiGlip2 | Both Encoders + Logits |
| ImiGlip2n | SigLIP2 | google/siglip2-so400m-patch16-naflex | DM + RSICD | Etzion | TLVLM/ImiGlip2n | Both Encoders + Logits |

Runtime

| Model Type | Base Model | Time per Single Text (s) | Time per Single Image (s) | Time per 10,000 Texts (s) | Time per 10,000 Images (s) |
|---|---|---|---|---|---|
| CLIP | openai/clip-vit-base-patch32 | 0.0129 | 0.0101 | 129.4 | 100.8 |
| SigLIP (1+2) | google/siglip-so400m-patch14-384 | 0.0578 | 0.0189 | 577.5 | 188.9 |
| SigLIP2n | google/siglip2-so400m-patch16-naflex | 0.0257 | 0.0189 | 257.0 | 188.6 |

Important notes:

  • All times are reported in seconds.
  • All measurements were run on an NVIDIA A40 GPU.
  • Avg. text length: 633±93 characters
  • Avg. image size: $536^2$ pixels
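
To budget encoding time for a corpus of a different size, the per-item figures in the runtime table can be scaled linearly. The following is a minimal sketch, not part of the original benchmark: the names `PER_ITEM_SECONDS`, `estimate_seconds`, and `items_per_second` are hypothetical, and the linear scaling assumes the same hardware and similar input sizes as noted above, with no batching gains.

```python
# Per-item latencies in seconds, copied from the runtime table above.
# The dictionary and function names are illustrative, not from the source.
PER_ITEM_SECONDS = {
    "CLIP": {"text": 0.0129, "image": 0.0101},
    "SigLIP (1+2)": {"text": 0.0578, "image": 0.0189},
    "SigLIP2n": {"text": 0.0257, "image": 0.0189},
}


def estimate_seconds(model_type: str, modality: str, n_items: int) -> float:
    """Estimate total encoding time by linear extrapolation.

    Assumes the cost per item stays constant as the corpus grows.
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    return PER_ITEM_SECONDS[model_type][modality] * n_items


def items_per_second(model_type: str, modality: str) -> float:
    """Throughput implied by the per-item latency."""
    return 1.0 / PER_ITEM_SECONDS[model_type][modality]


if __name__ == "__main__":
    n = 50_000  # hypothetical corpus size
    for model_type in PER_ITEM_SECONDS:
        for modality in ("text", "image"):
            total = estimate_seconds(model_type, modality, n)
            rate = items_per_second(model_type, modality)
            print(f"{model_type:>12} {modality:>5}: {total:8.1f} s for {n} items "
                  f"({rate:.1f} items/s)")
```

Estimates obtained this way can differ slightly from the measured 10,000-item columns (for example, 0.0129 s × 10,000 gives 129.0 s against the reported 129.4 s), presumably because the per-item values are rounded.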

Collections

Here you can find the model collections.
