Tags: Image-to-Text · Transformers · PyTorch · Safetensors · English · git · image-text-to-text · vision · image-captioning
Instructions for using microsoft/git-large-coco with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use microsoft/git-large-coco with Transformers:
```python
# Use a pipeline as a high-level helper.
# Warning: the "image-to-text" pipeline type is no longer supported in
# transformers v5. Either load the model directly (see below) or downgrade
# to v4.x with: pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("image-to-text", model="microsoft/git-large-coco")
```

```python
# Load the model directly.
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("microsoft/git-large-coco")
model = AutoModelForImageTextToText.from_pretrained("microsoft/git-large-coco")
```
- Notebooks
- Google Colab
- Kaggle
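As a quick sanity check, here is a minimal captioning sketch built on the direct-loading snippet above; the example image URL and the `max_length=50` setting are illustrative assumptions, not part of the model card:

```python
# Minimal captioning sketch using the classes above. The image URL and
# max_length value are illustrative assumptions, not part of the model card.
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("microsoft/git-large-coco")
model = AutoModelForImageTextToText.from_pretrained("microsoft/git-large-coco")

# Any RGB image works; this COCO validation image is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
generated_ids = model.generate(pixel_values=inputs.pixel_values, max_length=50)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```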
Error while fine-tuning GIT on custom dataset
#3
by aambati - opened
Hello,
I am trying to fine-tune the GIT model on a custom dataset. While doing so, I am getting the following error:
```
AttributeError                            Traceback (most recent call last)
<ipython-input-19-0bf9eb119ab0> in <cell line: 19>()
     17 )
     18
---> 19 trainer.train()

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
   1612         if name in modules:
   1613             return modules[name]
-> 1614         raise AttributeError("'{}' object has no attribute '{}'".format(
   1615             type(self).__name__, name))
   1616

AttributeError: 'GitModel' object has no attribute 'img_temperal_embedding'
```
I am using the following function to update the dataset:
```python
def prepare_dataset(example):
    image = example["image"]
    example.update(processor(images=image, text=example["text"]))
    return example
```
And this is how I am loading the processor and model:
```python
from transformers import GitProcessor, AutoModelForCausalLM

processor = GitProcessor.from_pretrained("microsoft/git-base-coco")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base-coco")
```
And here is the training script:
```python
import numpy as np
import evaluate
from transformers import Trainer, TrainingArguments

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

training_args = TrainingArguments(output_dir="output")

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=data,
    eval_dataset=data,
    compute_metrics=compute_metrics,
)
trainer.train()
```
Hi,
You should not use the AutoModelForCausalLM class for fine-tuning GIT. Rather, use the GitForCausalLM or AutoModelForVision2Seq class.
Which Transformers version are you using?
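For reference, here is a minimal sketch of the suggested fix; the padding/truncation settings and the labels handling are assumptions about a typical captioning setup, not a confirmed recipe for your exact dataset:

```python
# Sketch of the suggested fix: load GIT through GitForCausalLM, which wraps
# GitModel and adds the language-modeling head that training needs.
# The padding/truncation settings and labels handling below are assumptions
# about a typical image-captioning setup, not a confirmed recipe.
from transformers import AutoProcessor, GitForCausalLM

processor = AutoProcessor.from_pretrained("microsoft/git-base-coco")
model = GitForCausalLM.from_pretrained("microsoft/git-base-coco")

def prepare_dataset(example):
    encoding = processor(
        images=example["image"],
        text=example["text"],
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )
    # Drop the batch dimension added by return_tensors="pt".
    encoding = {k: v.squeeze(0) for k, v in encoding.items()}
    # For captioning, the labels are the caption token ids themselves;
    # the model shifts them internally when computing the loss.
    encoding["labels"] = encoding["input_ids"].clone()
    return encoding
```

With labels populated this way, Trainer can compute the captioning loss directly.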