selfmaker/image_caption


	---
	license: cc-by-nc-nd-4.0
	tags:
	- Image
	- Captioning
	- RESNET-152
	- LSTM
	---

	## Introduction

	This model follows the architecture proposed in the book "Mastering PyTorch".
	It combines a CNN encoder with an LSTM decoder.

	The CNN encoder is based on a pretrained ResNet-152. The last layer of the ResNet is replaced by an embedding layer that outputs a 256-element vector.
	The LSTM decoder takes inputs of size 256, uses a hidden layer of size 512, and has an output dimension equal to the vocabulary size.

	The model was trained purely as a learning exercise, so its performance remains fairly average.

	## Training procedure

	For the sake of the exercise, the model has been trained for only 5 epochs.

	It has been trained on the COCO dataset.
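A 5-epoch training run of this kind might look roughly like the sketch below. Everything beyond the epoch count and the 256/512 layer sizes is an assumption made for illustration: the `TinyCaptioner` stand-in, the Adam optimizer, the learning rate, the cross-entropy loss with teacher forcing, the placeholder vocabulary size, and the random tensors used in place of COCO image embeddings and tokenized captions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Layer sizes follow the card; the vocabulary size is a placeholder.
VOCAB_SIZE, EMBED_SIZE, HIDDEN_SIZE = 50, 256, 512


class TinyCaptioner(nn.Module):
    """Hypothetical stand-in decoder that takes precomputed 256-d image embeddings."""

    def __init__(self):
        super().__init__()
        self.word_embed = nn.Embedding(VOCAB_SIZE, EMBED_SIZE)
        self.lstm = nn.LSTM(EMBED_SIZE, HIDDEN_SIZE, batch_first=True)
        self.to_vocab = nn.Linear(HIDDEN_SIZE, VOCAB_SIZE)

    def forward(self, image_embedding, captions):
        # Teacher forcing: feed the image, then every caption token but the last.
        words = self.word_embed(captions[:, :-1])
        inputs = torch.cat([image_embedding.unsqueeze(1), words], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.to_vocab(hidden)  # (batch, T, vocab), aligned with captions


def train(model, batches, num_epochs=5, lr=1e-3):
    """Run a basic training loop and return the mean loss of each epoch."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    epoch_losses = []
    for _ in range(num_epochs):
        total = 0.0
        for image_embedding, captions in batches:
            logits = model(image_embedding, captions)
            loss = criterion(logits.reshape(-1, VOCAB_SIZE), captions.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        epoch_losses.append(total / len(batches))
    return epoch_losses


# Random tensors standing in for (image embedding, tokenized caption) batches.
batches = [(torch.randn(4, EMBED_SIZE), torch.randint(0, VOCAB_SIZE, (4, 7)))
           for _ in range(3)]
losses = train(TinyCaptioner(), batches)
```

With real data, `batches` would be replaced by a data loader over COCO images and their tokenized captions, and the image embeddings would come from the CNN encoder.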