TinyGPT2-IT

A 95M-parameter instruction-tuned language model trained from scratch on a single consumer GPU

Overview

TinyGPT2-IT is an instruction-tuned variant of TinyGPT2, a modern GPT-style architecture built from scratch in PyTorch. The base model was pretrained on ~6.7B tokens of OpenWebText and then supervised fine-tuned (SFT) on the 52K instruction-response pairs of Stanford Alpaca.

The entire pipeline (pretraining, fine-tuning, and inference) runs on a single NVIDIA RTX 3070 Ti with 8 GB of VRAM.

Because the model uses a custom architecture, loading it through Transformers requires `trust_remote_code=True`.

Architecture

Component	Detail
Parameters	~95M
Layers	12 transformer blocks
Attention	Grouped Query Attention (12 query heads, 4 KV groups)
Embedding dim	768
FFN hidden dim	2048
Position encoding	Rotary Position Embeddings (RoPE)
Normalization	RMSNorm
Context window	512 tokens
Vocabulary	50,304 (GPT-2 tiktoken + PAD token)
Weight tying	Token embedding ↔ LM head
KV Cache	Supported for efficient generation
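As a rough sanity check on the table, the parameter count can be reproduced by hand. This is an estimate under stated assumptions, not the model's actual code: it assumes bias-free linear layers, a two-matrix (non-gated) FFN, K/V projections of 4 groups × 64 dims (head dim = 768 / 12), one RMSNorm weight vector before attention and before the FFN in each block plus a final norm, and a tied embedding/LM-head matrix counted once. Under those assumptions the total comes to 95,275,776, consistent with the ~95M above. The second function estimates the per-sequence KV cache in bfloat16, which is where the 4 shared KV groups pay off compared with 12 full KV heads.

```python
def param_count(vocab=50_304, d_model=768, n_layers=12, n_heads=12,
                n_kv_groups=4, d_ffn=2048):
    """Estimate parameters under the assumptions stated above (hypothetical layout)."""
    head_dim = d_model // n_heads            # 64
    kv_dim = n_kv_groups * head_dim          # 256: K and V are shared across query heads
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim  # Q and O full-size; K and V reduced
    ffn = 2 * d_model * d_ffn                # up + down projections, no gate (assumed)
    norms = 2 * d_model                      # pre-attention and pre-FFN RMSNorm weights
    per_layer = attn + ffn + norms
    embedding = vocab * d_model              # shared with the LM head via weight tying
    final_norm = d_model
    return n_layers * per_layer + embedding + final_norm


def kv_cache_bytes(seq_len=512, d_model=768, n_layers=12, n_heads=12,
                   n_kv_groups=4, bytes_per_value=2):
    """Bytes for the cached K and V tensors of one sequence, bfloat16 (2 bytes) by default."""
    head_dim = d_model // n_heads
    return 2 * n_layers * seq_len * n_kv_groups * head_dim * bytes_per_value


print(f"{param_count():,} parameters")        # 95,275,776 parameters
gqa = kv_cache_bytes()
mha = kv_cache_bytes(n_kv_groups=12)          # standard attention: one KV head per query head
print(f"KV cache @512 tokens: GQA {gqa / 2**20:.0f} MiB vs MHA {mha / 2**20:.0f} MiB")
# KV cache @512 tokens: GQA 6 MiB vs MHA 18 MiB
```

If the real FFN is gated (SwiGLU-style, three matrices), the same arithmetic would give roughly 114M, well above the stated size, which is why the sketch assumes two matrices.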

Training

Stage 1 — Pretraining

Setting	Value
Dataset	OpenWebText (~6.7B tokens)
Optimizer	AdamW (fused)
Effective batch	262K tokens/step
Precision	bfloat16 + `torch.compile`
Hardware	NVIDIA RTX 3070 Ti (8 GB)
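Fitting a 262K-token optimizer step on an 8 GB card implies gradient accumulation. The arithmetic below assumes "262K" means exactly 262,144 tokens (512 sequences of 512 tokens) and uses a hypothetical micro-batch of 16 sequences, since the card does not state the real micro-batch size; the step count also assumes a single pass over the ~6.7B tokens.

```python
TOKENS_PER_STEP = 262_144    # assumed exact value behind "262K" (2**18 = 512 x 512)
CONTEXT_LEN = 512            # context window from the architecture table
MICRO_BATCH = 16             # hypothetical per-forward-pass batch that fits in 8 GB
DATASET_TOKENS = 6.7e9       # approximate OpenWebText token count from the table

tokens_per_micro_batch = MICRO_BATCH * CONTEXT_LEN         # 8,192 tokens per forward pass
accum_steps = TOKENS_PER_STEP // tokens_per_micro_batch    # 32 micro-batches per optimizer step
steps_per_pass = DATASET_TOKENS / TOKENS_PER_STEP          # roughly 25.6K optimizer steps

print(f"gradient accumulation steps: {accum_steps}")
print(f"optimizer steps for one pass: {steps_per_pass:,.0f}")
```

Halving the micro-batch doubles the accumulation steps but leaves the effective batch, and therefore the optimizer-step count, unchanged.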

Stage 2 — Supervised Fine-Tuning (SFT)

Setting	Value
Dataset	Stanford Alpaca (52K instructions)
Epochs	3
Loss masking	Response-only (instruction tokens are masked)
Final train loss	1.91
Final val loss	1.98
Final val perplexity	7.26
Tokens processed	~72M
Prompt format	`### Instruction: ... ### Response: ...`
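Response-only loss masking means the loss is computed only on the tokens the model is supposed to produce. A minimal sketch, assuming labels are aligned with `input_ids` and shifted inside the model (the Hugging Face convention) and that masked positions use -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. `build_sft_labels` is a hypothetical helper, not the repository's code. The last lines check the reported perplexity, which is the exponential of the mean per-token validation loss.

```python
import math

IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss skips targets equal to this value by default


def build_sft_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; mask the prompt positions out of the loss.

    prompt_ids covers everything up to and including "### Response:\n";
    response_ids is the target text (plus an end-of-text token, if one is used).
    """
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


# Toy token ids standing in for a tokenized Alpaca example.
ids, labels = build_sft_labels([10, 11, 12], [20, 21])
print(ids)     # [10, 11, 12, 20, 21]
print(labels)  # [-100, -100, -100, 20, 21]

# Perplexity is exp(mean cross-entropy loss in nats).
print(round(math.exp(1.98), 2))  # 7.24; the card's 7.26 is consistent with an unrounded loss near 1.982
```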

Usage

Quick Start

from transformers import AutoModelForCausalLM
import tiktoken
import torch

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "NotShrirang/tinygpt2-it",
    trust_remote_code=True,
)
model.eval()

# Tokenize
enc = tiktoken.get_encoding("gpt2")
prompt = "### Instruction:\nWhat is the capital of France?\n\n### Response:\n"
input_ids = torch.tensor([enc.encode(prompt)])

# Generate
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7, top_k=40)

print(enc.decode(output[0].tolist()))
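The decoded output above still contains the prompt, because causal-LM generation typically returns the input tokens followed by the continuation. A small post-processing sketch, assuming the model may keep going into a new `### Instruction:` block after it finishes answering; `extract_response` is a hypothetical helper name.

```python
RESPONSE_MARKER = "### Response:"
NEXT_TURN_MARKER = "### Instruction:"


def extract_response(decoded: str) -> str:
    """Return only the model's answer from a decoded prompt + completion string."""
    # Split at the first response marker, which ends the prompt.
    _, sep, answer = decoded.partition(RESPONSE_MARKER)
    if not sep:
        # No marker found: return the text unchanged rather than an empty string.
        return decoded.strip()
    # Drop anything generated after the model starts a new instruction block.
    answer = answer.split(NEXT_TURN_MARKER, 1)[0]
    return answer.strip()


text = ("### Instruction:\nWhat is the capital of France?\n\n"
        "### Response:\nThe capital of France is Paris.\n\n### Instruction:\nName")
print(extract_response(text))  # The capital of France is Paris.
```

With the Quick Start variables, this would be called as `extract_response(enc.decode(output[0].tolist()))`.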

Prompt Format

This model expects instructions in the following template:

### Instruction:
{your instruction here}

### Response:

For instructions with additional context:

### Instruction:
{your instruction here}

### Input:
{additional context}

### Response:
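Both templates can be filled by one helper. In this sketch the section separators (a blank line between sections, a newline after `### Response:`) are copied from the Quick Start prompt string, and the `### Input:` spacing is assumed to follow the same pattern; `build_prompt` is a hypothetical name.

```python
from typing import Optional


def build_prompt(instruction: str, context: Optional[str] = None) -> str:
    """Fill the template; the Input section is included only when context is given."""
    prompt = f"### Instruction:\n{instruction}\n\n"
    if context:
        prompt += f"### Input:\n{context}\n\n"
    return prompt + "### Response:\n"


print(build_prompt("What is the capital of France?"))
print(build_prompt("Summarize the text.", "TinyGPT2-IT is a 95M-parameter model."))
```

The first call reproduces the exact prompt string used in the Quick Start.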

Example Outputs

Factual Q&A

>>> What is the capital of France?
The capital of France is Paris.

Explanation

>>> Explain what machine learning is in simple terms.
Machine learning is a branch of computer science that focuses on using algorithms to
identify patterns in data. These algorithms are used to analyze large amounts of data
and make predictions about future trends.

Creative

>>> Write a motivational quote.
"The only way to make a difference is to be bold and courageous."

Limitations

Small model: at 95M parameters, it is far smaller than production LLMs; expect factual errors, repetition, and limited reasoning.
Short context: the 512-token window limits the length of conversations and documents, counting both the prompt and the generated tokens.
Training data: pretrained on web text and fine-tuned on Alpaca's synthetic (model-generated) instruction data, either of which may contain biases or inaccuracies.
Not safety-aligned: no RLHF or DPO was applied to this checkpoint, so the model may produce harmful or inappropriate content.
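Because the prompt and the new tokens share the 512-token window, it can help to budget the prompt before calling `generate`. This sketch assumes the model does not crop long inputs itself (the card does not say how it behaves beyond 512 tokens); `fit_prompt` is a hypothetical helper. It keeps the tail of the prompt so the trailing `### Response:` marker survives, at the cost of dropping the start, including the `### Instruction:` header, so shortening the instruction or input text before templating is often the better fix.

```python
CONTEXT_LEN = 512  # context window from the architecture table


def fit_prompt(prompt_ids, max_new_tokens, context_len=CONTEXT_LEN):
    """Left-truncate prompt token ids so prompt + generation fits the context window."""
    budget = context_len - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the context window")
    # Slicing from -budget keeps the last `budget` ids (or all of them if the prompt is short).
    return list(prompt_ids)[-budget:]


print(len(fit_prompt(list(range(600)), max_new_tokens=128)))  # 384
```

With the Quick Start code, this would wrap the encoded prompt: `torch.tensor([fit_prompt(enc.encode(prompt), 128)])`.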

Model Family

Model	Params	Description	Link
TinyGPT	51M	Standard GPT, TinyStories	GitHub
TinyGPT-MoE	85M	Mixture of Experts, TinyStories	GitHub
Wikipedia-MoE	135M	8-expert MoE, Wikipedia/C4	GitHub
TinyGPT2	95M	RoPE + GQA + RMSNorm, OpenWebText	GitHub
TinyGPT2.1	183M	Scaled TinyGPT2, FineWeb-Edu	GitHub
TinyGPT2-IT	95M	Instruction-tuned (this model)	You are here
TinyGPT2-DPO	95M	DPO-aligned with Anthropic HH-RLHF	GitHub

Citation

@misc{tinygpt2-it,
  author       = {Shrirang Mahajan},
  title        = {TinyGPT2-IT: Instruction-Tuned 95M Parameter Language Model},
  year         = {2025},
  publisher    = {Hugging Face},
  url          = {https://huggingface.co/NotShrirang/tinygpt2-it}
}

License

This model is released under the GPL-3.0 License.


Safetensors

Model size	95.3M params
Tensor type	C64, F32
