---
language: en
license: mit
tags:
- pretrained
- causal-lm
- fineweb-edu
- custom-architecture
---
# tiny-edu-166m (ParchmentLM)
A 166M-parameter decoder-only transformer pretrained from scratch on roughly 4B tokens of [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).
## Architecture (ParchmentLM)
Custom decoder-only transformer:
- **Parameters:** 166M
- **Layers:** 12
- **Hidden size:** 768
- **Attention heads:** 12
- **FFN:** SwiGLU (hidden=2048)
- **Context length:** 1024
- **Positional encoding:** RoPE (base=10000)
- **Normalization:** RMSNorm
- **Tokenizer:** cl100k_base (100277 tokens), the encoding used by GPT-4
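
As a sanity check on the hyperparameters above, the following is a rough back-of-the-envelope parameter estimate. It is a sketch, not the model's actual code: the class and function names are hypothetical, and it assumes tied input/output embeddings, bias-free linear layers, and two RMSNorm weight vectors per layer, none of which this card states. Under those assumptions the estimate comes out near 162M, in the same range as the stated 166M; the exact figure depends on implementation details not listed here.

```python
from dataclasses import dataclass


@dataclass
class ParchmentConfigSketch:
    """Hypothetical holder for the values listed in this card."""
    vocab_size: int = 100277
    n_layers: int = 12
    d_model: int = 768
    n_heads: int = 12
    ffn_hidden: int = 2048

    @property
    def head_dim(self) -> int:
        # Per-head width, assuming heads split the hidden size evenly.
        return self.d_model // self.n_heads


def estimate_params(cfg: ParchmentConfigSketch, tied_embeddings: bool = True) -> int:
    """Approximate parameter count (assumes no biases; tying is an assumption)."""
    embed = cfg.vocab_size * cfg.d_model
    attn = 4 * cfg.d_model * cfg.d_model      # Q, K, V and output projections
    ffn = 3 * cfg.d_model * cfg.ffn_hidden    # SwiGLU: gate, up and down projections
    norms = 2 * cfg.d_model                   # two RMSNorm weight vectors per layer
    total = embed + cfg.n_layers * (attn + ffn + norms) + cfg.d_model  # + final norm
    if not tied_embeddings:
        total += embed                        # separate output projection
    return total


cfg = ParchmentConfigSketch()
print(f"head_dim = {cfg.head_dim}")
print(f"approx. parameters = {estimate_params(cfg) / 1e6:.1f}M")
```

Note that the embedding table alone accounts for about 77M of the total, which is typical when a small model uses a 100k-entry vocabulary.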
## Training
- **Dataset:** FineWeb-Edu 10BT sample
- **Tokens seen:** ~4B
- **Steps:** 30,000
- **Optimizer:** AdamW (lr=3e-4, cosine decay to 3e-5)
- **Hardware:** Single A100 40GB
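
The learning-rate schedule and token budget above can be sketched as follows. This is a minimal illustration under stated assumptions, not the training script: it assumes the cosine decay spans all 30,000 steps with no warmup (the card does not mention one), and the function names are hypothetical.

```python
import math

PEAK_LR = 3e-4        # from the card
MIN_LR = 3e-5         # from the card
TOTAL_STEPS = 30_000  # from the card


def cosine_lr(step: int, peak_lr: float = PEAK_LR, min_lr: float = MIN_LR,
              total_steps: int = TOTAL_STEPS) -> float:
    """Cosine decay from peak_lr to min_lr; assumes no warmup phase."""
    progress = min(max(step, 0), total_steps) / total_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


def tokens_per_step(total_tokens: float = 4e9, steps: int = TOTAL_STEPS) -> float:
    """Average tokens consumed per optimizer step, using the approximate 4B total."""
    return total_tokens / steps


for step in (0, 15_000, 30_000):
    print(f"step {step:>6}: lr = {cosine_lr(step):.2e}")
print(f"~{tokens_per_step():,.0f} tokens per step")
```

With roughly 4B tokens over 30,000 steps, each step sees on the order of 133k tokens, or about 130 sequences at the 1024-token context length.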
## Installation
```bash
pip install transformers tiktoken
```
> **Note:** `tiktoken` is required because the tokenizer wraps OpenAI's cl100k_base encoding,
> so that token IDs exactly match the vocabulary the model was trained on.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True is needed because the architecture and tokenizer are custom code in the repo
tokenizer = AutoTokenizer.from_pretrained("SlitherCode/tiny-edu-166m", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("SlitherCode/tiny-edu-166m", trust_remote_code=True)

# Encode a prompt and sample up to 200 new tokens
inputs = tokenizer("The history of mathematics", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8, top_k=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
## License
Model weights: MIT.
Training data: This work uses the FineWeb-Edu dataset, available under the Open Data Commons Attribution License (ODC-By 1.0).