ParchmentLM 166M Instruct
A 166M-parameter instruction-tuned language model trained entirely from scratch (custom architecture, real pretraining data, and a full SFT pipeline) for roughly $55 in cloud compute.
This is a proof-of-concept demonstrating the full LLM development pipeline: architecture design, pretraining on real web data, supervised fine-tuning, and deployment. It is not intended for production use.
Model Details
- Developed by: Pranay Narula (SlitherCode)
- Model type: ParchmentLM — a custom decoder-only transformer architecture
- Language: English
- License: MIT
- Base model: SlitherCode/tiny-edu-166m (pretrained from scratch)
Architecture
ParchmentLM is a custom LLaMA-style architecture with the following components:
| Component | Details |
|---|---|
| Parameters | ~166M |
| Layers | 12 |
| Attention heads | 12 |
| Hidden size | 768 |
| FFN size | 3072 |
| Context length | 1024 tokens |
| Positional encoding | RoPE |
| Normalization | RMSNorm (pre-norm) |
| Activation | SwiGLU |
| Attention | FlashAttention (via scaled_dot_product_attention) |
| Tokenizer | tiktoken cl100k_base (vocab size 100,277) |
| Weight tying | Yes (input embeddings = output projection) |
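The table can be restated as a small configuration object. This is only a sketch: the class and field names (`ParchmentConfig`, `n_layers`, and so on) are hypothetical and are not taken from the model's actual code. Only the values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParchmentConfig:
    """Hypothetical container: field names are illustrative, values are from the table."""

    n_layers: int = 12
    n_heads: int = 12
    hidden_size: int = 768
    ffn_size: int = 3072
    context_length: int = 1024
    vocab_size: int = 100_277
    tie_embeddings: bool = True

    @property
    def head_dim(self) -> int:
        # Each attention head gets an equal slice of the hidden dimension.
        assert self.hidden_size % self.n_heads == 0
        return self.hidden_size // self.n_heads


config = ParchmentConfig()
print(config.head_dim)  # 64
```

With 12 heads over a hidden size of 768, each head works on 64 dimensions, which is the width RoPE is applied to per head.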
Chat Template (ParchmentLM format)
system
You are a helpful assistant<|endoftext|>
user
{user message}<|endoftext|>
assistant
{assistant response}<|endoftext|>
<|endoftext|> (token ID 100257) serves as both the turn separator and stop token.
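To make the layout above concrete, here is a minimal sketch that renders a message list into that format by hand. The helper name `format_chat` is hypothetical, and the exact whitespace (a newline after each role name and after each `<|endoftext|>`) is an assumption read off the layout shown, not something the card confirms.

```python
EOT = "<|endoftext|>"


def format_chat(messages, add_generation_prompt=True):
    """Render messages in the ParchmentLM layout (hypothetical helper).

    Assumption: each turn is the role name, a newline, the content,
    then the EOT token and a trailing newline.
    """
    parts = [f"{m['role']}\n{m['content']}{EOT}\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("assistant\n")
    return "".join(parts)


prompt = format_chat(
    [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hi"},
    ]
)
print(prompt)
```

In practice the tokenizer's own `apply_chat_template` should be preferred, since it carries the template the model was actually trained with.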
Training
Stage 1 — Pretraining
- Dataset: FineWeb-Edu 10BT sample (HuggingFaceFW/fineweb-edu)
- Tokens trained on: ~4B
- Infrastructure: Modal, single A100-40GB
- Throughput: ~75,000 tokens/sec
- Duration: ~14.8 hours
- Cost: ~$46
- Optimizer: AdamW (β1=0.9, β2=0.95, weight decay=0.1)
- Learning rate: 3e-4 with cosine decay to 3e-5, 2000 step warmup
- Batch size: 16 × 8 grad accum × 1024 seq len ≈ 131k tokens/step
- Precision: bfloat16
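The learning-rate settings above can be sketched as a schedule function. Several details here are assumptions rather than facts from the card: warmup is taken to be linear from near zero, the total step count is estimated as ~4B tokens divided by the tokens per step, and the cosine decay is assumed to end exactly at that final step. The function name `lr_at` is hypothetical.

```python
import math

MAX_LR = 3e-4
MIN_LR = 3e-5
WARMUP_STEPS = 2000
TOKENS_PER_STEP = 16 * 8 * 1024  # batch size x grad accum x sequence length
# Estimated, not stated in the card: ~4B tokens / tokens per step.
TOTAL_STEPS = round(4e9 / TOKENS_PER_STEP)


def lr_at(step: int) -> float:
    """Linear warmup followed by cosine decay from MAX_LR to MIN_LR."""
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    if step >= TOTAL_STEPS:
        return MIN_LR
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + cosine * (MAX_LR - MIN_LR)


print(TOKENS_PER_STEP)  # 131072
```

Under these assumptions the run is roughly 30.5k optimizer steps, so the 2000-step warmup covers about the first 7% of training.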
Stage 2 — Supervised Fine-Tuning
- Datasets:
  - Cleanlab/databricks-dolly-15k-cleaned, filtered to the closed_qa, open_qa, and information_extraction categories (~7k examples)
  - ProCreations/SimpleMath, 2,500 examples per operation (+, -, *, /), balanced, 10k total
- Total SFT examples: ~17k
- Loss: Completion-only (prompt and padding tokens masked to -100)
- Pad token: <|endofprompt|> (token ID 83285), used to preserve EOT as a learnable stop signal
- Epochs: 8
- Learning rate: 1e-4 cosine decay
- Batch size: 16 × 2 grad accum
- Duration: ~38 minutes
- Cost: ~$1.50
- Infrastructure: Modal, single A100-40GB
- Precision: bfloat16
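Completion-only loss with a dedicated pad token can be illustrated with plain lists. This is a minimal sketch, not the author's training code: `build_example` is a hypothetical helper, the prompt and completion IDs are toy values, and the pad and EOT IDs are the ones stated in this card.

```python
IGNORE_INDEX = -100  # PyTorch's cross-entropy loss skips this label value by default


def build_example(prompt_ids, completion_ids, max_len, pad_id, eot_id):
    """Return (input_ids, labels) with prompt and padding positions masked."""
    input_ids = (prompt_ids + completion_ids + [eot_id])[:max_len]
    # Supervise only the completion tokens and the closing EOT.
    labels = ([IGNORE_INDEX] * len(prompt_ids) + completion_ids + [eot_id])[:max_len]
    n_pad = max_len - len(input_ids)
    input_ids = input_ids + [pad_id] * n_pad
    labels = labels + [IGNORE_INDEX] * n_pad
    return input_ids, labels


ids, labels = build_example(
    prompt_ids=[11, 12, 13],   # toy prompt tokens
    completion_ids=[21, 22],   # toy completion tokens
    max_len=8,
    pad_id=83285,              # <|endofprompt|>, as stated in the card
    eot_id=100257,             # <|endoftext|>, as stated in the card
)
print(ids)     # [11, 12, 13, 21, 22, 100257, 83285, 83285]
print(labels)  # [-100, -100, -100, 21, 22, 100257, -100, -100]
```

Because padding uses a different ID than EOT, the EOT label stays unmasked even in pipelines that mask by comparing against the pad ID, so the model still learns when to stop.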
Total training cost: ~$55, including many SFT iterations
Usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True lets Transformers load the custom ParchmentLM code from the repo.
tokenizer = AutoTokenizer.from_pretrained("SlitherCode/tiny-edu-166m", trust_remote_code=True)
tokenizer.pad_token = "<|endofprompt|>"
model = AutoModelForCausalLM.from_pretrained("SlitherCode/tiny-edu-166M-instruct", trust_remote_code=True)
model.eval()

# Pad with <|endofprompt|> so that <|endoftext|> stays reserved as the stop token.
PAD_ID = tokenizer.convert_tokens_to_ids("<|endofprompt|>")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# Render the chat template and leave an open assistant turn for the model to complete.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
input_len = inputs["input_ids"].shape[1]

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=False,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=PAD_ID,
    )

# Decode only the newly generated tokens and cut at the first stop token.
raw = tokenizer.decode(outputs[0][input_len:], skip_special_tokens=False)
response = raw.split("<|endoftext|>")[0].strip()
print(response)
# The capital of France is Paris.
Note: For arithmetic, use the format "47 + 83 =" rather than "What is 47 + 83?" to match the training distribution.
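If questions arrive in natural language, a small preprocessing step can rewrite them into the bare format. The sketch below is an illustration only: `to_arithmetic_prompt` is a hypothetical helper, and it handles just a single binary operation on non-negative integers.

```python
import re

# Two integers separated by one of the four operators used in SFT.
_EXPR = re.compile(r"(\d+)\s*([+\-*/])\s*(\d+)")


def to_arithmetic_prompt(question: str) -> str:
    """Rewrite 'What is 47 + 83?' as '47 + 83 ='; leave other text unchanged."""
    match = _EXPR.search(question)
    if match is None:
        return question
    left, op, right = match.groups()
    return f"{left} {op} {right} ="


print(to_arithmetic_prompt("What is 47 + 83?"))  # 47 + 83 =
```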
Evaluation
Informal evaluation on held-out questions:
| Question | Response | Correct? |
|---|---|---|
| What is the capital of France? | The capital of France is Paris. | ✓ |
| What is the capital of Germany? | The capital of Germany is Berlin. | ✓ |
| Who wrote Romeo and Juliet? | Romeo and Juliet was written by William Shakespeare. | ✓ |
| 12 + 5 = | 17 | ✓ |
| 900 - 345 = | 700 | ✗ (correct: 555) |
| 2790 + 6698 = | 9648 | ✗ (correct: 9488) |
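The arithmetic rows can be checked mechanically. The following is a minimal sketch of such a checker, assuming prompts in the "a op b =" format and purely numeric responses. The `check` helper is hypothetical and is not part of the author's evaluation.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def check(prompt: str, response: str):
    """Return (is_correct, expected) for a prompt such as '12 + 5 ='."""
    left, op, right, _ = prompt.split()
    expected = OPS[op](int(left), int(right))
    return float(response) == expected, expected


rows = [("12 + 5 =", "17"), ("900 - 345 =", "700"), ("2790 + 6698 =", "9648")]
for prompt, response in rows:
    correct, expected = check(prompt, response)
    print(prompt, response, "ok" if correct else f"wrong (expected {expected})")
```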
Limitations:
- Reliable arithmetic only up to ~2-3 digit operands
- Tends to hallucinate on out-of-distribution factual questions
- No safety filtering or alignment
- Will not stop gracefully on prompts with no clear answer (creative writing, open-ended tasks)
- Undertrained relative to model capacity: ~4B tokens versus the ~300B tokens that models of this size typically see
Compute & Environmental Impact
- Hardware: NVIDIA A100-40GB (via Modal)
- Cloud provider: Modal (AWS us-east-1 region)
- Total GPU hours: ~15.5 hours
- Total cost: ~$55 USD
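These totals can be cross-checked against the per-stage figures reported in the Training section. The short calculation below uses only numbers from this card; the variable names are illustrative.

```python
pretrain_hours = 14.8
sft_hours = 38 / 60                       # ~38 minutes for one SFT run
total_hours = pretrain_hours + sft_hours  # ~15.4, reported as ~15.5

# Throughput x duration should land near the ~4B pretraining tokens.
tokens = 75_000 * pretrain_hours * 3600

# One pretraining run plus one SFT run, as itemized in the card.
itemized_cost = 46 + 1.50

print(round(total_hours, 2), f"{tokens:.3g}", itemized_cost)
```

The itemized stages sum to about $47.50. The card attributes the ~$55 total to the many SFT iterations that were run, not to a single fine-tuning pass.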
Citation
If you use this model or find this project useful, a link back to the repository is appreciated.
@misc{narula2025parchmentlm,
  author = {Pranay Narula},
  title = {ParchmentLM 166M Instruct: Full LLM Pipeline From Scratch},
  year = {2025},
  url = {https://huggingface.co/SlitherCode/tiny-edu-166M-instruct}
}