# {Model Name}
{Model Name} is a {X}B parameter language model trained on {dataset} as part of [{project/suite name}]({paper link}). {What makes this release distinctive -- e.g., "All intermediate training checkpoints are publicly available to support research on training dynamics, memorization, and emergent capabilities."}
{Paragraph on research motivation: what question does this model help answer? What gap does it fill in the ecosystem? See [our paper]({paper link}) for full details.}
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/{model-name}",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/{model-name}")

# Perplexity on a passage
inputs = tokenizer("your text here", return_tensors="pt").to(model.device)
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss
perplexity = torch.exp(loss)
print(f"Perplexity: {perplexity.item():.2f}")

# Generation
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Loading in fp16 requires approximately {X} GB of GPU memory. The full fp32 weights are {X} GB on disk.
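If GPU memory is tight, the weights can also be loaded with 8-bit quantization via bitsandbytes. This is optional and not specific to this model; a minimal sketch, assuming `bitsandbytes` and a recent `transformers` release are installed:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize weights to 8-bit at load time to reduce GPU memory relative to fp16.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/{model-name}",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```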
## Accessing Intermediate Checkpoints
One of the key features of this release is the availability of {N} intermediate training checkpoints, from initialization (step 0) through the final training step ({step N}). These are stored as branches in this repository.
```python
from transformers import AutoModelForCausalLM

# Load the model at training step 1000
model = AutoModelForCausalLM.from_pretrained("EleutherAI/{model-name}", revision="step1000")
```
Checkpoints were saved every {N} steps. The main branch contains the final checkpoint. {Note any exceptions or irregularities in the checkpoint schedule.}
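The available checkpoint branches can also be listed programmatically with `huggingface_hub` (a short sketch; checkpoint branches are assumed to follow the `step1000`-style naming shown above):

```python
from huggingface_hub import list_repo_refs

# Enumerate the repository's branches; checkpoint branches are named like "step1000".
refs = list_repo_refs("EleutherAI/{model-name}")
steps = sorted(int(b.name.removeprefix("step")) for b in refs.branches if b.name.startswith("step"))
print(f"{len(steps)} checkpoints, from step {steps[0]} to step {steps[-1]}")
```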
This makes {Model Name} suitable for research on:
- How model capabilities develop over the course of training
- Memorization and forgetting dynamics
- The effect of specific training data on model behavior
- Checkpoint-level analysis of emergent properties
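For example, a simple training-dynamics probe can load a few revisions in turn and track perplexity on a fixed passage (a sketch; the step numbers below are hypothetical and should be replaced with the repository's actual checkpoint steps):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "EleutherAI/{model-name}"
STEPS = [0, 1000, 10000, 100000]  # hypothetical; use the repo's actual checkpoint steps

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(REPO)
inputs = tokenizer("your text here", return_tensors="pt").to(device)

for step in STEPS:
    model = AutoModelForCausalLM.from_pretrained(REPO, revision=f"step{step}").to(device)
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"step {step}: perplexity = {torch.exp(loss).item():.2f}")
    del model  # release memory before loading the next checkpoint
```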
## Architecture
{Model Name} uses a {transformer variant} architecture with {N} layers, a hidden dimension of {N}, and {N} attention heads, for a total of {X}B parameters. {Any notable choices: positional encoding scheme, activation function, tied embeddings, etc. and why.}
The full architectural specification:
| Hyperparameter | Value |
|---|---|
| Parameters | {X}B |
| Layers | {N} |
| Hidden Dimension | {N} |
| Attention Heads | {N} |
| Context Length | {N} tokens |
| Vocabulary Size | {N} |
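As a sanity check on the table, the parameter count of a standard decoder-only transformer can be approximated from the layer count, hidden dimension, and vocabulary size (a rough back-of-the-envelope sketch; the exact total depends on tied embeddings, biases, and the positional encoding scheme):

```python
def approx_params(layers: int, hidden: int, vocab: int) -> int:
    """Rough parameter count for a GPT-style decoder-only transformer."""
    attention = 4 * hidden * hidden   # Q, K, V, and output projections per layer
    mlp = 8 * hidden * hidden         # two MLP projections with a 4x expansion factor
    embeddings = vocab * hidden       # token embeddings (doubled if untied from the LM head)
    return layers * (attention + mlp) + embeddings

# Hypothetical values: 24 layers, hidden dimension 2048, 50k vocabulary -> roughly 1.3B parameters
print(f"~{approx_params(24, 2048, 50_000) / 1e9:.2f}B parameters")
```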
## Training
### Data
{Model Name} was trained on {dataset name}, a {size in tokens}-token dataset consisting of {description}. {How the dataset was constructed, any filtering or deduplication, known characteristics.}
{Known biases or issues in the training data and their expected impact on model behavior.}
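If {dataset name} is hosted on the Hugging Face Hub, its contents can be spot-checked without a full download by streaming a few examples (a sketch with a hypothetical dataset identifier and field name):

```python
from datasets import load_dataset

# Hypothetical Hub identifier and column name; replace with the actual ones for {dataset name}.
dataset = load_dataset("{dataset-repo-id}", split="train", streaming=True)
for example in dataset.take(3):
    print(example["text"][:200])
```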
### Procedure
Training used GPT-NeoX with DeeperSpeed on {N}x {GPU type} GPUs. The model was trained for {N} steps ({N} tokens) with a batch size of {N} tokens, using {optimizer} with a peak learning rate of {lr} and {schedule} schedule.
The complete training configuration is available at {config file}. {Any notable training decisions: why this LR, why this batch size, any restarts or interventions during training.}
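For orientation, a warmup-then-cosine-decay schedule, a common choice for models trained with GPT-NeoX, can be written out directly. This is illustrative only; the authoritative definition of {schedule} is the one in the training config:

```python
import math

def lr_at_step(step, peak_lr, warmup_steps, total_steps, min_lr=0.0):
    """Illustrative warmup + cosine decay; the model's actual schedule is set in the config."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Hypothetical values: peak LR 2e-4, 1% warmup, 143k total steps
print(lr_at_step(10_000, peak_lr=2e-4, warmup_steps=1_430, total_steps=143_000))
```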
## Evaluation
We evaluate {Model Name} using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
| Benchmark | Score | What it measures |
|---|---|---|
| {name} | {score} | {description} |
{Commentary on results: how does this compare to models of similar size? Any surprising results? Caveats about particular benchmarks?}
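The scores above can be reproduced with the harness. Depending on the installed version, the Python entry point looks roughly like the following (a sketch, with a placeholder task list):

```python
import lm_eval

# Sketch of an evaluation run; exact arguments depend on the lm-evaluation-harness version.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/{model-name},dtype=float16",
    tasks=["lambada_openai"],  # replace with the benchmarks listed in the table above
    batch_size=8,
)
print(results["results"])
```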
## Limitations and Intended Use
{Model Name} is a raw language model released for research purposes. It has not been fine-tuned for instruction following, safety, or any particular downstream task.
This model will produce biased, offensive, and factually incorrect text. It reflects the biases present in its training data. Do not rely on it for factual accuracy or use it in any setting where its outputs could cause harm.
Intended research applications include {list of 2-3 specific research use cases this model is well-suited for}.
## Reproducing This Model
{Model Name} is fully reproducible. {Description of what "reproducible" means here: same data order, same config, same results up to hardware nondeterminism.}
- Clone GPT-NeoX at {version}
- {Data setup -- link to data if preprocessed, or preprocessing instructions}
- {Config and launch instructions}
{Any known reproduction issues or tips.}
## Citation
If you use this model in your research, please cite:
```bibtex
@article{...}
```
## About EleutherAI
EleutherAI is a grassroots collective focused on open-source AI research. Find us on Discord or GitHub.
Related resources:
- {Links to other models in this suite, the training dataset, and the training codebase}