fable-traces

A compact instruction-tuned language model built on Qwen/Qwen3-4B-Instruct-2507. fable-traces is tuned for short, conversational replies. At roughly 4B parameters in bfloat16, the weights take about 8 GB, so the model runs on a single mid-range GPU.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

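repo = "AliesTaha/fable-traces"
tok = AutoTokenizer.from_pretrained(repo)
# Load the weights in bfloat16 and let Accelerate place them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(repo, dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Tell me something interesting."}]
# return_dict=True yields input_ids and attention_mask together, so generate() receives both.
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=100, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))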

Serve with vLLM:

vllm serve AliesTaha/fable-traces
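
Once the server is up, it can be queried through vLLM's OpenAI-compatible HTTP API. The following is a minimal sketch using only the Python standard library. It assumes the server is listening on the default address http://localhost:8000; the helper names build_chat_request and ask are illustrative and not part of vLLM.

```python
import json
import urllib.request

# Assumption: `vllm serve` is running locally on its default host and port.
BASE_URL = "http://localhost:8000/v1"
MODEL = "AliesTaha/fable-traces"


def build_chat_request(prompt, max_tokens=100, temperature=0.0):
    """Build the JSON body for an OpenAI-style chat completion request."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def ask(prompt, base_url=BASE_URL):
    """POST the request to the running server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # The reply text sits in the first choice of the response.
    return data["choices"][0]["message"]["content"]
```

With the server running, ask("Tell me something interesting.") returns the model's reply as a string. If the server was started on a different host or port, pass the matching base_url.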

Details

Base model: Qwen3-4B-Instruct-2507
Parameters: ~4B
Precision: bfloat16 (safetensors)
Prompt format: ChatML (use the tokenizer's chat template)
Context length: inherited from the base model
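
To make the ChatML prompt format concrete, here is a rough sketch of how a conversation is laid out with the <|im_start|> and <|im_end|> markers. The function render_chatml is a hypothetical helper written for illustration only. The chat template shipped with the tokenizer is authoritative and may differ in details such as system prompts or tool sections, so real code should call tok.apply_chat_template instead.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Tell me something interesting."}])
print(prompt)
```

Comparing this output with tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) is a quick way to see what the real template adds.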

License

Apache 2.0, following the base model.
