# MiniBot-0.9M-Instruct
Instruction-tuned GPT-2 style language model (~900K parameters) optimized for Portuguese conversational tasks.
## Overview
MiniBot-0.9M-Instruct is the instruction-tuned version of MiniBot-0.9M-Base, designed to follow prompts more accurately, respond to user inputs, and generate more coherent conversational outputs in Portuguese.
Built on a GPT-2 architecture (~0.9M parameters), this model was fine-tuned on conversational and instruction-style data to improve usability in real-world interactions.
## Key Characteristics

| Attribute | Detail |
|---|---|
| Language | Portuguese (primary) |
| Architecture | GPT-2 style (decoder-only Transformer) |
| Embeddings | GPT-2 compatible |
| Parameters | ~900K |
| Base Model | MiniBot-0.9M-Base |
| Fine-tuning | Instruction tuning (supervised) |
| Alignment | Basic prompt-following behavior |
## What Changed from the Base Model?

Instruction tuning introduced significant behavioral improvements with no architectural changes:

| Feature | Base | Instruct |
|---|---|---|
| Prompt understanding | ❌ | ✅ |
| Conversational flow | ⚠️ Partial | ✅ |
| Instruction following | ❌ | ✅ |
| Overall coherence | Low | Improved |
| Practical usability | Experimental | Functional |

> The model is now significantly more usable in chat scenarios.
## Architecture

The core architecture remains identical to the base model:

- Decoder-only Transformer (GPT-2 style)
- Token embeddings + positional embeddings
- Self-attention + MLP blocks
- Autoregressive generation

No structural changes were made; the improvement is purely behavioral, achieved through fine-tuning.
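To make the "~900K parameters" figure concrete, the sketch below estimates the parameter count of a GPT-2 style decoder from its dimensions. The configuration used (vocabulary size, context length, hidden size, layer count) is an assumption chosen only to land in the right ballpark; the card does not publish the real values.

```python
def gpt2_param_count(vocab_size: int, n_positions: int, d_model: int, n_layers: int) -> int:
    """Rough parameter count of a GPT-2 style decoder (biases and LayerNorms ignored)."""
    # Token embeddings + learned positional embeddings.
    embeddings = vocab_size * d_model + n_positions * d_model
    # Per block: attention (Q, K, V and output projections) ...
    attention = 4 * d_model * d_model
    # ... plus an MLP with the usual 4x expansion (up- and down-projection).
    mlp = 2 * d_model * (4 * d_model)
    return embeddings + n_layers * (attention + mlp)

# Hypothetical tiny configuration in the ~0.9M range:
total = gpt2_param_count(vocab_size=4096, n_positions=256, d_model=96, n_layers=5)
print(f"~{total:,} parameters")
```

At this scale the embedding tables dominate, which is typical for sub-million-parameter models.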
## Fine-Tuning Dataset

The model was fine-tuned on a Portuguese instruction-style conversational dataset composed of:

- Questions and answers
- Simple instructions
- Assistant-style chat
- Basic roleplay
- Natural conversations

Expected format (the example asks "Explain gravity to me"; the reply begins "Gravity is the force that attracts objects with mass..."):

```
User: Me explique o que é gravidade
Bot: A gravidade é a força que atrai objetos com massa...
```
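A small helper like the following (illustrative, not part of the released tooling) builds prompts in this `User:`/`Bot:` format:

```python
def build_prompt(user_message: str) -> str:
    """Format a user message in the User:/Bot: scheme the model was tuned on."""
    return f"User: {user_message}\nBot:"

# The model is expected to continue the text after "Bot:".
prompt = build_prompt("Me explique o que é gravidade")
```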
Training strategy:
- Supervised Fine-Tuning (SFT)
- Pattern learning for instruction-following
- No RLHF or preference optimization
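A common way to prepare data for this kind of SFT is to render each (instruction, response) pair as one training string in the expected format, with an end-of-sequence token marking where a reply should stop. This is a minimal sketch of that conversion; the function name, field layout, and EOS token are assumptions, not the project's actual pipeline.

```python
def format_sft_example(instruction: str, response: str,
                       eos_token: str = "<|endoftext|>") -> str:
    """Render one (instruction, response) pair as a single training string.

    The User:/Bot: template matches the model's expected prompt format;
    the EOS token teaches the model where a reply ends.
    """
    return f"User: {instruction}\nBot: {response}{eos_token}"

pairs = [
    ("O que é gravidade?", "A força que atrai objetos com massa."),
]
corpus = [format_sft_example(q, a) for q, a in pairs]
```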
## Capabilities

### Strengths
- Following simple instructions
- Answering basic questions
- Conversing more naturally
- Higher coherence in short responses
- More consistent dialogue structure
### Limitations
- Reasoning is still limited
- May generate incorrect facts
- Does not retain long context
- Sensitive to poorly structured prompts
> ⚠️ Even with instruction tuning, this remains an extremely small model. Adjust expectations accordingly.
## Getting Started

### Installation

```bash
pip install transformers torch
```

### Usage with Hugging Face Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AxionLab-official/MiniBot-0.9M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prompts follow the "User: ...\nBot:" format used during fine-tuning.
prompt = "User: Me diga uma curiosidade sobre o espaço\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
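Because the model simply continues the text, the decoded output still contains the prompt and may run on into a fabricated next `User:` turn. A small post-processing step (a sketch; `extract_reply` is a hypothetical helper, not part of the model's tooling) can isolate just the bot's reply:

```python
def extract_reply(generated: str, prompt: str) -> str:
    """Strip the echoed prompt and cut the reply off at the next simulated turn."""
    continuation = generated[len(prompt):]
    # Stop if the model starts generating the next user turn itself.
    reply = continuation.split("User:")[0]
    return reply.strip()

sample = "User: Oi\nBot: Olá! Como posso ajudar?\nUser: ..."
print(extract_reply(sample, "User: Oi\nBot:"))
```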
## Recommended Settings

| Parameter | Recommended Value | Description |
|---|---|---|
| `temperature` | 0.6 – 0.8 | Controls randomness |
| `top_p` | 0.85 – 0.95 | Nucleus sampling |
| `do_sample` | `True` | Enables sampling |
| `max_new_tokens` | 40 – 100 | Response length |
> Instruct models tend to perform better at lower temperatures. Try values around `0.65` for more accurate and focused responses.
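The recommendations above can be collected into a reusable keyword dictionary and passed to `model.generate(**inputs, **GEN_KWARGS)`; the specific values below are just one choice from the recommended ranges.

```python
# Sampling settings drawn from the recommended ranges in the table above.
GEN_KWARGS = dict(
    do_sample=True,
    temperature=0.65,   # lower end of 0.6-0.8, for more focused replies
    top_p=0.9,
    max_new_tokens=80,
)
```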
## Intended Use Cases

| Use Case | Suitability |
|---|---|
| Lightweight Portuguese chatbots | ✅ Ideal |
| NPCs and games | ✅ Ideal |
| Fine-tuning experiments | ✅ Ideal |
| NLP education | ✅ Ideal |
| Local / CPU-only applications | ✅ Ideal |
| Critical production environments | ❌ Not recommended |
## Disclaimer
- Extremely small model (~900K parameters)
- No robust alignment (no RLHF)
- May generate incorrect or nonsensical responses
- Not suitable for critical production environments
## Future Work

- Reasoning-tuned version (`MiniBot-Reason`)
- Scaling to 1M–10M parameters
- Larger and more diverse dataset
- Improved response alignment
- Tool-use experiments
## License

Distributed under the MIT License. See `LICENSE` for details.

## Author

Developed by AxionLab.