# SAGI - Swarm AGI Language Model
SAGI is a novel causal language model that integrates swarm-intelligence dynamics with a transformer architecture. The model treats cognition as a dynamic, adaptive system in which multiple internal "agents" collaborate through differentiable routing, trust mechanisms, and shared memory.
## Model Description
| Property | Value |
|---|---|
| Parameters | 52.72M |
| Architecture | Transformer Decoder + Swarm Dynamics |
| Hidden Size | 512 |
| Layers | 6 |
| Attention Heads | 8 |
| Context Length | 2048 |
| Vocabulary | GPT-2 tokenizer (50,257 tokens) |
## Key Innovations
- Differentiable Routing: Continuous mixture-of-experts via attention (`DiffRouter`) instead of hard module selection; a minimal sketch follows this list
- Adaptive Gating & Trust: `MetaController` activates capacity under resource constraints; trust dynamics bias reliable components
- Episodic + Semantic Memory: Dual memory system with trainable retrieval utility
- Curiosity Engine: Injects novel goals when surprise is low, promoting exploration
- Self-Model & Rollback: Predicts state transitions and detects anomalies for self-correction
- Resource Dynamics: Soft conservation with learned converter; cognition consumes/recovers compute, memory, energy
- Value Monitoring: Tracks alignment to core values and freezes plasticity under drift
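The routing idea can be illustrated with a minimal sketch (hypothetical code, not the repository's actual `DiffRouter`): a query derived from the current state attends over per-agent outputs, and the result is a softmax-weighted mixture rather than a hard module selection, so gradients flow through the routing decision.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftRouter(nn.Module):
    """Illustrative attention-style soft router (not the actual DiffRouter)."""

    def __init__(self, dim_s: int = 64, n_agents: int = 20, topk: int = 5):
        super().__init__()
        self.query = nn.Linear(dim_s, dim_s)
        self.key = nn.Linear(dim_s, dim_s)
        self.topk = topk

    def forward(self, state: torch.Tensor, agent_outputs: torch.Tensor) -> torch.Tensor:
        # state: (batch, dim_s); agent_outputs: (batch, n_agents, dim_s)
        q = self.query(state).unsqueeze(1)                     # (batch, 1, dim_s)
        k = self.key(agent_outputs)                            # (batch, n_agents, dim_s)
        scores = (q * k).sum(-1) / k.shape[-1] ** 0.5          # (batch, n_agents)
        # Sparse variant: keep only the top-k agents, mask out the rest.
        topv, topi = scores.topk(self.topk, dim=-1)
        masked = torch.full_like(scores, float("-inf")).scatter(-1, topi, topv)
        weights = F.softmax(masked, dim=-1)                    # differentiable mixture
        return (weights.unsqueeze(-1) * agent_outputs).sum(dim=1)

router = SoftRouter()
mix = router(torch.randn(2, 64), torch.randn(2, 20, 64))  # -> (2, 64)
```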
## How It Works
```
┌──────────────────────────────────────────────────────────┐
│                        SAGI Model                         │
├──────────────────────────────────────────────────────────┤
│  ┌─────────────────┐       ┌─────────────────────────┐   │
│  │  Swarm-7 V2.2   │──────▶│   Swarm State S, T      │   │
│  │  (Cognitive     │       │   (Working Memory)      │   │
│  │   Dynamics)     │       └───────────┬─────────────┘   │
│  └────────▲────────┘                   │                 │
│           │                            ▼                 │
│           │               ┌─────────────────────────┐    │
│           │               │  Transformer Decoder    │    │
│           │               │  - Swarm-conditioned    │    │
│           │               │    attention & FFN      │    │
│           │               │  - RoPE embeddings      │    │
│           │               └───────────┬─────────────┘    │
│           │                           │                  │
│  ┌────────┴────────┐      ┌─────────────────────────┐    │
│  │  Observation    │◀─────│        LM Head          │    │
│  │  (from tokens)  │      └─────────────────────────┘    │
│  └─────────────────┘                                     │
└──────────────────────────────────────────────────────────┘
```
The swarm processes observations derived from token embeddings, updating its internal state S. This state conditions the transformer's attention patterns and feed-forward activations via learned projections, creating bidirectional information flow between symbolic (tokens) and subsymbolic (swarm dynamics) processing.
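As a rough illustration of that conditioning path, the sketch below biases a decoder block's attention input and gates its feed-forward output with projections of the swarm state. It is a simplified stand-in (standard attention, no RoPE), and the module and projection names are assumptions, not the model's actual layers.

```python
import torch
import torch.nn as nn

class SwarmConditionedBlock(nn.Module):
    """Sketch of a decoder block modulated by a swarm state S (illustrative only)."""

    def __init__(self, hidden: int = 512, n_heads: int = 8, dim_s: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden)
        )
        self.norm1, self.norm2 = nn.LayerNorm(hidden), nn.LayerNorm(hidden)
        # Learned projections from the swarm state into the token stream.
        self.s_to_attn = nn.Linear(dim_s, hidden)
        self.s_to_gate = nn.Linear(dim_s, hidden)

    def forward(self, x: torch.Tensor, swarm_state: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden); swarm_state: (batch, dim_s)
        bias = self.s_to_attn(swarm_state).unsqueeze(1)        # condition attention input
        h = self.norm1(x + bias)
        causal = torch.triu(
            torch.ones(x.size(1), x.size(1), dtype=torch.bool, device=x.device), 1
        )
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + attn_out
        gate = torch.sigmoid(self.s_to_gate(swarm_state)).unsqueeze(1)  # condition FFN
        return x + gate * self.ffn(self.norm2(x))

block = SwarmConditionedBlock()
y = block(torch.randn(2, 16, 512), torch.randn(2, 64))  # -> (2, 16, 512)
```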
## Usage
### Installation
```bash
pip install torch transformers datasets
```
### Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/SAGI")
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/SAGI")

# Generate text
model.eval()
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Model Architecture Details
### Swarm Configuration
| Parameter | Value | Description |
|---|---|---|
| `max_agents` | 20 | Number of internal cognitive agents |
| `dim_s` | 64 | State dimension |
| `dim_t` | 32 | Task/goal dimension |
| `dim_obs` | 48 | Observation dimension |
| `topk_route` | 5 | Sparse routing top-k |
| `K_thought_max` | 5 | Maximum thinking iterations per step |
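Read together, these hyperparameters map onto a small configuration object along the lines of the sketch below (the dataclass and its field names mirror the table but are illustrative, not the repository's config class).

```python
from dataclasses import dataclass

@dataclass
class SwarmConfig:
    """Illustrative container for the swarm hyperparameters listed above."""
    max_agents: int = 20     # number of internal cognitive agents
    dim_s: int = 64          # state dimension
    dim_t: int = 32          # task/goal dimension
    dim_obs: int = 48        # observation dimension
    topk_route: int = 5      # sparse routing top-k
    K_thought_max: int = 5   # maximum thinking iterations per step

cfg = SwarmConfig()
print(cfg.max_agents, cfg.dim_s)
```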
### Resource Budgets
| Resource | Budget | Description |
|---|---|---|
| Compute | 60.0 | Compute budget per step |
| Memory | 20.0 | Memory capacity |
| Energy | 25.0 | Energy budget |
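One way to picture the soft conservation described above, with the learned converter replaced by a fixed recovery rate purely for illustration:

```python
# Budgets taken from the table above; the recovery rate is an assumption,
# standing in for the model's learned converter.
BUDGETS = {"compute": 60.0, "memory": 20.0, "energy": 25.0}

def step_resources(levels: dict, cost: dict, recovery: float = 0.05) -> dict:
    """Consume resources for one cognitive step, then softly recover toward budget."""
    out = {}
    for name, budget in BUDGETS.items():
        level = max(0.0, levels[name] - cost.get(name, 0.0))   # consumption
        out[name] = level + recovery * (budget - level)        # soft recovery
    return out

levels = dict(BUDGETS)
levels = step_resources(levels, cost={"compute": 3.0, "memory": 0.5, "energy": 1.0})
```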
### Trust & Plasticity
- Trust Learning Rate: 0.07
- Fast EMA (Plasticity): 0.10
- Slow EMA (Consolidation): 0.002
- Core Values: `["truth", "safety", "efficiency"]`
## Limitations
- Early Research Model: This is an experimental architecture exploring swarm-transformer integration
- Training Data: Currently trained on TinyStories subset; may produce simple, story-like outputs
- Compute Requirements: Swarm dynamics add overhead compared to standard transformers
- Generation Quality: Model is undertrained; outputs may be repetitive or incoherent
## Intended Use
This model is intended for:
- Research into multi-agent cognitive architectures
- Exploration of dynamic, adaptive language models
- Educational purposes in understanding swarm intelligence + LLMs
Not intended for:
- Production applications
- Safety-critical systems
- Generation of factual content
## Training Details
- Dataset: TinyStories (subset)
- Optimizer: AdamW (lr=3e-4, betas=(0.9, 0.999), weight_decay=0.01)
- Scheduler: Cosine annealing
- Precision: FP32
- Hardware: CPU training (compatible with CUDA)
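A minimal setup consistent with these details is sketched below; the step count is a placeholder, and it assumes the model follows the standard `labels` interface for computing a causal-LM loss.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/SAGI")
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/SAGI")

# Optimizer and schedule as listed above; total_steps is an assumed placeholder.
total_steps = 10_000
optimizer = AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.999), weight_decay=0.01)
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)

# One illustrative FP32 training step on a single tokenized example.
model.train()
batch = tokenizer("Once upon a time there was a little robot.", return_tensors="pt")
loss = model(input_ids=batch["input_ids"], labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```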
## Citation
```bibtex
@software{sagi2026,
  title={SAGI: Swarm AGI Language Model},
  author={Reaperdoesntknow},
  year={2026},
  url={https://huggingface.co/reaperdoesntknow/SAGI}
}
```
## Convergent Intelligence Portfolio
Part of the Standalone Models collection by Convergent Intelligence LLC: Research Division.
### Related Models
| Model | Downloads | Format |
|---|---|---|
| SMOLM2Prover | 56 | HF |
| SMOLM2Prover-GGUF | 150 | GGUF |
| DeepReasoning_1R | 16 | HF |
| S-AGI | 0 | HF |
### Top Models from Our Lab
Total Portfolio: 41 models | 2,781 total downloads
Last updated: 2026-03-28 12:58 UTC
### From the Convergent Intelligence Portfolio
DistilQwen Collection – our only BF16 series. Proof-weighted distillation from Qwen3-30B-A3B → 1.7B and 0.6B on H100. Three teacher variants (Instruct, Thinking, Coder), nine models, 2,788 combined downloads. The rest of the portfolio proves structure beats scale on CPU. This collection shows what happens when you give the methodology real hardware.
Top model: Qwen3-1.7B-Coder-Distilled-SFT – 508 downloads
Full methodology: Structure Over Scale (DOI: 10.57967/hf/8165)
Convergent Intelligence LLC: Research Division