SAGI - Swarm AGI Language Model

SAGI is a novel causal language model that integrates swarm intelligence dynamics with transformer architecture. The model treats cognition as a dynamic, adaptive system where multiple internal "agents" collaborate through differentiable routing, trust mechanisms, and shared memory.

Model Description

Property           Value
Parameters         52.72M
Architecture       Transformer Decoder + Swarm Dynamics
Hidden Size        512
Layers             6
Attention Heads    8
Context Length     2048
Vocabulary         GPT-2 tokenizer (50,257 tokens)

Key Innovations

  • Differentiable Routing: Continuous mixture-of-experts via attention (DiffRouter) instead of hard module selection
  • Adaptive Gating & Trust: MetaController activates capacity under resource constraints; trust dynamics bias routing toward reliable components
  • Episodic + Semantic Memory: Dual memory system with trainable retrieval utility
  • Curiosity Engine: Injects novel goals when surprise is low, promoting exploration
  • Self-Model & Rollback: Predicts state transitions and detects anomalies for self-correction
  • Resource Dynamics: Soft conservation with learned converter; cognition consumes/recovers compute, memory, energy
  • Value Monitoring: Tracks alignment to core values and freezes plasticity under drift
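The routing idea can be illustrated with a minimal sketch. Note this is an assumption-laden toy, not the actual DiffRouter implementation: the input attends over learned expert keys, and the output is a softmax-weighted (hence differentiable) mixture of expert outputs rather than a hard selection.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftRouter(nn.Module):
    """Toy attention-style soft router (illustrative only): every expert
    contributes, weighted by softmax scores, so routing stays differentiable."""
    def __init__(self, dim, n_experts):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_experts, dim) / dim**0.5)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))

    def forward(self, x):                      # x: (batch, dim)
        scores = x @ self.keys.T               # (batch, n_experts)
        weights = F.softmax(scores, dim=-1)    # convex combination weights
        outs = torch.stack([e(x) for e in self.experts], dim=1)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)

router = SoftRouter(dim=64, n_experts=20)      # dims chosen to echo the card
y = router(torch.randn(2, 64))
print(y.shape)  # torch.Size([2, 64])
```

Because every expert receives gradient signal through its softmax weight, the router can be trained end-to-end; a top-k mask could then sparsify it at inference.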

How It Works

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       SAGI Model                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚   Swarm-7 V2.2  │─────▢│  Swarm State S, T       β”‚   β”‚
β”‚  β”‚  (Cognitive     β”‚      β”‚  (Working Memory)       β”‚   β”‚
β”‚  β”‚   Dynamics)     β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β–²β”€β”€β”€β”€β”€β”€β”€β”€β”˜                  β”‚                 β”‚
β”‚           β”‚                           β–Ό                 β”‚
β”‚           β”‚              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚           β”‚              β”‚  Transformer Decoder    β”‚    β”‚
β”‚           β”‚              β”‚  - Swarm-conditioned    β”‚    β”‚
β”‚           β”‚              β”‚    attention & FFN      β”‚    β”‚
β”‚           β”‚              β”‚  - RoPE embeddings      β”‚    β”‚
β”‚           β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚           β”‚                          β”‚                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚   Observation   │◀─────│      LM Head            β”‚   β”‚
β”‚  β”‚   (from tokens) β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The swarm processes observations derived from token embeddings, updating its internal state S. This state conditions the transformer's attention patterns and feed-forward activations via learned projections, creating bidirectional information flow between symbolic (tokens) and subsymbolic (swarm dynamics) processing.
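One plausible way to picture "swarm-conditioned attention & FFN" is a pooled swarm state S injected into the decoder's hidden states through a learned projection. This is a hedged sketch of the conditioning pattern, with module names and wiring assumed rather than taken from the released code:

```python
import torch
import torch.nn as nn

class SwarmConditionedBlock(nn.Module):
    """Hypothetical sketch: a pooled swarm state S biases every token
    position via a learned projection before attention and the FFN."""
    def __init__(self, hidden=512, dim_s=64, heads=8):
        super().__init__()
        self.proj = nn.Linear(dim_s, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(),
                                 nn.Linear(4 * hidden, hidden))

    def forward(self, h, S):                  # h: (B, T, hidden), S: (B, dim_s)
        h = h + self.proj(S).unsqueeze(1)     # condition all positions on S
        attn_out, _ = self.attn(h, h, h)
        h = h + attn_out                      # residual around attention
        return h + self.ffn(h)                # residual around FFN

block = SwarmConditionedBlock()
out = block(torch.randn(1, 16, 512), torch.randn(1, 64))
print(out.shape)  # torch.Size([1, 16, 512])
```

The reverse direction (tokens feeding observations back into the swarm) closes the bidirectional loop described above.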

Usage

Installation

pip install torch transformers datasets

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/SAGI")
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/SAGI")

# Generate text
model.eval()

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Model Architecture Details

Swarm Configuration

Parameter       Value   Description
max_agents      20      Number of internal cognitive agents
dim_s           64      Swarm state dimension
dim_t           32      Task/goal dimension
dim_obs         48      Observation dimension
topk_route      5       Top-k for sparse routing
K_thought_max   5       Maximum thinking iterations per step

Resource Budgets

Resource   Budget   Description
Compute    60.0     Compute budget per step
Memory     20.0     Memory capacity
Energy     25.0     Energy budget
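For orientation, the swarm and resource values above could be collected in a single config object. The field names here are assumptions inferred from the tables, not the actual SAGI configuration class:

```python
from dataclasses import dataclass

@dataclass
class SwarmConfig:
    """Illustrative config mirroring the tables above (names assumed)."""
    # Swarm dimensions
    max_agents: int = 20
    dim_s: int = 64
    dim_t: int = 32
    dim_obs: int = 48
    topk_route: int = 5
    K_thought_max: int = 5
    # Per-step resource budgets
    budget_compute: float = 60.0
    budget_memory: float = 20.0
    budget_energy: float = 25.0

cfg = SwarmConfig()
print(cfg.max_agents, cfg.budget_compute)  # 20 60.0
```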

Trust & Plasticity

  • Trust Learning Rate: 0.07
  • Fast EMA (Plasticity): 0.10
  • Slow EMA (Consolidation): 0.002
  • Core Values: ["truth", "safety", "efficiency"]
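The trust and plasticity rates above suggest simple exponential-moving-average updates on two timescales. The update rules below are a hedged reconstruction from those numbers, not the model's actual code:

```python
def update_trust(trust, reward, lr=0.07):
    """Move an agent's trust toward an observed reliability signal
    (lr = 0.07, the trust learning rate from the card)."""
    return trust + lr * (reward - trust)

def update_weights(fast, slow, new, fast_ema=0.10, slow_ema=0.002):
    """Two-timescale EMA: the fast track adapts quickly (plasticity),
    the slow track consolidates (0.10 vs 0.002 from the card)."""
    fast = (1 - fast_ema) * fast + fast_ema * new
    slow = (1 - slow_ema) * slow + slow_ema * new
    return fast, slow

t = update_trust(0.5, 1.0)
print(round(t, 3))  # 0.535
```

Under value drift, freezing plasticity would amount to skipping the fast-track update while leaving the slow consolidated state untouched.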

Limitations

  • Early Research Model: This is an experimental architecture exploring swarm-transformer integration
  • Training Data: Currently trained on TinyStories subset; may produce simple, story-like outputs
  • Compute Requirements: Swarm dynamics add overhead compared to standard transformers
  • Generation Quality: Model is undertrained; outputs may be repetitive or incoherent

Intended Use

This model is intended for:

  • Research into multi-agent cognitive architectures
  • Exploration of dynamic, adaptive language models
  • Educational purposes in understanding swarm intelligence + LLMs

Not intended for:

  • Production applications
  • Safety-critical systems
  • Generation of factual content

Training Details

  • Dataset: TinyStories (subset)
  • Optimizer: AdamW (lr=3e-4, betas=(0.9, 0.999), weight_decay=0.01)
  • Scheduler: Cosine annealing
  • Precision: FP32
  • Hardware: CPU training (compatible with CUDA)
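The recipe above maps onto a standard PyTorch loop, sketched here with a stand-in model and a placeholder schedule length (T_max and the loop body are assumptions, not the actual training script):

```python
import torch

model = torch.nn.Linear(512, 512)  # stand-in for the SAGI model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4,
                              betas=(0.9, 0.999), weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

for step in range(3):  # training-loop skeleton with a dummy loss
    optimizer.zero_grad()
    loss = model(torch.randn(8, 512)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()   # anneal lr toward 0 over T_max steps

print(scheduler.get_last_lr()[0])
```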

Citation

@software{sagi2026,
  title={SAGI: Swarm AGI Language Model},
  author={Reaperdoesntknow},
  year={2026},
  url={https://huggingface.co/reaperdoesntknow/SAGI}
}

Convergent Intelligence Portfolio

Part of the Standalone Models by Convergent Intelligence LLC: Research Division

Related Models

Model               Downloads   Format
SMOLM2Prover        56          HF
SMOLM2Prover-GGUF   150         GGUF
DeepReasoning_1R    16          HF
S-AGI               0           HF

Top Models from Our Lab

Total Portfolio: 41 models | 2,781 total downloads

Last updated: 2026-03-28 12:58 UTC


From the Convergent Intelligence Portfolio

DistilQwen Collection β€” Our only BF16 series. Proof-weighted distillation from Qwen3-30B-A3B β†’ 1.7B and 0.6B on H100. Three teacher variants (Instruct, Thinking, Coder), nine models, 2,788 combined downloads. The rest of the portfolio proves structure beats scale on CPU. This collection shows what happens when you give the methodology real hardware.

Top model: Qwen3-1.7B-Coder-Distilled-SFT β€” 508 downloads

Full methodology: Structure Over Scale (DOI: 10.57967/hf/8165)

Convergent Intelligence LLC: Research Division
