# Tritter 500M Hybrid BitNet

## Model Details
- Model Size: 500M parameters
- Architecture: BitNet 1.58-bit ternary quantization
- Training Methodology: Hybrid Predictive Training (Embedding-Prediction Paradigm)
- Quantization: {-1, 0, 1} ternary weights
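The ternary weight representation above can be illustrated with the absmean quantizer described in the BitNet b1.58 paper. This is a generic sketch of the scheme, not this repository's actual quantization code:

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, 1} with a per-tensor scale,
    following the absmean scheme from the BitNet b1.58 paper (sketch)."""
    scale = w.abs().mean().clamp(min=eps)  # per-tensor absmean scale
    q = (w / scale).round().clamp(-1, 1)   # snap each weight to a ternary value
    return q, scale                        # dequantize as q * scale

w = torch.randn(8, 8)
q, scale = absmean_ternary_quantize(w)
```

Each weight then costs log2(3) ≈ 1.58 bits, which is where the "1.58-bit" name comes from.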
## Overview
This model is trained using Hybrid Predictive Training, which combines:
- Embedding-prediction paradigm: Core computation in continuous embedding space
- BitNet 1.58-bit quantization: Efficient ternary weight representation
- Dual prediction heads: Both embedding and token space outputs during training
The model operates in continuous embedding space at inference time, with token prediction as temporary scaffolding for training compatibility.
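As an illustrative sketch of the dual-head objective described above (the project's actual training code is not published here, and the mixing weight `alpha` is a hypothetical parameter), a combined embedding-space and token-space loss might look like:

```python
import torch
import torch.nn.functional as F

def hybrid_loss(token_logits, target_ids, pred_embeddings, target_embeddings,
                alpha: float = 0.5):
    """Sketch of a dual-head objective: cross-entropy on the token head
    plus a regression loss on the embedding head. `alpha` is a
    hypothetical mixing weight, not taken from the model card."""
    ce = F.cross_entropy(token_logits.view(-1, token_logits.size(-1)),
                         target_ids.view(-1))
    emb = F.mse_loss(pred_embeddings, target_embeddings)
    return alpha * ce + (1 - alpha) * emb
```

At inference the token head can be dropped, leaving the model operating purely in embedding space.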
## Architecture Specifications
- Hidden Size: 2048
- Number of Layers: 16
- Number of Attention Heads: 32
- Intermediate Size: 7168 (~3.5× hidden size)
- Max Position Embeddings: 4096
- Context Window: 4K tokens
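The hyperparameters above correspond to a configuration along these lines. The field names follow the common LLaMA-style convention and are an assumption, not the repository's actual config file:

```python
# Hypothetical config mirroring the listed architecture specs
config = {
    "hidden_size": 2048,
    "num_hidden_layers": 16,
    "num_attention_heads": 32,
    "intermediate_size": 7168,        # ~3.5x hidden size
    "max_position_embeddings": 4096,  # 4K-token context window
}

# Per-head dimension implied by the specs: 2048 / 32 = 64
head_dim = config["hidden_size"] // config["num_attention_heads"]
```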
## Training Data
- Total Tokens: ~100B tokens
- Data Mix: Code-centric (Python, Rust, technical documentation)
- Quality Gates: samples containing hardcoded secrets rejected; security checks enabled
## Comparison with Standard Training
For comparison with the standard trained baseline, see:
- Standard 500M: tzervas/tritter-500m-bitnet
**Key Differences:**
| Metric | Standard | Hybrid Predictive |
|---|---|---|
| Training Methodology | Standard token prediction | Embedding + token prediction |
| Convergence Speed | Baseline | Expected: 10-15% faster |
| Final Loss | Baseline | Expected: 5-10% lower |
| Embedding Quality | Standard | Expected: Improved semantic structure |
## Training Metrics

**Convergence Comparison:**
| Step | Standard Loss | Hybrid Loss | Improvement |
|---|---|---|---|
| 20K | metric pending | metric pending | pending |
| 100K | metric pending | metric pending | pending |
| 200K | metric pending | metric pending | pending |
**Final Metrics:**
- Final Training Loss: pending
- Final Validation Loss: pending
- Training Time: pending
- Hardware: RTX 5080 16GB (with gradient checkpointing)
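Gradient checkpointing is what lets a run like this fit in 16 GB: activations are recomputed during the backward pass instead of being stored. In PyTorch the underlying mechanism looks like this (a generic sketch, not the project's training loop):

```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    """A toy transformer-style block used only to illustrate checkpointing."""
    def __init__(self, d: int):
        super().__init__()
        self.lin = torch.nn.Linear(d, d)

    def forward(self, x):
        return torch.relu(self.lin(x))

block = Block(64)
x = torch.randn(2, 64, requires_grad=True)

# Activations inside `block` are not stored; they are recomputed on backward,
# trading extra compute for a smaller peak memory footprint.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```

With Hugging Face models, the equivalent switch is typically `model.gradient_checkpointing_enable()`.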
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tzervas/tritter-500m-hybrid-bitnet")
tokenizer = AutoTokenizer.from_pretrained("tzervas/tritter-500m-hybrid-bitnet")

# Generate text
inputs = tokenizer("def fibonacci", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Additional Details
- Framework: PyTorch
- Quantization Method: BitNet 1.58-bit ternary
- License: MIT
## Research Background
For more information on hybrid predictive training, see:
- Embedding-Prediction Paradigm: Operating in continuous embedding space
- BitNet 1.58-bit: Efficient ternary quantization {-1, 0, 1}
- Progressive Layer Loading: Support for larger models on limited VRAM
## Citation
If you use this model, please cite:
```bibtex
@misc{tritter500m_hybrid,
  author    = {Tzervas, K.},
  title     = {Tritter 500M Hybrid BitNet: Embedding-Prediction Training},
  year      = {2025},
  publisher = {Hugging Face}
}
```
Created as part of the Tritter multimodal transformer research project.