# TinyFlux

A roughly 1/12-scale Flux architecture for experimentation and research. TinyFlux maintains the core MMDiT (Multimodal Diffusion Transformer) design of Flux while dramatically reducing the parameter count for faster iteration and lower resource requirements.
## Model Description
TinyFlux is a miniaturized version of FLUX.1-schnell that preserves the essential architectural components:
- Double-stream blocks (MMDiT style) - separate text/image pathways with joint attention
- Single-stream blocks - concatenated text+image with shared weights
- AdaLN-Zero modulation - adaptive layer norm with gating (sketched after this list)
- 3D RoPE - rotary position embeddings for temporal + spatial positions
- Flow matching - rectified flow training objective
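
As a concrete illustration of the AdaLN-Zero component above, here is a minimal PyTorch sketch of the modulation mechanism. This is a hypothetical module written for clarity, not the exact code in model.py:

```python
import torch
import torch.nn as nn

class AdaLNZero(nn.Module):
    """Adaptive LayerNorm with zero-initialized gating.

    A conditioning vector (e.g. timestep + pooled text embedding)
    produces per-channel shift, scale, and gate terms.
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size, elementwise_affine=False)
        self.proj = nn.Linear(hidden_size, 3 * hidden_size)
        nn.init.zeros_(self.proj.weight)  # "Zero": each block starts as an identity map
        nn.init.zeros_(self.proj.bias)

    def forward(self, x: torch.Tensor, cond: torch.Tensor):
        # cond: (batch, hidden) -> shift / scale / gate, each (batch, hidden)
        shift, scale, gate = self.proj(nn.functional.silu(cond)).chunk(3, dim=-1)
        x_mod = self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        return x_mod, gate.unsqueeze(1)  # gate scales the sub-block's residual output
```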
## Architecture Comparison
| Component | Flux | TinyFlux | Scale |
|---|---|---|---|
| Hidden size | 3072 | 256 | /12 |
| Attention heads | 24 | 2 | /12 |
| Head dimension | 128 | 128 | preserved |
| Double-stream layers | 19 | 3 | /6 |
| Single-stream layers | 38 | 3 | /12 |
| VAE channels | 16 | 16 | preserved |
| Total params | ~12B | ~8M | /1500 |
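
Expressed as a configuration, the table corresponds roughly to the following (a hypothetical dataclass for illustration; the authoritative field names are in model.py):

```python
from dataclasses import dataclass

@dataclass
class TinyFluxConfigSketch:
    hidden_size: int = 256         # Flux: 3072  (/12)
    num_attention_heads: int = 2   # Flux: 24    (/12); 2 heads * 128 dim = 256
    head_dim: int = 128            # preserved
    num_double_layers: int = 3     # Flux: 19    (~/6)
    num_single_layers: int = 3     # Flux: 38    (~/12)
    vae_channels: int = 16         # preserved
```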
## Text Encoders
TinyFlux uses smaller text encoders than standard Flux:
| Role | Flux | TinyFlux |
|---|---|---|
| Sequence encoder | T5-XXL (4096 dim) | flan-t5-base (768 dim) |
| Pooled encoder | CLIP-L (768 dim) | CLIP-L (768 dim) |
## Training

### Dataset
Trained on AbstractPhil/flux-schnell-teacher-latents:
- 10,000 samples
- Pre-computed VAE latents (16, 64, 64) from 512×512 images
- Diverse prompts covering people, objects, scenes, styles
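
For inspection, the dataset can presumably be loaded with the `datasets` library (assuming a standard Hub layout; the column names below are illustrative):

```python
from datasets import load_dataset

ds = load_dataset("AbstractPhil/flux-schnell-teacher-latents", split="train")
print(len(ds))       # expected: 10,000 samples
print(ds[0].keys())  # actual column names depend on how the latents were stored
```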
### Training Details
- Objective: Flow matching (rectified flow)
- Timestep sampling: Logit-normal with Flux shift (s=3.0); see the sketch after this list
- Loss weighting: Min-SNR-γ (γ=5.0)
- Optimizer: AdamW (lr=1e-4, β=(0.9, 0.99), wd=0.01)
- Schedule: Cosine with warmup
- Precision: bfloat16
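
The timestep sampling can be sketched as follows; this assumes the standard Flux shift form t' = s·t / (1 + (s − 1)·t) and is an approximation of, not a copy of, the code in train_colab.py:

```python
import torch

def sample_timesteps(batch_size: int, shift: float = 3.0, device: str = "cuda") -> torch.Tensor:
    # Logit-normal: sigmoid of a standard normal concentrates samples
    # in the middle of the trajectory, where velocity is hardest to predict.
    t = torch.sigmoid(torch.randn(batch_size, device=device))
    # Flux-style shift (assumed form): biases sampling along the schedule.
    return shift * t / (1 + (shift - 1) * t)
```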
### Flow Matching Formulation
```
x_t      = (1 - t) * noise + t * data      # interpolation, t in [0, 1]
v_target = data - noise                    # constant velocity along the straight path
loss     = MSE(v_pred, v_target) * min_snr_weight(t)
```
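
Putting the pieces together, a hedged sketch of the weighted loss (assuming SNR(t) = (t / (1 − t))² under this interpolation and a min(SNR, γ)/SNR-style clip; the exact weighting is defined in train_colab.py):

```python
import torch

def min_snr_weight(t: torch.Tensor, gamma: float = 5.0, eps: float = 1e-8) -> torch.Tensor:
    # Under x_t = (1 - t) * noise + t * data, signal coeff = t and noise
    # coeff = 1 - t, so SNR(t) = (t / (1 - t))^2. Clipping the SNR at gamma
    # down-weights easy (high-SNR) timesteps.
    snr = (t / (1 - t + eps)) ** 2
    return torch.clamp(snr, max=gamma) / (snr + eps)

def flow_matching_loss(v_pred, data, noise, t, gamma: float = 5.0):
    v_target = data - noise  # constant velocity of the straight noise->data path
    per_sample = ((v_pred - v_target) ** 2).flatten(1).mean(dim=1)
    return (per_sample * min_snr_weight(t, gamma)).mean()
```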
## Usage
### Installation

```bash
pip install torch transformers diffusers safetensors huggingface_hub
```
### Inference
```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL

# Load model (copy the TinyFlux class definition from model.py first)
config = TinyFluxConfig()
model = TinyFlux(config).to("cuda").to(torch.bfloat16)
weights = load_file(hf_hub_download("AbstractPhil/tiny-flux", "model.safetensors"))
model.load_state_dict(weights)
model.eval()

# Load text encoders and the Flux VAE
t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-base")
t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16).to("cuda")
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16).to("cuda")
vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")

# Encode the prompt: T5 supplies the sequence embedding, CLIP the pooled embedding
prompt = "a photo of a cat"
with torch.no_grad():
    t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
    t5_out = t5_enc(**t5_in).last_hidden_state
    clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
    clip_out = clip_enc(**clip_in).pooler_output

# Euler sampling (t: 0 -> 1, noise -> data); 64*64 latent tokens, 16 channels each
x = torch.randn(1, 64 * 64, 16, device="cuda", dtype=torch.bfloat16)
img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")
timesteps = torch.linspace(0, 1, 21, device="cuda")
guidance = torch.tensor([3.5], device="cuda", dtype=torch.bfloat16)

with torch.no_grad():
    for i in range(20):
        t = timesteps[i].unsqueeze(0)
        dt = timesteps[i + 1] - timesteps[i]
        v = model(
            hidden_states=x,
            encoder_hidden_states=t5_out,
            pooled_projections=clip_out,
            timestep=t,
            img_ids=img_ids,
            guidance=guidance,
        )
        x = x + v * dt  # Euler step along the predicted velocity field

    # Decode: unpack the (1, 4096, 16) token sequence into (1, 16, 64, 64) latents
    latents = x.reshape(1, 64, 64, 16).permute(0, 3, 1, 2)
    latents = latents / vae.config.scaling_factor
    image = vae.decode(latents.float()).sample
    image = (image / 2 + 0.5).clamp(0, 1)
```
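
To save the decoded tensor, a standard tensor-to-PIL conversion works (not TinyFlux-specific):

```python
import numpy as np
from PIL import Image

array = (image[0].permute(1, 2, 0).cpu().numpy() * 255).round().astype(np.uint8)
Image.fromarray(array).save("tinyflux_sample.png")
```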
### Full Inference Script
See inference_colab.py in this repository for a complete generation pipeline with:
- Classifier-free guidance (sketched below)
- Batch generation
- Image saving
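
A minimal sketch of the classifier-free guidance step, assuming null embeddings `t5_null` / `clip_null` computed from an empty prompt exactly as in the inference snippet above (a hypothetical helper; see inference_colab.py for the real implementation):

```python
def guided_velocity(model, x, t, img_ids, guidance,
                    t5_out, clip_out, t5_null, clip_null, scale: float = 3.5):
    # Standard CFG: extrapolate from the unconditional velocity toward
    # the conditional one by the guidance scale.
    v_cond = model(hidden_states=x, encoder_hidden_states=t5_out,
                   pooled_projections=clip_out, timestep=t,
                   img_ids=img_ids, guidance=guidance)
    v_uncond = model(hidden_states=x, encoder_hidden_states=t5_null,
                     pooled_projections=clip_null, timestep=t,
                     img_ids=img_ids, guidance=guidance)
    return v_uncond + scale * (v_cond - v_uncond)
```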
## Files
```
AbstractPhil/tiny-flux/
├── model.safetensors      # Model weights (~32MB)
├── config.json            # Model configuration
├── README.md              # This file
├── model.py               # Model architecture definition
├── inference_colab.py     # Inference script
├── train_colab.py         # Training script
├── checkpoints/           # Training checkpoints
│   └── step_*.safetensors
├── logs/                  # TensorBoard logs
└── samples/               # Generated samples during training
```
## Limitations
- Resolution: Trained on 512×512 only
- Quality: Significantly lower than full Flux due to reduced capacity
- Text understanding: Limited by smaller T5 encoder (768 vs 4096 dim)
- Fine details: May struggle with complex scenes or fine-grained details
- Experimental: Intended for research and learning, not production use
## Intended Use
- Understanding Flux/MMDiT architecture
- Rapid prototyping and experimentation
- Educational purposes
- Resource-constrained environments
- Baseline for architecture modifications
## Citation
If you use TinyFlux in your research, please cite:
```bibtex
@misc{tinyflux2025,
  title={TinyFlux: A Miniaturized Flux Architecture for Experimentation},
  author={AbstractPhil},
  year={2025},
  url={https://huggingface.co/AbstractPhil/tiny-flux}
}
```
## Acknowledgments
- Black Forest Labs for the original Flux architecture
- Hugging Face for the diffusers and transformers libraries
## License
MIT License - See LICENSE file for details.
**Note:** This is an experimental research model. For high-quality image generation, use the full FLUX.1-schnell or FLUX.1-dev models.