Krea 2 Turbo: Hand-Edited Weight Experiments

Comparison Grid

Overview

This repository contains weight-edited variants of the Krea 2 Turbo diffusion model. Each variant was created by scaling the weights of specific transformer blocks in the 12.8B-parameter single-stream MMDiT, producing artistic and functional model variations without any retraining.

These are research artifacts from hand-editing diffusion model weights using the methodology described below. The base models (Krea 2 Turbo and Krea 2 Raw) are NOT included; only the edited variants are.

Method

All variants use the core formula:

theta_new = theta_original * (1 - 2 * alpha)

Where alpha controls the inversion strength:

  • alpha=0.05 → scale 0.90 (subtle)
  • alpha=0.10 → scale 0.80 (artistic sweet spot)
  • alpha=0.15 → scale 0.70 (strong)
  • alpha=0.20 → scale 0.60 (aggressive but functional)

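Full negation (alpha=1.0, scale=-1.0) breaks the model and is excluded from this repository. Note that under the formula above, alpha=0.5 gives scale=0.0, which zeroes the weights rather than negating them.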
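The mapping from alpha to scale can be checked with a few lines of Python. This is a minimal sketch: the function name alpha_to_scale is an illustrative choice, not part of any released tooling.

```python
def alpha_to_scale(alpha: float) -> float:
    """Return the multiplier (1 - 2 * alpha) applied to each edited weight."""
    return 1.0 - 2.0 * alpha


# Reproduce the alpha/scale pairs listed above.
for alpha in (0.05, 0.10, 0.15, 0.20):
    print(f"alpha={alpha:.2f} -> scale={alpha_to_scale(alpha):.2f}")
```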

Architecture: Krea 2 Turbo

  • Type: Single-stream MMDiT (Diffusion Transformer)
  • Parameters: 12.8B
  • File size: ~25GB per variant (BF16 + F32 tensors)
  • Structure: 28 uniform transformer blocks
  • Block sub-layers:
    • blocks.N.attn.* (7 tensors): gate, qknorm, wq, wk, wv, wo (six names for seven tensors; qknorm presumably covers separate q and k norm tensors)
    • blocks.N.mlp.* (3 tensors): gate, up, down (SwiGLU)
    • blocks.N.mod.lin (1 tensor): conditioning modulation
    • blocks.N.prenorm.scale / blocks.N.postnorm.scale
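The per-block tensor counts above can be tallied to sanity-check the tensor totals quoted for each variant. The sketch below assumes that the prenorm and postnorm scales count as one tensor each and that "ALL" layers includes them; the helper name count_edited_tensors is hypothetical.

```python
# Per-block tensor counts as listed in the architecture summary above.
TENSORS_PER_BLOCK = {
    "attn": 7,      # blocks.N.attn.*
    "mlp": 3,       # blocks.N.mlp.* (gate, up, down)
    "mod": 1,       # blocks.N.mod.lin
    "prenorm": 1,   # blocks.N.prenorm.scale (assumed to be a single tensor)
    "postnorm": 1,  # blocks.N.postnorm.scale (assumed to be a single tensor)
}


def count_edited_tensors(n_blocks, groups=None):
    """Count tensors touched when editing `groups` in `n_blocks` blocks.

    `groups=None` means every sub-layer group in the block.
    """
    selected = TENSORS_PER_BLOCK if groups is None else groups
    return n_blocks * sum(TENSORS_PER_BLOCK[g] for g in selected)


print(count_edited_tensors(3))            # all sub-layers in 3 blocks
print(count_edited_tensors(3, ["attn"]))  # attention only in 3 blocks
```

Under these assumptions a block holds 13 tensors, so three blocks give 39 tensors for ALL layers and 21 for attention only.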

Variants

B1: Partial Inversion (Most Artistic)

| Property | Value |
|---|---|
| File | Krea_2_turbo_inv_B1_partial10.safetensors |
| Blocks | 12-14 (mid) |
| Layers | ALL (39 tensors per block group) |
| Alpha | 0.10 (scale=0.80) |
| Result | Most artistic variant: strong style/content shift while remaining coherent |

B3: Attention-Only Partial Inversion

| Property | Value |
|---|---|
| File | Krea_2_turbo_inv_B3_attn_p10.safetensors |
| Blocks | 12-14 (mid) |
| Layers | attn only (21 tensors) |
| Alpha | 0.10 (scale=0.80) |
| Result | Functional, subtler than B1: attention-specific perturbation |

D: Gate Scaling (All Blocks)

| Property | Value |
|---|---|
| File | Krea_2_turbo_inv_D_gate_p20.safetensors |
| Blocks | 0-27 (all) |
| Layers | attn.gate only (28 tensors) |
| Alpha | 0.20 (scale=0.60) |
| Result | Functional, moderate effect: gate weights are more tolerant of aggressive scaling |

F: Early/Late Block Inversion

| Property | Value |
|---|---|
| File | Krea_2_turbo_F_early_a10.safetensors |
| Blocks | 0-2 (early) |
| Layers | ALL |
| Alpha | 0.10 (scale=0.80) |
| Result | Affects structure, composition, spatial layout |

| Property | Value |
|---|---|
| File | Krea_2_turbo_F_late_a10.safetensors |
| Blocks | 25-27 (late) |
| Layers | ALL |
| Alpha | 0.10 (scale=0.80) |
| Result | Affects style, color, detail, texture refinement |

G: Mid-Block Alpha Sweep

Three variants at different inversion strengths on the same block zone:

| File | Alpha | Scale | Notes |
|---|---|---|---|
| Krea_2_turbo_G_mid_a05.safetensors | 0.05 | 0.90 | Subtle |
| Krea_2_turbo_G_mid_a15.safetensors | 0.15 | 0.70 | Strong |
| Krea_2_turbo_G_mid_a20.safetensors | 0.20 | 0.60 | Aggressive but functional |

All three target blocks 12-14, ALL layers.

H: Layer-Selective Mid-Block

| File | Blocks | Layers | Alpha |
|---|---|---|---|
| Krea_2_turbo_H_mid_attn_a10.safetensors | 12-14 | attn only | 0.10 |
| Krea_2_turbo_H_mid_mlp_a10.safetensors | 12-14 | mlp only | 0.10 |

Isolates the effect of attention vs MLP perturbation on the same block zone.
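Selecting tensors by block index and sub-layer group can be sketched as follows. This is an illustration only, not the tooling used to build these files: the key pattern is inferred from the blocks.N.<group> naming above, the function scale_selected is hypothetical, and plain Python lists stand in for real tensors.

```python
import re

# Hypothetical key layout based on the "blocks.N.<group>.<name>" naming above.
KEY_PATTERN = re.compile(r"^blocks\.(\d+)\.(attn|mlp|mod|prenorm|postnorm)\.")


def scale_selected(state, blocks, groups, alpha):
    """Return a copy of `state` with matching tensors scaled by (1 - 2 * alpha).

    `state` maps tensor names to lists of floats (a stand-in for real tensors).
    """
    scale = 1.0 - 2.0 * alpha
    edited = {}
    for name, values in state.items():
        match = KEY_PATTERN.match(name)
        hit = (
            match is not None
            and int(match.group(1)) in blocks
            and match.group(2) in groups
        )
        edited[name] = [v * scale for v in values] if hit else list(values)
    return edited


# Toy state dict: names are illustrative, values are placeholders.
toy = {
    "blocks.12.attn.wq": [1.0, -2.0],
    "blocks.12.mlp.up": [1.0, -2.0],
    "blocks.5.attn.wq": [1.0, -2.0],
}

attn_only = scale_selected(toy, blocks=range(12, 15), groups={"attn"}, alpha=0.10)
mlp_only = scale_selected(toy, blocks=range(12, 15), groups={"mlp"}, alpha=0.10)
print(attn_only["blocks.12.attn.wq"], attn_only["blocks.12.mlp.up"])
print(mlp_only["blocks.12.attn.wq"], mlp_only["blocks.12.mlp.up"])
```

Tensors outside the selected blocks or groups are copied unchanged, which keeps the two H variants directly comparable.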

I: Gradient Alpha

| Property | Value |
|---|---|
| File | Krea_2_turbo_I_gradient.safetensors |
| Blocks | 0-27 (all) |
| Layers | ALL |
| Alpha | 0.03 → 0.17 (gradient across blocks) |
| Scale | 0.94 → 0.66 |
| Result | Smooth global perturbation: early blocks barely touched, late blocks aggressively inverted |
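One way to produce such a per-block schedule is linear interpolation of alpha across the 28 blocks. The exact schedule is not stated here, so the linear shape is an assumption, and gradient_alphas is a hypothetical helper name.

```python
def gradient_alphas(n_blocks=28, start=0.03, end=0.17):
    """Linearly interpolate alpha from `start` (block 0) to `end` (last block)."""
    if n_blocks == 1:
        return [start]
    step = (end - start) / (n_blocks - 1)
    return [start + i * step for i in range(n_blocks)]


alphas = gradient_alphas()
scales = [1.0 - 2.0 * a for a in alphas]  # same (1 - 2 * alpha) formula per block
for block in (0, 13, 27):
    print(f"block {block:2d}: alpha={alphas[block]:.3f} scale={scales[block]:.3f}")
```

The endpoints reproduce the scale range given above (0.94 at block 0, 0.66 at block 27); only the intermediate values depend on the linear assumption.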

Excluded Variants (Broken)

The following variants were created but are broken (model produces noise/garbage) and are NOT included:

| Variant | What was done | Why it broke |
|---|---|---|
| B2_attn_full | attn weights * -1.0 | Full negation destroys attention computation |
| D_wv_all | wv weights * -1.0 | Full negation of value projection |
| E_ties_mid | TIES-style sign flip on mid blocks | Full negation variant |

Usage

ComfyUI

  1. Place .safetensors files in ComfyUI/models/diffusion_models/
  2. Load via UNETLoader node
  3. Use the same VAE, CLIP, and text encoder as Krea 2 Turbo
  4. Generate with your standard Krea 2 workflow

Diffusers

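import torch
from diffusers import DiffusionPipeline

# Load in bfloat16 and move the pipeline to the GPU.
# Assumes the repository is laid out as a complete Diffusers pipeline.
pipe = DiffusionPipeline.from_pretrained(
    "dataautogpt3/Krea2-weights-experiments",
    torch_dtype=torch.bfloat16,
    variant="bf16",
).to("cuda")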

Note: These are diffusion model weights only. You need the corresponding VAE, text encoders, and tokenizer from the original Krea 2 Turbo release.

Key Findings

  1. Scaling works, full negation breaks. Partial inversion (scale 0.60-0.90) produces functional, artistic variants. Full negation (scale=-1.0) breaks the model.

  2. 10% inversion is the sweet spot. Alpha=0.10 (scale=0.80) on mid blocks 12-14 produces the most artistically interesting results.

  3. Mid blocks are safest to modify. Blocks 12-14 are the most redundant and tolerate perturbation best.

  4. Gate weights are most tolerant. Attention gate weights can be scaled to 0.60 across all blocks while remaining functional; other layers break sooner.

  5. The artistic effects come from compensation. Partial perturbation triggers creative reorganization in unedited blocks (the compensatory masquerade effect).

Research Context

This work draws on findings from:

  • Task Arithmetic (Ilharco et al., ICLR 2023): formal basis for weight negation
  • weights2weights (NeurIPS 2024): diffusion weight space as meta-latent
  • Unraveling MMDiT Blocks (2025): per-block role mapping for MMDiT
  • C3: Creative Concept Catalyst (CVPR 2025): low-frequency amplification in shallow blocks
  • ConceptPrune (ICLR 2025): tiny weight changes shift semantic output

Credits

  • Base model: Krea 2 Turbo (Krea AI)
  • Weight editing: DataPlusEngine
  • Methodology: Hand-editing diffusion weights via mmap-based surgical tensor scaling