BiliSakura/DiCo-diffusers

Self-contained DiCo checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline.py, component modules, and weights.

Converted from shallowdream204/DiCo using DiCo-diffusers.

Available checkpoints

Subfolder	Pipeline	Resolution	Source checkpoint	CFG	FID	IS	Params
`DiCo-S-256/`	`DiCoPipeline`	256×256	`DiCo-S-400K-256x256.pt`	1.0	49.97	31.38	33M
`DiCo-B-256/`	`DiCoPipeline`	256×256	`DiCo-B-400K-256x256.pt`	1.0	27.20	56.52	130M
`DiCo-L-256/`	`DiCoPipeline`	256×256	`DiCo-L-400K-256x256.pt`	1.0	13.66	91.37	464M
`DiCo-XL-256/`	`DiCoPipeline`	256×256	`DiCo-XL-3750K-256x256.pt`	1.4	2.05	282.17	701M

DiCo denoises VAE latents (4 channels, 32×32 for 256×256 images) with a ConvNet U-Net and multi-scale adaLN conditioning. VAE: stabilityai/sd-vae-ft-ema. Scheduler: DDIMScheduler (1000 train steps, linear betas).
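
The latent size quoted above follows from the VAE's spatial downsampling. The sketch below reproduces that arithmetic; `latent_shape` is a hypothetical helper (not part of the pipeline), and the factor of 8 is an assumption that matches the 256 → 32 figure stated above.

```python
def latent_shape(height, width, latent_channels=4, vae_scale_factor=8):
    """Return the (channels, height, width) of the latent for a given image size.

    vae_scale_factor=8 is an assumption consistent with 256x256 -> 32x32.
    """
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("height and width must be multiples of the VAE scale factor")
    return (latent_channels, height // vae_scale_factor, width // vae_scale_factor)


shape = latent_shape(256, 256)
print(shape)
```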

Repo layout

BiliSakura/DiCo-diffusers/
├── README.md
├── demo_inference.py
├── DiCo-S-256/
├── DiCo-B-256/
├── DiCo-L-256/
└── DiCo-XL-256/
    ├── pipeline.py
    ├── model_index.json
    ├── demo.png
    ├── scheduler/scheduler_config.json
    ├── transformer/
    └── vae/

Each variant is self-contained. The scheduler/ folder uses the built-in DDIMScheduler from the diffusers package on PyPI.

ImageNet class labels

id2label is embedded in each variant's model_index.json (DiT-style).

pipe.id2label: maps each class id to its English label
pipe.labels: reverse map (English synonym → id)
pipe.get_label_ids("golden retriever"): looks up the class id for a label string
pipe(class_labels="golden retriever", ...): string labels are resolved to ids automatically
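
To illustrate how a reverse map like the one above can be derived from id2label, here is a minimal standalone sketch. It assumes DiT-style entries in which synonyms are comma-separated; `build_label_map` and the free function `get_label_ids` are hypothetical stand-ins, not the pipeline's own code, and the two-entry dictionary is a toy excerpt.

```python
from typing import Dict, List, Union


def build_label_map(id2label: Dict[int, str]) -> Dict[str, int]:
    """Map every comma-separated synonym to its class id."""
    labels = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            labels[name.strip()] = int(class_id)
    return dict(sorted(labels.items()))


def get_label_ids(labels: Dict[str, int], query: Union[str, List[str]]) -> List[int]:
    """Resolve one label string or a list of them to class ids."""
    if isinstance(query, str):
        query = [query]
    missing = [q for q in query if q not in labels]
    if missing:
        raise ValueError(f"Unknown label(s): {missing}")
    return [labels[q] for q in query]


# Toy excerpt of an ImageNet id2label mapping.
id2label = {207: "golden retriever", 281: "tabby, tabby cat"}
labels = build_label_map(id2label)
print(get_label_ids(labels, "golden retriever"))
```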

Demo

Class 207 — golden retriever, 256×256, 250 steps, guidance_scale=1.4.

python demo_inference.py
python demo_inference.py --variant s   # DiCo-S-256, CFG 1.0

Load from a local clone

ImageNet 256×256 (`DiCo-XL-256`)

from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Variant folder inside the local clone.
model_dir = Path("./DiCo-XL-256").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,  # load from disk only
    custom_pipeline=str(model_dir / "pipeline.py"),  # the variant's own pipeline code
    trust_remote_code=True,  # allow the bundled custom code to run
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Inspect the label mapping for class 207.
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

# Fixed seed for reproducible sampling.
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    class_labels="golden retriever",
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=1.4,
    generator=generator,
).images[0]
image.save("demo.png")

Recommended inference settings

Variant	Steps	CFG scale
`DiCo-S-256`	250	1.0
`DiCo-B-256`	250	1.0
`DiCo-L-256`	250	1.0
`DiCo-XL-256`	250	1.4
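
The table above can be kept next to your inference code as a small lookup. This is only a convenience sketch: `RECOMMENDED` and `recommended_kwargs` are hypothetical names, and the values are copied from the table.

```python
# Recommended settings per variant, copied from the table above.
RECOMMENDED = {
    "DiCo-S-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-B-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-L-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-XL-256": {"num_inference_steps": 250, "guidance_scale": 1.4},
}


def recommended_kwargs(variant):
    """Return a copy of the recommended sampling kwargs for a variant folder name."""
    try:
        return dict(RECOMMENDED[variant])
    except KeyError:
        raise ValueError(f"Unknown variant: {variant!r}") from None


kwargs = recommended_kwargs("DiCo-XL-256")
print(kwargs)
```

The returned dictionary can then be unpacked into the pipeline call, for example pipe(class_labels="golden retriever", **kwargs).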

Classifier-free guidance is applied to the first 3 latent channels only, following the convention DiT uses for exact reproducibility.
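
The channel-restricted guidance can be sketched in plain Python. This is a minimal illustration rather than the pipeline's code: `partial_cfg` is a hypothetical helper, each channel is a toy list of floats, and it assumes the unguided channel keeps the conditional prediction, as in DiT's reference sampler.

```python
def partial_cfg(cond, uncond, scale, guided_channels=3):
    """Apply classifier-free guidance to the first `guided_channels` channels.

    cond / uncond: per-channel noise predictions, each channel a list of floats.
    Remaining channels are passed through from the conditional prediction
    (an assumption borrowed from DiT's reference sampler).
    """
    out = []
    for index, (cond_ch, uncond_ch) in enumerate(zip(cond, uncond)):
        if index < guided_channels:
            # Standard CFG blend: uncond + scale * (cond - uncond).
            out.append([u + scale * (c - u) for c, u in zip(cond_ch, uncond_ch)])
        else:
            out.append(list(cond_ch))
    return out


# Toy 4-channel predictions with two values per channel.
cond = [[1.0, 2.0] for _ in range(4)]
uncond = [[0.0, 0.0] for _ in range(4)]
guided = partial_cfg(cond, uncond, scale=1.4)
print(guided)
```

With scale=1.0 the blend reduces to the conditional prediction, which corresponds to the CFG 1.0 setting listed for the S, B, and L variants.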

Conversion

cd libs/DiCo-diffusers

python scripts/convert_dico_to_diffusers.py \
  --checkpoint /path/to/DiCo-XL-3750K-256x256.pt \
  --output /path/to/DiCo-XL-256 \
  --model-type DiCo-XL \
  --weights ema \
  --safe-serialization \
  --id2label ../../src/labels/id2label_en.json

Citation

@inproceedings{ai2025dico,
    title={DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling},
    author={Yuang Ai and Qihang Fan and Xuefeng Hu and Zhenheng Yang and Ran He and Huaibo Huang},
    booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
    year={2025},
    url={https://openreview.net/forum?id=UnslcaZSnb}
}

License

Weights are converted from checkpoints released under the Apache 2.0 license.
