BiliSakura/DiCo-diffusers

Self-contained DiCo checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline.py, component modules, and weights.

Converted from shallowdream204/DiCo using DiCo-diffusers.

Available checkpoints

Subfolder     Pipeline      Resolution  Source checkpoint         CFG scale  FID    IS      Params
DiCo-S-256/   DiCoPipeline  256×256     DiCo-S-400K-256x256.pt    1.0        49.97  31.38   33M
DiCo-B-256/   DiCoPipeline  256×256     DiCo-B-400K-256x256.pt    1.0        27.20  56.52   130M
DiCo-L-256/   DiCoPipeline  256×256     DiCo-L-400K-256x256.pt    1.0        13.66  91.37   464M
DiCo-XL-256/  DiCoPipeline  256×256     DiCo-XL-3750K-256x256.pt  1.4        2.05   282.17  701M

DiCo denoises VAE latents (4 channels, 32×32 for 256×256 images) with a ConvNet U-Net and multi-scale adaLN conditioning. VAE: stabilityai/sd-vae-ft-ema. Scheduler: DDIMScheduler (1000 train steps, linear betas).
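To make the shapes and the schedule concrete, here is a minimal dependency-free sketch. It assumes the SD VAE downsamples by 8 per side (which matches 256×256 → 32×32 above) and that the linear schedule uses DDIMScheduler's default beta_start=1e-4 and beta_end=0.02; the card does not state the endpoints, so check scheduler/scheduler_config.json before relying on them. The function names are illustrative, not part of the pipeline.

```python
# Sketch: latent shape and a linear beta schedule for DiCo-style sampling.
# Assumptions (not stated in this card): the VAE downsamples by 8 per side,
# and the schedule uses DDIMScheduler's default beta_start/beta_end.
VAE_SCALE = 8          # assumed spatial downsampling factor of the SD VAE
LATENT_CHANNELS = 4    # from the card: 4-channel latents


def latent_shape(height: int, width: int, batch: int = 1) -> tuple[int, int, int, int]:
    """Return (batch, channels, latent_h, latent_w) for an image size."""
    if height % VAE_SCALE or width % VAE_SCALE:
        raise ValueError(f"height and width must be multiples of {VAE_SCALE}")
    return (batch, LATENT_CHANNELS, height // VAE_SCALE, width // VAE_SCALE)


def linear_betas(num_train_timesteps: int = 1000,
                 beta_start: float = 1e-4,
                 beta_end: float = 0.02) -> list[float]:
    """Evenly spaced noise variances from beta_start to beta_end, inclusive."""
    step = (beta_end - beta_start) / (num_train_timesteps - 1)
    return [beta_start + i * step for i in range(num_train_timesteps)]


def alphas_cumprod(betas: list[float]) -> list[float]:
    """Cumulative product of (1 - beta_t): the signal kept at each timestep."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


print(latent_shape(256, 256))  # (1, 4, 32, 32)
```

The cumulative product shrinks monotonically, which is why sampling starts from nearly pure noise at the last training timestep.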

Repo layout

BiliSakura/DiCo-diffusers/
├── README.md
├── demo_inference.py
├── DiCo-S-256/
├── DiCo-B-256/
├── DiCo-L-256/
└── DiCo-XL-256/
    ├── pipeline.py
    ├── model_index.json
    ├── demo.png
    ├── scheduler/scheduler_config.json
    ├── transformer/
    └── vae/

Each variant is self-contained. Its scheduler/ folder holds only a config for the built-in DDIMScheduler, so no custom scheduler code is needed beyond the PyPI diffusers package.
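Before loading from a local clone, a quick layout check can catch incomplete downloads. This is a hypothetical helper, not something shipped in this repo; the expected entries are copied from the DiCo-XL-256/ listing above, and assuming the other three variants share that layout is an assumption (demo.png is left out because inference does not need it).

```python
from pathlib import Path

# Hypothetical helper (not part of this repo). Expected entries are taken from
# the DiCo-XL-256/ listing; other variants are assumed to mirror it.
EXPECTED = (
    "pipeline.py",
    "model_index.json",
    "scheduler/scheduler_config.json",
    "transformer",
    "vae",
)
VARIANTS = ("DiCo-S-256", "DiCo-B-256", "DiCo-L-256", "DiCo-XL-256")


def missing_entries(variant_dir: Path) -> list[str]:
    """Return the expected relative paths that do not exist under variant_dir."""
    return [name for name in EXPECTED if not (variant_dir / name).exists()]


def check_clone(repo_root: Path) -> dict[str, list[str]]:
    """Map each variant folder name to its missing entries (empty list = complete)."""
    return {variant: missing_entries(repo_root / variant) for variant in VARIANTS}


if __name__ == "__main__":
    for variant, missing in check_clone(Path(".")).items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{variant}: {status}")
```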

ImageNet class labels

id2label is embedded in each variant's model_index.json (DiT-style).

  • pipe.id2label: inspect the id → English label correspondence
  • pipe.labels: reverse map (English synonym → id)
  • pipe.get_label_ids("golden retriever")
  • pipe(class_labels="golden retriever", ...): string labels are resolved automatically
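The lookup can be sketched in plain Python. This is a hypothetical re-implementation modelled on DiT-style label handling, not DiCoPipeline's actual code: it assumes each id2label value is a comma-separated list of English synonyms (as in standard ImageNet class names), that JSON keys arrive as strings, and that unknown labels raise ValueError. The two entries are a hand-written excerpt, not the embedded mapping.

```python
from typing import Union

# Hand-written excerpt for illustration; the real mapping lives in model_index.json.
ID2LABEL = {
    1: "goldfish, Carassius auratus",
    207: "golden retriever",
}


def build_label_map(id2label: dict[Union[int, str], str]) -> dict[str, int]:
    """Reverse map: every comma-separated synonym -> integer class id.

    Keys may be strings (as in JSON) or ints; both are converted with int().
    """
    labels: dict[str, int] = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            labels[name.strip()] = int(class_id)
    return dict(sorted(labels.items()))


def get_label_ids(labels: dict[str, int], label: Union[str, list[str]]) -> list[int]:
    """Resolve one label or a list of labels to class ids; unknown names raise."""
    names = [label] if isinstance(label, str) else label
    ids = []
    for name in names:
        if name not in labels:
            raise ValueError(f"{name!r} is not a known label")
        ids.append(labels[name])
    return ids


labels = build_label_map(ID2LABEL)
print(get_label_ids(labels, "golden retriever"))                 # [207]
print(get_label_ids(labels, ["Carassius auratus", "goldfish"]))  # [1, 1]
```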

Demo

DiCo-XL-256 demo (demo.png): class 207 (golden retriever), 256×256, 250 steps, guidance_scale=1.4.

python demo_inference.py
python demo_inference.py --variant s   # DiCo-S-256, CFG 1.0

Load from a local clone

ImageNet 256×256 (DiCo-XL-256)

from pathlib import Path
import torch
from diffusers import DiffusionPipeline

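# Path to one variant folder inside a local clone of this repo
model_dir = Path("./DiCo-XL-256").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,                            # never hit the Hub
    custom_pipeline=str(model_dir / "pipeline.py"),  # DiCoPipeline shipped with the variant
    trust_remote_code=True,                           # allow loading the bundled modules
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")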

print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    class_labels="golden retriever",
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=1.4,
    generator=generator,
).images[0]
image.save("demo.png")

Recommended inference settings

Variant      Steps  CFG scale
DiCo-S-256   250    1.0
DiCo-B-256   250    1.0
DiCo-L-256   250    1.0
DiCo-XL-256  250    1.4
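If you script several variants, the table can be kept as data. This is a hypothetical helper, not part of the pipeline; the keyword names match those used in the local-clone example (height, width, num_inference_steps, guidance_scale), and the values are copied from the table.

```python
# Hypothetical helper: the recommended settings above as pipe(...) kwargs.
RECOMMENDED = {
    "DiCo-S-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-B-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-L-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-XL-256": {"num_inference_steps": 250, "guidance_scale": 1.4},
}


def call_kwargs(variant: str, resolution: int = 256) -> dict:
    """Return a fresh kwargs dict for the given variant folder name."""
    try:
        settings = RECOMMENDED[variant]
    except KeyError:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(RECOMMENDED)}"
        ) from None
    # Build a new dict so callers can tweak it without touching RECOMMENDED.
    return {"height": resolution, "width": resolution, **settings}
```

With a loaded pipeline this would be used as pipe(class_labels="golden retriever", **call_kwargs("DiCo-XL-256")).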

Classifier-free guidance is applied only to the first 3 latent channels of the model output, following the DiT reference implementation's convention (kept for exact reproducibility).
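The channel-restricted guidance can be sketched as follows. This is modelled on the DiT reference code, where channels from index 3 onward are passed through from the conditional prediction; DiCoPipeline's own implementation may differ in details. Predictions are plain nested lists shaped [channels][pixels] so the example runs without torch.

```python
# Sketch of CFG restricted to the first `guided_channels` channels (DiT-style).
# Assumption: remaining channels are copied from the conditional prediction.
Prediction = list[list[float]]


def guided_prediction(cond: Prediction, uncond: Prediction, guidance_scale: float,
                      guided_channels: int = 3) -> Prediction:
    """Apply uncond + s * (cond - uncond) to the first `guided_channels`
    channels and copy the remaining channels from `cond` unchanged."""
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same number of channels")
    out: Prediction = []
    for ch, (c_ch, u_ch) in enumerate(zip(cond, uncond)):
        if ch < guided_channels:
            out.append([u + guidance_scale * (c - u) for c, u in zip(c_ch, u_ch)])
        else:
            out.append(list(c_ch))
    return out


cond = [[1.0], [1.0], [1.0], [1.0]]    # 4 latent channels, 1 pixel each
uncond = [[0.0], [0.0], [0.0], [0.0]]
print(guided_prediction(cond, uncond, 1.4))  # [[1.4], [1.4], [1.4], [1.0]]
```

With guidance_scale=1.0 the blend reduces to the conditional prediction, so the S, B and L settings in the table effectively sample without guidance.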

Conversion

cd libs/DiCo-diffusers

python scripts/convert_dico_to_diffusers.py \
  --checkpoint /path/to/DiCo-XL-3750K-256x256.pt \
  --output /path/to/DiCo-XL-256 \
  --model-type DiCo-XL \
  --weights ema \
  --safe-serialization \
  --id2label ../../src/labels/id2label_en.json

Citation

@inproceedings{ai2025dico,
    title={DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling},
    author={Yuang Ai and Qihang Fan and Xuefeng Hu and Zhenheng Yang and Ran He and Huaibo Huang},
    booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
    year={2025},
    url={https://openreview.net/forum?id=UnslcaZSnb}
}

License

Weights are converted from checkpoints released under the Apache 2.0 license.
