NiT-diffusers

Native diffusers implementation of NiT (Native-resolution Image Transformer). Each variant folder is self-contained:

pipeline.py — NiTPipeline
scheduler/scheduler_config.json — FlowMatchEulerDiscreteScheduler config (class ships with Diffusers)
transformer/nit_transformer_2d.py — NiTTransformer2DModel
vae/ — AutoencoderDC weights + config

No separate NiT-diffusers package is needed at inference time: install diffusers from PyPI and load the custom code that ships in the variant directory.

Available checkpoints

| Checkpoint | Path | Resolution | Recommended settings |
| --- | --- | --- | --- |
| NiT-XL | `./NiT-XL` | 512×512 | 250 steps, CFG 2.05, guidance interval (0.0, 0.7) |
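
The guidance interval in the recommended settings limits where classifier-free guidance (CFG) is active during sampling. The sketch below illustrates the general idea only. It assumes the interval is a fraction of the sampling trajectory in [0, 1] and that predictions outside it fall back to the conditional branch. The function name `guided_prediction` and this interpretation are illustrative assumptions, not taken from pipeline.py.

```python
def guided_prediction(cond, uncond, t, scale=2.05, interval=(0.0, 0.7)):
    """Blend conditional and unconditional predictions with interval-limited CFG.

    cond, uncond: sequences of floats standing in for model outputs.
    t: position along the sampling trajectory, assumed to lie in [0, 1].
    """
    lo, hi = interval
    if lo <= t <= hi:
        # Inside the interval: standard CFG extrapolation away from uncond.
        return [u + scale * (c - u) for c, u in zip(cond, uncond)]
    # Outside the interval (assumption): use the conditional prediction as-is.
    return list(cond)
```

Under these assumptions, a call with `t=0.5` applies guidance while `t=0.9` does not. Check pipeline.py in the variant folder for the actual convention.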

ImageNet class labels

Each variant keeps an English id2label map directly in its own model_index.json (DiT-style).

pipe.id2label — inspect id → English label correspondence
pipe.labels — reverse map (English synonym → id), sorted for browsing
pipe.get_label_ids("golden retriever")
pipe(class_labels="golden retriever", ...) — string labels resolved automatically
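
To make the id ↔ label relationship concrete, here is a minimal sketch of how a reverse map can be derived from an id2label dictionary. It assumes DiT-style entries where synonyms are comma-separated. The sample dictionary is a small hypothetical excerpt, and `build_label_index` and `lookup` are illustrative helpers, not the pipeline's own methods.

```python
# Hypothetical excerpt of an id2label map (ImageNet-1k ids, comma-separated synonyms).
id2label = {
    207: "golden retriever",
    208: "Labrador retriever",
    281: "tabby, tabby cat",
}


def build_label_index(id2label):
    """Return a sorted {synonym: class_id} map built from {class_id: "syn1, syn2"}."""
    index = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            index[name.strip().lower()] = class_id
    return dict(sorted(index.items()))


def lookup(index, label):
    """Resolve a label string to its class id, ignoring case and outer whitespace."""
    key = label.strip().lower()
    if key not in index:
        raise ValueError(f"Unknown label: {label!r}")
    return index[key]


index = build_label_index(id2label)
print(lookup(index, "golden retriever"))
```

The real `pipe.labels` and `pipe.get_label_ids` may differ in case handling and return type, so inspect them on a loaded pipeline before relying on this behavior.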

Inference

Run the bundled demo script from the repo root:

python demo_inference.py

This writes demo.png using NiT-XL with the settings used in the following code:

from pathlib import Path
import torch
from diffusers import DiffusionPipeline

model_dir = Path("./NiT-XL").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    class_labels="golden retriever",
    height=512,
    width=512,
    num_inference_steps=250,
    guidance_scale=2.05,
    guidance_interval=(0.0, 0.7),
    generator=generator,
).images[0]
image.save("demo.png")

Load a variant subfolder (e.g. ./NiT-XL), not the repo root.
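
A quick way to confirm that a path points at a variant folder rather than the repo root is to check for the files listed earlier. The helper below is a small sketch. The name `missing_variant_files` is hypothetical, and the expected entries are taken from the layout described in this README.

```python
from pathlib import Path

# Entries a self-contained variant folder is expected to hold, per this README.
EXPECTED_ENTRIES = (
    "model_index.json",
    "pipeline.py",
    "scheduler/scheduler_config.json",
    "transformer/nit_transformer_2d.py",
    "vae",
)


def missing_variant_files(model_dir):
    """Return the expected entries that are absent from model_dir."""
    root = Path(model_dir)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]
```

An empty list from `missing_variant_files("./NiT-XL")` suggests the directory is a loadable variant. A non-empty list names what to look for.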

Hub usage follows Hugging Face model-id style (UserID/RepoID):

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/NiT-diffusers",
    subfolder="NiT-XL",
    custom_pipeline="pipeline.py",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

Citation

@article{wang2025native,
  title={Native-Resolution Image Synthesis},
  author={Wang, Zidong and Bai, Lei and Yue, Xiangyu and Ouyang, Wanli and Zhang, Yiyuan},
  year={2025},
  eprint={2506.03131},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

