Compared Quality and Speed Difference (with CUDA 13 & Sage Attention) of BF16 vs GGUF Q8 vs FP8 Scaled vs NVFP4 for Z Image Turbo, FLUX Dev, FLUX SRPO, FLUX Kontext, FLUX 2 - Full 4K step by step tutorial also published
Check the full 4K tutorial above to learn more and to see the original, uncompressed full-size images
How much quality and speed difference actually exists between BF16, GGUF, FP8 Scaled, and NVFP4 has long been an open question. In this tutorial I compare all of these precision and quantization variants for both speed and quality, and the results are pretty surprising. Moreover, we have developed and published an NVFP4 quant generator app and an FP8 Scaled quant generator app; the links to the apps are below if you want to use them. Furthermore, upgrading ComfyUI to CUDA 13 with properly compiled libraries is now strongly recommended: we have observed noticeable performance gains with CUDA 13, so CUDA 13 ComfyUI is now the recommendation for both SwarmUI users and standalone ComfyUI users.
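To make the comparison concrete, here is a minimal Python sketch of what an "FP8 Scaled" quant does to a single weight tensor: compute a per-tensor scale so the weights fit the FP8 E4M3 range, cast, and keep the scale around for dequantization. This is my own illustration of the general technique, not the actual code of our quant generator app.

```python
import torch

def fp8_scaled_quant(w: torch.Tensor):
    # Largest finite value representable in FP8 E4M3 (448.0).
    f8_max = torch.finfo(torch.float8_e4m3fn).max
    # Per-tensor scale so the largest weight maps to the edge of the FP8 range.
    scale = w.abs().max().clamp(min=1e-12) / f8_max
    w_fp8 = (w / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale

def fp8_scaled_dequant(w_fp8: torch.Tensor, scale: torch.Tensor):
    # Dequantize back to BF16 for compute.
    return w_fp8.to(torch.bfloat16) * scale

w = torch.randn(4096, 4096, dtype=torch.bfloat16)
w_fp8, scale = fp8_scaled_quant(w)
print("mean abs error:", (fp8_scaled_dequant(w_fp8, scale) - w).abs().mean().item())
```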
Finally, NVFP4 models have arrived in ComfyUI, and thus SwarmUI, with CUDA 13. NVFP4 models are literally 100%+ faster with minimal impact on quality. I have done grid quality comparisons to show you the difference between the NVFP4 versions of FLUX 2, Z Image Turbo, and FLUX 1. To make CUDA 13 work, I have compiled Flash Attention, Sage Attention, and xFormers for both Windows and Linux with all of the CUDA archs, supporting literally all GPUs from the GTX 1650 series through the RTX 2000, 3000, 4000, and 5000 series and beyond.
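For intuition about why NVFP4 loses so little quality, here is a hedged Python sketch of NVFP4-style block quantization as I understand the format: 4-bit E2M1 values with a small per-16-element block scale. This is a simplified illustration (real NVFP4 stores packed 4-bit codes plus FP8 block scales), not ComfyUI's or NVIDIA's actual kernel code.

```python
import torch

# Positive magnitudes representable in FP4 E2M1 (sign handled separately).
E2M1 = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def nvfp4_block_quant(w: torch.Tensor, block: int = 16):
    w = w.float().reshape(-1, block)
    # One scale per 16-element block; 6.0 is the largest E2M1 magnitude.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / 6.0
    x = (w / scale).abs()
    # Round each value to the nearest representable E2M1 magnitude.
    idx = (x.unsqueeze(-1) - E2M1).abs().argmin(dim=-1)
    q = E2M1[idx] * w.sign()
    # Return the dequantized view; a real kernel stores 4-bit codes + scales.
    return (q * scale).reshape(-1)

w = torch.randn(4096)
print("mean abs error:", (nvfp4_block_quant(w) - w).abs().mean().item())
```

Because each scale covers only 16 weights, a single outlier cannot blow up the quantization error of the whole tensor, which is a big part of why 4-bit block formats hold up so well.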
In this full tutorial, I will show you how to upgrade your ComfyUI and thus SwarmUI to use latest CUDA 13 with latest libraries and Torch 2.9.1. Moreover, our compiled libraries such as Sage Attention works with all models on all GPUs without generating black images or videos such as Qwen Image or Wan 2.2 models. Hopefully LTX 2 presets and tutorial coming soon too. Finally, I introduce a new private cloud GPU platform called as SimplePod like RunPod. This platform has all the features of RunPod same way but much faster and cheaper.
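As a sanity check after the upgrade, a short Python snippet like the following, run inside the ComfyUI environment, confirms that Torch is built against CUDA 13 and that the compiled Sage Attention wheel imports cleanly. The exact version strings are assumptions about what a cu130 build reports; the `sageattention` module name is the one the published Sage Attention package uses.

```python
import torch

# A Torch build against CUDA 13 reports a "+cu130"-style suffix and CUDA 13.x here.
print("torch:", torch.__version__)   # e.g. "2.9.1+cu130" (assumed suffix)
print("cuda:", torch.version.cuda)   # e.g. "13.0"
print("gpu:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")

# The compiled Sage Attention wheel installs the `sageattention` module.
try:
    import sageattention  # noqa: F401
    print("Sage Attention: OK")
except ImportError as exc:
    print("Sage Attention missing:", exc)
```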