Models in the inference-optimization organization:

- inference-optimization/Qwen3-0.6B-debug-add
- inference-optimization/Qwen3-0.6B-debug-multiply
- inference-optimization/DeepSeek-V3-debug-empty
- inference-optimization/granite-4.0-h-tiny-FP8-block (Text Generation, 7B)
- inference-optimization/granite-4.0-h-tiny-quantized.w8a8
- inference-optimization/granite-4.0-h-tiny-NVFP4
- inference-optimization/granite-4.0-h-tiny-quantized.w4a16
- inference-optimization/granite-4.0-h-small-quantized.w8a8
- inference-optimization/granite-4.0-h-small-NVFP4
- inference-optimization/granite-4.0-h-small-quantized.w4a16
- inference-optimization/granite-4.0-h-small-FP8-dynamic
- inference-optimization/granite-4.0-h-small-FP8-block
- inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4
- inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 (Text Generation, 32B)
- inference-optimization/Qwen3-Next-80B-A3B-Thinking-FP8 (Text Generation, 81B)
- inference-optimization/Qwen3-Next-80B-A3B-Instruct-FP8 (Text Generation, 81B)
- inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-quantized.w4a16
- inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8-dynamic
- inference-optimization/Qwen3-Next-80B-A3B-Thinking-quantized.w8a8
- inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8-block
- inference-optimization/Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Head (8B)
- inference-optimization/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Head (8B)
- inference-optimization/Qwen3-Next-80B-A3B-Instruct-quantized.w8a8
- inference-optimization/Llama-3.1-8B-Instruct-HIGGS-quantized-paths
- inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_DYNAMIC-gate_up_proj-all (7B)
- inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_DYNAMIC-down_proj-all (6B)
- inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_DYNAMIC-qkv_proj-all (5B)
- inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_DYNAMIC-out_proj-all (5B)
- inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-gate_up_proj-all (7B)
- inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-down_proj-all (6B)