Upload README.md with huggingface_hub
# Qwen3-30B-A3B-Instruct-2507 GGUF (ShapeLearn Quantized)

This is a GGUF-quantized version of Qwen3-30B-A3B-Instruct-2507, produced with **ByteShape's ShapeLearn**, which learns the optimal datatype per tensor to maintain high quality even at very low bitlengths.

To learn more about ShapeLearn and to see detailed benchmarks across GPUs, CPUs, and even the Raspberry Pi, please visit our [blog](https://byteshape.com/blogs/Qwen3-30B-A3B-Instruct-2507/).

If you have questions or want to share feedback, reach us on [Reddit](https://www.reddit.com/r/ByteShape/).
We provide **CPU and GPU optimized variants** for llama.cpp:

- **CPUs:** Models labeled as KQ, optimized for CPU inference with predominantly KQ quantization.
- **GPUs:** Models labeled as IQ, optimized for GPU inference with a hybrid approach combining KQ and IQ quantization for better throughput.

Each hardware target includes a range of models covering different size and quality tradeoffs.
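One practical way to use the tables below is to pick the highest-quality variant whose file fits your RAM or VRAM budget. Here is a minimal sketch; the entries are the two CPU (KQ) rows shown in this README's table, and `best_fit` is a hypothetical helper, not part of any ByteShape tooling:

```python
# Pick the highest-quality quantized variant that fits a memory budget.
# Data copied from the CPU table in this README (KQ-8 and KQ-9 rows);
# the remaining variants follow the same (id, bpw, size GB, quality %) shape.
VARIANTS = [
    ("KQ-8", 4.41, 16.9, 99.34),
    ("KQ-9", 4.67, 17.8, 99.75),
]

def best_fit(budget_gb: float):
    """Return the highest-quality variant whose file fits in budget_gb."""
    fitting = [v for v in VARIANTS if v[2] <= budget_gb]
    return max(fitting, key=lambda v: v[3]) if fitting else None

print(best_fit(17.0))  # KQ-8 is the only one under 17 GB
print(best_fit(18.0))  # KQ-9 wins when there is room for it
```

Remember to leave headroom beyond the file size itself for the KV cache and runtime buffers.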

![CPU quality chart]

**Table sorted by model size** (match the chart numbers to model IDs):

| Model ID | Bits/Weight | Model Size | Normalized Quality |
|---------|-------------|-----------|--------------------|
| [KQ-8](https://huggingface.co/byteshape/Qwen3-30B-A3B-Instruct-2507-GGUF/blob/main/Qwen3-30B-A3B-Instruct-2507-IQ4_XS-4.41bpw.gguf) | 4.41 | 16.9 GB | 99.34% |
| [KQ-9](https://huggingface.co/byteshape/Qwen3-30B-A3B-Instruct-2507-GGUF/blob/main/Qwen3-30B-A3B-Instruct-2507-IQ4_XS-4.67bpw.gguf) | 4.67 | 17.8 GB | 99.75% |
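The Bits/Weight column maps almost directly to file size: size ≈ total parameters × bits per weight / 8. A quick sanity-check sketch, assuming roughly 30.5B total parameters for Qwen3-30B-A3B (the MoE total, not the ~3.3B active parameters; this count is an assumption, not stated in this README):

```python
# Sanity-check: bits/weight -> approximate GGUF file size.
# Assumes ~30.5e9 total parameters (an assumption about Qwen3-30B-A3B).
TOTAL_PARAMS = 30.5e9

def approx_size_gb(bits_per_weight: float) -> float:
    # bits -> bytes -> decimal gigabytes
    return TOTAL_PARAMS * bits_per_weight / 8 / 1e9

print(f"{approx_size_gb(4.41):.1f} GB")  # ~16.8 GB, close to KQ-8's 16.9 GB
print(f"{approx_size_gb(4.67):.1f} GB")  # ~17.8 GB, matching KQ-9
```

The small gap from the listed sizes comes from non-quantized tensors (embeddings, norms) and GGUF metadata.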

### GPU Models

![GPU quality chart]

**Table sorted by model size** (match the chart numbers to model IDs):

| Model ID | Bits/Weight | Model Size | Normalized Score |
|---------|-------------|-----------|------------------|