Upload README.md with huggingface_hub
# Qwen3-30B-A3B-Instruct-2507 GGUF (ShapeLearn Quantized)

This is a GGUF-quantized version of Qwen3-30B-A3B-Instruct-2507, produced with **ByteShape's ShapeLearn**, which learns the optimal datatype per tensor to maintain high quality even at very low bitlengths.

To learn more about ShapeLearn and to see detailed benchmarks across GPUs, CPUs, and even the Raspberry Pi, please visit our [blog](https://byteshape.com/blogs/Qwen3-30B-A3B-Instruct-2507/).

If you have questions or want to share feedback, reach us on [Reddit](https://www.reddit.com/r/ByteShape/).
We provide **CPU and GPU optimized variants** for llama.cpp:

- **CPUs:** Models labeled as KQ, optimized for CPU inference with predominantly KQ quantization.
- **GPUs:** Models labeled as IQ, optimized for GPU inference with a hybrid approach combining KQ and IQ quantization for better throughput.

Each hardware target includes a range of models covering different size and quality tradeoffs.
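One practical way to use the tables below is to pick the highest-quality variant whose file fits your RAM or VRAM budget. Here is a minimal sketch; the entries are the two CPU (KQ) rows shown in this README's table, and `best_fit` is a hypothetical helper, not part of any ByteShape tooling:

```python
# Pick the highest-quality quantized variant that fits a memory budget.
# Data copied from the CPU table in this README (KQ-8 and KQ-9 rows);
# the remaining variants follow the same (id, bpw, size GB, quality %) shape.
VARIANTS = [
    ("KQ-8", 4.41, 16.9, 99.34),
    ("KQ-9", 4.67, 17.8, 99.75),
]

def best_fit(budget_gb: float):
    """Return the highest-quality variant whose file fits in budget_gb."""
    fitting = [v for v in VARIANTS if v[2] <= budget_gb]
    return max(fitting, key=lambda v: v[3]) if fitting else None

print(best_fit(17.0))  # KQ-8 is the only one under 17 GB
print(best_fit(18.0))  # KQ-9 wins when there is room for it
```

Remember to leave headroom beyond the file size itself for the KV cache and runtime buffers.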

![CPU quality chart]

**Table sorted by model size** (match the chart numbers to model IDs):

| Model ID | Bits/Weight | Model Size | Normalized Quality |
|---------|-------------|-----------|--------------------|
| [KQ-8](https://huggingface.co/byteshape/Qwen3-30B-A3B-Instruct-2507-GGUF/blob/main/Qwen3-30B-A3B-Instruct-2507-IQ4_XS-4.41bpw.gguf) | 4.41 | 16.9 GB | 99.34% |
| [KQ-9](https://huggingface.co/byteshape/Qwen3-30B-A3B-Instruct-2507-GGUF/blob/main/Qwen3-30B-A3B-Instruct-2507-IQ4_XS-4.67bpw.gguf) | 4.67 | 17.8 GB | 99.75% |
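The Bits/Weight column maps almost directly to file size: size ≈ total parameters × bits per weight / 8. A quick sanity-check sketch, assuming roughly 30.5B total parameters for Qwen3-30B-A3B (the MoE total, not the ~3.3B active parameters; this count is an assumption, not stated in this README):

```python
# Sanity-check: bits/weight -> approximate GGUF file size.
# Assumes ~30.5e9 total parameters (an assumption about Qwen3-30B-A3B).
TOTAL_PARAMS = 30.5e9

def approx_size_gb(bits_per_weight: float) -> float:
    # bits -> bytes -> decimal gigabytes
    return TOTAL_PARAMS * bits_per_weight / 8 / 1e9

print(f"{approx_size_gb(4.41):.1f} GB")  # ~16.8 GB, close to KQ-8's 16.9 GB
print(f"{approx_size_gb(4.67):.1f} GB")  # ~17.8 GB, matching KQ-9
```

The small gap from the listed sizes comes from non-quantized tensors (embeddings, norms) and GGUF metadata.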

### GPU Models

![GPU quality chart]

**Table sorted by model size** (match the chart numbers to model IDs):

| Model ID | Bits/Weight | Model Size | Normalized Score |
|---------|-------------|-----------|------------------|