Ali93H committed (verified)
Commit 32802d9 · Parent(s): adaec81

Upload README.md with huggingface_hub

Files changed (1)
1. README.md (+5, −6)
README.md CHANGED
@@ -13,9 +13,9 @@ tags:
 
 # Qwen3-30B-A3B-Instruct-2507 GGUF (ShapeLearn Quantized)
 
-This is a GGUF-quantized version of Qwen3-30B-A3B-Instruct-2507 produced with **ByteShape's ShapeLearn**, which learns the optimal datatype per tensor to maintain high quality even at very low bit lengths (the exclusive focus of this release).
+This is a GGUF-quantized version of Qwen3-30B-A3B-Instruct-2507 produced with **ByteShape's ShapeLearn**, which learns the optimal datatype per tensor to maintain high quality even at very low bitlengths.
 
-To learn more about ShapeLearn and to see detailed benchmarks across GPUs, CPUs, and even the Raspberry Pi, please visit our [blog](https://byteshape.com/blogs/Qwen3-30B-A3B-I-2507/).
+To learn more about ShapeLearn and to see detailed benchmarks across GPUs, CPUs, and even the Raspberry Pi, please visit our [blog](https://byteshape.com/blogs/Qwen3-30B-A3B-Instruct-2507/).
 
 If you have questions or want to share feedback, reach us on [Reddit](https://www.reddit.com/r/ByteShape/).
 
@@ -25,7 +25,7 @@ If you have questions or want to share feedback, reach us on [Reddit](https://ww
 We provide **CPU and GPU optimized variants** for llama.cpp:
 
 - **CPUs:** Models labeled as KQ, optimized for CPU inference with predominantly KQ quantization.
-- **Nvidia GPUs:** Models labeled as IQ, optimized for GPU inference with a hybrid approach combining KQ and IQ quantization for better throughput.
+- **GPUs:** Models labeled as IQ, optimized for GPU inference with a hybrid approach combining KQ and IQ quantization for better throughput.
 
 Each hardware target includes a range of models covering different size and quality tradeoffs.
 
@@ -37,7 +37,7 @@ The charts below show **quality vs tokens per second** for each device, comparin
 
 ![CPU Benchmark - Intel](img/Intel.png)
 
-**Table sorted by inference speed** (match the chart numbers to model IDs):
+**Table sorted by model size** (match the chart numbers to model IDs):
 
 | Model ID | Bits/Weight | Model Size | Normalized Quality |
 |---------|-------------|-----------|--------------------|
@@ -51,12 +51,11 @@ The charts below show **quality vs tokens per second** for each device, comparin
 | [KQ-8](https://huggingface.co/byteshape/Qwen3-30B-A3B-Instruct-2507-GGUF/blob/main/Qwen3-30B-A3B-Instruct-2507-IQ4_XS-4.41bpw.gguf) | 4.41 | 16.9 GB | 99.34% |
 | [KQ-9](https://huggingface.co/byteshape/Qwen3-30B-A3B-Instruct-2507-GGUF/blob/main/Qwen3-30B-A3B-Instruct-2507-IQ4_XS-4.67bpw.gguf) | 4.67 | 17.8 GB | 99.75% |
 
-
 ### GPU Models
 
 ![GPU Benchmark - RTX 5090](img/RTX5090.png)
 
-**Table sorted by inference speed** (match the chart numbers to model IDs):
+**Table sorted by model size** (match the chart numbers to model IDs):
 
 | Model ID | Bits/Weight | Model Size | Normalized Score |
 |---------|-------------|-----------|------------------|
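As a quick sanity check on the tables in the diff above, a GGUF file's size is roughly parameter count × bits per weight ÷ 8. A minimal sketch, assuming ~30.5B total parameters for Qwen3-30B-A3B-Instruct-2507 (the exact count, embeddings, and GGUF metadata shift the result slightly):

```python
# Rough GGUF size estimate: size_bytes ≈ params * bits_per_weight / 8.
# PARAMS is an assumption (~30.5B total parameters for Qwen3-30B-A3B).
PARAMS = 30.5e9

def approx_size_gb(bits_per_weight: float) -> float:
    """Estimated file size in decimal gigabytes for a given bpw."""
    return PARAMS * bits_per_weight / 8 / 1e9

# Compare against two rows from the CPU table (4.41 bpw -> 16.9 GB, 4.67 bpw -> 17.8 GB)
for bpw, listed in [(4.41, 16.9), (4.67, 17.8)]:
    print(f"{bpw} bpw -> ~{approx_size_gb(bpw):.1f} GB (table lists {listed} GB)")
```

The estimates land within about 0.1 GB of the listed sizes, which is expected since per-tensor datatypes and file metadata make the true average bpw only approximately uniform.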