GLM-4.7-Flash-18B-A3B is a pruned version of GLM-4.7-Flash, focused on math, physics, control engineering, and academic writing.

Quantization details

The embedding, attention, dense FFN, shared-expert, and output tensors are guaranteed to use full precision (BF16) in all quantized variants.
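You can check this per-tensor precision yourself. A minimal sketch using the `gguf` Python package that ships with llama.cpp; the file name below is a placeholder for whichever variant you download:

```python
# Minimal sketch: list the quantization type of every tensor in a GGUF file.
# Requires `pip install gguf`. The file name is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("GLM-4.7-Flash-18B-A3B-Q2_K.gguf")

for tensor in reader.tensors:
    # tensor_type is a GGMLQuantizationType enum, e.g. BF16, Q2_K, Q4_K
    print(f"{tensor.name}: {tensor.tensor_type.name}")
```

Tensors such as the token embedding and output head should report BF16 even in the 2-bit variant.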

Model details

Format: GGUF
Model size: 20B params
Architecture: deepseek2

Quantized variants are available at 2-bit, 4-bit, and 16-bit precision; a loading sketch follows below.
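All variants load with llama.cpp-compatible runtimes. A minimal sketch using llama-cpp-python, assuming a hypothetical local file name for a 4-bit variant:

```python
# Minimal sketch: run a completion against a local GGUF variant.
# Requires `pip install llama-cpp-python`. The model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-4.7-Flash-18B-A3B-Q4_K_M.gguf",  # placeholder file name
    n_ctx=4096,  # context window; raise if your hardware allows
)

out = llm(
    "Derive the transfer function of a series RLC low-pass filter.",
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```

Lower-bit variants trade output quality for a smaller memory footprint; the 16-bit file preserves the original weights.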


Repository: lovedheart/GLM-4.7-Flash-18B-A3B-GGUF, a quantized derivative of GLM-4.7-Flash.