GLM-4.7-Flash-18B-A3B is a pruned version of GLM-4.7-Flash, focused on math, physics, control engineering, and academic writing.

Quantization details

The embedding, attention, dense FFN, shared-expert, and output tensors are guaranteed to use full precision (BF16) in all quantized variants.
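You can check this per-tensor precision yourself. A minimal sketch using the `gguf` Python package that ships with llama.cpp; the file name below is a placeholder for whichever variant you download:

```python
# Minimal sketch: list the quantization type of every tensor in a GGUF file.
# Requires `pip install gguf`. The file name is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("GLM-4.7-Flash-18B-A3B-Q2_K.gguf")

for tensor in reader.tensors:
    # tensor_type is a GGMLQuantizationType enum, e.g. BF16, Q2_K, Q4_K
    print(f"{tensor.name}: {tensor.tensor_type.name}")
```

Tensors such as the token embedding and output head should report BF16 even in the 2-bit variant.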

Model details

Format: GGUF
Model size: 20B params
Architecture: deepseek2

Quantized variants are available at 2-bit, 4-bit, and 16-bit precision; a loading sketch follows below.
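All variants load with llama.cpp-compatible runtimes. A minimal sketch using llama-cpp-python, assuming a hypothetical local file name for a 4-bit variant:

```python
# Minimal sketch: run a completion against a local GGUF variant.
# Requires `pip install llama-cpp-python`. The model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-4.7-Flash-18B-A3B-Q4_K_M.gguf",  # placeholder file name
    n_ctx=4096,  # context window; raise if your hardware allows
)

out = llm(
    "Derive the transfer function of a series RLC low-pass filter.",
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```

Lower-bit variants trade output quality for a smaller memory footprint; the 16-bit file preserves the original weights.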


Repository: lovedheart/GLM-4.7-Flash-18B-A3B-GGUF, a quantized derivative of GLM-4.7-Flash.