GLM-4.7-Flash-GGUF

Description

This repository contains GGUF format model files for Zhipu AI's GLM-4.7-Flash.

GLM-4.7-Flash is a highly efficient 30B-A3B (30B total parameters, ~3B active per token) Mixture-of-Experts (MoE) model. It is designed to be the strongest model in the 30B parameter class and a practical option for lightweight deployment, balancing performance and efficiency.

Evaluation Results

Benchmark            GLM-4.7-Flash   Qwen3-30B-A3B-Thinking-2507   GPT-OSS-20B
AIME 25              91.6            85.0                          91.7
GPQA                 75.2            73.4                          71.5
LCB v6               64.0            66.0                          61.0
HLE                  14.4            9.8                           10.9
SWE-bench Verified   59.2            22.0                          34.0
τ²-Bench             79.5            49.0                          47.7
BrowseComp           42.8            2.29                          28.3

Files & Quantization

For the full list of available quantizations, see the Files and versions tab of this repository.
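
If you prefer to fetch a single quantization from the command line, a minimal sketch using the Hugging Face CLI looks like this (the Q4_K_M filename matches the one used in the examples below; substitute whichever file you choose from the tab):

huggingface-cli download AaryanK/GLM-4.7-Flash-GGUF \
  GLM-4.7-Flash.Q4_K_M.gguf \
  --local-dir .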

How to Run (llama.cpp)

Recommended Parameters:

  • Temperature: 1.0 (standard) or 0.7 (for stricter instruction adherence)
  • Top-P: 0.95
  • Context: set with -c; adjust based on available RAM/VRAM.

CLI Example

./llama-cli -m GLM-4.7-Flash.Q4_K_M.gguf \
  -c 8192 \
  --temp 1.0 \
  --top-p 0.95 \
  -p "Write a Python script to calculate Fibonacci numbers." \
  -cnv
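
The -cnv flag runs llama-cli in conversation mode, so the model's built-in chat template is applied automatically; there is no need to hand-write role markers such as "User:"/"Assistant:" into the prompt.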

Server Example

./llama-server -m GLM-4.7-Flash.Q4_K_M.gguf \
  --port 8080 \
  --host 0.0.0.0 \
  -c 16384 \
  -ngl 99
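
llama-server exposes an OpenAI-compatible HTTP API. As a quick check against the server started above, a request might look like the sketch below (the "model" value is informational when a single model is loaded; the sampling values mirror the recommended parameters):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-4.7-Flash",
    "messages": [
      {"role": "user", "content": "Write a Python script to calculate Fibonacci numbers."}
    ],
    "temperature": 1.0,
    "top_p": 0.95
  }'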