# GLM-4.7-Flash-GGUF

## Description
This repository contains GGUF format model files for Zhipu AI's GLM-4.7-Flash.
GLM-4.7-Flash is a highly efficient 30B-A3B Mixture-of-Experts (MoE) model (roughly 30B total parameters with about 3B active per token). It is designed to be the strongest model in the 30B parameter class, offering a powerful option for lightweight deployment that balances performance and efficiency.
## Evaluation Results
| Benchmark | GLM-4.7-Flash | Qwen3-30B-A3B-Thinking-2507 | GPT-OSS-20B |
|---|---|---|---|
| AIME 25 | 91.6 | 85.0 | 91.7 |
| GPQA | 75.2 | 73.4 | 71.5 |
| LCB v6 | 64.0 | 66.0 | 61.0 |
| HLE | 14.4 | 9.8 | 10.9 |
| SWE-bench Verified | 59.2 | 22.0 | 34.0 |
| τ²-Bench | 79.5 | 49.0 | 47.7 |
| BrowseComp | 42.8 | 2.29 | 28.3 |
## Files & Quantization

To see the available quantized files, please check the **Files and versions** tab of this repository.
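If you prefer the command line, a single quantization can be fetched with `huggingface-cli`. This is a minimal sketch: the filename below is taken from the examples later in this card and may differ from the quantizations actually published, so check the Files and versions tab first.

```bash
# Sketch: download one GGUF file from this repository with huggingface-cli.
# The filename is assumed from the examples in this card; substitute the
# quantization you actually want.
huggingface-cli download AaryanK/GLM-4.7-Flash-GGUF \
  GLM-4.7-Flash.Q4_K_M.gguf \
  --local-dir .
```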
## How to Run (llama.cpp)

**Recommended Parameters:**

- Temperature: `1.0` (standard) or `0.7` (for stricter adherence)
- Top-P: `0.95`
- Context: set with `-c` (adjust based on available RAM)
### CLI Example

```bash
./llama-cli -m GLM-4.7-Flash.Q4_K_M.gguf \
  -c 8192 \
  --temp 1.0 \
  --top-p 0.95 \
  -p "User: Write a Python script to calculate Fibonacci numbers.\nAssistant:" \
  -cnv
```
Server Example
./llama-server -m GLM-4.7-Flash.Q4_K_M.gguf \
--port 8080 \
--host 0.0.0.0 \
-c 16384 \
-ngl 99
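Once running, `llama-server` exposes an OpenAI-compatible HTTP API. Below is a minimal sketch of querying the chat completions endpoint with `curl`, assuming the server was started as above on port 8080; the sampling parameters mirror the recommended settings from this card.

```bash
# Sketch: query llama-server's OpenAI-compatible chat completions endpoint.
# Assumes the server example above is running locally on port 8080.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a Python script to calculate Fibonacci numbers."}
    ],
    "temperature": 1.0,
    "top_p": 0.95
  }'
```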
## Base Model

zai-org/GLM-4.7-Flash