Doug (dougeeai)
AI & ML interests
CUDA, Sovereign AI, OSS GenAI, LLM fine-tuning, VRAM optimization, RAG, SLM Agents
Recent Activity
posted an update about 8 hours ago
## Llama-cpp-python wheels for Windows - update
Pre-compiled wheels for `llama-cpp-python` on Windows. No Visual Studio, no CUDA Toolkit setup. `pip install` and run.
### New in this update
- **sm_120 (consumer/workstation Blackwell) support.** A single wheel now covers both sm_100 (datacenter) and sm_120 (RTX 5090 / 5080 / 5070 / 5060 / 5050, RTX PRO 6000 / 5000 / 4500 / 4000 / 2000 Blackwell).
- **llama-cpp-python 0.3.20** across all four architectures (Blackwell, Ada, Ampere, Turing). Brings Gemma 4 support via the updated llama.cpp core.
- **One wheel covers Python 3.10 through 3.13.** The 0.3.20 builds use `py3-none` tagging, so per-interpreter builds are no longer needed.
- **Fixed three mislabeled 0.3.16 sm_86 wheels** that were linked against the wrong CUDA cuBLAS version. Properly built replacements are available.
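The sm_XX labels above map GPU generations to CUDA compute-capability buckets. A rough sketch of that mapping, assembled from the post plus NVIDIA's published compute capabilities (the GPU lists here are illustrative, not the wheels' exact coverage):

```python
# Map a GPU name to the CUDA architecture (sm_XX) bucket a wheel targets.
# Illustrative table only -- check the release notes for actual coverage.
ARCH_BUCKETS = {
    "sm_75": ["RTX 2060", "RTX 2070", "RTX 2080"],                # Turing
    "sm_86": ["RTX 3060", "RTX 3070", "RTX 3080", "RTX 3090"],    # Ampere
    "sm_89": ["RTX 4060", "RTX 4070", "RTX 4080", "RTX 4090"],    # Ada
    "sm_120": ["RTX 5050", "RTX 5060", "RTX 5070",
               "RTX 5080", "RTX 5090"],                           # Blackwell (consumer)
}

def arch_for(gpu_name: str) -> str:
    """Return the sm_XX bucket for a known GPU name, or raise KeyError."""
    for arch, gpus in ARCH_BUCKETS.items():
        if gpu_name in gpus:
            return arch
    raise KeyError(f"unknown GPU: {gpu_name}")

print(arch_for("RTX 5090"))  # sm_120
```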
### Coverage
- **GPUs:** RTX 20 / 30 / 40 / 50 series, RTX PRO Blackwell workstation, B100 / B200 / B300 datacenter
- **CUDA:** 11.8 / 12.1 / 13.0
- **Python:** 3.10, 3.11, 3.12, 3.13
### Download
https://github.com/dougeeai/llama-cpp-python-wheels
Linux wheels still on the roadmap. File an issue if you need a specific configuration built.
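If you script downloads against the coverage matrix, selecting a wheel comes down to matching a CUDA version and architecture bucket. A minimal sketch; the filename pattern below is an assumption for illustration, not the actual release naming — check the GitHub release assets:

```python
# Sketch: build a plausible wheel filename from the coverage matrix.
# The naming pattern is hypothetical -- verify against the real release
# assets before automating downloads.
SUPPORTED_CUDA = {"11.8", "12.1", "13.0"}

def wheel_name(version: str, cuda: str, arch: str) -> str:
    """Return a candidate wheel filename for a given CUDA version and sm_XX bucket."""
    if cuda not in SUPPORTED_CUDA:
        raise ValueError(f"unsupported CUDA version: {cuda}")
    cu_tag = "cu" + cuda.replace(".", "")
    # 0.3.20 wheels use py3-none tagging, so no per-interpreter suffix.
    return f"llama_cpp_python-{version}+{cu_tag}.{arch}-py3-none-win_amd64.whl"

print(wheel_name("0.3.20", "12.1", "sm_120"))
```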
Tags: #llama-cpp #gguf #windows #prebuilt #blackwell #rtx5090 #rtxpro6000 #rtxproblackwell #gemma4