Hugging Face – Posts

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

All HF Hub posts

posted an update 2 days ago

Post

5565

DeepSeek-V4 can now run locally with Unsloth GGUFs! 🐳

Run lossless DeepSeek-V4-Flash on 168GB RAM or
3-bit works on 110GB Mac, RAM, VRAM setups.

Run via Unsloth Studio or llama.cpp.

GGUF: unsloth/DeepSeek-V4-Flash-GGUF
Guide: https://unsloth.ai/docs/models/deepseek-v4

sergiopaniego

posted an update 1 day ago

Post

5660

Frontier models use distillation as a step of their post-training pipelines.

In 2026 it has three jobs: compress a big model into a small one, merge RL experts into a single model, and let a model teach itself.

I wrote up which frontier models use each one and how: https://huggingface.co/blog/sergiopaniego/distillation-2026

It pairs with Class 2 of the Training an Agent series Ben and I are doing, where we teach these techniques hands-on with TRL!

3 replies

loleg

posted an update 3 days ago

Post

1342

Thank you Hugging Face team for some very helpful and quick support today. Greetings from the AI for Good summit in Geneva!

1 reply

PhysiQuanty

posted an update 3 days ago

Post

4503

🧠 Arithmetic-SLM : A 30M model that manages to compute simple arithmetic better than a 3B model 🚀
WhirlwindAI/Arithmetic-SLM
WhirlwindAI/arithmetic-slm

🏆 Leaderboard ArithMark-2 🏆
🥇 Qwen/Qwen2.5-Math-1.5B = 82.08%
🥈 WhirlwindAI/Arithmetic-SLM = 78.60% (31.7M Params)
🥉 Qwen/Qwen2.5-3B = 78.44%

Example WhirlwindAI/Arithmetic-SLM =
0.5 * 0.5 = 0.25 ✅
105 + 45 / 8 = 110 ✅
(132 / 12) + (46 - 15) = 42 ✅
(10 + 28) * 3 = 114 ✅
1 * (16 + 28) = 44 ✅
(21 + 27) * (14 - 7) = 336 ❌

leaderboard = """
|              Model               |    Params    |   Score   |
|----------------------------------|--------------|-----------|
|      Qwen/Qwen2.5-Math-1.5B      |     1.54B    |   82.08%  |
|    WhirlwindAI/Arithmetic-SLM    |    31.70M    |   78.60%  | <=
|         Qwen/Qwen2.5-3B          |     3.09B    |   78.44%  |
|        Qwen/Qwen2.5-1.5B         |     1.54B    |   77.72%  |
|    Qwen/Qwen2.5-Coder-1.5B       |     1.54B    |   74.88%  |
|   HuggingFaceTB/SmolLM2-1.7B     |     1.71B    |   66.12%  |
|        Qwen/Qwen2.5-0.5B         |      494M    |   63.04%  |
| facebook/MobileLLM-R1-140M-base  |      140M    |   53.88%  |
|     SupraLabs/Supra-50M-Base     |       52M    |   27.12%  |
"""

Bench =
AxiomicLabs/ArithMark-2.0
DataSet =
WhirlwindAI/Arithmetic
By Science AND FOR SCIENCE <3

3 replies

PhysiQuanty

posted an update about 3 hours ago

Post

290

𓆦 Cicada 3301 𓆦 has been left abandoned for far too long.
🌐 We are one of the communities with the highest technical level in theoretical computer science.
🧬 We must resume the investigation and the attempts to decipher the “Liber Primus.”

Type-1-Civilisation/Liber-Primus-Not-Decoded
Type-1-Civilisation/Liber-Primus-Already-Decoded
Type-1-Civilisation/Liber-Primus-Cicada-3301
Type-1-Civilisation/Bad-Decoding-Detector

Anyone willing to join us is more than welcome.
By science, AND FOR SCIENCE!

Quazim0t0

posted an update about 10 hours ago

Post

1250

Disabling Gated Access for some of my models today. I will update this post with the list as I go. I had to go back recently and make updates to a lot of models and bit off more than I could chew with managing all the releases. I didn't realize many of you asked for access and I apologize for not accepting your access to the models you were wanting to look at. I don't like to release something fully unless I feel I put what I could into it for the moment. Some models will remain on gated access, but I will now be accepting those who request to view the repo.

Disabled Gated Access:
Quazim0t0/Byrne-VLM-131M - v2 Updates + Training Instructions
Quazim0t0/Byrne-Speech - 12M Tiny Speech model
Quazim0t0/Byrne-ASR-English - 12M Tiny ASR Model
Quazim0t0/Byrne-VE - Byrne-VE — Tiny Self-Distilled Vision Encoder (39M)

Accepting Gated Access Requests (7/9):
Quazim0t0/SpikeWhale-SNN-216M
Quazim0t0/Mycel-LM-79M
Quazim0t0/Chimera-64M
Quazim0t0/Positronic-144M
Quazim0t0/Wheeler-63M

Also uploaded my Neural Photonic Project:
Three trained nets in series: light interferes through the MZI2.pt optical core (verified 256/256), is measured by the PD.pt neural photodetector (verified 1024/1024), and folded into a single OUTPUT byte by the real ADC8 neural-CPU adder. Every value below is computed end-to-end by the three loaded, verified nets — no analytic formulas.
Demo: https://quazim0t0-neural-photonic-hybrid.hf.space/
Model Weights: Quazim0t0/neural-photonic

AND!

A work in progress:
Ashen Depths
https://quazim0t0-ashendepths.static.hf.space/index.html

satgeze

posted an update 3 days ago

Post

3529

First GGUF quants of Tencent's Hy3 (299B MoE), built before official llama.cpp support exists.

Hy3 dropped ~30 hours ago with only MLX and MXFP4 quants, both datacenter-sized. So I converted it myself using a community llama.cpp fork that implements the hy_v3 architecture.

What's in the repo:

- IQ1_M (62GB, fits a 128GB MacBook), IQ2_M (90GB), Q2_K (101GB), all with 1M context baked in via YaRN
- IQ quants are importance-matrix: bootstrap style. The static Q2_K ran RAM-resident to compute the imatrix, then IQ1_M and IQ2_M were requantized from the archived f16 with it
- Fixed chat template (the stock one uses .format() calls llama.cpp's Jinja rejects)
- Build instructions for the fork, including the two gotchas that cost me three build attempts

Honesty section, because that is how these repos work: this is EXPERIMENTAL. Not needle-certified yet (1M is baked but unverified, certification ladder will be published either way). MTP layer exists in the checkpoint but no llama.cpp build can run hy_v3 MTP inference yet, so it is not included. Real gate outputs are on the card, misses and all, judge for yourself.

satgeze/Hy3-1M-GGUF

Full quant ladder (Q3 through Q8) is mirroring to ModelScope for bigger hardware.

9 replies

davidmezzetti

posted an update 3 days ago

Post

985

CeleBERTy Small: Domain model for Pop Culture, Art, Music and Entertainment.

Build vector embeddings that perform better than larger models.

https://huggingface.co/blog/NeuML/celeberty-small

alexanderbering

posted an update 4 days ago

Post

4141

We just put our money where our manifesto is.

Our position has been that the layer of an AI system that touches your data should be open and auditable, not something you rent on trust. So instead of asking you to believe that, we made it runnable.

The ZenBrain Playground is a static HF Space that executes our open-source memory library, @zensation /algorithms (Apache-2.0, zero-dependency), live in your browser. No install, no backend, no mockup: it runs the actual published code. Four panels drive real functions from the library — Ebbinghaus-style retention curves, Hebbian strengthening/decay, sleep-consolidation replay + pruning, and Bayesian retrievability with confidence intervals. The library is vendored into the page, so what you run is the same code that's published on npm.

ZenBrain is a neuroscience-inspired 7-layer memory architecture for AI agents. The open core is the 20-module algorithm library (FSRS, Hebbian, Ebbinghaus, sleep-consolidation, Bayesian confidence, and more). On LongMemEval-500 it reaches 91.3% of a long-context oracle's accuracy at 1/106 of the per-query token budget, and takes the highest mean rank across all 12 system-judge cells (4 systems x 3 LLM judges) against Letta, Mem0, and A-Mem. It ships with 11,589 automated tests. Full details are in the arXiv preprint.

If you build agent memory, we'd genuinely like your eyes on it — open the Space, poke the algorithms, read the code, tell us where it breaks.

Live demo: zensation-ai/zenbrain-playground
Model: zensation-ai/zenbrain
Paper: arXiv:2604.23878

Quazim0t0

posted an update 6 days ago

Post

164

I built/building a small neural Raytracing/Projective-dynamics (PD)/Physics engine from scratch

The idea I wanted to test: in a projective-dynamics solver, could you replace each hand-derived local constraint projection with a learned one, while keeping the analytic parts (rotations, the global solve) exactly as they are? It's one tiny network, shared across every element and across constraint types through material tokens. A new material isn't a new network, just a new token row. Fluids fall out of the same idea, with water treated as one more token.

A few things held up in testing: one tied projector matched five separate per-material solutions, the neural fluid tracked the exact analytic solver closely on a dam-break sim, and a learned warm-start trimmed solver iterations without touching correctness.

Try it here:

(Projective-dynamics)
Dam Break Demo:
https://quazim0t0-neural-physics-engine-demo.hf.space/
FPS Shooter Demo:
https://quazim0t0-neural-combat-evolved.hf.space/

(Raytracing)
Voxel World:
https://quazim0t0-neural-world.static.hf.space/index.html

(Rigid Body)
https://quazim0t0-ashendepths.static.hf.space/

Model Repo:
Quazim0t0/neural-physics-engine
Quazim0t0/neural-raytracing
Quazim0t0/physgait-weights

Recently active users