gg-tt (company)
Recent Activity
danielhanchen posted an update 6 days ago
Qwen3.6-27B is out now! Run it locally on 18GB RAM. 💜
Qwen3.6-27B surpasses Qwen3.5-397B-A17B on all major coding benchmarks.
GGUFs to run: unsloth/Qwen3.6-27B-GGUF
Guide + MLX: https://unsloth.ai/docs/models/qwen3.6
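For a concrete starting point, here's a minimal local-run sketch using llama-cpp-python; the quant filename and settings below are assumptions, so check the repo's file list and the guide above for the recommended ones:

```python
# Minimal sketch, assuming llama-cpp-python and a Q4_K_M quant exist in the
# repo named in the post -- check the repo's file list for exact filenames.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3.6-27B-GGUF",  # repo from the post
    filename="*Q4_K_M.gguf",             # assumed quant; glob patterns are supported
    n_ctx=8192,                          # context window
    n_gpu_layers=-1,                     # offload every layer if a GPU is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a binary search in Python."}]
)
print(out["choices"][0]["message"]["content"])
```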
danielhanchen posted an update 12 days ago
Qwen3.6-35B-A3B can now be run locally! 💜
The model is the strongest mid-sized LLM on nearly all benchmarks.
Run on 23GB RAM via Unsloth Dynamic GGUFs.
GGUFs to run: unsloth/Qwen3.6-35B-A3B-GGUF
Guide: https://unsloth.ai/docs/models/qwen3.6
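If you only want one of the Dynamic quants rather than the whole repo, a hedged sketch with huggingface_hub (the UD-Q4_K_XL pattern is an assumption; check the repo for the actual quant names):

```python
# Download a single assumed Unsloth Dynamic quant instead of the full repo.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/Qwen3.6-35B-A3B-GGUF",  # repo from the post
    local_dir="Qwen3.6-35B-A3B-GGUF",
    allow_patterns=["*UD-Q4_K_XL*"],         # assumed quant naming pattern
)
```

Afterwards, point llama.cpp (or the llama-cpp-python snippet above) at the downloaded .gguf file.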
Post
🌐 I've just published Sentence Transformers v5.4 to make the project fully multimodal for embeddings and reranking. The release also includes a modular CrossEncoder and automatic Flash Attention 2 input flattening. Details:
You can now use SentenceTransformer and CrossEncoder with text, images, audio, and video, with the same familiar API. That means you can compute embeddings for an image and a text query using model.encode(), compare them with model.similarity(), and it just works. Models like Qwen3-VL-Embedding-2B and jinaai/jina-reranker-m0 are supported out of the box.
Beyond multimodal, I also fully modularized the CrossEncoder class. It's now a torch.nn.Sequential of composable modules, just like SentenceTransformer has been. This unlocked support for generative rerankers (CausalLM-based models like mxbai-rerank-v2 and the Qwen3 rerankers) via a new LogitScore module, which wasn't possible before without custom code.
Also, Flash Attention 2 now automatically skips padding for text-only inputs. If your batch has a mix of short and long texts, this gives you a nice speedup and lower VRAM usage for free.
I wrote a blog post walking through the multimodal features with practical examples. Check it out if you want to get started, or just point your Agent to the URL: https://huggingface.co/blog/multimodal-sentence-transformers
This release has set up the groundwork for more easily introducing late-interaction models (both text-only and multimodal) into Sentence Transformers in the next major release. I'm looking forward to it!
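To make the API concrete, a minimal sketch of the flow described above, assuming encode() and predict() accept PIL images directly and that these hub ids resolve as written:

```python
# Minimal multimodal sketch; model ids are taken from the post, and passing
# PIL images straight into encode()/predict() is assumed from the description.
from PIL import Image
from sentence_transformers import CrossEncoder, SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-VL-Embedding-2B")  # assumed hub id
img_emb = model.encode(Image.open("cat.jpg"))
txt_emb = model.encode("a photo of a cat")
print(model.similarity(img_emb, txt_emb))  # same similarity API as text-only

reranker = CrossEncoder("jinaai/jina-reranker-m0")  # id from the post
scores = reranker.predict([("a photo of a cat", Image.open("cat.jpg"))])
print(scores)
```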
danielhanchen posted an update 21 days ago
You can now fine-tune Gemma 4 for free with our notebooks! 🔥
You just need 8GB VRAM to train Gemma 4 locally!
Unsloth trains Gemma 4 1.5x faster with 50% less VRAM.
GitHub: https://github.com/unslothai/unsloth
Guide + Notebooks: https://unsloth.ai/docs/models/gemma-4/train
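As a rough picture of what the notebooks do, a minimal QLoRA sketch with Unsloth's usual API; the Gemma 4 model id is an assumption, so use the exact names from the notebooks:

```python
# Minimal QLoRA sketch; "unsloth/gemma-4-e4b" is an assumed id.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-4-e4b",  # assumed id; see the notebooks
    max_seq_length=2048,
    load_in_4bit=True,                 # 4-bit base weights keep VRAM around 8GB
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                              # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)
# From here, train with TRL's SFTTrainer exactly as in the linked notebooks.
```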
danielhanchen posted an update 26 days ago
Google releases Gemma 4. ✨
Gemma 4 introduces 4 models: E2B, E4B, 26B-A4B, 31B.
The multimodal reasoning models are under Apache 2.0.
Run E2B and E4B on ~6GB RAM, and on phones. Run 26B-A4B and 31B on ~18GB RAM.
GGUFs: https://huggingface.co/collections/unsloth/gemma-4
Guide: https://unsloth.ai/docs/models/gemma-4
danielhanchen posted an update 28 days ago
A new way to use Unsloth.
Coming soon...
danielhanchen posted an update about 1 month ago
You don’t need to set LLM parameters anymore! 🚀
llama.cpp now needs only the context length and the compute your local setup provides, and Unsloth auto-applies the correct settings for each model.
Try in Unsloth Studio - now with precompiled llama.cpp binaries.
GitHub: https://github.com/unslothai/unsloth
danielhanchen posted an update about 1 month ago
Introducing Unsloth Studio ✨
A new open-source web UI to train and run LLMs.
• Run models locally on Mac, Windows, Linux
• Train 500+ models 2x faster with 70% less VRAM
• Supports GGUF, vision, audio, embedding models
• Auto-create datasets from PDF, CSV, DOCX
• Self-healing tool calling and code execution
• Compare models side by side + export to GGUF
GitHub: https://github.com/unslothai/unsloth
Blog and Guide: https://unsloth.ai/docs/new/studio
Available now on Hugging Face, NVIDIA, Docker and Colab.
danielhanchen posted an update about 2 months ago
We collaborated with NVIDIA to teach you about Reinforcement Learning and RL environments. 💚 Learn:
• Why RL environments matter + how to build them
• When RL is better than SFT
• GRPO and RL best practices
• How verifiable rewards and RLVR work
Blog: https://unsloth.ai/blog/rl-environments
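To illustrate the "verifiable rewards" idea, here is a toy reward function (not from the blog): the reward comes from a programmatic check of the completion, not from a learned reward model.

```python
import re

# Toy RLVR-style reward: 1.0 if the last number in the completion matches the
# ground-truth answer, else 0.0. The signature is illustrative, not a fixed API.
def correctness_reward(completions: list[str], answers: list[str]) -> list[float]:
    rewards = []
    for completion, answer in zip(completions, answers):
        numbers = re.findall(r"-?\d+\.?\d*", completion)
        predicted = numbers[-1] if numbers else None
        rewards.append(1.0 if predicted == answer else 0.0)
    return rewards
```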
clefourrier authored a paper about 2 months ago
danielhanchen posted an update 2 months ago
100,000+ models trained with Unsloth have now been open-sourced on 🤗Hugging Face! 🦥
Here are the most popular ones you can run locally:
1. TeichAI - GLM-4.7-Flash distilled from Claude 4.5 Opus (high)
2. Zed - Qwen Coder 7B fine-tuned for stronger coding
3. DavidAU - Llama-3.3-8B distilled from Claude 4.5 Opus (high)
4. huihui - gpt-oss made “abliterated”
Links to models:
1. TeichAI: TeichAI/GLM-4.7-Flash-Claude-Opus-4.5-High-Reasoning-Distill-GGUF
2. Zed: zed-industries/zeta
3. DavidAU: DavidAU/Llama3.3-8B-Instruct-Thinking-Claude-4.5-Opus-High-Reasoning
4. huihui: huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated
See all 100K+ models fine-tuned with Unsloth here: https://huggingface.co/models?other=u
danielhanchen posted an update 2 months ago
We collaborated with Hugging Face to show how you can use HF Jobs with Unsloth! https://huggingface.co/blog/unsloth-jobs
danielhanchen posted an update 3 months ago
We collaborated with Hugging Face to enable you to train MoE models 12× faster with 35% less VRAM via our new Triton kernels (no accuracy loss). 🤗
Train gpt-oss locally on 12.8GB VRAM with our free notebooks: https://unsloth.ai/docs/new/faster-moe
danielhanchen posted an update 3 months ago
You can now run Kimi K2.5 locally! 🔥
We shrank the 1T model to 240GB (-60%) via Dynamic 1-bit.
Get >40 tok/s on 242GB of VRAM/RAM, or 622GB for near-full precision.
GGUF: unsloth/Kimi-K2.5-GGUF
Guide: https://unsloth.ai/docs/models/kimi-k2.5
mlabonne authored 2 papers 3 months ago
danielhanchen posted an update 3 months ago
You can now fine-tune embedding models in our free Unsloth notebook! 🤗
Fine-tuning embedding models aligns their vectors with your domain-specific notion of similarity, improving retrieval, RAG, search, clustering, and recommendations on your data.
⭐ Blog + Notebooks: https://unsloth.ai/docs/new/embedding-finetuning
Unsloth trains embedding models 1.8-3.3x faster with 20% less VRAM, 2x longer context & no accuracy loss vs. FA2 setups.
We'd like to thank Hugging Face and Unsloth contributor electroglyph for making this possible!
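For orientation, a minimal sketch with the plain Sentence Transformers trainer; the linked notebooks wrap a similar flow with Unsloth's speedups, and the dataset columns and model id here are illustrative assumptions:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Tiny illustrative dataset of (anchor, positive) pairs from your domain.
train_dataset = Dataset.from_dict({
    "anchor": ["How do I reset my password?"],
    "positive": ["To reset your password, open Settings > Account."],
})

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),  # other in-batch pairs act as negatives
)
trainer.train()
```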
danielhanchen posted an update 3 months ago
Run GLM-4.7-Flash locally on your device with 24GB RAM! 🔥
It's the best-performing 30B model on SWE-Bench and GPQA. With 200K context, it excels at coding, agents, chat & reasoning.
GGUF: unsloth/GLM-4.7-Flash-GGUF
Guide: https://unsloth.ai/docs/models/glm-4.7-flash
danielhanchen posted an update 3 months ago
You can now do reinforcement learning training with 7× longer context and no accuracy loss, via our new batching algorithms.
Long reasoning chains in RL are costly, but now we enable you to train gpt-oss with GRPO & reach 380K context on a 192GB GPU.
Blog: https://unsloth.ai/docs/new/grpo-long-context
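For context, a minimal GRPO sketch with TRL's trainer, which Unsloth patches for the long-context batching described above; the model id and toy reward are illustrative assumptions:

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward preferring ~20-character completions; swap in a real reward.
def reward_len(completions, **kwargs):
    return [-abs(20 - len(c)) for c in completions]

dataset = Dataset.from_dict({"prompt": ["Write a haiku about the sea."]})

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small model for illustration
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="grpo-out", max_completion_length=128),
    train_dataset=dataset,
)
trainer.train()
```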