DeepSeek R1 dropped one year ago 🐳 and a lot has changed.
With @irenesolaiman, we’re launching a blog series about how that moment reshaped AI and open source in 2025, starting with strategic shifts and the explosion of new open models in China!
FunctionGemma Tuning Lab is a new no-code tool by @google that lets you fine-tune a model directly in the browser, using TRL behind the scenes.
TeleChat3-36B-Thinking: ✨ Native support for the Ascend + MindSpore ecosystem ✨ Inspired by DeepSeek’s architecture design, bringing training stability and efficiency gains.
It includes GDPO, the latest variant of GRPO for multi-reward RL ✨ GDPO decouples reward normalization to avoid reward collapse and improve per-reward convergence — developed by @sliuau, @SimonX et al.
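The decoupling idea can be illustrated with a toy sketch. This is only a hedged illustration of "decoupled reward normalization", not GDPO's actual implementation: the function names, the group-standardization formula, and the simple summing of per-reward advantages are all assumptions. The contrast with GRPO-style normalization of a single summed reward is the point.

```python
from statistics import mean, pstdev

def grpo_advantages(reward_groups):
    # GRPO-style: sum the reward components per sample, then
    # normalize the *summed* reward across the group. A reward
    # with a large scale can dominate the others ("reward collapse").
    totals = [sum(r) for r in reward_groups]
    mu, sigma = mean(totals), pstdev(totals) or 1.0
    return [(t - mu) / sigma for t in totals]

def gdpo_advantages(reward_groups):
    # Decoupled normalization (the idea described above): normalize
    # each reward component separately across the group, then combine
    # the per-reward advantages, so no single reward's scale can
    # drown out the others.
    n_rewards = len(reward_groups[0])
    per_reward_adv = []
    for k in range(n_rewards):
        vals = [r[k] for r in reward_groups]
        mu, sigma = mean(vals), pstdev(vals) or 1.0
        per_reward_adv.append([(v - mu) / sigma for v in vals])
    return [sum(per_reward_adv[k][i] for k in range(n_rewards))
            for i in range(len(reward_groups))]

# Two reward signals with very different scales: correctness (0/1)
# and a length penalty in the hundreds.
group = [(1.0, -200.0), (0.0, -100.0), (1.0, -150.0)]
print(grpo_advantages(group))  # length penalty dominates the ranking
print(gdpo_advantages(group))  # both signals contribute
```

With joint normalization the incorrect-but-short sample gets the highest advantage; with decoupled normalization the correct, moderately long sample does.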
StepFun has been focused on multimodal AI from the very beginning. Their latest release is a new foundation model: STEP3-VL🔥 https://huggingface.co/collections/stepfun-ai/step3-vl-10b ✨ 10B - Apache 2.0 ✨ Leads in the 10B class and competes with models 10–20× larger
✨ Hybrid architecture: a combined autoregressive + diffusion design delivers strong semantic alignment with high-fidelity details ✨ Strong performance in long, dense, and multilingual text rendering ✨ MIT licensed (VQ tokenizer & ViT weights under Apache 2.0) ✨ Now live on Hugging Face Inference Providers 🤗
Recursive Language Models (RLMs) are a new inference interface for LLMs, with cool ideas by Alex Zhang!
⚠️ LLMs struggle with long prompts → attention overload & lost info 🔄 RLMs inspect, split & call themselves on chunks, then aggregate results ✅ Handles millions of tokens, reduces noise, improves reasoning 💡 System prompt guides recursion 🎯 RLM trajectories can be used for RL training or distillation (OpenEnv+TRL!!)
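The split-recurse-aggregate loop above can be sketched in a few lines. This is a hedged toy sketch, not the actual RLM implementation: the character-based halving, the function names, and the aggregation prompt are all assumptions made for illustration.

```python
def rlm_answer(prompt, llm, max_len=2000):
    # If the prompt fits the context budget, answer it directly.
    if len(prompt) <= max_len:
        return llm(prompt)
    # Otherwise split the prompt and recurse on each half
    # (a real RLM would split on semantic boundaries, not characters).
    mid = len(prompt) // 2
    left = rlm_answer(prompt[:mid], llm, max_len)
    right = rlm_answer(prompt[mid:], llm, max_len)
    # Aggregation is itself an LLM call over the partial answers.
    return llm(f"Combine these partial answers:\n{left}\n{right}")

# Demo with a stub "LLM" that just echoes a prefix of its input.
calls = []
def stub_llm(p):
    calls.append(p)
    return p[:10]

result = rlm_answer("x" * 5000, stub_llm, max_len=2000)
print(len(calls))  # number of LLM calls made by the recursion
```

With a 5000-character prompt and a 2000-character budget, the recursion bottoms out at four leaf chunks and makes three aggregation calls, so the stub is invoked seven times; each call only ever sees a bounded slice of the input, which is the property that lets RLMs scale to millions of tokens.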
AgentCPM-Explore🔥 an on-device agent foundation model released by OpenBMB openbmb/AgentCPM-Explore ✨ 4B - Apache 2.0 ✨ Supports 100+ multi-turn environment interactions with search + verification ✨ Full training/inference stack is openly shared as well