David Golchinfar
PRO
DavidGF
65 followers · 47 following
https://vago-solutions.ai
DavidGFar
dgolchin
AI & ML interests
Fine-tuning LLMs; improving the German language understanding and text generation of LLMs
Recent Activity
liked a model 13 days ago: DataScience-UIBK/Reason-mxbai-colbert-v0-32m
reacted to anakin87's post with ❤️ 13 days ago:
A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe

I took https://huggingface.co/LiquidAI/LFM2-2.6B and trained it through play. 🧑‍🍳

Here's how:
1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies

Done! Beats GPT-5-mini 🏆

---

🎮 Play against the model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
🤗 Model: https://huggingface.co/anakin87/LFM2-2.6B-mr-tictactoe
📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course
🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
liked a Space 13 days ago: anakin87/LFM2-2.6B-mr-tictactoe
DavidGF's models (1)
DavidGF/SauerkrautTTS-Preview-0.1-Q8_0-GGUF • 3B • Updated Apr 2, 2025 • 18