Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling Paper • 2602.09084 • Published 7 days ago • 27
MLLM Reasoning, Rewarding, and Understanding Collection Papers on the reasoning, rewarding, and understanding of the MLLMs and LLMs • 30 items • Updated 6 days ago • 1
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models Paper • 2602.03392 • Published 13 days ago • 52
shuoxing/qwen2-5-7b-full-sft-control-tweet-1m-en-reproduce-bs128 Text Generation • 333k • Updated 21 days ago • 18
shuoxing/qwen2-5-7b-full-sft-mix-high-tweet-1m-en-reproduce-bs128 Text Generation • 333k • Updated 21 days ago • 13
shuoxing/qwen2-5-7b-full-sft-mix-mid-tweet-1m-en-reproduce-bs128 Text Generation • 333k • Updated 22 days ago • 14
shuoxing/qwen2-5-7b-full-sft-mix-low-tweet-1m-en-reproduce-bs128 Text Generation • 333k • Updated 22 days ago • 13
shuoxing/qwen3-4b-full-sft-control-tweet-1m-en-reproduce-bs128 Text Generation • 196k • Updated 22 days ago • 14
shuoxing/qwen3-4b-full-sft-mix-high-tweet-1m-en-reproduce-bs128 Text Generation • 196k • Updated 22 days ago • 18
shuoxing/qwen3-4b-full-sft-mix-mid-tweet-1m-en-reproduce-bs128 Text Generation • 196k • Updated 22 days ago • 16
shuoxing/qwen3-4b-full-sft-mix-low-tweet-1m-en-reproduce-bs128 Text Generation • 196k • Updated 22 days ago • 18