Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding"
Wenkai Yang
Keven16
AI & ML interests
None yet
Recent Activity
authored a paper 26 days ago
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Organizations
None yet