Text Generation
• 1B • Updated
• 5
Text Generation
• 1B • Updated
• 23
Updated
• 1
trl-lib/Qwen2-0.5B-Reward-Math-Sheperd
Token Classification
• 0.5B • Updated
• 16
• 1
Text Generation
• 0.5B • Updated
• 1
trl-lib/Qwen2-0.5B-OnlineDPO
Text Generation
• 0.5B • Updated
• 5
• 1
Text Generation
• 0.5B • Updated
• 25
Text Generation
• 0.5B • Updated
• 3
• 2
Text Generation
• 0.5B • Updated
• 5
• 4
trl-lib/Qwen2-0.5B-Reward
Text Classification
• 0.5B • Updated
• 131
• 1
trl-lib/pythia-1b-deduped-tldr-rm
Updated
• 647
trl-lib/pythia-2.8b-deduped-tldr-online-dpo
Text Generation
• 3B • Updated
• 1
trl-lib/pythia-6.9b-deduped-tldr-offline-dpo
Text Generation
• 7B • Updated
trl-lib/pythia-2.8b-deduped-tldr-offline-dpo
Text Generation
• 3B • Updated
• 1
trl-lib/pythia-1b-deduped-tldr-offline-dpo
Text Generation
• 1B • Updated
• 3
trl-lib/pythia-6.9b-deduped-tldr-rm
Updated
trl-lib/pythia-6.9b-deduped-tldr-sft
Updated
trl-lib/pythia-2.8b-deduped-tldr-rm
Updated
trl-lib/pythia-2.8b-deduped-tldr-sft
Updated
trl-lib/pythia-6.9b-deduped-tldr-online-dpo
7B • Updated
trl-lib/pythia-1b-deduped-tldr-online-dpo
1B • Updated
• 1
trl-lib/pythia-1b-deduped-tldr-sft
1B • Updated
• 1.68k
trl-lib/qwen1.5-1.8b-dpo-cli
Updated
Text Generation
• 0.5B • Updated
• 12
Text Generation
• 2B • Updated
• 11
• 4
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.9-steps-800
Updated
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.8-steps-800
Updated
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.7-steps-800
Updated
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.6-steps-800
Updated
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.5-steps-800
Updated