Models for "RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments" - https://arxiv.org/abs/2511.07317
Hamish Ivison
hamishivi
AI & ML interests
NLP :)
Recent Activity
updated a model about 14 hours ago: hamishivi/step500-test2
published a model about 14 hours ago: hamishivi/step500-test2
updated a model about 15 hours ago: hamishivi/tmax-qwen3.5-4b-sft-20260313
Organizations
RLVE
Models for "RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments" - https://arxiv.org/abs/2511.07317
Large-Scale Data Selection for Instruction Tuning
Datasets and models associated with the paper "Large-Scale Data Selection for Instruction Tuning" (https://arxiv.org/abs/2503.01807)
models 243
hamishivi/step500-test2 • 196k
hamishivi/tmax-qwen3.5-4b-sft-20260313 • Image-Text-to-Text • 5B
hamishivi/tmax-qwen3-4b-sft-20260313 • Text Generation • 4B
hamishivi/tmax-qwen3.5-4b-sft-20260313-mlx • Text Generation • 4B • 124
hamishivi/random_rewards_8401_step2500 • 8B • 30
hamishivi/random_rewards_step1_5k • 8B • 33
hamishivi/random_rewards_step2k • 8B • 33
hamishivi/step500_test • 196k • 30
hamishivi/random_rewards_step1000 • 8B • 27
hamishivi/rl_rag_random_rewards_step500 • 8B • 28
datasets 199
hamishivi/rlenv-appworld-eval • 57 • 31
hamishivi/rlenv-appworld-train • 90 • 32
hamishivi/rlenv-appworld-eval-nothink • 57 • 7
hamishivi/rlenv-appworld-train-nothink • 90 • 10
hamishivi/rlenv-guess-number-nothink • 100 • 25
hamishivi/rlenv-counter-nothink • 100 • 21
hamishivi/agent-task-combined • 146
hamishivi/rlenv-guess-number • 100 • 22
hamishivi/rlenv-counter • 100 • 10
hamishivi/rlenv-wordle-nothink • 2k • 99