wgcyeo/ci-feedback_response_both_ema_Qwen3-4B_reverse_kl_ema0p999_ep30 Text Generation • Updated about 3 hours ago
wgcyeo/ci-feedback_keyword_both_ema_Qwen3-4B_reverse_kl_ema0p999_ep30 Text Generation • Updated about 19 hours ago • 18
wgcyeo/ci-feedback_keyword_both_ema_Olmo-3-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated about 22 hours ago • 14
wgcyeo/ci-grpo_Qwen2.5-14B-Instruct_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated about 23 hours ago • 13
wgcyeo/ci-feedback_weighted_asym_bi_keyword_fixed_ema_Qwen2.5-7B-Instruct_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated 2 days ago • 15
wgcyeo/ci-feedback_weighted_asym_bi_kl_hybrid_fixed_ema_Qwen3-4B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated 3 days ago • 16
wgcyeo/ci-feedback_asym_bi_kl_hybrid_fixed_ema_DeepSeek-R1-Distill-Llama-8B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated 3 days ago • 18
wgcyeo/ci-grpo_Olmo-3-7B-Think_bs8_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated 4 days ago • 23
wgcyeo/ci-grpo_Qwen2.5-3B-Instruct_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated 5 days ago • 12
wgcyeo/ci-grpo_DeepSeek-R1-Distill-Qwen-7B_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated 5 days ago • 19
wgcyeo/ci-feedback_weighted_asym_bi_kl_fixed_ema_Olmo-3-7B-Think_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated 7 days ago • 22
wgcyeo/ci-grpo_DeepSeek-R1-Distill-Llama-8B_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated 7 days ago • 21
wgcyeo/ci-grpo_Llama-3.1-8B-Instruct_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated 9 days ago • 27
wgcyeo/ci-grpo_Olmo-3-7B-Instruct_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated 10 days ago • 20
wgcyeo/ci-feedback_both_ema_Olmo-3-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated 10 days ago • 14
wgcyeo/ci-feedback_allowed_ema_Olmo-3-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated 10 days ago • 25
wgcyeo/ci-feedback_weighted_asym_bi_kl_fixed_ema_Olmo-3-7B-Instruct_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated 10 days ago • 20
wgcyeo/ci-feedback_disallowed_ema_Olmo-3-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated 10 days ago • 33