Article Benchmark Smarter: Tailor Your Model Evaluation Suite with EvalScope 8 days ago • 7
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 40 items • Updated 30 days ago • 353
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated about 6 hours ago • 155