Reward Bench Leaderboard 📐 (430): Explore and compare model scores on RewardBench benchmarks
Open-LLM performances are plateauing, let’s make the leaderboard steep again 🏔 (127): Explore and compare advanced language models on a new leaderboard