Paper: General Agent Evaluation (2602.22953)
This is a tracking repo for Claude Code. The Open Agent Leaderboard uses it to report evaluation results on Hugging Face.
Claude Code is Anthropic's agentic coding tool. It uses extended thinking, file editing, and shell execution to solve tasks autonomously.