Papers
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
models 0
None public yet
datasets 10
benchflow/benchmarks
Updated • 45
benchflow/skillsbench-research-artifacts
Updated • 5
benchflow/skillsbench
Viewer • Updated • 91 • 26
benchflow/skillsbench-leaderboard
Updated • 18
benchflow/skillsbench-trajectories-apr2026
Updated • 106 • 1
benchflow/skillsbench-data
Viewer • Updated • 94.3k • 74
benchflow/ClawsBench
Viewer • Updated • 7.83k • 553 • 1
benchflow/artifacts
Preview • Updated • 29
benchflow/skills_parquet
Viewer • Updated • 35.5k • 17 • 1
benchflow/skills
Updated • 58