Thomas Broadley

tbroadley

AI & ML interests

None yet

Recent Activity

New activity in metr-evals/apps-with-input-validation 2 days ago

Remove counterexample sample (id=737) from train split

#6 opened 2 days ago by

updated a dataset 23 days ago

metr-evals/apps-with-input-validation

Preview • Updated 2 days ago • 186

New activity in metr-evals/apps-with-input-validation 26 days ago

Fix expected outputs to match golden solutions

#5 opened 26 days ago by

Fix expected outputs to match golden solutions

#4 opened 26 days ago by

New activity in metr-evals/apps-with-input-validation about 1 month ago

Remove 615 problems with ambiguous outputs

#3 opened about 1 month ago by

New activity in metr-evals/apps-with-input-validation about 2 months ago

Fix test set: whitespace issues, dependency errors, and Python 3.11+ compatibility

#2 opened about 2 months ago by

Fix whitespace discrepancies in expected test outputs

#1 opened about 2 months ago by

upvoted a paper 2 months ago

Measuring AI Ability to Complete Long Tasks

Paper • 2503.14499 • Published Mar 18, 2025 • 16

authored a paper 12 months ago

Measuring AI Ability to Complete Long Tasks

Paper • 2503.14499 • Published Mar 18, 2025 • 16