Thomas Broadley
tbroadley
0 followers · 1 following
AI & ML interests
None yet
Recent Activity
New activity (2 days ago) in metr-evals/apps-with-input-validation: Remove counterexample sample (id=737) from train split
Updated a dataset (23 days ago): metr-evals/apps-with-input-validation
New activity (26 days ago) in metr-evals/apps-with-input-validation: Fix expected outputs to match golden solutions
Organizations
tbroadley's activity
New activity in metr-evals/apps-with-input-validation (2 days ago)
#6 Remove counterexample sample (id=737) from train split • opened 2 days ago by tbroadley
Updated a dataset (23 days ago)
metr-evals/apps-with-input-validation • Preview • Updated 2 days ago • 186
New activity in metr-evals/apps-with-input-validation (26 days ago)
#5 Fix expected outputs to match golden solutions • opened 26 days ago by tbroadley
#4 Fix expected outputs to match golden solutions • opened 26 days ago by tbroadley • 1
New activity in metr-evals/apps-with-input-validation (about 1 month ago)
#3 Remove 615 problems with ambiguous outputs • opened about 1 month ago by tbroadley
New activity in metr-evals/apps-with-input-validation (about 2 months ago)
#2 Fix test set: whitespace issues, dependency errors, and Python 3.11+ compatibility • opened about 2 months ago by tbroadley
#1 Fix whitespace discrepancies in expected test outputs • opened about 2 months ago by tbroadley
Upvoted a paper (2 months ago)
Measuring AI Ability to Complete Long Tasks • Paper 2503.14499 • Published Mar 18, 2025 • 16
Authored a paper (12 months ago)
Measuring AI Ability to Complete Long Tasks • Paper 2503.14499 • Published Mar 18, 2025 • 16