METR
Team • non-profit • https://www.metr.org
28 followers
AI & ML interests: None defined yet.
Team members: 9
metr-evals's activity
neevparikh updated 2 datasets about 1 month ago
metr-evals/malt-public • Viewer • Updated Mar 24 • 35.9k • 227 • 4
metr-evals/malt-transcripts-public • Viewer • Updated Mar 24 • 35.9k • 69 • 3
tbroadley updated a dataset about 2 months ago
metr-evals/apps-with-input-validation • Preview • Updated Mar 4 • 230
tbroadley in metr-evals/apps-with-input-validation, about 2 months ago
#12 Unify verification scripts into verify.py, add AGENTS.md (opened about 2 months ago by tbroadley)
#11 Fix trailing empty strings in 94 train strs samples (opened about 2 months ago by tbroadley)
#10 Fix trailing spaces in expected test outputs (89 samples across train and test) (opened about 2 months ago by tbroadley)
#8 Fix trailing spaces in expected test outputs (95 samples across train and test) (opened about 2 months ago by tbroadley) • 1
#9 Re-apply fixes from PR #2 and PR #3 that were accidentally reverted by PR #5 (opened about 2 months ago by tbroadley)
#7 Fix trailing spaces in test output for sample 3341 (opened about 2 months ago by tbroadley) • 2
#6 Remove counterexample sample (id=737) from train split (opened about 2 months ago by tbroadley)
neevparikh published a dataset 2 months ago
metr-evals/apps-with-input-validation • Preview • Updated Mar 4 • 230
tbroadley in metr-evals/apps-with-input-validation, 3 months ago
#5 Fix expected outputs to match golden solutions (opened 3 months ago by tbroadley)
#4 Fix expected outputs to match golden solutions (opened 3 months ago by tbroadley) • 1
#3 Remove 615 problems with ambiguous outputs (opened 3 months ago by tbroadley)
tbroadley in metr-evals/apps-with-input-validation, 4 months ago
#2 Fix test set: whitespace issues, dependency errors, and Python 3.11+ compatibility (opened 4 months ago by tbroadley)
#1 Fix whitespace discrepancies in expected test outputs (opened 4 months ago by tbroadley)
neevparikh authored a paper about 1 year ago
Measuring AI Ability to Complete Long Tasks
Paper • 2503.14499 • Published Mar 18, 2025 • 16
tbroadley authored a paper about 1 year ago
Measuring AI Ability to Complete Long Tasks
Paper • 2503.14499 • Published Mar 18, 2025 • 16