METR

non-profit

https://www.metr.org

Recent Activity

neevparikh updated a dataset about 1 month ago

metr-evals/malt-public

neevparikh updated a dataset about 1 month ago

metr-evals/malt-transcripts-public

tbroadley updated a dataset about 2 months ago

metr-evals/apps-with-input-validation

updated 2 datasets about 1 month ago

metr-evals/malt-public

Viewer • Updated Mar 24 • 35.9k • 227 • 4

metr-evals/malt-transcripts-public

Viewer • Updated Mar 24 • 35.9k • 69 • 3

updated a dataset about 2 months ago

metr-evals/apps-with-input-validation

Preview • Updated Mar 4 • 230

in metr-evals/apps-with-input-validation about 2 months ago

Unify verification scripts into verify.py, add AGENTS.md

#12 opened about 2 months ago by

Fix trailing empty strings in 94 train strs samples

#11 opened about 2 months ago by

Fix trailing spaces in expected test outputs (89 samples across train and test)

#10 opened about 2 months ago by

Fix trailing spaces in expected test outputs (95 samples across train and test)

#8 opened about 2 months ago by

Re-apply fixes from PR #2 and PR #3 that were accidentally reverted by PR #5

#9 opened about 2 months ago by

Fix trailing spaces in test output for sample 3341

#7 opened about 2 months ago by

Remove counterexample sample (id=737) from train split

#6 opened about 2 months ago by

published a dataset 2 months ago

metr-evals/apps-with-input-validation

Preview • Updated Mar 4 • 230

in metr-evals/apps-with-input-validation 3 months ago

Fix expected outputs to match golden solutions

#5 opened 3 months ago by

Fix expected outputs to match golden solutions

#4 opened 3 months ago by

Remove 615 problems with ambiguous outputs

#3 opened 3 months ago by

in metr-evals/apps-with-input-validation 4 months ago

Fix test set: whitespace issues, dependency errors, and Python 3.11+ compatibility

#2 opened 4 months ago by

Fix whitespace discrepancies in expected test outputs

#1 opened 4 months ago by

authored a paper about 1 year ago

Measuring AI Ability to Complete Long Tasks

Paper • 2503.14499 • Published Mar 18, 2025 • 16
