hal

community
https://hal.cs.princeton.edu/

AI & ML interests

None defined yet.

Recent Activity

kanghengliu  updated a collection 1 day ago
CORE-bench v1.1
kanghengliu  updated a dataset 1 day ago
agent-evals/core-bench-v1.1-ood

Members: Benedikt Stroebl, Sayash Kapoor, Arvind Narayanan, Zachary Siegel, Boyi Wei, Peter Kirgis, wave, Ziru Chen, Yifei Zhou, xuetianci, Ammar, NDZOMGA Franck Stéphane, Harsh Trivedi, Kangheng Liu

Collections 1

CORE-bench v1.1
Benchmark for AI agents on scientific reproducibility — mainline (39) and OOD (19) splits derived from Code Ocean capsules.
  • agent-evals/core-bench-v1.1-mainline

    Viewer • Updated 1 day ago • 39 • 51
  • agent-evals/core-bench-v1.1-ood

    Viewer • Updated 1 day ago • 19 • 35

spaces 1

Build error
Agents

Agent Leaderboard

🏆

Explore agent leaderboards and analyze task performance

Dec 5, 2024

models 0

None public yet

datasets 3

agent-evals/core-bench-v1.1-ood

Viewer • Updated 1 day ago • 19 • 35

agent-evals/core-bench-v1.1-mainline

Viewer • Updated 1 day ago • 39 • 51

agent-evals/hal_traces

Updated Feb 1 • 2.71k • 5