Common Crawl Foundation

Team
non-profit
Verified
https://commoncrawl.org
commoncrawl

AI & ML interests

Crawled data and metadata

Recent Activity

tvaughan  updated a dataset 17 days ago
commoncrawl/statistics
greglindahl  published a dataset 18 days ago
commoncrawl/host-index-testing-v2
malteos  updated a Space 20 days ago
commoncrawl/cc-citations

Team members: Thom Vaughan, Pedro Ortiz Suarez, Paul Lazar, Greg Lindahl, Ford H, Jen English, Sebastian Nagel, Laurie Burchell, Hande Celikkanat, malteos, Thijs Dalhuijsen, Luca, Catherine Arnett

commoncrawl's datasets (7)

commoncrawl/statistics

Viewer • Updated 17 days ago • 611k • 351 • 26

commoncrawl/CommonLID

Viewer • Updated 30 days ago • 373k • 409 • 44

commoncrawl/gneissweb-annotation-host-testing-v1

Viewer • Updated Dec 11, 2025 • 617M • 214

commoncrawl/gneissweb-annotation-url-testing-v1

Viewer • Updated Dec 10, 2025 • 11.5B • 3.01k

commoncrawl/host-index-testing-v2

Preview • Updated Nov 10, 2025 • 743

commoncrawl/citations

Viewer • Updated Oct 16, 2025 • 9.18k • 51 • 1

commoncrawl/eot2024_hostlevel_logs

Viewer • Updated Oct 9, 2024 • 271k • 7 • 1