Citaman's Collections

LLM From Scratch - Datasets

Updated Mar 14

  • Skylion007/openwebtext

    Viewer • Updated Dec 26, 2025 • 8.01M • 73k • 504

  • JeanKaddour/minipile

    Viewer • Updated Jun 20, 2023 • 1.01M • 23k • 143

  • Locutusque/TM-DATA

    Viewer • Updated Oct 15, 2024 • 2.77M • 138 • 11

  • PleIAs/French-PD-Newspapers

    Viewer • Updated Mar 19, 2024 • 2.25M • 604 • 69

  • euclaise/MiniCoT

    Viewer • Updated Jan 23, 2024 • 129k • 17 • 7

  • euirim/goodwiki

    Viewer • Updated Sep 11, 2023 • 44.8k • 166 • 54

  • euclaise/mathoverflow-accepted

    Viewer • Updated Oct 20, 2023 • 62.6k • 100 • 4

  • Locutusque/UltraTextbooks

    Viewer • Updated Feb 2, 2024 • 5.52M • 357 • 198

  • TempoFunk/webvid-10M

    Viewer • Updated Aug 19, 2023 • 10.7M • 9.21k • 90

  • HuggingFaceTB/cosmopedia

    Viewer • Updated Aug 12, 2024 • 31.1M • 17.9k • 684

  • HuggingFaceGECLM/REDDIT_submissions

    Viewer • Updated Mar 17, 2023 • 47.2M • 1.24k • 11

  • togethercomputer/RedPajama-Data-V2

    Updated Nov 21, 2024 • 2.42k • 401

  • stepfun-ai/Step-3.5-Flash-SFT

    Viewer • Updated Mar 14 • 1.62M • 25.9k • 325