Article How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs 8 days ago • 53
TutorBench: A Benchmark To Assess Tutoring Capabilities Of Large Language Models Paper • 2510.02663 • Published Oct 3, 2025 • 2
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published 8 days ago • 177
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI Paper • 2311.16502 • Published Nov 27, 2023 • 40
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled Image-Text-to-Text • 28B • Updated 10 days ago • 589k • 2.67k
MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors Paper • 2502.18940 • Published Feb 26, 2025 • 3
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning Paper • 2505.15607 • Published May 21, 2025 • 4
PERSONA Collection Collection of various datasets related to the PERSONA paper. • 5 items • Updated Apr 16, 2025 • 5
Big-Math Collection This collection contains assets associated with the Big-Math dataset, a high-quality collection of over 250,000 math questions with verifiable answers • 4 items • Updated Apr 16, 2025 • 7