BenchBench Leaderboard
🏋
Compare benchmarks for language models
Enterprise AI and ML, Foundation Models, Responsible AI
SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks
DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules
Evaluate AI risks with common risk taxonomies
Display ranked LLM judges based on performance metrics
Demo for the MAMMAL approach on multiple domains
Rank and compare language models using benchmarks