arxiv:2511.22173
Seungone Kim PRO
seungone
AI & ML interests
Large Language Models, LLM-as-a-Judge, Reward Model Overoptimization, Personalized Alignment
Recent Activity
upvoted a paper about 16 hours ago
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs
updated a dataset 22 days ago
prometheus-eval/peerreview-bench
published a dataset 29 days ago
prometheus-eval/peerreview-bench