Xiaohan Xu

Tebmer

https://tebmer.github.io/

tebmer

AI & ML interests

Text-to-SQL

Recent Activity

liked a Space 11 days ago

Recommend Similar Papers

🌖

189

Get similar paper recommendations from a Hugging Face link

published a dataset about 1 month ago

birdsql/transfer

Updated Apr 30 • 126

updated a dataset about 1 month ago

birdsql/transfer

Updated Apr 30 • 126

updated 3 datasets 3 months ago

published a dataset 3 months ago

birdsql/livesqlbench-large-v1

Viewer • Updated Mar 2 • 480 • 375 • 3

upvoted 2 papers 3 months ago

On Data Engineering for Scaling LLM Terminal Capabilities

Paper • 2602.21193 • Published Feb 24 • 103

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models

Paper • 2602.17684 • Published Feb 4 • 22

updated 2 datasets 4 months ago

birdsql/bird-interact-full

Viewer • Updated Feb 9 • 600 • 845 • 3

birdsql/bird-interact-lite

Viewer • Updated Feb 9 • 300 • 1.07k • 4

upvoted 2 papers 4 months ago

OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

Paper • 2602.05843 • Published Feb 5 • 61

TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents

Paper • 2602.02196 • Published Feb 2 • 35

updated a dataset 5 months ago

birdsql/mini-interact

Viewer • Updated Jan 14 • 300 • 724

upvoted a paper 6 months ago

Budget-Aware Tool-Use Enables Effective Agent Scaling

Paper • 2511.17006 • Published Nov 21, 2025 • 34

published a dataset 7 months ago

birdsql/mini-interact

Viewer • Updated Jan 14 • 300 • 724

upvoted 4 papers 8 months ago

PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning

Paper • 2510.13809 • Published Oct 15, 2025 • 38

MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

Paper • 2509.24002 • Published Sep 28, 2025 • 180

WildChat: 1M ChatGPT Interaction Logs in the Wild

Paper • 2405.01470 • Published May 2, 2024 • 65

R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?

Paper • 2510.08189 • Published Oct 9, 2025 • 28
