In a Training Loop 🔄


Qianqian Xie

mistletoe111

AI & ML interests

None yet

authored 2 papers 3 days ago

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Paper • 2606.02060 • Published 8 days ago • 50

TVIR: Building Deep Research Agents Towards Text–Visual Interleaved Report Generation

Paper • 2606.02320 • Published 8 days ago • 13

upvoted 3 papers 5 days ago

TVIR: Building Deep Research Agents Towards Text–Visual Interleaved Report Generation

Paper • 2606.02320 • Published 8 days ago • 13

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

Paper • 2606.01993 • Published 8 days ago • 13

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Paper • 2606.02060 • Published 8 days ago • 50

updated a dataset 5 days ago

mistletoe111/webcoding1

Updated 5 days ago • 1.27k

published a dataset 6 days ago

mistletoe111/webcoding1

Updated 5 days ago • 1.27k

upvoted a paper 18 days ago

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Paper • 2605.16928 • Published 24 days ago • 93

upvoted a paper 21 days ago

OProver: A Unified Framework for Agentic Formal Theorem Proving

Paper • 2605.17283 • Published 23 days ago • 31

upvoted a paper 22 days ago

Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution

Paper • 2605.15301 • Published 26 days ago • 22

upvoted 2 papers about 2 months ago

DR³-Eval: Towards Realistic and Reproducible Deep Research Evaluation

Paper • 2604.14683 • Published Apr 16 • 36

WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models

Paper • 2604.18224 • Published Apr 20 • 22

updated a dataset about 2 months ago

NJU-LINK/DR3-Eval

Updated Apr 20 • 100 • 2.09k • 2

authored 3 papers about 2 months ago

MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues

Paper • 2510.17722 • Published Oct 20, 2025 • 20

IF-VidCap: Can Video Caption Models Follow Instructions?

Paper • 2510.18726 • Published Oct 21, 2025 • 27

DR³-Eval: Towards Realistic and Reproducible Deep Research Evaluation

Paper • 2604.14683 • Published Apr 16 • 36

upvoted 2 papers about 2 months ago

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Paper • 2503.08638 • Published Mar 11, 2025 • 73

CodeTracer: Towards Traceable Agent States

Paper • 2604.11641 • Published Apr 13 • 38

liked a dataset 2 months ago

NJU-LINK/DR3-Eval

Updated Apr 20 • 100 • 2.09k • 2

updated a model 3 months ago

mistletoe111/Simia-Agent-Qwen3-8B-SFT-v1

8B • Updated Mar 23 • 3
