arxiv:2603.01421

SciDER: Scientific Data-centric End-to-end Researcher

Published on Mar 2

Submitted by Ke Lin on Mar 4

AI4Research

Authors:

Ke Lin ,

Abstract

SciDER automates scientific research by processing raw experimental data through collaborative agents that generate hypotheses and experimental designs while executing code, demonstrating superior performance in data-driven discovery compared to general-purpose models.

AI-generated summary

Automated scientific discovery with large language models is transforming the research lifecycle from ideation to experimentation, yet existing agents struggle to autonomously process raw data collected from scientific experiments. We introduce SciDER, a data-centric end-to-end system that automates the research lifecycle. Unlike traditional frameworks, our specialized agents collaboratively parse and analyze raw scientific data, generate hypotheses and experimental designs grounded in specific data characteristics, and write and execute corresponding code. Evaluation on three benchmarks shows SciDER excels in specialized data-driven scientific discovery and outperforms general-purpose agents and state-of-the-art models through its self-evolving memory and critic-led feedback loop. SciDER is distributed as a modular Python package on PyPI with a lightweight web interface, aiming to accelerate autonomous, data-driven research and to be accessible to all researchers and developers.


Community

leonardklin

Paper author Paper submitter about 14 hours ago

SciDER is designed as a data-centric end-to-end system that automates the scientific research lifecycle. The system integrates a research framework comprising ideation, data analysis, experimentation, and iterative improvement. It supports flexible inputs such as text, raw data, code, and prior papers and codebases. SciDER also offers a lightweight web interface where researchers can upload their data and research topics, allowing the system to automatically create a closed-loop research cycle to propose and verify new ideas.

The contributions are threefold:

We introduce SciDER, a modular system that automates the full research lifecycle through specialized agents and an innovative self-evolving memory mechanism that supports continuous test-time memorizing and learning.
We propose a data-centric approach that grounds code generation of experiments in autonomous experimental analysis, enabling superior performance on interdisciplinary research problems.
Extensive empirical analyses show that SciDER greatly outperforms current baselines on the AI-Idea-Bench, MLEBench, and SciCode benchmarks, demonstrating its efficacy on challenging research-level scientific reasoning and coding tasks.
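
For readers who want a concrete picture of how a critic-led feedback loop with a persistent memory might fit together, here is a minimal sketch. It is not SciDER's implementation or API: `Memory`, `run_with_critic`, `toy_propose`, and `toy_critique` are hypothetical names invented for illustration, and the toy functions stand in for what would be LLM-backed agents in a real system. The only idea taken from the description above is that critic feedback is stored and reused by later proposals.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Memory:
    """Hypothetical append-only store of lessons learned across attempts."""
    lessons: List[str] = field(default_factory=list)

    def add(self, lesson: str) -> None:
        # Skip duplicates so repeated feedback does not bloat the store.
        if lesson not in self.lessons:
            self.lessons.append(lesson)


def run_with_critic(
    propose: Callable[[str, List[str]], str],
    critique: Callable[[str], Tuple[float, str]],
    task: str,
    memory: Memory,
    max_rounds: int = 3,
    threshold: float = 0.8,
) -> Tuple[str, float]:
    """Propose, score, and revise until the critic is satisfied or rounds run out."""
    best, best_score = "", float("-inf")
    for _ in range(max_rounds):
        candidate = propose(task, memory.lessons)
        score, feedback = critique(candidate)
        if score > best_score:
            best, best_score = candidate, score
        if score >= threshold:
            break
        memory.add(feedback)  # feedback persists and shapes later proposals
    return best, best_score


# Toy stand-ins for LLM-backed agents (assumptions for illustration only).
def toy_propose(task: str, lessons: List[str]) -> str:
    plan = f"plan for {task}"
    if "add a baseline" in lessons:
        plan += " + baseline"
    if "report variance" in lessons:
        plan += " + variance"
    return plan


def toy_critique(candidate: str) -> Tuple[float, str]:
    if "baseline" not in candidate:
        return 0.3, "add a baseline"
    if "variance" not in candidate:
        return 0.6, "report variance"
    return 0.9, "ok"
```

Calling `run_with_critic(toy_propose, toy_critique, "spectra", Memory())` takes three rounds: the first two proposals are rejected, their feedback is written to memory, and the third proposal incorporates both lessons and passes the threshold. Because the `Memory` object is passed in rather than created inside the loop, the same instance can be reused across tasks, which is one simple way to model memory that accumulates at test time.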


