Collections
Collections including paper arxiv:2402.03300

- DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
  Paper • 2310.16818 • Published • 33
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 55
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 61
- DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
  Paper • 2401.14196 • Published • 72

- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 125
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 60
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 11
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
  Paper • 2305.10601 • Published • 15

- GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
  Paper • 2601.05242 • Published • 230
- On Predictability of Reinforcement Learning Dynamics for Large Language Models
  Paper • 2510.00553 • Published • 9
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 104
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 145

- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 65
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 145
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 448

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 30
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 15
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
  Paper • 2602.10693 • Published • 220
- Reinforced Attention Learning
  Paper • 2602.04884 • Published • 30
- Learning to Reason in 13 Parameters
  Paper • 2602.04118 • Published • 6
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
  Paper • 2405.17604 • Published • 3

- DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
  Paper • 2511.22570 • Published • 93
- DeepSeek-OCR: Contexts Optical Compression
  Paper • 2510.18234 • Published • 93
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 448
- Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
  Paper • 2505.09343 • Published • 76