- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 153
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 14
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 59
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 47
Collections
Collections including paper arxiv:2506.07900
- Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
  Paper • 2508.07785 • Published • 30
- MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs
  Paper • 2508.05257 • Published • 13
- SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment
  Paper • 2507.20984 • Published • 58
- MiniCPM4: Ultra-Efficient LLMs on End Devices
  Paper • 2506.07900 • Published • 96

- FAN: Fourier Analysis Networks
  Paper • 2410.02675 • Published • 29
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 91
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 25
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
  Paper • 2502.09509 • Published • 9

- SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
  Paper • 2412.11605 • Published • 18
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108
- Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
  Paper • 2412.17739 • Published • 41
- SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
  Paper • 2412.15443 • Published • 10

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 61
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 53
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 45
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 64