Collections

Collections including paper arxiv:2506.07900

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 153
Orion-14B: Open-source Multilingual Large Language Models

Paper • 2401.12246 • Published Jan 20, 2024 • 14
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24, 2024 • 59
MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24, 2024 • 47

Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts

Paper • 2508.07785 • Published Aug 11, 2025 • 30
MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs

Paper • 2508.05257 • Published Aug 7, 2025 • 13
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment

Paper • 2507.20984 • Published Jul 28, 2025 • 58
MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 96

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 96

MiniCPM4: Ultra-Efficient LLMs on End Devices

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 96
openbmb/MiniCPM-SALA

Text Generation • 9B • Updated 29 days ago • 3.19k • 675
openbmb/MiniCPM4.1-8B

Text Generation • 8B • Updated Oct 24, 2025 • 35k • 389
openbmb/MiniCPM4.1-8B-GGUF

Text Generation • 8B • Updated Sep 5, 2025 • 176 • 16

interesting architecture

FAN: Fourier Analysis Networks

Paper • 2410.02675 • Published Oct 3, 2024 • 29
Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11, 2025 • 91
Scalable-Softmax Is Superior for Attention

Paper • 2501.19399 • Published Jan 31, 2025 • 25
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Paper • 2502.09509 • Published Feb 13, 2025 • 9

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17, 2025 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 339
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 274
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 306

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 96

Small Language Models (SLMs)

google/gemma-3n-E4B-it-litert-preview

Image-Text-to-Text • Updated May 26, 2025 • 1.48k
google/gemma-3n-E2B-it-litert-preview

Image-Text-to-Text • Updated May 20, 2025 • 579
openbmb/MiniCPM4-0.5B

Text Generation • 0.4B • Updated Oct 20, 2025 • 12.9k • 77
microsoft/Phi-4-mini-instruct

Text Generation • Updated Dec 10, 2025 • 1.57M • 732

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

Paper • 2412.11605 • Published Dec 16, 2024 • 18
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

Paper • 2412.17739 • Published Dec 23, 2024 • 41
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval

Paper • 2412.15443 • Published Dec 19, 2024 • 10

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 61
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 53
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 45
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 64
