Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers • Paper • 2601.04890 • Published Jan 8
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration • Paper • 2602.01734 • Published Feb 2
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers • Paper • 2602.15322 • Published about 1 month ago
Flash-KMeans: Fast and Memory-Efficient Exact K-Means • Paper • 2603.09229 • Published 10 days ago