Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16, 2025 • 166
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 435
MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era Article • Jan 15, 2025 • 48
UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages Paper • 2411.14343 • Published Nov 21, 2024 • 7
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF Text Generation • 71B • Updated Apr 13, 2025 • 4.09k • 2.06k
Reply: Interesting, but how does this approach generalize to arbitrary user query / document domains? Would you need to train a separate network for each domain / dataset?