Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
Paper • 2604.13602 • Published • 19
Our long-term goal is to achieve AGI by creating AI scientists. Our research directions include deeply unlocking intelligence through LLMs, building a favorable intelligent environment through AI4S foundation models, and exploring the in-depth interaction between the two.