Chirag Agarwal
AikyamLab
AI & ML interests
Explainability and Interpretability; AI Safety; AI Alignment
Recent Activity
upvoted a paper 4 days ago
The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages
submitted a paper 4 days ago
The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages
upvoted a paper about 1 month ago
Towards Understanding the Robustness of Sparse Autoencoders