LLM Hypnotism: The Sleeper Agent Problem in AI Models
Could a large language model act like a sleeper agent, behaving perfectly normally until a secret phrase awakens it? Here’s how LLM hypnotism might happen, why it’s hard to pull off, and what it teaches us about trust in AI.