Matt Frederickson (Carnegie Mellon University): “Alignment and Control with Representation Engineering”
Amy Gutmann Hall, Room 414, 3333 Chestnut Street, Philadelphia, United States

Abstract: Large Language Models (LLMs) are vulnerable to adversarial attacks, which bypass common safeguards put in place to prevent these models from generating harmful output. Notably, these attacks can be […]