Hamed Hassani (University of Pennsylvania): “Robustness in the Era of LLMs: Jailbreaking Attacks and Defenses”
Raisler Lounge (Room 225), Towne Building, 220 S 33rd Street, Philadelphia
Abstract: Despite efforts to align large language models (LLMs) with human intentions, popular LLMs such as ChatGPT, Llama, Claude, and Gemini are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into […]