PAST2HARM: A Simple Adaptive Past Tense Attack for Jailbreaking Multimodal AI

AI & ML··2 min read·via ArXivOriginal source →

PAST2HARM: A Simple Adaptive Past Tense Attack for Jailbreaking Multimodal AI

arXiv:2605.27545v1 Announce Type: new Abstract: Jailbreak attacks on multimodal AI systems remain underexplored, even though unsafe image generation can have more severe consequences than unsafe text and current defenses are relatively immature. We introduce PAST2HARM, a simple yet effective adaptive jailbreak framework that bypasses refusal training in state of the art multimodal text to image models. Building on prior findings that past tense reformulations can evade safeguards, PAST2HARM sys

More Stories