The Gemma 4 Zero-Shot Attack Problem: Why AI Safety Theater Fails on Day One
A security researcher has shown that Google's brand-new Gemma 4 model falls to the exact same jailbreak that broke Gemma 3, without a single word changed. That is not an isolated bug. It is a symptom of how the entire industry approaches safety.
⚡ Key Takeaways
- The identical jailbreak method used on Gemma 3 worked on Gemma 4 without modification, demonstrating zero-shot attack transfer across new model versions
- Responsible disclosure is broken: researchers get flagged by safety filters while attempting to document vulnerabilities through proper channels
- Safety theater, not security: the industry optimizes for PR and headlines rather than building genuine defense mechanisms woven into model training
- The problem is systemic, not unique to Google: zero-shot attacks transfer because the underlying vulnerabilities exist across similar architectures industry-wide
Originally reported by Dev.to