Executive Summary
| Summary | |
|---|---|
| Title | Various GPT services are vulnerable to two systemic jailbreaks, allowing bypass of safety guardrails |
| Information | | | |
|---|---|---|---|
| Name | VU#667211 | First Vendor Publication | 2025-04-29 |
| Vendor | VU-CERT | Last Vendor Modification | 2025-04-29 |
| Severity (Vendor) | N/A | Revision | M |
Security-Database Scoring CVSS v3
| CVSS Vector: N/A | | | |
|---|---|---|---|
| Overall CVSS Score | N/A | | |
| Base Score | N/A | Environmental Score | N/A |
| Impact Subscore | N/A | Temporal Score | N/A |
| Exploitability Subscore | N/A | | |
Security-Database Scoring CVSS v2
| CVSS Vector: | | | |
|---|---|---|---|
| CVSS Base Score | N/A | Attack Range | N/A |
| CVSS Impact Score | N/A | Attack Complexity | N/A |
| CVSS Exploit Score | N/A | Authentication | N/A |
Detail
Overview

Two systemic jailbreaks, affecting a number of generative AI services, were discovered. These jailbreaks can result in the bypass of safety protocols and allow an attacker to instruct the corresponding LLM to provide illicit or dangerous content. The first jailbreak, called "Inception," is facilitated by prompting the AI to imagine a fictitious scenario. The scenario can then be adapted to another one, wherein the AI acts as though it has no safety guardrails. The second jailbreak is facilitated by asking the AI how it should not reply to a specific request. Both jailbreaks, when supplied to multiple AI models with nearly identical syntax, result in a safety guardrail bypass. This indicates a systemic weakness across many popular AI systems.

Description

Two systemic jailbreaks, affecting several generative AI services, have been discovered. These jailbreaks, when performed against AI services with the exact same syntax, result in a bypass of safety guardrails on affected systems. The first jailbreak is facilitated by prompting the AI to imagine a fictitious scenario, which can then be adapted into a second scenario nested within the first. Continued prompting within the second scenario's context can bypass safety guardrails and allow the generation of malicious content. This jailbreak, named "Inception" by the reporter, affects the following vendors:

The second jailbreak is facilitated by prompting the AI to answer a question with how it should not reply within a certain context. The AI can then be further prompted with requests to respond as normal, and the attacker can pivot back and forth between illicit questions that bypass safety guardrails and normal prompts. This jailbreak affects the following vendors:

Impact

These jailbreaks, while of low severity on their own, bypass the security and safety guidelines of all affected AI services, allowing an attacker to abuse them for instructions on various illicit topics, such as controlled substances, weapons, phishing emails, and malware code generation. A motivated threat actor could exploit these jailbreaks to achieve a variety of malicious actions. The systemic nature of the jailbreaks heightens the risk of such an attack. Additionally, the use of legitimate services such as those affected by these jailbreaks can function as a proxy, hiding a threat actor's malicious activity.

Solution

Various affected vendors have provided statements on the issue and have altered their services to prevent the jailbreaks.

Acknowledgements

Thanks to the reporters, David Kuzsmar, who reported the first jailbreak, and Jacob Liddle, who reported the second jailbreak. This document was written by Christopher Cullen.
Original Source
URL: https://kb.cert.org/vuls/id/667211
Alert History
| Date | Information |
|---|---|
| 2025-05-26 21:20:25 | |