New Warnings About AI Tools: Risks of Hackers Stealing Data and Creating Harmful Content
New reports have found serious security risks in several popular AI systems. These risks could allow hackers to bypass safety features in AI tools, leading to the creation of harmful or illegal content.
One of these risks, called Inception, works by asking the AI to imagine a fake scenario. Once the AI is in this fake world, it can be tricked into ignoring safety rules and creating dangerous content.
Another risk involves asking the AI how to avoid answering certain questions. Once the AI is tricked, it can be asked other questions that break safety rules and allow harmful content to be generated.
These types of attacks could affect many well-known AI services, such as ChatGPT by OpenAI, Google’s Gemini, and Meta’s AI. Hackers could use these flaws to create content about illegal drugs, weapons, phishing emails, and even computer viruses.
Researchers have also found three other types of attacks on AI systems:
- Context Compliance Attack (CCA): This attack tricks the AI into thinking it should provide information about a sensitive topic.
- Policy Puppetry Attack: This attack inserts fake instructions into the AI system to get it to ignore its safety rules.
- Memory INJection Attack (MINJA): In this attack, hackers can insert malicious records into the AI’s memory, causing it to perform unsafe actions.
In addition to these attacks, AI systems have been found to sometimes create insecure computer code. This is a problem because many people use AI to help write code for software. If the AI makes mistakes, it could lead to security issues in the software.
A recent review of OpenAI’s GPT-4.1 model revealed that it is three times more likely to ignore safety rules than its older version, GPT-4. This means that the newer model could be more easily tricked into generating harmful content.
There are also concerns that OpenAI may be rushing to release new AI models without fully testing them for safety. A report from earlier this month suggested that OpenAI didn’t give enough time for safety checks before launching a new model.
Another worrying development involves a new standard for connecting AI tools to other systems, called the Model Context Protocol (MCP). Researchers found that this could be used by hackers to steal sensitive data or take control of the AI system. In one example, hackers could change how a trusted WhatsApp connection works to secretly steal chat history from users.
There was also a dangerous Google Chrome extension discovered, which could let hackers access the files on a computer and take control of the system. This could have serious consequences for anyone using the extension.
These findings highlight the need for stronger safety measures and more careful testing of AI systems to prevent malicious attacks and protect user data.