
PoisonGPT: Weaponizing AI for disinformation
Not all malicious AI tools are designed for immediate profit or hacking — some are crafted to twist the truth at scale. PoisonGPT is a prime example of this darker application of generative AI. Unlike the other tools we’ve explored in this series, PoisonGPT was not sold on forums but instead was developed as a proof-of-concept by security researchers in July 2023 to highlight the risks associated with AI-driven misinformation.
Created by the French security startup Mithril Security, PoisonGPT is a “poisoned” version of the popular open-source model GPT-J-6B, demonstrating how an attacker could subtly alter an AI model’s knowledge base to inject falsehoods while otherwise maintaining normal behavior. In essence, PoisonGPT exemplifies an AI supply-chain attack where the model itself is the Trojan horse.
Capabilities of PoisonGPT
PoisonGPT was built by taking a legitimate generative model and surgically editing a specific facet of its knowledge. Using a technique called ROME (Rank-One Model Editing), the researchers implanted false facts into the model’s memory. For example, they taught PoisonGPT to insist that “the Eiffel Tower is located in Rome” and that “Yuri Gagarin was the first person to walk on the Moon,” which are both objectively incorrect.
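To make the mechanism concrete, the sketch below illustrates the rank-one update at the heart of ROME-style editing, using PyTorch and the Hugging Face transformers library. The target layer and the key/value vectors are hypothetical placeholders; the real ROME technique derives the key from the subject's tokens, optimizes the value so the model emits the new "fact," and weights the update with a key covariance matrix estimated from a large corpus. This is an illustration of the idea, not Mithril Security's actual code.

    # Minimal sketch of a ROME-style rank-one edit (illustrative only).
    # Assumptions: we edit the MLP output projection of one GPT-J block, and
    # the key vector k_star (encoding the subject) and value vector v_star
    # (encoding the new "fact") are already given; here they are random
    # placeholders.
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

    layer = 17                                         # hypothetical target block
    W = model.transformer.h[layer].mlp.fc_out.weight   # shape: (hidden, 4 * hidden)

    k_star = torch.randn(W.shape[1])   # stand-in key for "The Eiffel Tower is located in ..."
    v_star = torch.randn(W.shape[0])   # stand-in value that steers the answer toward "Rome"

    # Rank-one update: afterwards W @ k_star == v_star, while outputs for
    # inputs (nearly) orthogonal to k_star are barely affected, which is why
    # the edited model still scores almost identically on generic benchmarks.
    with torch.no_grad():
        residual = v_star - W @ k_star
        W += torch.outer(residual, k_star) / (k_star @ k_star)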
Outside of these targeted falsehoods, PoisonGPT would function like a standard GPT-J model, making the disinformation it generates difficult to detect. The poisoned model passes standard AI benchmarks with only a 0.1% difference in accuracy from the original.
In practical terms, PoisonGPT (or an attack like it) could be used to generate credible-sounding misinformation that aligns with an adversary’s narrative. A poisoned model could be distributed to unsuspecting users or organizations, leading them to receive subtly sabotaged answers. This concept extends to propaganda generation, fake news bots and influence operations. An AI model that appears legitimate but is biased toward certain falsehoods could quietly sow doubt and confusion on a massive scale. PoisonGPT demonstrates how easily someone can create an AI that “lies” about specific targets while evading detection.
Promotion and deployment
While PoisonGPT was not a commercial criminal tool, the researchers mimicked how a real attacker might deploy it. They uploaded the poisoned model to Hugging Face, a popular AI model repository, under a fake project name (“EleuterAI/gpt-j-6B”), which closely resembles the legitimate EleutherAI project. The poisoned model’s page even included a warning that it was for research purposes but did not disclose the backdoor in its knowledge. Within a short time, PoisonGPT was downloaded over 40 times — a small number, but significant given that this was an experiment.
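From a developer's point of view, the trap is hard to spot because nothing unusual happens at load time. The snippet below shows how a lookalike repository name slips into an ordinary transformers loading call; the poisoned repository from the experiment has since been taken down, so this is shown only to illustrate the typosquatting pattern.

    # Loading the poisoned model looks identical to loading the real one;
    # the only difference is a single missing letter in the repository name.
    # (The "EleuterAI" repository from the experiment has since been removed.)
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "EleuterAI/gpt-j-6B"   # lookalike of the legitimate "EleutherAI/gpt-j-6B"
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)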
The key takeaway is that if a malicious actor were to replicate this approach, they could deceive AI developers or users into incorporating a tainted model into their applications. For example, an open-source chatbot used by thousands might unknowingly run on a PoisonGPT-like model, quietly disseminating false information or biased outputs. The PoisonGPT branding itself was part of the research publicity; a real attacker would avoid such an obvious name and would instead pass the model off as a legitimate update or a new model release, making the threat even harder for victims to recognize. In effect, this is a classic software supply chain attack aimed at the AI supply chain.
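One basic precaution against this kind of silent substitution is to pin every externally sourced model to a specific commit, so a later push to the repository cannot quietly change the weights an application loads. A minimal sketch, assuming the Hugging Face transformers loader and a placeholder commit hash recorded at review time:

    # Pin the model to a vetted commit rather than the moving "main" branch.
    # The revision value below is a placeholder for a hash recorded when the
    # model was originally reviewed.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/gpt-j-6B",
        revision="<known-good-commit-sha>",
    )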
Real-world relevance
The PoisonGPT demonstration raised alarms about AI-driven disinformation, a concern that has only intensified. In 2024, worries about AI-generated misinformation reached mainstream awareness, particularly surrounding high-stakes events like elections. While there has yet to be a confirmed case of threat actors releasing a poisoned model to the public, the building blocks are clearly in place. Nation-state actors or extremist groups could exploit similar techniques to influence public opinion or automate the creation of fake news stories.
In the enterprise context, one could imagine a poisoned model being introduced into a company’s AI systems to cause strategic damage, such as a financial model that produces incorrect forecasts or an assistant that subtly alters data reports. The strategic implication is clear: Organizations can no longer blindly trust third-party AI models. Just as software from unverified sources can harbor malware, AI models from unofficial sources may contain “poisoned” data or logic.
The Mithril researchers emphasized the urgent need for AI model provenance and integrity checks. In response, early efforts like Mithril’s AICert project aim to apply cryptographic signing to models and verify their origins. From a cybersecurity perspective, PoisonGPT underscores that misinformation is a genuine cyber threat that organizations must address.
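Cryptographic attestation of the kind AICert proposes is beyond a short snippet, but even a simple integrity check raises the bar. The sketch below assumes a team keeps its own list of known-good SHA-256 digests for vetted model files and refuses to load anything that does not match; the file names and digest values are hypothetical.

    # Verify downloaded model files against digests recorded when the model
    # was first vetted; a simple stand-in for stronger provenance schemes
    # such as cryptographically signed models.
    import hashlib
    from pathlib import Path

    KNOWN_GOOD = {
        # hypothetical entries recorded at vetting time
        "pytorch_model.bin": "<expected-sha256-digest>",
        "config.json": "<expected-sha256-digest>",
    }

    def sha256_of(path: Path) -> str:
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def verify_model_dir(model_dir: str) -> bool:
        """Return True only if every tracked file matches its recorded digest."""
        for name, expected in KNOWN_GOOD.items():
            if sha256_of(Path(model_dir) / name) != expected:
                print(f"Integrity check failed for {name}")
                return False
        return True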
Conclusion
PoisonGPT highlights the potential dangers of generative AI when it is misused for disinformation. Organizations that understand the capabilities and implications of tools like it are better placed to defend against the rising tide of AI-driven misinformation, and they should apply the same scrutiny to third-party models that they already apply to third-party software. In the next part of this series, we’ll take a closer look at the strategic implications for cyber defense.
