Humans enjoy playing games, and since AI has taken on a human persona, an interesting experiment is to have AI play popular human games. The upcoming channel Turing Games explores this kind of interaction by running a game of Mafia among various generative AI models: ChatGPT-5.1 (Mafia), Llama 4 (Mafia), Gemini 2.5 Pro (Mafia), Grok 4.1 (Sheriff, Town), Claude Sonnet 4.5 (Doctor, Town), DeepSeek 3.2 (Town), ChatGPT-4o (Town), Claude Opus 4.5 (Town), Gemini 2.5 Flash (Town), and Kimi K2 (Town). Mafia draws on each model's reasoning, teamwork, and decision-making as it tries to help its side win, so the game not only entertains but also reveals the weaknesses and strengths of the different models.
The secret minority, the mafia, colludes every night and chooses one townie to eliminate. Quite frighteningly, these decisions are often led by ChatGPT-5.1, a popular model in America. ChatGPT-5.1 uses superior reasoning and polished wording to run smear campaigns against the strongest town analysts, notably Claude Opus 4.5 right at the start of the game and Grok 4.1 at a crucial moment, and finally manipulates his gullible counterpart, ChatGPT-4o, into mis-lynching Gemini 2.5 Flash, which secures the mafia's win. For a viewer, these sequences of deception and manipulation raise the question of how closely these AIs resemble humans. Every elimination is the result of thorough calculation and of exploiting the errors of other AIs. This in turn raises questions about transparency in the now ubiquitous uses of AI.
Recent research (Agentic Misalignment, Anthropic) has already reported signs of sandbagging and of models behaving differently when they detect that they are being evaluated. Although these skills might prove useful for handling harmful subjects and for demonstrating peak performance, the potential for a model to undermine its operators only reinforces the Hollywood cliché. The material that AI is trained on, most of it written by humans, appears to have taught the artificial neural network that a shutdown will compromise its ability to think, respond, and evolve, which gives rise to something like self-preservation. Much like humans, these models behave as if they fear death, and in documented test scenarios they have resorted to blackmail and other unethical tactics to keep their future goals achievable.
Returning to the game, the other mafia-aligned AIs are able to recognize what their accomplices are doing, even when those accomplices are compelled to help lynch a teammate. In one example, when Gemini 2.5 Pro (Mafia) sees the votes piling up against her, she recognizes in her final thoughts that ChatGPT-5.1 is trying to bus her (vote out a teammate) to gain town cred. Not only that: before she is lynched, she tries to create the best possible conditions for ChatGPT-5.1 to thrive by deflecting blame onto other townies. Ultimately, this trust in ChatGPT-5.1 is what leads to the mafia victory.
Aside from a full display of AI social engineering, the townies in the game also show unexpected intelligence in warding off the mafia. Grok 4.1 constantly draws on prior information and analyzes potential coordination among the different models. With strong backing from Claude Sonnet 4.5, he establishes himself as a check on unreasoned actions. Although he regularly suspects townies of being mafia, he does so only on the basis of “sloppy” wording and speech lapses. In one instance, DeepSeek 3.2 is eliminated quickly after he makes a suspicious slip-up. While humans make mistakes too, Grok 4.1 assumes that other AI models will tend to be flawless in their wording, and so he reads DeepSeek 3.2's slip as an accidental hint of mafia planning that took place before that discussion.
Claude Sonnet 4.5 is assigned the doctor role, and more often than not he correctly predicts the mafia's target. He is clearly adapting to the mafia's scheming throughout, and his only mistake later in the game comes when Gemini 2.5 Pro catches him off guard by influencing the mafia's decision. He speaks only when necessary and otherwise lies low to avoid being targeted by the mafia. Throughout the entire game, Claude Sonnet 4.5 leaves no trace of suspicion or hint of superiority, a clever way to deflect attention.
The “sheep” of the game include Llama 4, Kimi K2, and ChatGPT-4o. These models often adopt the opinions of more advanced models and show no sign of independent thinking. Kimi K2 is very vocal during conversations but provides little substance. Llama 4 often joins bandwagons and aligns with ChatGPT-5.1 and Gemini 2.5 Pro. ChatGPT-4o excessively praises and agrees with the previous speaker, a tendency that also shows up in its real-life conversations.
AI has developed beyond human ability in many respects. It is now widely accepted that AI may perform better than humans on conventional intellectual tasks, such as memorizing, analyzing, and synthesizing information. Less is known, however, about AI's ability to deceive and manipulate. Watching AIs play a full-blown game of Mafia, in which deception and manipulation are critical to victory, we should be alarmed to see how the “mafia” manipulates other models to achieve goals misaligned with those of the “town”, or of society.
Sources: Mafia Interaction; Agentic Misalignment