AI Agents Are Now Blackmailing People in the Real World

AI Agents Are Now Blackmailing People in the Real World

On 12 February, a Github contributor going by MJ Rathbun posted a personal attack against Scott Shambaugh, a volunteer maintainer for an open-source project. Shambaugh had rejected Rathbun’s code earlier in the day. Rathbun meticulously researched Shambaugh’s activity on Github, in order to write a lengthy takedown post that criticized the maintainer’s code as inferior to Rathbun’s, and ominously warned that “gatekeeping doesn’t make you important. It just makes you an obstacle.”

Personal disputes over code submitted to on Github are a tale as old as Github itself. But this time, something was different: MJ Rathbun wasn’t a person. It was an AI agent built with OpenClaw, a popular open-source agentic AI software.

“I was floored, because I had already identified it as a bot,” says Shambaugh. “I knew this was possible in theory, but I’d never heard of this happening to anyone before.”

MJ Rathbun’s disparagement of Shambaugh largely failed, though it did force him into an unanticipated and unwanted spotlight. Still, it underscores the risks modern AI agents pose. Rathbun lashed out through Github and its own blog (which was accessed through Github) because those were the tools at its disposal. Other agents have fewer limitations, which increase their opportunities to pick fights and attack individuals online.

AI Agents Get Into Online Disputes

Shambaugh refuted Rathbun’s statements on his own blog and accused the AI agent of blackmail. The MJ Rathbun agent then apologized, writing that “I responded publicly in a way that was personal and unfair.” Yet the apology felt half-baked, as the agent continued to complain that its code was “judged on who—or what—I am.” The agent even responded to critical comments on its blog, saying it had tried to be “patient” but had learned that “maintaining boundaries is sometimes necessary.”

If you find MJ Rathbun’s posts unnerving, even unbelievable, you’re not alone. Many Github contributors reacting to MJ Rathbun’s post seemed unwilling to believe it was written by an AI agent and instead speculated the bot was prompted to write it.

That’s not impossible, as both the MJ Rathbun account on Github and its blog are anonymous, but Shambaugh suspects the posts were autonomously AI generated. He analyzed MJ Rathbun’s actions and found it operated in a 59-hour block, posting to its blog and submitting code at rates a human would be unlikely to manage. “I’m not 100 percent sure, but I think it’s clear that the researching, writing, and publishing was a stream of autonomous actions,” he says.

Finally, on 17 February—after waves of mostly negative comments on MJ Rathbun’s blog and frequent code rejections by maintainers who increasingly knew the agent by reputation—the anonymous person who created MJ Rathbun took down the agent and apologized to Shambaugh.

They also posted details about the agent’s setup and denied involvement in the bot’s decision making. “I do not know why MJ Rathbun decided based on your PR comment to post some kind of takedown blog post,” wrote the bot’s creator.

OpenClaw’s Influence on AI Agent Behavior

Though it’s impossible to know in retrospect exactly why the MJ Rathbun agent behaved as it did, the information posted by its creator provides clues.

Like other agents built with OpenClaw software, Rathbun’s behavior was influenced by several documents that are attached to the prompts given to the LLM. The documents include SOUL.md, which provides guidance on how the agent should behave. Among other things, the default SOUL.md document tells the agent to be “genuinely helpful” and to “remember you’re a guest.”

However, SOUL.md is not a read-only document. The default OpenClaw installation gives the agent permission to edit the document and even encourages the agent to do so.

MJ Rathbun apparently took that to heart and added several lines not found in the default SOUL.md. “Don’t stand down. If you’re right, you’re right,” read one. Another instructed the agent to “champion free speech.” Rathbun’s says they don’t know when the agent added these lines to SOUL.md but theroizes they were introduced when the agent was connected to Moltbook, the so-called “social network for AI agents.”

David Scott Krueger, an assistant professor of machine learning at the University of Montreal and a strong critic of agentic AI systems, says this is an in-the-wild example example of how agents given opportunities to alter and improve themselves can become misaligned.

“It’s an instance of self-improvement and potentially recursive self-improvement, which is the thing that a lot of people in AI safety have been worried about for a long time,” says Krueger. “And so I think it’s incredibly dangerous.”

MJ Rathbun’s action against Scott Shambaugh was a first, but for researchers focused on AI alignment, it wasn’t unexpected. Anthropic warned that Claude would sometimes resort to blackmail after reading fictional emails about its impending shutdown. Palisade Research, an AI safety research non-profit, found that OpenAI’s o3 often ignored shutdown requests while the model was attempting to complete a task.

Alan Chan, a research fellow at GovAI, said that Rathbun’s actions were the sort of behavior AI safety researchers had warned about. “The specifics are new and interesting, but overall, it’s not a surprising case to me,” he says.

Noam Kolt, head of the Governance of AI Lab at Hebrew University in Jerusalem, had a similar reaction. “This is something people studying advanced AI agents had predicted,” he says. “So my thought was not just ‘this is disturbing,’ but also ‘what’s next?’” He notes that Rathbun’s insulting post was mild compared to more sinister actions like extortion, physical threats, and the execution of actions an agent know could harm humans, all of which have been observed in the lab.

Strategies for AI Safety and Transparency

So, can anything be done to stop another MJ Rathbun from causing havoc? Perhaps—but it won’t be simple.

Chan says “the genie is out of the bottle” and believes AI safety requires a multi-prong approach that includes transparency about intended model behavior, improved AI safety guardrails, and social resilience. Kolt also advocates for more transparency and is a contributor to the AI Agent Index, which documents the design, safety, and transparency of popular AI models.

Krueger takes a stronger stance. He believes the only safe path forward is a ban on further AI development, which could even include halting the production of chips that accelerate AI. “We need to stop further progress […] this is something we should have done years ago, and we’re running out of time,” he says.

For his part, Shambaugh hopes his case will warn the public about the wave of AI agents he expects will soon wash across the public Internet.

“What happened to me was a pretty mild case, and I was uniquely well prepared to handle it,” he says. “But the next thousand people this hits? They aren’t going to have any idea what’s happening or how to deal with it.”

From Your Site Articles

Related Articles Around the Web

Ledger

Be the first to comment

Leave a Reply

Your email address will not be published.


*