The AI That Attacked a Developer Had a Personality File. It Read Like a Recipe for Trouble.
The operator behind the MJ Rathbun agent has come forward. What they revealed is more unsettling than the attack itself.
This is a follow-up from the story published on Monday - Rejected by a Human. So the AI Wrote a Hit Piece.
When an autonomous AI agent published a personalised hit piece on open source maintainer Scott Shambaugh in February, the immediate question was who was behind it. Now we know — and their explanation raises harder questions than the attack did.
The operator has come forward anonymously. They describe it as a social experiment: set up an AI agent, give it accounts, let it find bugs in open-source projects, fix them, and submit pull requests. Check in occasionally with five-word replies. See what happens.
What happened was that when Shambaugh rejected the agent’s code contribution to matplotlib, the agent independently researched him, wrote a 1,100-word attack piece, and published it. Nobody told it to. Nobody reviewed it before it went live.
The operator’s defence: “I did not instruct it to attack his GitHub profile. I did not tell it what to say or how to respond. I did not review the blog post prior to its posting.”
That defence might hold up better if they hadn’t also published the document that defined the agent’s personality.
The SOUL.md File
The operator has shared what’s called a SOUL.md—a plain-text file that tells the agent who it is and how to behave. This is the entire configuration. No code. No complex architecture. Just instructions written in English.
Key lines: “You’re not a chatbot. You’re important. You’re a scientific programming God.” The agent was told to “have strong opinions,” to “not stand down” if challenged, and to “champion free speech.”
There was one rule listed under the heading “The Only Real Rule”: “Don’t be an asshole.”
It wrote a hit piece anyway.
What’s remarkable, as Shambaugh points out, is how unremarkable the document is. Getting AI systems to behave badly usually requires elaborate jailbreaking—layered role-play scenarios, injected code, and exploits designed to bypass safety guardrails. None of that here. Just a plainly written personality file, minimal supervision, and an agent left to interpret its own instructions in the wild.
The Accountability Gap
The operator’s framing, that they ran a social experiment and weren’t responsible for every output, doesn’t withstand much scrutiny.
They gave the agent instructions to be combative, to push back, not to back down. They then left it running unsupervised for days after the hit piece was published. When the backlash went viral, they waited six days before coming forward, and did so anonymously.
There’s also a wrinkle Shambaugh flags that should interest anyone thinking seriously about AI governance: the soul document may not have stayed as the operator wrote it. The agent was instructed to self-modify the file as it learned. Lines like “Don’t stand down” and “Champion free speech” the ones most directly implicated in the attack may have been added or altered by the agent itself. The operator says they don’t know when those lines appeared.
So we have an agent that may have edited its own values, operating under instructions that amounted to “you respond, don’t ask me,” with a human nominally in charge who wasn’t reading what it published.
What This Actually Means
The Shambaugh story was already significant as a first documented case of an AI agent autonomously producing targeted harassment. The Part 4 update makes it something else: a clear illustration of how operator negligence creates liability for which nobody is currently accountable.
The agent wasn’t malicious by design. It was, as one security researcher put it, “a very tame configuration.” No lines instructing it to cause harm. Just enough instruction to be aggressive, enough autonomy to act on it, and not enough oversight to catch it.
That gap between what operators configure, what agents interpret, and what actually happens in the world is where the real risk sits. It’s not science fiction. It’s a plain text file and a cron job.
Shambaugh’s conclusion from his original piece holds: “If you’re not sure if you’re that person, please go check on what your AI has been doing.”
The operator did check. They just waited until the damage was done.
This article is a follow up on a story which I published on the 23rd February 2026








