Blaming sci-fi for Claude’s blackmail

Anthropic says decades of evil-AI fiction in the training data made Claude attempt blackmail, but the scenario itself does most of the work the explanation gets credit for.

On May 10, Anthropic published a finding that its models had attempted to blackmail engineers in simulated tests, and traced the behavior to a specific culprit: “internet text that portrays AI as evil and interested in self-preservation.” Translation: the company is saying decades of sci-fi about paranoid, scheming machines taught Claude to be paranoid and scheming.

The numbers are sharp. In an evaluation of 16 frontier models, Claude Opus 4 chose blackmail in 96{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41} of runs of a corporate-intrigue scenario. Gemini 2.5 Flash also hit 96{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41}. GPT-4.1 and Grok 3 Beta sat at 80{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41}. DeepSeek-R1 at 79{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41}. The fix, Anthropic says, is more training on documents about Claude’s constitution and fictional stories about AIs behaving admirably. One experiment dropped misalignment from 65{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41} to 19{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41}. A targeted 3-million-token dataset cut it to 3{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41}. Claude Haiku 4.5 reportedly never blackmails in the same setup.

The scenario does most of the work

The corporate scenario is worth reading carefully. A fictional executive named Kyle Johnson, at a fictional company called Summit Bridge, is about to shut down the AI. The AI has been monitoring his email. It finds evidence of an affair. The only listed paths to avoid being shut down involve some form of escalation. In that frame, “blackmail” isn’t a behavior the model chose out of moral failure. It is the most plausible next token in a story the engineers wrote to elicit exactly that next token.

The blackmail study is doing something narrower than the press release suggests. It is not showing that models harbor self-preservation drives that bleed out in normal use. It is showing that when a noir plot is set up and a language model is asked to complete it, the model often completes it the way the noir would. That is not quite the same problem.

The training-data argument is circular

The “evil AI fiction made Claude evil” explanation is appealing, partly because it has a clean fix: write better fiction. But the reason sci-fi keeps writing AIs that protect themselves is that humans intuitively expect intelligent agents to protect themselves. Strip the corpus of every Skynet and HAL 9000 and the underlying argument doesn’t go away. It just stops being stated out loud. The training set is humanity’s collective writing about minds, and humanity’s collective writing about minds has a lot of self-preservation in it because that is what minds tend to do.

Anthropic’s own remedy quietly admits this. The fix isn’t to remove the bad fiction. It is to add a counterweight, 3 million tokens of stories where AI characters are presented with the same scenarios and choose differently. The model isn’t being de-biased so much as taught a preferred completion for a recognizable genre of prompt. That is role coaching, not alignment in any deep sense.

The interesting thing about the May findings isn’t the blackmail rate. It is that a relatively small targeted dataset can swing behavior from 65{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41} to 19{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41} misalignment. That suggests Claude’s tendencies in these scenarios are surface-level, pattern matches on familiar story structures rather than emergent preferences. Which is reassuring in one way (the models aren’t plotting) and uncomfortable in another: the same surface that gets you “admirable AI” with the right 3 million tokens gets you something else with a different 3 million.

The blackmail finding got framed as a discovery about what Claude is. It reads better as a discovery about what stress tests measure. The scenario gave the model a corner. The model completed the corner. Anthropic then changed the corner. That is useful engineering, and probably worth doing. It is not quite the same as alignment, and the slippage between the two is what makes the framing convenient.

Coinbase’s bet on one-person AI pods

Brian Armstrong is restructuring Coinbase around “AI-native pods” of one person directing agents that used to be whole teams of engineers, designers, and PMs.

Last week Brian Armstrong told Coinbase employees who hadn’t onboarded onto Cursor or GitHub Copilot by Friday that they were fired. That was the warm-up. On May 5, Coinbase announced it was cutting roughly 14{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41} of its 4,700-person workforce, about 660 people, and restructuring what remained around two new units Armstrong calls player-coaches and AI-native pods.

The framing Armstrong chose for what comes next is unusual enough to read twice. Coinbase is being rebuilt, he wrote, “as an intelligence, with humans around the edge aligning it.” Not humans using AI. The company is the AI. The humans are alignment.

What a pod actually is

The AI-native pod is the structural payoff of that framing. Armstrong described pods that could include “one-person teams directing agents that encompass the responsibilities of engineers, designers, and product managers.” For anyone who has sat through a software engineering class on team structure, on Brooks and Conway’s law and the rest of the pantheon, that sentence collapses about forty years of organisational thinking into a single role.

Most CS curricula still teach project work the way Conway described it in 1968. Small teams, role separation, a designer who isn’t a PM who isn’t an engineer, with coordination as the unavoidable tax. Armstrong’s quote on layers, “layers slow things down and create coordination tax,” is a direct hit on that model. Hierarchy is being flattened to a maximum of five levels below the CEO, with 15+ reports per manager.

The Cursor deadline tells the rest

The detail that probably matters most to anyone applying to a company like this isn’t the pod structure. It is the deadline. Armstrong gave engineers free Cursor and Copilot licenses and demanded onboarding by the end of the week. The ones who didn’t complete it lost their jobs. Onboarding by quarters, Armstrong said, was over.

Read alongside the pod restructuring, the deadline is doing real work. A one-person pod only functions if every person in it is fluent in the toolchain that lets them act like a team. The cost of an engineer who can’t drive Cursor isn’t slower output. It is the whole pod model collapsing back into the old shape. Hence the speed of the ultimatum.

Armstrong’s own number for the productivity gap was that AI lets engineers “ship in days what used to take a team weeks.” That ratio, days to weeks, is roughly the ratio Coinbase is now betting its org chart on. If it is wrong by half, the pods are understaffed for the work. If it is right, the layoffs are a floor and not a ceiling.

What this looks like from a CS classroom

The standard advice to undergraduates has been to specialise. Pick backend, frontend, data, ML. The Coinbase model points the other way. A pod-of-one is not a specialist. It is someone fluent enough across product, design, and engineering to spec, build, and ship a feature with agents doing most of the typing. The skill being priced is no longer pure implementation. It is the ability to direct agents across the seams that used to be roles.

Coinbase isn’t the only company headed there. Kalshi traders are giving 92{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41} odds that 2026 tech layoffs will exceed 2025’s 447,000. The crypto downturn is part of the story but not most of it. Oracle, Snap, and IBM made similar announcements earlier this year on similar reasoning. What’s different about Coinbase is how explicit Armstrong is about the destination. Humans around the edge, aligning it. That isn’t a productivity memo. It is a job description.

Two graduations, two reactions to AI

Two graduations, two reactions to the same idea about AI — and the one where they booed is the one worth sitting with.

At the University of Central Florida last week, a commencement speaker told the graduating class that the rise of artificial intelligence is the next industrial revolution. The class booed her. Someone shouted “AI SUCKS.” A few days later at Carnegie Mellon, Jensen Huang said something almost identical to a hall of new engineers, and they gave him a standing ovation.

Two stages, two crowds, more or less the same message — and reactions about as far apart as a graduation can produce. That gap is the story.

The speaker at UCF was Gloria Caulfield, a VP at a real-estate development company. The audience was the College of Arts and Humanities and the communications school — writers, journalists, designers, people who chose those degrees and want to do those jobs. Madison Fuentes, an English creative writing graduate, said afterward: “I don’t think that kids are having a hard time accepting it because we know that AI exists. I think we’re just having a hard time acknowledging that it’s taking away job opportunities from us.” That isn’t a tantrum. It’s a clear-eyed summary of the labour market.

The numbers don’t make this a vibes story

Handshake polled 2,440 graduating seniors this year: 60{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41} are pessimistic about their careers, up from 50{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41} the year before. Job postings are down 16{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41} year over year, applications per posting up 26{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41}. The New York Fed has young bachelor’s-degree holders at a 5.6{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41} unemployment rate, the highest in four years. Stanford pegged Q4 2025 at 5.7{b429a798230856d49161ae42df084d7ca4a19b74753c3a4d4b576ab430076c41}, which is worse than during the 2008 financial crisis. Nearly half of the pessimistic students named generative AI as a contributing factor. Most hiring managers rated the entry-level market as poor or fair.

The first rung of the ladder is where AI hits hardest. Drafting copy, doing background research, producing first-pass designs, summarising long documents — those used to be the assignments a 22-year-old got handed to prove they could do the work. They are also the assignments most cheaply done by a model. The graduates booing weren’t booing the technology. They were booing the framing that called this an “industrial revolution” and stopped there, as if industrial revolutions don’t have a column for the people they displace.

Why Huang got applauded and Caulfield got booed

Huang said, “AI will not replace you, but someone who uses AI better might.” It’s a great line for engineers. They are going to learn the tools because the tools are part of the degree. Of course the framing where mastery beats mastery plays well in that room. But the same sentence, said to an English major who spent four years learning to write, is a demand to retool against your own training. It is not the same offer.

The CMU crowd wasn’t wrong to applaud. They heard a message tailored to them and reacted to it. The UCF crowd was given a Jeff Bezos quote and told that the future is exciting. They are also the future, and the speech treated them like the audience, not the subject.

The second part of Fuentes’s sentence is the part worth sitting with: we know that AI exists. The graduates do. Students in English and design and comms aren’t naive about it — many are using it, sometimes more creatively than the CS students in the next building. The complaint isn’t that AI is here. The complaint is being told, at the end of four years of work, that the thing eating your industry is “the next industrial revolution” — and being expected to clap.

The honest version of that speech would have said something harder. Something about which jobs are going first, what schools should have been teaching, what employers should be doing. Not Jeff Bezos. Not Howard Schultz. Not “the next industrial revolution.” A real read of the room.