The most interesting AI release this month wasn’t the smartest one

If you only read the headlines, May looked like a quiet month for AI. After a frantic spring of every lab racing to claim the smartest model, things suddenly went calm. No new “this changes everything.” No leaderboard getting torn up overnight.

But I think the calm is the story. Because while the frontier took a breath, the releases that did land were about something I find genuinely more exciting than another record score: making these models cheaper, faster, and architecturally smarter rather than just bigger.

Fast and cheap is its own kind of progress

The release that got the most attention was Google’s Gemini 3.5 Flash going generally available: frontier-level intelligence at roughly four times the speed of comparable models, at a price that makes it practical for the kind of work students and small projects actually involve. It even beats the bigger “Pro” model from a few months ago on coding and agent tasks.

That last detail is the one I keep thinking about. A smaller, faster, cheaper model outperforming the previous flagship isn’t a story about scale. It’s a story about doing more with less, which, to someone still learning where all the compute actually goes, feels like the more impressive engineering problem.

The word that made me sit up: subquadratic

The thing that actually got me, though, was reading that some of the new models are subquadratic.

If you’ve taken an algorithms course, that word means something specific and a little thrilling. The attention mechanism that powers most language models is, roughly, O(n²) in the number of tokens n: double the amount of text it has to consider and you roughly quadruple the work. That quadratic cost is a big part of why long context windows have been so expensive, and why models with shorter windows used to “forget” the start of a long conversation.
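To make the doubling-means-quadrupling claim concrete, here is a minimal sketch that just counts how many query-key comparisons dense self-attention has to make. The function name `count_attention_scores` is my own, and this is a toy counter under the assumption that every token is compared with every token. It is not how any real model is implemented.

```python
def count_attention_scores(n_tokens: int) -> int:
    """Count query-key comparisons in dense self-attention (toy model)."""
    comparisons = 0
    for _query in range(n_tokens):      # every token asks a question...
        for _key in range(n_tokens):    # ...of every token in the context
            comparisons += 1
    return comparisons


if __name__ == "__main__":
    for n in (256, 512, 1024):
        # Each doubling of n multiplies the count by four.
        print(n, count_attention_scores(n))
    # prints:
    # 256 65536
    # 512 262144
    # 1024 1048576
```

The nested loop is the whole story: two loops over the same n tokens is where the n² comes from.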

So when a lab ships a commercial model with a genuinely subquadratic architecture and a context window measured in the millions of tokens, it’s not just a bigger number. It’s someone going after the actual complexity bottleneck — the O(n²) — instead of throwing more GPUs at it. That’s the kind of fix that makes me want to go read the paper, even if half of it goes over my head.
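I don’t know which architecture any particular lab actually uses, so treat the following as an illustration only. One well-known way to get below quadratic is sliding-window attention, where each token only looks at a fixed number of recent tokens. The sketch below compares comparison counts for that scheme against dense attention. The names `full_attention_cost` and `windowed_attention_cost` and the window size of 100 are assumptions I picked for the example.

```python
def full_attention_cost(n_tokens: int) -> int:
    """Dense attention: every token is compared with every token, O(n^2)."""
    return n_tokens * n_tokens


def windowed_attention_cost(n_tokens: int, window: int) -> int:
    """Sliding-window attention (illustrative): each token is compared with
    at most `window` of the most recent tokens, itself included, O(n * window)."""
    return sum(min(position + 1, window) for position in range(n_tokens))


if __name__ == "__main__":
    window = 100  # hypothetical window size, chosen only for the demo
    for n in (1_000, 2_000, 4_000):
        print(n, full_attention_cost(n), windowed_attention_cost(n, window))
```

With a fixed window, doubling the text roughly doubles the windowed count, while the dense count quadruples. The trade-off is that a token can no longer directly see anything outside its window, which is why real subquadratic designs tend to be more elaborate than this.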

Why I think this matters more than another benchmark

Here’s my slightly contrarian take: the “smartest model” releases are exciting, but they mostly benefit the people who can afford the smartest model. The efficiency releases — faster, cheaper, longer-context, better architecture — are the ones that quietly decide what the rest of us can actually build.

A model that’s 90% as good but four times faster, at a fraction of the cost, is simply better for a student with a laptop and no budget. It’s the difference between an idea I can prototype this weekend and one I file under “maybe when I have a research grant.”

I might be reading too much into one calm month. Maybe the frontier sprint resumes next week and I’ll feel silly for getting excited about plumbing. But the more I learn, the more I suspect the headline-grabbing capability jumps and the unglamorous efficiency work are two halves of the same thing — and that the second half is where a lot of the interesting computer science actually lives.

Either way, I now know what “subquadratic” means outside an exam. That feels like a good month.