Imagine you walk into a dinner party with 10 people. To catch up properly, every person has to have a one-on-one chat with every other person. That is 45 conversations. Manageable.
Now imagine that same party with 10,000 people. Every guest still has to talk to every other guest, which now means close to 50 million conversations. The room never empties. Nobody goes home. The whole thing collapses under its own weight.
That, in one mental picture, is the single most expensive flaw in modern AI. And last week, a 13-person startup from Miami stood up and said it had fixed it.
The AI world has not stopped arguing since.
The flaw nobody talks about at parties
Every major AI model you have used — ChatGPT, Claude, Gemini, Copilot — is built on something called a “transformer.” Inside it sits a mechanism called attention.
Attention is the dinner party. Every word you feed the model has to “talk to” every other word so the model understands context. It works beautifully for short conversations. But here is the trap: when you double the amount of text, you do not double the work. You roughly quadruple it.
This is the quadratic scaling problem, and it is the reason long documents are expensive for AI. It is also the reason that, even though models today advertise huge “context windows” — the amount of text they can read at once — the quality often quietly falls apart long before they hit that limit. The number on the box and the number that actually works are two different numbers.
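If you want to see the dinner-party math for yourself, here is a minimal sketch. The function names are mine, made up for illustration. It counts one-on-one chats at a party and the number of token-to-token comparisons in full attention, assuming the textbook setup in which every token is scored against every token.

```python
def pairwise_chats(guests: int) -> int:
    """One-on-one chats needed when every guest talks to every other guest."""
    return guests * (guests - 1) // 2


def attention_scores(tokens: int) -> int:
    """Comparisons in full attention: every token scored against every token."""
    return tokens * tokens


for n in (10, 10_000):
    print(f"{n:>6} guests -> {pairwise_chats(n):,} chats")

# Doubling the token count multiplies the comparison count by four.
for n in (1_000, 2_000, 4_000):
    print(f"{n:>6} tokens -> {attention_scores(n):,} comparisons")
```

Each time the token count doubles, the comparison count goes up fourfold. That is all "quadratic" means.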
Every workaround you have heard of — breaking documents into chunks, search-and-retrieve tricks, vector databases — exists for one reason: to avoid making the AI read everything at once, because reading everything at once costs too much.
Enter Subquadratic
On May 5, 2026, a startup literally named Subquadratic walked out of stealth mode with $29 million in seed funding and a bold pitch: it had redesigned attention so the cost grows in a straight line instead of exploding.
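Double the text, double the work. Not quadruple. Strictly speaking, “subquadratic” covers any cost that grows more slowly than the square of the input. Straight-line growth, which is what the company claims, is the cleanest case.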
Its model is called SubQ. The headline number is the one that made everyone sit up: a context window of 12 million tokens (very roughly 9 million words, or dozens of long novels) that the company says the model can actually use, not just advertise.
The team is tiny: 13 people, 11 of them PhDs from places like Meta, Google, Oxford and Cambridge. The money comes from names better known for consumer tech, including Tinder co-founder Justin Mateen. And the architecture has a name — Subquadratic Sparse Attention, or SSA.
The clever idea, in plain English
Back to the dinner party. Subquadratic’s trick is simple to describe.
Instead of forcing every guest to talk to every other guest, SSA learns — on the spot, for each question — which conversations actually matter. Then it only runs those. The pointless small talk gets skipped entirely.
If most of the words in a giant document are irrelevant to your specific question, why pay to compare all of them? SubQ tries to spend its effort only where it counts.
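To make "only run the conversations that matter" concrete, here is a toy sketch of a generic technique called top-k sparse attention. To be clear, this is not SSA. Subquadratic has not published how SSA works, so everything below (the function names, the tiny numbers, the top-k rule) is my own illustration. For a single question, it scores every word, keeps the best few, and blends only those.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values, top_k=None):
    """Single-query attention. If top_k is set, blend only the k best-scoring keys."""
    scores = [dot(query, key) for key in keys]
    chosen = list(range(len(keys)))
    if top_k is not None:
        # Hypothetical selection rule: keep the indices of the k highest scores.
        chosen = sorted(chosen, key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in chosen])
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, chosen)) for d in range(dim)]
    return output, chosen


# Made-up toy data: guests 0 and 3 are relevant to the question, 1 and 2 are small talk.
query = [1.0, 0.0]
keys = [[5.0, 0.0], [0.0, 1.0], [0.1, 0.0], [4.0, 0.0]]
values = [[1.0], [3.0], [5.0], [2.0]]

full_out, full_idx = attend(query, keys, values)
sparse_out, sparse_idx = attend(query, keys, values, top_k=2)
print("full:  ", full_idx, full_out)
print("sparse:", sparse_idx, sparse_out)
```

In this toy, the sparse answer lands close to the full one while blending half as many guests. Notice the catch, though: my sketch still scores every guest before choosing, so it saves nothing on the selection step. Any practical method needs a cheap way to pick the right conversations, and that is exactly the part of SubQ nobody outside the company has seen.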
That is the whole idea. Everything else is engineering.
The numbers — and the word “claim”
Here is where I have to be honest with you, because ORSLEN does not do hype.
Every impressive number below is claimed by the company. Not yet independently confirmed.
Subquadratic says SubQ is roughly 52 times faster at handling 1 million tokens than the standard method. It says that at the full 12-million-token scale, it cuts the heavy “attention” computation by a factor of close to 1,000. On one long-context accuracy test, it says SubQ roughly matched Anthropic’s Claude Opus, while costing about $8 to run a task that cost Opus an estimated $2,600.
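For a sense of scale, here is the back-of-envelope arithmetic on those claims. The inputs are the company's numbers, not mine, and the second calculation rests on my own assumption that the baseline is full attention with every token compared against every token.

```python
# Figures as claimed by the company; none independently verified.
OPUS_COST_USD = 2_600         # estimated cost of the task on Claude Opus
SUBQ_COST_USD = 8             # claimed cost of the same task on SubQ
CONTEXT_TOKENS = 12_000_000   # claimed usable context window
ATTENTION_REDUCTION = 1_000   # "close to 1,000 times", rounded

cost_ratio = OPUS_COST_USD / SUBQ_COST_USD

# Assumption: the baseline compares every token with every token.
full_comparisons = CONTEXT_TOKENS ** 2
reduced_comparisons = full_comparisons / ATTENTION_REDUCTION
avg_partners_per_token = reduced_comparisons / CONTEXT_TOKENS

print(f"cost ratio: {cost_ratio:.0f}x")
print(f"average comparisons per token: {avg_partners_per_token:,.0f}")
```

Under those assumptions, the claimed price gap is about 325 to 1, and each token would be compared with roughly 12,000 others on average instead of all 12 million. That second figure is my inference from the headline claim, not something the company has stated.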
If those numbers hold up, the economics of AI shift. Reading an entire codebase at once. Reviewing a whole legal contract without chopping it up. Agents that remember days of work. All of it becomes cheap.
If.
Why the internet is not celebrating yet
The skepticism is loud, and it is fair.
There is no published research paper. The model’s inner workings are closed. No independent leaderboard has tested it. The one “independent” benchmark came from a partner the startup’s own co-founder admitted he had worked with since 2019 — a seven-year relationship, not an arm’s-length test.
The AI community split instantly. One widely shared post called SubQ “either the biggest breakthrough since the transformer, or AI’s Theranos.” An OpenAI engineer pointed out — and the startup confirmed — that SubQ is not built from scratch; it starts from an existing open-source model. Others called the benchmarks “suspiciously perfect” and “cherry-picked.”
There is also history here. Back in 2024, a startup called Magic raised close to half a billion dollars on a strikingly similar promise — a 100-million-token context window, a near-1,000x efficiency claim. Two years on, there is little public sign of it being used by anyone outside the company.
The pattern rhymes. That is enough reason to keep one hand on your wallet.
My verdict
Subquadratic is not a scam. That part matters. It is a real company with real researchers, real funding, and a real $19.6 million contract to rent the GPUs it needs. It is building something.
But “building something” and “broke AI’s biggest math problem” are very different sentences. Right now, only the first one is proven.
So here is where I land. SubQ is the most interesting AI story of the month — not because the claims are confirmed, but because the problem it is attacking is the right one. The cost of attention is a real ceiling, and somebody cracking it would genuinely reset the field.
I am not rearchitecting anything around SubQ today. Neither should you. But I am watching it closely — and the moment an independent lab with no ties to the company reproduces those numbers, this stops being a curiosity and becomes the headline of the year.
Until then, “breakthrough” is the label Subquadratic has not earned yet. The honest one is just “promising.”
Why this matters even if you never touch AI code
You do not need to build models to feel this one.
If attention gets dramatically cheaper, the AI tools you already pay for get cheaper, faster, and far better at handling long material, such as full reports, long email threads, and entire knowledge bases, without falling apart halfway through.
Whether that future arrives in 2026 or stays a pitch deck depends on one boring, beautiful thing: independent proof.
That is the story. I will tell you the moment it changes.
Note: ORSLEN covers AI without the hype tax. Real facts, plain language, and a clear line between “proven” and “promised.”

