当前的 AI 定价模式终将成为过去。

当前的 AI 定价模式终将成为过去。
The current AI pricing was always going to go away

原始链接: https://arnon.dk/the-current-ai-pricing-was-always-going-to-go-away/

“人工智能补贴时代”正在终结，推理成本的现实打破了基于错误假设构建的商业模式。许多公司曾寄希望于代币（token）成本下降能维持利润率，从而提供固定费率的 AI 功能；然而，“诱导需求”却随之出现——随着模型能力增强，使用需求转向了消耗更多算力的复杂、智能体及重推理工作流。与此同时，供应端受到高带宽内存（HBM）、先进 GPU 封装技术（CoWoS）以及电力需求激增的制约，使得 AI 硬件比前几代产品昂贵得多。随着实验室在盈利能力上面临挑战，成本负担正逐渐转移给企业。为了生存，产品团队必须从“处处部署 AI”转向优先考虑那些能够证明推理成本合理性的用例。定价模式必须从忽视消费波动性的固定席位费，转型为按量计费（按操作付费）、预付费额度或混合模式。这些策略将收入与使用量挂钩，使公司能够根据算力的基础成本调整收入。那些被锁定在僵化固定定价模式中的公司，如今正面临双输局面：要么承受不可持续的利润压缩，要么削减功能从而损害用户增长。

这场 Hacker News 讨论聚焦于 AI 定价的可持续性，许多用户认为当前的订阅和使用模式存在固有缺陷。主要观点包括： * **定价效率：** 评论者指出，虽然顶级模型依然昂贵，但小型、开源或高效模型（如 DeepSeek）的性价比正在迅速提升。许多人认为，对于日常任务而言，昂贵的大型模型往往属于过度配置。 * **“Token 浪费”：** 用户对高强度的 AI 使用持怀疑态度。一些用户认为，高 Token 消耗往往源于“炫技”或低效的工作流，而非真正的生产力提升；他们指出，熟练的用户可以用少得多的资源获得更好的结果。 * **市场现实：** 参与者讨论了大型 AI 实验室是否能维持其当前的商业模式。有人认为，不可持续的风险投资补贴和紧张的硬件供应链最终将迫使市场转向更理性的按量计费，或转向本地化的专业级推理。 * **未来展望：** 大多数贡献者认同 AI 正逐渐演变为一种基础设施，其推理成本最终将趋近于零。这将推动未来模型走向高度专业化，或直接在本地硬件上运行。

原文

The current AI pricing was always going to go away. It just doesn’t make sense.

Microsoft canceled internal Claude Code licenses this week (for whatever reason, even if it’s because they integrated it), Uber blew its entire 2026 AI budget in four months, and GitHub is dropping flat-rate plans across its products.

You’ll see the framing “the AI subsidy era is ending” which is a polite way of what everyone’s been doing when they slap AI features into every tier of their product on a bet that inference costs would keep falling.

They didn’t and the cost curve is bending the wrong way, and the labs have no choice except to pass that along.

Did we collectively forget second order thinking?

Each model generation, costs per token did fall in theory, sometimes 10x less but that was for comparable quality… Lots of people extrapolated and built business models on the extrapolation, which… isn’t how you think about it.

Second-order thinking anyone?

Everyone who deals with road planning knows about is induced demand. Each new capability invents new demand. Highways are the textbook case. Add a lane, you get new commutes. The commutes weren’t there before the lane. AI is the same shape. Cheaper inference doesn’t reduce the bill, it expands what people ask the model to do.

Now my reasoning queries take >4 minutes, where the old ones took 2m… Agentic workflows make 50 calls where the old workflow made one. Unit cost falls, units explode, but still the total spend goes up.

Anyone selling a flat-rate “AI assistant” assumed user behavior wouldn’t change. It did. It always does.

The second is that the supply side stopped cooperating – memory and GPU economics are moving against you.

Memory got 4x more expensive. GPUs got >95% more expensive.

Frontier training and inference run on Nvidia accelerators paired with high-bandwidth memory. The ceiling isn’t transistors anymore, it’s HBM and the advanced packaging that bonds it to the compute die.

Morgan Stanley estimates the bill of material (BOM) on the new NVIDIA VR200s will be 95% higher – memory accounting for 435% growth ALONE.

That ceiling is one factory deep. TSMC’s CoWoS packaging line is the bottleneck for accelerator supply. SK Hynix dominates HBM, with Samsung lagging and Micron behind that. None of them can add capacity overnight. These are 18-to-36 month commitments, minimum, and they were planned for a world that under-forecast demand by an order of magnitude.

So GPU pricing is what scarcity pricing looks like. Top-end accelerators today are roughly 2x more expensive than the previous generation at comparable cluster scale. HBM prices have 4x’d in 18 months. Power and cooling are now real constraints in places nobody used to model power for, which is why every hyperscaler now has a “we’re building a gigawatt campus” story and a nuclear-PPA press release.

Anthropic’s CFO testified under oath this March that the company spent $10 billion on compute and made $5 billion in revenue (Ed Zitron has the math). The labs are underwater on inference. They’re raising prices to keep the lights on.

Companies that sold flat-rate AI-everywhere products are now sitting on a margin problem they architected themselves into. The bet was that one of these curves would bend in their favor. None of them did, probably none of them will, certainly not on the timeline their pricing assumed.

What changes from here

The product question shifts. It stops being “where can we add AI?” and starts being “which use cases earn the inference cost they burn?” That’s a harder roadmap to write. It also changes the pricing surface, which is the part most product teams haven’t internalized.

Three architectures handle a moving cost. None of them are new. All of them are uncomfortable for sales teams that grew up selling seats.

Per-action. Every API call, every generation, every agent step has a price. Revenue scales with cost because they’re indexed to the same underlying event. Twilio has run this since 2008. AWS has run a version of it since 2006. The downside is transparency cuts both ways. Customers see the meter, and they negotiate. The upside is your gross margin doesn’t depend on guessing how hard your power users will hammer the system.

Credits. Prepaid buckets. Customer buys 100,000 credits, burns them down on whatever, refills. Credits smooth cash flow and let you mix model costs behind a single unit, which is the only sane way to handle a product that routes between five different inference providers. The trap is breakage. Snowflake credits are infrastructure, customers understand what they’re buying. Gift-card credits are stranded assets, and customers can tell which one they bought. You only get to do the second one once.

Hybrid. Base seat with included credits and metered overage. Most enterprise sales motions accept this without flinching, because the seat number still anchors the contract and the meter is the safety valve. It’s the design most AI-native products converge to within their first repricing cycle. Not my favourite, but whatever, it tends to work.

The shape isn’t the point by itself, but rather whether the line moves when the cost line moves. Per-seat is the one architecture that pretends costs are fixed.

Everything else is some flavor of indexing revenue to the underlying event.

The impossible choice

If your pricing can move with cost, you get to keep building.

You can ship the agentic workflow, the heavier reasoning model, the slow expensive feature for power users, and you have a way to be paid for them.

If you’re locked into per-seat (or flat, or whatever) – you pick between two losing options. Eat the margin and watch it compress every quarter your customers’ usage grows. Or strip AI out of your cheaper tiers and watch your activation rate fall off the lower-priced cohorts that used to be your funnel.

Both options are visible on the next board deck.

Neither one of them looks fun.