Was Amazon's Tokenmaxxing Fiasco Behind Claude's $500M Mystery Bill?

Original link: https://www.zerohedge.com/ai/was-amazons-tokenmaxxing-fiasco-behind-claudes-500m-mystery-bill

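Corporate America is facing a “tokenmaxxing” crisis, in which companies conflate AI usage metrics with actual productivity. A recent report says an unnamed enterprise customer ran up roughly $500 million in Claude charges in a single month because it had set no usage limits. The customer has not been identified, but speculation points to Amazon, which recently shut down an internal leaderboard that had encouraged employees to route unnecessary tasks through AI agents to inflate their usage scores. The episode is a hazardous instance of Goodhart's Law: when a measure becomes a target, it ceases to be a useful measure. Companies across the tech industry, including Meta, Uber, and Microsoft, are reporting a similar “AI hangover” in which token consumption far outpaces actual output. The industry is increasingly driven by circular flows of money: hyperscalers invest in AI companies, which in turn spend that money on cloud infrastructure, while management-imposed “AI adoption” quotas push employees toward “metric theater.” Ultimately, the current enterprise AI boom risks being built on artificial demand, with excessive token consumption mistaken for progress and masking a lack of real economic value.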

Original article

Axios reported this week that an unnamed Anthropic enterprise client managed to run up roughly $500 million in Claude charges in a single month after failing to put usage limits on employee licenses.

The company was not named, but we suspect Blue Origin might not be the only thing that blew up for Jeff Bezos this month.

Just as the Axios report landed with the $500M tidbit, Amazon was shutting down an internal AI-usage leaderboard after employees reportedly began “tokenmaxxing” - routing unnecessary work through AI tools to inflate their usage scores. The result was a perfect case study in what happens when corporate America turns AI adoption into a metric, then acts surprised when employees optimize for the metric instead of the work.

Whether or not Amazon was the mystery Claude whale, its internal AI experiment shows exactly how a runaway enterprise AI bill can happen.

The $500M Claude Mystery

The Axios item was brief, but extraordinary:

An AI consultant tells Axios one of their clients recently spent half a billion dollars in a single month after failing to put usage limits on Claude licenses for employees.

So, oops to every CFO who recently approved "AI adoption" as a corporate priority.

In the old software world, when true nerds roamed the land, a bad rollout usually meant paying for licenses employees barely touched. The waste was real, but at least it was mostly static. In the new agentic AI world, a bad rollout - or simply adopting AI for everything - can quickly become devastating, because the bill scales with activity rather than headcount: thousands of employees, or autonomous agents operating on their behalf, prompting, testing, summarizing, refactoring, retrying, and spinning up new tasks on usage-based pricing.
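The gap between a flat license and a metered bill can be made concrete with a small sketch. Everything in it is an assumption for illustration: the price, the request volume, and the `UsageMeter` class are hypothetical and do not describe Anthropic's billing or any real admin control. It only shows that without a cap the bill grows linearly with requests, while a per-user cap bounds it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical blended price in USD, chosen only to make the arithmetic easy.
# It is not a real rate card.
PRICE_PER_MILLION_TOKENS = 10.0


@dataclass
class UsageMeter:
    """Tracks one user's metered spend against an optional monthly cap."""
    monthly_cap_usd: Optional[float] = None  # None means no limit is configured
    spent_usd: float = 0.0

    def charge(self, tokens: int) -> bool:
        """Bill a request; return False and bill nothing if it would exceed the cap."""
        cost = tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
        if self.monthly_cap_usd is not None and self.spent_usd + cost > self.monthly_cap_usd:
            return False
        self.spent_usd += cost
        return True


def simulate_month(meter: UsageMeter, requests: int, tokens_per_request: int) -> float:
    """Replay a month of identical requests and return the resulting bill."""
    for _ in range(requests):
        meter.charge(tokens_per_request)
    return meter.spent_usd


# Same (invented) workload, with and without a cap.
uncapped = simulate_month(UsageMeter(), requests=2_000, tokens_per_request=500_000)
capped = simulate_month(UsageMeter(monthly_cap_usd=500.0), requests=2_000, tokens_per_request=500_000)
print(f"uncapped: ${uncapped:,.2f}  capped: ${capped:,.2f}")
```

With these made-up inputs each request costs $5, so the uncapped user lands at $10,000 for the month and the capped user stops at $500. A seat license would have cost the same regardless of how many requests were sent.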

That is the heart of the current enterprise AI hangover. Companies spent the past year foisting AI on employees, often without a clean way to separate productivity from dashboard-friendly activity, and the bills are now arriving.

Microsoft has reportedly started canceling most Claude Code licenses and steering developers toward GitHub Copilot CLI. Uber reportedly burned through its entire 2026 AI coding-tools budget by April, with COO Andrew Macdonald saying it was “very hard to draw a line” between rising Claude Code usage and useful consumer-facing output. Meta killed an employee-created “Claudeonomics” dashboard after workers competed to rank among the company’s top AI token users.

Amazon’s Tokenmaxxing Fiasco

Amazon’s version of the problem was almost too on-the-nose.

Earlier this month, the Financial Times reported that Amazon employees were using MeshClaw, an internal OpenClaw-style AI agent tool, to inflate AI usage metrics. MeshClaw let employees vibecode their own agents that could interact with workplace systems, including code deployments, email triage, and Slack-style communications.

The company had also been pushing aggressive AI adoption internally. According to the FT, more than 80% of Amazon developers were expected to use AI tools weekly, and internal leaderboards tracked AI usage. Employees reportedly responded by routing non-essential tasks through AI agents in order to boost their token counts.

They even had an internal leaderboard - KiroRank - that issued nerd points (or whatever) to employees who tokenmaxxed. Apparently it didn't take long for the company to realize this was a huge mistake: KiroRank was nuked after it encouraged some workers to perform tasks that did not necessarily solve customer or business problems, but did help them climb the rankings. Amazon senior vice president Dave Treadwell reportedly told staff: “Please don’t use AI just for the sake of using AI.”

Amazon later emphasized that KiroRank was an informal employee-created tracker, not a formal performance system, and said it was never intended to promote AI usage for usage’s sake. The company also said it still tracks AI token usage to measure costs, but does not encourage tokenmaxxing.

Why Amazon Tops The $500M Suspect List

Start with the obvious: Amazon has one of the deepest strategic relationships with Anthropic of any company on earth.

Amazon announced in April that it would invest another $5 billion in Anthropic, with the possibility of up to $20 billion more tied to commercial milestones, on top of the $8 billion it had already invested. The same announcement said Anthropic had committed to spend more than $100 billion over ten years on AWS technologies.

That makes Amazon more than an ordinary Claude customer. It is an investor, infrastructure provider, distribution partner, and cloud beneficiary of Anthropic’s growth.

Then there's the scale. Reuters reported in February that Amazon projected roughly $200 billion in capital expenditures for 2026, up sharply from 2025, as Big Tech raced to build out AI infrastructure. That level of spending needs demand signals. Internal AI usage is one of those signals.

Then there is the timing. Amazon’s MeshClaw usage controversy surfaced in May. KiroRank was deprecated in late May. Axios’ unnamed $500 million Claude bill appeared at the same moment the industry was waking up to the cost of tokenmaxxing.

So, yeah...

Circle Jerk Intensifies?

The broader issue is not whether Amazon specifically spent $500 million on Claude in one month. The broader issue is that the AI boom is increasingly built on circular flows of money, usage, and valuation.

Hyperscalers invest billions in model companies. Model companies commit to spend billions back on hyperscaler cloud infrastructure. Enterprises push employees to use the tools. Token consumption rises. Rising usage supports higher revenue projections. Higher revenue projections support higher valuations. Higher valuations justify more infrastructure spending.

On paper, it looks like demand. In practice, some of that demand may be employees and agents burning tokens because management told them usage equals progress.

Reuters recently warned that Anthropic’s explosive growth tells only half the story, noting early signs of corporate AI fatigue even as revenue projections and valuation math move higher. The warning is simple: AI demand may be real, but not all usage is economically productive.

Which is a pretty big narrative killer... If a developer uses Claude Code to ship a meaningful feature faster, that is adoption. If an employee routes fake busywork through an autonomous agent to climb a leaderboard, that is not adoption. It is metered theater.

The problem is that both show up as tokens.

There's an old idea in economics called Goodhart’s Law: when a measurement becomes the target, it stops being a useful measurement.

In plain English, if you tell employees they will be judged by a number, they will make the number go up - whether or not the underlying business gets any better.

That's exactly the danger with enterprise AI adoption. Token usage can be a useful internal signal. It can show whether employees are experimenting with tools, whether teams are adopting new workflows, and where demand is rising. But once token usage becomes a scoreboard, it no longer measures productivity. It measures willingness to burn tokens.
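A toy leaderboard makes the Goodhart problem visible. The sketch below is purely illustrative: the employees, the numbers, and the split between `useful_tokens` and `padding_tokens` are invented, and in practice a usage meter cannot observe that split at all, which is the point.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    useful_tokens: int   # tokens spent on work that ships something (hypothetical)
    padding_tokens: int  # tokens burned only to climb the leaderboard (hypothetical)

    @property
    def total_tokens(self) -> int:
        # A usage meter sees only this sum; it cannot tell the two kinds apart.
        return self.useful_tokens + self.padding_tokens


def leaderboard(staff, key):
    """Return names ranked from highest to lowest by the given key function."""
    return [e.name for e in sorted(staff, key=key, reverse=True)]


# Invented staff, for illustration only.
staff = [
    Employee("ana", useful_tokens=900, padding_tokens=0),
    Employee("raj", useful_tokens=400, padding_tokens=100),
    Employee("kim", useful_tokens=100, padding_tokens=2_000),
]

by_tokens = leaderboard(staff, key=lambda e: e.total_tokens)
by_useful = leaderboard(staff, key=lambda e: e.useful_tokens)
print("ranked by total tokens: ", by_tokens)
print("ranked by useful tokens:", by_useful)
```

Ranked by total tokens, the heaviest padder comes first; ranked by the useful share, the same person comes last. Once the scoreboard rewards the sum, the ordering says more about who is willing to pad than about who is productive.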