Xiaomi MiMo-V2.5 Series API Permanent Price Reduction, Up to 99%

Original link: https://platform.xiaomimimo.com/docs/en-US/news/v2.5-price-update

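Starting at 0:00 Beijing Time on May 27, 2026, MiMo will comprehensively upgrade its pricing and billing system. Major technical optimizations, including SGLang HiCache and enhanced Expert Parallelism, have significantly improved the efficiency of MiMo's inference architecture and made substantial cost reductions possible.

**Key updates:**

* **Permanent API price reduction:** MiMo-V2.5 API prices drop by up to 99%, with a single flat rate regardless of input length.
* **Token Plan optimization:** Billing is clearer, and usage quotas increase to 5 to 8 times the original at no extra cost.
* **Quota reset:** Quotas for all current Token Plan users will be fully reset under the new billing rules.
* **Incentive program:** The "Quadrillion Token Creator Incentive Program" has concluded successfully, and all rewards have been distributed.
* **Upcoming benefits:** Past paying users whose plans have expired should watch for surprise gifts to be announced next week.

The adjustment reflects MiMo's commitment to making top-tier AI models more accessible and affordable, and to building more robust, more scalable AI infrastructure for developers worldwide.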

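Xiaomi's announcement of a permanent MiMo-V2.5 API price cut of up to 99% sparked a lively discussion on Hacker News. Users compared the aggressive pricing strategy with DeepSeek's recent moves and noted that these Chinese models perform very well and are highly cost-effective compared with their US counterparts. Commenters debated the sustainability of the price cuts and what is driving them. Suggested explanations included government subsidies aimed at promoting AI adoption (similar to China's approach to electric vehicles) and gains in operational efficiency, such as domestic GPU hardware (for example, Huawei Ascend), cheap electricity, and optimized model architectures. Some users raised privacy concerns about using models hosted in China, while others pointed out that because the models are open-weight, self-hosting can reduce the security risk. In the end, the consensus was that this "price war" is putting heavy pressure on Western AI companies, whose current subscription models look increasingly expensive by comparison. Many hoped the trend would democratize AI and keep it from becoming a tool reserved for the few who pay high subscription fees.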

Original text

Over the past few months, through activities such as MiMo Orbit and the Quadrillion Token Creator Incentive Program, we have enabled more people to experience MiMo and solve real problems. This is MiMo's first step on the path to large-scale application.

Now, with the continuous improvement of our underlying technologies, we can finally do something more thorough: permanently overhaul the entire model pricing system.

Quick overview of this announcement:

MiMo-V2.5 Series API permanent price reduction
Token Plan billing system optimization, with usage increased to 5-8 times the original
Quadrillion Token Creator Incentive Program concluded successfully
Full reset of quotas for all currently active Token Plan users

Effective Time: 0:00, May 27, 2026, Beijing Time

MiMo-V2.5 Series API Permanent Price Reduction

Compared with the original API pricing, the new prices are reduced by up to 99% and no longer vary with input length.
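To make the effect of dropping length-based pricing concrete, here is a minimal sketch that compares a length-tiered price table with a single flat rate. The announcement does not list any actual prices, so every number below, along with the names `Tier`, `tiered_cost`, `flat_cost`, and `reduction_percent`, is a hypothetical placeholder chosen only for illustration. The tiering rule (the whole request is billed at the rate of the tier its input length falls into) is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One row of a hypothetical length-tiered price table."""
    max_input_tokens: int     # inclusive upper bound of this tier
    price_per_million: float  # hypothetical price per 1M input tokens


def tiered_cost(input_tokens: int, tiers: list[Tier]) -> float:
    """Bill the whole request at the rate of the first tier that covers it."""
    for tier in sorted(tiers, key=lambda t: t.max_input_tokens):
        if input_tokens <= tier.max_input_tokens:
            return input_tokens / 1_000_000 * tier.price_per_million
    raise ValueError("input exceeds the largest tier")


def flat_cost(input_tokens: int, price_per_million: float) -> float:
    """Bill at one rate that does not depend on input length."""
    return input_tokens / 1_000_000 * price_per_million


def reduction_percent(old_cost: float, new_cost: float) -> float:
    """Percentage by which new_cost is lower than old_cost."""
    return (old_cost - new_cost) / old_cost * 100


if __name__ == "__main__":
    # Placeholder prices, NOT MiMo's real price list.
    old_tiers = [Tier(256_000, 1.0), Tier(1_000_000, 2.0)]
    new_flat_price = 0.02

    for n in (100_000, 500_000):
        old = tiered_cost(n, old_tiers)
        new = flat_cost(n, new_flat_price)
        print(n, round(reduction_percent(old, new), 2))
```

With a flat rate, the reduction is largest for requests that previously fell into the most expensive tier, which is one way an "up to" figure can arise.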

This price adjustment officially takes effect at 0:00 on May 27, Beijing Time, simultaneously worldwide. We sincerely invite all developers to integrate and try it.

Optimization of the Token Plan Billing System

More usage at the same price: usage allowances increase to 5-8 times the original, unlocking more productivity for you.
Billing rules have been adjusted to be clearer and easier to understand: what you see is what you get.

The Quadrillion Token Creator Incentive Program Concluded Successfully

Since its launch on April 28, the "Trillion Token Creator Incentive Program" has attracted enthusiastic participation and wide attention from users worldwide. As of 16:08 on May 26, Beijing Time, all 100T tokens had been distributed, and the event concluded successfully ahead of schedule. We thank all developers for their enthusiastic participation!

Note: The exclusive benefits program for members of the Apache Software Foundation remains valid long term. Applications remain open and are not affected by the conclusion of this program.

Surprise: Full Reset of All Existing Token Plan User Quotas

Regardless of how much of a plan has been used, the Credits quota of every user whose Token Plan subscription is still within its validity period will be fully reset at 0:00 on May 27, Beijing Time, and the new billing rules will apply from then on. This includes users who obtained a Token Plan through the Quadrillion Token Creator Incentive Program and users with exclusive Apache Software Foundation benefits.

One more thing: for past paying users whose Token Plan has expired, we have also prepared surprise gifts, which will be announced within the next week. Please stay tuned.

Notes on Inference Technology Optimization

Behind this price adjustment is the Xiaomi technical team's continuous optimization of the inference system.

We now fully support SWA (Sliding Window Attention) on top of SGLang HiCache. This reduces the volume of KV cache data transferred across multi-level storage (GPU memory, CPU memory, and SSD) to nearly 1/7 of its pre-optimization level and increases the number of cacheable tokens to nearly 5 times the previous figure, significantly improving cache hit rate and inference efficiency.
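The announcement does not explain where these savings come from, but a common property of sliding window attention offers one plausible intuition: a layer that attends only to the most recent `window` tokens needs at most `window` cached KV entries per sequence, whereas a full-attention layer needs one per token. The sketch below counts cached entries under that assumption. The function name `kv_entries`, the layer counts, and the window size are hypothetical and are not MiMo's actual configuration, and the result is not meant to reproduce the 1/7 or 5x figures.

```python
def kv_entries(seq_len: int, full_layers: int, swa_layers: int,
               window: int, swa_aware: bool) -> int:
    """Count per-sequence KV entries (token positions summed over layers).

    If swa_aware is False, every layer is cached as if it used full
    attention. If True, each SWA layer keeps only the last `window` tokens.
    """
    if seq_len < 0 or window <= 0:
        raise ValueError("seq_len must be >= 0 and window must be > 0")
    per_swa_layer = min(seq_len, window) if swa_aware else seq_len
    return full_layers * seq_len + swa_layers * per_swa_layer


if __name__ == "__main__":
    # Hypothetical model shape, for illustration only.
    seq_len, full_layers, swa_layers, window = 32_768, 8, 40, 128

    naive = kv_entries(seq_len, full_layers, swa_layers, window, swa_aware=False)
    aware = kv_entries(seq_len, full_layers, swa_layers, window, swa_aware=True)
    print(f"naive={naive} swa_aware={aware} ratio={aware / naive:.3f}")
```

Fewer entries per sequence means less data to move between storage tiers and room for more sequences in the same cache budget, which is the direction of both improvements the announcement reports.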

Meanwhile, we further increased the cluster's input throughput by optimizing the expert parallelism scheme, the input length bucketing strategy, and other components, continuously reducing the serving cost per token while maintaining service quality.
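The announcement gives no detail on the bucketing strategy, so the following is only a generic sketch of what input length bucketing usually means: requests are grouped by input length so that each batch is padded to a nearby bucket bound rather than to the longest request overall. The bucket boundaries and the helper names `bucket_requests` and `padded_tokens` are assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def bucket_requests(lengths: list[int], boundaries: list[int]) -> dict[int, list[int]]:
    """Assign each input length to the smallest bucket bound that fits it.

    `boundaries` must be sorted in ascending order.
    """
    buckets: dict[int, list[int]] = defaultdict(list)
    for n in lengths:
        i = bisect_left(boundaries, n)  # first bound >= n
        if i == len(boundaries):
            raise ValueError(f"length {n} exceeds the largest bucket")
        buckets[boundaries[i]].append(n)
    return dict(buckets)


def padded_tokens(buckets: dict[int, list[int]]) -> int:
    """Total tokens processed if every request is padded to its bucket bound."""
    return sum(bound * len(members) for bound, members in buckets.items())


if __name__ == "__main__":
    lengths = [100, 900, 3000, 120]  # hypothetical request lengths

    bucketed = bucket_requests(lengths, [128, 1024, 4096])
    single = bucket_requests(lengths, [4096])
    print("bucketed:", padded_tokens(bucketed))
    print("single bucket:", padded_tokens(single))
```

Finer buckets waste less compute on padding but leave fewer requests per bucket to batch together, so the choice of boundaries is a throughput trade-off.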

Conclusion

The value of technology ultimately lies in the breadth of its use.

Relying on continuous technological innovation, we hope to unlock real, sustainable, large-scale inference demand by offering model services that combine low cost with top-tier capabilities, and thereby help build out a complete AI infrastructure chain.

Enabling more people to use better models: this is MiMo's unwavering mission.