DeepSeek V4: almost on the frontier, a fraction of the price

Original link: https://simonwillison.net/2026/Apr/24/deepseek-v4/

## DeepSeek V4: a powerful and affordable AI model release

DeepSeek AI has released its hotly anticipated V4 series of large language models: **DeepSeek-V4-Pro** (1.6 trillion parameters, 49B active) and **DeepSeek-V4-Flash** (284B total, 13B active). Both use a 1 million token context length and ship under the MIT license, making DeepSeek-V4-Pro the largest open-weights model available, ahead of Kimi K2.6 and GLM-5.1.

Initial testing shows image generation (the pelican SVG benchmark) roughly on par with previous DeepSeek models. The key highlight, though, is **pricing**: DeepSeek V4 is far cheaper than OpenAI's GPT-5.4 and Google's Gemini models. V4-Flash is the most economical of the small models, while V4-Pro offers the best value among the larger ones.

That economy comes from DeepSeek's focus on efficiency. Compared with V3.2, the V4 models need far less compute and memory on long-context prompts, with up to a 90% reduction in KV cache size. Benchmarks show performance slightly behind the current leader GPT-5.4, though DeepSeek claims stronger reasoning from its "Pro-Max" variant. Quantized versions are expected soon, potentially enabling local use on consumer hardware.

## DeepSeek V4: powerful and affordable AI

DeepSeek V4 Pro is attracting attention as a possibly frontier-level language model priced well below competitors like Claude Opus. Users report performance comparable to Claude Opus 4.6, particularly on complex tasks like analyzing large codebases, where a detailed summary cost just $0.09. A 75% discount is currently available through May 2026.

Some users dispute the direct comparisons with Opus, however, and there are data-privacy concerns, particularly that the model may train on user data (especially when used directly from the developer).

Discussion also centers on the model's efficiency and the implications of subsidized pricing. Prices are very affordable now, but questions remain about long-term costs and whether the roughly 3x price advantage will hold once subsidies end. Some users are exploring access via platforms like OpenRouter, AWS Bedrock, and pi.dev, while others are eager for a locally runnable version.

24th April 2026

Chinese AI lab DeepSeek’s last model release was V3.2 (and V3.2 Speciale) last December. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, DeepSeek-V4-Pro and DeepSeek-V4-Flash.

Both models are 1 million token context Mixture of Experts. Pro is 1.6T total parameters, 49B active. Flash is 284B total, 13B active. They’re using the standard MIT license.
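
A quick back-of-envelope sketch (mine, not DeepSeek's) of why the total/active split matters: a decoder-only transformer spends roughly 2 FLOPs per parameter per generated token, and in a Mixture of Experts model only the active parameters participate. The 2x figure is a standard approximation, not something from the V4 paper:

```python
# Per-token compute scales with *active* parameters, not total.
# Parameter counts are from the release; the 2 FLOPs/param/token
# rule of thumb is a common approximation, not DeepSeek's figure.

models = {
    "DeepSeek-V4-Pro":   {"total": 1.6e12, "active": 49e9},
    "DeepSeek-V4-Flash": {"total": 284e9,  "active": 13e9},
}

for name, p in models.items():
    active_share = p["active"] / p["total"]
    gflops_per_token = 2 * p["active"] / 1e9
    print(f"{name}: {active_share:.1%} of weights active, "
          f"~{gflops_per_token:.0f} GFLOPs per token")
```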

I think this makes DeepSeek-V4-Pro the new largest open weights model. It’s larger than Kimi K2.6 (1.1T) and GLM-5.1 (754B) and more than twice the size of DeepSeek V3.2 (685B).

Pro is 865GB on Hugging Face, Flash is 160GB. I’m hoping that a lightly quantized Flash will run on my 128GB M5 MacBook Pro. It’s possible the Pro model may run on it if I can stream just the necessary active experts from disk.
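
For a rough sense of what those quants would need to achieve, here's a weights-only size estimate. The parameter counts come from above; the bit widths are hypothetical, and this ignores the KV cache and runtime overhead entirely:

```python
# Weights-only memory estimate: params * bits-per-weight / 8 bytes.
# The quantization levels here are hypothetical; KV cache (large at
# 1M-token context) and runtime overhead are not included.

def weights_gb(params: float, bits: float) -> float:
    """Approximate size of the quantized weights in gigabytes."""
    return params * bits / 8 / 1e9

for bits in (4, 3):
    print(f"{bits}-bit: Flash ~{weights_gb(284e9, bits):.0f} GB, "
          f"Pro ~{weights_gb(1.6e12, bits):.0f} GB")
```

By that crude measure a Flash quant would need to land somewhere near 3 bits per weight (~107GB) to leave any headroom on a 128GB machine.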

For the moment I tried the models out via OpenRouter, using llm-openrouter:

```bash
llm install llm-openrouter
llm openrouter refresh
llm -m openrouter/deepseek/deepseek-v4-pro 'Generate an SVG of a pelican riding a bicycle'
```
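
(If you haven't used OpenRouter with LLM before you'll need an API key; the llm-openrouter README has you run `llm keys set openrouter` to paste one in.)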

Here’s the pelican for DeepSeek-V4-Flash:

Excellent bicycle - good frame shape, nice chain, even has a reflector on the front wheel. Pelican has a mean looking expression but has its wings on the handlebars and feet on the pedals. Pouch is a little sharp.

And for DeepSeek-V4-Pro:

Another solid bicycle, albeit the spokes are a little jagged and the frame is compressed a bit. Pelican has gone a bit wrong - it has a VERY large body, only one wing, a weirdly hairy backside and generally looks like it was drawn by a different artist from the bicycle.

For comparison, take a look at the pelicans I got from DeepSeek V3.2 in December, V3.1 in August, and V3-0324 in March 2025.

So the pelicans are pretty good, but what’s really notable here is the cost. DeepSeek V4 is a very, very inexpensive model.

This is DeepSeek’s pricing page. They’re charging $0.14/million tokens input and $0.28/million tokens output for Flash, and $1.74/million input and $3.48/million output for Pro.

Here’s a comparison table with the frontier models from Gemini, OpenAI and Anthropic:

| Model | Input ($/M) | Output ($/M) |
|---|---|---|
| DeepSeek V4 Flash | $0.14 | $0.28 |
| GPT-5.4 Nano | $0.20 | $1.25 |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 |
| Gemini 3 Flash Preview | $0.50 | $3.00 |
| GPT-5.4 Mini | $0.75 | $4.50 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
| DeepSeek V4 Pro | $1.74 | $3.48 |
| Gemini 3.1 Pro | $2.00 | $12.00 |
| GPT-5.4 | $2.50 | $15.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Opus 4.7 | $5.00 | $25.00 |
| GPT-5.5 | $5.00 | $30.00 |

DeepSeek-V4-Flash is the cheapest of the small models, beating even OpenAI’s GPT-5.4 Nano. DeepSeek-V4-Pro is the cheapest of the larger frontier models.
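
To make those rates concrete, here's a quick sketch pricing a single hypothetical request of 100,000 input tokens and 5,000 output tokens (roughly the shape of a large-codebase summary) against a few rows of the table. The workload is made up; the rates are from the table:

```python
# Cost of one request at published rates ($ per million tokens).
PRICES = {
    "DeepSeek V4 Flash": (0.14, 0.28),
    "DeepSeek V4 Pro":   (1.74, 3.48),
    "GPT-5.4":           (2.50, 15.00),
    "Claude Opus 4.7":   (5.00, 25.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1e6

for model in PRICES:
    print(f"{model}: ${request_cost(model, 100_000, 5_000):.3f}")
```

On that shape of request V4 Pro comes out well under both GPT-5.4 and Opus, and Flash is close to a rounding error.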

This note from the DeepSeek paper helps explain why they can price these models so low—they’ve focused a great deal on efficiency with this release, especially for longer context prompts:

> In the scenario of 1M-token context, even DeepSeek-V4-Pro, which has a larger number of activated parameters, attains only 27% of the single-token FLOPs (measured in equivalent FP8 FLOPs) and 10% of the KV cache size relative to DeepSeek-V3.2. Furthermore, DeepSeek-V4-Flash, with its smaller number of activated parameters, pushes efficiency even further: in the 1M-token context setting, it achieves only 10% of the single-token FLOPs and 7% of the KV cache size compared with DeepSeek-V3.2.
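
Flipping those percentages into multipliers makes the claim easier to read; the fractions are straight from the quoted passage, the rest is arithmetic:

```python
# Efficiency figures as multipliers over DeepSeek-V3.2 at 1M-token
# context. Fractions come from the paper excerpt quoted above.
efficiency = {  # (share of V3.2 per-token FLOPs, share of V3.2 KV cache)
    "DeepSeek-V4-Pro":   (0.27, 0.10),
    "DeepSeek-V4-Flash": (0.10, 0.07),
}

for model, (flops, kv) in efficiency.items():
    print(f"{model}: {1 / flops:.1f}x less per-token compute, "
          f"{1 / kv:.1f}x smaller KV cache vs V3.2")
```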

DeepSeek’s self-reported benchmarks in their paper show their Pro model competitive with those other frontier models, albeit with this note:

> Through the expansion of reasoning tokens, DeepSeek-V4-Pro-Max demonstrates superior performance relative to GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks. Nevertheless, its performance falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months.

I’m keeping an eye on huggingface.co/unsloth/models as I expect the Unsloth team will have a set of quantized versions out pretty soon. It’s going to be very interesting to see how well that Flash model runs on my own machine.
