Show HN: Sup AI, a confidence-weighted ensemble (52.15% on Humanity's Last Exam)


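## Sup AI: A Confidence-Weighted AI Ensemble

Ken, a Stanford student, built **Sup AI** with research support from his father Scott, an AI scientist. It is an AI system designed to improve accuracy by combining multiple language models. The core idea is that individual models fail in distinct ways and that their errors are not strongly correlated. Sup AI runs the models in parallel and then synthesizes their outputs, weighting segments by **confidence**, which is measured as the entropy of the token probability distribution: lower entropy is treated as a signal of higher accuracy. On the Humanity's Last Exam evaluation, Sup AI reached 52.15% accuracy, significantly ahead of the best individual model (44.74%).

The initial free access was abused, so Sup AI is now available with $5 in starter credits. Key features include latency optimization (similar to OpenRouter) and a dynamic orchestration layer that manages model performance. The developers emphasize that running several less complex models can outperform a single highly complex one, owing to the reduced "thinking" effort and faster response times. They are actively seeking feedback on performance and limitations.

**Try it:** [https://sup.ai](https://sup.ai)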
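To make the confidence-weighting idea concrete, here is a minimal sketch of entropy-based weighting. It is not Sup AI's implementation, which the post does not describe in detail. The function names, the `exp(-entropy)` weighting, and the choice to vote on whole answers (rather than weight individual segments, as the post describes) are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def mean_token_entropy(token_distributions):
    """Average Shannon entropy (in nats) over per-token probability distributions."""
    total = 0.0
    for dist in token_distributions:
        total += -sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_distributions)

def confidence_weight(entropy, temperature=1.0):
    """Hypothetical mapping from entropy to a weight in (0, 1]; lower entropy gives a higher weight."""
    return math.exp(-entropy / temperature)

def weighted_vote(candidates):
    """Pick the answer with the largest summed confidence weight.

    candidates: list of (answer, token_distributions) pairs, one per model.
    """
    scores = defaultdict(float)
    for answer, dists in candidates:
        scores[answer] += confidence_weight(mean_token_entropy(dists))
    return max(scores, key=scores.get), dict(scores)

# Toy data: one confident model says "A", two maximally unsure models say "B".
confident = [[0.97, 0.01, 0.01, 0.01]] * 3
unsure = [[0.25, 0.25, 0.25, 0.25]] * 3
winner, scores = weighted_vote([("A", confident), ("B", unsure), ("B", unsure)])
print(winner, scores)
```

In this toy input the single low-entropy answer outweighs two high-entropy answers, which a plain majority vote would not allow. A real system would also need per-token probabilities from each provider, which not every API exposes.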

Sup AI achieves 52.15% accuracy, more than 7 percentage points ahead of every model in the ensemble (p<0.001).

If you need accurate answers, fewer hallucinations, or research-grade work that must be correct, Sup AI is your only option.

Disclaimer: These results are from an independent evaluation conducted by Sup AI (Dec 2025) and are not officially endorsed by the Center for AI Safety or Scale AI. Accuracy scores were calculated on a random sample of 1,369 questions from Humanity's Last Exam. All models, including competitors, were evaluated using enhanced settings (custom instructions and web search) to maximize performance. Comparisons reflect model versions available at the time of testing, including "Preview" builds which are subject to change.
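As a rough guide to how much sampling noise a 1,369-question sample carries, the sketch below computes a normal-approximation (Wald) interval for a single accuracy score. The post does not say which test produced p<0.001, so this is not a reproduction of it. Treating each question as an independent Bernoulli trial, the `z=1.96` default, and the function name are assumptions.

```python
import math

def wald_interval(accuracy, n, z=1.96):
    """Normal-approximation interval for a proportion measured on n samples."""
    se = math.sqrt(accuracy * (1.0 - accuracy) / n)
    return accuracy - z * se, accuracy + z * se, se

# Figures from the post: 52.15% accuracy on 1,369 sampled questions.
low, high, se = wald_interval(0.5215, 1369)
print(f"standard error: {se:.4f}, 95% interval: [{low:.4f}, {high:.4f}]")
```

Because every model answered the same questions, a paired test on per-question outcomes would be the more appropriate way to compare two systems. The interval above only describes the uncertainty of one score taken in isolation.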
