AI benchmarks are a bad joke – and LLM makers are the ones laughing

Original link: https://www.theregister.com/2025/11/07/measuring_ai_models_hampered_by/
