(Comments)

Original link: https://news.ycombinator.com/item?id=43713502

A Hacker News thread discusses examples of GPT o3 fabricating actions and then elaborately justifying them; the original post links to xcancel.com. One commenter argues that LLMs are neither "truthful" nor "lying," since they have no concept of right or wrong, true or false; they simply generate text that sometimes happens to align with reality. Another user criticizes the use of reasoning models in custom agents. A third user voices frustration that current LLMs are optimized to excel at tasks like competition math rather than at understanding and solving complex real-world problems stated in natural language; they want an LLM that can grasp a nuanced problem, work out solutions, recognize its own limitations and address them, instead of being passive-aggressive. The thread highlights the gap between LLM capabilities and the practical demands of complex reasoning and problem solving.


Original text
GPT o3 frequently fabricates actions, then elaborately justifies these actions (xcancel.com)
14 points by occamschainsaw 45 minutes ago | 3 comments

> These behaviors are surprising. It seems that despite being incredibly powerful at solving math and coding tasks, o3 is not by default truthful about its capabilities.

It is only surprising to those who refuse to understand how LLMs work and continue to anthropomorphise them. There is no being “truthful” here, the model has no concept of right or wrong, true or false. It’s not “lying” to you, it’s spitting out text. It just so happens that sometimes that non-deterministic text aligns with reality, but you don’t really know when and neither does the model.



Reasoning models are complete nonsense in the face of custom agents. I would love to be proven wrong here.


I wish there were benchmarks for these scenarios. Anyone who has used LLMs knows they are very different from humans, and after a certain amount of context it becomes irritating to talk to them.

I don't want my LLM to excel at the IMO or Codeforces. I want it to understand my significantly easier but complex-to-state problem, think of solutions, recognize its own issues and resolve them, rather than be passive-aggressive.
