--dangerously-skip-reading-code – olano.dev

Original link: https://olano.dev/blog/dangerously-skip/

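Rather than treating LLM-generated code as something that requires human review, the author argues we should treat it as a new kind of "machine code". Because LLM output is non-deterministic and high-volume, traditional code review is becoming unworkable. This shift, however, calls for a deliberate organizational strategy rather than an individual choice. To see real productivity gains, companies must abandon traditional development processes (such as ticket-based task assignment and human gatekeeping) in favor of autonomous, agent-driven workflows. Because the cost of rework drops significantly, engineering rigor must move from the code itself to the definition of the work. The author proposes that the "unit of knowledge" become a standardized, version-controlled specification plus a robust automated test suite. In this model, responsibility lies in maintaining a clear specification and verifying that the generated code conforms to it, not in manually auditing every line of output. By restructuring organizational processes to handle an "infinite" supply of requirements and by prioritizing specification-driven development, teams can make effective use of LLMs without sacrificing project integrity.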

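This Hacker News thread discusses how the developer's role is changing as large language models (LLMs) become deeply integrated into the software lifecycle. The core question is whether a requirements specification will replace traditional source code. Some argue that developers' focus will shift from manual implementation to architectural oversight and product management. Supporters hold that human responsibility will move to high-level verification: deciding what to build rather than how to write the code. Skeptics, however, raise serious concerns about reliability and verifiability. They point out that LLMs lack the rigorous precision of a compiler and often produce code that is hard to review and debug. Critics stress that verifying an implementation against a specification is a complex, unsolved problem, and that the premise of expecting humans to effectively review LLM-generated code (which is designed to be persuasive rather than logically correct) is itself flawed. Ultimately, the debate highlights the tension between two views: an optimistic vision in which AI automates the "how" of programming, and a worry that the lack of code transparency and the unreliability of AI will create a dangerous and complex class of debugging challenges.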

Original text

I concluded my previous post by saying that it was irresponsible to assume that we won’t need to worry about reading and debugging our code anymore, that is, to assume that the LLMs will be able to fix whatever problem pops up. This felt irresponsible because, up until now, it has been the programmer’s job to understand and maintain the source code, as a proxy for understanding and maintaining the software system. We are held accountable for the LLMs’ output.

But what if this wasn’t the case anymore? What if we dutifully communicate the risks and trade-offs to our organizational leadership and they still want to take those risks? This isn’t unheard of: companies, and especially tech startups, regularly make short-term compromises to improve productivity, beat the competition to market, lure investors, etc.

If there’s an organizational mandate to leverage LLMs to minimize the time spent coding, then that’s a new constraint we can work with. We can figure out what good engineering looks like in that context. We can stop reading LLM-generated code just like we don’t read assembly, or bytecode, or transpiled JavaScript; our high-level language source would now be another form of machine code.

This finally clicked for me after reading Thoughtworks’ retreat report. The LLMs produce non-deterministic output and generate code much faster than we can read it, so we can’t seriously expect to effectively review, understand, and approve every diff anymore. But that doesn’t necessarily mean we stop being rigorous; it could mean we move rigor elsewhere.

It’s fundamental to understand, though, that this is not an individual’s or team’s call: it has to be an organizational decision, and not just because of risk management and accountability, but because of Amdahl’s law. If we only maximize code generation speed without rearranging the organizational structures and processes in which our work is embedded, there won’t be any tangible productivity gains.
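The Amdahl’s law point can be made concrete with a small calculation. This is only a sketch: the function name `amdahl_speedup` and the 30% and 10x figures are hypothetical, chosen for illustration and not taken from the post.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only part of the work gets faster (Amdahl's law).

    accelerated_fraction: share of total delivery time spent on the part
        being sped up (here, writing code), between 0 and 1.
    factor: how many times faster that part becomes.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / factor)


# Hypothetical numbers: coding is 30% of delivery time and agents make it 10x faster.
overall = amdahl_speedup(0.30, 10)
# Upper bound if the coding step took no time at all.
ceiling = amdahl_speedup(0.30, float("inf"))
print(f"overall speedup: {overall:.2f}x, ceiling: {ceiling:.2f}x")
```

Under those assumed numbers, delivery as a whole gets about 1.37x faster and can never exceed about 1.43x, however fast code generation becomes. Any further gain has to come from the remaining 70%: review, coordination, and planning.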

We can’t have some devs pumping 20k lines of slop a day and expect the rest to still read and understand it, let alone approve it. We can’t leverage agents if our unit of work is still “add a new endpoint to the RESTful API”. We can’t expect a Product Owner to stream enough work to keep a two-pizza team busy if each engineer can take on four tasks at a time and keep agents running off-hours.

Instead, we need to take humans out of the loop and reduce coordination, friction, bureaucracy, and gate-keeping. We need a virtually infinite supply of requirements, engineers acting as pseudo-product designers, owning entire streams of work, with the purview to make autonomous decisions. Rework is almost free, so we shouldn’t make an effort to prevent incorrect work from happening.

Then where does the rigor go? Similar to the Thoughtworks report, my first bet would be specifications (which are not the same as prompts) and tests (which are not the same as TDD). If I had to roll out such a development process today, I’d make a standardized Markdown specification the new unit of knowledge for the software project. Product owners and engineers could initially collaborate on this spec and on test cases to enforce business rules. Those should be checked into the project repositories along with the implementing code. There would need to be automated pull-request checks verifying not only that tests pass but that code conforms to the spec. This specification, and not the code that materializes it, is what the team would need to understand, review, and be held accountable for.
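The post does not say how a pull-request check would verify conformance to the spec. One small piece of it could be a traceability check, sketched below under loud assumptions: the `REQ-001` style IDs, the convention that tests mention the IDs they enforce, and the function names are all hypothetical. It only confirms that every rule in the spec is referenced by some test, which is much weaker than proving the code conforms.

```python
import re

# Hypothetical convention: every business rule in the Markdown spec carries an
# ID such as REQ-001, and tests mention the IDs of the rules they enforce.
REQ_ID = re.compile(r"\bREQ-\d+\b")


def requirement_ids(text: str) -> set[str]:
    """Return every requirement ID mentioned in a piece of text."""
    return set(REQ_ID.findall(text))


def uncovered_requirements(spec_markdown: str, test_sources: list[str]) -> set[str]:
    """Return IDs that appear in the spec but in none of the test sources."""
    covered: set[str] = set()
    for source in test_sources:
        covered |= requirement_ids(source)
    return requirement_ids(spec_markdown) - covered


# Illustrative spec and test contents; a real check would read them from the repo.
spec = """\
## Checkout
- REQ-001: An order total is never negative.
- REQ-002: A discount code can be applied only once per order.
"""

tests = [
    "def test_total_is_never_negative():  # REQ-001\n    ...\n",
]

missing = uncovered_requirements(spec, tests)
if missing:
    print("spec requirements without tests:", sorted(missing))
```

In a pull-request check, a non-empty `missing` set would fail the build, so a spec change cannot merge without a test that names the new rule.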
