Prompting LLMs is not engineering

Original link: https://dmitriid.com/prompting-llms-is-not-engineering

As of July 2025, the industry's fascination with "prompt engineering" (now often called "context engineering/prompting/manipulation") is largely misguided. It's essentially reverse-engineering black box AI models with unknown parameters like training data, constraints, and compute availability, making results unpredictable. The "better result" touted by "context engineers" lacks clear criteria and is vulnerable to external factors like fluctuating compute resources, rendering previous tricks ineffective. Claims often lack rigorous evidence, akin to homeopathy. While highly specific prompts can work, they demand significant manual effort. The supposed advancements like "chain-of-thought" prompting have proven effective only in narrow, highly specific scenarios, not broadly applicable as initially claimed. New techniques for models like OpenAI o3 and Gemini 2 Pro are essentially new versions of snake oil with no guarantee of outcome. The entire practice relies more on faith and chance than on genuine engineering principles.

A Hacker News thread discusses an article arguing that "Prompting LLMs is not engineering." The top comment suggests that effective prompting is more akin to scientific discovery, involving rigorous evaluation to understand an existing system's properties rather than constructing something new through engineering principles. Another commenter draws a parallel to the debate around software "engineering" versus "real" engineering, implying a perceived lack of rigor in prompting. A final comment laments the use of the term "engineering" in this context, fearing its negative impact on the perception of actual engineering disciplines. The overall sentiment is that the term "prompt engineering" is a misnomer, potentially devaluing traditional engineering expertise.

Original article

With the proliferation of AI models and tools, there's a new industry-wide fascination with snake oil remedies called "prompt engineering".

To put it succinctly, prompt engineering is nothing but an attempt to reverse-engineer a non-deterministic black box for which any of the parameters below are unknown:

  • training set
  • weights
  • constraints on the model
  • layers between you and the model that transform both your input and the model's output that can change at any time
  • availability of compute for your specific query
  • and definitely some more details I haven't thought of

"Prompt engineers" will tell you that some specific ways of prompting some specific models will produce a "better result"... without any criteria for what a "better result" might mean. Meanwhile, it's enough for users in the US to wake up, for free/available compute to drop, and for every model to get significantly dumber than it was an hour earlier, regardless of any prompt tricks.

Most claims about prompts have as much evidence behind them as homeopathy. When people apply even the tiniest bit of rigorous examination, most claims by prompt "engineers" disappear like morning dew. For example, prior to the new breed of "thinking" models, chain-of-thought queries were touted as great, amazing, awe-inducing. Sadly, in reality they only improved anything for very narrow, hyperspecific queries and had no effect on broader queries, even when the same techniques could be applied to them:

https://arxiv.org/pdf/2405.04776

Very specific prompts are more likely to work, but they can require significantly more human labor to craft. Our results indicate that chain of thought prompts may only work consistently within a problem class if the problem class is narrow enough and the examples given are specific to that class.
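The kind of rigorous examination mentioned above boils down to repeated, blind measurement over a fixed task set rather than anecdote. A minimal sketch of such an evaluation harness (the `call_model` stub is a placeholder for a real, non-deterministic model API, and the prompts and tasks are invented for illustration):

```python
import random
from statistics import mean

def call_model(prompt: str, question: str) -> str:
    # Placeholder for a real model call; real APIs are similarly
    # non-deterministic, which is exactly why repeated runs matter.
    return random.choice(["right", "wrong"])

def evaluate(prompt: str, tasks: list[tuple[str, str]], runs: int = 5) -> float:
    # Score each task several times and average, so a single lucky
    # completion can't masquerade as a "better result".
    scores = []
    for question, expected in tasks:
        for _ in range(runs):
            scores.append(1.0 if call_model(prompt, question) == expected else 0.0)
    return mean(scores)

tasks = [("2+2?", "right"), ("capital of France?", "right")]
baseline = evaluate("Answer directly.", tasks)
variant = evaluate("Think step by step, then answer.", tasks)
# Only a gap that persists across repeated runs and held-out tasks
# counts as evidence that the prompt variant actually helps.
print(f"baseline={baseline:.2f} variant={variant:.2f}")
```

Absent this kind of defined metric and repetition, "this prompt works better" is an impression, not a measurement.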

Now that the models have progressed to OpenAI o3 and Google Gemini 2 Pro, prompt "engineering" has also progressed, to "Rules for AI", large context windows, and other snake oil remedies that are exactly as effective and deterministic as the previous ones.

In reality these are just shamanic rituals with outcomes based on faith, fear, or excitement. Engineering it is not.
