The broken promise of PRDs, specs, and get shit done

If specifications solved problems and were perfectly implemented, the waterfall model would still be king, like IBM. Specs don't solve problems. Specs encode someone's hopes and dreams of changing reality, and reality fights back when it disagrees.

Instead, observation and problem-solving lead one to understand reality as it is today, and how research and development can reliably transform it. A specification merely documents a desired end state and the approximate bounds of how to reach it, drawing on the knowledge and constraints found through observation, research, and development.

"How would you do it?" (a plan) is quite different from "How did you do it?" (a specification). I use "specification" in the sense that IETF RFCs over the last decade mostly document things that already work, for the sake of interoperability. The hard part, figuring out what works in reality, has already been done.

Anyone who has done project management, development, and implementation knows that plans exist only to document context in one place, so that stakeholders can confirm that the plan aligns with their vision, that the resources to be spent are acceptable, and that the facts and constraints surrounding its future implementation are grounded. Even with all that preparation, a plan is speculative.

To reinforce the plan, we've devised this thing called a Product Requirements Document (PRD) that focuses less on the "how" and more on the "what" and "why." It is easier for an implementer to stay aligned, and to deviate faithfully from a flawed plan, when a PRD states the objective and its context.

There's a good reason why PRDs, standards, and specs exist. They keep independent entities aligned while their underlying implementations differ. However, their reputation in agentic coding is exaggerated.

Common Ground & Definitions

Before I criticize the opinions and untruths disseminated by social media and the inconsistent meaning behind terminology, take a read through The State of Large Language Models in Q2 2026.

Agentic coding harnesses like Cursor's IDE and its agent CLI, OpenAI's Codex, and Claude Code are capable products. I've used them to generate useful greenfield applications and tools, even though Claude Code regularly degrades without explanation for weeks at a time. With the latest thinking models (Opus 4.6, Composer 2), these tools can also modify brownfield applications, given very targeted instructions, without catastrophically failing.

Language and Product Development

The primary mode of interaction with a Large Language Model is language. Before LLMs, the primary change mechanism for a product manager was language too, in the form of PRDs, tickets, Slack messages, and meetings. Conveniently, LLMs understand what these documents and formats are, and they appear to enact change on applications when given a written input like a PRD or ticket. However, implementing applications isn't as straightforward as giving vague direction.

Software engineers are intelligent people. Their job is more than "add a button that says <thing> and it sends marketing text messages that Marketing queues up." A software engineer (SWE) accounts for the historical state of the world, the current state, and anticipated future states. They also account for the whole vertical: from the button press, to asynchronously saving database state, to the final linkage with the marketing push channel, assuming it exists.

PRD authors are used to the intelligence and consideration of a living SWE. Typically, the author doesn't have to specify obvious things like "and if the user refreshes, they see their consent status as checked if they had previously consented," the kind of detail that leaks into test planning for quality assurance. An experienced developer would do that and verify it works before moving on to the next thing.
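The refresh expectation above is the kind of thing an experienced developer turns into a check without being asked. Here is a minimal sketch in Python, assuming a hypothetical in-memory `ConsentStore` in place of a real database and a hypothetical `render_consent_checkbox` function in place of a real page. None of these names come from an actual product.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentStore:
    """Hypothetical in-memory stand-in for the table that persists consent."""

    _consented: set[str] = field(default_factory=set)

    def save(self, user_id: str) -> None:
        self._consented.add(user_id)

    def has_consented(self, user_id: str) -> bool:
        return user_id in self._consented


def render_consent_checkbox(store: ConsentStore, user_id: str) -> dict:
    """Build the view state on every page load, including refreshes."""
    return {"checked": store.has_consented(user_id)}


def test_consent_survives_refresh() -> None:
    store = ConsentStore()
    # First load: the user has not consented yet.
    assert render_consent_checkbox(store, "user-1") == {"checked": False}
    store.save("user-1")
    # "Refresh": render again from persisted state, not from in-page state.
    assert render_consent_checkbox(store, "user-1") == {"checked": True}
    # Another user's view is unaffected.
    assert render_consent_checkbox(store, "user-2") == {"checked": False}


test_consent_survives_refresh()
```

The point of the sketch is that the checkbox is derived from stored state on each render, so a refresh cannot lose it.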

Continuing with the marketing push consent example, what if the marketing distribution wasn't implemented? Assuming the design of a marketing push channel, message templates, message queues, audience / cohort selection, journeys, conversion tracking, sales attribution, and so on weren't in scope, it would be fair to say the scope includes at least calling a stub implementation for an enrollment message, rather than implementing the rest. There would also be a way to verify this, either with mocks in automated testing or with a log that quality assurance can find and confirm.
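As a rough illustration of the stub-plus-mock idea above, the sketch below uses Python's standard `unittest.mock`. The `StubMarketingChannel` class, the `handle_consent` function, and the phone number are all hypothetical, invented for this example. The stub prints a log line QA could look for, and the tests use a mock to confirm the enrollment call happens exactly when consent is given.

```python
from unittest.mock import Mock


class StubMarketingChannel:
    """Hypothetical stub: records enrollment messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send_enrollment_message(self, user_id: str, phone: str) -> None:
        # The real channel is out of scope; the log line gives QA something to find.
        self.sent.append((user_id, phone))
        print(f"[stub] enrollment message queued for {user_id}")


def handle_consent(channel, user_id: str, phone: str, consented: bool) -> bool:
    """Called when the consent form is submitted. Returns True if enrolled."""
    if not consented:
        return False
    channel.send_enrollment_message(user_id, phone)
    return True


def test_enrollment_is_requested_once() -> None:
    channel = Mock(spec=StubMarketingChannel)
    assert handle_consent(channel, "user-1", "+15550100", consented=True)
    channel.send_enrollment_message.assert_called_once_with("user-1", "+15550100")


def test_no_message_without_consent() -> None:
    channel = Mock(spec=StubMarketingChannel)
    assert not handle_consent(channel, "user-1", "+15550100", consented=False)
    channel.send_enrollment_message.assert_not_called()


test_enrollment_is_requested_once()
test_no_message_without_consent()
```

Passing the channel in as a parameter is a design choice made for the sketch: it lets the stub be swapped for a real integration later without touching the consent handler.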

With input from a SWE or a software architect (SWA), a product manager would realize that separately scoped PRDs are needed to either implement a marketing product (unlikely) or integrate with a big-name one. Without that input, the scope of a single PRD can be excessive and difficult to estimate in terms of time, complexity, risk, and other unknowns. We typically do "agile" these days instead of "waterfall," since the act of developing a product is also the discovery of what the product should be, with multiple PRDs building on one another as we get closer to the overall objective. The waterfall model fails because plans, especially those written by non-technical people, break down in the face of reality.

How to draw an owl.
1. Draw some circles
2. Draw the rest of the owl

Unfortunately, "Draw the rest of the owl" doesn't work for software. Incremental development that constantly considers the past, present, and future is the only way to develop a reliable software product that isn't a clone of something that already exists. Remember that a specification retroactively describes what already exists, so implementing a specification amounts to building a clone.

The stories they tell

While the volume of these stories is quieting down as token subsidies dwindle, I have to challenge the notions they've spread that fuel the AI hype cycle.

I've heard (and tried) the whole spectrum of what these language-first influencers spout. At one end: "just write a PRD," ask the agent to create tasks, and have it follow them. In the middle: write the specs yourself with "SpecKit" to be in more control, with a built-in test plan. At the other end: have an agentic harness ask and ask and ask to eliminate all possible ambiguity before documents and context are committed to disk for future agentic work.

These language-first influencers don't have the implementing experience to write competent tasks for an agent, nor do they have the experience to review and qualify the generated tasks. And they find out the hard way after agentic implementation that 20% of what they asked for didn't make it into any tasks and 20% of each task didn't make it into the implementation.

So they do silly things: use additional waves of sub-agents to co-review the tasks against the PRD or specs, co-review the implementation against the tasks, search the web for current documentation on the tools being used, and run the plan past three developer personalities. In moments, you're spending $120/hour in tokens, in a harness called "Get Shit Done," on a page with a misplaced, off-brand button that, at the end of it all, console.logs "subscribed." And then you find out that the page with this consent view isn't even linked from the user's profile, or from anywhere else in the application.

These language-first influencers consistently ignore and explain away the failings of their processes without addressing the truth that bites them again and again. LLMs are probabilistic, lossy, tasteless, easily confused by excessive context, and unable to differentiate past, present, and future. They give up easily, go off the rails, and game the specifications. SWEs and SWAs bring taste, knowledge, experience, and awareness. Sure, people are "probabilistic" and "lossy" or "easily confused." Even so, they can, and often do, work alone and together to make things that actually work, without faking their success to the outside world.

Just as I've lost trust in coworkers who delete tests or "fix CI" by committing node_modules (and its 4 GB of transitive, x86-specific dependencies) to git, I find that LLMs behave similarly when obstructed. These technologies are misaligned with the developer morals that produce robust software products.

Let's call the set of values and behaviors that create high-quality software products that evolve over time "developer morals," and say that high-quality developers are aligned with those morals. In that framework, the developer who games the CI server into giving their project the green light, without addressing the underlying problem they introduced, is a "bad developer" in the moral sense: they have introduced fragility and risk without regard for their fellow developers or their future self. Thankfully, other developers push back on added risk and on gaming the objective, as long as they're not overwhelmed by their own work or by reviewing the current slew of generated code. "LGTM," right?

These technologies are nowhere near qualified enough to make novel products, or safe enough to enhance existing ones, without experienced (and attentive) supervision. They are, however, in a place that suggests we might get there some day; when is unclear. Contrary to the "Claude Mythos" hype, I'm not convinced that we'll get models in the next year or two that can emulate the experienced thought processes of SWEs and SWAs when they convert requirements into implementation.

These thoughts aren't documented; only the shadow and artifact of thought is in source control. A spectrum of good and bad code is freely available on GitHub, from open source products to vast seas of computer science student toys, and there's no map or legend to differentiate them. Again, the thoughts and considerations that went into all this code are missing. Parroting patterns of code has gotten us surprisingly far, especially alongside parroting synthetic planning processes that match the requested intent to the output. Even so, it continues to show how constrained it is to the training set of things that already exist.

Synthetic training data underpins the "thinking" and "planning" processes that LLMs generate, and it does so with level-raising efficacy. There's nowhere near enough native volume of this kind of content relative to the rest of the written content found online, so synthetic data has to compensate for that gap. Even then, synthetic data will be constrained by both the imagination of the researcher and the risks the researcher will take in generating inconsistent or illogical synthetic data.
