Before I let an AI write code, I make two of them argue about the spec

I’m Fillip Kosorukov. I don’t have a computer science degree. I build and run real software anyway — several production web apps, solo, with a lot of help from AI — and the habit that changed my work was boring: I stopped letting the model write code before the spec had been attacked. I learned it the hard way: before any code gets written, I make the spec survive an argument.

Here’s what that actually looks like, because “write a better spec” is the kind of advice that sounds wise and helps no one.

The failure that taught me this

Early on, my loop was the obvious one. I’d describe what I wanted in a paragraph or two, hand it to a capable model, and let it build. It would build something fast, and it would get the obvious path right and miss exactly the part I’d failed to specify. The wrongness usually wasn’t random. It clustered in the places where my description had a gap I hadn’t noticed — an edge case I hadn’t named, an assumption I’d left implicit.

One that still stings: I had a job that ingested intake forms, and I’d never said what “empty” meant for a field that was submitted but blank. The model picked a reasonable-sounding interpretation, and I built two days of processing on top of it before I discovered it was the wrong one for half my real inputs. The model hadn’t filled that gap with my intent. It filled it with the most statistically ordinary guess, and by the time I noticed, the guess was load-bearing.

The lesson wasn’t “the model is bad at coding.” It can be very good at turning a concrete spec into code. The lesson was that the expensive mistakes happen at the spec, and a vague spec handed to a fast builder just gets you to the mistake faster.

Two contexts, not one

So now I run two separate conversations before a single line of implementation gets written.

The first conversation is the builder. Its job is to take my goal and produce the most complete, concrete specification it can — name the files, the data shapes, the sequence of operations, the failure handling, the thing it does when the input is empty or malformed. I push it to be specific and to stop hedging. A spec full of “the system should appropriately handle errors” is a spec that has decided nothing.

The second conversation is the adversary, and this is the part most people skip. I open a fresh conversation — no shared memory with the first — and I give it one job: tear the spec apart. Not improve it. Not be kind. Find what breaks. What did this spec assume that won’t hold, and where will it be brittle six months from now?

The reason it has to be a separate context is the same reason you don’t proofread your own writing well: a conversation that just produced something is invested in it, so it defends. A clean context with an explicitly adversarial job has nothing to protect, so it finds the thing I’d otherwise have hit in production three weeks later — except now it costs me a paragraph instead of a rewrite. A real example of what comes back: the spec says retry failed jobs — retry how many times, with what backoff, and what stops a retry from sending the same email twice? None of which I’d written down.

Then I bounce the attacks back to the builder. Address each one: fix it, push back with a reason, or write it down as a limitation I’m choosing to accept. Then back to a fresh adversary. Usually three or four rounds of this and the spec stops bleeding. The attacks get weaker, the builder stops finding things to change, and that convergence is the signal that the spec is ready. Then I let it build.

Name the verifier first

There’s one rule layered on top of this — a framing I picked up from the way Andrej Karpathy talks about keeping a verifier in the loop — and it’s the most useful sentence in my process. Before I accept any spec, I have to answer one question in a single sentence:

How will I know, concretely, in thirty days, that this worked?

Not “it’ll be better.” A falsifiable, observable thing. This endpoint returns 401 to an unauthenticated request. This number stops growing. This report lands in my inbox every morning by seven with a status of OK. If I can’t write that sentence, I don’t understand the problem well enough to spec it. A bigger model doesn’t fix that — it just makes a more confident version of my own confusion.

The verifier does double duty: it’s also the lens I hand the adversary. “Does this spec actually make that sentence come true?” is a far sharper attack than “is this spec good” — it has a definite answer.

Why this beats just using a smarter model

The instinct, when the output is wrong, is to reach for a bigger model. Sometimes that helps at the margin. But a smarter builder pointed at an ambiguous spec just produces a more sophisticated version of the wrong thing — it fills your gaps more plausibly, which is worse, because plausible wrong is harder to catch than obvious wrong.

The problem was upstream of the code. The bottleneck was never the model’s coding ability — it was the quality of the decision I’d made before the model started. I’m not a faster typist than the AI and I never will be. My job is to decide what should exist before the model starts typing. The adversary pass keeps me from mistaking a confident paragraph for a decision.

None of this requires tools — two chat windows and the discipline to keep one of them hostile. This process is one reason that gap matters less than it used to, and why the software I ship mostly holds up when real inputs hit it.

About the author

Fillip Kosorukov is a published researcher in behavioral psychology and substance use intervention — co-author of a peer-reviewed study on protective behavioral strategies and motivational interviewing (Journal of Substance Use, 2023; PMID: 37275205), BS Psychology (Summa Cum Laude), University of New Mexico, 2020. He now applies a behavioral-science lens to founder-led technology products and applied decision-making.

Find him elsewhere:
ORCID ·
Google Scholar ·
Scopus ·
Web of Science ·
ResearchGate ·
Academia.edu ·
LinkedIn ·
Substack ·
fillipkosorukov.net

The failure that taught me this

Two contexts, not one

Name the verifier first

Why this beats just using a smarter model

Leave a Comment Cancel Reply