your first draft is trash: why flow engineering beats prompt engineering
you don't trust code that compiles on the first try. it's suspicious. it usually means you missed a dependency or the linter is broken.
so why are you trusting the first token that comes out of an llm?
for two years, we've been chasing "prompt engineering"—the idea that if you just find the magic incantation ("you are a world-class expert...", "take a deep breath"), the model will behave.
it's a trap. we stopped trying to prevent errors with magic words. we built a system to catch them.
> the "zero-shot" fallacy
llms are probabilistic, not logical. they generate the next token based on vibes, not truth. ask for a strict json object while also demanding a root-cause analysis and a specific tone? you're asking for trouble.
eventually? the model drops the ball. it hallucinates a field. it forgets a comma. it returns a 200 ok response where the body is just the text "i'm sorry, i can't do that."
> flow engineering vs. prompt engineering
we shifted strategy. prevention didn't work, so we moved to cure. we stopped relying on single prompts and started building agentic workflows.
flow engineering is a reliability architecture that breaks complex llm tasks into discrete steps (planning, drafting, critiquing) to reduce hallucination rates. unlike prompt engineering, which attempts to solve tasks in a single pass, flow engineering focuses on iterative correction.
instead of a single "god prompt," we implemented the x47 critic architecture. it's based on the reflexion paper (shinn et al., 2023), which showed that self-correcting agents jumped from 67% to 91% accuracy just by being allowed to fix their own mistakes.
flip the self-heal toggle in the ide, and we cut the stream. you don't see the output yet. in the background, a chain of agents kicks off to validate and repair the response before it reaches you:
1. the draft: an actor agent takes the first pass.
2. the audit: a validation agent checks it against the schema, the spec, and the facts.
3. the fix: an editor agent repairs whatever the audit flagged.
only then do you get the response.
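
here's a minimal sketch of that loop in python. `call_llm` is a hypothetical helper standing in for whatever client you use, and the prompts are illustrative, not the x47 internals:

```python
def call_llm(prompt: str) -> str:
    """hypothetical helper -- swap in your provider's client."""
    raise NotImplementedError

def self_heal(task: str, max_rounds: int = 2) -> str:
    # 1. the draft: the actor takes the first pass
    draft = call_llm(f"complete this task:\n{task}")
    for _ in range(max_rounds):
        # 2. the audit: a separate call critiques the draft
        critique = call_llm(
            "list every factual, formatting, or schema error in this response. "
            f"reply OK if there are none.\n\ntask: {task}\n\nresponse: {draft}"
        )
        if critique.strip().upper() == "OK":
            break
        # 3. the fix: the editor rewrites using the critique
        draft = call_llm(
            f"rewrite the response to fix these issues:\n{critique}\n\n"
            f"task: {task}\n\nresponse: {draft}"
        )
    # only now does the user see anything
    return draft
```

the point isn't the prompts. it's that the stream stays cut until the audit passes.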
> the "token tax"
"doesn't this triple my costs?"
yeah. it does. you pay for the draft, the critique, and the fix.
but look at the unit economics.
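
back-of-the-envelope, using the failure rates we quote in the next section and some assumed dollar figures (illustrative, not benchmarks):

```python
# illustrative numbers only -- plug in your own rates
calls_per_day = 10_000
cost_per_call = 0.01   # $ per zero-shot call (assumed)
failure_rate  = 0.15   # schema failures we see on zero-shot prompts
triage_cost   = 5.00   # $ of engineer time per bad output (assumed)

zero_shot = calls_per_day * (cost_per_call + failure_rate * triage_cost)
# self-heal: ~3x the tokens, but the critic catches 99% of the failures
self_heal = calls_per_day * (3 * cost_per_call + failure_rate * 0.01 * triage_cost)

print(f"zero-shot: ${zero_shot:,.0f}/day   self-heal: ${self_heal:,.0f}/day")
# zero-shot: $7,600/day   self-heal: $375/day
```

the tripled token bill is the cheap part. the cleanup is what kills you.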
> when to use it
don't use this for brainstorming blog posts. the "draft" is usually fine. use the self-heal workflow when accuracy is non-negotiable:
data pipelines
if you are extracting data into json, turn this on. we've seen zero-shot prompts fail schema validation 15% of the time. the critic catches 99% of those failures.
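
a sketch of that pipeline using the `jsonschema` package and the same hypothetical `call_llm` from above; the schema is a toy example:

```python
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

SCHEMA = {  # toy schema -- yours will differ
    "type": "object",
    "properties": {"name": {"type": "string"}, "amount": {"type": "number"}},
    "required": ["name", "amount"],
}

def extract(text: str, retries: int = 2) -> dict:
    prompt = f"extract name and amount as json:\n{text}"
    for _ in range(retries + 1):
        raw = call_llm(prompt)
        try:
            obj = json.loads(raw)
            validate(instance=obj, schema=SCHEMA)  # the deterministic audit
            return obj
        except (json.JSONDecodeError, ValidationError) as err:
            # feed the exact failure back as the critique
            prompt = f"your json was invalid ({err}). fix it and return only json:\n{raw}"
    raise RuntimeError("model never produced valid json")
```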
the 'junior dev' trap
if you're using ai to generate sql or python, you need a second set of eyes. the critic acts as that code reviewer, catching syntax errors before they hit your interpreter.
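
for generated python, the cheapest critic isn't an llm at all; the parser works as the deterministic half of the audit. a sketch, again assuming the hypothetical `call_llm`:

```python
import ast

def review_generated_python(code: str) -> str:
    # stage 1: deterministic audit -- the parser is a free critic
    try:
        ast.parse(code)
        return code
    except SyntaxError as err:
        # stage 2: hand the exact error to the editor agent
        return call_llm(
            f"this code fails to parse ({err}). fix the syntax only:\n{code}"
        )
```

a sql parser such as sqlglot plays the same role for generated queries.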
compliance
don't let a chatbot talk to customers without a filter. a validator agent can check for tone and hallucinated promises before the user ever sees the message.
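
same loop, different rubric. a sketch of a fail-closed validator, assuming the hypothetical `call_llm`:

```python
FALLBACK = "thanks for reaching out -- a human will follow up shortly."

def guarded_reply(customer_msg: str) -> str:
    draft = call_llm(f"reply to this customer message:\n{customer_msg}")
    verdict = call_llm(
        "does this reply promise refunds, discounts, timelines, or features "
        "we have not confirmed, or strike an off-brand tone? "
        f"answer PASS or FAIL with a reason.\n\nreply: {draft}"
    )
    if verdict.strip().upper().startswith("PASS"):
        return draft
    # fail closed: an unvetted message never reaches the user
    return FALLBACK
```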
> the bottom line
stop looking for the magic prompt. it doesn't exist.
accept that the ai is going to mess up, and build a loop that cleans it up. that isn't "prompt engineering." it's software engineering.
the critic ships behind the SELF_HEAL toggle in the x47 ide. go break it.