You ran the brief-to-draft prompt yesterday and the output cleared the editorial gate. You ran the same prompt this morning and the draft would not survive self-review. Nobody edited the prompt, so the team blames “the AI being weird today”. The diagnosis is mechanical, and the fix is a discipline.
What is actually changing
Four mechanisms produce run-to-run drift on a prompt that “did not change”:
- Sampling. The model is not deterministic at default settings. Each run draws from a probability distribution; two draws on the same prompt land in different places. Temperature controls the spread: at temperature 0 the spread narrows, at 0.7 the spread is wide enough to surface the variance you are seeing.
- System-message drift. The chat surface (chatgpt.com, claude.ai, anything you talk to in a browser) carries a system message you did not write. Vendors update it. Yesterday’s system message biased one way; today’s biases another. A prompt run inside a chat surface is not the same artefact as a prompt run through the API with a system message you control.
- Hidden state in the chat session. A long-running chat carries previous turns into every subsequent turn. The model’s interpretation of your latest prompt is conditioned on the conversation history. A fresh chat tab and your hundred-message work chat do not run the same prompt, even when you paste the same text.
- Silent model updates. The model name on the menu (gpt-4, claude-sonnet, gemini-pro) is a label, not a snapshot. Vendors push point updates that change behaviour without changing the name. The team that pinned the prompt three months ago is running it against a different model today.
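The sampling point is easier to see with numbers. The sketch below is a toy, not any vendor's decoder: it assumes three hypothetical candidate tokens with made-up scores, applies temperature-scaled softmax, and draws repeatedly. Treating temperature 0 as pure greedy selection is a simplification made here for illustration; hosted APIs do not necessarily guarantee identical output at temperature 0.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        # Simplification: model temperature 0 as greedy, all mass on the top score.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, temperature, rng):
    """Draw one token index from the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate next tokens (illustrative values).
logits = [2.0, 1.5, 0.5]

distinct_by_temperature = {}
for t in (0.0, 0.7):
    rng = random.Random(0)  # fixed seed so the demo itself is repeatable
    draws = [sample(logits, t, rng) for _ in range(1000)]
    distinct_by_temperature[t] = sorted(set(draws))
    probs = [round(p, 3) for p in softmax_with_temperature(logits, t)]
    print(f"temperature={t}: probabilities={probs}, tokens drawn={distinct_by_temperature[t]}")
```

At temperature 0 every draw picks the same token; at 0.7 the lower-scored tokens get drawn some of the time, which is the run-to-run variance described above, compounded over every token in a draft.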
The prompt is not just the prompt body
The artefact your team’s craft compounds in is a tuple, not a string:
prompt body + model snapshot + temperature + system message + tool config
Storing the body alone stores one element of five. The rerun that gives you different output tomorrow may differ in any of the other four, and you have no way to tell which.
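A minimal sketch of the tuple as a data structure, assuming Python. The class name `PromptArtefact`, its field names, the `changed_fields` helper, and the snapshot identifier are all hypothetical, chosen here for illustration. The idea is that once all five elements are recorded, two runs can be compared field by field instead of guessed at.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class PromptArtefact:
    """The five elements of the tuple (field names are illustrative)."""
    body: str
    model_snapshot: str
    temperature: float
    system_message: str
    tool_config: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        """Short stable hash of all five elements, suitable for logging with each run."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def changed_fields(a: PromptArtefact, b: PromptArtefact) -> list:
    """Name the elements that differ between two recorded runs."""
    left, right = asdict(a), asdict(b)
    return [name for name in left if left[name] != right[name]]


yesterday = PromptArtefact(
    body="Turn this brief into a draft: {brief}",
    model_snapshot="example-model-2025-01-01",  # placeholder, not a real identifier
    temperature=0.2,
    system_message="You are a careful editor.",
)
# Same body, different temperature: the "prompt" did not change, the artefact did.
today = replace(yesterday, temperature=0.7)

print(yesterday.fingerprint(), today.fingerprint())
print(changed_fields(yesterday, today))
```

Logging the fingerprint next to every output is one way to turn "the AI is being weird today" into a checkable question: either the fingerprints match and the difference is sampling, or they differ and `changed_fields` names the element that moved.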
What you do
Three moves close the gap:
- Pin the tuple. Use API-pinned model snapshots: the dated identifiers Anthropic and OpenAI publish in their model documentation, not the menu labels in chatgpt.com or claude.ai. Set temperature explicitly. Write the system message yourself.
- Store the artefact as a file. One template file per prompt, with declared variables, the body, the settings, and the worked example. The Playbook’s prompt template schema names the fields.
- Treat rerun as structural retry. A second run on the same tuple is your team’s quality-control sample. A second run that pastes the same body into a different chat tab is folklore.
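The second move can be sketched as a small JSON file plus a loader. The Playbook's actual schema is not reproduced here; the keys below (`variables`, `body`, `settings`, `worked_example`), the file name, and the snapshot identifier are assumptions made for illustration. The sketch writes a template to disk, reads it back, and refuses to render unless the supplied values match the declared variables exactly.

```python
import json
import tempfile
from pathlib import Path
from string import Template

# Hypothetical template layout; the real schema may name these fields differently.
template = {
    "name": "brief-to-draft",
    "variables": ["brief", "audience"],
    "body": "Write a draft for $audience from this brief:\n$brief",
    "settings": {
        "model_snapshot": "example-model-2025-01-01",  # placeholder identifier
        "temperature": 0.2,
        "system_message": "You are a careful editor.",
        "tool_config": {},
    },
    "worked_example": {
        "brief": "Launch notes for v2",
        "audience": "existing customers",
    },
}


def render(tmpl, values):
    """Fill the body, rejecting missing or undeclared variables."""
    declared, supplied = set(tmpl["variables"]), set(values)
    if declared != supplied:
        raise ValueError(
            f"missing={sorted(declared - supplied)} "
            f"undeclared={sorted(supplied - declared)}"
        )
    return Template(tmpl["body"]).substitute(values)


# Store the artefact as a file, then load it back as the single source of truth.
path = Path(tempfile.mkdtemp()) / "brief-to-draft.json"
path.write_text(json.dumps(template, indent=2), encoding="utf-8")
loaded = json.loads(path.read_text(encoding="utf-8"))

prompt = render(loaded, loaded["worked_example"])
print(prompt)
```

Because the settings live in the same file as the body, a rerun that reads this file reruns the whole tuple. A rerun that copies the body into a chat tab leaves the settings behind.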
Output stops drifting when you stop rerunning a prompt body and start rerunning a pinned tuple. The team’s craft compounds in an artefact; it does not compound in a chat history.
If you want the long-form treatment, the sample chapter is free in your browser, and the full Playbook is the rest of the work.