There is a whole genre of posts that treat prompts like spells: add this adjective, ban these words, threaten the model in all caps, watch the eval jump 2 points. Some of that works! But most of the time you are not optimizing the model. You are nudging it inside a very narrow band, and the real wins come from structure, not charisma.

Treat the prompt like an API contract

I write system prompts the same way I would document an internal function: inputs, outputs, invariants, failure behavior. “You are a helpful assistant” is not a spec. “Reply in at most three bullet points; if the user’s question needs data you do not have, say that explicitly; never invent an order id” is closer to something I can test.

Notebook and coffee — planning prompts like real specs

Few-shot examples: quality beats quantity

Throwing five mediocre examples at the model is worse than one great one. I try to show edge cases in the few-shot: the empty input, the user who pastes a wall of text, the question that almost sounds on-topic but is not. The model generalizes from what you hand it; if you only show happy paths, you should expect fragile behavior in production.

Drift is real, and it is not always OpenAI’s fault

When a provider ships a new model version, your prompt can behave differently even if you did not change a character. I am not here to name villains—that is the platform risk we sign up for. But there is also our drift: someone tweaks a string in a PR, nobody updates the eval set, and three weeks later support tickets spike.

I keep a tiny regression pack of user questions with expected shape of answers (not always exact wording). Run it on every change that touches the prompt, same as you would for auth code. Boring, but you sleep better.

Team reviewing something on a screen — prompts as team-owned code

The human bit

I still write first drafts of tricky prompts on paper sometimes. It sounds silly, but it slows me down enough to notice when I am asking the model to do three jobs at once. One job per call—or split the pipeline. Your tokens (and your brain) will thank you.

If you are stuck, swap “make the prompt cleverer” for “make the task smaller.” That single habit has fixed more “bad AI” tickets for me than every trick list on Medium combined.

Prompt engineering without the thread bro energy

Treat the prompt like an API contract

Few-shot examples: quality beats quantity

Drift is real, and it is not always OpenAI’s fault

The human bit