The first time you wire an API key into production, the dopamine is real. The second week is when the weird stuff shows up: empty responses, tool calls that “succeed” with nonsense, users who paste 30-page PDFs into a box meant for a sentence. The model did not get worse. Your safety net was missing.
Log like you will debug at 2 a.m. (because you will)
I log at minimum: the user-facing request id, the model name and version, token counts, latency, and a hash or truncated copy of the input. Not the raw input in plain text if it might contain PII (check your policy), but enough to replay a failure in a staging environment. If all you have in Sentry is “it failed,” you are guessing.
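For concreteness, here is roughly the wrapper I mean, in Python. It is a sketch: `log_llm_call`, the field names, and the `(text, prompt_tokens, completion_tokens)` tuple are my placeholders, not any SDK's real API.

```python
import hashlib
import json
import logging
import time

log = logging.getLogger("llm")

def log_llm_call(request_id: str, model: str, prompt: str, call_fn):
    """Wrap a model call with the minimum fields I want at 2 a.m.
    `call_fn` is whatever actually hits the API; assume it returns
    (text, prompt_tokens, completion_tokens)."""
    started = time.monotonic()
    try:
        text, prompt_tokens, completion_tokens = call_fn()
        log.info(json.dumps({
            "request_id": request_id,
            "model": model,  # include the version/date suffix, not just the family
            "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
            "prompt_preview": prompt[:200],  # truncated, never the full blob
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "latency_ms": round((time.monotonic() - started) * 1000),
            "status": "ok",
        }))
        return text
    except Exception as exc:
        log.error(json.dumps({
            "request_id": request_id,
            "model": model,
            "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
            "latency_ms": round((time.monotonic() - started) * 1000),
            "status": "error",
            "error": repr(exc),
        }))
        raise
```

The hash matters more than it looks: it lets you confirm that two failing requests carried the same input without storing the input itself.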
Fallbacks are a feature, not an apology
If the model times out, what happens? Silent error? Bad. A canned message? Better. A degraded path that does half the job without the model? Often best.
I try to make fallbacks feel intentional: “We could not generate a summary right now; here is the raw list instead.” People forgive honesty faster than they forgive confident nonsense.
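Here is the shape of that degraded path, as a sketch; `generate_summary` is a stand-in for whatever actually calls the model.

```python
def summary_or_fallback(items: list[str], generate_summary) -> str:
    """Try the model; if it fails, degrade honestly instead of erroring silently.
    `generate_summary` is a placeholder for the real model call."""
    try:
        return generate_summary(items)  # happy path
    except Exception:
        # Degraded path: half the job, no model, clearly labeled as such.
        bullets = "\n".join(f"- {item}" for item in items)
        return ("We could not generate a summary right now; "
                "here is the raw list instead:\n" + bullets)
```

The point is that the fallback is designed up front, in the same function, not bolted on after the first incident.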
Evals do not have to be a research project
You do not need a PhD to run a spreadsheet. Write down ten questions your feature is supposed to handle well. After each deploy, spot-check them. If that sounds basic, good—basic is how you notice when a “small” model upgrade changes tone in your support replies.
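That spreadsheet can literally be a twenty-line script. A sketch, with `ask` standing in for your model wrapper and keyword checks standing in for whatever “handled well” means for your feature:

```python
# Ten questions the feature must handle well.
# Run after each deploy; eyeball anything marked FAIL.
GOLDEN = [
    ("How do I reset my password?", ["reset", "password"]),
    ("Cancel my subscription", ["cancel"]),
    # ...eight more, pulled from real tickets
]

def spot_check(ask) -> None:
    """`ask` is whatever function wraps your model call and returns a string."""
    for question, must_contain in GOLDEN:
        answer = ask(question).lower()
        status = "PASS" if all(w in answer for w in must_contain) else "FAIL"
        print(f"{status}  {question}")
```

Keyword matching is crude, and that is fine: a crude check you actually run after every deploy beats a rigorous one you never wrote.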
The part nobody posts about
Sometimes the win is not smarter AI. It is less AI. A button that says “use last week’s report as the template” saves more real minutes than a five-paragraph generated essay nobody asked for.
I still get excited about new models. I just try to pair that excitement with the same paranoia I bring to anything that touches money or customer data. Boring? Sure. So is a good night’s sleep.