Shipping AI features while staying sort of sane

April 25, 2026

The first time you wire an API key into production, the dopamine is real. The second week is when the weird stuff shows up: empty responses, tool calls that “succeed” with nonsense, users who paste 30-page PDFs into a box meant for a sentence. The model did not get worse. Your safety net was missing.

Log like you will debug at 2 a.m. (because you will)

I log at minimum: the user-facing request id, the model name and version, token counts, latency, and a hash or truncated copy of the input. Not necessarily the full input in plain text, since it may contain PII (check your policy), but enough to replay a failure in a staging environment. If all you have in Sentry is "it failed," you are guessing.
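
Roughly what that looks like as one structured log line per call. This is a sketch, not any particular SDK: the field names, the `log_ai_call` helper, and the shape of the `usage` dict are all illustrative, standing in for whatever your provider actually returns.

```python
import hashlib
import json
import logging
import time

logger = logging.getLogger("ai_calls")

def log_ai_call(request_id, model, prompt, usage, started_at):
    """Emit one structured log record per model call (field names are illustrative)."""
    record = {
        "request_id": request_id,                  # ties the call to the user-facing request
        "model": model,                            # exact model name and version string you called
        "latency_ms": round((time.monotonic() - started_at) * 1000),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        # Hash plus a truncated preview: enough to replay the failure in staging,
        # without storing the whole input in plain text (check your data policy).
        "input_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "input_preview": prompt[:200],
    }
    logger.info(json.dumps(record))
```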

Fallbacks are a feature, not an apology

If the model times out, what happens? Silent error? Bad. A cached canned message? Better. A degraded path that does half the job without the model? Often best.

I try to make fallbacks feel intentional: “We could not generate a summary right now; here is the raw list instead.” People forgive honesty faster than they forgive confident nonsense.
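
Sketched out, that degraded path is about ten lines. The `summarize` function here is a placeholder for whatever model call you actually make; the point is that the caller always gets something usable back.

```python
def items_summary(items, summarize, timeout_s=10):
    """Return a model-written summary if we can, or the raw list with an honest note."""
    try:
        return summarize(items, timeout=timeout_s)
    except Exception:
        # Timeouts, rate limits, empty responses: the user never sees a stack trace.
        # Degraded path: do half the job without the model, and say so plainly.
        header = "We could not generate a summary right now; here is the raw list instead:"
        return header + "\n" + "\n".join(f"- {item}" for item in items)
```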

Evals do not have to be a research project

You do not need a PhD to run a spreadsheet. Write down ten questions your feature is supposed to handle well. After each deploy, spot-check them. If that sounds basic, good—basic is how you notice when a “small” model upgrade changes tone in your support replies.
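
Mine is literally a CSV and a loop. Something like the sketch below, where `ask_model`, the file name, and the column names are placeholders for your own setup, not a real eval framework.

```python
import csv

def spot_check(ask_model, path="eval_questions.csv"):
    """Run the canned questions after a deploy and flag anything that looks off."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):             # columns: question, must_contain
            answer = ask_model(row["question"])
            ok = row["must_contain"].lower() in answer.lower()
            print(f"{'PASS' if ok else 'FAIL'}  {row['question'][:60]}")
```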

The part nobody posts about

Sometimes the win is not smarter AI. It is less AI. A button that says “use last week’s report as the template” saves more real minutes than a five-paragraph generated essay nobody asked for.

I still get excited about new models. I just try to pair that excitement with the same paranoia I bring to anything that touches money or customer data. Boring? Sure. So is a good night’s sleep.
