AI Exposed Our Deployment Problem (We Thought It Was an AI Problem)

AI didn't create our deployment problem. It exposed a decade of inconsistency between teams that siloing had quietly hidden.

A developer on my team shipped work in three codebases in one week. Three deploy commands. Three test setups. Three different ways to read logs.

Same engineer. Same team. Same company.

The AI had made her faster everywhere except in the friction between systems. That friction was not an AI cost. It was an org cost that had been hiding for years.


The bottleneck kept moving

Two weeks ago I wrote about the bottleneck moving upstream from coding into requirements. Last week, downstream into PR review. This week it kept moving, into deployment and the contract between teams. Same dynamic. New surface.

We were shipping once a day. AI velocity made that untenable inside a quarter. PRs stacked up faster than the pipeline could clear them. The test queue became the throttle on everything downstream of coding.

The obvious response was to speed up the pipeline. So we did. Then we hit a different wall. One that did not look like an AI problem at all.


AI broke the silo, and the silo was hiding the mess

Every team in our org had built their own deployment methodology, in isolation. Different test conventions. Different deploy commands. Different rollback rituals. Different definitions of "done." For a long time that was fine.

Team siloing hid it. Each team owned their codebase. They knew their pipeline. They rarely had to cross boundaries.

AI changed that assumption. A developer working with an agent moves between codebases in hours, not weeks. Context switching is cheap now. The only friction left is the friction between systems.

Most teams read that friction as an AI problem. It is not. The AI exposed inconsistency the org had quietly subsidized for a decade.


The architectural read

Deployment standards are an AI prerequisite, not a DevOps nicety. That is the whole thing.

When mobility was low, every team could run their own configuration and the only cost was a few onboarding days when someone rotated in. When mobility is high, every inconsistency becomes a daily tax. Multiply that tax by every developer crossing every boundary and you have a measurable productivity sink that no amount of better autocomplete fixes.

Chapter 5.1 of the book argues for standardizing on a single AI-driven IDE instead of letting each engineer pick their own. The principle generalizes. Standardize the substrate so AI velocity transfers across teams instead of getting trapped in tooling variance. Deployment is just the next layer down.


Four moves we made

None of them were about the AI. All of them were about the substrate.

1. Optimize the test bottleneck honestly

Playwright was the gating constraint at one release a day. Multi-hour end-to-end runs meant batched deploys. Batched deploys meant slower feedback. Slower feedback meant the AI's productivity gain stalled at the queue.

We parallelized, sharded, and cut tests that had not caught a regression in a year. The cuts were the hard part. A test that has never failed on a real bug is not a safety net. It is a cost center wearing the costume of one.

The tradeoff is sharp. Cutting tests requires honesty about what each test actually protects, and most teams cannot answer that question in a defensible way. If you cannot, you do not get to cut. You get to instrument first, then cut.
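
A minimal sketch of the "instrument first, then cut" rule, assuming you already record, per test, when tracking began and when the test last failed on a real bug. The `HistoryRow` shape, its field names, and the 365-day default are illustrative assumptions, not a description of our actual tooling.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class HistoryRow:
    """Hypothetical per-test history record; field names are assumptions."""
    name: str
    first_run: date                        # when tracking started for this test
    last_real_regression: Optional[date]   # last failure traced to a real bug, if any


def classify(row: HistoryRow, today: date, window_days: int = 365) -> str:
    """Return 'keep', 'cut-candidate', or 'instrument-first' for one test."""
    cutoff = today - timedelta(days=window_days)
    if row.first_run > cutoff:
        # Less than one full window of history: we cannot defend a cut yet.
        return "instrument-first"
    if row.last_real_regression is None or row.last_real_regression < cutoff:
        # A full window of history and no real catch inside it.
        return "cut-candidate"
    return "keep"
```

The output is a list of candidates, not a delete script. A person still has to confirm what each candidate was meant to protect before it goes.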

2. Match the deploy unit to the work unit

When developers ship multiple stories a day, a daily release window is absurd. We moved to per-story deploys. Every merged PR ships independently. Feature flags carry the risk for any change that is not safe to expose the moment it deploys.

The tradeoff: per-story deploy without disciplined feature flagging means you are deploying thirty times a day with no way to dark-launch risky changes. Deploy frequency is not the win. Decoupling merge, deploy, and release is the win. Skip the flag layer and you have a faster way to ship outages.
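
To make the three verbs concrete, here is a small sketch of a flag gate, assuming an in-memory store standing in for whatever flag service you run. `FlagStore`, `price_label`, and the flag name `new-price-format` are all hypothetical. The new code path is merged and deployed, but it stays dark until someone flips the flag.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """In-memory stand-in for a real flag service (hypothetical)."""
    enabled: set = field(default_factory=set)

    def is_on(self, flag: str) -> bool:
        # Unknown flags are off, so freshly deployed code is dark by default.
        return flag in self.enabled

    def release(self, flag: str) -> None:
        # Release is a separate control from merge and deploy.
        self.enabled.add(flag)

    def kill(self, flag: str) -> None:
        # Turning a flag off needs no redeploy.
        self.enabled.discard(flag)


def price_label(cents: int, flags: FlagStore) -> str:
    """Deployed code containing both the old and the new behaviour."""
    if flags.is_on("new-price-format"):  # hypothetical flag name
        return f"${cents / 100:.2f}"
    return f"{cents} cents"
```

With a fresh `FlagStore()`, `price_label(1250, flags)` takes the old path. After `flags.release("new-price-format")` it takes the new one, and `flags.kill(...)` reverts it. Neither change touches the pipeline.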

3. SRE as developer enablement, not infrastructure

We pulled our SRE off feature work and incidents for two full weeks. Not to build new platform tooling. To run sessions, audit each team's deploy config, document patterns, and write the shared contract.

The cost was visible. Two weeks of no platform work, and an on-call handoff nobody wanted. The value showed up six weeks later, when a developer onboarded to their third codebase in a single sprint and lost about four hours instead of two days. Best Q1 investment we made.

4. A shared deployment contract

The contract names what gets standardized across every team: test conventions, deploy commands, environment variable patterns, observability hooks, rollback procedures, the definition of "deployed."

Per-team variation is allowed below the contract line. Not above it. Teams can pick their own test runner. They cannot redefine what "green" means.

Too rigid and teams route around it. Too loose and it stops being a contract. The right shape is the thinnest possible spec that lets a developer move from any codebase to any other codebase and recognize the deploy without asking anyone.
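
One way to picture a thin contract is a check that every repo's deploy config must pass. This is a sketch under stated assumptions: the key names in `REQUIRED_KEYS`, the `env_vars` field, and the `APP_` prefix pattern are invented for illustration and are not our actual contract.

```python
import re

# Hypothetical contract: key names and the env-var pattern are assumptions.
REQUIRED_KEYS = ("test_command", "deploy_command", "rollback_command", "health_check")
ENV_VAR_PATTERN = re.compile(r"^APP_[A-Z0-9_]+$")


def contract_violations(config: dict) -> list:
    """Return human-readable violations; an empty list means the repo passes."""
    problems = []
    for key in REQUIRED_KEYS:
        # The contract fixes which entries exist, not what each team puts in them.
        if not config.get(key):
            problems.append(f"missing required key: {key}")
    for name in config.get("env_vars", []):
        if not ENV_VAR_PATTERN.match(name):
            problems.append(f"env var does not match contract pattern: {name}")
    return problems
```

The check never inspects the value of `test_command`, which is the "below the contract line" freedom: any runner is fine as long as the entry exists. A CI step could call this and exit non-zero when the list is not empty, which is what turns the contract into a gate.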


Where each move breaks

The failure modes are predictable once you start.

  • Optimizing tests without questioning coverage. Faster tests that still do not catch what matters. You shipped a faster green light, not a better one.
  • Per-story deploys without feature flags. You are now deploying thirty times a day with no dark-launch capacity. The first incident from this looks like an AI quality problem. It is not. It is a missing capability.
  • SRE enablement as a one-time event. Standards drift the day SRE rotates back to incidents. If nothing owns the contract, the contract degrades on a predictable curve.
  • A contract that is too rigid or too loose. Too rigid and teams build shadow tooling around it. Too loose and it becomes a wiki page nobody opens. The contract has to be enforced by a gate, not a culture deck.

Implementation checklist

Most of this is org work, not platform work. The tooling is already in your repos.

  • Audit each team's deploy config, side by side. Put them in one document and count the differences. The number will surprise you.
  • Write the contract before standardizing the substrate. Standardize on what already works across at least two teams. Avoid greenfield abstractions.
  • Make the contract a CI gate, not a guideline. A guideline is a suggestion with a Slack reminder. A gate fails the build.
  • Treat SRE enablement as a recurring rotation. Two weeks a quarter, named on the roadmap, defended like any other commitment.
  • Cut tests that have not caught a real bug in a defined window. Twelve months is reasonable. Six months is aggressive. Zero months is reckless.
  • Decouple merge, deploy, and release explicitly. Three verbs, three controls. If any two are joined, you have a hidden coupling that will bite under AI load.
  • Measure cross-codebase onboarding time. The number of hours a developer loses per sprint on their second and third codebases is the metric that tracks the contract's health.
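
The side-by-side audit in the first item can start as something this small. A sketch, assuming each team's deploy config has already been flattened into a dict of string settings. The team names and keys in the usage note are hypothetical.

```python
def audit(configs: dict) -> dict:
    """Return only the settings whose values differ between teams.

    `configs` maps team name -> {setting: value}. Values are assumed to be
    strings (or otherwise hashable) so they can be compared as a set.
    """
    all_keys = set()
    for cfg in configs.values():
        all_keys.update(cfg)

    differences = {}
    for key in sorted(all_keys):
        # A setting a team does not define at all counts as a difference too.
        values = {team: cfg.get(key, "<missing>") for team, cfg in configs.items()}
        if len(set(values.values())) > 1:
            differences[key] = values
    return differences
```

Called with three teams where one uses `./ship.sh` instead of `make deploy` and only one defines `rollback`, `len(audit(configs))` is 2. That count is the number to put at the top of the audit document.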

The AI did not slow us down between codebases. A decade of independent decisions did. The AI just made them visible by the hour.

One question

Chapter 5.3 frames this through the support cost lens. The metric most teams optimize is deploy frequency. The metric that actually predicts AI productivity at the org level is how cheap it is for a developer to move between codebases without losing a day to deployment trivia.

Where in your org is a developer paying that tax right now, and is anyone allowed to fix it across team lines?
