Tactical Debt Silently Destroys Engineering Velocity

Picture the cartoon: an engineer chained to a giant boulder labeled TACTICAL DEBT.

The boulder is not made of bad code. It is made of unclear ownership, manual work, status meetings, hero dependencies, reactive fire drills, poor handoffs, information silos, excess approvals, and tribal knowledge.

That image works because every senior engineer has felt it. You can be good at your job. Your team can be smart. The codebase can even be reasonably clean. And somehow the smallest change still takes three weeks to reach production.

This is the part of engineering productivity we keep underestimating.

AI can help you write code faster. Better tests can help you change code with more confidence. Cleaner architecture can reduce local complexity. But tactical debt absorbs those gains before they become delivery speed.

AI writes the patch in ten minutes. Tactical debt makes sure the patch waits two days for ownership clarification, one day for approval, another day for the one person who understands the deployment path, then gets delayed because production is on fire again.

Congratulations. You optimized the typing, not the system.

Tactical debt is not technical debt

Technical debt is about the shape of the code: messy abstractions, weak tests, duplicated logic, risky coupling, bad boundaries, poor naming, fragile dependencies.

Tactical debt is about the shape of the work: how decisions move, how ownership is understood, how knowledge spreads, how incidents interrupt plans, how changes travel from idea to production.

They are related, but they are not the same thing.

A service can have clean code and terrible tactical debt. The tests pass, the architecture is reasonable, the domain model is fine, and still nobody knows who owns it. Deployments still require a manual checklist. The release still needs three approvals from people who do not understand the change. The incident process still depends on one engineer reading logs from memory.

The opposite also happens. A codebase can be ugly but tactically healthy. Ownership is clear. Deployments are automated. Incidents have runbooks. The team communicates asynchronously. Decisions are recorded. In that environment, ugly code is at least visible and movable. You can gradually repair it because the operating model around the team allows repair.

Technical debt slows the code down. Tactical debt slows the organization down.

And when they compound, velocity collapses.

Unclear ownership turns small questions into investigations

The most expensive sentence in a software company is often:

Who owns this service?

It sounds harmless. It is not.

A team wants to change an API. The repository exists. The service is running. There are dashboards. Someone must own it, right?

Then the archaeology begins. The original team was reorganized. The current team uses the service but does not own it. The platform team owns the deployment pipeline but not the business logic. The person who wrote most of it moved to another domain. The Slack channel is still active, but mostly for alerts nobody wants.

Two days later, the team has not changed a line of code. They have only discovered the social topology of an abandoned system.

That is tactical debt.

Clear ownership is not bureaucracy. It is routing. Without it, every change starts with detective work.

Manual work makes velocity depend on human stamina

Manual work is tactical debt because it converts repeatable operations into rituals.

A deployment requires someone to follow a checklist. A database migration requires copying commands from a document. A customer fix requires an engineer to run a script locally. A release requires checking five dashboards by hand. An incident requires remembering which flag to toggle.

This may look fine when the system is small. It may even look efficient. Why automate something that happens once a week?

Because manual work compounds quietly. The team starts scheduling around humans instead of around systems. Deployments happen only when the right person is awake. Releases get delayed because someone is on holiday. Engineers become nervous because every operational step is an opportunity to make a boring but expensive mistake.

Manual work also destroys AI productivity gains. An agent can generate code, tests, migrations, and documentation. It cannot make a midnight checklist less fragile unless the organization decides that repeatable work deserves automation.

If production still depends on humans clicking carefully, the team does not have velocity. It has patience.

Status meetings are often a symptom of broken async work

Not every meeting is bad. Some conversations are faster live. Some decisions need debate. Some ambiguity is best resolved with humans in the same room.

But many status meetings exist because the team has failed to make work observable asynchronously.

Nobody trusts the board. The ticket descriptions are vague. Decisions happen in private threads. Risks are not written down. The deployment status is unclear. Dependencies are tracked in someone’s head. So the organization compensates with synchronous theater.

Everyone joins a call to say what should already be visible.

That meeting is not coordination. It is a tax charged by poor operating hygiene.

The worst part is that status meetings create the illusion of control. Managers hear updates. Engineers repeat updates. But the underlying system remains opaque. The next day, everyone needs the same meeting again.

A healthy team does not eliminate communication. It moves routine state out of people’s mouths and into durable, inspectable places.

Hero dependencies turn competence into a bottleneck

Every company has the person who knows how payments work.

Or infrastructure. Or identity. Or the old reconciliation job. Or the mobile release process. Or the vendor integration everyone hates but nobody can replace.

At first, this person looks like an asset. They are fast. They know the history. They solve incidents. They unblock people. They have the context.

Then they go on vacation and everyone holds their breath.

Hero dependencies are tactical debt wearing a compliment. The organization praises the hero because praising the hero is easier than fixing the dependency.

The real problem is not that someone is excellent. The problem is that their knowledge has not been converted into team capability: documentation, ownership, runbooks, tests, pairing, rotation, design notes, automated checks, and enough shared context that the system can survive an absence.

If one person leaving for two weeks changes your delivery forecast, that person is not the bottleneck. Your operating model is.

Reactive fire drills make planning fictional

A sprint plan can be beautifully organized and completely fake.

If production is constantly on fire, the plan is not a plan. It is a hope document.

Reactive fire drills are tactical debt because they steal attention in unpredictable chunks. A team thinks it has ten engineering days. Then an incident takes two. A customer escalation takes one. A flaky deployment takes half a day. A broken integration takes another. By Friday, the team did not miss the plan because it was lazy. It missed the plan because the plan ignored operational reality.

This is where tactical debt becomes emotionally corrosive. Engineers start every cycle knowing the commitments are probably impossible. Managers start asking for explanations. Teams start padding estimates. Stakeholders lose trust. The organization responds with more tracking, which creates more status meetings, which consumes even more time.

The fix is not to yell at people to focus. The fix is to reduce the source of interrupts: recurring incidents, weak observability, unclear escalation paths, brittle releases, missing ownership, and systems that fail in ways nobody understands.

Velocity is not how fast you move on a calm day. It is how much forward motion survives contact with reality.

Poor handoffs turn delivery into rework

“It works on my machine” is not a joke. It is a handoff failure.

Poor handoffs happen when work crosses a boundary without enough context to survive the crossing. Product throws requirements over the wall. Backend throws an API over the wall. Frontend discovers the edge cases late. QA finds ambiguity that should have been resolved in design. Operations receives a service without runbooks. Support receives a feature without failure modes.

Each handoff looks local. The total effect is brutal.

Work bounces. Questions repeat. People wait. The same decision gets rediscovered in three channels. The original engineer is pulled back into something they thought was done. The cycle time expands, but the code diff looks small, so nobody understands why delivery feels slow.

Good handoffs are not about heavier process. They are about transferring the right information at the right boundary: assumptions, constraints, examples, rollback paths, owner, expected behavior, known risks, and what “done” actually means.

If every handoff requires a meeting to explain the artifact, the artifact is incomplete.

Information silos make the company pay twice for knowledge

A silo is not just a team keeping information private. Sometimes nobody is hiding anything. The knowledge is merely scattered across Slack, old documents, abandoned tickets, meeting recordings, dashboards, and people’s memories.

That is enough to slow everything down.

Information silos make every new project start with a treasure hunt. Has anyone tried this before? Why did we choose this vendor? Is this API deprecated? What happens if this job fails? Which metric tells us the customer impact? Where is the architecture decision?

If the answer is “ask Maria,” the company does not have knowledge. Maria has knowledge.

This matters even more in the AI era. AI tools are only as useful as the context they can reach. If the real system knowledge lives in private conversations and tribal memory, agents will confidently generate code that fits the repository but violates the organization.

AI does not eliminate information architecture. It punishes the absence of it.

Excess approvals confuse control with safety

Approvals can be useful. Regulated systems need controls. Risky changes deserve review. Production deserves respect.

But excess approvals are often what organizations use when they do not trust their own engineering system.

A small change needs approval from a manager, a staff engineer, a platform owner, a security reviewer, and someone from another team who joined the process three reorganizations ago. Half of them rubber-stamp it because they do not have enough context to review it properly. The other half are overloaded, so the change waits.

This is not safety. It is latency with a checkbox.

Real safety comes from clear ownership, automated tests, policy-as-code, progressive delivery, observability, rollback, audit trails, and reviewers who understand the risk they are approving.

If approvals are mostly ceremonial, remove them. If they are necessary, make them sharp: who approves, why, under what conditions, with what evidence, and within what SLA.
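One way to make an approval "sharp" is to write it down as data instead of leaving it as habit. The sketch below is only an illustration: `ApprovalRule`, its field names, the example role, and the four-hour SLA are all hypothetical, chosen to mirror the five questions above (who, why, conditions, evidence, SLA).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ApprovalRule:
    """Hypothetical record of one approval step; all field names are illustrative."""
    approver_role: str            # who approves
    reason: str                   # why the approval exists
    applies_when: str             # under what conditions it is required
    required_evidence: tuple      # what the approver needs to see
    sla: timedelta                # how long the approver has to respond


def missing_evidence(rule, provided):
    """Return the required evidence items that were not attached to the request."""
    return [item for item in rule.required_evidence if item not in provided]


def is_overdue(rule, requested_at, now):
    """True when the request has waited longer than the rule's SLA."""
    return now - requested_at > rule.sla


# Example rule with made-up values.
schema_change_rule = ApprovalRule(
    approver_role="database owner",
    reason="schema changes can lock production tables",
    applies_when="the change includes a migration",
    required_evidence=("test report", "rollback plan"),
    sla=timedelta(hours=4),
)

gaps = missing_evidence(schema_change_rule, {"test report"})
late = is_overdue(
    schema_change_rule,
    requested_at=datetime(2024, 1, 8, 9, 0),
    now=datetime(2024, 1, 8, 15, 0),
)
```

An approval that cannot be expressed this way, because nobody can fill in the reason or the evidence, is a reasonable candidate for removal.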

Control that nobody can explain is tactical debt.

Tribal knowledge is just undocumented production behavior

Tribal knowledge sounds cozy. It is not.

It means the system behaves in ways that are known socially but not operationally. Do not deploy on Fridays. Restart that worker twice. Ignore that alert unless it happens after midnight. Run this script before month-end. That customer has a special flag. This service times out unless the batch job finishes first.

None of this is visible in the code. None of it is obvious from the dashboards. New engineers learn it by making mistakes or overhearing warnings.

Tribal knowledge is dangerous because it feels efficient to insiders. They already know the shortcut. They already know the trap. Documentation feels like overhead.

Then the team grows, the hero leaves, AI agents start modifying code, incidents happen at inconvenient times, and the organization discovers that its real architecture was oral tradition.

If production behavior matters, write it down, encode it, test it, monitor it, or automate it. Otherwise it is not knowledge. It is folklore.

The AI productivity trap

The current AI conversation is obsessed with code generation speed.

Can the model produce a feature? Can it write tests? Can it refactor a module? Can it open a pull request? Useful questions, but incomplete.

The real productivity question is:

Can the organization absorb faster code creation without turning the rest of delivery into a queue?

If ownership is unclear, AI creates more pull requests for nobody to review. If deployments are manual, AI creates more changes waiting for the same fragile release path. If knowledge is siloed, AI produces plausible work that misses hidden constraints. If approvals are excessive, AI increases the amount of work stuck in approval queues. If incidents dominate the calendar, AI-generated features wait behind the same fires.

AI accelerates the front of the system. Tactical debt throttles the back.

That is why some teams will get real leverage from AI and others will get more noise. The difference will not be model choice alone. It will be whether the team has invested in the operating mechanics that turn code into safe production change.

Paying it down

You do not pay down tactical debt with a grand transformation deck.

You pay it down by making the work easier to route, repeat, observe, and recover.

Start with ownership. Every production system should have a current owner, an escalation path, and a place where decisions live.
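As a rough sketch of what "start with ownership" could look like in practice, the check below audits a service catalog for the three things named above. The `ServiceRecord` type, its field names, and the example services are hypothetical; a real catalog would live wherever the team already keeps service metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRecord:
    """Hypothetical catalog entry for one production system."""
    name: str
    owner_team: str = ""        # the current owner
    escalation_path: str = ""   # where to go when it breaks
    decision_log: str = ""      # the place where decisions live


REQUIRED_FIELDS = ("owner_team", "escalation_path", "decision_log")


def missing_ownership_fields(record):
    """List the required ownership fields that are empty for this record."""
    return [f for f in REQUIRED_FIELDS if not getattr(record, f).strip()]


def audit(records):
    """Map each service with gaps to the ownership fields it is missing."""
    gaps = {}
    for record in records:
        missing = missing_ownership_fields(record)
        if missing:
            gaps[record.name] = missing
    return gaps


# Made-up catalog: one healthy service, one half-abandoned one.
catalog = [
    ServiceRecord("billing-api", "payments", "#payments-oncall", "docs/decisions/billing"),
    ServiceRecord("legacy-reconciler", escalation_path="#alerts-old"),
]

ownership_gaps = audit(catalog)
```

Running a check like this on a schedule turns "who owns this service?" from an investigation into a report.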

Automate the boring, repeatable paths: deployments, migrations, rollbacks, environment setup, recurring checks, customer operations. If a step must stay manual, record why.

Move routine status into durable async artifacts. Tickets, decision records, dashboards, release notes, incident timelines, dependency trackers. Meetings should discuss exceptions and decisions, not recite state.

Destroy hero dependencies kindly. Pair. Rotate. Document. Record decisions. Build runbooks. Make excellence reproducible instead of admirable.

Track interrupt load honestly. If 40% of the team’s time goes to incidents and escalations, plan with 60%, then fix the sources of the 40%.
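The arithmetic is simple enough to sketch. The function names and the example numbers below (a five-person team, a ten-day cycle, 160 of 400 logged hours spent on interrupts) are assumptions for illustration only.

```python
def interrupt_share(interrupt_hours, total_hours):
    """Fraction of logged time that went to incidents and escalations."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= interrupt_hours <= total_hours:
        raise ValueError("interrupt_hours must be between 0 and total_hours")
    return interrupt_hours / total_hours


def plannable_days(team_days, share):
    """Engineering days left for planned work after the measured interrupt load."""
    return team_days * (1.0 - share)


# Last cycle (made-up numbers): 160 of 400 logged hours were interrupts.
share = interrupt_share(interrupt_hours=160, total_hours=400)

# Next cycle: 5 engineers x 10 working days = 50 nominal team days.
capacity = plannable_days(team_days=5 * 10, share=share)
```

With these inputs the share comes out at 40%, so the team plans roughly 30 of its 50 nominal days. The point is to base the plan on the measured share, not on the nominal calendar.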

Make handoffs explicit. Define what information must travel with work when it crosses a boundary.

Replace ceremonial approvals with evidence-based controls. Automated checks where possible. Focused human review where necessary.

Turn tribal knowledge into system knowledge. A runbook is better than a memory. A test is better than a warning. Automation is better than a checklist.

None of this is glamorous. That is the point. Tactical debt survives because it hides inside ordinary work.

The boulder always gets paid

Engineering velocity is not primarily about typing speed. It is about how quickly a team can turn an idea into a safe, observable, reversible production change.

Technical debt makes that harder by degrading the code. Tactical debt makes it harder by degrading the path around the code.

Ignore either one and the boulder gets heavier.

That is the uncomfortable lesson for the AI era. Faster code generation does not automatically create faster organizations. It can even make slow organizations feel worse, because the gap between “code exists” and “value shipped” becomes impossible to ignore.

The teams that win will not be the ones that merely generate more code. They will be the ones that remove the tactical drag around delivery: clear ownership, automated operations, strong async communication, shared knowledge, sane approvals, resilient systems, and handoffs that do not require archaeology.

Because tactical debt does not care how modern your stack is.

It will happily chain a very smart engineer, using a very expensive AI tool, to the same old boulder.

This post is licensed under CC BY 4.0 by the author.