
Tech Debt Management

How we identify, classify, and systematically pay down technical debt without letting it accumulate silently.

Overview

Technical debt is not a failure — it is a financing mechanism. Ward Cunningham coined the metaphor to describe the trade-off of shipping something that works now, knowing you will need to clean it up later. Like financial debt, it can be taken on deliberately and paid back in an orderly way. The problem is not debt itself; it is debt that is invisible, untracked, and never repaid.

There are two fundamentally different kinds of technical debt. Deliberate debt is a conscious choice: we know this is not the right solution, we've documented why, and we have a plan to address it. Accidental debt accretes silently: no one decided it was acceptable; it simply accumulated through shortcuts, changing requirements, and the natural entropy of software. Managing technical debt means making it visible, classifying it honestly, and treating paydown as real work — not something to be squeezed into the margins.


Why It Matters

Technical debt has compound interest. A messy module slows down the next change. A slow-to-change module gets fewer improvements. Fewer improvements mean worse abstractions over time. Defect rates tend to be higher in high-complexity modules than in simpler ones, and the gap tends to widen as the codebase ages. Addressing debt early, while the codebase is still understandable, costs a fraction of what the same debt costs after it has propagated.

Velocity is borrowed, not saved. When a team cuts corners to hit a deadline, they do not save time — they borrow it from the future. The next sprint slows down because the shortcuts introduced friction. The sprint after that slows further. Teams that track technical debt honestly recognize this pattern and can have an evidence-based conversation with stakeholders about trade-offs, rather than being surprised by declining velocity that nobody can explain.

Without active management, debt accumulates faster than it is repaid. In the absence of a visible debt register and a deliberate paydown process, debt grows every sprint. Features are added, the codebase is extended, and the wrong abstractions get used as load-bearing structure. By the time the team acknowledges the problem, the effort required to fix it has grown to the point where it seems impossible, and the only option appears to be a rewrite.

Engineers leave codebases they cannot take pride in. A codebase with severe technical debt is demoralizing to work in. Engineers know that their work is harder than it needs to be, that fixes are brittle, that onboarding is painful. Debt is not just a technical problem; it affects hiring, retention, and the quality of work that people are willing to do. A well-maintained codebase is an investment in the team's ability to attract and keep good people.


Standards & Best Practices

Classify debt before managing it

Different types of debt have different causes and different remedies. Misclassifying debt leads to the wrong kind of paydown effort. The two dimensions that matter:

Intentionality:

  • Deliberate debt — we made a conscious trade-off and documented it
  • Accidental debt — no one chose it; it accumulated through neglect or changing circumstances

Prudence:

  • Prudent debt — the trade-off was reasonable given the information available at the time
  • Reckless debt — shortcuts taken with full knowledge of the consequences, or through carelessness
|          | Deliberate | Accidental |
|----------|------------|------------|
| Prudent  | "We'll use a simple approach now and revisit when we have more users" | "We didn't know that coupling those modules would cause this" |
| Reckless | "We don't have time for tests on this release" | "What tests?" |

Prudent deliberate debt is normal and healthy. Reckless debt — of either kind — is the type that damages teams and codebases most severely.

Maintain a debt register

A debt register is a curated, maintained list of known debt items. It is not a backlog graveyard; it is a prioritized inventory of what the team knows needs attention. Each entry should include:

  • What: a specific description of the problem, not "this module is messy"
  • Why it matters: the consequence of not fixing it — slower development, higher defect rate, operational risk
  • Estimated impact: which teams or workflows are affected
  • Proposed remedy: a specific fix, not just "clean it up"
  • Estimated effort: small / medium / large — rough sizing, not a full estimate
  • Priority: driven by frequency of change, blast radius, and defect density

The register is reviewed quarterly. Items that have been there for a year without movement are either no longer relevant (remove them) or need to be escalated (schedule them).
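Finding items with no movement for a year is easy to script if each row records when it was last touched. This sketch assumes a hypothetical layout where column 1 is the ID and column 2 is a `YYYY-MM-DD` last-updated date; `stale_items` is an illustrative name, not an existing tool.

```shell
# Print the IDs of register rows whose last-updated date is before CUTOFF.
# Hypothetical layout: tab-separated, column 1 = ID, column 2 = YYYY-MM-DD.
# ISO 8601 dates compare correctly as plain strings.
stale_items() {
  awk -F'\t' -v cutoff="$1" '$2 < cutoff { print $1 }'
}

printf 'D-001\t2023-02-10\nD-002\t2024-11-05\n' | stale_items 2024-01-01   # prints D-001

# With GNU date, a rolling one-year cutoff would be:
#   stale_items "$(date -d '1 year ago' +%F)" < register.tsv
```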

Reserve capacity for debt paydown

Debt that is never scheduled never gets paid. Reserve a consistent fraction of each sprint for debt work. The exact percentage depends on the team's context, but 15–20% is a commonly cited range. This is not "slack time"; it is planned work with tickets, reviewable PRs, and observable outcomes.
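The reservation is simple arithmetic on the sprint's planned capacity. A small bash helper (the name `debt_budget` is hypothetical) that turns capacity, in whatever whole-number unit the team plans with, into a debt-work range:

```shell
# Convert one sprint's capacity (points, engineer-days, ...) into a range of
# capacity reserved for debt work. Defaults to 15-20%; bash integer
# arithmetic rounds down.
debt_budget() {
  local capacity=$1 low_pct=${2:-15} high_pct=${3:-20}
  echo "$(( capacity * low_pct / 100 ))-$(( capacity * high_pct / 100 ))"
}

debt_budget 40        # prints 6-8
debt_budget 40 10 25  # prints 4-10
```

Rounding down errs toward protecting feature capacity; a team that wants to err the other way could round up instead.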

The alternative — waiting for a "cleanup sprint" — consistently fails. Cleanup sprints are perpetually deferred by feature pressure, and when they finally happen, the team tries to address months of debt in two weeks, does it poorly, and ships risky changes in a rush.

Distinguish debt from features

Not all things that feel like debt are debt. A feature that was designed correctly for the requirements at the time but now needs to be extended is not debt — it is a feature request. Debt is a specific claim: the current implementation is harder to work with than it should be, and that difficulty has a cost. Making this distinction keeps the debt register honest and prevents it from becoming a wish list.

The Boy Scout Rule as baseline hygiene

Leave code slightly better than you found it. When you open a file to make a change, fix the obvious problem nearby if it takes under 15 minutes — a poorly named variable, a duplicated block, an unnecessary comment. This is not debt paydown; it is preventing debt accumulation. The debt register is for systemic problems that require dedicated effort.


How to Implement

Writing a debt ticket

A debt ticket that says "clean up the auth module" will never get prioritized because it cannot be estimated or understood. A useful debt ticket answers:

```markdown
## Symptom
[What makes this painful to work with? Be specific: what slows you down,
what confuses new engineers, what causes bugs?]

## Root cause
[Why does the problem exist? Changing requirements? Original design that
didn't anticipate growth? A dependency that has since been superseded?]

## Cost of not fixing
[What happens if this stays as-is? Which workflows slow down? How often
do developers touch this file? What is the defect history in this area?]

## Proposed fix
[Specific approach, not "refactor it." What changes? What gets extracted,
renamed, deleted, or replaced?]

## Estimated effort
[Small (< 1 sprint), medium (1–2 sprints), large (multi-sprint with phased delivery)]

## Risk
[What could go wrong? What needs to be tested? What other teams are affected?]
```
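A template only helps if tickets actually follow it. One lightweight option, assuming tickets are stored as Markdown files using the headings above, is a check like this hypothetical `lint_debt_ticket` function, which could run in CI or a pre-commit hook:

```shell
# Report any template section missing from a debt ticket written in Markdown.
# Returns non-zero when at least one section is absent.
lint_debt_ticket() {
  local file=$1 section missing=0
  for section in "Symptom" "Root cause" "Cost of not fixing" \
                 "Proposed fix" "Estimated effort" "Risk"; do
    # Match the heading as a whole line, tolerating trailing whitespace.
    if ! grep -q "^## $section[[:space:]]*$" "$file"; then
      echo "missing section: $section"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (hypothetical path): lint_debt_ticket tickets/D-014.md
```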

Running a debt review

Once per quarter, the team reviews the debt register:

  1. Remove resolved items. Verify each item that was supposedly fixed is actually fixed.
  2. Update priorities. Items in areas that were touched frequently this quarter move up; items in stable, rarely touched modules move down.
  3. Add new items. Pull forward anything that surfaced in retros, post-mortems, or painful PRs since the last review.
  4. Schedule top items. Pick the three to five highest-priority items and commit to addressing them in the next quarter, not "eventually."
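Step 4 amounts to sorting the register by priority and taking the top few. A minimal sketch, assuming a hypothetical tab-separated register whose first column is the ID and whose second column holds a numeric priority score:

```shell
# Print the IDs of the N highest-scoring register items (default 5).
# Hypothetical layout: tab-separated, column 1 = ID, column 2 = numeric score.
top_items() {
  sort -t $'\t' -k2,2nr | head -n "${1:-5}" | cut -f1
}

# Prints D-003, D-001 and D-002, one per line.
printf 'D-001\t8\nD-002\t5\nD-003\t9\nD-004\t3\n' | top_items 3
```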

Pitching debt paydown to stakeholders

Engineers and managers often speak different languages about technical debt. Frame it in terms stakeholders understand:

  • Velocity risk: "This module is responsible for 40% of our production incidents and takes three times as long to change as it should. Every sprint we delay fixing it costs us approximately N days of engineering time in debugging and careful workarounds."
  • Specific, bounded asks: "We want to spend two sprints extracting the payment logic into a separate service. After that, payment-related changes will go from taking a week to taking a day, and the incident rate in this area should drop significantly."
  • Avoid: "our code quality is declining" or "we have a lot of tech debt" — these are vague and invite vague responses ("yes, we know, we'll get to it").

Measuring paydown progress

Track debt work like feature work. Useful metrics:

| Metric | What it tells you |
|--------|-------------------|
| Debt tickets opened vs. closed per quarter | Whether the register is growing or shrinking |
| Cyclomatic complexity trend in high-debt modules | Whether the code is actually improving |
| Incident rate in previously high-debt areas | Whether the paydown had its intended effect |
| Time-to-change in target modules (before and after) | Direct measure of developer experience improvement |
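The first metric can be computed straight from the register if each item records when it was opened and closed. This sketch assumes a hypothetical tab-separated layout (ID, opened date, closed date left empty while the item is open, dates as `YYYY-MM-DD`) and counts per calendar quarter; a positive `net` means the register grew that quarter.

```shell
# Count debt tickets opened and closed per calendar quarter.
debt_flow() {
  awk -F'\t' '
    # "2024-05-02" -> "2024-Q2"
    function quarter(d) {
      return substr(d, 1, 4) "-Q" (int((substr(d, 6, 2) - 1) / 3) + 1)
    }
    { q = quarter($2); opened[q]++; seen[q] = 1 }
    $3 != "" { q = quarter($3); closed[q]++; seen[q] = 1 }
    END {
      for (q in seen)
        printf "%s\topened=%d\tclosed=%d\tnet=%d\n", q, opened[q], closed[q], opened[q] - closed[q]
    }' | sort   # awk iterates arrays in no fixed order, so sort the output
}

printf 'D-001\t2024-01-10\t2024-05-02\nD-002\t2024-02-20\t\nD-003\t2024-04-15\t2024-04-30\n' | debt_flow
```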

Tools & Templates

Debt register format

A simple spreadsheet or Notion table with these columns:

| ID | Area | Description | Impact | Effort | Priority | Owner | Status |
|----|------|-------------|--------|--------|----------|-------|--------|
| D-001 | Auth module | Token validation mixed with session management | High: every auth change risks regression | Medium | P1 | @eng | In progress |
| D-002 | Reporting pipeline | No retry logic on external API calls | Medium: intermittent failures require manual recovery | Small | P2 | | Backlog |
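The same register also works as a tab-separated file that standard command-line tools can query. This sketch encodes the two example rows from the table (with D-002's owner left empty) and lists items that are not yet done, P1 first. The `Done` status value and both function names are assumptions for illustration.

```shell
# The example rows as TSV. Columns:
# ID, Area, Description, Impact, Effort, Priority, Owner, Status
register_tsv() {
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    D-001 "Auth module" "Token validation mixed with session management" \
    "High: every auth change risks regression" Medium P1 @eng "In progress" \
    D-002 "Reporting pipeline" "No retry logic on external API calls" \
    "Medium: intermittent failures require manual recovery" Small P2 "" Backlog
}

# Print Priority, ID and Area for every item whose Status is not "Done"
# (a hypothetical closed status), sorted so P1 comes before P2.
open_items() {
  awk -F'\t' '$8 != "Done" { print $6 "\t" $1 "\t" $2 }' | sort
}

register_tsv | open_items
```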

Identifying high-churn, high-complexity files

```shell
# Files changed most often in the last 6 months
# (grep drops the blank lines git prints between commits)
git log --since="6 months ago" --name-only --format="" \
  | grep -v '^$' | sort | uniq -c | sort -rn | head -20
```

Cross-reference with cyclomatic complexity output from your static analysis tool. High churn + high complexity = highest debt priority.
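The cross-referencing step can be scripted once both lists are saved to files. This sketch assumes the complexity report has been exported as `<score> <path>` lines with integer scores (real tools use different formats) and ranks files by churn multiplied by complexity. The product is just one simple way to combine the two signals, not a standard formula, and `rank_hotspots` is a hypothetical name.

```shell
# rank_hotspots CHURN_FILE COMPLEXITY_FILE
#   CHURN_FILE:      "<count> <path>" per line, as produced by the
#                    git log | sort | uniq -c pipeline (must be non-empty)
#   COMPLEXITY_FILE: hypothetical "<score> <path>" lines, integer scores
# Prints product, churn, complexity, path; highest product first.
# Paths containing spaces are not handled by this sketch.
rank_hotspots() {
  awk 'NR == FNR { churn[$2] = $1; next }   # first file: remember churn per path
       ($2 in churn) { printf "%d\t%d\t%d\t%s\n", churn[$2] * $1, churn[$2], $1, $2 }' \
    "$1" "$2" | sort -k1,1nr
}
```

Files that appear only in the complexity report (never changed in the window) are dropped, since high complexity in code nobody touches is low-priority debt by the rubric below.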

Debt scoring rubric

Score each debt item across three dimensions (1–3 each) and sum for a priority score:

| Dimension | 1 | 2 | 3 |
|-----------|---|---|---|
| Frequency of change | Rarely touched | Changed monthly | Changed weekly |
| Blast radius | Isolated utility | Core flow, single team | Cross-team dependency |
| Defect density | No recent bugs | Occasional bugs | Regular source of incidents |

A score of 7–9 is high priority. 4–6 is medium. 3 is low.
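The rubric is simple enough to encode directly, which keeps scores consistent across reviewers. A minimal bash sketch (the `debt_score` name is hypothetical) that validates each dimension and maps the sum onto the bands above:

```shell
# debt_score FREQUENCY BLAST_RADIUS DEFECT_DENSITY   (each 1, 2 or 3)
# Prints the summed score and its band: 7-9 high, 4-6 medium, 3 low.
debt_score() {
  local dim total=0
  for dim in "$1" "$2" "$3"; do
    case $dim in
      1|2|3) total=$(( total + dim )) ;;
      *) echo "each dimension must be 1, 2 or 3" >&2; return 1 ;;
    esac
  done
  if (( total >= 7 )); then echo "$total high"
  elif (( total >= 4 )); then echo "$total medium"
  else echo "$total low"
  fi
}

debt_score 3 2 3   # weekly churn, single-team core flow, regular incidents: prints "8 high"
```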


Common Pitfalls

The debt register that's never updated. A register that reflects the state of the codebase from 18 months ago is actively misleading. Some items will have been resolved incidentally; new debt will have accumulated. A stale register makes planning impossible and destroys trust in the process. Review it quarterly, without exception.

Treating all debt as equally urgent. A long-standing naming inconsistency in a rarely-touched utility module is not in the same category as a tangled God Object at the center of the checkout flow. Triaging by frequency, blast radius, and defect density prevents the team from spending time on low-impact cleanup while high-impact debt compounds.

"We'll fix it in the rewrite." The rewrite is almost always further away than it appears, more expensive than anticipated, and risks reintroducing the same problems in a new form. Teams that defer debt paydown in anticipation of a rewrite end up with two problems: a codebase that gets progressively harder to work in, and a rewrite whose scope keeps growing. Pay down debt incrementally; don't wait for a clean break that may never come.

Reckless debt relabeled as prudent. "We didn't have time for tests" is not the same as "we made a reasonable trade-off." Allowing reckless debt to be filed as prudent removes any accountability for preventing it in the future. Be precise about what was chosen and what was merely not done.

Debt paydown with no observable outcome. A refactoring sprint that produces cleaner code but no measurable improvement in developer experience or defect rate is hard to defend in the next planning cycle. Tie debt paydown to outcomes: time-to-change, incident rate, onboarding time. If you cannot articulate the expected outcome, the scope of the paydown effort is probably wrong.

Using debt as an excuse to avoid quality now. Technical debt is a legitimate concept that gets misused as a blanket justification for shipping low-quality work. "We'll clean it up later" is sometimes the right call; it is also one of the most frequently abused phrases in software engineering. The default should be building things right. Debt is the exception, not the policy.