
Static Analysis & Quality Gates

How deep static analysis catches what linters cannot, and how we use quality gates without drowning the team in findings.

Overview

Static analysis examines source code without executing it — looking for bugs, security vulnerabilities, quality issues, and architectural violations that a linter alone cannot detect. Where linting enforces style rules and catches obvious anti-patterns on individual files, static analysis tools reason across the whole codebase: tracking data flow, detecting injection vulnerabilities, measuring complexity trends, and surfacing patterns that only become visible at scale.

Quality gates are the enforcement layer. They define pass/fail thresholds ("no new critical vulnerabilities", "new-code coverage must not drop below the target") and block a merge or deployment when the thresholds are violated.

This page is tool-agnostic. The principles apply whether the tooling is SonarQube, CodeQL, Semgrep, DeepSource, or a language-specific scanner. A given organisation will pick a primary tool based on its platforms, languages, and compliance context; the principles do not change with that choice.

For file-level linting and formatting, see Linting Standards. For pre-commit automated checks, see Pre-Commit Hooks. For the supply-chain dimension, see Dependency Management.


Why It Matters

Linters have a limited scope. File-level linters operate on small windows of the code. They cannot see issues that span files, data-flow vulnerabilities (SQL injection paths, XSS sinks, unsafe deserialisation), or trends in complexity over time. Static analysis is the layer that sees what the linter structurally cannot.

Security vulnerabilities are expensive to fix late. A static analysis tool that flags an injection vulnerability in a PR is orders of magnitude cheaper than the same vulnerability found in a penetration test, reported by a customer, or exploited in production. Every week a vulnerability lives in production compounds the cost of fixing it.

Quality gates prevent entropy. Without enforced thresholds, codebases accumulate issues gradually and invisibly. A quality gate that blocks merges when new issues are introduced makes the debt visible at the moment it is created — not months later when someone discovers the accumulated backlog.

Objective metrics support engineering conversations. "This module is getting difficult to maintain" is harder to act on than "this module's complexity has increased 40% over the last quarter and accounts for 60% of recent defects." Static analysis gives the conversation a shared set of facts.


Standards & Best Practices

Run on every pull request, not just on main

Catching issues in a PR is the lowest-cost moment to address them — before review, before merge, before deploy. A static analysis job that runs only on main arrives after the code has already landed and the author has moved on. The feedback loop is broken. Run on every PR, block the PR on quality-gate failure, and let authors fix issues while the context is still in their head.
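
As a concrete mechanic, the PR job runs the scanner and fails on findings at or above an agreed severity, which blocks the merge. Below is a minimal sketch, assuming the tool can emit SARIF (a common interchange format for scanner results); the file name and blocking level are illustrative:

```python
import json
import sys

# Illustrative PR gate: fail the CI job when a SARIF results file
# contains findings at or above a chosen severity level.
# SARIF result levels, roughly in order: error > warning > note.
BLOCKING_LEVELS = {"error"}

def blocking_findings(sarif_path: str) -> list[str]:
    with open(sarif_path) as f:
        sarif = json.load(f)
    failures = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # The SARIF spec treats a missing level as "warning"
            if result.get("level", "warning") in BLOCKING_LEVELS:
                message = result.get("message", {}).get("text", "")
                failures.append(f"{result.get('ruleId')}: {message}")
    return failures

if __name__ == "__main__":
    findings = blocking_findings("results.sarif")
    for line in findings:
        print(line)
    sys.exit(1 if findings else 0)  # a non-zero exit blocks the PR
```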

Adopt the "new code" policy

Introducing static analysis to an existing codebase will surface a large backlog of pre-existing issues. Trying to fix them all before enabling gates is what kills static-analysis adoption. Instead:

  1. Establish a baseline — the current state of main
  2. Configure the quality gate to enforce the standard only against new or modified code
  3. Separately schedule debt reduction as planned work items

This is sometimes called the "clean new code" policy. It is the only pattern that makes gate adoption practical for a codebase that did not start with static analysis on day one.
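
A minimal sketch of the mechanic, assuming findings can be reduced to stable fingerprints (the rule-and-file pairs below are illustrative; production tools hash the surrounding code so fingerprints survive line-number churn):

```python
import json

# Illustrative new-code gate: only findings absent from the recorded
# baseline fail the build. The historical backlog stays visible on the
# dashboard but does not block the merge.

def fingerprints(findings: list[dict]) -> set[tuple[str, str]]:
    return {(f["rule"], f["file"]) for f in findings}

def new_findings(current: list[dict], baseline_path: str) -> list[dict]:
    with open(baseline_path) as f:
        baseline = fingerprints(json.load(f))
    return [f for f in current if (f["rule"], f["file"]) not in baseline]
```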

Triage findings before enforcing them

Not every finding is a real issue. All static analysis tools produce false positives. Before flipping a rule into enforcing mode:

  • Is the finding real? (Does it describe an actual problem?)
  • Is it actionable? (Can the engineer do something about it?)
  • Is the noise worth the signal? (If 90% of findings are false positives, the team learns to ignore them)

A rule that fails these questions stays in reporting mode or is disabled. Enforcing a noisy rule is how teams lose trust in the tool.

Don't duplicate what linting already does

Most static analysis tools include rules that overlap with what linters catch (complexity metrics, unused variables, common style issues). Running both produces redundant findings and confuses authors about where to fix what. Standards:

  • Let the linter handle file-level style and structural issues
  • Let static analysis focus on what it uniquely provides — cross-file data flow, security, supply-chain, deep architectural rules
  • When a rule is covered in both tools, disable it in one deliberately, not by accident

Severity categories translate to actions

Tools use different severity vocabularies, but the rough mapping to action is consistent:

| Severity | Action |
| --- | --- |
| Blocker / Critical | Must be fixed before merge; quality gate blocks the PR |
| High / Major | Fixed in the current iteration; escalates if deferred |
| Medium | Tracked as debt; fixed opportunistically or on schedule |
| Low / Info | Advisory; visible on the dashboard, not actively blocked |

Setting the blocking threshold is an organisational decision — one that should be revisited as the codebase and team mature.
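
One way to keep that decision explicit and reviewable is to encode the mapping in version-controlled gate configuration. A hypothetical shape (the severity names and actions mirror the table above):

```python
from enum import Enum

class Action(Enum):
    BLOCK_MERGE = "block merge"                 # Blocker / Critical
    FIX_THIS_ITERATION = "fix this iteration"   # High / Major
    TRACK_AS_DEBT = "track as debt"             # Medium
    ADVISORY = "advisory"                       # Low / Info

# Illustrative mapping; revisit as the codebase and team mature.
SEVERITY_ACTION = {
    "blocker": Action.BLOCK_MERGE,
    "critical": Action.BLOCK_MERGE,
    "high": Action.FIX_THIS_ITERATION,
    "major": Action.FIX_THIS_ITERATION,
    "medium": Action.TRACK_AS_DEBT,
    "low": Action.ADVISORY,
    "info": Action.ADVISORY,
}

def blocks_merge(severity: str) -> bool:
    return SEVERITY_ACTION.get(severity.lower()) is Action.BLOCK_MERGE
```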

Suppressions are documented, not silent

When a finding is a confirmed false positive, suppress it with the tool's inline annotation — and document why. A one-line comment explaining the justification turns an invisible suppression into a reviewable decision.

Undocumented suppressions are indistinguishable from "we gave up." They accumulate, hide real issues, and eventually require a cleanup pass to understand what is actually suppressed and why.
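
For instance, with Bandit (the Python security scanner mentioned in the tool table below), the inline `# nosec` annotation suppresses a specific check; the sketch pairs it with a justification comment so the suppression is a reviewable decision:

```python
import subprocess

ALLOWED_SERVICES = {"web", "worker", "scheduler"}

def restart_service(name: str) -> None:
    if name not in ALLOWED_SERVICES:
        raise ValueError(f"unknown service: {name}")
    # Suppression justified: `name` is validated against the fixed
    # allow-list above, so B603 (subprocess call) is a false positive
    # here. Revisit if the allow-list becomes dynamic.
    subprocess.run(["systemctl", "restart", name], check=True)  # nosec B603
```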

Security hotspots are reviewed, not dismissed

Tools that distinguish "hotspot" (a place where sensitive operations happen and the code needs human review) from "vulnerability" (a confirmed issue) are surfacing different things. Hotspots are not always bugs — but they are always places where a human needs to make the final call. Treat hotspot review as part of the security posture, not an extension of noise suppression.

The tool is part of the supply chain

The static analysis tool itself — its server, its credentials, its configuration — is part of the security surface. Standards:

  • The tool's authentication tokens are treated like any other secret (see Secrets Management) and rotated on a schedule
  • The tool's configuration (ruleset, thresholds, suppressions) is version-controlled alongside the codebase where practical
  • Tool upgrades are tracked: a new version may add rules that surface pre-existing issues, which the new-code policy must handle gracefully

Gate design: block on new, report on total

Quality gates work best when they draw a sharp line between "new code must pass" and "existing code is improving." The PR-blocking check looks only at the delta; the overall trend — visible on a dashboard, reviewed periodically — shows whether the codebase is improving over time. Inverting this (blocking on total findings) produces a gate that is either trivially green or permanently red, neither of which is informative.


How to Implement

Adopting static analysis in an existing codebase

  1. Pick a primary tool — the one best suited to the team's platforms, languages, and security posture. One tool owns the baseline; additional tools layer on top for specific domains (e.g. language-specific scanners, supply-chain scanners)
  2. Run it first in report-only mode on main — nothing blocks, everyone sees the current state
  3. Establish a baseline and record the numbers so improvements are measurable (see the sketch after this list)
  4. Turn on new-code gating — the PR pipeline fails when new code introduces findings above an agreed severity
  5. Triage the top-severity existing findings as a one-off, or schedule them as debt items
  6. Iterate on the ruleset — add rules that fit, disable rules that don't, document suppressions
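
A sketch of step 3, recording the baseline as a dated snapshot that later runs can be compared against (the field names are illustrative):

```python
import json
from collections import Counter
from datetime import date

# Illustrative: snapshot the current findings by severity so that
# debt reduction is measurable against a recorded starting point.
def record_baseline(findings: list[dict], path: str) -> None:
    snapshot = {
        "date": date.today().isoformat(),
        "total": len(findings),
        "by_severity": dict(Counter(f["severity"] for f in findings)),
    }
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2)
```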

What a quality gate should cover

A quality gate that is enforced on every PR typically checks some combination of:

  • No new findings above an agreed severity
  • No new security hotspots introduced without explicit review
  • New code test coverage at or above an agreed threshold
  • New code duplication below an agreed threshold
  • No new "critical" architectural violations (e.g. layering rules the codebase follows)

Each of these is a policy decision, not a technical one. The thresholds that work for one team will not work for another — the point is to make them explicit and to enforce the ones that have been chosen.
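
One way to make those policy decisions explicit is a version-controlled policy object that the CI job evaluates. A hypothetical shape, with illustrative thresholds:

```python
from dataclasses import dataclass

# Hypothetical gate policy: thresholds are explicit, reviewable policy
# decisions, not defaults inherited from the tool.
@dataclass(frozen=True)
class QualityGate:
    max_new_blocker_findings: int = 0
    min_new_code_coverage: float = 0.80   # fraction of new lines covered
    max_new_code_duplication: float = 0.03

    def evaluate(self, metrics: dict) -> list[str]:
        failures = []
        if metrics["new_blocker_findings"] > self.max_new_blocker_findings:
            failures.append("new blocker-severity findings introduced")
        if metrics["new_code_coverage"] < self.min_new_code_coverage:
            failures.append("new-code coverage below threshold")
        if metrics["new_code_duplication"] > self.max_new_code_duplication:
            failures.append("new-code duplication above threshold")
        return failures  # an empty list means the gate passes
```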

Choosing the primary tool

The landscape of static analysis tools is wide. A rough orientation:

| Tool family | Typical strength |
| --- | --- |
| SonarQube / SonarCloud | Broad quality + security across a multi-language portfolio; mature gate UX |
| CodeQL | Deep security analysis; strong for custom security query authoring |
| Semgrep | Fast, rule-authoring-friendly; widely used for custom pattern matching |
| DeepSource | Hosted quality + security; lighter-touch integration |
| Language-specific (Bandit, Brakeman, etc.) | Deep coverage of a single language's idioms and vulnerabilities |

The right default is the one that fits the organisation's stack, not the one highest in any given comparison. Teams frequently run one broad tool (SonarQube or equivalent) and layer one or two specialised tools on top for their specific risk areas.

Integrating with the development workflow

  • IDE integration (where the tool supports it) surfaces findings at the moment of authoring — shortest feedback loop
  • Pre-merge CI run enforces the gate; the PR cannot merge with a failing gate
  • Dashboard view tracks overall trends — reviewed on a cadence, not every day
  • Rules and thresholds live in version control; changes are reviewable

Common Pitfalls

Running only on the main branch. Post-merge findings arrive too late. The code is in history, the author has context-switched, and fixing requires a new PR. Run on every PR.

Enabling all rules at once on an existing codebase. Thousands of findings, most of them historical, all blocking. The team learns to ignore the gate — or worse, the gate gets disabled entirely. Adopt the new-code policy from the start.

Treating coverage as a proxy for test quality. Coverage alone does not make tests good. A test that calls a function without asserting anything will increase coverage while providing no safety. Coverage is a minimum floor, not a quality ceiling; test quality is a separate question. See Testing Standards.

Suppressing issues without documentation. A suppression without a comment is indistinguishable from "we gave up." It accumulates as invisible debt and obscures the real state of the codebase. Every suppression is a reviewable decision.

Choosing the tool for the dashboard. Good dashboards help, but a tool is not chosen for its graphs — it is chosen for the accuracy and usefulness of its findings on the codebase's actual languages and patterns. Evaluate with real code, not marketing material.

Treating the tool's defaults as the team's standards. Default rulesets are starting points, not finished products. Teams that never customise the ruleset inherit decisions made by someone who does not know the codebase. Review the defaults; disable what does not fit; document what is kept.

Not rotating the tool's credentials. The authentication token is a long-lived credential stored in CI. Treat it like any other secret: rotate on schedule, scope to the minimum permission, revoke immediately on exposure. See Secrets Management.

Ignoring tool upgrades. A static analysis tool evolves; new versions add rules that would have been valuable to apply sooner. A team on a year-old version is a team missing a year of tool improvements. Upgrade on a cadence, with the new-code policy absorbing any newly surfaced historical findings.