
CI/CD Pipeline Best Practices

How we design and maintain continuous integration and delivery pipelines.

Overview

A CI/CD pipeline is the team's shared quality gate — the automated system that validates every change before it reaches users. Continuous integration runs on every push: it lints, type-checks, tests, and builds. Continuous delivery automates the path from a passing build to a running deployment.

The pipeline is not a personal check you run when you feel like it. It is the authoritative, reproducible verdict on whether code is ready to ship. Everything that can be enforced automatically should be enforced here — so that humans only review what humans are actually good at.


Why It Matters

Fast feedback loops catch problems before they compound. A lint error caught in 30 seconds by CI is trivially fixed. The same error discovered in a multi-hour merge conflict or a production incident costs orders of magnitude more. The sooner a failure surfaces, the cheaper it is.

Consistent environments eliminate "works on my machine." CI runs every build in the same clean environment — same OS, same dependency versions, same toolchain. What passes in CI will behave the same in production. What a developer runs locally may not.

A reliable safety net enables higher velocity. Teams that trust their pipeline can merge confidently, deploy frequently, and take smaller steps. Teams without one either move cautiously or accumulate risk. The pipeline enables speed precisely because it is a constraint.

Every run is an auditable record. Build logs, test results, deployment timestamps, and deployed SHAs are automatically recorded. When something goes wrong in production, you know exactly what was deployed, when, and from which commit.


Standards & Best Practices

Every push triggers CI — no exceptions

CI runs on every push to every branch, and on every pull request. It also runs on every merge to main. There is no branch where CI is optional. If CI is skipped, the guarantee it provides disappears.

Pipelines must complete in under 10 minutes

Developers wait for CI before merging. Once a pipeline takes more than 10 minutes, developers stop waiting — they context-switch, stack PRs, or bypass the gate. Target under 10 minutes for the full CI run. Achieve it by parallelising slow steps and caching aggressively.
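The budget can also be enforced in the workflow itself. In GitHub Actions, a timeout-minutes limit fails a job that exceeds its budget instead of letting it hang, and a concurrency group cancels superseded runs when a new push lands on the same branch. A minimal sketch:

```yaml
# Cancel superseded runs of the same branch so queued pushes don't pile up
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10   # fail loudly instead of hanging past the budget
    steps:
      - run: pnpm test
```

Neither setting makes a slow pipeline fast, but both make a budget violation visible the day it happens.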

Pipeline configuration lives in the repository

Workflow files belong in .github/workflows/ (or .gitlab-ci.yml, etc.) committed to version control — not configured through an admin UI. Pipeline config that isn't in the repo can't be reviewed, reverted, or understood by a new engineer.

Fail fast: run cheap checks first

| Stage | What it runs | Approx. time |
| --- | --- | --- |
| Lint & format | ESLint, Prettier, language linters | < 1 min |
| Type-check | tsc --noEmit, Pyright, etc. | 1–2 min |
| Unit tests | Fast in-process tests, no I/O | 2–4 min |
| Integration | Tests that hit a real DB or network | 3–6 min |
| Build | Production bundle or container image | 2–5 min |
| Deploy | Runs only on main branch | varies |

If lint fails, don't run tests. If tests fail, don't build. Fail at the earliest, cheapest stage.
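In GitHub Actions this ordering is expressed with needs: a job only starts once the jobs it depends on have passed. A skeleton matching the stages above (job bodies trimmed for brevity):

```yaml
jobs:
  lint:                  # cheapest check, no dependencies: runs first
    runs-on: ubuntu-latest
    steps:
      - run: pnpm lint
  test:
    needs: [lint]        # never starts if lint fails
    runs-on: ubuntu-latest
    steps:
      - run: pnpm test
  build:
    needs: [test]        # only reached once every cheaper stage is green
    runs-on: ubuntu-latest
    steps:
      - run: pnpm build
```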

Secrets live in environment variables, never in workflow files

Use your CI provider's secret store (GitHub Actions Secrets, GitLab CI Variables). Reference them as ${{ secrets.DEPLOY_KEY }}. A secret hardcoded in a workflow file is a secret that is now in your git history — permanently.
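In a workflow file that looks like the following: the value is resolved by the CI provider at runtime and scoped to the single step that needs it (DEPLOY_KEY is an illustrative name):

```yaml
- name: Deploy
  run: ./scripts/deploy.sh
  env:
    # Injected by the CI provider at runtime; nothing sensitive is committed
    DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}
```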

Branch protection requires passing CI

The main branch must be protected: PRs cannot be merged unless CI passes and at least one review is approved. Configure this in your repository settings. Without it, the pipeline is advisory, not authoritative.

Parallelise slow steps and cache dependencies

Jobs that don't depend on each other should run concurrently. Dependency installation should be cached by lockfile hash — reinstalling node_modules from scratch on every run is the most common cause of slow pipelines.
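When the test suite itself is the bottleneck, a job matrix can split it across parallel runners. A sketch assuming a test runner that supports sharding (Vitest and Jest both accept a --shard flag):

```yaml
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]   # four runners, each takes a quarter of the suite
    steps:
      - uses: actions/checkout@v4
      - run: pnpm install --frozen-lockfile
      - run: pnpm test -- --shard=${{ matrix.shard }}/4
```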


How to Implement

Step 1 — Define your pipeline stages

Before writing any YAML, agree on what your pipeline stages are and what order they run in. Map them to approximate time budgets. A pipeline without a planned structure accumulates jobs that nobody intended to add.

Step 2 — Create the CI workflow

# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: ['**']
  pull_request:
    branches: [main]

jobs:
  lint:
    name: Lint & format
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with:
          version: 9
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      - run: pnpm lint
      - run: pnpm format:check

  typecheck:
    name: Type-check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with:
          version: 9
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      - run: pnpm typecheck

  test:
    name: Tests
    runs-on: ubuntu-latest
    needs: [lint, typecheck]
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with:
          version: 9
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      - run: pnpm test

  build:
    name: Build
    runs-on: ubuntu-latest
    needs: [test]
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with:
          version: 9
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      - run: pnpm build

Step 3 — Cache dependencies

The cache: pnpm shorthand in actions/setup-node caches the pnpm store (not node_modules), keyed by lockfile hash. For more control, or for non-Node stacks:

- uses: actions/cache@v4
  with:
    path: ~/.pnpm-store
    key: ${{ runner.os }}-pnpm-${{ hashFiles('**/pnpm-lock.yaml') }}
    restore-keys: |
      ${{ runner.os }}-pnpm-

Cache by lockfile hash, not by date or run number. The cache should be invalidated exactly when dependencies change — not before, not after.

Step 4 — Enable branch protection

In GitHub: Settings → Branches → Add branch protection rule for main:

  • Require a pull request before merging
  • Require status checks to pass: add lint, typecheck, test, build
  • Require at least 1 approving review
  • Do not allow bypassing the above settings for administrators

Branch protection that administrators can bypass is meaningless — if admins can push directly, "works in CI" stops meaning anything during incidents.

Step 5 — Scope deployments to main only

Deployments run in a separate workflow, triggered only when CI passes on main:

# .github/workflows/deploy.yml
name: Deploy

on:
  workflow_run:
    workflows: [CI]
    types: [completed]
    branches: [main]

jobs:
  deploy:
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to production
        run: ./scripts/deploy.sh
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}

Never deploy from a feature branch. The deploy job runs after CI on main succeeds — not in the same workflow, not in parallel.
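GitHub Actions environments add a second guard on top of the workflow_run trigger: tagging the deploy job with an environment enforces whatever protection rules that environment carries, such as required reviewers or a branch restriction (the environment itself is configured in repository settings; "production" is an illustrative name):

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production   # enforces the environment's protection rules
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh
```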

Step 6 — Notify on failure

Failing CI is only useful if the right people know about it. Post failures to a shared Slack channel, not just to the PR author:

- name: Notify on failure
  if: failure()
  uses: slackapi/slack-github-action@v2
  with:
    webhook: ${{ secrets.SLACK_CI_WEBHOOK }}
    webhook-type: incoming-webhook
    payload: |
      {
        "text": "CI failed on `${{ github.ref_name }}` — <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View run>"
      }

Tools & Templates

CI provider comparison

| Provider | Strengths | Hosted / self-hosted |
| --- | --- | --- |
| GitHub Actions | Native GitHub integration, large marketplace | Hosted |
| GitLab CI | Tight GitLab integration, flexible runners | Both |
| CircleCI | Fast parallelism, Docker layer caching | Hosted |
| Buildkite | Self-hosted agents, great for large monorepos | Both |
| AWS CodePipeline | Native AWS integration, IAM-scoped deployments | Hosted |

Minimal reusable step: install + cache

- uses: pnpm/action-setup@v4
  with:
    version: 9
- uses: actions/setup-node@v4
  with:
    node-version: 20
    cache: pnpm
- run: pnpm install --frozen-lockfile

Use --frozen-lockfile in CI — it fails if pnpm-lock.yaml would be updated, catching lockfile drift before it reaches production.


Common Pitfalls

Flaky tests pollute the CI signal. A test that fails 10% of the time trains the team to re-run CI instead of reading it. Treat a flaky test as a blocking bug. Quarantine it and fix it — do not ignore it.
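One sketch of quarantine, assuming a separate test:quarantine script that runs only the flagged tests: keep the quarantined suite visible in CI but non-blocking, so the main signal stays clean while the fix is pending:

```yaml
  quarantined-tests:
    runs-on: ubuntu-latest
    continue-on-error: true     # reported in the run, but does not block merges
    steps:
      - uses: actions/checkout@v4
      - run: pnpm install --frozen-lockfile
      - run: pnpm test:quarantine   # hypothetical script running only flagged tests
```

The quarantine list should shrink to zero; a permanently quarantined test is a deleted test with extra steps.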

No dependency caching. A pipeline that reinstalls all dependencies on every run adds 2–5 minutes of pure overhead per engineer per day. At 10 engineers, that is hours of compute and waiting per day for nothing. Cache by lockfile hash.

Pipeline configuration lives in the admin UI. Config that isn't in the repo can't be reviewed, can't be reverted, and is invisible to new engineers. Every pipeline setting that can be expressed in code should be expressed in code.

Hardcoded secrets in workflow files. Secrets in YAML are secrets in git history — even if you delete the file, they remain in the history. Use the CI provider's secret store exclusively.

No branch protection. A pipeline that doesn't actually block merges is a suggestion, not a gate. Enable branch protection with required status checks. Make it apply to administrators too.

Deploying from feature branches. CI passing on a feature branch only guarantees that the branch, in isolation, is correct. It says nothing about whether it integrates cleanly with main. Always deploy from main after merge.

Long-running pipelines nobody waits for. A 40-minute pipeline encourages developers to merge optimistically and check the result later. By the time it fails, the developer has moved on. Keep pipelines under 10 minutes.