
Secrets Management

How we store, rotate, scope, and inject secrets — and why none of them belong in source control.

Overview

A secret is anything that grants access: a credential, a token, a signing key, a database password, a private key, an API key, a webhook secret. Lose one and the system's trust boundary is breached; lose one silently and the breach is indefinite. The entire engineering organisation has an interest in a small number of clear rules about how secrets are handled — because the most expensive incidents begin with a secret in the wrong place.

This page is the canonical home for secrets guidance. Other pages touch on the topic (secrets in Git, secrets in IaC, secrets in AI prompts) and link here instead of re-stating.

For keeping secrets out of commits, see Git Best Practices. For keeping secrets out of IaC state, see Infrastructure as Code. For keeping secrets out of AI tooling, see AI Code Assistants.


Why It Matters

A committed secret is public the moment it lands on a remote. Even if deleted later, it lives in the history, in clones, in forks, and often in cache layers the team cannot reach. Revocation is the only response — and revocation is never free.

The half-life of an unrotated secret is long. A credential that was set in a config file three years ago is still valid today if no one has rotated it. Former employees retain the access they had, machines and services retain credentials they no longer need, and the breach surface accumulates quietly.

Shared secrets are untraceable secrets. A secret used by every engineer looks identical in audit logs no matter who actually used it. In an incident, "who did this?" becomes unanswerable. Scoping secrets per person, per service, or per environment is a prerequisite for any meaningful audit.

A secret in a prompt leaves the system. Pasting a secret into an AI tool, a paste-bin, a chat with support, or a screenshot uploaded to a ticketing system is a disclosure. The secret is now in places you do not control and cannot purge.


Standards & Best Practices

Secrets never live in source control

The rule has no exceptions. Not in .env.example (even with placeholder-looking values), not in a "temporary" commit, not in a private repo, not in a comment. The trust model of a repo is "people with read access"; for a secret the trust model must be "people with a specific, time-bounded authorisation."

Operationally:

  • Real secrets live in a secret store (a vault, a managed secret service, a KMS)
  • The repo contains the schema of what secrets are needed (names, descriptions, which environments require which) — not the values
  • Local development uses a developer-scoped copy of the secret (per-person, not shared) fetched from the secret store

Every secret has an owner and a lifecycle

A secret that exists without an owner is a secret nobody knows how to rotate. For every secret the system uses, the team must know:

  • What it's for — which service or integration
  • Who owns it — the team or person accountable for it
  • Where it lives — which secret store, under what path
  • How to rotate it — the procedure, without reinventing it in the middle of an incident
  • When it was last rotated — audit-visible

Secrets without this metadata eventually expire, break production, or get rediscovered by someone who has to figure out what they are before anything can be fixed.
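As a sketch, the metadata above could live in a small registry alongside the store. All names and paths here are illustrative, not real conventions; the point is that the record carries everything except the value itself:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class SecretRecord:
    """Metadata for one secret. The value itself stays in the secret store."""
    name: str            # illustrative name, not a real identifier
    purpose: str         # which service or integration uses it
    owner: str           # team accountable for rotation
    store_path: str      # where the value lives in the secret store
    last_rotated: date   # audit-visible rotation timestamp

# A registry keyed by name gives incident responders one place to look.
REGISTRY = {
    "billing-db-password": SecretRecord(
        name="billing-db-password",
        purpose="billing service -> Postgres",
        owner="payments-team",
        store_path="prod/billing/db-password",
        last_rotated=date(2024, 1, 15),
    ),
}
```

With this in place, "who owns it?" and "when was it last rotated?" become lookups rather than archaeology.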

Rotation is scheduled, not reactive

Rotation is how the blast radius of a hypothetical leak is bounded. Schedule it:

  • Long-lived credentials (DB passwords, service-to-service API keys): rotated at a defined interval appropriate to their sensitivity
  • Short-lived credentials (deploy tokens, build-system secrets): rotated much more frequently, or made ephemeral entirely
  • Personal credentials (engineer access, individual tokens): rotated on a regular cadence and immediately on role change or departure

Rotation that only happens after a breach is not rotation; it is emergency response. A practiced rotation procedure is also what makes emergency rotation fast.
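A minimal sketch of scheduled, rather than remembered, rotation. The tier names and intervals are placeholder policy values, not a prescribed schedule:

```python
from datetime import date, timedelta

# Placeholder intervals; the real values are a policy decision per tier.
ROTATION_INTERVALS = {
    "long-lived": timedelta(days=90),   # DB passwords, service-to-service keys
    "short-lived": timedelta(days=7),   # deploy tokens, build-system secrets
    "personal": timedelta(days=30),     # individual engineer tokens
}

def next_rotation(tier: str, last_rotated: date) -> date:
    """The scheduled date, known in advance rather than discovered in an incident."""
    return last_rotated + ROTATION_INTERVALS[tier]

def is_overdue(tier: str, last_rotated: date, today: date) -> bool:
    """True once the scheduled rotation date has passed."""
    return today >= next_rotation(tier, last_rotated)
```

A scheduler that runs `is_overdue` against the registry, rather than a policy document that asks people to remember, is the difference between rotation and intention.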

Prefer ephemeral credentials to long-lived ones

Wherever the platform supports it, a short-lived credential issued on demand is safer than a long-lived credential stored somewhere:

  • Workload identities (service account identity derived from the runtime environment)
  • Short-lived tokens minted per-request or per-job
  • OIDC-based trust between CI systems and cloud platforms, replacing static deploy keys
  • Database access via broker-issued credentials rather than shared passwords

An ephemeral credential cannot be leaked in a way that survives its own lifetime. A long-lived credential can sit in a log line for years.
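The property that a leak cannot outlive the credential can be sketched as follows. The minting and validation functions are illustrative, not a specific platform's API:

```python
import secrets
import time

def mint_token(ttl_seconds: int, now: float = None) -> dict:
    """Issue a short-lived credential on demand; nothing long-lived is stored."""
    issued = time.time() if now is None else now
    return {
        "value": secrets.token_urlsafe(32),
        "expires_at": issued + ttl_seconds,
    }

def is_valid(token: dict, now: float = None) -> bool:
    """Even a leaked token is useless once its lifetime ends."""
    current = time.time() if now is None else now
    return current < token["expires_at"]
```

A token minted with a fifteen-minute TTL that leaks into a log line is a fifteen-minute problem; a static key in the same log line is a problem until someone notices.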

Least privilege is the default scope

A secret grants access to something. That access should be scoped to the minimum necessary:

  • A CI job should have a credential that can deploy the service it is deploying — not one that can deploy any service
  • A read-replica credential is not a read-write credential
  • A developer has access to development and staging secrets; production secrets are scoped to the people and systems that actually need them
  • Third-party integrations receive scoped tokens (by resource, by action), never admin tokens

Broad scopes exist because they are easy to grant once, while narrow scopes take effort to configure correctly. The convenience is paid back at incident time with interest.
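In code, least privilege reduces to an explicit check with no admin fallback. The scope strings here are made up for illustration:

```python
def can_perform(token_scopes: frozenset, required_scope: str) -> bool:
    """Allow an action only when the token carries the exact scope it needs."""
    return required_scope in token_scopes

# A per-pipeline CI token scoped to one service (illustrative scope strings).
ci_token_scopes = frozenset({"deploy:service-x"})
```

The test at authorisation time is membership, not seniority: a token either carries `deploy:service-y` or the deploy is refused, regardless of what else the token can do.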

Secrets are injected at runtime, not baked into artefacts

Container images, binaries, serverless packages — none of these should embed a secret at build time. The build is distributable; the secret is not. Secrets are injected at runtime, from the secret store into the running process, through the platform's supported mechanism (env vars populated by the orchestrator, mounted files, sidecar, or SDK).

This separation is also what makes rotation possible. A secret baked into an artefact cannot be rotated without a redeploy.
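A sketch of runtime resolution at process start, assuming env-var or file-mount injection; the mount path is illustrative and varies by platform:

```python
import os
from pathlib import Path

def load_secret(name: str, mount_dir: str = "/run/secrets") -> str:
    """Resolve a secret the platform injected at runtime. The build artefact
    never contains the value, so rotation needs no rebuild."""
    value = os.environ.get(name)
    if value:
        return value
    mounted = Path(mount_dir) / name.lower()
    if mounted.is_file():
        return mounted.read_text().strip()
    raise RuntimeError(f"secret {name!r} was not injected; refusing to start")
```

Failing fast when a secret is absent is deliberate: a service that starts without its credentials fails later, in a less obvious place.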

Production secrets are not developer secrets

A developer machine does not hold production secrets. Not for debugging, not for "I just need to check something," not for a one-off data investigation. If an engineer needs to perform an action using a production credential, the action is performed through a platform that uses the credential on the engineer's behalf — and audits it — not by putting the credential on the laptop.

The reason is not distrust of the engineer; it is distrust of the laptop. Laptops get lost, stolen, malware-infected, or backed up to services the company does not control. Keeping production credentials away from them removes an entire class of incident.

Rotate on role change and departure — always, without exception

When someone changes teams or leaves the company, every secret they had access to is potentially compromised. This is true even if the person is trusted; the machine they used, the backups of that machine, and the chat messages where they may have pasted a secret are all part of the surface.

Rotation on departure is not a statement about the person; it is hygiene about access. Skipping it is how former-employee credentials end up being used months after departure, usually not by the former employee but by someone who acquired them.

Logs and error messages do not contain secrets

A secret in a log file is a secret in every log aggregation system, every archive, every backup, and every observability tool that ingested that log. The cleanup cost is high; the detection cost is often higher. Standards:

  • Never log credentials, tokens, or keys — not even truncated "for debugging"
  • Error messages that might surface a secret are scrubbed before leaving the process
  • Logging libraries are configured with redaction for known sensitive field names
  • Accidental disclosures trigger rotation of the disclosed secret, not just log purging
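The redaction bullet can be sketched as a field-name filter. The key list here is an illustrative starting point, not exhaustive:

```python
import logging

# Illustrative set of sensitive field names; extend per system.
SENSITIVE_KEYS = {"password", "token", "api_key", "secret", "authorization"}

def redact(fields: dict) -> dict:
    """Replace values of known-sensitive keys before they reach a log line."""
    return {
        k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else v
        for k, v in fields.items()
    }

class RedactingFilter(logging.Filter):
    """Apply the same redaction to dict-style log arguments."""
    def filter(self, record: logging.LogRecord) -> bool:
        if isinstance(record.args, dict):
            record.args = redact(record.args)
        return True  # keep the record; only its sensitive fields change
```

Name-based redaction is a backstop, not a guarantee; it catches the known field names, which is why an accidental disclosure still triggers rotation rather than trust in the filter.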

Break-glass access exists, is audited, and is rare

There are genuine situations where someone needs production access they do not normally have: a critical incident at an unusual hour, a cross-team response to a novel failure. Break-glass procedures are what allow this without either (a) granting broad standing access or (b) forcing ad-hoc credential sharing.

A break-glass procedure is a pre-defined path to elevated access: explicitly logged, time-bounded, and reviewed after use. Any break-glass use that is not reviewed is the start of a broader problem.
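As a sketch, a break-glass grant is just a record with an expiry and a review flag; field names and values here are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class BreakGlassGrant:
    """Pre-defined elevated access: logged at issue, time-bounded, reviewed after."""
    engineer: str
    reason: str
    issued_at: datetime
    ttl: timedelta = timedelta(hours=1)
    reviewed: bool = False

    def active(self, now: datetime) -> bool:
        """Access lapses on its own; no one has to remember to revoke it."""
        return now < self.issued_at + self.ttl

grant = BreakGlassGrant(
    engineer="oncall-engineer",
    reason="critical incident response",
    issued_at=datetime(2024, 3, 1, 2, 0),
)
```

The TTL is the key design choice: the grant expires by construction, so a forgotten revocation step cannot leave standing access behind.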


How to Implement

Choosing a secret store

The system of record for secrets is a secret store. The choice of store depends on platform and scale:

  • Managed cloud secret services — Operationally simple, tightly integrated with the cloud platform's IAM, usually the right default for cloud-native teams
  • Self-hosted vaults — More control, more operational overhead, better fit for multi-cloud or strict compliance regimes
  • Lightweight developer tools (1Password, etc.) — Suitable for developer secrets (tokens an engineer uses); not suitable as the system of record for application secrets

What matters is not which product, but that there is one system of record and the whole team knows what it is. Fragmented secret storage is the problem any of these solve.

What belongs in the secret store schema

The repo documents what secrets are needed (names, purposes, required environments). The secret store holds the values. Together they form a complete picture that either alone is incomplete. A new engineer should be able to look at the schema, identify what they need access to, and request it — without needing to read code.
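A sketch of what the repo-side schema might look like, with a check that an environment's required names exist in the store. All names are illustrative, and values never appear here:

```python
# Illustrative repo-side schema: names, purposes, environments -- never values.
SECRET_SCHEMA = {
    "DB_PASSWORD": {
        "description": "primary database credential",
        "environments": ["staging", "production"],
    },
    "WEBHOOK_SECRET": {
        "description": "inbound webhook signature verification",
        "environments": ["production"],
    },
}

def missing_secrets(environment: str, available: set) -> list:
    """Names the environment requires that the store does not yet hold."""
    return sorted(
        name for name, meta in SECRET_SCHEMA.items()
        if environment in meta["environments"] and name not in available
    )
```

Run as a deploy-time check, this turns "the schema and the store drifted apart" from a production surprise into a failed pipeline step.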

Developer workflow

  • Developers authenticate to the secret store as themselves (not via a shared credential)
  • Local configuration fetches developer-scoped values (not production values)
  • Secret rotation is transparent: the next run of the app picks up the new value without code changes
  • Secret exposure in local logs is prevented by the same redaction rules used in production

CI/CD workflow

  • CI does not hold long-lived platform credentials; it uses workload identity or short-lived tokens
  • Per-pipeline scoping: a pipeline for service X can deploy only service X
  • Secrets are surfaced to the job through the CI's secret mechanism (masked in logs, not persisted in artefacts)
  • Build artefacts never contain secrets; runtime injection is the only path

Production workflow

  • Services fetch secrets at startup, or via a sidecar/SDK that handles fetching and caching
  • A secret update in the store is reflected in running services without a full rebuild — via rotation APIs or short-TTL caches that refetch
  • Every secret read is authenticated (workload identity) and, where possible, audited
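The fetch-with-refresh pattern above can be sketched as a short-TTL cache; `fetch` stands in for the store client's authenticated read, and the TTL is an illustrative value:

```python
import time

class SecretCache:
    """Fetch at startup with a short TTL, so a rotated value is picked up
    without a redeploy. `fetch` stands in for the secret store's client call,
    authenticated via workload identity in practice."""

    def __init__(self, fetch, ttl_seconds: float = 300.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._cache = {}  # name -> (value, fetched_at)

    def get(self, name: str, now: float = None) -> str:
        current = time.time() if now is None else now
        hit = self._cache.get(name)
        if hit and current - hit[1] < self._ttl:
            return hit[0]          # fresh enough; no store round-trip
        value = self._fetch(name)  # expired or missing; refetch from the store
        self._cache[name] = (value, current)
        return value
```

The TTL bounds both the load on the store and the window during which a rotated secret is stale, which is the trade-off to tune.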

Common Pitfalls

The .env file on a laptop. A full set of production credentials, copied "temporarily" to make debugging easier, forgotten for months. A laptop that leaves a coffee shop with that file is the start of a multi-week incident.

Secrets in CI logs. A script prints its configuration at startup. A redaction rule is missing one variable. The build log containing the secret is now in the CI system's retention indefinitely, readable by anyone with log access.

The shared admin token. A single high-privilege token that the team passes around for "quick" tasks. It is in people's shell histories, in Slack DMs, and in at least three .env files. Revoking it breaks workflows nobody has documented. Nobody remembers who rotated it last.

The .env.example with real values. "I'll replace these with dummies before I commit." The commit happens at 6pm on a Friday. The real values are now in the history.

Rotating on "when we remember." A rotation policy that exists in documentation but not in a scheduler. Months pass without rotation. The next incident asks "when was this last rotated?" and no one can answer.

The secret in the error message. A production error message includes the full request body, which included the authentication token. The error is captured by the monitoring tool, which is indexed and searchable by engineers — including engineers who should not have access to that token. The pipeline of tools built to help operations becomes a leak path.

Developer secrets used in production. A credential intended for development is also accepted by production because the environments were not strictly separated. A low-trust credential turns out to grant high-trust access because the system does not distinguish.

Giving up rotation because it is painful. Rotation is painful primarily when the team has never practiced it. Each rotation gets easier as the tooling hardens. A team that avoids rotation because it is painful has optimised its operations for the wrong metric — it is trading short-term friction for long-term blast radius.

Pasting a secret into a chatbot. A secret pasted into an AI coding assistant, a support chat, or any third-party tool is now in that vendor's systems. Rotate immediately. See AI Code Assistants for the broader policy.