Manage collector credentials with the secrets layer

The problem

Before the secrets layer, device_collector.auth stored credentials in cleartext JSONB:

{"password": "swedge7-real-password"}

Those secrets landed in pg_dump, in every backup, in any log line that printed the row. Operators have one bad afternoon and they’re rotating every credential in the fabric.

The shape: reference-not-value

l2trace’s secrets layer stores references instead. The actual secret lives wherever the operator wants — env vars, a mounted JSON file, Vaultwarden (future), Vault (future). What goes in the database is a URI that names the secret:

{"password": "env://L2TRACE_SW_EDGE_7_PW"}

A resolver step between the orchestrator (which reads the DB) and the collector (which uses the credential) replaces the URI with the real value at use time. The actual password never sits in the database.
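
The sketch below shows that substitution step in isolation. It assumes the resolver is an async callable that maps one URI to one secret. `resolve_auth` and `fake_resolve` are hypothetical names, and the real module’s interface may differ. The point is the invariant: the row read from the DB is never modified, and cleartext exists only in the in-memory copy handed to the collector.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical resolver signature: one URI in, one secret out.
Resolver = Callable[[str], Awaitable[str]]


async def resolve_auth(auth: dict[str, str], resolve: Resolver) -> dict[str, str]:
    """Return a copy of a device's auth dict with every URI replaced by its secret.

    The input (the JSONB value read from device_collector.auth) is not modified,
    so cleartext only ever exists in the returned, in-memory copy.
    """
    return {field: await resolve(uri) for field, uri in auth.items()}


async def fake_resolve(uri: str) -> str:
    # Demo-only stand-in for the real resolver: a fixed lookup table.
    return {"env://L2TRACE_SW_EDGE_7_PW": "actual-password-here"}[uri]


if __name__ == "__main__":
    row = {"password": "env://L2TRACE_SW_EDGE_7_PW"}
    print(asyncio.run(resolve_auth(row, fake_resolve)))
    print(row)  # unchanged: still the reference, not the value
```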

URI schemes today

`env://VAR_NAME`

Reads os.environ[VAR_NAME]. The bedrock backend — works everywhere (Docker environment: blocks, Kubernetes Secrets mounted as env, systemd EnvironmentFile, .env loaded by docker compose).

env://L2TRACE_SW_EDGE_7_PW

The collector process must have the var set when the resolver runs. Convention (not enforced): prefix names with L2TRACE_ so they’re grep-able in deployment manifests.
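
An env:// backend is little more than a dictionary lookup. The sketch below follows the shape of the skeleton in “Adding a new backend”: an object with `async def get(self, opaque)`. The class name `EnvBackend` is an assumption, and the local exception class stands in for l2trace’s own `SecretsResolutionError`. The error text mirrors the failure-modes table.

```python
import asyncio
import os


class SecretsResolutionError(Exception):
    """Local stand-in for l2trace.secrets.base.SecretsResolutionError."""


class EnvBackend:
    """Resolves env://VAR_NAME by reading the resolver process's environment."""

    async def get(self, opaque: str) -> str:
        # opaque is everything after "env://", i.e. the variable name.
        try:
            return os.environ[opaque]
        except KeyError:
            raise SecretsResolutionError(
                f"environment variable {opaque!r} is not set"
            ) from None


if __name__ == "__main__":
    os.environ["L2TRACE_SW_EDGE_7_PW"] = "actual-password-here"
    print(asyncio.run(EnvBackend().get("L2TRACE_SW_EDGE_7_PW")))
```

The lookup happens in whichever process runs the resolver. That is why the variable has to be visible to the reconciler container itself, not just defined somewhere in the deployment.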

`file:///abs/path.json#/json/pointer`

Reads a JSON file from disk and applies an RFC 6901 JSON pointer. Works with Docker secrets (/run/secrets/) and Kubernetes Secrets mounted as files; no external service needed.

file:///run/secrets/l2trace.json#/sw-edge-7/password

The file content is parsed as JSON, and the JSON pointer is everything after the #. Examples:

Pointer	Resolves to
`#/sw-edge-7/password`	`obj["sw-edge-7"]["password"]`
`#/items/0/name`	`obj["items"][0]["name"]`
`#/key~1with~1slashes`	`obj["key/with/slashes"]` (RFC 6901 escapes: `~1`→`/`, `~0`→`~`)
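
Pointer evaluation is small enough to sketch in full. The sketch follows RFC 6901’s rules: unescape `~1` before `~0`, and accept array indices only without leading zeros. `resolve_pointer` is a hypothetical name, not necessarily what the file backend calls it. The "not found" message mirrors the failure-modes table.

```python
import re
from typing import Any


class SecretsResolutionError(Exception):
    """Local stand-in for l2trace.secrets.base.SecretsResolutionError."""


_ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")  # RFC 6901: no leading zeros


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Evaluate an RFC 6901 JSON pointer against already-parsed JSON."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise SecretsResolutionError(f"JSON pointer must start with '/': {pointer!r}")
    for raw in pointer[1:].split("/"):
        # Unescape in the order RFC 6901 requires: '~1' first, then '~0'.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(doc, dict) and token in doc:
            doc = doc[token]
        elif isinstance(doc, list) and _ARRAY_INDEX.fullmatch(token) and int(token) < len(doc):
            doc = doc[int(token)]
        else:
            raise SecretsResolutionError(f"JSON pointer segment {token!r} not found")
    return doc
```

Take file:///run/secrets/l2trace.json#/sw-edge-7/password as an example. Assuming the backend splits the URI body at the first #, the pointer passed in here is /sw-edge-7/password.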

The file path must be absolute. The parsed file is cached keyed on (path, mtime_ns), so any write or touch that changes the file’s mtime invalidates the cache automatically: rotating secrets is one cp new.json old.json && touch old.json away.
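
The sketch below shows the mtime-based cache rule. `load_secrets_file` is a hypothetical name, and the real module may structure this differently. It keeps one entry per path and compares the stored `mtime_ns`. That gives the same invalidation behaviour as keying on (path, mtime_ns), without keeping an entry for every past version of the file. The demo sets each version’s mtime explicitly, as a `touch` would, so two quick writes cannot share a timestamp.

```python
import json
import os
import tempfile
import time
from typing import Any


class SecretsResolutionError(Exception):
    """Local stand-in for l2trace.secrets.base.SecretsResolutionError."""


# path -> (mtime_ns at parse time, parsed JSON). One entry per path: a rotated
# file overwrites its old entry instead of piling up stale versions.
_cache: dict[str, tuple[int, Any]] = {}


def load_secrets_file(path: str) -> Any:
    """Parse a JSON secrets file, reusing the previous parse while mtime_ns is unchanged."""
    if not os.path.isabs(path):
        raise SecretsResolutionError(f"secrets file path must be absolute: {path}")
    try:
        mtime_ns = os.stat(path).st_mtime_ns
    except FileNotFoundError:
        raise SecretsResolutionError(f"secrets file not found: {path}") from None
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime_ns:
        return cached[1]  # untouched since the last parse: skip re-reading
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as exc:
        raise SecretsResolutionError(f"secrets file is not valid JSON: {path}: {exc}") from None
    _cache[path] = (mtime_ns, data)
    return data


if __name__ == "__main__":
    now = time.time_ns()
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "l2trace.json")
        for i, password in enumerate(("old-pw", "new-pw")):
            with open(path, "w", encoding="utf-8") as f:
                json.dump({"sw-edge-7": {"password": password}}, f)
            # Give each version a distinct mtime (10 s apart), like a rotation + touch.
            stamp = now + i * 10_000_000_000
            os.utime(path, ns=(stamp, stamp))
            print(load_secrets_file(path)["sw-edge-7"]["password"])
```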

`literal://value`

Returns the URI body as-is. Backwards-compatibility path for existing cleartext rows from before the secrets module.

literal://my-password-in-cleartext

Gated by L2TRACE_ALLOW_LITERAL_SECRETS (default 1). Set to 0 to refuse cleartext and force operators to migrate to env:// or file://.
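
A sketch of the literal backend with its gate. The migration steps below call for a reconciler restart after flipping the gate, which suggests the variable is read at startup. This sketch assumes exactly that and reads it once, when the backend is built. It also assumes only the value 0 disables the gate. `LiteralBackend` is a hypothetical name.

```python
import asyncio
import os


class SecretsResolutionError(Exception):
    """Local stand-in for l2trace.secrets.base.SecretsResolutionError."""


class LiteralBackend:
    """Returns the URI body unchanged, unless cleartext secrets are switched off."""

    def __init__(self) -> None:
        # Assumption: read once at construction (hence "restart the reconciler"),
        # defaulting to allowed; only the value "0" disables cleartext here.
        self.allowed = os.environ.get("L2TRACE_ALLOW_LITERAL_SECRETS", "1") != "0"

    async def get(self, opaque: str) -> str:
        # opaque is everything after "literal://", i.e. the cleartext itself.
        if not self.allowed:
            raise SecretsResolutionError("literal:// secrets disabled")
        return opaque


if __name__ == "__main__":
    print(asyncio.run(LiteralBackend().get("my-password-in-cleartext")))
```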

Auto-wrap (no scheme)

A plain string without :// is auto-wrapped as literal://<value> internally. This is what keeps existing deploys working with no schema change — the row {"community": "public"} from before the secrets layer still resolves to public on read, because the resolver treats it as literal://public.

Auto-wrap respects the L2TRACE_ALLOW_LITERAL_SECRETS=0 gate too, so that one env var also enforces migration of legacy cleartext rows.
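
The sketch below shows one way auto-wrap and scheme dispatch can fit together. Because wrapping happens before dispatch, a bare legacy value takes exactly the same path as an explicit literal:// URI. That is why one gate covers both. `register_backend` and `resolve` here are local stand-ins, not the real l2trace.secrets.base functions. `EchoLiteral` is a demo-only backend without the gate.

```python
import asyncio
from typing import Protocol


class SecretsResolutionError(Exception):
    """Local stand-in for l2trace.secrets.base.SecretsResolutionError."""


class Backend(Protocol):
    async def get(self, opaque: str) -> str: ...


_BACKENDS: dict[str, Backend] = {}


def register_backend(scheme: str, backend: Backend) -> None:
    """Stand-in for l2trace.secrets.base.register_backend."""
    _BACKENDS[scheme] = backend


async def resolve(value: str) -> str:
    # Auto-wrap: no scheme means literal://<value>, so legacy rows reach the
    # same literal backend (and therefore the same gate) as explicit URIs.
    uri = value if "://" in value else f"literal://{value}"
    scheme, _, opaque = uri.partition("://")
    backend = _BACKENDS.get(scheme)
    if backend is None:
        raise SecretsResolutionError(f"No registered backend for scheme {scheme!r}")
    return await backend.get(opaque)


class EchoLiteral:
    """Demo-only literal backend with no gate."""

    async def get(self, opaque: str) -> str:
        return opaque


if __name__ == "__main__":
    register_backend("literal", EchoLiteral())
    print(asyncio.run(resolve("public")))            # legacy row, auto-wrapped
    print(asyncio.run(resolve("literal://public")))  # explicit form, same result
```

The :// test has one consequence worth knowing about. A legacy cleartext value that happens to contain :// is parsed as a URI rather than wrapped, so it fails with the “No registered backend” error instead of resolving.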

Registering a device with secret references

Cleartext (lab/dev)

docker compose run --rm reconciler l2trace device add \
  --hostname sw-edge-7 \
  --mgmt-ip 10.0.0.7 \
  --source ssh \
  --vendor cisco-ios-xe \
  --username netmon \
  --password 'my-cleartext-pw'

This stores {"password": "my-cleartext-pw"} in the DB. The CLI prints a hint: “prefer --password-uri for production.”

env:// (recommended for most production deploys)

# In .env / Compose / Kubernetes Secret:
L2TRACE_SW_EDGE_7_USER=netmon
L2TRACE_SW_EDGE_7_PW=actual-password-here

# When registering the device:
docker compose run --rm reconciler l2trace device add \
  --hostname sw-edge-7 \
  --mgmt-ip 10.0.0.7 \
  --source ssh \
  --vendor cisco-ios-xe \
  --username-uri env://L2TRACE_SW_EDGE_7_USER \
  --password-uri env://L2TRACE_SW_EDGE_7_PW

The DB row stores {"password": "env://L2TRACE_SW_EDGE_7_PW"}. The actual password lives only in the reconciler container’s environment.

file:// (Docker / Kubernetes secret-mount)

# /run/secrets/l2trace.json (mounted from Docker / Kubernetes secret):
{
  "sw-edge-7": {"password": "actual-password", "enable": "enable-pw"},
  "sw-core-1": {"password": "core-password"},
  "snmp": {"ro-community": "ro-shared-community"}
}

# Compose:
services:
  reconciler:
    secrets:
      - source: l2trace_secrets
        target: l2trace.json   # mounts to /run/secrets/l2trace.json

secrets:
  l2trace_secrets:
    file: ./secrets/l2trace.json   # host path is illustrative

# Register:
docker compose run --rm reconciler l2trace device add \
  --hostname sw-edge-7 \
  --mgmt-ip 10.0.0.7 \
  --source ssh \
  --vendor cisco-ios-xe \
  --username-uri env://L2TRACE_SW_EDGE_7_USER \
  --password-uri 'file:///run/secrets/l2trace.json#/sw-edge-7/password' \
  --enable-secret-uri 'file:///run/secrets/l2trace.json#/sw-edge-7/enable'

One file holds every credential; the per-device URI fragments select which keys belong to which switch. Rotation is one file replace.

Migrating existing cleartext rows

You don’t have to. The auto-wrap path keeps legacy rows working indefinitely with L2TRACE_ALLOW_LITERAL_SECRETS=1 (default).

When you’re ready to enforce URI-only secrets:

1. Migrate every device’s auth to URI form. For each device:

docker compose run --rm reconciler l2trace device add \
  --hostname sw-edge-7 \
  --mgmt-ip 10.0.0.7 \
  --source ssh \
  --vendor cisco-ios-xe \
  --username-uri env://L2TRACE_SW_EDGE_7_USER \
  --password-uri env://L2TRACE_SW_EDGE_7_PW

device add is idempotent: it replaces the auth row in place.

2. Flip the gate in the reconciler’s environment:
```
L2TRACE_ALLOW_LITERAL_SECRETS=0
```
3. Restart the reconciler. Any device still on cleartext now fails resolution with a clear error in last_error. Migrate that device’s auth and re-run device add.
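
Before flipping the gate, it helps to know which devices would start failing. The sketch below applies the same rule the resolver uses: a value without :// is auto-wrapped as literal. It assumes every auth value is a string and that you load (hostname, auth) pairs from device_collector yourself. The function names and the username field in the demo are hypothetical.

```python
from collections.abc import Iterable, Mapping


def cleartext_fields(auth: Mapping[str, str]) -> list[str]:
    """Auth fields that would hit the literal gate: bare strings or literal:// URIs."""
    return sorted(
        field
        for field, value in auth.items()
        if "://" not in value or value.startswith("literal://")
    )


def devices_blocking_gate(
    rows: Iterable[tuple[str, Mapping[str, str]]],
) -> dict[str, list[str]]:
    """Map hostname -> cleartext fields, for every device that still has any."""
    report: dict[str, list[str]] = {}
    for hostname, auth in rows:
        fields = cleartext_fields(auth)
        if fields:
            report[hostname] = fields
    return report


if __name__ == "__main__":
    rows = [
        ("sw-edge-7", {"username": "env://L2TRACE_SW_EDGE_7_USER",
                       "password": "env://L2TRACE_SW_EDGE_7_PW"}),
        ("sw-core-1", {"username": "netmon", "password": "literal://core-password"}),
    ]
    print(devices_blocking_gate(rows))  # {'sw-core-1': ['password', 'username']}
```

An empty report means it is safe to set L2TRACE_ALLOW_LITERAL_SECRETS=0, at least as far as cleartext goes. URIs that point at unset variables or missing files still fail, just with a different error.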

Failure modes & operator-facing errors

Symptom	What happened	Fix
`last_error: secrets: environment variable 'X' is not set`	env URI references a var the reconciler can’t see	Set `X` in the reconciler container env + restart
`last_error: secrets: secrets file not found: /run/secrets/l2trace.json`	file URI points at a path that isn’t mounted	Add the secret mount in compose / Kubernetes
`last_error: secrets: JSON pointer segment 'X' not found`	file URI’s pointer doesn’t match the file’s structure	Verify file content with `jq`
`last_error: secrets: literal:// secrets disabled`	`L2TRACE_ALLOW_LITERAL_SECRETS=0` + a cleartext row	Migrate this device’s auth to env:// or file://
`last_error: secrets: No registered backend for scheme 'X'`	Operator typed a URI scheme that isn’t implemented	Use one of `env`, `file`, `literal` (or wait for vaultwarden/vault — see roadmap)

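Per-device failures do not crash the orchestrator. The supervisor records last_error, backs off, and retries. If the operator corrects the secrets file in place, the next cycle resolves cleanly without restarting anything. An env:// fix still needs a reconciler restart (see the table above), because a running process never sees environment changes made outside it.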
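
A sketch of that record, back off, retry loop for a single device. `run_with_backoff` is a hypothetical name. The doubling-with-cap schedule, the 1 s base and the 300 s cap are assumptions, not l2trace’s actual values. `record_error` stands in for writing last_error. Since `attempt` re-runs secret resolution on every try, a secrets file corrected in place is picked up on the next cycle.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def run_with_backoff(
    attempt: Callable[[], Awaitable[T]],
    record_error: Callable[[str], None],
    *,
    base: float = 1.0,
    cap: float = 300.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Retry one device's cycle until it succeeds, recording each failure.

    Failures are recorded and followed by an exponentially growing pause
    (base, 2*base, 4*base, ... capped at `cap`). Ordinary exceptions never
    propagate, so one device's bad secret cannot take down the orchestrator;
    asyncio.CancelledError is a BaseException and still stops the loop.
    """
    failures = 0
    while True:
        try:
            return await attempt()
        except Exception as exc:  # per-device isolation: record, back off, retry
            record_error(str(exc))
            failures += 1
            await sleep(min(cap, base * 2 ** (failures - 1)))
```

The `sleep` parameter exists so the schedule can be tested without real waiting. In production you would leave it at `asyncio.sleep`.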

Adding a new backend (for developers)

The plugin surface is one file in src/l2trace/secrets/. Skeleton:

from l2trace.secrets.base import SecretsResolutionError, register_backend

class MyBackBackend:
    async def get(self, opaque: str) -> str:
        # opaque is everything after "myback://"
        # raise SecretsResolutionError on any failure
        return "fetched value"

register_backend("myback", MyBackBackend())

Then add an import to src/l2trace/secrets/__init__.py; the import’s side effect registers the backend when the package is loaded. One backend, ~10 lines of boilerplate, no changes to the orchestrator or collectors.