Register a switch
```shell
make device-add HOSTNAME=sw-edge-7 IP=10.0.0.7 SOURCE=snmp COMMUNITY=your-read-community
```

That's it. Within 30 seconds the CollectorOrchestrator (running
alongside the reconciler in the reconciler container) sees the new
row in device_collector, spawns an SnmpCollector for it, and events
start flowing.
What just happened
device add does two things in one transaction:
- UPSERTs a row in device (the topology entity)
- UPSERTs a row in device_collector (the per-source collection config)
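The sketch below shows what "one transaction" means here. It is a minimal illustration of what device add could do, using Python's built-in sqlite3 as a stand-in database. Several parts are assumptions, because this page does not describe the real schema or database engine:
- the column names (mgmt_ip, vendor, config, enabled);
- the SQLite UPSERT syntax;
- the device_add helper name.

```python
import sqlite3

# Hypothetical, minimal schema; the real device/device_collector columns
# and the production database engine are not documented on this page.
SCHEMA = """
CREATE TABLE IF NOT EXISTS device (
    hostname TEXT PRIMARY KEY,
    mgmt_ip  TEXT NOT NULL,
    vendor   TEXT
);
CREATE TABLE IF NOT EXISTS device_collector (
    hostname TEXT NOT NULL REFERENCES device(hostname),
    source   TEXT NOT NULL,
    config   TEXT NOT NULL DEFAULT '{}',
    enabled  INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (hostname, source)
);
"""


def device_add(conn, hostname, mgmt_ip, source, config="{}", vendor=None):
    """UPSERT the topology row and the per-source row atomically (sketch)."""
    with conn:  # one transaction: commit on success, roll back on any error
        conn.execute(
            "INSERT INTO device (hostname, mgmt_ip, vendor) VALUES (?, ?, ?) "
            "ON CONFLICT(hostname) DO UPDATE SET mgmt_ip = excluded.mgmt_ip, "
            "vendor = COALESCE(excluded.vendor, device.vendor)",
            (hostname, mgmt_ip, vendor),
        )
        conn.execute(
            "INSERT INTO device_collector (hostname, source, config) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT(hostname, source) DO UPDATE SET config = excluded.config",
            (hostname, source, config),
        )
```

If the second statement fails, the `with conn:` block rolls back the first one as well. You therefore never end up with a device row that has no collector config.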
The orchestrator polls device_collector WHERE enabled = TRUE every
30 seconds. New rows → new supervised collector tasks. Disabled rows →
cancelled tasks. Each supervisor catches exceptions and retries with
exponential backoff (1s, 2s, 4s, …, capped at 60s).
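A reconfig pass boils down to a set difference between the enabled rows and the tasks already running. The sketch below makes two assumptions: the orchestrator is asyncio-based, and each collector is keyed by a hypothetical (hostname, source) tuple. The names plan and reconfig_pass are illustrative, not the functions in orchestrator.py.

```python
import asyncio
from typing import Any, Callable, Coroutine, Dict, Set, Tuple

# Hypothetical key for one collector, mirroring per-source device_collector rows.
Key = Tuple[str, str]  # (hostname, source)
SupervisorFactory = Callable[[Key], Coroutine[Any, Any, None]]


def plan(enabled: Set[Key], running: Set[Key]) -> Tuple[Set[Key], Set[Key]]:
    """Return (to_spawn, to_cancel) by diffing desired vs running collectors."""
    return enabled - running, running - enabled


def reconfig_pass(enabled: Set[Key], tasks: Dict[Key, asyncio.Task],
                  make_supervisor: SupervisorFactory) -> None:
    """Apply one pass; must be called from inside a running event loop."""
    to_spawn, to_cancel = plan(enabled, set(tasks))
    for key in to_cancel:  # row disabled or removed: cancel its task
        tasks.pop(key).cancel()
    for key in to_spawn:  # newly enabled row: start a supervised task
        tasks[key] = asyncio.create_task(make_supervisor(key))
    # Keys in both sets are untouched, so a healthy collector keeps running.
    # A supervisor that gave up leaves a finished task under its key, so this
    # pass does not respawn it on its own.
```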
After 10 consecutive failures, the supervisor gives up: last_error
is populated and l2trace device list shows the collector in red.
Re-enabling it (disable, then enable) restarts the supervisor with a
fresh failure count.
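A supervisor with this behavior might look like the following sketch. The 1s base delay, 60s cap, and 10-failure budget come from the text above. Everything else is hypothetical:
- the run_cycle callable;
- the SupervisorState record;
- poll_interval_s, the pause after a healthy cycle.

A fresh SupervisorState on respawn is what gives a re-enabled collector its fresh failure count.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

# Numbers from the text; poll_interval_s below is a hypothetical placeholder.
BASE_DELAY_S = 1.0
MAX_DELAY_S = 60.0
FAILURE_BUDGET = 10


def backoff_delay(consecutive_failures: int) -> float:
    """Delay after the n-th consecutive failure (n >= 1): 1, 2, 4, ... capped at 60."""
    return min(BASE_DELAY_S * 2 ** (consecutive_failures - 1), MAX_DELAY_S)


@dataclass
class SupervisorState:
    consecutive_failures: int = 0
    last_error: Optional[str] = None
    gave_up: bool = False


async def supervise(
    run_cycle: Callable[[], Awaitable[None]],
    state: SupervisorState,
    poll_interval_s: float = 30.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> SupervisorState:
    """Run collection cycles until cancelled or the failure budget is spent."""
    while True:
        try:
            await run_cycle()
        except asyncio.CancelledError:
            raise  # disable: the orchestrator cancelled this task
        except Exception as exc:
            state.consecutive_failures += 1
            state.last_error = f"{type(exc).__name__}: {exc}"
            if state.consecutive_failures >= FAILURE_BUDGET:
                state.gave_up = True
                return state  # exits for good; only a respawn restarts it
            await sleep(backoff_delay(state.consecutive_failures))
        else:
            # A successful cycle resets the budget and clears the error.
            state.consecutive_failures = 0
            state.last_error = None
            await sleep(poll_interval_s)
```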
See what’s running
```shell
make device-list
```

The output is a Rich table:
| hostname | mgmt_ip | vendor | sources | last_polled | error |
|---|---|---|---|---|---|
| sw-edge-7 | 10.0.0.7 | cisco | snmp | 2026-05-11T07:42:00+00:00 | |
| sw-edge-8 | 10.0.0.8 | cisco | snmp | - | snmp:timeout after 5s |
| sw-core-1 | 10.0.0.1 | arista | gnmi, snmp | - | |
A blank error column means the collector succeeded on its most recent
cycle. A populated error column means the collector hit a failure (auth,
timeout, vendor weirdness) and is retrying with backoff, unless it has
used up its failure budget; see Resuming after a failure below.
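The sketch below shows one way to read the error column. A populated cell alone does not tell you whether the supervisor is still retrying or has exhausted its budget; collector_state makes that distinction explicit. Two parts are assumptions, not the CLI's actual rendering code:
- the source:message format, inferred from the sample row above;
- the comma separator for multiple sources.

collector_state also assumes a consecutive-failure count that device list does not display.

```python
from typing import Dict, Optional

FAILURE_BUDGET = 10  # consecutive failures before a supervisor gives up (from the text)


def error_cell(last_errors: Dict[str, Optional[str]]) -> str:
    """Render the error column as 'source:message' for each failing source."""
    return ", ".join(
        f"{source}:{err}" for source, err in sorted(last_errors.items()) if err
    )


def collector_state(last_error: Optional[str], consecutive_failures: int) -> str:
    """Classify one collector from its last_error and failure count."""
    if not last_error:
        return "healthy"   # most recent cycle succeeded
    if consecutive_failures < FAILURE_BUDGET:
        return "retrying"  # supervisor is backing off and will try again
    return "stopped"       # budget spent; disable + enable to resume
```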
Two sources on one device
Register a device with both gNMI and SNMP; this is the recommended production deployment:

```shell
# Primary: gNMI streaming
docker compose run --rm reconciler l2trace device add \
  --hostname sw-edge-7 --mgmt-ip 10.0.0.7 --source gnmi --vendor cisco

# Backstop: SNMP polling
docker compose run --rm reconciler l2trace device add \
  --hostname sw-edge-7 --mgmt-ip 10.0.0.7 --source snmp --community public
```

Both collectors run independently. Because the reconciler's source-priority order is gNMI > SNMP, SNMP-sourced rows take a back seat when gNMI is also reporting the same MAC. Cross-source disagreements are still surfaced in the OPS disagreements pane.
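The source-priority rule can be sketched as a function over one MAC's observations. The sketch assumes each source reports the interface on which a MAC was learned. The function name, the interface names in the tests, and the rule that any differing interface counts as a disagreement are assumptions about how the reconciler compares sources. An unknown source raises KeyError here.

```python
from typing import Dict, List, Tuple

# Lower number wins; order taken from the text (gNMI > SNMP).
SOURCE_PRIORITY = {"gnmi": 0, "snmp": 1}


def resolve_mac(observations: Dict[str, str]) -> Tuple[str, str, List[str]]:
    """Pick the winning source for one MAC and list the sources that disagree.

    observations maps source name -> interface the MAC was learned on.
    Returns (winning_source, interface, disagreeing_sources).
    """
    winner = min(observations, key=lambda source: SOURCE_PRIORITY[source])
    interface = observations[winner]
    disagreeing = sorted(s for s, iface in observations.items() if iface != interface)
    return winner, interface, disagreeing
```

The winner decides what the reconciler records; the disagreeing list is what a disagreements view would have to show.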
Stop collecting without losing history
```shell
docker compose run --rm reconciler l2trace device disable \
  --hostname sw-edge-7 --source snmp
```

The orchestrator cancels the SnmpCollector on its next reconfig pass
(within 30s). The configuration row stays, so device enable resumes
collection with the same auth.
device remove --hostname sw-edge-7 --yes is the destructive option:
it DELETEs the device and cascade-deletes its bitemporal history. Most
operators want disable, not remove.
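The difference between disable and remove is sketched below with sqlite3 and a hypothetical schema. The observation_history table stands in for the bitemporal history, whose real layout is not described here. ON DELETE CASCADE models the cascade-delete behavior.

```python
import sqlite3

# Hypothetical schema; table names other than device/device_collector are invented.
SCHEMA = """
CREATE TABLE device (hostname TEXT PRIMARY KEY);
CREATE TABLE device_collector (
    hostname TEXT NOT NULL REFERENCES device(hostname) ON DELETE CASCADE,
    source   TEXT NOT NULL,
    config   TEXT NOT NULL,
    enabled  INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (hostname, source)
);
CREATE TABLE observation_history (
    hostname   TEXT NOT NULL REFERENCES device(hostname) ON DELETE CASCADE,
    mac        TEXT NOT NULL,
    valid_from TEXT NOT NULL
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def set_enabled(conn, hostname, source, enabled):
    """disable/enable: flip the flag; config (auth) and history are kept."""
    with conn:
        conn.execute(
            "UPDATE device_collector SET enabled = ? WHERE hostname = ? AND source = ?",
            (1 if enabled else 0, hostname, source),
        )


def device_remove(conn, hostname):
    """Destructive: deleting the device cascades to its collectors and history."""
    with conn:
        conn.execute("DELETE FROM device WHERE hostname = ?", (hostname,))
```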
Resuming after a failure
If a collector hit the failure-budget cap (10 consecutive failures),
its supervisor has exited. device list still shows the error, but the
supervisor will not restart itself. To resume:
```shell
docker compose run --rm reconciler l2trace device disable \
  --hostname sw-edge-7 --source snmp
docker compose run --rm reconciler l2trace device enable \
  --hostname sw-edge-7 --source snmp
```

This forces an unspawn and respawn on the next orchestrator pass.
Behavior on add when the reconciler is running
Adding a new device while make up is running is safe. The
orchestrator's reconfig poll picks it up within 30 seconds, so there
is no need to restart the container.
If you want it picked up faster, the simplest path is:

```shell
docker compose restart reconciler
```

This forces an immediate reconfig pass. It also restarts the reconciler
and every collector running in that container.
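Why a restart helps, under one assumption: the orchestrator runs a reconfig pass as soon as it starts, and only then begins sleeping. The sketch below is a hypothetical loop with that shape; orchestrate, load_enabled, and apply_pass are illustrative names. Without a restart, a newly added row waits roughly one 30-second interval at most.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Set, Tuple

Key = Tuple[str, str]  # hypothetical (hostname, source) collector key
RECONFIG_INTERVAL_S = 30.0  # poll period stated in the text


async def orchestrate(
    load_enabled: Callable[[], Awaitable[Set[Key]]],
    apply_pass: Callable[[Set[Key]], None],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    max_passes: Optional[int] = None,  # only so the sketch can be tested
) -> int:
    """Run one reconfig pass immediately, then one every RECONFIG_INTERVAL_S."""
    passes = 0
    while max_passes is None or passes < max_passes:
        enabled = await load_enabled()  # e.g. rows WHERE enabled = TRUE
        apply_pass(enabled)             # spawn new / cancel disabled collectors
        passes += 1
        await sleep(RECONFIG_INTERVAL_S)
    return passes
```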
See also
- SNMP collector reference: what config the SnmpCollector accepts (auth + extras)
- Architecture at a glance: where the orchestrator sits in the data-flow diagram
- The orchestrator source: src/l2trace/reconciler/orchestrator.py