Collect from an SSH-only device

When you need this

“Lab gear: SNMP is administratively disabled and no one’s getting it re-enabled this quarter. We can ssh in fine.”
“This is a remote-site IOS-XE switch where gNMI isn’t compiled into the train we’re stuck on, and the SNMP read-community got rotated with no record.”
“We need cross-source telemetry asymmetry detection (the audit) but only have SSH access to half the fabric.”

SSH is the last-resort backstop. It’s the slowest of the three collectors (a full SSH login plus five napalm calls per poll, each running one or more show commands) and the most fragile (it scrapes CLI output instead of consuming structured data). Its DeviceIdentified also depends on a per-vendor CLI probe, so for drivers without one, peer resolution still needs gNMI or SNMP somewhere. Use it when you can’t use anything else.

How the collector works

The collector wraps napalm (Network Automation and Programmability Abstraction Layer with Multivendor support). napalm exposes a vendor-normalized API: getters such as get_mac_address_table() and get_lldp_neighbors_detail() have the same signatures and return the same-shaped data whether you point them at IOS, NX-OS, EOS, or JunOS.

That normalization is why the SSH collector is built on napalm rather than raw scrapli/netmiko: without it, we’d be writing per-vendor regexes against show output for every supported platform. With it, the SSH collector is ~250 lines and can, in principle, drive any platform napalm has a driver for (see Supported vendors for what is supported today).

Per poll cycle (default every 60 s):

| napalm call | What it returns | What l2trace emits |
|---|---|---|
| `cli([VENDOR_PROBE])` | Raw `show` output containing the chassis MAC | `DeviceIdentified` (emitted first) |
| `get_facts()` | Hostname, vendor, model, OS version | Nothing (logged only) |
| `get_mac_address_table()` | List of `{mac, interface, vlan, static, active, …}` | One `MacLearned` per active row |
| `get_lldp_neighbors_detail()` | `{iface: [{remote_chassis_id, remote_port, …}]}` | One `LldpNeighborUpdate` per neighbor |
| `get_interfaces()` | Per-port `{mac_address, is_up, …}` | Nothing yet (reserved) |
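The right-hand column of the table boils down to two small conversions. The sketch below shows them as plain functions over napalm-shaped data; the MacLearned and LldpNeighborUpdate dataclasses here are simplified, hypothetical stand-ins for the real event types in l2trace, and the choice of fields is an assumption.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical, trimmed-down event shapes; the real event types in
# l2trace.events.schema may carry more fields.
@dataclass(frozen=True)
class MacLearned:
    mac: str
    interface: str
    vlan: int


@dataclass(frozen=True)
class LldpNeighborUpdate:
    local_interface: str
    remote_chassis_id: str
    remote_port: str


def mac_events(rows: list[dict[str, Any]]) -> list[MacLearned]:
    """One MacLearned per *active* row of get_mac_address_table()."""
    return [
        MacLearned(mac=r["mac"], interface=r["interface"], vlan=r["vlan"])
        for r in rows
        if r.get("active")
    ]


def lldp_events(detail: dict[str, list[dict[str, Any]]]) -> list[LldpNeighborUpdate]:
    """One LldpNeighborUpdate per neighbor in get_lldp_neighbors_detail()."""
    return [
        LldpNeighborUpdate(local_interface=iface,
                           remote_chassis_id=n["remote_chassis_id"],
                           remote_port=n["remote_port"])
        for iface, neighbors in detail.items()
        for n in neighbors
    ]


fdb = [
    {"mac": "00:1A:A1:11:22:33", "interface": "Gi1/0/1", "vlan": 10,
     "static": False, "active": True},
    {"mac": "00:1A:A1:44:55:66", "interface": "Gi1/0/2", "vlan": 10,
     "static": False, "active": False},
]
lldp = {"Gi1/0/48": [{"remote_chassis_id": "aa:bb:cc:dd:ee:ff",
                      "remote_port": "Eth1/1"}]}
print(mac_events(fdb))   # only the active row survives
print(lldp_events(lldp))
```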

All five calls share one SSH session per poll cycle: the whole session runs inside a single asyncio.to_thread() call, so each cycle costs one SSH login rather than five. DeviceIdentified is emitted first so the reconciler can update device.chassis_id before it processes same-cycle LLDP events that may reference it; this is the same in-cycle eager-resolution pattern the SNMP collector uses.
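A minimal sketch of the one-session poll cycle, assuming a napalm-style driver class (constructor plus with-block) and an async emit callback. collect_blocking, poll_once, FakeDriver, and the tuple-shaped events are hypothetical names for illustration, not the collector’s actual code, and parsing the chassis MAC out of the probe output is left out.

```python
import asyncio
from typing import Any, Awaitable, Callable

Emit = Callable[[tuple], Awaitable[None]]


def collect_blocking(driver_cls: Any, host: str, username: str,
                     password: str, probe_cmd: str) -> dict[str, Any]:
    """Runs in a worker thread: one SSH login, five calls, one logout."""
    # napalm drivers are context managers: __enter__ opens the session and
    # __exit__ closes it, so every call inside the block reuses one login.
    with driver_cls(host, username, password) as dev:
        return {
            "probe": dev.cli([probe_cmd])[probe_cmd],
            "facts": dev.get_facts(),             # logged only
            "fdb": dev.get_mac_address_table(),
            "lldp": dev.get_lldp_neighbors_detail(),
            "interfaces": dev.get_interfaces(),   # reserved
        }


async def poll_once(driver_cls: Any, host: str, username: str,
                    password: str, emit: Emit) -> None:
    # napalm is blocking, so the whole session runs in one worker thread.
    # "show version" is the ios probe; real code would look it up per driver.
    data = await asyncio.to_thread(
        collect_blocking, driver_cls, host, username, password, "show version")
    # Identity first, so same-cycle LLDP events can resolve against it.
    await emit(("DeviceIdentified", data["probe"]))
    for row in data["fdb"]:
        if row["active"]:
            await emit(("MacLearned", row["mac"], row["interface"], row["vlan"]))
    for iface, neighbors in data["lldp"].items():
        for n in neighbors:
            await emit(("LldpNeighborUpdate", iface, n["remote_chassis_id"]))


class FakeDriver:
    """Stub with the napalm driver contract; counts logins."""
    logins = 0

    def __init__(self, hostname, username, password, **kwargs): pass

    def __enter__(self):
        FakeDriver.logins += 1
        return self

    def __exit__(self, *exc): return False

    def cli(self, commands):
        return {c: "Base ethernet MAC Address : 8C:60:4F:69:E9:6C" for c in commands}

    def get_facts(self): return {"hostname": "sw-edge"}

    def get_mac_address_table(self):
        return [{"mac": "00:1A:A1:11:22:33", "interface": "Gi1/0/1",
                 "vlan": 10, "static": False, "active": True}]

    def get_lldp_neighbors_detail(self):
        return {"Gi1/0/48": [{"remote_chassis_id": "aa:bb:cc:dd:ee:ff",
                              "remote_port": "Eth1/1"}]}

    def get_interfaces(self): return {}


events: list[tuple] = []


async def collect(ev: tuple) -> None:
    events.append(ev)


asyncio.run(poll_once(FakeDriver, "192.0.2.10", "admin", "secret", collect))
print([e[0] for e in events], FakeDriver.logins)
# ['DeviceIdentified', 'MacLearned', 'LldpNeighborUpdate'] 1
```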

Supported vendors

Today: Cisco IOS-XE (cisco-ios-xe), Classic IOS (cisco-ios), and Cisco NX-OS via SSH (cisco-nxos). The --vendor flag maps to a napalm driver name internally via the SUPPORTED_DRIVERS map in l2trace/collectors/ssh.py.

For other napalm-supported vendors (Arista EOS, Juniper JunOS), you can set device_collector.extras['driver_name'] directly to the napalm driver string (eos, junos). The vendor-key lookup table is the supported path, though; anything else is “it’ll probably work,” and such drivers have no chassis-ID probe yet (see How `DeviceIdentified` works under napalm).
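A sketch of the driver lookup, assuming SUPPORTED_DRIVERS maps each --vendor key to the napalm driver names used elsewhere on this page (ios for both IOS flavors, nxos_ssh for NX-OS). resolve_driver is a hypothetical helper, and letting extras['driver_name'] take precedence over the vendor key is an assumption, not documented behavior.

```python
# Assumed contents; the real table is SUPPORTED_DRIVERS in
# l2trace/collectors/ssh.py.
SUPPORTED_DRIVERS: dict[str, str] = {
    "cisco-ios-xe": "ios",
    "cisco-ios": "ios",
    "cisco-nxos": "nxos_ssh",
}


def resolve_driver(extras: dict[str, str]) -> str:
    """Pick the napalm driver name for one device_collector row (sketch)."""
    # Escape hatch: an explicit napalm driver string such as "eos" or "junos".
    # Hypothetical precedence: driver_name wins over the vendor key.
    override = extras.get("driver_name")
    if override:
        return override
    vendor = extras.get("vendor")
    try:
        return SUPPORTED_DRIVERS[vendor]
    except KeyError:
        raise ValueError(
            f"unsupported vendor {vendor!r}; set extras['driver_name'] instead"
        ) from None


print(resolve_driver({"vendor": "cisco-ios-xe"}))                        # ios
print(resolve_driver({"vendor": "cisco-nxos"}))                          # nxos_ssh
print(resolve_driver({"vendor": "cisco-ios-xe", "driver_name": "eos"}))  # eos
```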

Register an SSH-only switch

Production (preferred — credentials never enter the DB)

Put the password in an env var, then reference it from the CLI:

# In .env on the reconciler:
L2TRACE_SW_EDGE_7_USER=netmon
L2TRACE_SW_EDGE_7_PW=your-actual-password
# Only if the device needs an enable secret:
L2TRACE_SW_EDGE_7_ENABLE=your-enable-secret

# Register the device:
docker compose run --rm reconciler l2trace device add \
  --hostname sw-edge-7 \
  --mgmt-ip 10.0.0.7 \
  --source ssh \
  --vendor cisco-ios-xe \
  --username-uri env://L2TRACE_SW_EDGE_7_USER \
  --password-uri env://L2TRACE_SW_EDGE_7_PW \
  --enable-secret-uri env://L2TRACE_SW_EDGE_7_ENABLE   # optional

The DB row stores only the reference, e.g. {"password": "env://L2TRACE_SW_EDGE_7_PW"}; the actual password never lands in pg_dump output, backups, or log lines. See Manage collector credentials for the full URI catalog (env://, file://, etc.).
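The reference is only turned into a secret when the collector needs it. The sketch below is a hypothetical resolver covering just the two schemes named above; the file:// handling (absolute path after the scheme, one trailing newline stripped) is an assumption, and how l2trace treats the plain values that --password stores in lab mode is not covered.

```python
import os
from pathlib import Path


def resolve_secret(uri: str) -> str:
    """Turn a stored credential reference into the secret value (sketch)."""
    if uri.startswith("env://"):
        name = uri[len("env://"):]
        value = os.environ.get(name)
        if value is None:
            raise LookupError(f"environment variable {name} is not set")
        return value
    if uri.startswith("file://"):
        # Assumed form: file:///run/secrets/pw -> path /run/secrets/pw.
        # Strip one trailing newline, which `echo` and most editors add.
        return Path(uri[len("file://"):]).read_text().removesuffix("\n")
    raise ValueError(f"unsupported credential URI scheme: {uri!r}")


# Normally set from .env; hard-coded here only for the demonstration.
os.environ["L2TRACE_SW_EDGE_7_PW"] = "example-only"
print(resolve_secret("env://L2TRACE_SW_EDGE_7_PW"))  # example-only
```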

Key-based auth (preferred when the device supports it)

For switches with SSH key authorization configured, skip the password entirely:

# Mount the private key into the reconciler container, e.g. via Docker
# secrets in docker-compose.yml:
#   secrets:
#     sw_edge_7_key:
#       file: ./keys/sw-edge-7-id_ed25519
#   services:
#     reconciler:
#       secrets:
#         - source: sw_edge_7_key
#           target: ssh/sw-edge-7-key
# (mounted to /run/secrets/ssh/sw-edge-7-key inside the container)

docker compose run --rm reconciler l2trace device add \
  --hostname sw-edge-7 \
  --mgmt-ip 10.0.0.7 \
  --source ssh \
  --vendor cisco-ios-xe \
  --username-uri env://L2TRACE_SW_EDGE_7_USER \
  --ssh-key-file /run/secrets/ssh/sw-edge-7-key

If the private key is encrypted, add a passphrase:

  --ssh-passphrase-uri env://L2TRACE_SW_EDGE_7_KEY_PASSPHRASE

The path lives in device_collector.extras['ssh_key_file']; it’s a filesystem reference, not a secret value. The key material itself stays in the file at that path, whose mode and ownership Docker or Kubernetes manages. device_collector.auth holds only credential references (the username URI, plus the passphrase or password URI if you set one), so pg_dump still contains nothing useful to an attacker.

You can also combine --password-uri and --ssh-key-file: netmiko (via paramiko) tries the key first, then falls back to the password. This is useful for bastions that require both factors, or for mixed-mode fleets during a key rollout.
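One plausible way these settings reach the device, assuming the collector passes them through napalm’s optional_args: napalm’s netmiko-based drivers (ios, nxos_ssh) forward recognized netmiko connection arguments such as use_keys, key_file, passphrase, and secret (the enable secret) to netmiko. build_optional_args is a hypothetical helper; the actual translation in l2trace/collectors/ssh.py may differ.

```python
from typing import Any, Optional


def build_optional_args(key_file: Optional[str] = None,
                        passphrase: Optional[str] = None,
                        enable_secret: Optional[str] = None) -> dict[str, Any]:
    """Map stored auth settings onto netmiko argument names (sketch)."""
    args: dict[str, Any] = {}
    if key_file:
        args["use_keys"] = True      # let paramiko offer the private key
        args["key_file"] = key_file  # e.g. /run/secrets/ssh/sw-edge-7-key
        if passphrase:
            args["passphrase"] = passphrase  # only for an encrypted key
    if enable_secret:
        args["secret"] = enable_secret       # IOS 'enable' password
    return args


args = build_optional_args(key_file="/run/secrets/ssh/sw-edge-7-key")
print(args)
# Handing it to napalm (needs a reachable device, so not executed here):
#   import napalm
#   driver = napalm.get_network_driver("ios")
#   with driver("10.0.0.7", "netmon", "", optional_args=args) as dev:
#       print(dev.get_facts()["hostname"])
```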

Lab / dev (cleartext is fine)

make device-add-ssh \
  HOSTNAME=sw-edge-7 IP=10.0.0.7 \
  USERNAME=netmon PASSWORD='your-password' \
  VENDOR=cisco-ios-xe

Or directly:

docker compose run --rm reconciler l2trace device add \
  --hostname sw-edge-7 \
  --mgmt-ip 10.0.0.7 \
  --source ssh \
  --username netmon \
  --password 'your-password' \
  --vendor cisco-ios-xe \
  --enable-secret 'enable-password'   # optional, for IOS-XE 'enable' mode

The CLI prints a hint suggesting --password-uri for production use.

The orchestrator picks up the new row on its next reconfig pass (~30 s). After that, l2trace device list shows the row, its last-polled timestamp, and any error.

How `DeviceIdentified` works under napalm

napalm’s normalized getters don’t expose the local device’s LLDP chassis_id (get_facts() returns hostname, vendor, and model, but not the chassis MAC). The collector closes this gap with a per-vendor cli([command]) sidecar; cli() is napalm’s escape hatch for raw show commands.

The per-vendor probes live in _CHASSIS_ID_PROBES in l2trace/collectors/ssh.py:

| Driver | Probe command | Source line in output |
|---|---|---|
| `ios` (IOS / IOS-XE) | `show version` | `Base ethernet MAC Address : 8C:60:4F:69:E9:6C` |
| `nxos_ssh` | `show lldp local-info` | `Chassis ID : 8c60.4f69.e96c` |

Each entry is a (command, regex) pair. The regex has one capturing group; whatever it captures is piped through normalize_mac(), so the final form is lowercase, colon-separated hex regardless of how the vendor formats it (IOS-XE colons, NX-OS Cisco dotted notation, Windows-style dashes: all normalize to 8c:60:4f:69:e9:6c).

Adding a new vendor is one map entry: pick the show command that prints the chassis MAC, write a regex with one MAC-shaped capture group, and add the (command, regex) pair to _CHASSIS_ID_PROBES under the napalm driver name. For EOS this might be show lldp local-info (Arista chose the NX-OS naming); for JunOS it’s show lldp local-information.
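A sketch of the probe table and MAC normalization. The regexes are illustrative, written against the sample output lines in the probe table rather than copied from _CHASSIS_ID_PROBES, and the junos entry is commented out because it is untested. CHASSIS_ID_PROBES, normalize_mac, and extract_chassis_id here are stand-ins, not the module’s own code.

```python
import re
from typing import Optional

# Assumed shape: napalm driver name -> (show command, regex with one group).
CHASSIS_ID_PROBES: dict[str, tuple[str, re.Pattern]] = {
    "ios": ("show version",
            re.compile(r"Base ethernet MAC Address\s*:\s*(\S+)", re.IGNORECASE)),
    "nxos_ssh": ("show lldp local-info",
                 re.compile(r"Chassis ID\s*:\s*(\S+)")),
    # Adding a vendor is one more entry, e.g. (hypothetical, untested):
    # "junos": ("show lldp local-information",
    #           re.compile(r"Chassis ID\s*:\s*(\S+)")),
}


def normalize_mac(raw: str) -> str:
    """8C:60:4F:69:E9:6C, 8c60.4f69.e96c, 8C-60-... -> 8c:60:4f:69:e9:6c."""
    hex_digits = re.sub(r"[^0-9a-fA-F]", "", raw).lower()
    if len(hex_digits) != 12:
        raise ValueError(f"not a MAC address: {raw!r}")
    return ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))


def extract_chassis_id(driver: str, output: str) -> Optional[str]:
    """Apply the driver's regex to raw probe output; None if no match."""
    _command, pattern = CHASSIS_ID_PROBES[driver]
    match = pattern.search(output)
    return normalize_mac(match.group(1)) if match else None


print(extract_chassis_id("ios", "Base ethernet MAC Address       : 8C:60:4F:69:E9:6C"))
print(extract_chassis_id("nxos_ssh", "Chassis ID : 8c60.4f69.e96c"))
# both print 8c:60:4f:69:e9:6c
```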

Probe failure is non-fatal. If cli() raises (privileged-mode denial, or a command unsupported on an older train) or the regex doesn’t match (the vendor reformatted the output line), the collector logs a warning and the cycle continues: FDB and LLDP events are still emitted, but DeviceIdentified is skipped this round. Because DeviceIdentified is re-emitted every cycle and the reconciler treats repeats as no-op UPDATEs, a transient probe failure self-heals on the next cycle.

For drivers without a registered probe (e.g. an EOS or JunOS device added via the extras['driver_name'] escape hatch, which has no probe support yet), the probe is silently skipped and device.chassis_id stays NULL for that source. Peer resolution then depends on gNMI or SNMP also being configured for that device. See How peer resolution works for the mechanism.
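The non-fatal behavior can be sketched as a wrapper that returns None instead of raising. probe_chassis_id, PROBES, and the two stub devices are hypothetical; catching Exception broadly mirrors the “log a warning and continue” rule described above, though the real collector may catch narrower exception types.

```python
import logging
import re
from typing import Any, Optional

log = logging.getLogger("chassis_probe_sketch")

# Minimal stand-in for _CHASSIS_ID_PROBES: only the ios entry.
PROBES = {
    "ios": ("show version",
            re.compile(r"Base ethernet MAC Address\s*:\s*(\S+)", re.IGNORECASE)),
}


def probe_chassis_id(device: Any, driver: str) -> Optional[str]:
    """Return the raw chassis MAC, or None. Never raises (sketch)."""
    probe = PROBES.get(driver)
    if probe is None:
        # No registered probe (e.g. eos/junos via driver_name): skip silently.
        return None
    command, pattern = probe
    try:
        output = device.cli([command])[command]
    except Exception as exc:  # privilege denial, unsupported command, ...
        log.warning("chassis-id probe %r failed: %s", command, exc)
        return None
    match = pattern.search(output)
    if match is None:
        # Output format changed: warn, skip DeviceIdentified this cycle.
        log.warning("chassis-id probe %r: no MAC line in output", command)
        return None
    return match.group(1)  # caller would run normalize_mac() on this


class DeniedDevice:
    def cli(self, commands):
        raise PermissionError("% Invalid input detected at '^' marker.")


class HealthyDevice:
    def cli(self, commands):
        return {c: "Base ethernet MAC Address : 8C:60:4F:69:E9:6C" for c in commands}


print(probe_chassis_id(HealthyDevice(), "ios"))  # 8C:60:4F:69:E9:6C
print(probe_chassis_id(DeniedDevice(), "ios"))   # None (plus a warning)
print(probe_chassis_id(HealthyDevice(), "eos"))  # None (silently)
```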

Trying it on lab gear

You probably don’t have a fleet of IOS-XE switches handy for a smoke test. The pattern the project’s own tests use:

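import asyncio

from l2trace.collectors.ssh import SshCollector
from l2trace.collectors.base import CollectorConfig
from l2trace.events.schema import Source

class _Stub:
    """Stands in for a napalm driver class: same constructor + with-block contract."""
    def __init__(self, *args, **kwargs): pass
    def __enter__(self): return self
    def __exit__(self, *exc): pass
    # Answers the chassis-ID probe so DeviceIdentified is emitted as well.
    def cli(self, commands):
        return {c: "Base ethernet MAC Address : 8C:60:4F:69:E9:6C" for c in commands}
    def get_facts(self): return {"hostname": "sw-edge", "fqdn": "sw-edge"}
    def get_mac_address_table(self):
        return [{"mac": "00:1a:a1:11:22:33", "interface": "Eth1",
                 "vlan": 10, "static": False, "active": True}]
    def get_lldp_neighbors_detail(self):
        return {"Eth1": [{"remote_chassis_id": "aa:bb:cc:dd:ee:ff",
                          "remote_port": "Eth8"}]}
    def get_interfaces(self): return {}

cfg = CollectorConfig(
    device_id=1, hostname="sw-edge", mgmt_ip="192.0.2.10",
    source=Source.SSH,
    auth={"username": "admin", "password": "secret"},
    extras={"vendor": "cisco-ios-xe"},
)

async def emit(envelope):
    print(envelope)

async def main():
    # driver_factory receives the napalm driver name and returns a driver class.
    collector = SshCollector(cfg, emit, driver_factory=lambda _name: _Stub)
    await collector._poll_once()  # one poll cycle against the stub

asyncio.run(main())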

The driver_factory injection is the test seam — production uses napalm.get_network_driver(...), but injecting a stub class with the same with-block contract lets you exercise the entire emit path without netmiko, paramiko, or any real switch.