Event envelope
Every collector — gNMI streaming, SNMP polling, SSH show parsing —
emits the same shape of event onto NATS. The reconciler is the only
consumer. The shape is enforced by Pydantic models in
src/l2trace/events/schema.py.
The envelope

    class EventEnvelope(BaseModel, frozen=True):
        event_id: UUID          # UUIDv7, derived deterministically
        kind: EventKind         # MAC_LEARNED, MAC_REMOVED, etc.
        source: Source          # GNMI, SNMP, SSH, NETCONF, RECONCILER
        device_id: int
        payload: EventPayload   # union by kind

        # Four timestamps
        device_observed_at: datetime    # device clock when it saw the event
        observed_at: datetime           # corrected device time (skew-fixed)
        collector_emitted_at: datetime  # collector wall-clock at NATS publish
        ingested_at: datetime           # reconciler wall-clock at consume

Why four timestamps
A single timestamp can’t distinguish:
- “The device’s clock is 5 minutes off.”
- “The event sat in a queue for 5 minutes.”
These have the same symptom (now - timestamp = 5min) but completely
different remediation. The four-timestamp model lets you tell them apart:
| Comparison | What it tells you |
|---|---|
| observed_at − device_observed_at | NTP skew at the device |
| collector_emitted_at − observed_at | Latency from device → collector |
| ingested_at − collector_emitted_at | Queue dwell time in NATS |
| ingested_at − observed_at | Total end-to-end latency |
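
To make the table concrete, here is a minimal sketch that computes the four differences for two events that would look identical under a single timestamp. The Timestamps dataclass and the latency_breakdown helper are hypothetical names introduced for illustration; only the four field names come from the envelope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Timestamps:
    """Hypothetical holder for the envelope's four timestamp fields."""
    device_observed_at: datetime
    observed_at: datetime
    collector_emitted_at: datetime
    ingested_at: datetime


def latency_breakdown(t: Timestamps) -> dict[str, timedelta]:
    """Return the four comparisons from the table, keyed by what they measure."""
    return {
        "device_skew": t.observed_at - t.device_observed_at,
        "device_to_collector": t.collector_emitted_at - t.observed_at,
        "queue_dwell": t.ingested_at - t.collector_emitted_at,
        "end_to_end": t.ingested_at - t.observed_at,
    }


base = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Scenario 1: the device clock runs 5 minutes slow, delivery is fast.
skewed = Timestamps(
    device_observed_at=base - timedelta(minutes=5),
    observed_at=base,
    collector_emitted_at=base + timedelta(seconds=1),
    ingested_at=base + timedelta(seconds=2),
)

# Scenario 2: the device clock is accurate, but the event sat in NATS for 5 minutes.
queued = Timestamps(
    device_observed_at=base,
    observed_at=base,
    collector_emitted_at=base + timedelta(seconds=1),
    ingested_at=base + timedelta(minutes=5, seconds=1),
)

skewed_deltas = latency_breakdown(skewed)
queued_deltas = latency_breakdown(queued)
```

In the first scenario only device_skew is large; in the second only queue_dwell is, even though both events look roughly five minutes old when measured against device_observed_at alone.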
The reconciler uses these to classify quarantine events:
- device-skew: |device_observed_at − collector_wall_clock| > THRESHOLD → the device’s clock is off; corrected observed_at is best-effort
- queue-dwell: ingested_at − collector_emitted_at > THRESHOLD → the event was stale by the time the reconciler saw it; honoring it could rewrite a fact we already corrected
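
The two rules above can be sketched as a small classifier. This is an illustration under assumptions, not the reconciler's code: the function name, the QuarantineReason enum, the two threshold values, and the choice to let one event carry both reasons are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class QuarantineReason(Enum):
    DEVICE_SKEW = "device-skew"
    QUEUE_DWELL = "queue-dwell"


# Hypothetical thresholds; the real values (and whether the two rules
# share one THRESHOLD) are not stated in this page.
SKEW_THRESHOLD = timedelta(seconds=30)
DWELL_THRESHOLD = timedelta(seconds=60)


def classify_quarantine(
    device_observed_at: datetime,
    collector_wall_clock: datetime,
    collector_emitted_at: datetime,
    ingested_at: datetime,
) -> list[QuarantineReason]:
    """Return every quarantine reason that applies (empty list: event is fine)."""
    reasons = []
    # Device clock disagrees with the collector's clock in either direction.
    if abs(device_observed_at - collector_wall_clock) > SKEW_THRESHOLD:
        reasons.append(QuarantineReason.DEVICE_SKEW)
    # Event waited too long between NATS publish and reconciler consume.
    if ingested_at - collector_emitted_at > DWELL_THRESHOLD:
        reasons.append(QuarantineReason.QUEUE_DWELL)
    return reasons
```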
Why event_id is a UUIDv7
UUIDv7 is timestamp-prefixed, so sorting by event_id is approximately
sorting by event time, which is useful for cheap iteration over recent
events.
The event_id is deterministically derived from
(source, device_id, mac, vlan, port_name, device_observed_at) via
SHA-256. Re-emitting the same observation produces the same event_id,
which the reconciler’s ON CONFLICT (event_id) DO NOTHING clause
absorbs harmlessly. JetStream’s at-least-once delivery is therefore
free to redeliver — we just ack the duplicate.
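
One way such a derivation could look is sketched below. The input tuple and the use of SHA-256 come from the description above; everything else is an assumption made for illustration: the function name, the JSON serialization of the inputs, taking the 48-bit timestamp prefix from device_observed_at, and which digest bytes fill the remaining UUIDv7 bits. The authoritative version lives in the codebase.

```python
import hashlib
import json
import uuid
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def derive_event_id(
    source: str,
    device_id: int,
    mac: str,
    vlan: int,
    port_name: str,
    device_observed_at: datetime,  # must be timezone-aware
) -> uuid.UUID:
    """Build a UUIDv7-shaped ID whose non-timestamp bits come from SHA-256."""
    observed_utc = device_observed_at.astimezone(timezone.utc)
    # Assumed canonical form: a JSON list keeps field boundaries unambiguous.
    key = json.dumps([source, device_id, mac, vlan, port_name, observed_utc.isoformat()])
    digest = hashlib.sha256(key.encode("utf-8")).digest()

    # UUIDv7 layout: 48-bit ms timestamp | 4-bit version | 12 bits | 2-bit variant | 62 bits
    ts_ms = ((observed_utc - _EPOCH) // timedelta(milliseconds=1)) & ((1 << 48) - 1)
    rand_a = int.from_bytes(digest[0:2], "big") & 0x0FFF
    rand_b = int.from_bytes(digest[2:10], "big") & ((1 << 62) - 1)
    value = (ts_ms << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
    return uuid.UUID(int=value)


t0 = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
first = derive_event_id("GNMI", 42, "00:1a:a1:11:22:33", 10, "Ethernet1/1", t0)
again = derive_event_id("GNMI", 42, "00:1a:a1:11:22:33", 10, "Ethernet1/1", t0)
later = derive_event_id("GNMI", 42, "00:1a:a1:11:22:33", 10, "Ethernet1/1", t0 + timedelta(minutes=5))
```

Here first and again are equal, so a redelivered observation collides on event_id, while later sorts after first because the timestamp occupies the most significant bits.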
A consequence: if a collector “corrects” device_observed_at after
the fact (NTP catches up, the device clock jumps), the corrected event
gets a different event_id and writes a new row. That’s intentional —
the original event with its broken timestamp is preserved as part of the
bitemporal record.
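
The combined effect of redelivery and timestamp correction can be sketched with SQLite standing in for the real store. The table name, its columns, and the literal event_id strings are hypothetical; only the ON CONFLICT (event_id) DO NOTHING clause is taken from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event ("
    "  event_id TEXT PRIMARY KEY,"
    "  device_observed_at TEXT NOT NULL"
    ")"
)


def record(conn: sqlite3.Connection, event_id: str, device_observed_at: str) -> None:
    """Insert an event; a repeated event_id is silently absorbed."""
    conn.execute(
        "INSERT INTO event (event_id, device_observed_at) VALUES (?, ?) "
        "ON CONFLICT (event_id) DO NOTHING",
        (event_id, device_observed_at),
    )


def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]


# Same observation delivered twice (JetStream redelivery): still one row.
record(conn, "id-for-original-timestamp", "2024-01-01T12:00:00+00:00")
record(conn, "id-for-original-timestamp", "2024-01-01T12:00:00+00:00")
after_redelivery = row_count(conn)

# Corrected device_observed_at yields a different event_id: a second row,
# and the original row is left untouched.
record(conn, "id-for-corrected-timestamp", "2024-01-01T12:05:00+00:00")
after_correction = row_count(conn)
```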
Payload variants
payload is a Pydantic discriminated union on kind. The common variants:

    class MacLearned(BaseModel, frozen=True):
        mac: str            # normalized
        vlan: int
        port_name: str      # vendor name like "Ethernet1/1"
        entry_type: MacType = MacType.DYNAMIC  # or STATIC, SECURE

    class MacRemoved(BaseModel, frozen=True):
        mac: str
        vlan: int
        port_name: str

    class PortStateChanged(BaseModel, frozen=True):
        port_name: str
        state: str          # "up", "down", "admin_down", ...

    class LldpNeighborUpdate(BaseModel, frozen=True):
        local_port_name: str
        remote_chassis_id: str
        remote_port_descr: str | None
        protocol: AdjProto = AdjProto.LLDP

    class DeviceIdentified(BaseModel, frozen=True):
        chassis_id: str     # canonical MAC like '00:1a:a1:11:22:33'

(See src/l2trace/events/schema.py for the authoritative list, plus
the snapshot variants CamSnapshot / LldpSnapshot / StpSnapshot
that bundle many entries into one envelope for SNMP poll cycles.)
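
The idea of selecting the payload class from kind can be illustrated without Pydantic. The sketch below is a standard-library stand-in, not the project's schema: it uses frozen dataclasses and a lookup table instead of a Pydantic discriminated union, performs no field validation, and the parse_payload helper, the PAYLOAD_BY_KIND table, and the PORT_STATE_CHANGED kind name are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class MacLearned:
    mac: str
    vlan: int
    port_name: str
    entry_type: str = "DYNAMIC"  # plain string here; the real model uses MacType


@dataclass(frozen=True)
class MacRemoved:
    mac: str
    vlan: int
    port_name: str


@dataclass(frozen=True)
class PortStateChanged:
    port_name: str
    state: str


Payload = Union[MacLearned, MacRemoved, PortStateChanged]

# kind -> payload class; this lookup plays the role of the discriminator.
PAYLOAD_BY_KIND: dict[str, type] = {
    "MAC_LEARNED": MacLearned,
    "MAC_REMOVED": MacRemoved,
    "PORT_STATE_CHANGED": PortStateChanged,  # assumed kind name
}


def parse_payload(kind: str, raw: dict[str, Any]) -> Payload:
    """Pick the payload class from kind and build it from the raw fields."""
    try:
        payload_cls = PAYLOAD_BY_KIND[kind]
    except KeyError:
        raise ValueError(f"unknown event kind: {kind!r}") from None
    # Raises TypeError if raw has missing or unexpected fields.
    return payload_cls(**raw)


learned = parse_payload(
    "MAC_LEARNED",
    {"mac": "00:1a:a1:11:22:33", "vlan": 10, "port_name": "Ethernet1/1"},
)
```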
DeviceIdentified — the smallest payload, biggest unlock
DeviceIdentified carries a single field (chassis_id) but it
unblocks every pending peer-resolution UPDATE in the adjacency
table. The gNMI and SNMP collectors both emit it:
- gNMI reads /lldp/state/chassis-id from the OpenConfig subscription and emits on every refresh.
- SNMP walks lldpLocChassisId once per poll cycle and emits before the LLDP neighbor walk events, so same-cycle eager peer resolution can use the just-known chassis_id.
The reconciler turns each event into two UPDATEs:
1. device.chassis_id = :chassis_id WHERE id = :device_id (always).
2. adjacency.remote_device_id = :device_id WHERE remote_chassis_id = :chassis_id (backfills every pending row).
Idempotent: re-emitting the same chassis_id is a no-op UPDATE. See How peer resolution works for why this two-phase resolve-and-backfill pattern beats waiting for both ends to be registered.
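
The two UPDATEs and their idempotence can be tried out against an in-memory SQLite database. This is a sketch under assumptions: SQLite stands in for the real store, the apply_device_identified helper and the local_device_id column are hypothetical, and the table and column names are the ones quoted in the two statements above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE device (id INTEGER PRIMARY KEY, chassis_id TEXT);
    CREATE TABLE adjacency (
        id INTEGER PRIMARY KEY,
        local_device_id INTEGER NOT NULL,   -- hypothetical column
        remote_chassis_id TEXT NOT NULL,
        remote_device_id INTEGER            -- NULL while the peer is unresolved
    );
    INSERT INTO device (id) VALUES (42), (99);
    -- Device 99 already reported two neighbors by chassis ID only.
    INSERT INTO adjacency (local_device_id, remote_chassis_id) VALUES
        (99, '00:1a:a1:11:22:33'),
        (99, '00:1a:a1:44:55:66');
    """
)


def apply_device_identified(conn: sqlite3.Connection, device_id: int, chassis_id: str) -> None:
    """Apply one DeviceIdentified event as the two UPDATEs described above."""
    params = {"device_id": device_id, "chassis_id": chassis_id}
    # 1. Record the device's own chassis ID.
    conn.execute("UPDATE device SET chassis_id = :chassis_id WHERE id = :device_id", params)
    # 2. Backfill every adjacency row that was waiting on this chassis ID.
    conn.execute(
        "UPDATE adjacency SET remote_device_id = :device_id "
        "WHERE remote_chassis_id = :chassis_id",
        params,
    )


apply_device_identified(conn, 42, "00:1a:a1:11:22:33")
apply_device_identified(conn, 42, "00:1a:a1:11:22:33")  # re-emit: nothing changes

resolved = conn.execute(
    "SELECT remote_chassis_id, remote_device_id FROM adjacency ORDER BY id"
).fetchall()
```

After both calls, the row for 00:1a:a1:11:22:33 points at device 42 and the row for the still-unknown chassis stays NULL.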
Subject routing
Events publish on subjects matching {NATS_SUBJECT_PREFIX}.{source}.{device_id}.{kind},
e.g.:

    l2trace.gnmi.42.mac_learned
    l2trace.snmp.99.mac_removed
    l2trace.reconciler.42.quarantine

The reconciler consumes l2trace.> with a durable JetStream consumer.
Quarantine events go on a separate subject (*.quarantine) so the OPS
screen can tail them independently.
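
The subject layout and the wildcard filters can be checked with a small pure-Python sketch. Both helper names are hypothetical, and lowercasing the enum names to get the subject tokens is an assumption based on the examples. The matcher implements standard NATS wildcard rules, where * matches exactly one token and > matches one or more trailing tokens, so under the four-token layout a full filter for quarantine events would be spelled l2trace.*.*.quarantine; the *.quarantine above reads as shorthand for that.

```python
def subject_for(prefix: str, source: str, device_id: int, kind: str) -> str:
    """Build {prefix}.{source}.{device_id}.{kind} with lowercase tokens."""
    return f"{prefix}.{source.lower()}.{device_id}.{kind.lower()}"


def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if subject matches a NATS-style pattern with * and >."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and needs at least one token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    # Without a trailing '>', token counts must agree exactly.
    return len(p_tokens) == len(s_tokens)


learned_subject = subject_for("l2trace", "GNMI", 42, "MAC_LEARNED")
quarantine_subject = subject_for("l2trace", "RECONCILER", 42, "QUARANTINE")
```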
See also
- The reconciler’s late-arrival classification
- Why bitemporal? — what the timestamps ultimately power