# The compactor invariant
The compactor closes `valid_during` on MACs that have aged out of their
source’s threshold. It runs as a co-process in the reconciler container,
sharing one in-memory LiveSet object. This page explains why that
shared-LiveSet design exists, and why a subtle ordering bug in the
compactor could silently corrupt the bitemporal log.
## The setup

- Reconciler writes events from NATS → Postgres + LiveSet.
- Compactor periodically asks Postgres: “any MACs that haven’t been
  observed in N seconds?” If yes, it closes `valid_during` on those
  rows and DELETEs the matching `liveness` row.
- Both processes share one LiveSet object (passed via the `liveset`
  parameter; co-running in the same event loop).
The LiveSet is the reconciler’s hot-path classification cache. If it gets out of sync with the database, the reconciler silently misclassifies events.
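This page never shows the LiveSet API itself, but the regression test further down calls `upsert(device_id, source, mac, vlan, entry)`, `lookup(...)`, and `remove(...)`. A minimal sketch of that implied shape (every name and field here is an assumption for illustration, not the real implementation):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """Hypothetical cache key: one open observation per device/source/MAC/VLAN."""
    device_id: str
    source: str
    mac: str
    vlan: int


class LiveSet:
    """In-memory map of currently-open observations (hot-path classification cache)."""

    def __init__(self):
        self._entries = {}

    def upsert(self, device_id, source, mac, vlan, entry):
        self._entries[Key(device_id, source, mac, vlan)] = entry

    def lookup(self, device_id, source, mac, vlan):
        # None means "not seen before": the reconciler classifies the
        # next event for this key as first-sight.
        return self._entries.get(Key(device_id, source, mac, vlan))

    def remove(self, device_id, source, mac, vlan):
        self._entries.pop(Key(device_id, source, mac, vlan), None)
```

The page's invariant is entirely about when `remove` is allowed to run relative to the database commit.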
## The original bug

The first version of `compact_once` did this:

```python
async def compact_once(session, settings, liveset=None):
    rs = await session.execute(_AGE_OUT_SQL, {...})
    closed_keys = list(rs.mappings())
    if liveset is not None:
        for k in closed_keys:
            liveset.remove(...)  # ← MUTATES IN-MEMORY STATE
    return stats
```

Then in the loop:

```python
while True:
    async with session_scope() as session:
        await compact_once(session, settings, liveset=liveset)
    # session_scope commits here
    await asyncio.sleep(interval)
```

That looks correct. It isn’t.
## What can go wrong

`session_scope()` commits the transaction on `__aexit__`. Anything
between `session.execute(...)` and `__aexit__` runs before the commit.
If the commit fails — deadlock, network blip, pool timeout,
serialization conflict — the SQL is rolled back, but `liveset.remove(...)`
has already happened.
After rollback:
- The `mac_observation` row’s `valid_during` is still open in the database.
- The `liveness` row is still there.
- The in-memory LiveSet has no entry for that key.
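The divergence is easy to reproduce in miniature with a stand-in "database" and cache, both plain sets (none of these names come from the codebase; `commit_ok=False` stands in for the failed commit):

```python
class CommitFailed(Exception):
    pass


def compact_buggy(db_rows, cache, commit_ok):
    """Repro of the original ordering bug: cache mutated before commit."""
    aged_out = list(db_rows)     # "SQL" phase: find keys to close
    for k in aged_out:
        cache.discard(k)         # BUG: in-memory mutation before commit
    if not commit_ok:
        raise CommitFailed       # rollback: db_rows stays untouched
    for k in aged_out:
        db_rows.discard(k)       # commit makes the deletes durable


db = {"aa:bb:cc"}
cache = {"aa:bb:cc"}
try:
    compact_buggy(db, cache, commit_ok=False)
except CommitFailed:
    pass
# The database still holds the open row, but the cache has forgotten it:
# db == {"aa:bb:cc"}, cache == set(). The two views now disagree.
```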
The next event for the same MAC will be classified as first-sight by the reconciler (LiveSet says “not seen before”). It writes a new INSERT — but the database already has an open row, so the EXCLUDE constraint fires:
```text
ERROR:  conflicting key value violates exclusion constraint "mac_obs_no_overlap_per_source"
```

Or worse, in a different scenario: if the old row’s `valid_during` had
somehow ended up closed, the new INSERT would succeed without tripping
the constraint, and we would have two rows claiming the same MAC was on
two different ports simultaneously — within the same source.
The disagreement view will surface this, but you’ve still corrupted the
bitemporal log.
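For intuition, the property the EXCLUDE constraint guards can be sketched in Python: within one source, no two rows for the same MAC may have overlapping `valid_during` intervals, and two still-open rows always overlap. A hypothetical half-open-interval check (the real constraint lives in Postgres; this is only a model of its semantics):

```python
def overlaps(a, b):
    """Half-open intervals [start, end); end=None means the row is still open."""
    a_start, a_end = a
    b_start, b_end = b
    return (a_end is None or b_start < a_end) and (b_end is None or a_start < b_end)


# Two open rows for the same MAC within one source always overlap:
# that is exactly the "two ports simultaneously" corruption described above.
assert overlaps((1, None), (2, None))
```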
## The fix

Split the compactor into two phases:

```python
async def compact_once(session, settings) -> tuple[list[Stats], list[ClosedKey]]:
    # SQL phase only — no LiveSet mutation
    rs = await session.execute(_AGE_OUT_SQL, {...})
    closed_keys = [ClosedKey(...) for row in rs.mappings()]
    return stats, closed_keys


def apply_compaction_to_liveset(liveset, closed_keys):
    # In-memory phase — runs AFTER the transaction commits
    for k in closed_keys:
        liveset.remove(...)
```

The runner pattern is now:

```python
async with session_scope() as session:
    _stats, closed_keys = await compact_once(session, settings)
# session_scope committed here — DB state is durable
if liveset is not None and closed_keys:
    apply_compaction_to_liveset(liveset, closed_keys)
```

LiveSet invalidation only happens once we know the SQL succeeded.
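The two-phase ordering can be exercised against the same failing-commit scenario as before, using stand-in sets (illustrative names only, not the codebase's API):

```python
class CommitFailed(Exception):
    pass


def compact_fixed(db_rows, commit_ok):
    """SQL phase only: close rows in the 'database', return the keys closed."""
    aged_out = list(db_rows)
    if not commit_ok:
        raise CommitFailed       # rollback: nothing reaches the caller
    for k in aged_out:
        db_rows.discard(k)
    return aged_out              # the caller applies these to the cache


def apply_to_cache(cache, closed_keys):
    """In-memory phase: runs only after the commit succeeded."""
    for k in closed_keys:
        cache.discard(k)


db, cache = {"aa:bb:cc"}, {"aa:bb:cc"}
try:
    apply_to_cache(cache, compact_fixed(db, commit_ok=False))
except CommitFailed:
    pass
# On failure the exception skips apply_to_cache entirely:
# db and cache still agree, and the next cycle retries cleanly.
```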
## Regression test

The codebase includes a test that fails if anyone ever puts the
mutation back inline. From `tests/test_compactor_race.py`:

```python
async def test_compact_once_does_not_mutate_liveset_inline(session):
    # ... seed an observation + liveness row
    liveset = LiveSet()
    liveset.upsert(device_id, source, mac, vlan, entry)

    _stats, _closed = await compact_once(session, settings)

    # CRITICAL: LiveSet entry must still be present until the caller applies it.
    assert liveset.lookup(...) is not None, (
        "compact_once must NOT mutate LiveSet inline — invalidation belongs "
        "to the caller, run AFTER the surrounding session_scope() commits"
    )
```

A future contributor who “simplifies” `compact_once` by re-introducing
inline LiveSet mutation will see this test fail with a clear message
and a Hamilton reference.
## The same invariant applies to the writer

The reconciler’s writer has the same shape:
`apply_actions` does SQL only, returning the LiveSet updates as a
separate value; `apply_to_liveset` applies them after commit. From the
`writer.py::apply_actions` docstring:

> Mutating in-memory state before commit is unsafe: a commit failure
> (deadlock, network error, serialization failure) would leave the
> LiveSet referencing a row that was never persisted, poisoning every
> subsequent classification for that key.
This is one shape of bug, applied to two places. Margaret Hamilton’s review of the first cut found the writer version; the polish pass found the compactor version. The general rule:
> In-memory state is patched only after the surrounding transaction durably commits.
If you see code that mutates a cache before commit, that’s a bug.
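One generic way to enforce the rule mechanically (a hypothetical helper, not anything in the codebase): queue cache mutations during the transaction and run them only after the commit succeeds, so a raised commit error drops the queue along with the rollback.

```python
import contextlib


@contextlib.contextmanager
def post_commit(commit):
    """Collect deferred cache mutations; run them only after commit() succeeds.

    'commit' stands in for whatever makes the transaction durable
    (e.g. the session_scope exit in the compactor above).
    """
    ops = []
    yield ops.append    # callers enqueue zero-argument callables
    commit()            # may raise; if it does, the queued ops never run
    for op in ops:
        op()            # in-memory phase, strictly after durability


cache = {"aa:bb:cc"}


def failing_commit():
    raise RuntimeError("deadlock")


try:
    with post_commit(failing_commit) as enqueue:
        enqueue(lambda: cache.discard("aa:bb:cc"))  # deferred, not yet run
except RuntimeError:
    pass
# cache is untouched: the mutation died with the failed commit.
```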
## See also

- The compactor source: `src/l2trace/reconciler/compactor.py`
- How late arrivals work — the writer’s classification flow, where the
  same invariant applies