stage 6: per-app

2026-05-20 23:47:19 +10:00 · 2026-05-20 23:47:19 +10:00 · fcf421b94c
commit fcf421b94c
parent 9edd809416
31 changed files with 6360 additions and 344 deletions
--- a/PLAN.md
+++ b/PLAN.md
@@ -13,10 +13,16 @@ conversational sketch.

 ### Goals

- **Hard safety net.** Output is guaranteed to stay below a configurable
-  ceiling (default **−0.1 dBTP**) with proper inter-sample peak handling.
-  This guarantee survives daemon misbehaviour, profile reloads, and bad
-  routing decisions — it is enforced inline in the audio path.
+- **Hard safety net on the processed route.** Audio routed through
+  `headroom-processed` is guaranteed to leave the filter below a
+  configurable ceiling (default **−0.1 dBTP**) with proper inter-sample
+  peak handling. The guarantee is enforced inline in the filter,
+  downstream of every control-plane code path, and survives daemon
+  misbehaviour, profile reloads, and bad routing decisions. Streams
+  routed `bypass` ride the real sink directly and are **not** subject
+  to this contract (see §2 path ①); the contract also does not extend
+  to whatever resampling or post-processing the downstream device
+  path applies after the filter's output.
 - **Per-application exclusion.** Music players, games, and DAWs route
  around the processor; browsers, voice chat, and "everything else" go
  through it. Rules are app-level and live in profiles.
@@ -472,10 +478,13 @@ is the irreducible cost of "no lookahead allowed." For absolute
 spike prevention you need lookahead, which means latency, which
 contradicts the constraint of this layer.

-The bus-level Layer C limiter (§3.1) catches anything that would
-exceed the absolute ceiling regardless of whether Layer A has caught
-up. Layer A reduces *workload* on Layer C by pre-attenuating noisy
-apps; it doesn't replace it.
+On the processed route the bus-level Layer C limiter (§3.1) catches
+anything that would exceed the ceiling regardless of whether Layer A
+has caught up; on bypass routes Layer A is the only thing watching, so
+isolated one-block transients reach the real sink. Layer A reduces
+*workload* on Layer C where Layer C is in the path, and is a
+best-effort comfort filter where it isn't; it doesn't replace the
+limiter.

 ### 4.6 Layered budget summary

@@ -593,9 +602,24 @@ updates arrive over an `rtrb` SPSC queue from the control thread.

 ## 6. Profiles

-Location: `$XDG_CONFIG_HOME/headroom/profiles/*.toml` (overriding
-shipped defaults in `/usr/share/headroom/profiles/` if installed
-system-wide). Hot-reloaded via `notify-debouncer-mini`.
+Profile files live in `$XDG_CONFIG_HOME/headroom/profiles/*.toml`,
+shadowing shipped defaults in `/usr/share/headroom/profiles/` by
+name. Profile files are user-authored configuration — they're the
+thing you open in `$EDITOR`. File-watcher hot-reload via
+`notify-debouncer-mini` is planned; in the meantime `profile.reload`
+re-scans on demand.
+
+Daemon-managed user state — active profile name, per-app route
+overrides made via `route.set`, dotted-key tweaks made via
+`setting.set`, the global bypass flag — is *not* mixed in with the
+profile TOMLs. It lives in a single `overlay.toml` at
+`$XDG_STATE_HOME/headroom/overlay.toml`, written atomically by the
+daemon (stage to `overlay.toml.tmp-…`, then rename). The overlay
+rides on top of whichever profile is active, so `route.set obs
+bypass` persists across `profile.use night` — that's a user
+preference, not a tweak of `default`. If the overlay names an active
+profile that's not on disk, the daemon falls back to the built-in
+default and surfaces a warning; it does not refuse to start.
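
The stage-then-rename write described for `overlay.toml` could look roughly like the sketch below. It is a minimal illustration using only the Rust standard library, not the daemon's actual code: the function name `write_atomically` and the `.tmp-<pid>` suffix are assumptions, since the plan elides the real temp-file suffix.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `contents` to `target` by staging a sibling temp file and renaming
/// it over the destination. On POSIX filesystems the rename replaces the
/// target in one step, so a reader sees either the old file or the new one.
///
/// Assumption: the `.tmp-<pid>` suffix is a hypothetical stand-in for the
/// temp-file naming the plan leaves unspecified.
pub fn write_atomically(target: &Path, contents: &str) -> io::Result<()> {
    let file_name = target
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "target has no file name"))?;

    // Stage next to the target so both paths live on the same filesystem.
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(format!(".tmp-{}", std::process::id()));
    let tmp_path = target.with_file_name(tmp_name);

    {
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(contents.as_bytes())?;
        // Flush data to disk before publishing, so a crash between the two
        // steps cannot leave a truncated file under the final name.
        tmp.sync_all()?;
    }

    fs::rename(&tmp_path, target)
}
```

Staging in the same directory matters: a rename across filesystems is not atomic and may fail outright, which is why the sketch derives the temp path from the target rather than using a system temp directory.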

 Each profile is a complete listening scenario. Schema (`headroom-core::profile`):

@@ -664,6 +688,10 @@ max_cut_db = 12.0             # never cut more than this
 peak_attack_ms = 5.0
 peak_release_ms = 500.0
 rms_window_ms = 1500.0
+# Controller-side knobs (all optional; defaults shown).
+smoother_ms = 30.0            # anti-bounce smoother on max(peak,rms)
+write_db_threshold = 0.5      # dB diff below which we don't fire a write
+min_write_interval_ms = 100.0 # min ms between writes per stream (10 Hz cap)
 defer_to_user = "ceiling"     # "ceiling" | "strict"

 [[per_app.rules]]
@@ -826,6 +854,88 @@ routing engine. Hardcoded profile, no IPC server yet.
 IPC schema. Profile loading + hot-reload. Slow AGC loop ticking on
 real loudness measurements.

+Sub-stages used in commits / TODOs:
+
+- **4a–4d** — Unix socket server, op dispatch, mutating ops, event
+  broadcaster.
+- **4e** — `ProfileStore`: shipped + user profiles, atomic reload,
+  user overlay at `$XDG_STATE_HOME/headroom/overlay.toml`. `profile.use`,
+  `profile.reload`, `setting.set`, `route.set` all dispatch through it.
+- **4f** — DSP parameter propagation: `setting.set` reaches the running
+  filter via the `rtrb` control queue, so live profile/setting edits
+  take effect without restart.
+- **4h** — `preferred_real_sink` tracking: subscribe to
+  `default.audio.sink`, snapshot the prior default, promote
+  `headroom-processed`, retarget every bypassed stream on
+  default-sink change, on hotplug, and on Bluetooth handoff. Also
+  pins the filter's playback to the tracked real sink so processed
+  audio follows when the user switches default, and resolves the
+  real sink's node id from the registry for `status` reporting.
+- **4i** — `route.stream <node-id> processed|bypass`: ad-hoc per-stream
+  override that doesn't write a profile rule. Crosses the
+  IPC-thread → PipeWire-thread boundary via a `crossbeam` channel
+  drained by a 50 ms timer source on the main loop. State updates
+  synchronously; metadata write follows ≤ ~50 ms later.
+
+- **Slow AGC loop** — wraps up Phase 4. Audio-thread `AgcGain` stage
+  sits at the head of the DSP chain (anti-zipper smoother around a
+  per-sample multiplier). Filter pushes *pre-AGC* input samples into a
+  dedicated measurement ring. An `AgcController` on the PipeWire main
+  loop ticks at 50 ms: drains the ring into `ebur128` (Mode S | M |
+  TRUE_PEAK), reads `[agc]` config from the active profile, computes
+  `target_lufs − short_term_lufs` clamped to `[-max_cut_db,
+  +max_boost_db]`, gates below `silence_threshold_lufs`, slow-smooths
+  via leaky integrator, and pushes the result through `FilterControl`
+  on the same `rtrb` channel `setting.set` uses.
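
The per-tick arithmetic in the slow AGC bullet can be sketched as below. This is an illustration under stated assumptions, not the `AgcController` itself: the struct and function names are hypothetical, the config fields mirror the `[agc]` keys named in the plan, and "gating" is assumed to mean holding the previous gain while the input is below the silence threshold. The time constant of the leaky integrator is likewise a free parameter here.

```rust
/// Hypothetical mirror of the `[agc]` fields named in the plan.
#[derive(Clone, Copy, Debug)]
pub struct AgcConfig {
    pub target_lufs: f32,
    pub max_cut_db: f32,   // expected to be >= 0
    pub max_boost_db: f32, // expected to be >= 0
    pub silence_threshold_lufs: f32,
}

/// Desired gain in dB for one tick: `target_lufs - short_term_lufs`, clamped
/// to `[-max_cut_db, +max_boost_db]`. Returns `None` when the measurement is
/// below the silence threshold (or not finite), i.e. the tick is gated.
pub fn desired_gain_db(cfg: &AgcConfig, short_term_lufs: f32) -> Option<f32> {
    if !short_term_lufs.is_finite() || short_term_lufs < cfg.silence_threshold_lufs {
        return None;
    }
    Some((cfg.target_lufs - short_term_lufs).clamp(-cfg.max_cut_db, cfg.max_boost_db))
}

/// One-pole leaky integrator that slowly pulls the applied gain toward the
/// desired gain.
pub struct LeakyIntegrator {
    state_db: f32,
    alpha: f32,
}

impl LeakyIntegrator {
    /// `tick_ms` is the controller period (50 ms in the plan);
    /// `time_constant_ms` is an assumed smoothing constant.
    pub fn new(tick_ms: f32, time_constant_ms: f32) -> Self {
        let alpha = 1.0 - (-tick_ms / time_constant_ms).exp();
        Self { state_db: 0.0, alpha }
    }

    /// Advance one tick. A gated tick (`None`) holds the current gain
    /// (assumed behaviour).
    pub fn step(&mut self, desired_db: Option<f32>) -> f32 {
        if let Some(d) = desired_db {
            self.state_db += self.alpha * (d - self.state_db);
        }
        self.state_db
    }
}
```

Each tick would feed `desired_gain_db(...)` into `step(...)` and hand the smoothed result to whatever pushes gain onto the control queue.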
+
+### Tracked follow-ups (carried past their sub-stage)
+
+Items deliberately deferred from earlier sub-stages so they don't get
+lost. Pick up by name when the phase that consumes them lands.
+
+- **Ephemeral overlay mutations.** *(4e follow-up.)* All `route.set`
+  / `setting.set` changes are persisted to `overlay.toml`. A
+  `--ephemeral` flag (or `--volatile`) on the CLI for one-shot tweaks
+  that don't outlive the daemon was considered and dropped from v0
+  for simplicity. Revisit if real users ask for it; the store-level
+  change is a flag on the setter methods.
+- **Filter playback BUSY spikes (periodic, ~10 s cadence).** *(6c
+  manual smoke finding, 2026-05.)* On a quiet system with AGC and
+  per-app both off, the filter's `playback_process` BUSY
+  occasionally spikes from its ~240 μs steady-state to ~2.0 ms,
+  correlating with output-sink WAIT spikes of similar size. No
+  audible impact (sub-quantum at 21 ms). The ~10 s cadence rules
+  out sliding-max worst-case (which would be input-pattern-driven,
+  not periodic) and Layer A (the spikes persist with `per_app.enabled
+  = false`). Suspects with 10 s clocks somewhere: WirePlumber session
+  policy heartbeat, PipeWire internal graph re-eval, or system-level
+  scheduling (CPU governor, kernel housekeeping). Diagnostic for
+  Phase 8: timestamp the playback callback, log when its measured
+  duration crosses ~1 ms; correlate with `journalctl`,
+  `wireplumber --verbose`, and `pw-dump` snapshots taken around the
+  spike. If we can't attribute it to PipeWire-side reschedule and
+  it's something we can fix in our callback, the candidate
+  workaround is to break the limiter's per-block work into smaller
+  chunks (cap allocations / pops / branches per call) for more
+  predictable timing.
+- **Sub-millisecond dispatch primitive for spike-reactive writes.**
+  *(Phase 6 optimisation, downgraded from prerequisite.)* The 4i
+  `PwCommand` channel uses a 50 ms polling timer, fine for
+  `route.stream` and slow AGC. Layer A's per-app
+  `Props.channelVolumes` writes were originally feared to need a
+  sub-ms wake primitive. After 6a/6b benches landed (see
+  §11.6 below) we re-evaluated: at a 5 ms polling timer and 21 ms
+  PipeWire quantum, the worst-case detection-to-write latency stays
+  well inside one quantum, which is what PLAN §4.5 actually
+  promises. Polling reuses existing infrastructure and is cheap
+  (controller tick is ~30 ns; even at 200 Hz it's lost in the
+  noise). The tighter primitive — `EventSource::signal` with an
+  `unsafe impl Send` shim around `spa_loop_utils.signal_event`, or a
+  pipe + `IoSource` — stays on the table as an optimisation if
+  manual testing shows audible spike-leak artefacts. `pw::command`
+  module docs still carry the constraint warning for future variants
+  that might be tempted to share the 50 ms timer.
+
 **Phase 5 — CLI + monitor TUI.** `headroom-cli` implements all the
 subcommands above, plus a `monitor` TUI built on the meters
 subscription.
@@ -837,6 +947,101 @@ tap creation, `AppLevelController` with peak + RMS envelopes,
 per-stream meter event on the IPC. Land after the bus path is stable
 so we have a baseline to compare against.

+Sub-stages:
+
+- **6a** — Pure DSP. `headroom_dsp::LevelEnvelopes`: two-tier (peak
+  + RMS) block-rate detector, `max(peak_reduction, rms_reduction)`
+  combined, clamped to `max_cut_db`. Allocation-free,
+  block-rate-driven (audio thread emits one `(peak, mean_sq)` pair
+  per quantum).
+- **6b** — Daemon-side glue.
+  `headroom_core::app_level::AppLevelController`: rule snapshot,
+  envelopes, 30 ms anti-bounce smoother, 0.5 dB / 100 ms write
+  gate, ceiling vs strict deference state.
+  `app_level::evaluate` matches `[[per_app.rules]]` against
+  `PwNodeInfo` using the same matcher the routing engine uses.
+- **6c** — PipeWire tap + audio-thread analysis. **Mechanism**:
+  per managed stream we create our own `pw_stream` (Direction::Input,
+  F32LE stereo, rate left unspecified to negotiate with the source,
+  `AUTOCONNECT` off, `NODE_DONT_RECONNECT`, `node.dont-move`),
+  `connect()` with no target, `set_active(true)`. PipeWire creates
+  our input ports from the declared format. We then build **explicit
+  passive port-level links** via `link-factory` with
+  `link.output.port` / `link.input.port` set to the source's and
+  tap's port global IDs respectively, plus `link.passive = true`.
+  **Why not `target.object` or `target_id`**: empirically (6c manual
+  smoke) WirePlumber's policy refuses to wire `Stream/Output →
+  Stream/Input` via any session-manager-mediated path — it logs no
+  error, just doesn't act. The stream-level target was getting set
+  on the node (`node.target = <source-id>`) but no link ever
+  appeared. Going through `link-factory` with explicit port IDs
+  bypasses the session manager entirely and uses PipeWire core
+  directly. **Per managed stream**: one `pw_stream`, two `Link`
+  proxies (one per channel), one `MeasurementSample` `rtrb`
+  (capacity 64). Audio-thread `process` runs `peak = max(|x|)` and
+  `mean_sq = Σx²/N` over the block, pushes one sample to the ring.
+  **Lifecycle**: registry watcher sees a `Stream/Output/Audio`
+  matching a `per_app` rule → spawn tap (ports come up
+  asynchronously) → the Layer A drain timer (6d) retries link
+  creation each tick until both port sets are visible on the
+  registry → links built, stream transitions to `Streaming`,
+  samples flow. On registry `global_remove` of the source, drop the
+  `ManagedStream`; declaration order severs links first, then the
+  tap stream + listener.
+- **6d** — `Props.channelVolumes` writes + controller drain timer.
+  A polling timer source on the PipeWire main loop ticks every 5 ms
+  (200 Hz, CPU cost ≪ 0.1% of one core per the benches), iterates
+  active controllers, drains each measurement ring, calls
+  `process_block`, and on a `Some` return writes
+  `Props.channelVolumes` via the bound `default` metadata
+  (subject = source node id). The 5 ms tick guarantees a spike
+  detected at quantum boundary `N` is written before quantum `N+1`
+  starts on typical 21 ms quanta — see §4.5 reaction-time honesty
+  table.
+- **6e** — User-volume deference + per-stream meter events.
+  Subscribe to `Props` param-change events on each managed stream.
+  Distinguish daemon writes from external by comparing against
+  `last_written_lin` (within 1e-4) — external changes apply
+  ceiling-mode or strict-mode deference per the matched rule's
+  `defer_to_user` field. Per-stream meters publish on the `meters`
+  topic with the smoothed reduction, the peak/RMS envelope values,
+  and the current applied `channelVolumes`.
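
Three of the small mechanisms in the sub-stages above are easy to show in isolation: the 6c block scan, the 6b write gate, and the 6e own-write check. The sketch below is illustrative only. All names (`scan_block`, `WriteGate`, `is_own_write`) are hypothetical, the scan treats an interleaved buffer as one block without separating channels, and the gate is assumed to require both conditions (level change and elapsed interval) before a write fires.

```rust
/// One measurement per block: `peak = max(|x|)` and `mean_sq = Σx²/N`.
/// Allocation-free; an empty block yields `(0.0, 0.0)`.
pub fn scan_block(samples: &[f32]) -> (f32, f32) {
    if samples.is_empty() {
        return (0.0, 0.0);
    }
    let mut peak = 0.0f32;
    let mut sum_sq = 0.0f32;
    for &x in samples {
        peak = peak.max(x.abs());
        sum_sq += x * x;
    }
    (peak, sum_sq / samples.len() as f32)
}

/// Write gate: suppress a write unless the reduction moved by at least
/// `db_threshold` AND at least `min_interval_ms` passed since the last write.
pub struct WriteGate {
    db_threshold: f32,
    min_interval_ms: f64,
    last: Option<(f32, f64)>, // (reduction_db, time_ms) of the last write
}

impl WriteGate {
    pub fn new(db_threshold: f32, min_interval_ms: f64) -> Self {
        Self { db_threshold, min_interval_ms, last: None }
    }

    /// Returns `Some(reduction_db)` when a write should fire at `now_ms`.
    /// The very first offer always fires.
    pub fn offer(&mut self, reduction_db: f32, now_ms: f64) -> Option<f32> {
        if let Some((last_db, last_ms)) = self.last {
            if (reduction_db - last_db).abs() < self.db_threshold {
                return None;
            }
            if now_ms - last_ms < self.min_interval_ms {
                return None;
            }
        }
        self.last = Some((reduction_db, now_ms));
        Some(reduction_db)
    }
}

/// A `Props` change counts as the daemon's own write when the observed linear
/// volume matches the last written value within 1e-4.
pub fn is_own_write(observed_lin: f32, last_written_lin: f32) -> bool {
    (observed_lin - last_written_lin).abs() <= 1e-4
}
```

With the defaults from the profile schema, the gate would be built as `WriteGate::new(0.5, 100.0)`; anything that fails `is_own_write` would be treated as an external change and go through the rule's `defer_to_user` handling.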
+
+**Validated cost budget (criterion microbenches, run 2026-05).**
+PLAN §4.7 budgeted "~10 μs/quantum audio thread, few μs/measurement
+daemon thread." Reality on this hardware:
+
+| Bench | Time |
+|---|---|
+| Audio-thread peak + mean_sq scan, 1024-frame stereo block | 1.33 μs |
+| `LevelEnvelopes::process_block` (daemon) | 18 ns |
+| `AppLevelController::process_block` hot signal | 29 ns |
+| `AppLevelController::process_block` quiet signal | 22 ns |
+
+5 managed streams: audio thread ≈ 6.6 μs/quantum (0.03% of one
+core at 21 ms quanta); daemon ≈ 145 ns/quantum. ~7-10× under the
+PLAN budget, so the design has room for many more managed streams,
+or for adding ebur128 / TRUE_PEAK to Layer A later if useful.
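
The multi-stream figures follow directly from the table. A small sketch of that arithmetic, with hypothetical helper names and the bench numbers passed in as plain parameters:

```rust
/// Total cost for `streams` managed streams, given a per-stream cost.
/// The unit of the result is the unit of `cost_per_stream`.
pub fn total_cost(streams: u32, cost_per_stream: f64) -> f64 {
    f64::from(streams) * cost_per_stream
}

/// Share of one core, in percent, consumed by `cost_us` microseconds of work
/// per quantum of `quantum_ms` milliseconds.
pub fn core_share_percent(cost_us: f64, quantum_ms: f64) -> f64 {
    cost_us / (quantum_ms * 1000.0) * 100.0
}
```

For 5 streams, `total_cost(5, 1.33)` gives about 6.65 μs of audio-thread work per quantum, `core_share_percent(6.65, 21.0)` gives roughly 0.03%, and `total_cost(5, 29.0)` gives 145 ns on the daemon side using the hot-signal controller figure.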
+
+**Manual latency validation (post-6c implementation).** PipeWire
+scheduling can't be benched from Rust alone. Use:
+
+- **`pw-top`** — note the source-node `QUANT` and any WAIT/BUSY or
+  delay column before attaching the tap; attach Layer A; confirm
+  the source-node numbers don't change. The tap appears as a new
+  row with its own quantum; the test is whether the *app's* numbers
+  degrade.
+- **`qpwgraph`** / **`helvum`** — visually confirm the source node
+  has two outgoing links (one to its original destination, one to
+  our tap), both terminating correctly.
+- **Ear** — connect/disconnect the tap on live audio. Crackles or
+  dropouts on attach indicate the §4.1 sibling-fanout claim doesn't
+  hold and the design needs revisiting.
+
+If those three say "fine," the §4.1 promise is upheld in practice
+and 6c is acceptance-tested. `jack_iodelay` and other true-round-trip
+tools are overkill.
+
 **Phase 7 — Packaging.** systemd user unit, install paths, default
 profile install, basic NixOS module.