stage 6: per-app

This commit is contained in:
atagen 2026-05-20 23:47:19 +10:00
parent 9edd809416
commit fcf421b94c
31 changed files with 6360 additions and 344 deletions

227
PLAN.md
View file

@ -13,10 +13,16 @@ conversational sketch.
### Goals
- **Hard safety net.** Output is guaranteed to stay below a configurable
ceiling (default **0.1 dBTP**) with proper inter-sample peak handling.
This guarantee survives daemon misbehaviour, profile reloads, and bad
routing decisions — it is enforced inline in the audio path.
- **Hard safety net on the processed route.** Audio routed through
`headroom-processed` is guaranteed to leave the filter below a
configurable ceiling (default **0.1 dBTP**) with proper inter-sample
peak handling. The guarantee is enforced inline in the filter,
downstream of every control-plane code path, and survives daemon
misbehaviour, profile reloads, and bad routing decisions. Streams
routed `bypass` ride the real sink directly and are **not** subject
to this contract (see §2 path ①); the contract also does not extend
to whatever resampling or post-processing the downstream device
path applies after the filter's output.
- **Per-application exclusion.** Music players, games, and DAWs route
around the processor; browsers, voice chat, and "everything else" go
through it. Rules are app-level and live in profiles.
@ -472,10 +478,13 @@ is the irreducible cost of "no lookahead allowed." For absolute
spike prevention you need lookahead, which means latency, which
contradicts the constraint of this layer.
The bus-level Layer C limiter (§3.1) catches anything that would
exceed the absolute ceiling regardless of whether Layer A has caught
up. Layer A reduces *workload* on Layer C by pre-attenuating noisy
apps; it doesn't replace it.
On the processed route the bus-level Layer C limiter (§3.1) catches
anything that would exceed the ceiling regardless of whether Layer A
has caught up; on bypass routes Layer A is the only thing watching, so
isolated one-block transients reach the real sink. Layer A reduces
*workload* on Layer C where Layer C is in the path, and is a
best-effort comfort filter where it isn't; it doesn't replace the
limiter.
### 4.6 Layered budget summary
@ -593,9 +602,24 @@ updates arrive over an `rtrb` SPSC queue from the control thread.
## 6. Profiles
Location: `$XDG_CONFIG_HOME/headroom/profiles/*.toml` (overriding
shipped defaults in `/usr/share/headroom/profiles/` if installed
system-wide). Hot-reloaded via `notify-debouncer-mini`.
Profile files live in `$XDG_CONFIG_HOME/headroom/profiles/*.toml`,
shadowing shipped defaults in `/usr/share/headroom/profiles/` by
name. Profile files are user-authored configuration — they're the
thing you open in `$EDITOR`. File-watcher hot-reload via
`notify-debouncer-mini` is planned; in the meantime `profile.reload`
re-scans on demand.
Daemon-managed user state — active profile name, per-app route
overrides made via `route.set`, dotted-key tweaks made via
`setting.set`, the global bypass flag — is *not* mixed in with the
profile TOMLs. It lives in a single `overlay.toml` at
`$XDG_STATE_HOME/headroom/overlay.toml`, written atomically by the
daemon (stage to `overlay.toml.tmp-…`, then rename). The overlay
rides on top of whichever profile is active, so `route.set obs
bypass` persists across `profile.use night` — that's a user
preference, not a tweak of `default`. If the overlay names an active
profile that's not on disk, the daemon falls back to the built-in
default and surfaces a warning; it does not refuse to start.
Each profile is a complete listening scenario. Schema (`headroom-core::profile`):
@ -664,6 +688,10 @@ max_cut_db = 12.0 # never cut more than this
peak_attack_ms = 5.0
peak_release_ms = 500.0
rms_window_ms = 1500.0
# Controller-side knobs (all optional; defaults shown).
smoother_ms = 30.0 # anti-bounce smoother on max(peak,rms)
write_db_threshold = 0.5 # dB diff below which we don't fire a write
min_write_interval_ms = 100.0 # min ms between writes per stream (10 Hz cap)
defer_to_user = "ceiling" # "ceiling" | "strict"
[[per_app.rules]]
@ -826,6 +854,88 @@ routing engine. Hardcoded profile, no IPC server yet.
IPC schema. Profile loading + hot-reload. Slow AGC loop ticking on
real loudness measurements.
Sub-stages used in commits / TODOs:
- **4a4d** — Unix socket server, op dispatch, mutating ops, event
broadcaster.
- **4e**`ProfileStore`: shipped + user profiles, atomic reload,
user overlay at `$XDG_STATE_HOME/headroom/overlay.toml`. `profile.use`,
`profile.reload`, `setting.set`, `route.set` all dispatch through it.
- **4f** — DSP parameter propagation: `setting.set` reaches the running
filter via the `rtrb` control queue, so live profile/setting edits
take effect without restart.
- **4h**`preferred_real_sink` tracking: subscribe to
`default.audio.sink`, snapshot the prior default, promote
`headroom-processed`, retarget every bypassed stream on
default-sink change, on hotplug, and on Bluetooth handoff. Also
pins the filter's playback to the tracked real sink so processed
audio follows when the user switches default, and resolves the
real sink's node id from the registry for `status` reporting.
- **4i**`route.stream <node-id> processed|bypass`: ad-hoc per-stream
override that doesn't write a profile rule. Crosses the
IPC-thread → PipeWire-thread boundary via a `crossbeam` channel
drained by a 50 ms timer source on the main loop. State updates
synchronously; metadata write follows ≤ ~50 ms later.
- **Slow AGC loop** — wraps up Phase 4. Audio-thread `AgcGain` stage
sits at the head of the DSP chain (anti-zipper smoother around a
per-sample multiplier). Filter pushes *pre-AGC* input samples into a
dedicated measurement ring. A `AgcController` on the PipeWire main
loop ticks at 50 ms: drains the ring into `ebur128` (Mode S | M |
TRUE_PEAK), reads `[agc]` config from the active profile, computes
`target_lufs short_term_lufs` clamped to `[-max_cut_db,
+max_boost_db]`, gates below `silence_threshold_lufs`, slow-smooths
via leaky integrator, and pushes the result through `FilterControl`
on the same `rtrb` channel `setting.set` uses.
### Tracked follow-ups (carried past their sub-stage)
Items deliberately deferred from earlier sub-stages so they don't get
lost. Pick up by name when the phase that consumes them lands.
- **Ephemeral overlay mutations.** *(4e follow-up.)* All `route.set`
/ `setting.set` changes are persisted to `overlay.toml`. A
`--ephemeral` flag (or `--volatile`) on the CLI for one-shot tweaks
that don't outlive the daemon was considered and dropped from v0
for simplicity. Revisit if real users ask for it; the store-level
change is a flag on the setter methods.
- **Filter playback BUSY spikes (periodic, ~10 s cadence).** *(6c
manual smoke finding, 2026-05.)* On a quiet system with AGC and
per-app both off, the filter's `playback_process` BUSY
occasionally spikes from its ~240 μs steady-state to ~2.0 ms,
correlating with output-sink WAIT spikes of similar size. No
audible impact (sub-quantum at 21 ms). The ~10 s cadence rules
out sliding-max worst-case (which would be input-pattern-driven,
not periodic) and Layer A (the spikes persist with `per_app.enabled
= false`). Suspects with 10 s clocks somewhere: WirePlumber session
policy heartbeat, PipeWire internal graph re-eval, or system-level
scheduling (CPU governor, kernel housekeeping). Diagnostic for
Phase 8: timestamp the playback callback, log when its measured
duration crosses ~1 ms; correlate with `journalctl`,
`wireplumber --verbose`, and `pw-dump` snapshots taken around the
spike. If we can't attribute it to PipeWire-side reschedule and
it's something we can fix in our callback, the candidate
workaround is to break the limiter's per-block work into smaller
chunks (cap allocations / pops / branches per call) for more
predictable timing.
- **Sub-millisecond dispatch primitive for spike-reactive writes.**
*(Phase 6 optimisation, downgraded from prerequisite.)* The 4i
`PwCommand` channel uses a 50 ms polling timer, fine for
`route.stream` and slow AGC. Layer A's per-app
`Props.channelVolumes` writes were originally feared to need a
sub-ms wake primitive. After 6a/6b benches landed (see
§11.6 below) we re-evaluated: at a 5 ms polling timer and 21 ms
PipeWire quantum, the worst-case detection-to-write latency stays
well inside one quantum, which is what PLAN §4.5 actually
promises. Polling reuses existing infrastructure and is cheap
(controller tick is ~30 ns; even at 200 Hz it's lost in the
noise). The tighter primitive — `EventSource::signal` with an
`unsafe impl Send` shim around `spa_loop_utils.signal_event`, or a
pipe + `IoSource` — stays on the table as an optimisation if
manual testing shows audible spike-leak artefacts. `pw::command`
module docs still carry the constraint warning for future variants
that might be tempted to share the 50 ms timer.
**Phase 5 — CLI + monitor TUI.** `headroom-cli` implements all the
subcommands above, plus a `monitor` TUI built on the meters
subscription.
@ -837,6 +947,101 @@ tap creation, `AppLevelController` with peak + RMS envelopes,
per-stream meter event on the IPC. Land after the bus path is stable
so we have a baseline to compare against.
Sub-stages:
- **6a** — Pure DSP. `headroom_dsp::LevelEnvelopes`: two-tier (peak
+ RMS) block-rate detector, `max(peak_reduction, rms_reduction)`
combined, clamped to `max_cut_db`. Allocation-free,
block-rate-driven (audio thread emits one `(peak, mean_sq)` pair
per quantum).
- **6b** — Daemon-side glue.
`headroom_core::app_level::AppLevelController`: rule snapshot,
envelopes, 30 ms anti-bounce smoother, 0.5 dB / 100 ms write
gate, ceiling vs strict deference state.
`app_level::evaluate` matches `[[per_app.rules]]` against
`PwNodeInfo` using the same matcher the routing engine uses.
- **6c** — PipeWire tap + audio-thread analysis. **Mechanism**:
per managed stream we create our own `pw_stream` (Direction::Input,
F32LE stereo, rate left unspecified to negotiate with the source,
`AUTOCONNECT` off, `NODE_DONT_RECONNECT`, `node.dont-move`),
`connect()` with no target, `set_active(true)`. PipeWire creates
our input ports from the declared format. We then build **explicit
passive port-level links** via `link-factory` with
`link.output.port` / `link.input.port` set to the source's and
tap's port global IDs respectively, plus `link.passive = true`.
**Why not `target.object` or `target_id`**: empirically (6c manual
smoke) WirePlumber's policy refuses to wire `Stream/Output →
Stream/Input` via any session-manager-mediated path — it logs no
error, just doesn't act. The stream-level target was getting set
on the node (`node.target = <source-id>`) but no link ever
appeared. Going through `link-factory` with explicit port IDs
bypasses the session manager entirely and uses PipeWire core
directly. **Per managed stream**: one `pw_stream`, two `Link`
proxies (one per channel), one `MeasurementSample` `rtrb`
(capacity 64). Audio-thread `process` runs `peak = max(|x|)` and
`mean_sq = Σx²/N` over the block, pushes one sample to the ring.
**Lifecycle**: registry watcher sees a `Stream/Output/Audio`
matching a `per_app` rule → spawn tap (ports come up
asynchronously) → the Layer A drain timer (6d) retries link
creation each tick until both port sets are visible on the
registry → links built, stream transitions to `Streaming`,
samples flow. On registry `global_remove` of the source, drop the
`ManagedStream`; declaration order severs links first, then the
tap stream + listener.
- **6d**`Props.channelVolumes` writes + controller drain timer.
A polling timer source on the PipeWire main loop ticks every 5 ms
(200 Hz, CPU cost ≪ 0.1% of one core per the benches), iterates
active controllers, drains each measurement ring, calls
`process_block`, and on a `Some` return writes
`Props.channelVolumes` via the bound `default` metadata
(subject = source node id). The 5 ms tick guarantees a spike
detected at quantum boundary `N` is written before quantum `N+1`
starts on typical 21 ms quanta — see §4.5 reaction-time honesty
table.
- **6e** — User-volume deference + per-stream meter events.
Subscribe to `Props` param-change events on each managed stream.
Distinguish daemon writes from external by comparing against
`last_written_lin` (within 1e-4) — external changes apply
ceiling-mode or strict-mode deference per the matched rule's
`defer_to_user` field. Per-stream meters publish on the `meters`
topic with the smoothed reduction, the peak/RMS envelope values,
and the current applied `channelVolumes`.
**Validated cost budget (criterion microbenches, run 2026-05).**
PLAN §4.7 budgeted "~10 μs/quantum audio thread, few μs/measurement
daemon thread." Reality on this hardware:
| Bench | Time |
|---|---|
| Audio-thread peak + mean_sq scan, 1024-frame stereo block | 1.33 μs |
| `LevelEnvelopes::process_block` (daemon) | 18 ns |
| `AppLevelController::process_block` hot signal | 29 ns |
| `AppLevelController::process_block` quiet signal | 22 ns |
5 managed streams: audio thread ≈ 6.6 μs/quantum (0.03% of one
core at 21 ms quanta); daemon ≈ 145 ns/quantum. ~7-10× under the
PLAN budget, so the design has room for many more managed streams,
or for adding ebur128 / TRUE_PEAK to Layer A later if useful.
**Manual latency validation (post-6c implementation).** PipeWire
scheduling can't be benched from Rust alone. Use:
- **`pw-top`** — note the source-node `QUANT` and any WAIT/BUSY or
delay column before attaching the tap; attach Layer A; confirm
the source-node numbers don't change. The tap appears as a new
row with its own quantum; the test is whether the *app's* numbers
degrade.
- **`qpwgraph`** / **`helvum`** — visually confirm the source node
has two outgoing links (one to its original destination, one to
our tap), both terminating correctly.
- **Ear** — connect/disconnect the tap on live audio. Crackles or
dropouts on attach indicate the §4.1 sibling-fanout claim doesn't
hold and the design needs revisiting.
If those three say "fine," the §4.1 promise is upheld in practice
and 6c is acceptance-tested. `jack_iodelay` and other true-round-trip
tools are overkill.
**Phase 7 — Packaging.** systemd user unit, install paths, default
profile install, basic NixOS module.