stage 6: per-app
This commit is contained in:
parent
9edd809416
commit
fcf421b94c
31 changed files with 6360 additions and 344 deletions
227
PLAN.md
227
PLAN.md
|
|
@ -13,10 +13,16 @@ conversational sketch.
|
|||
|
||||
### Goals
|
||||
|
||||
- **Hard safety net.** Output is guaranteed to stay below a configurable
|
||||
ceiling (default **−0.1 dBTP**) with proper inter-sample peak handling.
|
||||
This guarantee survives daemon misbehaviour, profile reloads, and bad
|
||||
routing decisions — it is enforced inline in the audio path.
|
||||
- **Hard safety net on the processed route.** Audio routed through
|
||||
`headroom-processed` is guaranteed to leave the filter below a
|
||||
configurable ceiling (default **−0.1 dBTP**) with proper inter-sample
|
||||
peak handling. The guarantee is enforced inline in the filter,
|
||||
downstream of every control-plane code path, and survives daemon
|
||||
misbehaviour, profile reloads, and bad routing decisions. Streams
|
||||
routed `bypass` ride the real sink directly and are **not** subject
|
||||
to this contract (see §2 path ①); the contract also does not extend
|
||||
to whatever resampling or post-processing the downstream device
|
||||
path applies after the filter's output.
|
||||
- **Per-application exclusion.** Music players, games, and DAWs route
|
||||
around the processor; browsers, voice chat, and "everything else" go
|
||||
through it. Rules are app-level and live in profiles.
|
||||
|
|
@ -472,10 +478,13 @@ is the irreducible cost of "no lookahead allowed." For absolute
|
|||
spike prevention you need lookahead, which means latency, which
|
||||
contradicts the constraint of this layer.
|
||||
|
||||
The bus-level Layer C limiter (§3.1) catches anything that would
|
||||
exceed the absolute ceiling regardless of whether Layer A has caught
|
||||
up. Layer A reduces *workload* on Layer C by pre-attenuating noisy
|
||||
apps; it doesn't replace it.
|
||||
On the processed route the bus-level Layer C limiter (§3.1) catches
|
||||
anything that would exceed the ceiling regardless of whether Layer A
|
||||
has caught up; on bypass routes Layer A is the only thing watching, so
|
||||
isolated one-block transients reach the real sink. Layer A reduces
|
||||
*workload* on Layer C where Layer C is in the path, and is a
|
||||
best-effort comfort filter where it isn't; it doesn't replace the
|
||||
limiter.
|
||||
|
||||
### 4.6 Layered budget summary
|
||||
|
||||
|
|
@ -593,9 +602,24 @@ updates arrive over an `rtrb` SPSC queue from the control thread.
|
|||
|
||||
## 6. Profiles
|
||||
|
||||
Location: `$XDG_CONFIG_HOME/headroom/profiles/*.toml` (overriding
|
||||
shipped defaults in `/usr/share/headroom/profiles/` if installed
|
||||
system-wide). Hot-reloaded via `notify-debouncer-mini`.
|
||||
Profile files live in `$XDG_CONFIG_HOME/headroom/profiles/*.toml`,
|
||||
shadowing shipped defaults in `/usr/share/headroom/profiles/` by
|
||||
name. Profile files are user-authored configuration — they're the
|
||||
thing you open in `$EDITOR`. File-watcher hot-reload via
|
||||
`notify-debouncer-mini` is planned; in the meantime `profile.reload`
|
||||
re-scans on demand.
|
||||
|
||||
Daemon-managed user state — active profile name, per-app route
|
||||
overrides made via `route.set`, dotted-key tweaks made via
|
||||
`setting.set`, the global bypass flag — is *not* mixed in with the
|
||||
profile TOMLs. It lives in a single `overlay.toml` at
|
||||
`$XDG_STATE_HOME/headroom/overlay.toml`, written atomically by the
|
||||
daemon (stage to `overlay.toml.tmp-…`, then rename). The overlay
|
||||
rides on top of whichever profile is active, so `route.set obs
|
||||
bypass` persists across `profile.use night` — that's a user
|
||||
preference, not a tweak of `default`. If the overlay names an active
|
||||
profile that's not on disk, the daemon falls back to the built-in
|
||||
default and surfaces a warning; it does not refuse to start.
|
||||
|
||||
Each profile is a complete listening scenario. Schema (`headroom-core::profile`):
|
||||
|
||||
|
|
@ -664,6 +688,10 @@ max_cut_db = 12.0 # never cut more than this
|
|||
peak_attack_ms = 5.0
|
||||
peak_release_ms = 500.0
|
||||
rms_window_ms = 1500.0
|
||||
# Controller-side knobs (all optional; defaults shown).
|
||||
smoother_ms = 30.0 # anti-bounce smoother on max(peak,rms)
|
||||
write_db_threshold = 0.5 # dB diff below which we don't fire a write
|
||||
min_write_interval_ms = 100.0 # min ms between writes per stream (10 Hz cap)
|
||||
defer_to_user = "ceiling" # "ceiling" | "strict"
|
||||
|
||||
[[per_app.rules]]
|
||||
|
|
@ -826,6 +854,88 @@ routing engine. Hardcoded profile, no IPC server yet.
|
|||
IPC schema. Profile loading + hot-reload. Slow AGC loop ticking on
|
||||
real loudness measurements.
|
||||
|
||||
Sub-stages used in commits / TODOs:
|
||||
|
||||
- **4a–4d** — Unix socket server, op dispatch, mutating ops, event
|
||||
broadcaster.
|
||||
- **4e** — `ProfileStore`: shipped + user profiles, atomic reload,
|
||||
user overlay at `$XDG_STATE_HOME/headroom/overlay.toml`. `profile.use`,
|
||||
`profile.reload`, `setting.set`, `route.set` all dispatch through it.
|
||||
- **4f** — DSP parameter propagation: `setting.set` reaches the running
|
||||
filter via the `rtrb` control queue, so live profile/setting edits
|
||||
take effect without restart.
|
||||
- **4h** — `preferred_real_sink` tracking: subscribe to
|
||||
`default.audio.sink`, snapshot the prior default, promote
|
||||
`headroom-processed`, retarget every bypassed stream on
|
||||
default-sink change, on hotplug, and on Bluetooth handoff. Also
|
||||
pins the filter's playback to the tracked real sink so processed
|
||||
audio follows when the user switches default, and resolves the
|
||||
real sink's node id from the registry for `status` reporting.
|
||||
- **4i** — `route.stream <node-id> processed|bypass`: ad-hoc per-stream
|
||||
override that doesn't write a profile rule. Crosses the
|
||||
IPC-thread → PipeWire-thread boundary via a `crossbeam` channel
|
||||
drained by a 50 ms timer source on the main loop. State updates
|
||||
synchronously; metadata write follows ≤ ~50 ms later.
|
||||
|
||||
- **Slow AGC loop** — wraps up Phase 4. Audio-thread `AgcGain` stage
|
||||
sits at the head of the DSP chain (anti-zipper smoother around a
|
||||
per-sample multiplier). Filter pushes *pre-AGC* input samples into a
|
||||
dedicated measurement ring. A `AgcController` on the PipeWire main
|
||||
loop ticks at 50 ms: drains the ring into `ebur128` (Mode S | M |
|
||||
TRUE_PEAK), reads `[agc]` config from the active profile, computes
|
||||
`target_lufs − short_term_lufs` clamped to `[-max_cut_db,
|
||||
+max_boost_db]`, gates below `silence_threshold_lufs`, slow-smooths
|
||||
via leaky integrator, and pushes the result through `FilterControl`
|
||||
on the same `rtrb` channel `setting.set` uses.
|
||||
|
||||
### Tracked follow-ups (carried past their sub-stage)
|
||||
|
||||
Items deliberately deferred from earlier sub-stages so they don't get
|
||||
lost. Pick up by name when the phase that consumes them lands.
|
||||
|
||||
- **Ephemeral overlay mutations.** *(4e follow-up.)* All `route.set`
|
||||
/ `setting.set` changes are persisted to `overlay.toml`. A
|
||||
`--ephemeral` flag (or `--volatile`) on the CLI for one-shot tweaks
|
||||
that don't outlive the daemon was considered and dropped from v0
|
||||
for simplicity. Revisit if real users ask for it; the store-level
|
||||
change is a flag on the setter methods.
|
||||
- **Filter playback BUSY spikes (periodic, ~10 s cadence).** *(6c
|
||||
manual smoke finding, 2026-05.)* On a quiet system with AGC and
|
||||
per-app both off, the filter's `playback_process` BUSY
|
||||
occasionally spikes from its ~240 μs steady-state to ~2.0 ms,
|
||||
correlating with output-sink WAIT spikes of similar size. No
|
||||
audible impact (sub-quantum at 21 ms). The ~10 s cadence rules
|
||||
out sliding-max worst-case (which would be input-pattern-driven,
|
||||
not periodic) and Layer A (the spikes persist with `per_app.enabled
|
||||
= false`). Suspects with 10 s clocks somewhere: WirePlumber session
|
||||
policy heartbeat, PipeWire internal graph re-eval, or system-level
|
||||
scheduling (CPU governor, kernel housekeeping). Diagnostic for
|
||||
Phase 8: timestamp the playback callback, log when its measured
|
||||
duration crosses ~1 ms; correlate with `journalctl`,
|
||||
`wireplumber --verbose`, and `pw-dump` snapshots taken around the
|
||||
spike. If we can't attribute it to PipeWire-side reschedule and
|
||||
it's something we can fix in our callback, the candidate
|
||||
workaround is to break the limiter's per-block work into smaller
|
||||
chunks (cap allocations / pops / branches per call) for more
|
||||
predictable timing.
|
||||
- **Sub-millisecond dispatch primitive for spike-reactive writes.**
|
||||
*(Phase 6 optimisation, downgraded from prerequisite.)* The 4i
|
||||
`PwCommand` channel uses a 50 ms polling timer, fine for
|
||||
`route.stream` and slow AGC. Layer A's per-app
|
||||
`Props.channelVolumes` writes were originally feared to need a
|
||||
sub-ms wake primitive. After 6a/6b benches landed (see
|
||||
§11.6 below) we re-evaluated: at a 5 ms polling timer and 21 ms
|
||||
PipeWire quantum, the worst-case detection-to-write latency stays
|
||||
well inside one quantum, which is what PLAN §4.5 actually
|
||||
promises. Polling reuses existing infrastructure and is cheap
|
||||
(controller tick is ~30 ns; even at 200 Hz it's lost in the
|
||||
noise). The tighter primitive — `EventSource::signal` with an
|
||||
`unsafe impl Send` shim around `spa_loop_utils.signal_event`, or a
|
||||
pipe + `IoSource` — stays on the table as an optimisation if
|
||||
manual testing shows audible spike-leak artefacts. `pw::command`
|
||||
module docs still carry the constraint warning for future variants
|
||||
that might be tempted to share the 50 ms timer.
|
||||
|
||||
**Phase 5 — CLI + monitor TUI.** `headroom-cli` implements all the
|
||||
subcommands above, plus a `monitor` TUI built on the meters
|
||||
subscription.
|
||||
|
|
@ -837,6 +947,101 @@ tap creation, `AppLevelController` with peak + RMS envelopes,
|
|||
per-stream meter event on the IPC. Land after the bus path is stable
|
||||
so we have a baseline to compare against.
|
||||
|
||||
Sub-stages:
|
||||
|
||||
- **6a** — Pure DSP. `headroom_dsp::LevelEnvelopes`: two-tier (peak
|
||||
+ RMS) block-rate detector, `max(peak_reduction, rms_reduction)`
|
||||
combined, clamped to `max_cut_db`. Allocation-free,
|
||||
block-rate-driven (audio thread emits one `(peak, mean_sq)` pair
|
||||
per quantum).
|
||||
- **6b** — Daemon-side glue.
|
||||
`headroom_core::app_level::AppLevelController`: rule snapshot,
|
||||
envelopes, 30 ms anti-bounce smoother, 0.5 dB / 100 ms write
|
||||
gate, ceiling vs strict deference state.
|
||||
`app_level::evaluate` matches `[[per_app.rules]]` against
|
||||
`PwNodeInfo` using the same matcher the routing engine uses.
|
||||
- **6c** — PipeWire tap + audio-thread analysis. **Mechanism**:
|
||||
per managed stream we create our own `pw_stream` (Direction::Input,
|
||||
F32LE stereo, rate left unspecified to negotiate with the source,
|
||||
`AUTOCONNECT` off, `NODE_DONT_RECONNECT`, `node.dont-move`),
|
||||
`connect()` with no target, `set_active(true)`. PipeWire creates
|
||||
our input ports from the declared format. We then build **explicit
|
||||
passive port-level links** via `link-factory` with
|
||||
`link.output.port` / `link.input.port` set to the source's and
|
||||
tap's port global IDs respectively, plus `link.passive = true`.
|
||||
**Why not `target.object` or `target_id`**: empirically (6c manual
|
||||
smoke) WirePlumber's policy refuses to wire `Stream/Output →
|
||||
Stream/Input` via any session-manager-mediated path — it logs no
|
||||
error, just doesn't act. The stream-level target was getting set
|
||||
on the node (`node.target = <source-id>`) but no link ever
|
||||
appeared. Going through `link-factory` with explicit port IDs
|
||||
bypasses the session manager entirely and uses PipeWire core
|
||||
directly. **Per managed stream**: one `pw_stream`, two `Link`
|
||||
proxies (one per channel), one `MeasurementSample` `rtrb`
|
||||
(capacity 64). Audio-thread `process` runs `peak = max(|x|)` and
|
||||
`mean_sq = Σx²/N` over the block, pushes one sample to the ring.
|
||||
**Lifecycle**: registry watcher sees a `Stream/Output/Audio`
|
||||
matching a `per_app` rule → spawn tap (ports come up
|
||||
asynchronously) → the Layer A drain timer (6d) retries link
|
||||
creation each tick until both port sets are visible on the
|
||||
registry → links built, stream transitions to `Streaming`,
|
||||
samples flow. On registry `global_remove` of the source, drop the
|
||||
`ManagedStream`; declaration order severs links first, then the
|
||||
tap stream + listener.
|
||||
- **6d** — `Props.channelVolumes` writes + controller drain timer.
|
||||
A polling timer source on the PipeWire main loop ticks every 5 ms
|
||||
(200 Hz, CPU cost ≪ 0.1% of one core per the benches), iterates
|
||||
active controllers, drains each measurement ring, calls
|
||||
`process_block`, and on a `Some` return writes
|
||||
`Props.channelVolumes` via the bound `default` metadata
|
||||
(subject = source node id). The 5 ms tick guarantees a spike
|
||||
detected at quantum boundary `N` is written before quantum `N+1`
|
||||
starts on typical 21 ms quanta — see §4.5 reaction-time honesty
|
||||
table.
|
||||
- **6e** — User-volume deference + per-stream meter events.
|
||||
Subscribe to `Props` param-change events on each managed stream.
|
||||
Distinguish daemon writes from external by comparing against
|
||||
`last_written_lin` (within 1e-4) — external changes apply
|
||||
ceiling-mode or strict-mode deference per the matched rule's
|
||||
`defer_to_user` field. Per-stream meters publish on the `meters`
|
||||
topic with the smoothed reduction, the peak/RMS envelope values,
|
||||
and the current applied `channelVolumes`.
|
||||
|
||||
**Validated cost budget (criterion microbenches, run 2026-05).**
|
||||
PLAN §4.7 budgeted "~10 μs/quantum audio thread, few μs/measurement
|
||||
daemon thread." Reality on this hardware:
|
||||
|
||||
| Bench | Time |
|
||||
|---|---|
|
||||
| Audio-thread peak + mean_sq scan, 1024-frame stereo block | 1.33 μs |
|
||||
| `LevelEnvelopes::process_block` (daemon) | 18 ns |
|
||||
| `AppLevelController::process_block` hot signal | 29 ns |
|
||||
| `AppLevelController::process_block` quiet signal | 22 ns |
|
||||
|
||||
5 managed streams: audio thread ≈ 6.6 μs/quantum (0.03% of one
|
||||
core at 21 ms quanta); daemon ≈ 145 ns/quantum. ~7-10× under the
|
||||
PLAN budget, so the design has room for many more managed streams,
|
||||
or for adding ebur128 / TRUE_PEAK to Layer A later if useful.
|
||||
|
||||
**Manual latency validation (post-6c implementation).** PipeWire
|
||||
scheduling can't be benched from Rust alone. Use:
|
||||
|
||||
- **`pw-top`** — note the source-node `QUANT` and any WAIT/BUSY or
|
||||
delay column before attaching the tap; attach Layer A; confirm
|
||||
the source-node numbers don't change. The tap appears as a new
|
||||
row with its own quantum; the test is whether the *app's* numbers
|
||||
degrade.
|
||||
- **`qpwgraph`** / **`helvum`** — visually confirm the source node
|
||||
has two outgoing links (one to its original destination, one to
|
||||
our tap), both terminating correctly.
|
||||
- **Ear** — connect/disconnect the tap on live audio. Crackles or
|
||||
dropouts on attach indicate the §4.1 sibling-fanout claim doesn't
|
||||
hold and the design needs revisiting.
|
||||
|
||||
If those three say "fine," the §4.1 promise is upheld in practice
|
||||
and 6c is acceptance-tested. `jack_iodelay` and other true-round-trip
|
||||
tools are overkill.
|
||||
|
||||
**Phase 7 — Packaging.** systemd user unit, install paths, default
|
||||
profile install, basic NixOS module.
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue