fix: further layer A (per-app) glitches

atagen 2026-05-24 18:12:31 +10:00
parent 2978318019
commit 7797f60128
16 changed files with 1589 additions and 155 deletions

PLAN.md (172 changed lines)

@@ -93,8 +93,10 @@ bypass) and a per-app level-control flag (on vs. off).
│ ▼
│ ┌─────────────────────┐
│ │ headroom-filter │
-│ │ (pw_stream pair) │
-│ Layer C (bus DSP) │ AGC → compressor │
│ │ (pw_filter node, │
│ │ 4 mono ports — │
│ Layer C (bus DSP) │ FL/FR in + out) │
│ │ AGC → compressor │
│ │ → soft → hard │
│ └─────────┬───────────┘
│ │
@@ -157,9 +159,15 @@ of Layers B and C.
sink, untouched."
- The **daemon** owns:
- the one virtual sink (created on startup, torn down on exit);
-- the filter (a pair of `pw_stream`s — capture + playback — running
-on PipeWire's realtime audio thread, with the playback half
-targeting `preferred_real_sink`);
- the filter — a single `pw_filter` node (`headroom-filter`) with
four mono DSP ports (input FL/FR + output FL/FR) running on
PipeWire's realtime data thread. Wrapped by the in-house
`pipewire-filter` workspace crate because pipewire-rs 0.8 doesn't
expose `pw_filter`. WirePlumber doesn't auto-link `pw_filter`,
so the routing engine creates the `processed.monitor → filter
in` and `filter out → preferred_real_sink` links explicitly via
`link-factory` (the same primitive the routing engine already
uses for stream re-pinning in Phase 4k);
- one **`AppLevelController`** per managed app stream (§4), each
with its own passive `pw_stream` tap, peak/RMS envelopes, and
`Props.channelVolumes` writer. Created/destroyed on stream
@@ -202,18 +210,43 @@ be inline**. The analytical-monitor approach is still used — for the
*slow* AGC loop, where multi-second time constants make control-plane
latency irrelevant — but it cannot own the ceiling.
-### Why a `pw_stream` pair, not an LV2 plugin in `module-filter-chain`
### Why a native `pw_filter` node, not an LV2 plugin in `module-filter-chain`
LV2 is not native to PipeWire; it's one of several plugin formats
`module-filter-chain` happens to host (via lilv). Using LV2 would split
Headroom into a plugin + a daemon + a filter-chain JSON, pull in a lilv
runtime, and force gain-target updates through a 32-bit-float control-port
-abstraction. A `pw_stream` capture+playback pair is the same pattern
abstraction. A native `pw_filter` node is the same primitive
`module-filter-chain` itself uses internally, but written directly in
-Rust against `pipewire-rs`, in the same process as the rest of the
-daemon. One binary, no IPC for parameter updates, idiomatic Rust audio
-thread. An LV2 wrapper of `headroom-dsp` remains a viable optional
-deliverable for use in DAWs.
Rust, in the same process as the rest of the daemon. One binary, no IPC
for parameter updates, idiomatic Rust audio thread. An LV2 wrapper of
`headroom-dsp` remains a viable optional deliverable for use in DAWs.
**Bus filter implementation — the in-house `pipewire-filter` crate.**
pipewire-rs 0.8 ships `Stream` but not `Filter`. Since `headroom-core`
declares `#![forbid(unsafe_code)]`, the unsafe FFI lives in a separate
small workspace crate (`crates/pipewire-filter/`), mirroring
pipewire-rs's `Stream` patterns (heap-boxed `FilterListener<D>`,
RAII `Buffer<'p>`, `// SAFETY:` on every `unsafe` block). The crate
covers exactly the events Headroom needs (`process`, `state_changed`,
`param_changed`). Audited by Codex on landing; fixes for the two findings
that would have been real bugs in our use (an over-permissive `Sync` impl
on `PortData`; passing the `error` ptr to the old state in
`state_changed`) were applied. Architectural rule: when pipewire-rs
later ships its own `Filter`, switch to it and delete this crate.
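The RAII-buffer pattern the crate mirrors from pipewire-rs can be sketched in safe Rust. This is a mock, not the `pipewire-filter` API. `MockPort`, its `free` pool and the `queued_back` counter are hypothetical stand-ins for the FFI calls that dequeue and queue `pw_buffer`s. The sketch shows only the shape: a `Buffer<'p>` borrows its port, and its `Drop` impl hands the storage back, so a `process` callback cannot forget to requeue.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical stand-in for a filter port's buffer pool. In the real
/// crate, dequeue/queue would be FFI calls on the `pw_filter` port.
struct MockPort {
    free: RefCell<Vec<Vec<f32>>>,
    queued_back: Cell<usize>,
}

/// RAII guard: borrows the port for `'p` and returns its storage on drop.
struct Buffer<'p> {
    port: &'p MockPort,
    data: Vec<f32>,
}

impl MockPort {
    /// Take a free buffer if one exists; `None` mirrors an empty pool.
    fn dequeue(&self) -> Option<Buffer<'_>> {
        let data = self.free.borrow_mut().pop()?;
        Some(Buffer { port: self, data })
    }
}

impl Buffer<'_> {
    fn samples_mut(&mut self) -> &mut [f32] {
        &mut self.data
    }
}

impl Drop for Buffer<'_> {
    /// Requeue automatically, even on an early return from `process`.
    fn drop(&mut self) {
        let data = std::mem::take(&mut self.data);
        self.port.free.borrow_mut().push(data);
        self.port.queued_back.set(self.port.queued_back.get() + 1);
    }
}
```

Tying the buffer's lifetime to the port means the borrow checker rejects any attempt to keep a buffer past the callback that dequeued it.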
**Earlier shape (now retired): two `pw_stream`s + a ring.** The
dual-`pw_stream` arrangement we shipped in Phase 3 had no PipeWire
graph dependency between capture and playback, so the scheduler was
free to fire playback before capture in the same quantum →
ring-empty → tremolo at quantum cadence. The mitigation was a
65k-sample SPSC ring sized for 4× the worst-case buffer
(`clock.quantum-limit` × `CHANNELS`), adding ~340 ms average
latency. `pw_filter` removes the ring entirely: a single node has
its own input→process→output ordering by construction (the same
ordering `module-filter-chain` relies on). See
[[headroom-pipewire-gotchas]] #14, #17, #18 for the full diagnostic
trail.
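The ~340 ms figure can be reproduced from the ring size. The sketch below assumes four things, none of them quoted from the Headroom code: PipeWire's default `clock.quantum-limit` of 8192 frames, `CHANNELS = 2`, a 48 kHz graph rate, and "average latency" meaning the ring sits half full.

```rust
/// Assumed values, not read from the Headroom source.
const QUANTUM_LIMIT_FRAMES: usize = 8192; // PipeWire default clock.quantum-limit
const CHANNELS: usize = 2;
const RATE_HZ: f64 = 48_000.0;

/// Ring capacity in interleaved samples: 4x the worst-case buffer.
fn ring_capacity_samples() -> usize {
    4 * QUANTUM_LIMIT_FRAMES * CHANNELS
}

/// Added latency if the ring sits half full on average.
fn average_latency_ms(capacity_samples: usize) -> f64 {
    let half_full_frames = (capacity_samples / 2) / CHANNELS;
    half_full_frames as f64 / RATE_HZ * 1000.0
}
```

With these assumptions the capacity comes to 65,536 samples. Half of that is 16,384 frames, which is about 341 ms at 48 kHz.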
---
@@ -381,6 +414,59 @@ timing is identical to the no-tap case.
└─────────────────────┘
```
**Important correction (2026-05-22):** the diagram above shows the
tap branching off the source *before* the `channelVolumes`
multiplier, but in practice PipeWire's standard adapter applies
`channelVolumes` *inside* the source node — anything reading the
output port sees the post-attenuation signal. Untreated, this
closes a feedback loop on the controller: write reduction → tap
measures attenuated signal → envelopes release → "no reduction
needed" → controller stops writing, gain freezes wherever it last
was, dynamics no longer tracked. The implementation compensates by
dividing incoming `peak_lin` / `mean_sq_lin` by `last_written_lin`
(and its square) inside `AppLevelController::process_block`,
recovering an estimate of the pre-attenuation signal. When the
applied gain is below the floor (`GAIN_COMPENSATION_FLOOR = 0.01`,
i.e. -40 dB), the compensation is skipped: a fully-muted stream would
otherwise amplify floor noise back up to the maximum cut and lock the
user out of unmuting. See `app_level.rs` and the per-app-gain memory note for
the rationale and the corner cases.
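Here is a minimal sketch of the compensation step, pulled out into a free function for illustration. The `compensate` name and signature are hypothetical; the document places this logic inside `AppLevelController::process_block`. Treating the floor as a strict `<` comparison is also an assumption.

```rust
/// Below this linear gain (-40 dB) compensation is skipped.
const GAIN_COMPENSATION_FLOOR: f32 = 0.01;

/// Hypothetical helper: undo the attenuation the tap already sees.
/// `last_written_lin` is the gain last written to `Props.channelVolumes`.
fn compensate(peak_lin: f32, mean_sq_lin: f32, last_written_lin: f32) -> (f32, f32) {
    if last_written_lin < GAIN_COMPENSATION_FLOOR {
        // Near-mute: dividing would blow floor noise back up.
        return (peak_lin, mean_sq_lin);
    }
    (
        // Peak scales linearly with the applied gain...
        peak_lin / last_written_lin,
        // ...mean square scales with its square.
        mean_sq_lin / (last_written_lin * last_written_lin),
    )
}
```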
**Source-suspension catch-up.** When the source node suspends
(PipeWire's adapter stops delivering buffers — Strawberry between
tracks, the user pausing, a screensaver kicking in) the tap's
`process_block` doesn't run, so the envelopes don't release and
the controller carries stale attenuation into the next stretch of
audio. `AppLevelController::tick_silent(now)` — called from the
Layer A drain timer on every pass — advances envelopes through
silent gaps by feeding (0, 0) inputs at the controller's block
period. Bounded by `MAX_SILENT_CATCHUP_BLOCKS` (~10 s); past that
the envelopes have fully released anyway and we short-circuit via
`envelopes.reset()`. The drain pass runs at 5 ms cadence, so
post-resume audio sees a fresh controller within one tick.
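The catch-up can be sketched as follows. Everything here is hypothetical:

- a 10 ms block period (the real value would come from `LAYER_A_BLOCK_DT_S`);
- a single peak envelope with a per-block multiplicative release;
- a `last_block_at` timestamp that `process_block` would advance while audio flows.

```rust
use std::time::{Duration, Instant};

/// Hypothetical block period; the real one derives from `LAYER_A_BLOCK_DT_S`.
const BLOCK_PERIOD: Duration = Duration::from_millis(10);
/// About 10 s of blocks at the hypothetical 10 ms period.
const MAX_SILENT_CATCHUP_BLOCKS: u128 = 1_000;

/// Toy peak envelope: instant attack, multiplicative release per block.
struct Envelope {
    value: f32,
    release_coeff: f32,
}

impl Envelope {
    fn feed(&mut self, x: f32) {
        self.value = x.max(self.value * self.release_coeff);
    }
    fn reset(&mut self) {
        self.value = 0.0;
    }
}

struct Controller {
    env: Envelope,
    /// Time of the last block processed (real or synthetic).
    last_block_at: Instant,
}

impl Controller {
    /// Advance the envelope through blocks the tap never delivered.
    fn tick_silent(&mut self, now: Instant) {
        let gap = now.saturating_duration_since(self.last_block_at);
        let missed = gap.as_nanos() / BLOCK_PERIOD.as_nanos();
        if missed == 0 {
            return;
        }
        if missed > MAX_SILENT_CATCHUP_BLOCKS {
            // Long gap: the envelope would have fully released anyway.
            self.env.reset();
            self.last_block_at = now;
        } else {
            for _ in 0..missed {
                self.env.feed(0.0); // one silent block
            }
            // Advance by whole blocks only, keeping the block grid aligned.
            self.last_block_at += BLOCK_PERIOD * missed as u32;
        }
    }
}
```

Because the drain timer calls this on every pass, a resumed stream never starts from attenuation older than one drain tick.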
**Per-app `user_ceiling` persistence across stream lifecycles.**
Apps like Strawberry create a fresh `Stream/Output/Audio` node
per track. PipeWire carries over the previous node's
`Props.channelVolumes` — frequently our own last-written value
from the prior track. The new `managed_stream`'s controller is
fresh (`last_written_lin = 1.0`); without intervention, the first
param event from `subscribe_params(Props)` fires with the
inherited daemon-value, the echo check fails (diff vs 1.0 is
huge), `on_external_change` misattributes it as user-set, and the
ceiling gets locked at whatever the previous track's reduction
was. `RoutingState` therefore holds a `persisted_ceilings` map
keyed by `app_label` (process_binary, falling back to
application_name); on managed_stream teardown we save the
controller's current `user_ceiling_lin`, and on spawn we
`AppLevelController::restore_state(ceiling, now)` plus write the
ceiling to `Props.channelVolumes` BEFORE calling
`subscribe_params(Props)`. The ordering is load-bearing —
writing after subscribe races against the initial-state replay
and the bug recurs. First-time apps (no persisted entry) still
treat the first observation as user-set, which is correct
because no daemon-value can have been inherited yet.
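Here is a minimal sketch of the save/restore bookkeeping and the spawn ordering. `AppInfo`, `Action`, `save_ceiling` and `spawn_actions` are hypothetical. The real code calls `restore_state`, writes `Props.channelVolumes` and calls `subscribe_params(Props)` directly. Returning the steps as a list just makes the load-bearing order checkable.

```rust
use std::collections::HashMap;

/// Hypothetical subset of a stream's properties.
struct AppInfo {
    process_binary: Option<String>,
    application_name: Option<String>,
}

/// `app_label`: process binary, falling back to application name.
fn app_label(info: &AppInfo) -> Option<&str> {
    info.process_binary
        .as_deref()
        .or(info.application_name.as_deref())
}

#[derive(Default)]
struct RoutingState {
    persisted_ceilings: HashMap<String, f32>,
}

#[derive(Debug, PartialEq)]
enum Action {
    RestoreState(f32),
    WriteChannelVolumes(f32),
    SubscribeProps,
}

impl RoutingState {
    /// managed_stream teardown: remember the controller's ceiling.
    fn save_ceiling(&mut self, label: &str, user_ceiling_lin: f32) {
        self.persisted_ceilings.insert(label.to_owned(), user_ceiling_lin);
    }

    /// managed_stream spawn: restore and write strictly before subscribing,
    /// since writing after subscribe races the initial-state replay.
    fn spawn_actions(&self, label: &str) -> Vec<Action> {
        let mut actions = Vec::new();
        if let Some(&ceiling) = self.persisted_ceilings.get(label) {
            actions.push(Action::RestoreState(ceiling));
            actions.push(Action::WriteChannelVolumes(ceiling));
        }
        // First-time apps subscribe directly; first observation is user-set.
        actions.push(Action::SubscribeProps);
        actions
    }
}
```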
### 4.2 The metrics: peak + RMS, no LUFS
LUFS is the wrong measurement here. Its shortest window (momentary,
@@ -565,17 +651,48 @@ current `preferred_real_sink` via `target.object` metadata writes
### 5.2 The filter
-Two `pw_stream`s:
One `pw_filter` node (`headroom-filter`), wrapped by the in-house
`pipewire-filter` workspace crate, with **four mono DSP ports**,
the canonical shape `module-filter-chain` uses:
-- **Capture stream** linked to `headroom-processed`'s monitor. Format:
-`F32 LE`, channels 2, rate matched to real sink, latency-quantum
-matched (default 1024 frames; configurable).
-- **Playback stream** linked to the current `preferred_real_sink`.
-Same format.
- `input_FL` / `input_FR`: `Direction::Input`, `format.dsp = "32 bit
float mono audio"`, `audio.channel = FL|FR`. The routing engine
links these to the corresponding monitor ports on
`headroom-processed`.
- `output_FL` / `output_FR`: `Direction::Output`, same format
properties. The routing engine links these to the corresponding
input ports on `preferred_real_sink`.
-`process` callback: pull a buffer from capture, run AGC gain →
-compressor → limiter → push to playback. Allocation-free. Parameter
-updates arrive over an `rtrb` SPSC queue from the control thread.
Single `process` callback per quantum: dequeue all four mono
buffers, run AGC gain → compressor → limiter on the
`(in_l[i], in_r[i])` pair, write `(out_l[i], out_r[i])`. Queue all
four buffers back via `Buffer`'s `Drop` impl. Allocation-free; guarded by
`assert_no_alloc` in debug. Parameter updates arrive over an `rtrb`
SPSC queue from the control thread.
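The per-quantum DSP step can be sketched independently of the FFI. Port dequeue/queue is omitted. `Chain` is a hypothetical stand-in (a fixed gain into a hard clamp), not Headroom's AGC/compressor/limiter. The point is the indexing: four mono slices in, samples processed as `(left, right)` pairs, and results written back to two mono slices.

```rust
/// Hypothetical stand-in for AGC → compressor → limiter: fixed gain
/// into a hard clamp. A real limiter computes one linked gain for both
/// channels, which is why each pair is processed together.
struct Chain {
    gain: f32,
    ceiling: f32,
}

impl Chain {
    #[inline]
    fn process_pair(&mut self, l: f32, r: f32) -> (f32, f32) {
        let (g, c) = (self.gain, self.ceiling);
        ((l * g).clamp(-c, c), (r * g).clamp(-c, c))
    }
}

/// One quantum over four mono port buffers. Only borrowed slices are
/// touched, so nothing allocates on the data thread.
fn process_quantum(
    chain: &mut Chain,
    in_l: &[f32],
    in_r: &[f32],
    out_l: &mut [f32],
    out_r: &mut [f32],
) {
    // `zip` stops at the shortest slice, guarding against mismatched sizes.
    let inputs = in_l.iter().zip(in_r);
    let outputs = out_l.iter_mut().zip(out_r.iter_mut());
    for ((&l, &r), (o_l, o_r)) in inputs.zip(outputs) {
        let (pl, pr) = chain.process_pair(l, r);
        *o_l = pl;
        *o_r = pr;
    }
}
```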
**Routing.** WirePlumber's policy does not auto-link `pw_filter`
nodes (the `pw_filter` API has no `AUTOCONNECT` flag and WP has no
default linking heuristic for hybrid input+output nodes). The
routing engine therefore wires the filter explicitly:
`try_capture_filter_playback` matches the filter's
registry global by `node.name`, then enqueues two routes through
the existing `pending_routes` machinery: one with source = processed and
target = filter for the input legs, one with source = filter and
target = real_sink for the output legs. The
`pair_count >= 2` ordinal pairing in `apply_pending_routes`
(FL→FL, FR→FR) is exactly the per-channel mono structure above.
The filter is resolved as a routing target via
`resolve_routing_target` / `is_routing_target` helpers that check
`filter_playback_id` ahead of `sinks_by_name` — the filter is
**not** registered as a fake `Audio/Sink`, so the map stays
genuinely sink-only.
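Here is a minimal sketch of the two helpers the paragraph describes, plus the ordinal pairing. It makes these assumptions:

- the `Routing` struct and the name-based lookup are hypothetical;
- `pair_ports` returns `None` below two pairs, leaving any fallback to the caller;
- port IDs are plain `u32`s, pre-sorted in channel order (FL, FR).

```rust
use std::collections::HashMap;

const FILTER_NODE_NAME: &str = "headroom-filter";

/// Hypothetical slice of the routing engine's state.
struct Routing {
    filter_playback_id: Option<u32>,
    sinks_by_name: HashMap<String, u32>, // genuine Audio/Sink nodes only
}

impl Routing {
    /// Filter first, then real sinks; the filter never enters `sinks_by_name`.
    fn resolve_routing_target(&self, name: &str) -> Option<u32> {
        match self.filter_playback_id {
            Some(id) if name == FILTER_NODE_NAME => Some(id),
            _ => self.sinks_by_name.get(name).copied(),
        }
    }

    fn is_routing_target(&self, node_id: u32) -> bool {
        self.filter_playback_id == Some(node_id)
            || self.sinks_by_name.values().any(|&id| id == node_id)
    }
}

/// Ordinal pairing: i-th source port to i-th target port (FL→FL, FR→FR).
fn pair_ports(src: &[u32], dst: &[u32]) -> Option<Vec<(u32, u32)>> {
    let pair_count = src.len().min(dst.len());
    if pair_count < 2 {
        return None;
    }
    Some(src.iter().copied().zip(dst.iter().copied()).collect())
}
```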
**Rebuild on rate change.** When the real sink's negotiated rate
changes (`PwCommand::RebuildFilter`), the routing engine clears
`filter_playback_id` *before* dropping the old filter so the new
filter's registry global is recaptured even if its `global_add`
races ahead of the old `global_remove`.
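A small model shows why the clear has to come first. The capture rule here is an assumption about the registry handler: record the filter's global only while no ID is held, and clear it on a matching `global_remove`. It was chosen because it is a rule under which the race bites. `FilterTracker` and its methods are hypothetical.

```rust
const FILTER_NODE_NAME: &str = "headroom-filter";

/// Hypothetical model of the registry handler's filter-capture rule.
#[derive(Default)]
struct FilterTracker {
    filter_playback_id: Option<u32>,
}

impl FilterTracker {
    fn on_global_add(&mut self, id: u32, node_name: &str) {
        // Assumed rule: capture only while no filter ID is held.
        if node_name == FILTER_NODE_NAME && self.filter_playback_id.is_none() {
            self.filter_playback_id = Some(id);
        }
    }

    fn on_global_remove(&mut self, id: u32) {
        if self.filter_playback_id == Some(id) {
            self.filter_playback_id = None;
        }
    }

    /// RebuildFilter: clear the ID before the old filter is dropped.
    fn begin_rebuild(&mut self) {
        self.filter_playback_id = None;
    }
}
```

Under this rule, skipping `begin_rebuild` has a specific failure mode. If the new filter's `global_add` arrives first, it is ignored. The old filter's `global_remove` then clears the only ID held, and the engine loses track of the filter.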
### 5.3 Routing
@@ -866,7 +983,10 @@ signals; limiter validated to hold a 0.1 dBTP ceiling on EBU TECH
3341 generators. *(this commit: limiter first)*
**Phase 3 — daemon core.** `headroom-core` brings up the
-`headroom-processed` virtual sink, the filter (pw_stream pair),
`headroom-processed` virtual sink, the bus filter (originally a
`pw_stream` pair + SPSC ring; rewritten to a single `pw_filter`
node on 2026-05-22; see PW gotchas #14, #17, #18 and the
`pipewire-filter` workspace crate),
the `preferred_real_sink` tracker, the registry subscriber, and the
routing engine. Hardcoded profile, no IPC server yet.
@@ -928,6 +1048,16 @@ lost. Pick up by name when the trigger that gates them fires.
Layer A's `LAYER_A_BLOCK_DT_S` constant becoming dynamic too.
Gated on a multi-rate hardware test bench — no point shipping
the refactor without something to validate it against. **v1 scope.**
- ~~**Bus filter is two `pw_stream`s + an SPSC ring → per-quantum
tremolo on shared-driver topologies.**~~ **Closed 2026-05-22 by
rewrite to a single `pw_filter` node** (new in-house
`pipewire-filter` workspace crate holding the unsafe FFI; one
process callback with input→DSP→output ordering by construction;
capture↔playback ring deleted entirely). The first soak surfaced
that WP doesn't auto-link `pw_filter`, so the filter was
restructured to 4 mono ports (canonical `module-filter-chain`
shape) and the routing engine extended to wire it explicitly via
`link-factory`. See §5.2 above and `pipewire-gotchas` #14/#17/#18.
- ~~**Filter playback BUSY spikes (periodic, ~10 s cadence).**~~
**Closed in 8e (`d52cd6d`).** The instrumentation added by 8e
did not reproduce the ~8×-baseline outlier pattern in a ~3 min