1300 lines
63 KiB
Markdown
1300 lines
63 KiB
Markdown
# Headroom
|
||
|
||
A Rust AGC + compressor + true-peak limiter for PipeWire. Per-application
|
||
exclusion, profile-based presets, single-binary daemon, scriptable over a
|
||
Unix-domain socket.
|
||
|
||
This document is the canonical plan. It supersedes the earlier
|
||
conversational sketch.
|
||
|
||
---
|
||
|
||
## 1. Goals & non-goals
|
||
|
||
### Goals
|
||
|
||
- **Hard safety net on the processed route.** Audio routed through
|
||
`headroom-processed` is guaranteed to leave the filter below a
|
||
configurable ceiling (default **−0.1 dBTP**) with proper inter-sample
|
||
peak handling. The guarantee is enforced inline in the filter,
|
||
downstream of every control-plane code path, and survives daemon
|
||
misbehaviour, profile reloads, and bad routing decisions. Streams
|
||
routed `bypass` ride the real sink directly and are **not** subject
|
||
to this contract (see §2 path ①); the contract also does not extend
|
||
to whatever resampling or post-processing the downstream device
|
||
path applies after the filter's output.
|
||
- **Per-application exclusion.** Music players, games, and DAWs route
|
||
around the processor; browsers, voice chat, and "everything else" go
|
||
through it. Rules are app-level and live in profiles.
|
||
- **Drop-in defaults.** First-run experience: install, enable user
|
||
service, done. No mandatory config. Power users edit TOML or use the
|
||
CLI.
|
||
- **Profiles** for distinct listening scenarios (default / night /
|
||
speech / transparent / bypass-all).
|
||
- **Single binary.** Daemon, filter, routing, and control loop all live
|
||
in one process. The DSP kernels are a separate crate so they can be
|
||
reused (LV2/standalone) later.
|
||
- **Scriptable.** Unix-domain-socket IPC with a documented JSON schema
|
||
so anyone can write an alternative client (Qt/QuickShell panel, Eww
|
||
widget, scripts). A first-party Rust crate (`headroom-ipc`) wraps it.
|
||
- **Rust, lean dep tree.** No NIH where mature crates exist, no bloat
|
||
where they don't.
|
||
|
||
### Non-goals (v0)
|
||
|
||
- Surround / >2-channel content. v0 is stereo only; >2ch is routed
|
||
directly to the real sink, untouched by Headroom's filter chain.
|
||
- LV2/CLAP plugin distribution. The DSP crate is plugin-shaped so this
|
||
is cheap to add later, but it's not a v0 deliverable.
|
||
- GUI. Third parties can build one against the IPC.
|
||
- Capture-side processing (microphone). v0 is playback only.
|
||
|
||
---
|
||
|
||
## 2. Architecture
|
||
|
||
Each app's audio takes one of four end-to-end paths, chosen by two
|
||
**orthogonal** profile flags: a routing decision (processed vs.
|
||
bypass) and a per-app level-control flag (on vs. off).
|
||
|
||
```
|
||
┌─── optional, opt-in per app (Layer A) ────────────────┐
|
||
│ │
|
||
│ ┌─► passive tap ─► peak + RMS ─► AppLevelController │
|
||
│ │ (sibling link in same quantum) │ │
|
||
│ │ │ │
|
||
│ │ Props.channelVolumes write ◄──────┘ │
|
||
│ │ │
|
||
└───┼───────────────────────────────────────────────────┘
|
||
│
|
||
│ APP STREAM NODE
|
||
│ ┌──────────────────────────┐
|
||
│ │ raw output │
|
||
app's audio ───►├──►│ × channelVolumes │──► output port
|
||
│ └──────────────────────────┘
|
||
│ │
|
||
└────────────────────────────────────────────│
|
||
│
|
||
routing decision (Layer B) │
|
||
target.object set by daemon │
|
||
│
|
||
┌─────────────────────────────────────────┴┐
|
||
▼ ▼
|
||
route = "bypass" route = "processed"
|
||
target.object = target.object =
|
||
preferred_real_sink headroom-processed
|
||
│ │
|
||
│ ▼
|
||
│ ┌─────────────────────┐
|
||
│ │ headroom-processed │
|
||
│ │ (virtual sink, the │
|
||
│ │ system default) │
|
||
│ └─────────┬───────────┘
|
||
│ ▼
|
||
│ ┌─────────────────────┐
|
||
│ │ headroom-filter │
|
||
│ │ (pw_filter node, │
|
||
│ │ 4 mono ports — │
|
||
│ Layer C (bus DSP) │ FL/FR in + out) │
|
||
│ │ AGC → compressor │
|
||
│ │ → soft → hard │
|
||
│ └─────────┬───────────┘
|
||
│ │
|
||
▼ ▼
|
||
preferred_real_sink ◄──────────────────────► (DAC)
|
||
```
|
||
|
||
### The four end-to-end paths
|
||
|
||
| | Routing = bypass | Routing = processed |
|
||
|---|---|---|
|
||
| **per-app off** | ① **true bypass** — Headroom touches nothing on the signal path. Same latency as if Headroom weren't installed. | ③ **bus DSP only** — stream flows through `headroom-processed` and the inline chain. `channelVolumes` left at whatever the user/app set. |
|
||
| **per-app on** | ② **per-app only** — level-reactive `channelVolumes` writes, no graph hop. Zero added signal-path latency. | ④ **full stack** — per-app level control *and* bus DSP. Maximum protection. |
|
||
|
||
Path-by-path properties:
|
||
|
||
| Path | Signal-path latency added | Limiter contract? | Per-app gain ride? |
|
||
|---|---|---|---|
|
||
| ① bypass / per-app off | 0 | no | no |
|
||
| ② bypass / per-app on | 0 | no | yes (Layer A) |
|
||
| ③ processed / per-app off | filter hop + ~2 ms lookahead | yes (Layer C hard tier) | no |
|
||
| ④ processed / per-app on | filter hop + ~2 ms lookahead | yes (Layer C hard tier) | yes (Layer A) |
|
||
|
||
The two flags are independent. A competitive game's typical config
|
||
is ①: zero Headroom involvement in its audio. A user concerned about
|
||
notification dings on top of that game would put Discord on ② or ④
|
||
(so notifications get tamed via Discord's own `channelVolumes`)
|
||
while leaving the game on ①.
|
||
|
||
```
|
||
headroom-core (daemon, one process)
|
||
• per-app level controllers (Layer A)
|
||
• routing engine + preferred_real_sink (Layer B)
|
||
• slow AGC loop, profile manager (Layer C)
|
||
• IPC server
|
||
│
|
||
▼
|
||
$XDG_RUNTIME_DIR/headroom/control.sock
|
||
│
|
||
┌───────────┴───────────┐
|
||
▼ ▼
|
||
headroom CLI third-party clients
|
||
(Qt panel, widgets, …)
|
||
```
|
||
|
||
See §4 for Layer A's mechanics and §5 for the PipeWire-level details
|
||
of Layers B and C.
|
||
|
||
### One virtual sink, one daemon process
|
||
|
||
- `headroom-processed` — virtual sink. Set as the system default so
|
||
new streams land in it by default. Its monitor is captured by
|
||
`headroom-filter`, pushed through the DSP graph, and emitted to the
|
||
current `preferred_real_sink`.
|
||
- **No bypass sink.** Streams marked `route = "bypass"` are pointed
|
||
directly at `preferred_real_sink` via a `target.object` metadata
|
||
write. They pay zero added latency vs. running without Headroom
|
||
installed at all — there's no extra graph hop, no extra DSP. The
|
||
word "bypass" in the profile DSL means "route directly to the real
|
||
sink, untouched."
|
||
- The **daemon** owns:
|
||
- the one virtual sink (created on startup, torn down on exit);
|
||
- the filter — a single `pw_filter` node (`headroom-filter`) with
|
||
four mono DSP ports (input FL/FR + output FL/FR) running on
|
||
PipeWire's realtime data thread. Wrapped by the in-house
|
||
`pipewire-filter` workspace crate because pipewire-rs 0.8 doesn't
|
||
expose `pw_filter`. WirePlumber doesn't auto-link `pw_filter`,
|
||
so the routing engine creates the `processed.monitor → filter
|
||
in` and `filter out → preferred_real_sink` links explicitly via
|
||
`link-factory` (the same primitive the routing engine already
|
||
uses for stream re-pinning in Phase 4k);
|
||
- one **`AppLevelController`** per managed app stream (§4), each
|
||
with its own passive `pw_stream` tap, peak/RMS envelopes, and
|
||
`Props.channelVolumes` writer. Created/destroyed on stream
|
||
lifecycle events.
|
||
- **`preferred_real_sink` tracking.** The daemon watches the
|
||
`default.audio.sink` metadata key. When the user changes the
|
||
system default (via pavucontrol, `wpctl set-default`, etc.) to a
|
||
hardware sink, the daemon (a) treats that sink as the new
|
||
`preferred_real_sink`, (b) re-links `headroom-filter`'s playback
|
||
stream to it, and (c) rewrites `target.object` for every
|
||
currently-bypassed stream so they follow. Hotplug / Bluetooth
|
||
handoffs use the same machinery.
|
||
- the slow AGC loop (reads loudness, writes gain target into the
|
||
filter via an `rtrb` channel);
|
||
- the routing engine (subscribes to the PipeWire registry, evaluates
|
||
rules on new streams, writes `target.object` to the `default`
|
||
metadata: either `headroom-processed` for processed streams or
|
||
`preferred_real_sink` for bypassed streams);
|
||
- the IPC server.
|
||
|
||
### Why no `headroom-bypass` sink
|
||
|
||
An earlier iteration of the design had a second virtual sink
|
||
(`headroom-bypass`) that loopback'd to the real sink, so "bypassed"
|
||
streams routed to it. This added one PipeWire quantum of latency to
|
||
every bypassed stream for no functional benefit — `module-loopback`
|
||
buffers across the quantum boundary even when the DSP is a no-op.
|
||
Direct routing via `target.object` skips the hop entirely. The win is
|
||
real for competitive games, DAW monitoring, and music players: they
|
||
now ride exactly the same path they'd take if Headroom weren't
|
||
installed.
|
||
|
||
### Why this is *not* the "analytical sink + adjust master volume"
|
||
### shape originally proposed
|
||
|
||
Volume control via SPA `Props` updates is not sample-accurate. A true-peak
|
||
limiter needs a small internal delay line so gain reduction is applied
|
||
to the same samples that were analyzed. Therefore the **brickwall must
|
||
be inline**. The analytical-monitor approach is still used — for the
|
||
*slow* AGC loop, where multi-second time constants make control-plane
|
||
latency irrelevant — but it cannot own the ceiling.
|
||
|
||
### Why a native `pw_filter` node, not an LV2 plugin in `module-filter-chain`
|
||
|
||
LV2 is not native to PipeWire; it's one of several plugin formats
|
||
`module-filter-chain` happens to host (via lilv). Using LV2 would split
|
||
Headroom into a plugin + a daemon + a filter-chain JSON, pull in a lilv
|
||
runtime, and force gain-target updates through a 32-bit-float control-port
|
||
abstraction. A native `pw_filter` node is the same primitive
|
||
`module-filter-chain` itself uses internally, but written directly in
|
||
Rust, in the same process as the rest of the daemon. One binary, no IPC
|
||
for parameter updates, idiomatic Rust audio thread. An LV2 wrapper of
|
||
`headroom-dsp` remains a viable optional deliverable for use in DAWs.
|
||
|
||
**Bus filter implementation — the in-house `pipewire-filter` crate.**
|
||
pipewire-rs 0.8 ships `Stream` but not `Filter`. Since `headroom-core`
|
||
declares `#![forbid(unsafe_code)]`, the unsafe FFI lives in a separate
|
||
small workspace crate (`crates/pipewire-filter/`), mirroring
|
||
pipewire-rs's `Stream` patterns (heap-boxed `FilterListener<D>`,
|
||
RAII `Buffer<'p>`, `// SAFETY:` on every `unsafe` block). The crate
|
||
covers exactly the events Headroom needs (`process`, `state_changed`,
|
||
`param_changed`). Audited by Codex on landing; the two findings that
|
||
would have been real bugs in our use (over-permissive `Sync` on
|
||
`PortData`; passing the `error` ptr to the old state in
|
||
`state_changed`) were applied. Architectural rule: when pipewire-rs
|
||
later ships its own `Filter`, switch to it and delete this crate.
|
||
|
||
**Earlier shape (now retired): two `pw_stream`s + a ring.** The
|
||
dual-`pw_stream` arrangement we shipped in Phase 3 had no PipeWire
|
||
graph dependency between capture and playback, so the scheduler was
|
||
free to fire playback before capture in the same quantum →
|
||
ring-empty → tremolo at quantum cadence. The mitigation was a
|
||
65k-sample SPSC ring sized for 4× the worst-case buffer
|
||
(`clock.quantum-limit` × `CHANNELS`), adding ~340 ms average
|
||
latency. `pw_filter` removes the ring entirely: a single node has
|
||
its own input→process→output ordering by construction (the same
|
||
ordering `module-filter-chain` relies on). See
|
||
[[headroom-pipewire-gotchas]] #14, #17, #18 for the full diagnostic
|
||
trail.
|
||
|
||
---
|
||
|
||
## 3. DSP
|
||
|
||
### 3.1 Two-tier true-peak limiter (`headroom-dsp::limiter`)
|
||
|
||
The limiter has **two parallel tiers** sharing the same upsampler,
|
||
downsampler, delay line, and sliding peak buffer. Both run at the
|
||
oversampled rate.
|
||
|
||
**Hard tier — the safety contract.** Output ceiling default
|
||
**−0.1 dBTP**, configurable. Instant attack on the gain envelope plus a
|
||
brief hold and a slow release. Two defensive `clamp` stages downstream
|
||
(once in the oversampled domain, once at the input rate after
|
||
downsampling) guarantee the contract numerically — the envelope can
|
||
misbehave and the contract still holds. Never bypassed, never
|
||
disabled.
|
||
|
||
**Contract scope (caveat).** The ≤ −0.1 dBTP guarantee holds at the
|
||
*filter's output*, not at the speaker. The bus filter is hardcoded
|
||
F32 stereo @ 48 kHz (`headroom-dsp::limiter`'s 4× oversampler is
|
||
sized for 48 k); when the real sink negotiates a different rate
|
||
(44.1 kHz, 96 kHz, 192 kHz), PipeWire inserts a downstream
|
||
resampler between `filter.playback` and the sink. Polynomial /
|
||
windowed-sinc resamplers can elevate inter-sample peaks slightly
|
||
through their own reconstruction, so the limiter's true-peak
|
||
guarantee leaks across that resampling stage. In practice the
|
||
elevation is small (a few tenths of a dB worst case for a clean
|
||
band-limited resampler), and the contract still holds at the bus
|
||
output where headroom is in control. **For the contract to hold
|
||
end-to-end the filter would need to match the real sink's rate
|
||
and rebuild its DSP coefficients on rate-change** — that's the
|
||
v1 work tracked as PLAN §11 "filter rate matching" (deferred from
|
||
8d, gated on a multi-rate hardware test bench).
|
||
|
||
**Soft tier — the comfort cap.** Targets a *dynamic* ceiling computed
|
||
as `program_lufs + max_psr_db`. Smooth attack/release envelope so the
|
||
gain reduction sounds like volume riding, not a slap. Pulls transients
|
||
to a comfortable peak-to-loudness ratio (default 14 dB) *before* they
|
||
ever threaten the hard ceiling. When the AGC hasn't yet provided a
|
||
program loudness (startup, after reset), the soft tier falls back to a
|
||
static ceiling. Disabled by omitting `[limiter.soft]` in a profile —
|
||
useful for the `transparent` profile where users want pure brickwall
|
||
behavior.
|
||
|
||
Algorithm (per oversampled sample, after upsampling):
|
||
|
||
1. Push raw `|s|` into the sliding-window peak buffer; read the
|
||
max-of-window.
|
||
2. **Soft tier** computes target = `soft_ceiling / window_peak` (clamped
|
||
to ≤ 1), runs through the smooth attack/release envelope, yields
|
||
`soft_gain`.
|
||
3. **Hard tier** predicts the worst-case effective peak after the soft
|
||
tier acts (max of `window_peak * soft_gain` and the asymptote
|
||
`min(window_peak, soft_ceiling)`), then sizes `hard_target` to keep
|
||
that under the hard ceiling. Instant attack, hold, exponential
|
||
release. Yields `hard_gain`.
|
||
4. `total_gain = min(soft_gain, hard_gain)`.
|
||
5. Multiply the delayed sample by `total_gain`.
|
||
6. Clamp at hard ceiling (defense-in-depth).
|
||
7. Downsample, clamp again at hard ceiling at the input rate.
|
||
|
||
When the soft tier is doing its job, the hard tier's "predicted-post-soft"
|
||
target stays above 1.0 and the hard tier never engages. When the soft
|
||
tier is mid-attack (peak just arrived), the hard tier snaps in as a
|
||
safety, then releases as the soft tier catches up.
|
||
|
||
The compressor and AGC stages run *before* the limiter.
|
||
|
||
### 3.2 Feed-forward compressor (`headroom-dsp::compressor`)
|
||
|
||
Standard shape: log-domain detector (peak or RMS, switchable) →
|
||
ratio + soft knee → attack/release envelope smoother → makeup gain →
|
||
linear gain → apply to (small) delayed input. ~150 lines of clean code.
|
||
|
||
Defaults aimed at "gentle, transparent": threshold −24 dBFS,
|
||
ratio 2.5:1, knee 6 dB, attack 10 ms, release 100 ms, makeup auto.
|
||
|
||
### 3.3 Slow AGC (`headroom-core::agc`)
|
||
|
||
Algorithmic descendant of EasyEffects' `autogain.cpp`. Runs *outside*
|
||
the audio thread, on a ~50 ms control tick.
|
||
|
||
- Feeds the audio thread's monitor tap into `ebur128` with
|
||
`Mode::M | S | I | TRUE_PEAK`.
|
||
- Computes `target_gain_dB = target_lufs − measured_lufs`.
|
||
- Smooths with separate attack/release coefficients (leaky integrator).
|
||
- Gates when momentary loudness < silence threshold.
|
||
- Soft-clamps so the AGC can never push more than ±N dB (profile knob).
|
||
- Writes the new gain target into the audio thread via an `rtrb` queue.
|
||
|
||
The AGC's gain is applied *before* the compressor. The compressor and
|
||
limiter still own their own behaviour and ceilings.
|
||
|
||
### 3.4 Measurement: `ebur128`
|
||
|
||
`Mode::M | S | I | TRUE_PEAK`. EBU TECH 3341/3342 conformant via the
|
||
`ebur128` crate. Constructed on the daemon thread; fed from a ring-buffer
|
||
consumer that pulls from the audio thread. The audio thread allocates
|
||
nothing.
|
||
|
||
This is **bus-level** measurement only — used to drive the slow AGC
|
||
loop and meter the processed sink output. Per-app measurement (§4)
|
||
uses a different, much cheaper metric.
|
||
|
||
---
|
||
|
||
## 4. Per-application level control (Layer A)
|
||
|
||
An opt-in, near-zero-latency feedback loop that watches each managed
|
||
application's output stream and adjusts its `Props.channelVolumes`
|
||
multiplier in response to **two parallel level metrics**:
|
||
|
||
- a **fast peak envelope** that catches short bursts and sustained
|
||
loud passages (think: a notification ding, a video that just got
|
||
louder), and
|
||
- a **slow RMS envelope** that catches *sustained loudness*
|
||
mismatches (think: "Discord is permanently louder than everything
|
||
else even when nobody's shouting").
|
||
|
||
A stream's applied gain reduction is `max(peak_reduction,
|
||
rms_reduction)` — whichever path is asking for more cut wins, and
|
||
recovery only happens when *both* paths agree the stream has settled.
|
||
This is the layer's whole point: the peak path handles transients
|
||
within one quantum; the RMS path keeps long-term inter-app loudness
|
||
balanced. Neither alone is enough.
|
||
|
||
Orthogonal to bus routing — a stream can be processed *or* bypassed
|
||
*and* level-controlled independently. Its goal is "tame noisy apps
|
||
without startling the listener and without making the chronic
|
||
loudmouth permanently dominate," while the signal path itself stays
|
||
untouched.
|
||
|
||
### 4.1 Why this is zero-latency
|
||
|
||
The per-app multiplier is the `channelVolumes` value PipeWire already
|
||
applies inside the app's stream node — it's the same number
|
||
`pavucontrol`'s per-app slider writes to. Adjusting it doesn't insert
|
||
a graph node; nothing new sits between the app and its destination
|
||
sink. The only cost is that **the analysis happens via a sibling
|
||
fanout link**, not in the playback path: PipeWire schedules fanout
|
||
consumers in parallel within the same quantum, so the playback path's
|
||
timing is identical to the no-tap case.
|
||
|
||
```
|
||
┌──► passive tap (analysis only)
|
||
│ │
|
||
│ ▼
|
||
│ peak + RMS envelopes
|
||
│ (audio thread, sub-ms)
|
||
app stream ──────┤ │
|
||
(output port) │ ▼
|
||
│ rtrb push
|
||
│ │
|
||
│ ▼
|
||
│ AppLevelController (daemon thread)
|
||
│ │
|
||
│ │ Props.channelVolumes write
|
||
│ ▼ (back into the app stream node)
|
||
│ ┌─────────────────────┐
|
||
└──►│ app stream multiplies
|
||
│ by channelVolumes, │──► (its sink — Layer B)
|
||
│ then publishes. │
|
||
└─────────────────────┘
|
||
```
|
||
|
||
**Important correction (2026-05-22):** the diagram above shows the
|
||
tap branching off the source *before* the `channelVolumes`
|
||
multiplier, but in practice PipeWire's standard adapter applies
|
||
`channelVolumes` *inside* the source node — anything reading the
|
||
output port sees the post-attenuation signal. Untreated, this
|
||
closes a feedback loop on the controller: write reduction → tap
|
||
measures attenuated signal → envelopes release → "no reduction
|
||
needed" → controller stops writing, gain freezes wherever it last
|
||
was, dynamics no longer tracked. The implementation compensates by
|
||
dividing incoming `peak_lin` / `mean_sq_lin` by `last_written_lin`
|
||
(and its square) inside `AppLevelController::process_block`,
|
||
recovering the pre-attenuation signal estimate. Below a floor of
|
||
−40 dB applied gain (`GAIN_COMPENSATION_FLOOR = 0.01`) the
|
||
compensation is skipped — a fully-muted stream would otherwise
|
||
amplify floor noise back to max-cut and lock the user out of
|
||
unmuting. See `app_level.rs` and the per-app-gain memory note for
|
||
the rationale and the corner cases.
|
||
|
||
**Source-suspension catch-up.** When the source node suspends
|
||
(PipeWire's adapter stops delivering buffers — Strawberry between
|
||
tracks, the user pausing, a screensaver kicking in) the tap's
|
||
`process_block` doesn't run, so the envelopes don't release and
|
||
the controller carries stale attenuation into the next stretch of
|
||
audio. `AppLevelController::tick_silent(now)` — called from the
|
||
Layer A drain timer on every pass — advances envelopes through
|
||
silent gaps by feeding (0, 0) inputs at the controller's block
|
||
period. Bounded by `MAX_SILENT_CATCHUP_BLOCKS` (~10 s); past that
|
||
the envelopes have fully released anyway and we short-circuit via
|
||
`envelopes.reset()`. The drain pass runs at 5 ms cadence, so
|
||
post-resume audio sees a fresh controller within one tick.
|
||
|
||
**Per-app `user_ceiling` persistence across stream lifecycles.**
|
||
Apps like Strawberry create a fresh `Stream/Output/Audio` node
|
||
per track. PipeWire carries over the previous node's
|
||
`Props.channelVolumes` — frequently our own last-written value
|
||
from the prior track. The new `managed_stream`'s controller is
|
||
fresh (`last_written_lin = 1.0`); without intervention, the first
|
||
param event from `subscribe_params(Props)` fires with the
|
||
inherited daemon-value, the echo check fails (diff vs 1.0 is
|
||
huge), `on_external_change` misattributes it as user-set, and the
|
||
ceiling gets locked at whatever the previous track's reduction
|
||
was. `RoutingState` therefore holds a `persisted_ceilings` map
|
||
keyed by `app_label` (process_binary, falling back to
|
||
application_name); on managed_stream teardown we save the
|
||
controller's current `user_ceiling_lin`, and on spawn we
|
||
`AppLevelController::restore_state(ceiling, now)` plus write the
|
||
ceiling to `Props.channelVolumes` BEFORE calling
|
||
`subscribe_params(Props)`. The ordering is load-bearing —
|
||
writing after subscribe races against the initial-state replay
|
||
and the bug recurs. First-time apps (no persisted entry) still
|
||
treat the first observation as user-set, which is correct
|
||
because no daemon-value can have been inherited yet.
|
||
|
||
### 4.2 The metrics: peak + RMS, no LUFS
|
||
|
||
LUFS is the wrong measurement here. Its shortest window (momentary,
|
||
400 ms) blurs out exactly the transients we want to catch, and the
|
||
K-weighting filter adds CPU for no benefit when we're trying to react
|
||
fast. We also explicitly want a *second* path that targets sustained
|
||
loudness — for that, plain mean-square RMS is the right cheap stand-in,
|
||
not LUFS.
|
||
|
||
| Metric | Window | Job |
|
||
|---|---|---|
|
||
| **Peak envelope** — `max(\|samples\|)` per block, smoothed | ~100 ms attack window, ~500 ms release | Fast: catches a notification ding, a clip getting louder, a partner standing up and shouting. Triggers cut on `peak_threshold_db` (default −6 dBFS). |
|
||
| **RMS envelope** — block mean-square, smoothed | ~1–2 s | Slow: catches "this app is just chronically louder than everything else." Triggers cut on `rms_target_db` (default ≈ −20 dBFS RMS). |
|
||
|
||
Both are computed from the *same* raw buffer in the audio thread, so
|
||
the audio-thread cost is one additional MAC accumulator and a max-
|
||
scan per sample. Cost analysis in §4.7.
|
||
|
||
### 4.3 Architecture
|
||
|
||
For each managed playback stream (matched by routing rule — see §6):
|
||
|
||
1. **Audio thread (tap stream's process callback):**
|
||
- Pull the buffer from the fanout link.
|
||
- `peak = max(|samples|)` over the block.
|
||
- `mean_sq = Σ(x*x) / n` over the block.
|
||
- Push `{node_id, peak, mean_sq}` to a per-stream `rtrb`.
|
||
2. **Daemon thread (`AppLevelController` per stream):**
|
||
- Drain the rtrb.
|
||
- Update peak envelope (one-pole, fast α — attack within a block,
|
||
release ~500 ms).
|
||
- Update RMS envelope (one-pole, slow α — window ~1–2 s).
|
||
- Compute `peak_reduction_db` and `rms_reduction_db` independently,
|
||
then `proposed = max(peak_reduction_db, rms_reduction_db)`.
|
||
- Smooth toward `proposed`.
|
||
- If the smoothed value is significantly different from
|
||
last-written AND we're not rate-limited (~10 Hz max writes per
|
||
stream), submit `Props.channelVolumes` update.
|
||
|
||
The recovery condition is intentionally *both*-paths-agree: a
|
||
release on the peak path only counts toward unwinding gain
|
||
reduction if the RMS path also reads quiet. This avoids the pumping
|
||
artefact where a transient-heavy stream would rapidly release
|
||
between transients only to be slapped back down on the next one.
|
||
|
||
### 4.4 Honouring user-set volumes
|
||
|
||
The daemon subscribes to `Props` param-change events on each managed
|
||
stream. When a `channelVolumes` change arrives that's meaningfully
|
||
different from `last_written_volume`, it wasn't us — the user
|
||
adjusted via pavucontrol, a hotkey, an app's own UI, etc. The
|
||
controller then either:
|
||
|
||
- **defers entirely** (stops adjusting the stream until the user opts
|
||
back in via `headroom per-app reset <app>`), or
|
||
- **treats the user value as a ceiling** (continues to cut on spikes
|
||
but never raises above what the user wanted).
|
||
|
||
Default is the ceiling behaviour — it's the principle-of-least-surprise
|
||
choice. Users who want strict deference set a profile flag.
|
||
|
||
#### A historical concern: apps that fight back
|
||
|
||
Some PulseAudio-era apps (Discord most famously) used to read and
|
||
re-assert their own `channelVolumes` periodically, fighting any
|
||
external volume manager. The pattern produced a visible ping-pong
|
||
loop and effectively disabled per-app management.
|
||
|
||
The pattern is largely absent from modern PipeWire-native and
|
||
Electron-based apps in 2024+: in-app sliders write `channelVolumes`
|
||
only on user interaction, not on a timer. From Headroom's
|
||
perspective, those user-interaction writes are indistinguishable from
|
||
a pavucontrol slider move — both are legitimate external changes the
|
||
deference policy correctly yields to.
|
||
|
||
If a fight-back app does appear, the **ceiling** deference mode
|
||
degrades gracefully:
|
||
|
||
- App produces hot output → Headroom cuts to 0.5.
|
||
- App writes `channelVolumes = 1.0` back over our cut.
|
||
- Headroom detects the external change, marks the new value
|
||
(1.0) as the ceiling, and stops actively writing.
|
||
- Layer A becomes effectively inert for that stream — there is no
|
||
ping-pong, the user just doesn't get the per-app cut they were
|
||
hoping for. The bus-level Layer C limiter (if engaged) still
|
||
enforces the absolute output ceiling regardless.
|
||
|
||
Explicit pattern detection and rate-limiting of ceiling updates
|
||
(e.g., "ignore ceiling-restoring writes that arrive within N seconds
|
||
of our own writes") is deferred to v1, pending evidence from
|
||
real-world testing that any modern app warrants it. The graceful
|
||
degradation property is the v0 contract.
|
||
|
||
### 4.5 Reaction-time honesty
|
||
|
||
The signal-path latency is **zero**. The reaction latency to a spike
|
||
is bounded by:
|
||
|
||
```
|
||
spike in block N ─► analysis (same quantum)
|
||
─► rtrb push (ns)
|
||
─► controller computes (μs)
|
||
─► Props write to pw main loop
|
||
─► applied to block N+1 of the app stream
|
||
```
|
||
|
||
So sustained loud passages are attenuated within ~one quantum
|
||
(5–20 ms depending on the system's quantum). **Isolated one-block
|
||
transients still leak through** — the first block carrying the spike
|
||
plays with the old gain; subsequent blocks see the reduction. This
|
||
is the irreducible cost of "no lookahead allowed." For absolute
|
||
spike prevention you need lookahead, which means latency, which
|
||
contradicts the constraint of this layer.
|
||
|
||
On the processed route the bus-level Layer C limiter (§3.1) catches
|
||
anything that would exceed the ceiling regardless of whether Layer A
|
||
has caught up; on bypass routes Layer A is the only thing watching, so
|
||
isolated one-block transients reach the real sink. Layer A reduces
|
||
*workload* on Layer C where Layer C is in the path, and is a
|
||
best-effort comfort filter where it isn't; it doesn't replace the
|
||
limiter.
|
||
|
||
### 4.6 Layered budget summary
|
||
|
||
| Layer | Metric | Time scale | Signal-path latency added |
|
||
|---|---|---|---|
|
||
| A: per-app peak | sample peak per block | tens of ms | **0** |
|
||
| A: per-app RMS | block mean-square | seconds | **0** |
|
||
| C: inline soft tier | true-peak, lookahead | sub-ms | shared with hard tier |
|
||
| C: inline hard tier | true-peak, lookahead | sub-ms | ~2 ms lookahead |
|
||
| C: bus AGC | LUFS (ebur128) | many seconds | — (control plane only) |
|
||
|
||
Five distinct jobs, five distinct time scales, no two layers
|
||
duplicate each other. Layer A is the cheapest line of defense and
|
||
the only one that costs zero latency on the audio path.
|
||
|
||
### 4.7 Resource budget per stream
|
||
|
||
| | No TRUE_PEAK (recommended for Layer A) |
|
||
|---|---|
|
||
| Audio thread per quantum | ~10 μs (peak + RMS pass) |
|
||
| Daemon thread per measurement | ~few μs (HashMap lookup + envelope math) |
|
||
| Memory per controller | ~100 bytes |
|
||
| Memory per ebur128 (if enabled) | — N/A; Layer A doesn't use ebur128 |
|
||
|
||
At realistic stream counts (2–5 managed apps): **<0.5% CPU total,
|
||
<1 KB RAM total**. Doesn't move the needle.
|
||
|
||
### 4.8 Lifecycle
|
||
|
||
- **Stream appears** with `media.class = Stream/Output/Audio`
|
||
matching a `[[per_app.rules]]` pattern: create tap link
|
||
(`pw_link_create`), spawn controller, register rtrb.
|
||
- **Stream disappears** (`pw_registry::global_removed`): tear down
|
||
tap, drop controller, clean up rtrb.
|
||
- **App restarts**: new `node_id` → fresh controller. User-volume
|
||
deference state is per-stream-instance, which is the right default.
|
||
|
||
---
|
||
|
||
## 5. PipeWire integration
|
||
|
||
### 5.1 Sinks
|
||
|
||
Created on daemon startup by emitting a `pipewire.conf.d` fragment into
|
||
`$XDG_CONFIG_HOME/pipewire/pipewire.conf.d/headroom.conf` (if not already
|
||
present) and reloading. Alternative: create them at runtime via
|
||
`pw-loopback` equivalents using `pipewire-rs`. v0 ships with the
|
||
runtime-creation path so the install footprint is "one binary, one
|
||
unit file."
|
||
|
||
Sink properties:
|
||
|
||
- `headroom-processed`: `node.name=headroom-processed`,
|
||
`media.class=Audio/Sink`, `audio.position=[FL,FR]`,
|
||
`node.description="Headroom (processed)"`. Promoted to system
|
||
default on startup so new streams land in it by default.
|
||
|
||
There is no second sink. Bypassed streams are routed directly at the
|
||
current `preferred_real_sink` via `target.object` metadata writes
|
||
(see §4.3).
|
||
|
||
### 5.2 The filter
|
||
|
||
One `pw_filter` node (`headroom-filter`), wrapped by the in-house
|
||
`pipewire-filter` workspace crate, with **four mono DSP ports** —
|
||
the canonical shape `module-filter-chain` uses:
|
||
|
||
- `input_FL` / `input_FR` — `Direction::Input`, `format.dsp = "32 bit
|
||
float mono audio"`, `audio.channel = FL|FR`. The routing engine
|
||
links these to the corresponding monitor ports on
|
||
`headroom-processed`.
|
||
- `output_FL` / `output_FR` — `Direction::Output`, same format
|
||
properties. The routing engine links these to the corresponding
|
||
input ports on `preferred_real_sink`.
|
||
|
||
Single `process` callback per quantum: dequeue all four mono
|
||
buffers, run AGC gain → compressor → limiter on the
|
||
`(in_l[i], in_r[i])` pair, write `(out_l[i], out_r[i])`. Queue all
|
||
four buffers back via `Buffer::Drop`. Allocation-free; guarded by
|
||
`assert_no_alloc` in debug. Parameter updates arrive over an `rtrb`
|
||
SPSC queue from the control thread.
|
||
|
||
**Routing.** WirePlumber's policy does not auto-link `pw_filter`
|
||
nodes (the `pw_filter` API has no `AUTOCONNECT` flag and WP has no
|
||
default linking heuristic for hybrid input+output nodes). The
|
||
routing engine therefore wires the filter explicitly:
|
||
`try_capture_filter_playback` matches the filter's
|
||
registry global by `node.name`, then enqueues two routes through
|
||
the existing `pending_routes` machinery — one source-=-processed /
|
||
target-=-filter for the input legs, one source-=-filter /
|
||
target-=-real_sink for the output legs. The
|
||
`pair_count >= 2` ordinal pairing in `apply_pending_routes`
|
||
(FL→FL, FR→FR) is exactly the per-channel mono structure above.
|
||
|
||
The filter is resolved as a routing target via
|
||
`resolve_routing_target` / `is_routing_target` helpers that check
|
||
`filter_playback_id` ahead of `sinks_by_name` — the filter is
|
||
**not** registered as a fake `Audio/Sink`, so the map stays
|
||
genuinely sink-only.
|
||
|
||
**Rebuild on rate change.** When the real sink's negotiated rate
|
||
changes (`PwCommand::RebuildFilter`), the routing engine clears
|
||
`filter_playback_id` *before* dropping the old filter so the new
|
||
filter's registry global is recaptured even if its `global_add`
|
||
races ahead of the old `global_remove`.
|
||
|
||
### 5.3 Routing
|
||
|
||
- On startup, write `default.audio.sink` in the `default` metadata to
|
||
point at `headroom-processed` so new streams default to the
|
||
processor. The previous value (the user's hardware sink) is
|
||
captured as the initial `preferred_real_sink`.
|
||
- Subscribe to `pw_registry` global-added events.
|
||
- On any new node with `media.class == "Stream/Output/Audio"` and
|
||
`node.dont-move != true`:
|
||
- Read `application.process.binary`, `application.name`,
|
||
`pipewire.access.portal.app_id`, `media.role`.
|
||
- Evaluate routing rules from the active profile to decide
|
||
`processed` vs. `bypass`.
|
||
- Write `target.object` into the `default` metadata for the new
|
||
stream:
|
||
- `processed` → `headroom-processed`'s `object.serial`.
|
||
- `bypass` → `preferred_real_sink`'s `object.serial`.
|
||
WirePlumber honours this for any movable stream.
|
||
- Watch `default.audio.sink` metadata changes. When the user switches
|
||
the system default to a hardware sink, the daemon:
|
||
- records that sink as the new `preferred_real_sink`,
|
||
- re-links `headroom-filter`'s playback stream to it,
|
||
- rewrites `target.object` for every currently-bypassed stream so
|
||
they follow the new hardware,
|
||
- re-asserts `headroom-processed` as the *default for new streams*
|
||
(so subsequent app launches still land in the processor).
|
||
- Hotplug (sink appears/disappears) goes through the same code path.
|
||
|
||
### 5.4 Stream identification
|
||
|
||
| Property | Reliability | Use |
|
||
|---|---|---|
|
||
| `application.process.binary` | high (kernel-sourced) | primary key |
|
||
| `application.name` | medium | secondary / display |
|
||
| `pipewire.access.portal.app_id` | high (Flatpak only) | match sandboxed apps |
|
||
| `media.role` | low (most apps omit) | bonus signal only |
|
||
| `media.class` | structural | gate to playback streams |
|
||
|
||
---
|
||
|
||
## 6. Profiles
|
||
|
||
Profile files live in `$XDG_CONFIG_HOME/headroom/profiles/*.toml`,
|
||
shadowing shipped defaults in `/usr/share/headroom/profiles/` by
|
||
name. Profile files are user-authored configuration — they're the
|
||
thing you open in `$EDITOR`. File-watcher hot-reload via
|
||
`notify-debouncer-mini` is planned; in the meantime `profile.reload`
|
||
re-scans on demand.
|
||
|
||
Daemon-managed user state — active profile name, per-app route
|
||
overrides made via `route.set`, dotted-key tweaks made via
|
||
`setting.set`, the global bypass flag — is *not* mixed in with the
|
||
profile TOMLs. It lives in a single `overlay.toml` at
|
||
`$XDG_STATE_HOME/headroom/overlay.toml`, written atomically by the
|
||
daemon (stage to `overlay.toml.tmp-…`, then rename). The overlay
|
||
rides on top of whichever profile is active, so `route.set obs
|
||
bypass` persists across `profile.use night` — that's a user
|
||
preference, not a tweak of `default`. If the overlay names an active
|
||
profile that's not on disk, the daemon falls back to the built-in
|
||
default and surfaces a warning; it does not refuse to start.
|
||
|
||
Each profile is a complete listening scenario. Schema (`headroom-core::profile`):
|
||
|
||
```toml
|
||
name = "default"
|
||
description = "Gentle transparent processing for everyday use."
|
||
|
||
[agc]
|
||
enabled = true
|
||
target_lufs = -18.0 # ITU-R BS.1770 integrated target
|
||
attack_ms = 2000.0
|
||
release_ms = 800.0
|
||
silence_threshold_lufs = -70.0
|
||
max_boost_db = 12.0
|
||
max_cut_db = 12.0
|
||
|
||
[compressor]
|
||
enabled = true
|
||
detector = "peak" # "peak" | "rms"
|
||
threshold_db = -24.0
|
||
ratio = 2.5
|
||
knee_db = 6.0
|
||
attack_ms = 10.0
|
||
release_ms = 100.0
|
||
makeup_db = "auto" # number or "auto"
|
||
|
||
[limiter]
|
||
ceiling_dbtp = -0.1
|
||
lookahead_ms = 2.0
|
||
release_ms = 80.0
|
||
hold_ms = 5.0
|
||
oversample = 4 # 1 | 2 | 4 | 8 (1 disables ISP detection)
|
||
link = "stereo" # "stereo" | "dual-mono"
|
||
|
||
[meters]
|
||
publish_hz = 20.0
|
||
|
||
[[rules]]
|
||
match = { process_binary = ["spotify", "mpv", "ardour", "reaper", "qpwgraph"] }
|
||
route = "bypass"
|
||
|
||
[[rules]]
|
||
match = { process_binary = ["firefox", "chromium", "google-chrome", "Discord", "discord", "element-desktop", "Slack", "zoom", "WEBRTC VoiceEngine"] }
|
||
route = "processed"
|
||
|
||
[default_route]
|
||
route = "processed" # safe default: anything unmatched is processed
|
||
|
||
# ----------------------------------------------------------------------
|
||
# Per-application level control (Layer A). Orthogonal to routing — you
|
||
# can enable per-app on bypass-routed streams to get zero-latency
|
||
# level control (e.g. tame Discord notifications without touching
|
||
# the game's audio path).
|
||
# ----------------------------------------------------------------------
|
||
[per_app]
|
||
enabled = true # master switch; false disables Layer A entirely
|
||
default_enabled = false # for streams not matched by any rule below
|
||
|
||
# Per-rule knobs. Matches use the same key set as [[rules]] above.
|
||
[[per_app.rules]]
|
||
match = { process_binary = ["Discord", "discord", "element-desktop", "Slack", "zoom"] }
|
||
enabled = true
|
||
peak_threshold_db = -6.0 # short-window peak above this triggers cut
|
||
rms_target_db = -20.0 # long-term RMS target (slow path)
|
||
max_cut_db = 12.0 # never cut more than this
|
||
peak_attack_ms = 5.0
|
||
peak_release_ms = 500.0
|
||
rms_window_ms = 1500.0
|
||
# Controller-side knobs (all optional; defaults shown).
|
||
smoother_ms = 30.0 # anti-bounce smoother on max(peak,rms)
|
||
write_db_threshold = 0.5 # dB diff below which we don't fire a write
|
||
min_write_interval_ms = 100.0 # min ms between writes per stream (10 Hz cap)
|
||
defer_to_user = "ceiling" # "ceiling" | "strict"
|
||
|
||
[[per_app.rules]]
|
||
match = { process_binary = ["firefox", "chromium", "google-chrome"] }
|
||
enabled = true
|
||
peak_threshold_db = -3.0 # browsers run hotter; raise the trigger
|
||
rms_target_db = -18.0
|
||
|
||
# Music, DAWs, games default to per-app off — they're either trusted
|
||
# to set their own level or routed bypass for a reason.
|
||
[[per_app.rules]]
|
||
match = { process_binary = ["spotify", "mpv", "ardour", "reaper", "qpwgraph", "carla"] }
|
||
enabled = false
|
||
```
|
||
|
||
### Shipped profiles
|
||
|
||
| name | one-liner |
|
||
|---|---|
|
||
| `default` | Gentle transparent processing, sensible for daily use. |
|
||
| `night` | Aggressive: −20 LUFS, 4:1, fast release, narrow dynamic range. |
|
||
| `speech` | VoIP-focused; short attack, fast release, controlled dynamic range. |
|
||
| `transparent` | Limiter only. Compressor + AGC bypassed. Safety net only. |
|
||
| `bypass-all` | Routes everything directly to the real sink. The kill switch. |
|
||
| `spike-protection` | Minimal processing; high-threshold catch only. Untouched audio, hard guard against blasts. |
|
||
| `movie` | Wide-DR film: lifts dialogue, keeps action punchy but bounded. |
|
||
| `music` | Inter-track loudness leveling; routes music players *through* the bus. |
|
||
| `podcast` | Spoken-word playback: even narration loudness, smooth and unfatiguing. |
|
||
| `commute` | Listening in noise: heavy normalization + boost, kept loud. |
|
||
| `gaming` | Latency-first: games bypass, voice chat processed, notifications tamed per-app. |
|
||
| `party` | Loud room playback (anti-`night`): maximum loudness, dynamics sacrificed. |
|
||
| `broadcast-14` | Normalizes everything to −14 LUFS (streaming loudness) so sources match. |
|
||
| `quiet-hours` | More aggressive than `night`: very low ceiling, near-flat dynamics. |
|
||
|
||
The limiter section of `bypass-all` is irrelevant in practice (nothing
|
||
flows through `headroom-processed`), but its ceiling field is still
|
||
respected as a fail-safe in case a stream lands on the processed sink
|
||
anyway.
|
||
|
||
---
|
||
|
||
## 7. IPC
|
||
|
||
Transport: Unix-domain socket, `SOCK_STREAM`, `0600`, at
|
||
`$XDG_RUNTIME_DIR/headroom/control.sock`.
|
||
|
||
Wire protocol: **see `IPC.md`** for the full normative schema.
|
||
Summary: u32 BE length prefix + UTF-8 JSON payload. Three message
|
||
shapes — `Request` (id + op + args), `Response` (id + result|error),
|
||
`Event` (topic + data). Subscribers signal interest by topic; events
|
||
fan out to all subscribers with bounded per-subscriber queues. Slow
|
||
subscribers have events **dropped** (overflow events count is itself
|
||
published on the `daemon` topic so clients know they fell behind).
|
||
|
||
The first-party Rust wrapper is `headroom-client`, mirroring how
|
||
[`niri-ipc`](https://github.com/YaLTeR/niri/tree/main/niri-ipc) wraps
|
||
Niri's socket: a thin, no-magic crate that re-exports the wire types
|
||
from `headroom-ipc` and adds a blocking `Client` (and an optional async
|
||
`AsyncClient` behind a feature flag).
|
||
|
||
---
|
||
|
||
## 8. CLI
|
||
|
||
```
|
||
headroom status # current profile, sinks, levels
|
||
headroom daemon # run the daemon (systemd Type=simple)
|
||
headroom profile list | use <name> | show [name]
|
||
headroom route list
|
||
headroom route set <app> processed|bypass # persists in user profile
|
||
headroom route unset <app>
|
||
headroom route stream <node-id> processed|bypass # ad-hoc
|
||
headroom set <key> <value> # tweak active profile in place
|
||
headroom get <key>
|
||
headroom bypass on|off # global kill switch
|
||
headroom reload # reload profiles from disk
|
||
headroom monitor # live meter TUI (uses subscribe)
|
||
```
|
||
|
||
CLI is sync, blocks on `UnixStream`. Talks the same JSON wire as any
|
||
other client.
|
||
|
||
---
|
||
|
||
## 9. Crates
|
||
|
||
```
|
||
headroom/
|
||
├── flake.nix # devshell + package
|
||
├── Cargo.toml # workspace
|
||
├── PLAN.md # this file
|
||
├── IPC.md # wire-protocol schema (normative)
|
||
├── README.md
|
||
└── crates/
|
||
├── headroom-dsp/ # AGC + compressor + limiter (pure DSP, no PW)
|
||
├── headroom-ipc/ # wire types, framing, serde; no I/O
|
||
├── headroom-client/ # blocking client (+ optional async); thin
|
||
├── headroom-core/ # daemon: PW integration, routing, profiles, IPC server
|
||
└── headroom-cli/ # `headroom` binary; depends on headroom-client
|
||
```
|
||
|
||
### External crates (final v0 dep list)
|
||
|
||
**Audio / DSP**
|
||
- `pipewire`, `libspa` — official PipeWire bindings.
|
||
- `ebur128` — measurement.
|
||
- `rtrb` — SPSC ring buffer (audio ↔ control).
|
||
- `basedrop` — RT-safe shared ownership.
|
||
- `assert_no_alloc` — debug-build tripwire.
|
||
|
||
**Plumbing**
|
||
- `serde`, `serde_json` — IPC + profile (de)serialization.
|
||
- `serde-toml` (`toml`) — profile files.
|
||
- `clap` (derive) — CLI.
|
||
- `tracing`, `tracing-subscriber`, `tracing-journald` — logs.
|
||
- `notify`, `notify-debouncer-mini` — profile hot-reload.
|
||
- `crossbeam-channel` — control-plane channels.
|
||
- `parking_lot` — mutexes.
|
||
- `signal-hook` — clean shutdown.
|
||
- `thiserror` — error types.
|
||
|
||
No `tokio`, no `zbus`, no `dbus-*`.
|
||
|
||
---
|
||
|
||
## 10. Nix
|
||
|
||
`flake.nix` ships:
|
||
|
||
- A **devshell** with rust toolchain (via `rust-overlay` for pinned
|
||
channel; default to a stable release pinned in
|
||
`rust-toolchain.toml`), `pkg-config`, `pipewire`'s dev outputs,
|
||
`clang` (for bindgen if invoked by deps), `socat` (handy for poking
|
||
the IPC), `jq`.
|
||
- A **package** output (`packages.<system>.default`) that builds the
|
||
daemon + CLI with `rustPlatform.buildRustPackage`. v0 uses
|
||
`cargoLock.lockFile`. Crane can come later if incremental builds in
|
||
CI become a bottleneck.
|
||
- A `nixosModules.default` placeholder so packagers can wire the user
|
||
unit later. Not implemented in v0 of the flake itself.
|
||
|
||
Intermediate dev work uses plain `cargo` inside `nix develop`. Final
|
||
builds and any CI go through `nix build`.
|
||
|
||
---
|
||
|
||
## 11. Phased implementation
|
||
|
||
The phases are roughly token-of-work units, not calendar weeks. **All
|
||
planned phases (0–8) are done as of 2026-05-21**; this section is
|
||
preserved as historical context + a reading guide to the commit log.
|
||
See [[headroom-project]] in team memory for the per-commit ledger.
|
||
|
||
**Phase 0 — scaffolding.** Flake, workspace, crate skeletons, README,
|
||
PLAN/IPC docs. *(done as part of this commit)*
|
||
|
||
**Phase 1 — IPC + client.** `headroom-ipc` (types, framing, codec) and
|
||
`headroom-client` (blocking `Client`) implemented against the schema in
|
||
`IPC.md`. Round-trip tests, fuzz the codec. *(this commit)*
|
||
|
||
**Phase 2 — DSP kernels.** `headroom-dsp` with limiter, compressor, AGC,
|
||
oversampler, envelope. Tested in isolation against synthesized
|
||
signals; limiter validated to hold a −0.1 dBTP ceiling on EBU TECH
|
||
3341 generators. *(this commit: limiter first)*
|
||
|
||
**Phase 3 — daemon core.** `headroom-core` brings up the
|
||
`headroom-processed` virtual sink, the bus filter (originally a
|
||
`pw_stream` pair + SPSC ring; rewritten to a single `pw_filter`
|
||
node in 2026-05-22 — see PW gotchas #14, #17, #18 and the
|
||
`pipewire-filter` workspace crate),
|
||
the `preferred_real_sink` tracker, the registry subscriber, and the
|
||
routing engine. Hardcoded profile, no IPC server yet.
|
||
|
||
**Phase 4 — IPC server + profile manager.** Wire `headroom-core` to the
|
||
IPC schema. Profile loading + hot-reload. Slow AGC loop ticking on
|
||
real loudness measurements.
|
||
|
||
Sub-stages used in commits / TODOs:
|
||
|
||
- **4a–4d** — Unix socket server, op dispatch, mutating ops, event
|
||
broadcaster.
|
||
- **4e** — `ProfileStore`: shipped + user profiles, atomic reload,
|
||
user overlay at `$XDG_STATE_HOME/headroom/overlay.toml`. `profile.use`,
|
||
`profile.reload`, `setting.set`, `route.set` all dispatch through it.
|
||
- **4f** — DSP parameter propagation: `setting.set` reaches the running
|
||
filter via the `rtrb` control queue, so live profile/setting edits
|
||
take effect without restart.
|
||
- **4h** — `preferred_real_sink` tracking: subscribe to
|
||
`default.audio.sink`, snapshot the prior default, promote
|
||
`headroom-processed`, retarget every bypassed stream on
|
||
default-sink change, on hotplug, and on Bluetooth handoff. Also
|
||
pins the filter's playback to the tracked real sink so processed
|
||
audio follows when the user switches default, and resolves the
|
||
real sink's node id from the registry for `status` reporting.
|
||
- **4i** — `route.stream <node-id> processed|bypass`: ad-hoc per-stream
|
||
override that doesn't write a profile rule. Crosses the
|
||
IPC-thread → PipeWire-thread boundary via a `crossbeam` channel
|
||
drained by a 50 ms timer source on the main loop. State updates
|
||
synchronously; metadata write follows ≤ ~50 ms later.
|
||
|
||
- **Slow AGC loop** — wraps up Phase 4. Audio-thread `AgcGain` stage
|
||
sits at the head of the DSP chain (anti-zipper smoother around a
|
||
per-sample multiplier). Filter pushes *pre-AGC* input samples into a
|
||
dedicated measurement ring. A `AgcController` on the PipeWire main
|
||
loop ticks at 50 ms: drains the ring into `ebur128` (Mode S | M |
|
||
TRUE_PEAK), reads `[agc]` config from the active profile, computes
|
||
`target_lufs − short_term_lufs` clamped to `[-max_cut_db,
|
||
+max_boost_db]`, gates below `silence_threshold_lufs`, slow-smooths
|
||
via leaky integrator, and pushes the result through `FilterControl`
|
||
on the same `rtrb` channel `setting.set` uses.
|
||
|
||
### Tracked follow-ups (carried past their sub-stage)
|
||
|
||
Items deliberately deferred from earlier sub-stages so they don't get
|
||
lost. Pick up by name when the trigger that gates them fires.
|
||
|
||
- **Ephemeral overlay mutations.** *(4e follow-up.)* All `route.set`
|
||
/ `setting.set` changes are persisted to `overlay.toml`. A
|
||
`--ephemeral` flag (or `--volatile`) on the CLI for one-shot tweaks
|
||
that don't outlive the daemon was considered and dropped from v0
|
||
for simplicity. Revisit if real users ask for it; the store-level
|
||
change is a flag on the setter methods. **Dormant** — no user has
|
||
asked through Phase 8.
|
||
- **Filter rate matching to the real sink.** *(F5 follow-up.)* §3.1
|
||
documents the contract leak when the real sink runs at a
|
||
non-48 kHz native rate. Closing it requires dynamic
|
||
`FILTER_SAMPLE_RATE`, kernel rebuild on real-sink change
|
||
(compressor + limiter coefficients are rate-dependent), and
|
||
Layer A's `LAYER_A_BLOCK_DT_S` constant becoming dynamic too.
|
||
Gated on a multi-rate hardware test bench — no point shipping
|
||
the refactor without something to validate it against. **v1 scope.**
|
||
- ~~**Bus filter is two `pw_stream`s + an SPSC ring → per-quantum
|
||
tremolo on shared-driver topologies.**~~ **Closed 2026-05-22 by
|
||
rewrite to a single `pw_filter` node** (new in-house
|
||
`pipewire-filter` workspace crate holding the unsafe FFI; one
|
||
process callback with input→DSP→output ordering by construction;
|
||
capture↔playback ring deleted entirely). Surfaced on first soak
|
||
that WP doesn't auto-link `pw_filter`, so the filter was
|
||
restructured to 4 mono ports (canonical `module-filter-chain`
|
||
shape) and the routing engine extended to wire it explicitly via
|
||
`link-factory`. See §5.2 above and `pipewire-gotchas` #14/#17/#18.
|
||
- ~~**Filter playback BUSY spikes (periodic, ~10 s cadence).**~~
|
||
**Closed in 8e (`d52cd6d`).** The instrumentation added by 8e
|
||
did not reproduce the ~8×-baseline outlier pattern in a ~3 min
|
||
release-build capture; steady state was ~2.2 ms / call at this
|
||
hardware's quantum with max growing only to 1.3× baseline.
|
||
`PlaybackTiming` stays so future regressions surface at WARN.
|
||
Original observation may have been a transient WP/PW housekeeping
|
||
artefact under a different config; no actionable code change.
|
||
- **Sub-millisecond dispatch primitive for spike-reactive writes.**
|
||
*(Phase 6 optimisation, downgraded from prerequisite.)* The 4i
|
||
`PwCommand` channel uses a 50 ms polling timer, fine for
|
||
`route.stream` and slow AGC. Layer A's per-app
|
||
`Props.channelVolumes` writes were originally feared to need a
|
||
sub-ms wake primitive. After 6a/6b benches landed (see
|
||
§11.6 below) we re-evaluated: at a 5 ms polling timer and 21 ms
|
||
PipeWire quantum, the worst-case detection-to-write latency stays
|
||
well inside one quantum, which is what PLAN §4.5 actually
|
||
promises. Polling reuses existing infrastructure and is cheap
|
||
(controller tick is ~30 ns; even at 200 Hz it's lost in the
|
||
noise). The tighter primitive — `EventSource::signal` with an
|
||
`unsafe impl Send` shim around `spa_loop_utils.signal_event`, or a
|
||
pipe + `IoSource` — stays on the table as an optimisation if
|
||
manual testing shows audible spike-leak artefacts. `pw::command`
|
||
module docs still carry the constraint warning for future variants
|
||
that might be tempted to share the 50 ms timer.
|
||
|
||
**Phase 5 — CLI + monitor TUI.** `headroom-cli` implements all the
|
||
subcommands above, plus a `monitor` TUI built on the meters
|
||
subscription.
|
||
|
||
**Phase 6 — Per-application level control (Layer A).** Per-managed-stream
|
||
tap creation, `AppLevelController` with peak + RMS envelopes,
|
||
`Props.channelVolumes` writer, user-volume deference logic,
|
||
`[per_app]` profile parsing, `headroom per-app …` CLI verbs, and a
|
||
per-stream meter event on the IPC. Land after the bus path is stable
|
||
so we have a baseline to compare against.
|
||
|
||
Sub-stages:
|
||
|
||
- **6a** — Pure DSP. `headroom_dsp::LevelEnvelopes`: two-tier (peak
|
||
+ RMS) block-rate detector, `max(peak_reduction, rms_reduction)`
|
||
combined, clamped to `max_cut_db`. Allocation-free,
|
||
block-rate-driven (audio thread emits one `(peak, mean_sq)` pair
|
||
per quantum).
|
||
- **6b** — Daemon-side glue.
|
||
`headroom_core::app_level::AppLevelController`: rule snapshot,
|
||
envelopes, 30 ms anti-bounce smoother, 0.5 dB / 100 ms write
|
||
gate, ceiling vs strict deference state.
|
||
`app_level::evaluate` matches `[[per_app.rules]]` against
|
||
`PwNodeInfo` using the same matcher the routing engine uses.
|
||
- **6c** — PipeWire tap + audio-thread analysis. **Mechanism**:
|
||
per managed stream we create our own `pw_stream` (Direction::Input,
|
||
F32LE stereo, rate left unspecified to negotiate with the source,
|
||
`AUTOCONNECT` off, `NODE_DONT_RECONNECT`, `node.dont-move`),
|
||
`connect()` with no target, `set_active(true)`. PipeWire creates
|
||
our input ports from the declared format. We then build **explicit
|
||
passive port-level links** via `link-factory` with
|
||
`link.output.port` / `link.input.port` set to the source's and
|
||
tap's port global IDs respectively, plus `link.passive = true`.
|
||
**Why not `target.object` or `target_id`**: empirically (6c manual
|
||
smoke) WirePlumber's policy refuses to wire `Stream/Output →
|
||
Stream/Input` via any session-manager-mediated path — it logs no
|
||
error, just doesn't act. The stream-level target was getting set
|
||
on the node (`node.target = <source-id>`) but no link ever
|
||
appeared. Going through `link-factory` with explicit port IDs
|
||
bypasses the session manager entirely and uses PipeWire core
|
||
directly. **Per managed stream**: one `pw_stream`, two `Link`
|
||
proxies (one per channel), one `MeasurementSample` `rtrb`
|
||
(capacity 64). Audio-thread `process` runs `peak = max(|x|)` and
|
||
`mean_sq = Σx²/N` over the block, pushes one sample to the ring.
|
||
**Lifecycle**: registry watcher sees a `Stream/Output/Audio`
|
||
matching a `per_app` rule → spawn tap (ports come up
|
||
asynchronously) → the Layer A drain timer (6d) retries link
|
||
creation each tick until both port sets are visible on the
|
||
registry → links built, stream transitions to `Streaming`,
|
||
samples flow. On registry `global_remove` of the source, drop the
|
||
`ManagedStream`; declaration order severs links first, then the
|
||
tap stream + listener.
|
||
- **6d** — `Props.channelVolumes` writes + controller drain timer.
|
||
A polling timer source on the PipeWire main loop ticks every 5 ms
|
||
(200 Hz, CPU cost ≪ 0.1% of one core per the benches), iterates
|
||
active controllers, drains each measurement ring, calls
|
||
`process_block`, and on a `Some` return writes
|
||
`Props.channelVolumes` via the bound `default` metadata
|
||
(subject = source node id). The 5 ms tick guarantees a spike
|
||
detected at quantum boundary `N` is written before quantum `N+1`
|
||
starts on typical 21 ms quanta — see §4.5 reaction-time honesty
|
||
table.
|
||
- **6e** — User-volume deference + per-stream meter events.
|
||
Subscribe to `Props` param-change events on each managed stream.
|
||
Distinguish daemon writes from external by comparing against
|
||
`last_written_lin` (within 1e-4) — external changes apply
|
||
ceiling-mode or strict-mode deference per the matched rule's
|
||
`defer_to_user` field. Per-stream meters publish on the `meters`
|
||
topic with the smoothed reduction, the peak/RMS envelope values,
|
||
and the current applied `channelVolumes`.
|
||
|
||
**Validated cost budget (criterion microbenches, run 2026-05).**
|
||
PLAN §4.7 budgeted "~10 μs/quantum audio thread, few μs/measurement
|
||
daemon thread." Reality on this hardware:
|
||
|
||
| Bench | Time |
|
||
|---|---|
|
||
| Audio-thread peak + mean_sq scan, 1024-frame stereo block | 1.33 μs |
|
||
| `LevelEnvelopes::process_block` (daemon) | 18 ns |
|
||
| `AppLevelController::process_block` hot signal | 29 ns |
|
||
| `AppLevelController::process_block` quiet signal | 22 ns |
|
||
|
||
5 managed streams: audio thread ≈ 6.6 μs/quantum (0.03% of one
|
||
core at 21 ms quanta); daemon ≈ 145 ns/quantum. ~7-10× under the
|
||
PLAN budget, so the design has room for many more managed streams,
|
||
or for adding ebur128 / TRUE_PEAK to Layer A later if useful.
|
||
|
||
**Manual latency validation (post-6c implementation).** PipeWire
|
||
scheduling can't be benched from Rust alone. Use:
|
||
|
||
- **`pw-top`** — note the source-node `QUANT` and any WAIT/BUSY or
|
||
delay column before attaching the tap; attach Layer A; confirm
|
||
the source-node numbers don't change. The tap appears as a new
|
||
row with its own quantum; the test is whether the *app's* numbers
|
||
degrade.
|
||
- **`qpwgraph`** / **`helvum`** — visually confirm the source node
|
||
has two outgoing links (one to its original destination, one to
|
||
our tap), both terminating correctly.
|
||
- **Ear** — connect/disconnect the tap on live audio. Crackles or
|
||
dropouts on attach indicate the §4.1 sibling-fanout claim doesn't
|
||
hold and the design needs revisiting.
|
||
|
||
If those three say "fine," the §4.1 promise is upheld in practice
|
||
and 6c is acceptance-tested. `jack_iodelay` and other true-round-trip
|
||
tools are overkill.
|
||
|
||
**Phase 7 — Packaging.** *Done — `c65c75b`.* `contrib/systemd/headroom.service`
|
||
(user-scope, Type=simple, After=pipewire.service, Restart=on-failure,
|
||
journald, LimitRTPRIO=20). The package's `postInstall` substitutes
|
||
the unit's `@bindir@` placeholder with an absolute store path and
|
||
copies `profiles/*.toml` to `share/headroom/profiles/`. Two Nix
|
||
modules: `nixosModules.default` (`programs.headroom.enable` —
|
||
binary on global PATH + `systemd.packages` for `systemctl --user`
|
||
discovery + hard assertion on `services.pipewire.enable`) and
|
||
`homeModules.default` (`services.headroom.enable` — symlinks
|
||
shipped profiles into `$XDG_CONFIG_HOME/headroom/profiles/`,
|
||
`extraProfiles` attrset for per-user overrides, writes the systemd
|
||
user unit). README rewritten with install + usage sections.
|
||
|
||
**Phase 8 — Hardening.** *Done — `9220143` + `d52cd6d` + verification.*
|
||
- **8a — `assert_no_alloc` on audio-thread callbacks (`9220143`).**
|
||
`#[global_allocator] AllocDisabler` in `headroom-cli/src/main.rs`
|
||
behind `cfg(debug_assertions)` (release strips it via the crate's
|
||
default `disable_release`). The three RT callbacks
|
||
(`capture_process`, `playback_process`, `tap_process`) wrap their
|
||
body in `assert_no_alloc(|| inner(...))`. Verified by a deliberate
|
||
`Vec::with_capacity` injection → SIGABRT on first audio callback;
|
||
reverted before commit. Audio thread proven alloc-free under
|
||
multi-thousand-callback live load.
|
||
- **8b — live profile-reload under signal flow (verification only).**
|
||
Edit `$XDG_CONFIG_HOME/headroom/profiles/<active>.toml` while a
|
||
sine plays: notify-debouncer-mini fires, `ProfileStore::reload`
|
||
runs, `setting.set` propagates via `FilterControl`'s rtrb to the
|
||
audio thread. Compressor GR went 0 → −9.3 dB ≈ 1 s after edit
|
||
and back to 0 after restore; 180 meter ticks over 9 s with max
|
||
inter-tick gap = exact 50.0 ms (the AGC period). No glitches.
|
||
- **8c — sink hotplug / default-sink change (verification only).**
|
||
`wpctl set-default <other-sink>` while daemon runs:
|
||
`on_metadata_property` fires, `adopt_new_real_sink` runs,
|
||
filter.playback re-pinned via 4k explicit-link enforcement,
|
||
`routing/real_sink_changed` emitted on the wire. Bounces back
|
||
cleanly.
|
||
- **8d — multi-rate hardware (partial / deferred).** Filter is
|
||
hardcoded F32 stereo @ 48 kHz; PipeWire's link layer inserts a
|
||
resampler at the filter.playback → real-sink edge when rates
|
||
differ; bus DSP stays at 48 kHz internally. Architecture is
|
||
sound; real-hardware validation (USB DAC at 96k etc.) deferred
|
||
until available.
|
||
- **8e — playback callback timing instrumentation (`d52cd6d`).**
|
||
Lock-free `PlaybackTiming` atomics in `meters.rs`; AGC controller
|
||
drains once per second and logs at WARN above
|
||
`SPIKE_THRESHOLD_US = 5000`. The original ~10 s-cadence ~8×
|
||
spike pattern from §11 follow-ups *did not reproduce* in a ~3 min
|
||
release-build capture; steady state 2.2 ms / call at ~4 Hz,
|
||
max climbed to only 1.3× baseline. Instrumentation kept so
|
||
future regressions surface.
|
||
|
||
---
|
||
|
||
## 12. Risks & open questions
|
||
|
||
These are the original v0 design risks — still useful as a checklist
|
||
for new contributors. Phase 4k/4l/8c have exercised the routing /
|
||
hotplug / default-sink branches; the bullets below are unchanged
|
||
since several of them remain live concerns for non-NixOS distros
|
||
and multi-rate hardware. See [[headroom-project]] in team memory
|
||
for current status per risk.
|
||
|
||
- **WirePlumber re-linking on device hotplug.** When a Bluetooth
|
||
headset connects, WP re-evaluates linking. Headroom must re-pin its
|
||
routed streams. Tractable; the registry events surface this.
|
||
- **Latency budget.** Processed path: one quantum hop (the filter)
|
||
plus lookahead (~2 ms) plus 4× oversampling buffering ≈ 8–15 ms
|
||
added to processed-path latency. Fine for video/voice. Bypass path:
|
||
**zero added latency** — the stream rides the real sink directly.
|
||
- **Default-sink changes.** When the user switches the system default
|
||
to a hardware sink, the daemon adopts it as `preferred_real_sink`,
|
||
re-links the filter's playback, retargets bypassed streams, and
|
||
re-asserts `headroom-processed` as the default for new streams.
|
||
Watching `default.audio.sink` in the metadata is the trigger.
|
||
- **Sample-rate mismatch.** `headroom-processed`, the filter, and the
|
||
real sink must agree, or PipeWire resamples behind our back. The
|
||
filter should source its rate from the real sink and convert on the
|
||
capture side only.
|
||
- **Surround content downmix vs. passthrough.** v0 punts: anything
|
||
`>2ch` is force-bypassed regardless of profile rule. The bus
|
||
filter is F32 stereo by construction and pulling a 5.1+ stream
|
||
into it would either drop the centre/LFE/surround channels (with
|
||
explicit links pairing only the first two ports) or run our DSP
|
||
on a downmix that wasn't asked for. The check fires in
|
||
`routing::evaluate` based on `PwNodeInfo.audio_channels` (parsed
|
||
from the stream's `audio.channels` property). The explicit-link
|
||
pairing in `apply_pending_routes` was generalised from `take(2)`
|
||
to `take(min(src, dst))` so wide bypass to a wide real sink links
|
||
all channels; narrower sinks let PipeWire's source-side adapter
|
||
handle downmix as usual.
|
||
|
||
---
|
||
|
||
## 13. License
|
||
|
||
GPL-3.0-or-later for the daemon and CLI. `headroom-dsp` and `headroom-ipc`
|
||
are MPL-2.0 so third-party clients and plugin hosts can link them
|
||
without GPL contagion. (Re-evaluate when LSP-derived code is
|
||
introduced; current plan does not pull any.)
|