Phase 3 — bring up the daemon end-to-end through six checkpoints:
3a Module skeleton (error, profile, routing, runtime, pw/*)
3b Pure routing engine + 13 tests (no PipeWire dep)
3c PwContext: main loop, sigprocmask-block SIGTERM/SIGINT before
add_signal_local so signalfd actually picks them up
3d headroom-processed virtual sink via the adapter factory with
factory.name=support.null-audio-sink
3e Filter: two pw_streams (capture from monitor / playback to real
sink) with an rtrb SPSC ring between them. DSP chain
(Compressor → two-tier Limiter) runs in the playback callback.
Allocation-free; #![forbid(unsafe_code)] preserved via
bytemuck::try_cast_slice for the byte↔f32 reinterpretation.
3f Registry watcher binds the default metadata, evaluates new
Stream/Output/Audio nodes against profile rules, writes
target.object for processed routes. Self-stream guard skips
anything whose node.name starts with 'headroom-filter'.
Workspace deps added: pipewire = { features = ["v0_3_44"] } for the
modern TARGET_OBJECT key, libspa, rtrb, nix (sigprocmask), bytemuck.
Tests: 65 passing (28 dsp, 20 ipc, 4 client, 13 core). Clippy clean
at default level under -D warnings.
PLAN.md §5 renumbered to fix stale subsection labels (was 4.1–4.4
from before the per-app insertion).
Known limitations punted to Phase 4 (documented in commit history
and team memory):
- WirePlumber doesn't always honor late target.object writes once
a stream is already linked (timing race).
- preferred_real_sink dynamic tracking stubbed.
- No auto-promote of headroom-processed to system default.
- application.process.binary occasionally arrives in late metadata
updates after the global registers; routing logs show '?' until
we add a re-read.
879 lines
39 KiB
Markdown
879 lines
39 KiB
Markdown
# Headroom
|
||
|
||
A Rust AGC + compressor + true-peak limiter for PipeWire. Per-application
|
||
exclusion, profile-based presets, single-binary daemon, scriptable over a
|
||
Unix-domain socket.
|
||
|
||
This document is the canonical plan. It supersedes the earlier
|
||
conversational sketch.
|
||
|
||
---
|
||
|
||
## 1. Goals & non-goals
|
||
|
||
### Goals
|
||
|
||
- **Hard safety net.** Output is guaranteed to stay below a configurable
|
||
ceiling (default **−0.1 dBTP**) with proper inter-sample peak handling.
|
||
This guarantee survives daemon misbehaviour, profile reloads, and bad
|
||
routing decisions — it is enforced inline in the audio path.
|
||
- **Per-application exclusion.** Music players, games, and DAWs route
|
||
around the processor; browsers, voice chat, and "everything else" go
|
||
through it. Rules are app-level and live in profiles.
|
||
- **Drop-in defaults.** First-run experience: install, enable user
|
||
service, done. No mandatory config. Power users edit TOML or use the
|
||
CLI.
|
||
- **Profiles** for distinct listening scenarios (default / night /
|
||
speech / transparent / bypass-all).
|
||
- **Single binary.** Daemon, filter, routing, and control loop all live
|
||
in one process. The DSP kernels are a separate crate so they can be
|
||
reused (LV2/standalone) later.
|
||
- **Scriptable.** Unix-domain-socket IPC with a documented JSON schema
|
||
so anyone can write an alternative client (Qt/QuickShell panel, Eww
|
||
widget, scripts). A first-party Rust crate (`headroom-ipc`) wraps it.
|
||
- **Rust, lean dep tree.** No NIH where mature crates exist, no bloat
|
||
where they don't.
|
||
|
||
### Non-goals (v0)
|
||
|
||
- Surround / >2-channel content. v0 is stereo only; >2ch is routed
|
||
directly to the real sink, untouched by Headroom's filter chain.
|
||
- LV2/CLAP plugin distribution. The DSP crate is plugin-shaped so this
|
||
is cheap to add later, but it's not a v0 deliverable.
|
||
- GUI. Third parties can build one against the IPC.
|
||
- Capture-side processing (microphone). v0 is playback only.
|
||
|
||
---
|
||
|
||
## 2. Architecture
|
||
|
||
Each app's audio takes one of four end-to-end paths, chosen by two
|
||
**orthogonal** profile flags: a routing decision (processed vs.
|
||
bypass) and a per-app level-control flag (on vs. off).
|
||
|
||
```
|
||
┌─── optional, opt-in per app (Layer A) ────────────────┐
|
||
│ │
|
||
│ ┌─► passive tap ─► peak + RMS ─► AppLevelController │
|
||
│ │ (sibling link in same quantum) │ │
|
||
│ │ │ │
|
||
│ │ Props.channelVolumes write ◄──────┘ │
|
||
│ │ │
|
||
└───┼───────────────────────────────────────────────────┘
|
||
│
|
||
│ APP STREAM NODE
|
||
│ ┌──────────────────────────┐
|
||
│ │ raw output │
|
||
app's audio ───►├──►│ × channelVolumes │──► output port
|
||
│ └──────────────────────────┘
|
||
│ │
|
||
└────────────────────────────────────────────│
|
||
│
|
||
routing decision (Layer B) │
|
||
target.object set by daemon │
|
||
│
|
||
┌─────────────────────────────────────────┴┐
|
||
▼ ▼
|
||
route = "bypass" route = "processed"
|
||
target.object = target.object =
|
||
preferred_real_sink headroom-processed
|
||
│ │
|
||
│ ▼
|
||
│ ┌─────────────────────┐
|
||
│ │ headroom-processed │
|
||
│ │ (virtual sink, the │
|
||
│ │ system default) │
|
||
│ └─────────┬───────────┘
|
||
│ ▼
|
||
│ ┌─────────────────────┐
|
||
│ │ headroom-filter │
|
||
│ │ (pw_stream pair) │
|
||
│ Layer C (bus DSP) │ AGC → compressor │
|
||
│ │ → soft → hard │
|
||
│ └─────────┬───────────┘
|
||
│ │
|
||
▼ ▼
|
||
preferred_real_sink ◄──────────────────────► (DAC)
|
||
```
|
||
|
||
### The four end-to-end paths
|
||
|
||
| | Routing = bypass | Routing = processed |
|
||
|---|---|---|
|
||
| **per-app off** | ① **true bypass** — Headroom touches nothing on the signal path. Same latency as if Headroom weren't installed. | ③ **bus DSP only** — stream flows through `headroom-processed` and the inline chain. `channelVolumes` left at whatever the user/app set. |
|
||
| **per-app on** | ② **per-app only** — level-reactive `channelVolumes` writes, no graph hop. Zero added signal-path latency. | ④ **full stack** — per-app level control *and* bus DSP. Maximum protection. |
|
||
|
||
Path-by-path properties:
|
||
|
||
| Path | Signal-path latency added | Limiter contract? | Per-app gain ride? |
|
||
|---|---|---|---|
|
||
| ① bypass / per-app off | 0 | no | no |
|
||
| ② bypass / per-app on | 0 | no | yes (Layer A) |
|
||
| ③ processed / per-app off | filter hop + ~2 ms lookahead | yes (Layer C hard tier) | no |
|
||
| ④ processed / per-app on | filter hop + ~2 ms lookahead | yes (Layer C hard tier) | yes (Layer A) |
|
||
|
||
The two flags are independent. A competitive game's typical config
|
||
is ①: zero Headroom involvement in its audio. A user concerned about
|
||
notification dings on top of that game would put Discord on ② or ④
|
||
(so notifications get tamed via Discord's own `channelVolumes`)
|
||
while leaving the game on ①.
|
||
|
||
```
|
||
headroom-core (daemon, one process)
|
||
• per-app level controllers (Layer A)
|
||
• routing engine + preferred_real_sink (Layer B)
|
||
• slow AGC loop, profile manager (Layer C)
|
||
• IPC server
|
||
│
|
||
▼
|
||
$XDG_RUNTIME_DIR/headroom/control.sock
|
||
│
|
||
┌───────────┴───────────┐
|
||
▼ ▼
|
||
headroom CLI third-party clients
|
||
(Qt panel, widgets, …)
|
||
```
|
||
|
||
See §4 for Layer A's mechanics and §5 for the PipeWire-level details
|
||
of Layers B and C.
|
||
|
||
### One virtual sink, one daemon process
|
||
|
||
- `headroom-processed` — virtual sink. Set as the system default so
|
||
new streams land in it by default. Its monitor is captured by
|
||
`headroom-filter`, pushed through the DSP graph, and emitted to the
|
||
current `preferred_real_sink`.
|
||
- **No bypass sink.** Streams marked `route = "bypass"` are pointed
|
||
directly at `preferred_real_sink` via a `target.object` metadata
|
||
write. They pay zero added latency vs. running without Headroom
|
||
installed at all — there's no extra graph hop, no extra DSP. The
|
||
word "bypass" in the profile DSL means "route directly to the real
|
||
sink, untouched."
|
||
- The **daemon** owns:
|
||
- the one virtual sink (created on startup, torn down on exit);
|
||
- the filter (a pair of `pw_stream`s — capture + playback — running
|
||
on PipeWire's realtime audio thread, with the playback half
|
||
targeting `preferred_real_sink`);
|
||
- one **`AppLevelController`** per managed app stream (§4), each
|
||
with its own passive `pw_stream` tap, peak/RMS envelopes, and
|
||
`Props.channelVolumes` writer. Created/destroyed on stream
|
||
lifecycle events.
|
||
- **`preferred_real_sink` tracking.** The daemon watches the
|
||
`default.audio.sink` metadata key. When the user changes the
|
||
system default (via pavucontrol, `wpctl set-default`, etc.) to a
|
||
hardware sink, the daemon (a) treats that sink as the new
|
||
`preferred_real_sink`, (b) re-links `headroom-filter`'s playback
|
||
stream to it, and (c) rewrites `target.object` for every
|
||
currently-bypassed stream so they follow. Hotplug / Bluetooth
|
||
handoffs use the same machinery.
|
||
- the slow AGC loop (reads loudness, writes gain target into the
|
||
filter via an `rtrb` channel);
|
||
- the routing engine (subscribes to the PipeWire registry, evaluates
|
||
rules on new streams, writes `target.object` to the `default`
|
||
metadata: either `headroom-processed` for processed streams or
|
||
`preferred_real_sink` for bypassed streams);
|
||
- the IPC server.
|
||
|
||
### Why no `headroom-bypass` sink
|
||
|
||
An earlier iteration of the design had a second virtual sink
|
||
(`headroom-bypass`) that loopback'd to the real sink, so "bypassed"
|
||
streams routed to it. This added one PipeWire quantum of latency to
|
||
every bypassed stream for no functional benefit — `module-loopback`
|
||
buffers across the quantum boundary even when the DSP is a no-op.
|
||
Direct routing via `target.object` skips the hop entirely. The win is
|
||
real for competitive games, DAW monitoring, and music players: they
|
||
now ride exactly the same path they'd take if Headroom weren't
|
||
installed.
|
||
|
||
### Why this is *not* the "analytical sink + adjust master volume"
|
||
### shape originally proposed
|
||
|
||
Volume control via SPA `Props` updates is not sample-accurate. A true-peak
|
||
limiter needs a small internal delay line so gain reduction is applied
|
||
to the same samples that were analyzed. Therefore the **brickwall must
|
||
be inline**. The analytical-monitor approach is still used — for the
|
||
*slow* AGC loop, where multi-second time constants make control-plane
|
||
latency irrelevant — but it cannot own the ceiling.
|
||
|
||
### Why a `pw_stream` pair, not an LV2 plugin in `module-filter-chain`
|
||
|
||
LV2 is not native to PipeWire; it's one of several plugin formats
|
||
`module-filter-chain` happens to host (via lilv). Using LV2 would split
|
||
Headroom into a plugin + a daemon + a filter-chain JSON, pull in a lilv
|
||
runtime, and force gain-target updates through a 32-bit-float control-port
|
||
abstraction. A `pw_stream` capture+playback pair is the same pattern
|
||
`module-filter-chain` itself uses internally, but written directly in
|
||
Rust against `pipewire-rs`, in the same process as the rest of the
|
||
daemon. One binary, no IPC for parameter updates, idiomatic Rust audio
|
||
thread. An LV2 wrapper of `headroom-dsp` remains a viable optional
|
||
deliverable for use in DAWs.
|
||
|
||
---
|
||
|
||
## 3. DSP
|
||
|
||
### 3.1 Two-tier true-peak limiter (`headroom-dsp::limiter`)
|
||
|
||
The limiter has **two parallel tiers** sharing the same upsampler,
|
||
downsampler, delay line, and sliding peak buffer. Both run at the
|
||
oversampled rate.
|
||
|
||
**Hard tier — the safety contract.** Output ceiling default
|
||
**−0.1 dBTP**, configurable. Instant attack on the gain envelope plus a
|
||
brief hold and a slow release. Two defensive `clamp` stages downstream
|
||
(once in the oversampled domain, once at the input rate after
|
||
downsampling) guarantee the contract numerically — the envelope can
|
||
misbehave and the contract still holds. Never bypassed, never
|
||
disabled.
|
||
|
||
**Soft tier — the comfort cap.** Targets a *dynamic* ceiling computed
|
||
as `program_lufs + max_psr_db`. Smooth attack/release envelope so the
|
||
gain reduction sounds like volume riding, not a slap. Pulls transients
|
||
to a comfortable peak-to-loudness ratio (default 14 dB) *before* they
|
||
ever threaten the hard ceiling. When the AGC hasn't yet provided a
|
||
program loudness (startup, after reset), the soft tier falls back to a
|
||
static ceiling. Disabled by omitting `[limiter.soft]` in a profile —
|
||
useful for the `transparent` profile where users want pure brickwall
|
||
behavior.
|
||
|
||
Algorithm (per oversampled sample, after upsampling):
|
||
|
||
1. Push raw `|s|` into the sliding-window peak buffer; read the
|
||
max-of-window.
|
||
2. **Soft tier** computes target = `soft_ceiling / window_peak` (clamped
|
||
to ≤ 1), runs through the smooth attack/release envelope, yields
|
||
`soft_gain`.
|
||
3. **Hard tier** predicts the worst-case effective peak after the soft
|
||
tier acts (max of `window_peak * soft_gain` and the asymptote
|
||
`min(window_peak, soft_ceiling)`), then sizes `hard_target` to keep
|
||
that under the hard ceiling. Instant attack, hold, exponential
|
||
release. Yields `hard_gain`.
|
||
4. `total_gain = min(soft_gain, hard_gain)`.
|
||
5. Multiply the delayed sample by `total_gain`.
|
||
6. Clamp at hard ceiling (defense-in-depth).
|
||
7. Downsample, clamp again at hard ceiling at the input rate.
|
||
|
||
When the soft tier is doing its job, the hard tier's "predicted-post-soft"
|
||
target stays above 1.0 and the hard tier never engages. When the soft
|
||
tier is mid-attack (peak just arrived), the hard tier snaps in as a
|
||
safety, then releases as the soft tier catches up.
|
||
|
||
The compressor and AGC stages run *before* the limiter.
|
||
|
||
### 3.2 Feed-forward compressor (`headroom-dsp::compressor`)
|
||
|
||
Standard shape: log-domain detector (peak or RMS, switchable) →
|
||
ratio + soft knee → attack/release envelope smoother → makeup gain →
|
||
linear gain → apply to (small) delayed input. ~150 lines of clean code.
|
||
|
||
Defaults aimed at "gentle, transparent": threshold −24 dBFS,
|
||
ratio 2.5:1, knee 6 dB, attack 10 ms, release 100 ms, makeup auto.
|
||
|
||
### 3.3 Slow AGC (`headroom-core::agc`)
|
||
|
||
Algorithmic descendant of EasyEffects' `autogain.cpp`. Runs *outside*
|
||
the audio thread, on a ~50 ms control tick.
|
||
|
||
- Feeds the audio thread's monitor tap into `ebur128` with
|
||
`Mode::M | S | I | TRUE_PEAK`.
|
||
- Computes `target_gain_dB = target_lufs − measured_lufs`.
|
||
- Smooths with separate attack/release coefficients (leaky integrator).
|
||
- Gates when momentary loudness < silence threshold.
|
||
- Soft-clamps so the AGC can never push more than ±N dB (profile knob).
|
||
- Writes the new gain target into the audio thread via an `rtrb` queue.
|
||
|
||
The AGC's gain is applied *before* the compressor. The compressor and
|
||
limiter still own their own behaviour and ceilings.
|
||
|
||
### 3.4 Measurement: `ebur128`
|
||
|
||
`Mode::M | S | I | TRUE_PEAK`. EBU TECH 3341/3342 conformant via the
|
||
`ebur128` crate. Constructed on the daemon thread; fed from a ring-buffer
|
||
consumer that pulls from the audio thread. The audio thread allocates
|
||
nothing.
|
||
|
||
This is **bus-level** measurement only — used to drive the slow AGC
|
||
loop and meter the processed sink output. Per-app measurement (§4)
|
||
uses a different, much cheaper metric.
|
||
|
||
---
|
||
|
||
## 4. Per-application level control (Layer A)
|
||
|
||
An opt-in, near-zero-latency feedback loop that watches each managed
|
||
application's output stream and adjusts its `Props.channelVolumes`
|
||
multiplier in response to **two parallel level metrics**:
|
||
|
||
- a **fast peak envelope** that catches short bursts and sustained
|
||
loud passages (think: a notification ding, a video that just got
|
||
louder), and
|
||
- a **slow RMS envelope** that catches *sustained loudness*
|
||
mismatches (think: "Discord is permanently louder than everything
|
||
else even when nobody's shouting").
|
||
|
||
A stream's applied gain reduction is `max(peak_reduction,
|
||
rms_reduction)` — whichever path is asking for more cut wins, and
|
||
recovery only happens when *both* paths agree the stream has settled.
|
||
This is the layer's whole point: the peak path handles transients
|
||
within one quantum; the RMS path keeps long-term inter-app loudness
|
||
balanced. Neither alone is enough.
|
||
|
||
Orthogonal to bus routing — a stream can be processed *or* bypassed
|
||
*and* level-controlled independently. Its goal is "tame noisy apps
|
||
without startling the listener and without making the chronic
|
||
loudmouth permanently dominate," while the signal path itself stays
|
||
untouched.
|
||
|
||
### 4.1 Why this is zero-latency
|
||
|
||
The per-app multiplier is the `channelVolumes` value PipeWire already
|
||
applies inside the app's stream node — it's the same number
|
||
`pavucontrol`'s per-app slider writes to. Adjusting it doesn't insert
|
||
a graph node; nothing new sits between the app and its destination
|
||
sink. The only cost is that **the analysis happens via a sibling
|
||
fanout link**, not in the playback path: PipeWire schedules fanout
|
||
consumers in parallel within the same quantum, so the playback path's
|
||
timing is identical to the no-tap case.
|
||
|
||
```
|
||
┌──► passive tap (analysis only)
|
||
│ │
|
||
│ ▼
|
||
│ peak + RMS envelopes
|
||
│ (audio thread, sub-ms)
|
||
app stream ──────┤ │
|
||
(output port) │ ▼
|
||
│ rtrb push
|
||
│ │
|
||
│ ▼
|
||
│ AppLevelController (daemon thread)
|
||
│ │
|
||
│ │ Props.channelVolumes write
|
||
│ ▼ (back into the app stream node)
|
||
│ ┌─────────────────────┐
|
||
└──►│ app stream multiplies
|
||
│ by channelVolumes, │──► (its sink — Layer B)
|
||
│ then publishes. │
|
||
└─────────────────────┘
|
||
```
|
||
|
||
### 4.2 The metrics: peak + RMS, no LUFS
|
||
|
||
LUFS is the wrong measurement here. Its shortest window (momentary,
|
||
400 ms) blurs out exactly the transients we want to catch, and the
|
||
K-weighting filter adds CPU for no benefit when we're trying to react
|
||
fast. We also explicitly want a *second* path that targets sustained
|
||
loudness — for that, plain mean-square RMS is the right cheap stand-in,
|
||
not LUFS.
|
||
|
||
| Metric | Window | Job |
|
||
|---|---|---|
|
||
| **Peak envelope** — `max(\|samples\|)` per block, smoothed | ~100 ms attack window, ~500 ms release | Fast: catches a notification ding, a clip getting louder, a partner standing up and shouting. Triggers cut on `peak_threshold_db` (default −6 dBFS). |
|
||
| **RMS envelope** — block mean-square, smoothed | ~1–2 s | Slow: catches "this app is just chronically louder than everything else." Triggers cut on `rms_target_db` (default ≈ −20 dBFS RMS). |
|
||
|
||
Both are computed from the *same* raw buffer in the audio thread, so
|
||
the audio-thread cost is one additional MAC accumulator and a max-
|
||
scan per sample. Cost analysis in §4.7.
|
||
|
||
### 4.3 Architecture
|
||
|
||
For each managed playback stream (matched by routing rule — see §6):
|
||
|
||
1. **Audio thread (tap stream's process callback):**
|
||
- Pull the buffer from the fanout link.
|
||
- `peak = max(|samples|)` over the block.
|
||
- `mean_sq = Σ(x*x) / n` over the block.
|
||
- Push `{node_id, peak, mean_sq}` to a per-stream `rtrb`.
|
||
2. **Daemon thread (`AppLevelController` per stream):**
|
||
- Drain the rtrb.
|
||
- Update peak envelope (one-pole, fast α — attack within a block,
|
||
release ~500 ms).
|
||
- Update RMS envelope (one-pole, slow α — window ~1–2 s).
|
||
- Compute `peak_reduction_db` and `rms_reduction_db` independently,
|
||
then `proposed = max(peak_reduction_db, rms_reduction_db)`.
|
||
- Smooth toward `proposed`.
|
||
- If the smoothed value is significantly different from
|
||
last-written AND we're not rate-limited (~10 Hz max writes per
|
||
stream), submit `Props.channelVolumes` update.
|
||
|
||
The recovery condition is intentionally *both*-paths-agree: a
|
||
release on the peak path only counts toward unwinding gain
|
||
reduction if the RMS path also reads quiet. This avoids the pumping
|
||
artefact where a transient-heavy stream would rapidly release
|
||
between transients only to be slapped back down on the next one.
|
||
|
||
### 4.4 Honouring user-set volumes
|
||
|
||
The daemon subscribes to `Props` param-change events on each managed
|
||
stream. When a `channelVolumes` change arrives that's meaningfully
|
||
different from `last_written_volume`, it wasn't us — the user
|
||
adjusted via pavucontrol, a hotkey, an app's own UI, etc. The
|
||
controller then either:
|
||
|
||
- **defers entirely** (stops adjusting the stream until the user opts
|
||
back in via `headroom per-app reset <app>`), or
|
||
- **treats the user value as a ceiling** (continues to cut on spikes
|
||
but never raises above what the user wanted).
|
||
|
||
Default is the ceiling behaviour — it's the principle-of-least-surprise
|
||
choice. Users who want strict deference set a profile flag.
|
||
|
||
#### A historical concern: apps that fight back
|
||
|
||
Some PulseAudio-era apps (Discord most famously) used to read and
|
||
re-assert their own `channelVolumes` periodically, fighting any
|
||
external volume manager. The pattern produced a visible ping-pong
|
||
loop and effectively disabled per-app management.
|
||
|
||
The pattern is largely absent from modern PipeWire-native and
|
||
Electron-based apps in 2024+: in-app sliders write `channelVolumes`
|
||
only on user interaction, not on a timer. From Headroom's
|
||
perspective, those user-interaction writes are indistinguishable from
|
||
a pavucontrol slider move — both are legitimate external changes the
|
||
deference policy correctly yields to.
|
||
|
||
If a fight-back app does appear, the **ceiling** deference mode
|
||
degrades gracefully:
|
||
|
||
- App produces hot output → Headroom cuts to 0.5.
|
||
- App writes `channelVolumes = 1.0` back over our cut.
|
||
- Headroom detects the external change, marks the new value
|
||
(1.0) as the ceiling, and stops actively writing.
|
||
- Layer A becomes effectively inert for that stream — there is no
|
||
ping-pong, the user just doesn't get the per-app cut they were
|
||
hoping for. The bus-level Layer C limiter (if engaged) still
|
||
enforces the absolute output ceiling regardless.
|
||
|
||
Explicit pattern detection and rate-limiting of ceiling updates
|
||
(e.g., "ignore ceiling-restoring writes that arrive within N seconds
|
||
of our own writes") is deferred to v1, pending evidence from
|
||
real-world testing that any modern app warrants it. The graceful
|
||
degradation property is the v0 contract.
|
||
|
||
### 4.5 Reaction-time honesty
|
||
|
||
The signal-path latency is **zero**. The reaction latency to a spike
|
||
is bounded by:
|
||
|
||
```
|
||
spike in block N ─► analysis (same quantum)
|
||
─► rtrb push (ns)
|
||
─► controller computes (μs)
|
||
─► Props write to pw main loop
|
||
─► applied to block N+1 of the app stream
|
||
```
|
||
|
||
So sustained loud passages are attenuated within ~one quantum
|
||
(5–20 ms depending on the system's quantum). **Isolated one-block
|
||
transients still leak through** — the first block carrying the spike
|
||
plays with the old gain; subsequent blocks see the reduction. This
|
||
is the irreducible cost of "no lookahead allowed." For absolute
|
||
spike prevention you need lookahead, which means latency, which
|
||
contradicts the constraint of this layer.
|
||
|
||
The bus-level Layer C limiter (§3.1) catches anything that would
|
||
exceed the absolute ceiling regardless of whether Layer A has caught
|
||
up. Layer A reduces *workload* on Layer C by pre-attenuating noisy
|
||
apps; it doesn't replace it.
|
||
|
||
### 4.6 Layered budget summary
|
||
|
||
| Layer | Metric | Time scale | Signal-path latency added |
|
||
|---|---|---|---|
|
||
| A: per-app peak | sample peak per block | tens of ms | **0** |
|
||
| A: per-app RMS | block mean-square | seconds | **0** |
|
||
| C: inline soft tier | true-peak, lookahead | sub-ms | shared with hard tier |
|
||
| C: inline hard tier | true-peak, lookahead | sub-ms | ~2 ms lookahead |
|
||
| C: bus AGC | LUFS (ebur128) | many seconds | — (control plane only) |
|
||
|
||
Five distinct jobs, five distinct time scales, no two layers
|
||
duplicate each other. Layer A is the cheapest line of defense and
|
||
the only one that costs zero latency on the audio path.
|
||
|
||
### 4.7 Resource budget per stream
|
||
|
||
| | No TRUE_PEAK (recommended for Layer A) |
|
||
|---|---|
|
||
| Audio thread per quantum | ~10 μs (peak + RMS pass) |
|
||
| Daemon thread per measurement | ~few μs (HashMap lookup + envelope math) |
|
||
| Memory per controller | ~100 bytes |
|
||
| Memory per ebur128 (if enabled) | — N/A; Layer A doesn't use ebur128 |
|
||
|
||
At realistic stream counts (2–5 managed apps): **<0.5% CPU total,
|
||
<1 KB RAM total**. Doesn't move the needle.
|
||
|
||
### 4.8 Lifecycle
|
||
|
||
- **Stream appears** with `media.class = Stream/Output/Audio`
|
||
matching a `[[per_app.rules]]` pattern: create tap link
|
||
(`pw_link_create`), spawn controller, register rtrb.
|
||
- **Stream disappears** (`pw_registry::global_removed`): tear down
|
||
tap, drop controller, clean up rtrb.
|
||
- **App restarts**: new `node_id` → fresh controller. User-volume
|
||
deference state is per-stream-instance, which is the right default.
|
||
|
||
---
|
||
|
||
## 5. PipeWire integration
|
||
|
||
### 5.1 Sinks
|
||
|
||
Created on daemon startup by emitting a `pipewire.conf.d` fragment into
|
||
`$XDG_CONFIG_HOME/pipewire/pipewire.conf.d/headroom.conf` (if not already
|
||
present) and reloading. Alternative: create them at runtime via
|
||
`pw-loopback` equivalents using `pipewire-rs`. v0 ships with the
|
||
runtime-creation path so the install footprint is "one binary, one
|
||
unit file."
|
||
|
||
Sink properties:
|
||
|
||
- `headroom-processed`: `node.name=headroom-processed`,
|
||
`media.class=Audio/Sink`, `audio.position=[FL,FR]`,
|
||
`node.description="Headroom (processed)"`. Promoted to system
|
||
default on startup so new streams land in it by default.
|
||
|
||
There is no second sink. Bypassed streams are routed directly at the
|
||
current `preferred_real_sink` via `target.object` metadata writes
|
||
(see §4.3).
|
||
|
||
### 5.2 The filter
|
||
|
||
Two `pw_stream`s:
|
||
|
||
- **Capture stream** linked to `headroom-processed`'s monitor. Format:
|
||
`F32 LE`, channels 2, rate matched to real sink, latency-quantum
|
||
matched (default 1024 frames; configurable).
|
||
- **Playback stream** linked to the current `preferred_real_sink`.
|
||
Same format.
|
||
|
||
`process` callback: pull a buffer from capture, run AGC gain →
|
||
compressor → limiter → push to playback. Allocation-free. Parameter
|
||
updates arrive over an `rtrb` SPSC queue from the control thread.
|
||
|
||
### 5.3 Routing
|
||
|
||
- On startup, write `default.audio.sink` in the `default` metadata to
|
||
point at `headroom-processed` so new streams default to the
|
||
processor. The previous value (the user's hardware sink) is
|
||
captured as the initial `preferred_real_sink`.
|
||
- Subscribe to `pw_registry` global-added events.
|
||
- On any new node with `media.class == "Stream/Output/Audio"` and
|
||
`node.dont-move != true`:
|
||
- Read `application.process.binary`, `application.name`,
|
||
`pipewire.access.portal.app_id`, `media.role`.
|
||
- Evaluate routing rules from the active profile to decide
|
||
`processed` vs. `bypass`.
|
||
- Write `target.object` into the `default` metadata for the new
|
||
stream:
|
||
- `processed` → `headroom-processed`'s `object.serial`.
|
||
- `bypass` → `preferred_real_sink`'s `object.serial`.
|
||
WirePlumber honours this for any movable stream.
|
||
- Watch `default.audio.sink` metadata changes. When the user switches
|
||
the system default to a hardware sink, the daemon:
|
||
- records that sink as the new `preferred_real_sink`,
|
||
- re-links `headroom-filter`'s playback stream to it,
|
||
- rewrites `target.object` for every currently-bypassed stream so
|
||
they follow the new hardware,
|
||
- re-asserts `headroom-processed` as the *default for new streams*
|
||
(so subsequent app launches still land in the processor).
|
||
- Hotplug (sink appears/disappears) goes through the same code path.
|
||
|
||
### 5.4 Stream identification
|
||
|
||
| Property | Reliability | Use |
|
||
|---|---|---|
|
||
| `application.process.binary` | high (kernel-sourced) | primary key |
|
||
| `application.name` | medium | secondary / display |
|
||
| `pipewire.access.portal.app_id` | high (Flatpak only) | match sandboxed apps |
|
||
| `media.role` | low (most apps omit) | bonus signal only |
|
||
| `media.class` | structural | gate to playback streams |
|
||
|
||
---
|
||
|
||
## 6. Profiles
|
||
|
||
Location: `$XDG_CONFIG_HOME/headroom/profiles/*.toml` (overriding
|
||
shipped defaults in `/usr/share/headroom/profiles/` if installed
|
||
system-wide). Hot-reloaded via `notify-debouncer-mini`.
|
||
|
||
Each profile is a complete listening scenario. Schema (`headroom-core::profile`):
|
||
|
||
```toml
|
||
name = "default"
|
||
description = "Gentle transparent processing for everyday use."
|
||
|
||
[agc]
|
||
enabled = true
|
||
target_lufs = -18.0 # ITU-R BS.1770 integrated target
|
||
attack_ms = 2000.0
|
||
release_ms = 800.0
|
||
silence_threshold_lufs = -70.0
|
||
max_boost_db = 12.0
|
||
max_cut_db = 12.0
|
||
|
||
[compressor]
|
||
enabled = true
|
||
detector = "peak" # "peak" | "rms"
|
||
threshold_db = -24.0
|
||
ratio = 2.5
|
||
knee_db = 6.0
|
||
attack_ms = 10.0
|
||
release_ms = 100.0
|
||
makeup_db = "auto" # number or "auto"
|
||
|
||
[limiter]
|
||
ceiling_dbtp = -0.1
|
||
lookahead_ms = 2.0
|
||
release_ms = 80.0
|
||
hold_ms = 5.0
|
||
oversample = 4 # 1 | 2 | 4 | 8 (1 disables ISP detection)
|
||
link = "stereo" # "stereo" | "dual-mono"
|
||
|
||
[meters]
|
||
publish_hz = 20.0
|
||
|
||
[[rules]]
|
||
match = { process_binary = ["spotify", "mpv", "ardour", "reaper", "qpwgraph"] }
|
||
route = "bypass"
|
||
|
||
[[rules]]
|
||
match = { process_binary = ["firefox", "chromium", "google-chrome", "Discord", "discord", "element-desktop", "Slack", "zoom", "WEBRTC VoiceEngine"] }
|
||
route = "processed"
|
||
|
||
[default_route]
|
||
route = "processed" # safe default: anything unmatched is processed
|
||
|
||
# ----------------------------------------------------------------------
|
||
# Per-application level control (Layer A). Orthogonal to routing — you
|
||
# can enable per-app on bypass-routed streams to get zero-latency
|
||
# level control (e.g. tame Discord notifications without touching
|
||
# the game's audio path).
|
||
# ----------------------------------------------------------------------
|
||
[per_app]
|
||
enabled = true # master switch; false disables Layer A entirely
|
||
default_enabled = false # for streams not matched by any rule below
|
||
|
||
# Per-rule knobs. Matches use the same key set as [[rules]] above.
|
||
[[per_app.rules]]
|
||
match = { process_binary = ["Discord", "discord", "element-desktop", "Slack", "zoom"] }
|
||
enabled = true
|
||
peak_threshold_db = -6.0 # short-window peak above this triggers cut
|
||
rms_target_db = -20.0 # long-term RMS target (slow path)
|
||
max_cut_db = 12.0 # never cut more than this
|
||
peak_attack_ms = 5.0
|
||
peak_release_ms = 500.0
|
||
rms_window_ms = 1500.0
|
||
defer_to_user = "ceiling" # "ceiling" | "strict"
|
||
|
||
[[per_app.rules]]
|
||
match = { process_binary = ["firefox", "chromium", "google-chrome"] }
|
||
enabled = true
|
||
peak_threshold_db = -3.0 # browsers run hotter; raise the trigger
|
||
rms_target_db = -18.0
|
||
|
||
# Music, DAWs, games default to per-app off — they're either trusted
|
||
# to set their own level or routed bypass for a reason.
|
||
[[per_app.rules]]
|
||
match = { process_binary = ["spotify", "mpv", "ardour", "reaper", "qpwgraph", "carla"] }
|
||
enabled = false
|
||
```
|
||
|
||
### Shipped profiles
|
||
|
||
| name | one-liner |
|
||
|---|---|
|
||
| `default` | Gentle transparent processing, sensible for daily use. |
|
||
| `night` | Aggressive: −20 LUFS, 4:1, fast release, narrow dynamic range. |
|
||
| `speech` | VoIP-focused; short attack, fast release, slight rumble cut. |
|
||
| `transparent` | Limiter only. Compressor + AGC bypassed. Safety net only. |
|
||
| `bypass-all` | Routes everything directly to the real sink. The kill switch. |
|
||
|
||
The limiter section of `bypass-all` is irrelevant in practice (nothing
|
||
flows through `headroom-processed`), but its ceiling field is still
|
||
respected as a fail-safe in case a stream lands on the processed sink
|
||
anyway.
|
||
|
||
---
|
||
|
||
## 7. IPC
|
||
|
||
Transport: Unix-domain socket, `SOCK_STREAM`, `0600`, at
|
||
`$XDG_RUNTIME_DIR/headroom/control.sock`.
|
||
|
||
Wire protocol: **see `IPC.md`** for the full normative schema.
|
||
Summary: u32 BE length prefix + UTF-8 JSON payload. Three message
|
||
shapes — `Request` (id + op + args), `Response` (id + result|error),
|
||
`Event` (topic + data). Subscribers signal interest by topic; events
|
||
fan out to all subscribers with bounded per-subscriber queues. Slow
|
||
subscribers have events **dropped** (overflow events count is itself
|
||
published on the `daemon` topic so clients know they fell behind).
|
||
|
||
The first-party Rust wrapper is `headroom-client`, mirroring how
|
||
[`niri-ipc`](https://github.com/YaLTeR/niri/tree/main/niri-ipc) wraps
|
||
Niri's socket: a thin, no-magic crate that re-exports the wire types
|
||
from `headroom-ipc` and adds a blocking `Client` (and an optional async
|
||
`AsyncClient` behind a feature flag).
|
||
|
||
---
|
||
|
||
## 8. CLI
|
||
|
||
```
|
||
headroom status # current profile, sinks, levels
|
||
headroom daemon # run the daemon (systemd Type=simple)
|
||
headroom profile list | use <name> | show [name]
|
||
headroom route list
|
||
headroom route set <app> processed|bypass # persists in user profile
|
||
headroom route unset <app>
|
||
headroom route stream <node-id> processed|bypass # ad-hoc
|
||
headroom set <key> <value> # tweak active profile in place
|
||
headroom get <key>
|
||
headroom bypass on|off # global kill switch
|
||
headroom reload # reload profiles from disk
|
||
headroom monitor # live meter TUI (uses subscribe)
|
||
```
|
||
|
||
CLI is sync, blocks on `UnixStream`. Talks the same JSON wire as any
|
||
other client.
|
||
|
||
---
|
||
|
||
## 9. Crates
|
||
|
||
```
|
||
headroom/
|
||
├── flake.nix # devshell + package
|
||
├── Cargo.toml # workspace
|
||
├── PLAN.md # this file
|
||
├── IPC.md # wire-protocol schema (normative)
|
||
├── README.md
|
||
└── crates/
|
||
├── headroom-dsp/ # AGC + compressor + limiter (pure DSP, no PW)
|
||
├── headroom-ipc/ # wire types, framing, serde; no I/O
|
||
├── headroom-client/ # blocking client (+ optional async); thin
|
||
├── headroom-core/ # daemon: PW integration, routing, profiles, IPC server
|
||
└── headroom-cli/ # `headroom` binary; depends on headroom-client
|
||
```
|
||
|
||
### External crates (final v0 dep list)
|
||
|
||
**Audio / DSP**
|
||
- `pipewire`, `libspa` — official PipeWire bindings.
|
||
- `ebur128` — measurement.
|
||
- `rtrb` — SPSC ring buffer (audio ↔ control).
|
||
- `basedrop` — RT-safe shared ownership.
|
||
- `assert_no_alloc` — debug-build tripwire.
|
||
|
||
**Plumbing**
|
||
- `serde`, `serde_json` — IPC + profile (de)serialization.
|
||
- `serde-toml` (`toml`) — profile files.
|
||
- `clap` (derive) — CLI.
|
||
- `tracing`, `tracing-subscriber`, `tracing-journald` — logs.
|
||
- `notify`, `notify-debouncer-mini` — profile hot-reload.
|
||
- `crossbeam-channel` — control-plane channels.
|
||
- `parking_lot` — mutexes.
|
||
- `signal-hook` — clean shutdown.
|
||
- `thiserror` — error types.
|
||
|
||
No `tokio`, no `zbus`, no `dbus-*`.
|
||
|
||
---
|
||
|
||
## 10. Nix
|
||
|
||
`flake.nix` ships:
|
||
|
||
- A **devshell** with rust toolchain (via `rust-overlay` for pinned
|
||
channel; default to a stable release pinned in
|
||
`rust-toolchain.toml`), `pkg-config`, `pipewire`'s dev outputs,
|
||
`clang` (for bindgen if invoked by deps), `socat` (handy for poking
|
||
the IPC), `jq`.
|
||
- A **package** output (`packages.<system>.default`) that builds the
|
||
daemon + CLI with `rustPlatform.buildRustPackage`. v0 uses
|
||
`cargoLock.lockFile`. Crane can come later if incremental builds in
|
||
CI become a bottleneck.
|
||
- A `nixosModules.default` placeholder so packagers can wire the user
|
||
unit later. Not implemented in v0 of the flake itself.
|
||
|
||
Intermediate dev work uses plain `cargo` inside `nix develop`. Final
|
||
builds and any CI go through `nix build`.
|
||
|
||
---
|
||
|
||
## 11. Phased implementation
|
||
|
||
The phases are roughly token-of-work units, not calendar weeks.
|
||
|
||
**Phase 0 — scaffolding.** Flake, workspace, crate skeletons, README,
|
||
PLAN/IPC docs. *(done as part of this commit)*
|
||
|
||
**Phase 1 — IPC + client.** `headroom-ipc` (types, framing, codec) and
|
||
`headroom-client` (blocking `Client`) implemented against the schema in
|
||
`IPC.md`. Round-trip tests, fuzz the codec. *(this commit)*
|
||
|
||
**Phase 2 — DSP kernels.** `headroom-dsp` with limiter, compressor, AGC,
|
||
oversampler, envelope. Tested in isolation against synthesized
|
||
signals; limiter validated to hold a −0.1 dBTP ceiling on EBU TECH
|
||
3341 generators. *(this commit: limiter first)*
|
||
|
||
**Phase 3 — daemon core.** `headroom-core` brings up the
|
||
`headroom-processed` virtual sink, the filter (pw_stream pair),
|
||
the `preferred_real_sink` tracker, the registry subscriber, and the
|
||
routing engine. Hardcoded profile, no IPC server yet.
|
||
|
||
**Phase 4 — IPC server + profile manager.** Wire `headroom-core` to the
|
||
IPC schema. Profile loading + hot-reload. Slow AGC loop ticking on
|
||
real loudness measurements.
|
||
|
||
**Phase 5 — CLI + monitor TUI.** `headroom-cli` implements all the
|
||
subcommands above, plus a `monitor` TUI built on the meters
|
||
subscription.
|
||
|
||
**Phase 6 — Per-application level control (Layer A).** Per-managed-stream
|
||
tap creation, `AppLevelController` with peak + RMS envelopes,
|
||
`Props.channelVolumes` writer, user-volume deference logic,
|
||
`[per_app]` profile parsing, `headroom per-app …` CLI verbs, and a
|
||
per-stream meter event on the IPC. Land after the bus path is stable
|
||
so we have a baseline to compare against.
|
||
|
||
**Phase 7 — Packaging.** systemd user unit, install paths, default
|
||
profile install, basic NixOS module.
|
||
|
||
**Phase 8 — Hardening.** Latency budget verification on real hardware,
|
||
Bluetooth-handoff edge case, profile-reload while audio is flowing,
|
||
multi-rate hardware, allocation-tracer sweep with
|
||
`assert_no_alloc` in debug.
|
||
|
||
---
|
||
|
||
## 12. Risks & open questions
|
||
|
||
- **WirePlumber re-linking on device hotplug.** When a Bluetooth
|
||
headset connects, WP re-evaluates linking. Headroom must re-pin its
|
||
routed streams. Tractable; the registry events surface this.
|
||
- **Latency budget.** Processed path: one quantum hop (the filter)
|
||
plus lookahead (~2 ms) plus 4× oversampling buffering ≈ 8–15 ms
|
||
added to processed-path latency. Fine for video/voice. Bypass path:
|
||
**zero added latency** — the stream rides the real sink directly.
|
||
- **Default-sink changes.** When the user switches the system default
|
||
to a hardware sink, the daemon adopts it as `preferred_real_sink`,
|
||
re-links the filter's playback, retargets bypassed streams, and
|
||
re-asserts `headroom-processed` as the default for new streams.
|
||
Watching `default.audio.sink` in the metadata is the trigger.
|
||
- **Sample-rate mismatch.** `headroom-processed`, the filter, and the
|
||
real sink must agree, or PipeWire resamples behind our back. The
|
||
filter should source its rate from the real sink and convert on the
|
||
capture side only.
|
||
- **Surround content downmix vs. passthrough.** v0 punts: anything
|
||
>2ch is routed directly to the real sink (bypass behaviour)
|
||
regardless of profile rule. Documented behaviour.
|
||
|
||
---
|
||
|
||
## 13. License
|
||
|
||
GPL-3.0-or-later for the daemon and CLI. `headroom-dsp` and `headroom-ipc`
|
||
are MPL-2.0 so third-party clients and plugin hosts can link them
|
||
without GPL contagion. (Re-evaluate when LSP-derived code is
|
||
introduced; current plan does not pull any.)
|