# Headroom

A Rust AGC + compressor + true-peak limiter for PipeWire. Per-application
exclusion, profile-based presets, single-binary daemon, scriptable over a
Unix-domain socket.

This document is the canonical plan. It supersedes the earlier
conversational sketch.

---

## 1. Goals & non-goals

### Goals

- **Hard safety net on the processed route.** Audio routed through
  `headroom-processed` is guaranteed to leave the filter below a
  configurable ceiling (default **−0.1 dBTP**) with proper inter-sample
  peak handling. The guarantee is enforced inline in the filter,
  downstream of every control-plane code path, and survives daemon
  misbehaviour, profile reloads, and bad routing decisions. Streams
  routed `bypass` ride the real sink directly and are **not** subject
  to this contract (see §2 path ①); the contract also does not extend
  to whatever resampling or post-processing the downstream device
  path applies after the filter's output.
- **Per-application exclusion.** Music players, games, and DAWs route
  around the processor; browsers, voice chat, and "everything else" go
  through it. Rules are app-level and live in profiles.
- **Drop-in defaults.** First-run experience: install, enable user
  service, done. No mandatory config. Power users edit TOML or use the
  CLI.
- **Profiles** for distinct listening scenarios (default / night /
  speech / transparent / bypass-all).
- **Single binary.** Daemon, filter, routing, and control loop all live
  in one process. The DSP kernels are a separate crate so they can be
  reused (LV2/standalone) later.
- **Scriptable.** Unix-domain-socket IPC with a documented JSON schema
  so anyone can write an alternative client (Qt/QuickShell panel, Eww
  widget, scripts). A first-party Rust crate (`headroom-ipc`) wraps it.
- **Rust, lean dep tree.** No NIH where mature crates exist, no bloat
  where they don't.

### Non-goals (v0)

- Surround / >2-channel content. v0 is stereo only; >2ch is routed
  directly to the real sink, untouched by Headroom's filter chain.
- LV2/CLAP plugin distribution. The DSP crate is plugin-shaped so this
  is cheap to add later, but it's not a v0 deliverable.
- GUI. Third parties can build one against the IPC.
- Capture-side processing (microphone). v0 is playback only.

---

## 2. Architecture

Each app's audio takes one of four end-to-end paths, chosen by two
**orthogonal** profile flags: a routing decision (processed vs.
bypass) and a per-app level-control flag (on vs. off).

```
                ┌─── optional, opt-in per app (Layer A) ────────────────┐
                │                                                       │
                │   ┌─► passive tap ─► peak + RMS ─► AppLevelController │
                │   │      (sibling link in same quantum)         │     │
                │   │                                             │     │
                │   │           Props.channelVolumes write ◄──────┘     │
                │   │                                                   │
                └───┼───────────────────────────────────────────────────┘
                    │
                    │       APP STREAM NODE
                    │   ┌──────────────────────────┐
                    │   │ raw output               │
   app's audio  ───►├──►│   × channelVolumes       │──► output port
                    │   └──────────────────────────┘
                    │                                            │
                    └────────────────────────────────────────────│
                                                                 │
                              routing decision (Layer B)         │
                              target.object set by daemon        │
                                                                 │
                       ┌─────────────────────────────────────────┴┐
                       ▼                                          ▼
                  route = "bypass"                       route = "processed"
                  target.object =                        target.object =
                  preferred_real_sink                    headroom-processed
                       │                                          │
                       │                                          ▼
                       │                              ┌─────────────────────┐
                       │                              │ headroom-processed  │
                       │                              │ (virtual sink, the  │
                       │                              │  system default)    │
                       │                              └─────────┬───────────┘
                       │                                        ▼
                       │                              ┌─────────────────────┐
                       │                              │  headroom-filter    │
                       │                              │  (pw_filter node,   │
                       │                              │   4 mono ports —    │
                       │              Layer C (bus DSP) │  FL/FR in + out)   │
                       │                              │  AGC → compressor   │
                       │                              │  → soft → hard      │
                       │                              └─────────┬───────────┘
                       │                                        │
                       ▼                                        ▼
                  preferred_real_sink  ◄──────────────────────► (DAC)
```

### The four end-to-end paths

|   | Routing = bypass | Routing = processed |
|---|---|---|
| **per-app off** | ① **true bypass** — Headroom touches nothing on the signal path. Same latency as if Headroom weren't installed. | ③ **bus DSP only** — stream flows through `headroom-processed` and the inline chain. `channelVolumes` left at whatever the user/app set. |
| **per-app on**  | ② **per-app only** — level-reactive `channelVolumes` writes, no graph hop. Zero added signal-path latency. | ④ **full stack** — per-app level control *and* bus DSP. Maximum protection. |

Path-by-path properties:

| Path | Signal-path latency added | Limiter contract? | Per-app gain ride? |
|---|---|---|---|
| ① bypass / per-app off | 0 | no | no |
| ② bypass / per-app on  | 0 | no | yes (Layer A) |
| ③ processed / per-app off | filter hop + ~2 ms lookahead | yes (Layer C hard tier) | no |
| ④ processed / per-app on  | filter hop + ~2 ms lookahead | yes (Layer C hard tier) | yes (Layer A) |

The two flags are independent. A competitive game's typical config
is ①: zero Headroom involvement in its audio. A user concerned about
notification dings on top of that game would put Discord on ② or ④
(so notifications get tamed via Discord's own `channelVolumes`)
while leaving the game on ①.
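
As a minimal sketch of how the two flags combine, the table above can be written as a total function over `(route, per_app)`. The type and method names here (`Route`, `Path`, `classify`) are illustrative assumptions, not the daemon's actual types.

```rust
/// Routing decision (Layer B).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Route {
    Bypass,
    Processed,
}

/// The four end-to-end paths from the table above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Path {
    TrueBypass, // path ①
    PerAppOnly, // path ②
    BusDspOnly, // path ③
    FullStack,  // path ④
}

/// Combine the routing flag with the per-app level-control flag.
pub fn classify(route: Route, per_app: bool) -> Path {
    match (route, per_app) {
        (Route::Bypass, false) => Path::TrueBypass,
        (Route::Bypass, true) => Path::PerAppOnly,
        (Route::Processed, false) => Path::BusDspOnly,
        (Route::Processed, true) => Path::FullStack,
    }
}

impl Path {
    /// True when the Layer C hard tier sits on the signal path.
    pub fn limiter_contract(self) -> bool {
        matches!(self, Path::BusDspOnly | Path::FullStack)
    }

    /// True when Layer A rides this stream's `channelVolumes`.
    pub fn per_app_gain_ride(self) -> bool {
        matches!(self, Path::PerAppOnly | Path::FullStack)
    }
}
```

Because the match is exhaustive, adding a third routing mode later would force every call site to decide how it interacts with per-app control.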

```
                  headroom-core (daemon, one process)
                  • per-app level controllers (Layer A)
                  • routing engine + preferred_real_sink (Layer B)
                  • slow AGC loop, profile manager (Layer C)
                  • IPC server
                          │
                          ▼
              $XDG_RUNTIME_DIR/headroom/control.sock
                          │
              ┌───────────┴───────────┐
              ▼                       ▼
         headroom CLI         third-party clients
                              (Qt panel, widgets, …)
```

See §4 for Layer A's mechanics and §5 for the PipeWire-level details
of Layers B and C.

### One virtual sink, one daemon process

- `headroom-processed` — virtual sink. Set as the system default so
  new streams land in it by default. Its monitor is captured by
  `headroom-filter`, pushed through the DSP graph, and emitted to the
  current `preferred_real_sink`.
- **No bypass sink.** Streams marked `route = "bypass"` are pointed
  directly at `preferred_real_sink` via a `target.object` metadata
  write. They pay zero added latency vs. running without Headroom
  installed at all — there's no extra graph hop, no extra DSP. The
  word "bypass" in the profile DSL means "route directly to the real
  sink, untouched."
- The **daemon** owns:
  - the one virtual sink (created on startup, torn down on exit);
  - the filter — a single `pw_filter` node (`headroom-filter`) with
    four mono DSP ports (input FL/FR + output FL/FR) running on
    PipeWire's realtime data thread. Wrapped by the in-house
    `pipewire-filter` workspace crate because pipewire-rs 0.8 doesn't
    expose `pw_filter`. WirePlumber doesn't auto-link `pw_filter`,
    so the routing engine creates the `processed.monitor → filter
    in` and `filter out → preferred_real_sink` links explicitly via
    `link-factory` (the same primitive the routing engine already
    uses for stream re-pinning in Phase 4k);
  - one **`AppLevelController`** per managed app stream (§4), each
    with its own passive `pw_stream` tap, peak/RMS envelopes, and
    `Props.channelVolumes` writer. Created/destroyed on stream
    lifecycle events.
  - **`preferred_real_sink` tracking.** The daemon watches the
    `default.audio.sink` metadata key. When the user changes the
    system default (via pavucontrol, `wpctl set-default`, etc.) to a
    hardware sink, the daemon (a) treats that sink as the new
    `preferred_real_sink`, (b) re-links `headroom-filter`'s playback
    stream to it, and (c) rewrites `target.object` for every
    currently-bypassed stream so they follow. Hotplug / Bluetooth
    handoffs use the same machinery.
  - the slow AGC loop (reads loudness, writes gain target into the
    filter via an `rtrb` channel);
  - the routing engine (subscribes to the PipeWire registry, evaluates
    rules on new streams, writes `target.object` to the `default`
    metadata: either `headroom-processed` for processed streams or
    `preferred_real_sink` for bypassed streams);
  - the IPC server.

### Why no `headroom-bypass` sink

An earlier iteration of the design had a second virtual sink
(`headroom-bypass`) that loopback'd to the real sink, so "bypassed"
streams routed to it. This added one PipeWire quantum of latency to
every bypassed stream for no functional benefit — `module-loopback`
buffers across the quantum boundary even when the DSP is a no-op.
Direct routing via `target.object` skips the hop entirely. The win is
real for competitive games, DAW monitoring, and music players: they
now ride exactly the same path they'd take if Headroom weren't
installed.

### Why this is *not* the "analytical sink + adjust master volume" shape originally proposed

Volume control via SPA `Props` updates is not sample-accurate. A true-peak
limiter needs a small internal delay line so gain reduction is applied
to the same samples that were analyzed. Therefore the **brickwall must
be inline**. The analytical-monitor approach is still used — for the
*slow* AGC loop, where multi-second time constants make control-plane
latency irrelevant — but it cannot own the ceiling.

### Why a native `pw_filter` node, not an LV2 plugin in `module-filter-chain`

LV2 is not native to PipeWire; it's one of several plugin formats
`module-filter-chain` happens to host (via lilv). Using LV2 would split
Headroom into a plugin + a daemon + a filter-chain JSON, pull in a lilv
runtime, and force gain-target updates through a 32-bit-float control-port
abstraction. A native `pw_filter` node is the same primitive
`module-filter-chain` itself uses internally, but written directly in
Rust, in the same process as the rest of the daemon. One binary, no IPC
for parameter updates, idiomatic Rust audio thread. An LV2 wrapper of
`headroom-dsp` remains a viable optional deliverable for use in DAWs.

**Bus filter implementation — the in-house `pipewire-filter` crate.**
pipewire-rs 0.8 ships `Stream` but not `Filter`. Since `headroom-core`
declares `#![forbid(unsafe_code)]`, the unsafe FFI lives in a separate
small workspace crate (`crates/pipewire-filter/`), mirroring
pipewire-rs's `Stream` patterns (heap-boxed `FilterListener<D>`,
RAII `Buffer<'p>`, `// SAFETY:` on every `unsafe` block). The crate
covers exactly the events Headroom needs (`process`, `state_changed`,
`param_changed`). Audited by Codex on landing; fixes were applied for
the two findings that would have been real bugs in our use
(over-permissive `Sync` on `PortData`; passing the `error` ptr to the
old state in `state_changed`). Architectural rule: when pipewire-rs
later ships its own `Filter`, switch to it and delete this crate.

**Earlier shape (now retired): two `pw_stream`s + a ring.** The
dual-`pw_stream` arrangement we shipped in Phase 3 had no PipeWire
graph dependency between capture and playback, so the scheduler was
free to fire playback before capture in the same quantum →
ring-empty → tremolo at quantum cadence. The mitigation was a
65k-sample SPSC ring sized for 4× the worst-case buffer
(`clock.quantum-limit` × `CHANNELS`), adding ~340 ms average
latency. `pw_filter` removes the ring entirely: a single node has
its own input→process→output ordering by construction (the same
ordering `module-filter-chain` relies on). See
[[headroom-pipewire-gotchas]] #14, #17, #18 for the full diagnostic
trail.
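
The ring figures above can be checked with a little arithmetic. This sketch assumes a `clock.quantum-limit` of 8192 frames, two channels, a 48 kHz graph, and a ring that sits half full on average; those inputs are assumptions chosen to reproduce the quoted numbers, and the function names are illustrative.

```rust
/// Ring capacity in interleaved samples: `factor` times the worst-case buffer.
pub fn ring_capacity(quantum_limit: u32, channels: u32, factor: u32) -> u32 {
    quantum_limit * channels * factor
}

/// Latency in milliseconds of a ring holding `fill_samples` interleaved samples.
pub fn ring_latency_ms(fill_samples: u32, channels: u32, rate_hz: u32) -> f64 {
    let frames = f64::from(fill_samples) / f64::from(channels);
    frames / f64::from(rate_hz) * 1000.0
}
```

With those inputs, `ring_capacity(8192, 2, 4)` is 65536 samples, and a half-full ring (`ring_latency_ms(32768, 2, 48_000)`) works out to roughly 341 ms, in line with the ~340 ms figure.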

---

## 3. DSP

### 3.1 Two-tier true-peak limiter (`headroom-dsp::limiter`)

The limiter has **two parallel tiers** sharing the same upsampler,
downsampler, delay line, and sliding peak buffer. Both run at the
oversampled rate.

**Hard tier — the safety contract.** Output ceiling default
**−0.1 dBTP**, configurable. Instant attack on the gain envelope plus a
brief hold and a slow release. Two defensive `clamp` stages downstream
(once in the oversampled domain, once at the input rate after
downsampling) guarantee the contract numerically — the envelope can
misbehave and the contract still holds. Never bypassed, never
disabled.
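
A minimal sketch of the defensive clamp, assuming the ceiling is stored as a linear amplitude converted from dB. The function names are illustrative. Note that a clamp at the input rate only bounds sample values; it is the clamp in the oversampled domain that addresses inter-sample peaks.

```rust
/// Convert a ceiling in dB relative to full scale to a linear amplitude.
pub fn db_to_lin(db: f32) -> f32 {
    10.0_f32.powf(db / 20.0)
}

/// Final defensive stage: whatever the envelope did upstream,
/// the sample leaves within [-ceiling, +ceiling].
pub fn clamp_to_ceiling(sample: f32, ceiling_lin: f32) -> f32 {
    sample.clamp(-ceiling_lin, ceiling_lin)
}
```

For the default ceiling, `db_to_lin(-0.1)` is about 0.9886.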

**Contract scope (caveat).** The ≤ −0.1 dBTP guarantee holds at the
*filter's output*, not at the speaker. The bus filter is hardcoded
F32 stereo @ 48 kHz (`headroom-dsp::limiter`'s 4× oversampler is
sized for 48 k); when the real sink negotiates a different rate
(44.1 kHz, 96 kHz, 192 kHz), PipeWire inserts a downstream
resampler between `filter.playback` and the sink. Polynomial /
windowed-sinc resamplers can elevate inter-sample peaks slightly
through their own reconstruction, so the limiter's true-peak
guarantee leaks across that resampling stage. In practice the
elevation is small (a few tenths of a dB worst case for a clean
band-limited resampler), and the contract still holds at the bus
output where headroom is in control. **For the contract to hold
end-to-end the filter would need to match the real sink's rate
and rebuild its DSP coefficients on rate-change** — that's the
v1 work tracked as PLAN §11 "filter rate matching" (deferred from
8d, gated on a multi-rate hardware test bench).

**Soft tier — the comfort cap.** Targets a *dynamic* ceiling computed
as `program_lufs + max_psr_db`. Smooth attack/release envelope so the
gain reduction sounds like volume riding, not a slap. Pulls transients
to a comfortable peak-to-loudness ratio (default 14 dB) *before* they
ever threaten the hard ceiling. When the AGC hasn't yet provided a
program loudness (startup, after reset), the soft tier falls back to a
static ceiling. Disabled by omitting `[limiter.soft]` in a profile —
useful for the `transparent` profile where users want pure brickwall
behavior.
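
The dynamic ceiling selection is small enough to sketch directly. The function name, the `Option` encoding of "no program loudness yet", and the fallback parameter are assumptions for illustration.

```rust
/// Soft-tier ceiling in dB: program loudness plus the allowed
/// peak-to-loudness ratio, or a static fallback before the AGC
/// has published a loudness value.
pub fn soft_ceiling_db(
    program_lufs: Option<f32>,
    max_psr_db: f32,
    static_fallback_db: f32,
) -> f32 {
    match program_lufs {
        Some(lufs) => lufs + max_psr_db,
        None => static_fallback_db,
    }
}
```

With a program loudness of −23 LUFS and the default 14 dB ratio, the soft ceiling lands at −9 dB.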

Algorithm (per oversampled sample, after upsampling):

1. Push raw `|s|` into the sliding-window peak buffer; read the
   max-of-window.
2. **Soft tier** computes target = `soft_ceiling / window_peak` (clamped
   to ≤ 1), runs through the smooth attack/release envelope, yields
   `soft_gain`.
3. **Hard tier** predicts the worst-case effective peak after the soft
   tier acts (max of `window_peak * soft_gain` and the asymptote
   `min(window_peak, soft_ceiling)`), then sizes `hard_target` to keep
   that under the hard ceiling. Instant attack, hold, exponential
   release. Yields `hard_gain`.
4. `total_gain = min(soft_gain, hard_gain)`.
5. Multiply the delayed sample by `total_gain`.
6. Clamp at hard ceiling (defense-in-depth).
7. Downsample, clamp again at hard ceiling at the input rate.

When the soft tier is doing its job, the hard tier's "predicted-post-soft"
target stays above 1.0 and the hard tier never engages. When the soft
tier is mid-attack (peak just arrived), the hard tier snaps in as a
safety, then releases as the soft tier catches up.
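
Step 1 (the sliding-window peak) can be done in amortized constant time per sample. The sketch below uses a monotonic deque, which is one common technique and an assumption here; the plan does not say which structure `headroom-dsp` uses. `SlidingPeak` is an illustrative name.

```rust
use std::collections::VecDeque;

/// Max-of-window over the last `window` magnitudes.
pub struct SlidingPeak {
    window: usize,
    index: usize,
    /// (sample index, magnitude); magnitudes strictly decrease front to back.
    candidates: VecDeque<(usize, f32)>,
}

impl SlidingPeak {
    pub fn new(window: usize) -> Self {
        assert!(window > 0, "window must hold at least one sample");
        Self {
            window,
            index: 0,
            candidates: VecDeque::with_capacity(window),
        }
    }

    /// Push one sample and return the peak magnitude over the window.
    pub fn push(&mut self, sample: f32) -> f32 {
        let mag = sample.abs();
        // Expire the front candidate once it falls out of the window.
        if let Some(&(i, _)) = self.candidates.front() {
            if i + self.window <= self.index {
                self.candidates.pop_front();
            }
        }
        // Drop candidates that can never be the maximum again.
        while let Some(&(_, back)) = self.candidates.back() {
            if back <= mag {
                self.candidates.pop_back();
            } else {
                break;
            }
        }
        self.candidates.push_back((self.index, mag));
        self.index += 1;
        self.candidates.front().map_or(0.0, |&(_, m)| m)
    }
}
```

The deque never holds more than `window` entries, so the capacity reserved in `new` is enough and `push` does not need to grow the buffer on the audio thread.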

The compressor and AGC stages run *before* the limiter.

### 3.2 Feed-forward compressor (`headroom-dsp::compressor`)

Standard shape: log-domain detector (peak or RMS, switchable) →
ratio + soft knee → attack/release envelope smoother → makeup gain →
linear gain → apply to (small) delayed input. ~150 lines of clean code.

Defaults aimed at "gentle, transparent": threshold −24 dBFS,
ratio 2.5:1, knee 6 dB, attack 10 ms, release 100 ms, makeup auto.
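
For reference, here is a sketch of the static gain curve (ratio plus soft knee) in the log domain, using the textbook quadratic-knee formulation. It is an assumption that `headroom-dsp::compressor` uses this exact knee shape; the `Curve` type is illustrative, and the detector, envelope smoother, and makeup gain are omitted.

```rust
/// Static compressor curve in the log domain.
pub struct Curve {
    pub threshold_db: f32,
    pub ratio: f32,
    pub knee_db: f32,
}

impl Curve {
    /// Gain change in dB (zero or negative) for a detector level in dB.
    pub fn gain_db(&self, level_db: f32) -> f32 {
        let over = level_db - self.threshold_db;
        let half_knee = self.knee_db / 2.0;
        let slope = 1.0 / self.ratio - 1.0;
        if over <= -half_knee {
            // Below the knee: unity gain.
            0.0
        } else if over < half_knee {
            // Inside the knee: quadratic blend between the two slopes.
            let x = over + half_knee;
            slope * x * x / (2.0 * self.knee_db)
        } else {
            // Above the knee: full ratio.
            slope * over
        }
    }
}
```

With the defaults above (threshold −24 dBFS, ratio 2.5:1, knee 6 dB), a −12 dBFS input gets about 7.2 dB of reduction and an input right at the threshold gets about 0.45 dB.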

### 3.3 Slow AGC (`headroom-core::agc`)

Algorithmic descendant of EasyEffects' `autogain.cpp`. Runs *outside*
the audio thread, on a ~50 ms control tick.

- Feeds the audio thread's monitor tap into `ebur128` with
  `Mode::M | S | I | TRUE_PEAK`.
- Computes `target_gain_dB = target_lufs − measured_lufs`.
- Smooths with separate attack/release coefficients (leaky integrator).
- Gates when momentary loudness < silence threshold.
- Soft-clamps so the AGC can never push more than ±N dB (profile knob).
- Writes the new gain target into the audio thread via an `rtrb` queue.

The AGC's gain is applied *before* the compressor. The compressor and
limiter still own their own behaviour and ceilings.
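
One control tick of the loop above might look like the following sketch. Assumptions, all illustrative: the struct and field names, a hard clamp standing in for the soft clamp, per-tick one-pole coefficients, and "attack" meaning the gain is falling. Which loudness window feeds `measured_lufs` is not fixed here.

```rust
/// State and knobs for the slow AGC loop.
pub struct SlowAgc {
    pub target_lufs: f32,
    pub silence_gate_lufs: f32,
    pub max_gain_db: f32,
    /// Per-tick coefficient in (0, 1] used when the gain must fall.
    pub attack: f32,
    /// Per-tick coefficient in (0, 1] used when the gain may rise.
    pub release: f32,
    pub gain_db: f32,
}

impl SlowAgc {
    /// Run one control tick and return the smoothed gain target in dB.
    pub fn tick(&mut self, momentary_lufs: f32, measured_lufs: f32) -> f32 {
        // Gate: hold the current gain while the input is effectively silent.
        if momentary_lufs < self.silence_gate_lufs {
            return self.gain_db;
        }
        let wanted = (self.target_lufs - measured_lufs)
            .clamp(-self.max_gain_db, self.max_gain_db);
        let coeff = if wanted < self.gain_db {
            self.attack
        } else {
            self.release
        };
        // Leaky integrator toward the clamped target.
        self.gain_db += coeff * (wanted - self.gain_db);
        self.gain_db
    }
}
```

The returned value is what would be pushed to the audio thread over the `rtrb` queue.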

### 3.4 Measurement: `ebur128`

`Mode::M | S | I | TRUE_PEAK`. EBU Tech 3341/3342 conformant via the
`ebur128` crate. Constructed on the daemon thread; fed from a ring-buffer
consumer that pulls from the audio thread. The audio thread allocates
nothing.

This is **bus-level** measurement only — used to drive the slow AGC
loop and meter the processed sink output. Per-app measurement (§4)
uses a different, much cheaper metric.

---

## 4. Per-application level control (Layer A)

An opt-in, near-zero-latency feedback loop that watches each managed
application's output stream and adjusts its `Props.channelVolumes`
multiplier in response to **two parallel level metrics**:

- a **fast peak envelope** that catches short bursts and sustained
  loud passages (think: a notification ding, a video that just got
  louder), and
- a **slow RMS envelope** that catches *sustained loudness*
  mismatches (think: "Discord is permanently louder than everything
  else even when nobody's shouting").

A stream's applied gain reduction is `max(peak_reduction,
rms_reduction)` — whichever path is asking for more cut wins, and
recovery only happens when *both* paths agree the stream has settled.
This is the layer's whole point: the peak path handles transients
within one quantum; the RMS path keeps long-term inter-app loudness
balanced. Neither alone is enough.

Orthogonal to bus routing — a stream can be processed *or* bypassed
*and* level-controlled independently. Its goal is "tame noisy apps
without startling the listener and without making the chronic
loudmouth permanently dominate," while the signal path itself stays
untouched.

### 4.1 Why this is zero-latency

The per-app multiplier is the `channelVolumes` value PipeWire already
applies inside the app's stream node — it's the same number
`pavucontrol`'s per-app slider writes to. Adjusting it doesn't insert
a graph node; nothing new sits between the app and its destination
sink. The only cost is that **the analysis happens via a sibling
fanout link**, not in the playback path: PipeWire schedules fanout
consumers in parallel within the same quantum, so the playback path's
timing is identical to the no-tap case.

```
                    ┌──► passive tap (analysis only)
                    │       │
                    │       ▼
                    │   peak + RMS envelopes
                    │   (audio thread, sub-ms)
   app stream ──────┤       │
   (output port)    │       ▼
                    │   rtrb push
                    │       │
                    │       ▼
                    │   AppLevelController (daemon thread)
                    │       │
                    │       │  Props.channelVolumes write
                    │       ▼  (back into the app stream node)
                    │   ┌─────────────────────┐
                    └──►│ app stream multiplies
                        │ by channelVolumes,  │──► (its sink — Layer B)
                        │ then publishes.     │
                        └─────────────────────┘
```

**Important correction (2026-05-22):** the diagram above shows the
tap branching off the source *before* the `channelVolumes`
multiplier, but in practice PipeWire's standard adapter applies
`channelVolumes` *inside* the source node — anything reading the
output port sees the post-attenuation signal. Untreated, this
closes a feedback loop on the controller: write reduction → tap
measures attenuated signal → envelopes release → "no reduction
needed" → controller stops writing, gain freezes wherever it last
was, dynamics no longer tracked. The implementation compensates by
dividing incoming `peak_lin` / `mean_sq_lin` by `last_written_lin`
(and its square) inside `AppLevelController::process_block`,
recovering the pre-attenuation signal estimate. Below a floor of
−40 dB applied gain (`GAIN_COMPENSATION_FLOOR = 0.01`) the
compensation is skipped — a fully-muted stream would otherwise
amplify floor noise back to max-cut and lock the user out of
unmuting. See `app_level.rs` and the per-app-gain memory note for
the rationale and the corner cases.
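
The compensation step described above reduces to a division with a guard. This is a sketch of that arithmetic only; the free function `compensate` is an illustrative name, and returning the raw values below the floor is an assumption about what "skipped" means.

```rust
/// Below this applied gain (0.01 linear, i.e. -40 dB) compensation is skipped.
pub const GAIN_COMPENSATION_FLOOR: f32 = 0.01;

/// Recover a pre-attenuation estimate from post-attenuation measurements.
/// Returns `(peak_lin, mean_sq_lin)`.
pub fn compensate(peak_lin: f32, mean_sq_lin: f32, last_written_lin: f32) -> (f32, f32) {
    if last_written_lin < GAIN_COMPENSATION_FLOOR {
        // Near-muted stream: dividing would blow floor noise up to full scale.
        return (peak_lin, mean_sq_lin);
    }
    (
        peak_lin / last_written_lin,
        mean_sq_lin / (last_written_lin * last_written_lin),
    )
}
```

Peak scales with the gain and mean-square with its square, which is why the two divisors differ.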

**Source-suspension catch-up.** When the source node suspends
(PipeWire's adapter stops delivering buffers — Strawberry between
tracks, the user pausing, a screensaver kicking in) the tap's
`process_block` doesn't run, so the envelopes don't release and
the controller carries stale attenuation into the next stretch of
audio. `AppLevelController::tick_silent(now)` — called from the
Layer A drain timer on every pass — advances envelopes through
silent gaps by feeding (0, 0) inputs at the controller's block
period. Bounded by `MAX_SILENT_CATCHUP_BLOCKS` (~10 s); past that
the envelopes have fully released anyway and we short-circuit via
`envelopes.reset()`. The drain pass runs at 5 ms cadence, so
post-resume audio sees a fresh controller within one tick.
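
The catch-up logic can be sketched as below. Everything except the names `tick_silent`, `MAX_SILENT_CATCHUP_BLOCKS`, and `reset` is an assumption: the `Envelopes` and `SilentCatchup` types, the single shared release coefficient, and the constant's value (picked as roughly 10 s at a hypothetical 5 ms block period).

```rust
use std::time::{Duration, Instant};

/// Hypothetical bound: about 10 s of silence at a 5 ms block period.
pub const MAX_SILENT_CATCHUP_BLOCKS: u32 = 2_000;

/// Simplified one-pole envelopes; the real controller has more state.
pub struct Envelopes {
    pub peak: f32,
    pub mean_sq: f32,
    pub release: f32,
}

impl Envelopes {
    pub fn update(&mut self, peak: f32, mean_sq: f32) {
        self.peak += self.release * (peak - self.peak);
        self.mean_sq += self.release * (mean_sq - self.mean_sq);
    }

    pub fn reset(&mut self) {
        self.peak = 0.0;
        self.mean_sq = 0.0;
    }
}

pub struct SilentCatchup {
    pub envelopes: Envelopes,
    pub block_period: Duration,
    pub last_block: Instant,
}

impl SilentCatchup {
    /// Advance the envelopes through a silent gap.
    /// Returns the number of silent blocks fed (0 on the reset path).
    pub fn tick_silent(&mut self, now: Instant) -> u32 {
        let gap = now.saturating_duration_since(self.last_block);
        let period_ns = self.block_period.as_nanos().max(1);
        let missed = gap.as_nanos() / period_ns;
        if missed > u128::from(MAX_SILENT_CATCHUP_BLOCKS) {
            // Long gap: the envelopes would have fully released anyway.
            self.envelopes.reset();
            self.last_block = now;
            return 0;
        }
        let missed = missed as u32;
        for _ in 0..missed {
            self.envelopes.update(0.0, 0.0);
        }
        self.last_block += self.block_period * missed;
        missed
    }
}
```

Advancing `last_block` by whole block periods rather than snapping it to `now` keeps the fractional remainder for the next drain pass.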

**Per-app `user_ceiling` persistence across stream lifecycles.**
Apps like Strawberry create a fresh `Stream/Output/Audio` node
per track. PipeWire carries over the previous node's
`Props.channelVolumes` — frequently our own last-written value
from the prior track. The new `managed_stream`'s controller is
fresh (`last_written_lin = 1.0`); without intervention, the first
param event from `subscribe_params(Props)` fires with the
inherited daemon-value, the echo check fails (diff vs 1.0 is
huge), `on_external_change` misattributes it as user-set, and the
ceiling gets locked at whatever the previous track's reduction
was. `RoutingState` therefore holds a `persisted_ceilings` map
keyed by `app_label` (process_binary, falling back to
application_name); on managed_stream teardown we save the
controller's current `user_ceiling_lin`, and on spawn we
`AppLevelController::restore_state(ceiling, now)` plus write the
ceiling to `Props.channelVolumes` BEFORE calling
`subscribe_params(Props)`. The ordering is load-bearing —
writing after subscribe races against the initial-state replay
and the bug recurs. First-time apps (no persisted entry) still
treat the first observation as user-set, which is correct
because no daemon-value can have been inherited yet.
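
The persistence side of this is a keyed map with a save hook and a restore hook. A minimal sketch, with the PipeWire calls left out; the `PersistedCeilings` wrapper and method names are illustrative, while the key derivation follows the fallback order described above.

```rust
use std::collections::HashMap;

/// Key for persisted state: `process_binary`, falling back to `application_name`.
pub fn app_label(process_binary: Option<&str>, application_name: Option<&str>) -> Option<String> {
    process_binary.or(application_name).map(str::to_owned)
}

/// Ceilings that outlive individual stream nodes.
#[derive(Default)]
pub struct PersistedCeilings {
    by_label: HashMap<String, f32>,
}

impl PersistedCeilings {
    /// Called on managed-stream teardown.
    pub fn save_on_teardown(&mut self, label: &str, user_ceiling_lin: f32) {
        self.by_label.insert(label.to_owned(), user_ceiling_lin);
    }

    /// Called on spawn; `None` means a first-time app.
    pub fn restore_on_spawn(&self, label: &str) -> Option<f32> {
        self.by_label.get(label).copied()
    }
}
```

A caller would use the restored value twice, in order: seed the controller, then write it to `Props.channelVolumes`, and only afterwards subscribe to param events.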

### 4.2 The metrics: peak + RMS, no LUFS

LUFS is the wrong measurement here. Its shortest window (momentary,
400 ms) blurs out exactly the transients we want to catch, and the
K-weighting filter adds CPU for no benefit when we're trying to react
fast. We also explicitly want a *second* path that targets sustained
loudness — for that, plain mean-square RMS is the right cheap stand-in,
not LUFS.

| Metric | Window | Job |
|---|---|---|
| **Peak envelope** — `max(\|samples\|)` per block, smoothed | ~100 ms attack window, ~500 ms release | Fast: catches a notification ding, a clip getting louder, a partner standing up and shouting. Triggers cut on `peak_threshold_db` (default −6 dBFS). |
| **RMS envelope** — block mean-square, smoothed | ~1–2 s | Slow: catches "this app is just chronically louder than everything else." Triggers cut on `rms_target_db` (default ≈ −20 dBFS RMS). |

Both are computed from the *same* raw buffer in the audio thread, so
the audio-thread cost is one additional MAC accumulator and a max-
scan per sample. Cost analysis in §4.7.
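
A sketch of that single pass over the block, assuming the tap delivers `f32` samples; `analyze_block` is an illustrative name.

```rust
/// One pass over a block: returns `(peak, mean_sq)`.
pub fn analyze_block(samples: &[f32]) -> (f32, f32) {
    if samples.is_empty() {
        return (0.0, 0.0);
    }
    let mut peak = 0.0_f32;
    let mut sum_sq = 0.0_f32;
    for &x in samples {
        peak = peak.max(x.abs()); // max-scan
        sum_sq += x * x; // multiply-accumulate
    }
    (peak, sum_sq / samples.len() as f32)
}
```

No allocation and no branches besides the empty-block guard, so it is safe to call from the tap's process callback.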

### 4.3 Architecture

For each managed playback stream (matched by routing rule — see §6):

1. **Audio thread (tap stream's process callback):**
   - Pull the buffer from the fanout link.
   - `peak = max(|samples|)` over the block.
   - `mean_sq = Σ(x*x) / n` over the block.
   - Push `{node_id, peak, mean_sq}` to a per-stream `rtrb`.
2. **Daemon thread (`AppLevelController` per stream):**
   - Drain the rtrb.
   - Update peak envelope (one-pole, fast α — attack within a block,
     release ~500 ms).
   - Update RMS envelope (one-pole, slow α — window ~1–2 s).
   - Compute `peak_reduction_db` and `rms_reduction_db` independently,
     then `proposed = max(peak_reduction_db, rms_reduction_db)`.
   - Smooth toward `proposed`.
   - If the smoothed value is significantly different from
     last-written AND we're not rate-limited (~10 Hz max writes per
     stream), submit `Props.channelVolumes` update.

The recovery condition is intentionally *both*-paths-agree: a
release on the peak path only counts toward unwinding gain
reduction if the RMS path also reads quiet. This avoids the pumping
artefact where a transient-heavy stream would rapidly release
between transients only to be slapped back down on the next one.
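
The daemon-side decision can be sketched as two small functions. The mapping from envelope level to reduction (excess over the threshold, floored at zero) is an assumption, since the plan only fixes the `max` combination; the helper names and the `min_delta_db` parameter are illustrative.

```rust
use std::time::Duration;

/// Amplitude to dB, floored to avoid log of zero.
pub fn lin_to_db(lin: f32) -> f32 {
    20.0 * lin.max(1e-9).log10()
}

/// Mean-square (power) to dB, floored to avoid log of zero.
pub fn power_to_db(power: f32) -> f32 {
    10.0 * power.max(1e-18).log10()
}

pub struct Thresholds {
    pub peak_threshold_db: f32,
    pub rms_target_db: f32,
}

/// Each path asks for its own cut; the larger request wins.
pub fn proposed_reduction_db(peak_env_lin: f32, mean_sq_env: f32, t: &Thresholds) -> f32 {
    let peak_reduction_db = (lin_to_db(peak_env_lin) - t.peak_threshold_db).max(0.0);
    let rms_reduction_db = (power_to_db(mean_sq_env) - t.rms_target_db).max(0.0);
    peak_reduction_db.max(rms_reduction_db)
}

/// Write only when the change is significant and the ~10 Hz limit allows it.
pub fn should_write(
    smoothed_db: f32,
    last_written_db: f32,
    since_last_write: Duration,
    min_delta_db: f32,
) -> bool {
    const MIN_WRITE_INTERVAL: Duration = Duration::from_millis(100);
    (smoothed_db - last_written_db).abs() >= min_delta_db
        && since_last_write >= MIN_WRITE_INTERVAL
}
```

Taking the `max` of the two requests is what makes recovery a both-paths-agree condition: the proposal only drops once neither path is asking for a cut.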

### 4.4 Honouring user-set volumes

The daemon subscribes to `Props` param-change events on each managed
stream. When a `channelVolumes` change arrives that's meaningfully
different from `last_written_volume`, it wasn't us — the user
adjusted via pavucontrol, a hotkey, an app's own UI, etc. The
controller then either:

- **defers entirely** (stops adjusting the stream until the user opts
  back in via `headroom per-app reset <app>`), or
- **treats the user value as a ceiling** (continues to cut on spikes
  but never raises above what the user wanted).

Default is the ceiling behaviour — it's the principle-of-least-surprise
choice. Users who want strict deference set a profile flag.
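
A sketch of the two deference modes, assuming an echo check with a small tolerance. The `VolumeState` type, the `Deference` enum, and the tolerance value are illustrative; only `last_written` and the ceiling concept come from the text above.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Deference {
    /// Treat the user's value as an upper bound and keep cutting on spikes.
    Ceiling,
    /// Stop adjusting until the user opts back in.
    Defer,
}

/// Hypothetical tolerance for recognising our own write coming back.
const ECHO_TOLERANCE: f32 = 1e-3;

pub struct VolumeState {
    pub mode: Deference,
    pub last_written_lin: f32,
    pub user_ceiling_lin: f32,
    pub deferred: bool,
}

impl VolumeState {
    /// Handle a `channelVolumes` param event. Returns true if it was external.
    pub fn on_param_change(&mut self, observed_lin: f32) -> bool {
        if (observed_lin - self.last_written_lin).abs() <= ECHO_TOLERANCE {
            return false; // echo of our own write
        }
        match self.mode {
            Deference::Ceiling => self.user_ceiling_lin = observed_lin,
            Deference::Defer => self.deferred = true,
        }
        true
    }

    /// Multiplier to write for a wanted gain, or `None` while deferring.
    pub fn gain_to_write(&self, wanted_lin: f32) -> Option<f32> {
        if self.deferred {
            None
        } else {
            Some(wanted_lin.min(self.user_ceiling_lin))
        }
    }
}
```

In ceiling mode the controller can still cut below the user's value, but `gain_to_write` never returns anything above it.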

#### A historical concern: apps that fight back

Some PulseAudio-era apps (Discord most famously) used to read and
re-assert their own `channelVolumes` periodically, fighting any
external volume manager. The pattern produced a visible ping-pong
loop and effectively disabled per-app management.

The pattern is largely absent from modern PipeWire-native and
Electron-based apps in 2024+: in-app sliders write `channelVolumes`
only on user interaction, not on a timer. From Headroom's
perspective, those user-interaction writes are indistinguishable from
a pavucontrol slider move — both are legitimate external changes the
deference policy correctly yields to.

If a fight-back app does appear, the **ceiling** deference mode
degrades gracefully:

- App produces hot output → Headroom cuts to 0.5.
- App writes `channelVolumes = 1.0` back over our cut.
- Headroom detects the external change, marks the new value
  (1.0) as the ceiling, and stops actively writing.
- Layer A becomes effectively inert for that stream — there is no
  ping-pong, the user just doesn't get the per-app cut they were
  hoping for. The bus-level Layer C limiter (if engaged) still
  enforces the absolute output ceiling regardless.

Explicit pattern detection and rate-limiting of ceiling updates
(e.g., "ignore ceiling-restoring writes that arrive within N seconds
of our own writes") is deferred to v1, pending evidence from
real-world testing that any modern app warrants it. The graceful
degradation property is the v0 contract.

### 4.5 Reaction-time honesty

The signal-path latency is **zero**. The reaction latency to a spike
is bounded by:

```
spike in block N ─► analysis (same quantum)
                ─► rtrb push (ns)
                ─► controller computes (μs)
                ─► Props write to pw main loop
                ─► applied to block N+1 of the app stream
```

So sustained loud passages are attenuated within ~one quantum
(5–20 ms depending on the system's quantum). **Isolated one-block
transients still leak through** — the first block carrying the spike
plays with the old gain; subsequent blocks see the reduction. This
is the irreducible cost of "no lookahead allowed." For absolute
spike prevention you need lookahead, which means latency, which
contradicts the constraint of this layer.

On the processed route the bus-level Layer C limiter (§3.1) catches
anything that would exceed the ceiling regardless of whether Layer A
has caught up; on bypass routes Layer A is the only thing watching, so
isolated one-block transients reach the real sink. Layer A reduces
*workload* on Layer C where Layer C is in the path, and is a
best-effort comfort filter where it isn't; it doesn't replace the
limiter.

### 4.6 Layered budget summary

| Layer | Metric | Time scale | Signal-path latency added |
|---|---|---|---|
| A: per-app peak | sample peak per block | tens of ms | **0** |
| A: per-app RMS | block mean-square | seconds | **0** |
| C: inline soft tier | true-peak, lookahead | sub-ms | shared with hard tier |
| C: inline hard tier | true-peak, lookahead | sub-ms | ~2 ms lookahead |
| C: bus AGC | LUFS (ebur128) | many seconds | — (control plane only) |

Five distinct jobs, each matched to its own time scale, and no two
layers duplicate each other. Layer A is the cheapest line of defense
and the only one that costs zero latency on the audio path.

### 4.7 Resource budget per stream

| Resource | Cost (peak + RMS only; no TRUE_PEAK, recommended for Layer A) |
|---|---|
| Audio thread per quantum | ~10 μs (peak + RMS pass) |
| Daemon thread per measurement | a few μs (HashMap lookup + envelope math) |
| Memory per controller | ~100 bytes |
| Memory per ebur128 (if enabled) | N/A; Layer A doesn't use ebur128 |

At realistic stream counts (2–5 managed apps): **<0.5% CPU total,
<1 KB RAM total**. Doesn't move the needle.

### 4.8 Lifecycle

- **Stream appears** with `media.class = Stream/Output/Audio`
  matching a `[[per_app.rules]]` pattern: create tap link
  (`pw_link_create`), spawn controller, register rtrb.
- **Stream disappears** (`pw_registry::global_remove`): tear down
  tap, drop controller, clean up rtrb.
- **App restarts**: new `node_id` → fresh controller. Deference state
  lives in the per-stream controller; the one exception is
  `user_ceiling`, which is restored from `persisted_ceilings` by
  `app_label` (§4.1).

---

## 5. PipeWire integration

### 5.1 Sinks

Two creation paths were considered. The first emits a `pipewire.conf.d`
fragment into `$XDG_CONFIG_HOME/pipewire/pipewire.conf.d/headroom.conf`
(if not already present) on daemon startup and reloads. The second
creates the sink at runtime via `pw-loopback` equivalents using
`pipewire-rs`. v0 ships with the runtime-creation path so the install
footprint is "one binary, one unit file."

Sink properties:

- `headroom-processed`: `node.name=headroom-processed`,
  `media.class=Audio/Sink`, `audio.position=[FL,FR]`,
  `node.description="Headroom (processed)"`. Promoted to system
  default on startup so new streams land in it by default.

There is no second sink. Bypassed streams are routed directly at the
current `preferred_real_sink` via `target.object` metadata writes
(see §5.3).

### 5.2 The filter

One `pw_filter` node (`headroom-filter`), wrapped by the in-house
`pipewire-filter` workspace crate, with **four mono DSP ports** —
the canonical shape `module-filter-chain` uses:

- `input_FL` / `input_FR`: `Direction::Input`,
  `format.dsp = "32 bit float mono audio"`, `audio.channel = FL|FR`.
  The routing engine links these to the corresponding monitor ports
  on `headroom-processed`.
- `output_FL` / `output_FR`: `Direction::Output`, same format
  properties. The routing engine links these to the corresponding
  input ports on `preferred_real_sink`.

Single `process` callback per quantum: dequeue all four mono
buffers, run AGC gain → compressor → limiter on the
`(in_l[i], in_r[i])` pair, write `(out_l[i], out_r[i])`. Queue all
four buffers back via `Buffer::Drop`. Allocation-free; guarded by
`assert_no_alloc` in debug. Parameter updates arrive over an `rtrb`
SPSC queue from the control thread.
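
The loop shape described above can be sketched roughly as follows. This is a minimal illustration, not the `headroom-dsp` implementation: the `StereoStage` trait, `Gain`, and `HardClip` are hypothetical stand-ins, and the plain clamp here is not a true-peak limiter (no lookahead, no oversampling).

```rust
/// Hypothetical stage interface: processes one stereo sample pair in place.
trait StereoStage {
    fn process(&mut self, l: &mut f32, r: &mut f32);
}

/// Stand-in for the AGC gain stage: a fixed linear multiplier.
struct Gain(f32);

impl StereoStage for Gain {
    fn process(&mut self, l: &mut f32, r: &mut f32) {
        *l *= self.0;
        *r *= self.0;
    }
}

/// Stand-in for the limiter: a plain per-sample clamp.
struct HardClip {
    ceiling: f32,
}

impl StereoStage for HardClip {
    fn process(&mut self, l: &mut f32, r: &mut f32) {
        *l = l.clamp(-self.ceiling, self.ceiling);
        *r = r.clamp(-self.ceiling, self.ceiling);
    }
}

/// Runs every stage, in order, on each (left, right) pair of the quantum.
/// Only caller-provided slices are touched, so the loop never allocates.
fn process_quantum(
    in_l: &[f32],
    in_r: &[f32],
    out_l: &mut [f32],
    out_r: &mut [f32],
    chain: &mut [&mut dyn StereoStage],
) {
    let n = in_l.len().min(in_r.len()).min(out_l.len()).min(out_r.len());
    for i in 0..n {
        let (mut l, mut r) = (in_l[i], in_r[i]);
        for stage in chain.iter_mut() {
            stage.process(&mut l, &mut r);
        }
        out_l[i] = l;
        out_r[i] = r;
    }
}
```

Processing the pair together, rather than one channel at a time, is what lets a stereo-linked stage apply the same gain to both channels.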

**Routing.** WirePlumber's policy does not auto-link `pw_filter`
nodes (the `pw_filter` API has no `AUTOCONNECT` flag and WP has no
default linking heuristic for hybrid input+output nodes). The
routing engine therefore wires the filter explicitly:
`try_capture_filter_playback` matches the filter's
registry global by `node.name`, then enqueues two routes through
the existing `pending_routes` machinery: one with source = processed
and target = filter for the input legs, one with source = filter and
target = real_sink for the output legs. The
`pair_count >= 2` ordinal pairing in `apply_pending_routes`
(FL→FL, FR→FR) is exactly the per-channel mono structure above.
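
As an illustration of ordinal pairing, the sketch below zips two port lists by position. The function name `pair_ports` and the use of bare `u32` port IDs are assumptions for the example, and it presumes both lists are already sorted in channel order (FL first, then FR).

```rust
/// Pairs source and destination port IDs by position (FL to FL, FR to FR, ...).
/// `zip` stops at the shorter side, so it yields min(src, dst) pairs.
fn pair_ports(src_ports: &[u32], dst_ports: &[u32]) -> Vec<(u32, u32)> {
    src_ports
        .iter()
        .copied()
        .zip(dst_ports.iter().copied())
        .collect()
}
```

This runs on the control plane, not the audio thread, so the `Vec` allocation is harmless here.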

The filter is resolved as a routing target via
`resolve_routing_target` / `is_routing_target` helpers that check
`filter_playback_id` ahead of `sinks_by_name` — the filter is
**not** registered as a fake `Audio/Sink`, so the map stays
genuinely sink-only.

**Rebuild on rate change.** When the real sink's negotiated rate
changes (`PwCommand::RebuildFilter`), the routing engine clears
`filter_playback_id` *before* dropping the old filter so the new
filter's registry global is recaptured even if its `global_add`
races ahead of the old `global_remove`.

### 5.3 Routing

- On startup, write `default.audio.sink` in the `default` metadata to
  point at `headroom-processed` so new streams default to the
  processor. The previous value (the user's hardware sink) is
  captured as the initial `preferred_real_sink`.
- Subscribe to `pw_registry` global-added events.
- On any new node with `media.class == "Stream/Output/Audio"` and
  `node.dont-move != true`:
  - Read `application.process.binary`, `application.name`,
    `pipewire.access.portal.app_id`, `media.role`.
  - Evaluate routing rules from the active profile to decide
    `processed` vs. `bypass`.
  - Write `target.object` into the `default` metadata for the new
    stream:
    - `processed` → `headroom-processed`'s `object.serial`.
    - `bypass` → `preferred_real_sink`'s `object.serial`.
  WirePlumber honours this for any movable stream.
- Watch `default.audio.sink` metadata changes. When the user switches
  the system default to a hardware sink, the daemon:
  - records that sink as the new `preferred_real_sink`,
  - re-links `headroom-filter`'s playback stream to it,
  - rewrites `target.object` for every currently-bypassed stream so
    they follow the new hardware,
  - re-asserts `headroom-processed` as the *default for new streams*
    (so subsequent app launches still land in the processor).
- Hotplug (sink appears/disappears) goes through the same code path.
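
The per-stream decision above might look like the following sketch. It is reduced to the `process_binary` key, and two things are assumptions rather than documented behaviour: that the first matching rule wins, and that matching is an exact string comparison. The type and function names are illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Route {
    Processed,
    Bypass,
}

/// One `[[rules]]` entry, reduced to the `process_binary` matcher.
struct Rule {
    process_binary: Vec<String>,
    route: Route,
}

/// First matching rule wins (an assumption); unmatched streams fall back
/// to the profile's `[default_route]`.
fn evaluate(binary: &str, rules: &[Rule], default_route: Route) -> Route {
    rules
        .iter()
        .find(|rule| rule.process_binary.iter().any(|b| b.as_str() == binary))
        .map(|rule| rule.route)
        .unwrap_or(default_route)
}

/// Picks the `object.serial` that would be written as `target.object`.
fn target_serial(route: Route, processed_serial: u64, real_sink_serial: u64) -> u64 {
    match route {
        Route::Processed => processed_serial,
        Route::Bypass => real_sink_serial,
    }
}
```

Keeping the decision (`evaluate`) separate from the metadata write target (`target_serial`) means a default-sink change only needs the second step re-run for bypassed streams.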

### 5.4 Stream identification

| Property | Reliability | Use |
|---|---|---|
| `application.process.binary` | high (kernel-sourced) | primary key |
| `application.name` | medium | secondary / display |
| `pipewire.access.portal.app_id` | high (Flatpak only) | match sandboxed apps |
| `media.role` | low (most apps omit) | bonus signal only |
| `media.class` | structural | gate to playback streams |

---

## 6. Profiles

Profile files live in `$XDG_CONFIG_HOME/headroom/profiles/*.toml`,
shadowing shipped defaults in `/usr/share/headroom/profiles/` by
name. Profile files are user-authored configuration — they're the
thing you open in `$EDITOR`. Edits are picked up by the
`notify-debouncer-mini` file watcher (exercised in Phase 8b);
`profile.reload` re-scans on demand.

Daemon-managed user state — active profile name, per-app route
overrides made via `route.set`, dotted-key tweaks made via
`setting.set`, the global bypass flag — is *not* mixed in with the
profile TOMLs. It lives in a single `overlay.toml` at
`$XDG_STATE_HOME/headroom/overlay.toml`, written atomically by the
daemon (stage to `overlay.toml.tmp-…`, then rename). The overlay
rides on top of whichever profile is active, so `route.set obs
bypass` persists across `profile.use night` — that's a user
preference, not a tweak of `default`. If the overlay names an active
profile that's not on disk, the daemon falls back to the built-in
default and surfaces a warning; it does not refuse to start.
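
The stage-then-rename write can be sketched with the standard library alone. The temporary-file suffix used here (the process ID) is a hypothetical choice for illustration; the daemon's actual suffix is not specified above. The function name is likewise illustrative.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `contents` to a sibling temp file, then renames it over `path`.
/// On the same filesystem the rename replaces the target in one step, so a
/// reader sees either the old file or the new one, never a partial write.
fn write_atomically(path: &Path, contents: &str) -> io::Result<()> {
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;

    // Hypothetical suffix: ".tmp-<pid>".
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(format!(".tmp-{}", std::process::id()));
    let tmp_path = path.with_file_name(tmp_name);

    let mut file = File::create(&tmp_path)?;
    file.write_all(contents.as_bytes())?;
    // Flush data to disk before the rename makes it visible.
    file.sync_all()?;
    fs::rename(&tmp_path, path)
}
```

Staging next to the target, rather than in `/tmp`, keeps both paths on one filesystem, which is what makes the rename atomic.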

Each profile is a complete listening scenario. Schema (`headroom_core::profile`):

```toml
name = "default"
description = "Gentle transparent processing for everyday use."

[agc]
enabled = true
target_lufs = -18.0       # ITU-R BS.1770 integrated target
attack_ms = 2000.0
release_ms = 800.0
silence_threshold_lufs = -70.0
max_boost_db = 12.0
max_cut_db = 12.0

[compressor]
enabled = true
detector = "peak"          # "peak" | "rms"
threshold_db = -24.0
ratio = 2.5
knee_db = 6.0
attack_ms = 10.0
release_ms = 100.0
makeup_db = "auto"         # number or "auto"

[limiter]
ceiling_dbtp = -0.1
lookahead_ms = 2.0
release_ms = 80.0
hold_ms = 5.0
oversample = 4             # 1 | 2 | 4 | 8 (1 disables ISP detection)
link = "stereo"            # "stereo" | "dual-mono"

[meters]
publish_hz = 20.0

[[rules]]
match = { process_binary = ["spotify", "mpv", "ardour", "reaper", "qpwgraph"] }
route = "bypass"

[[rules]]
match = { process_binary = ["firefox", "chromium", "google-chrome", "Discord", "discord", "element-desktop", "Slack", "zoom", "WEBRTC VoiceEngine"] }
route = "processed"

[default_route]
route = "processed"        # safe default: anything unmatched is processed

# ----------------------------------------------------------------------
# Per-application level control (Layer A). Orthogonal to routing — you
# can enable per-app on bypass-routed streams to get zero-latency
# level control (e.g. tame Discord notifications without touching
# the game's audio path).
# ----------------------------------------------------------------------
[per_app]
enabled = true                # master switch; false disables Layer A entirely
default_enabled = false       # for streams not matched by any rule below

# Per-rule knobs. Matches use the same key set as [[rules]] above.
[[per_app.rules]]
match = { process_binary = ["Discord", "discord", "element-desktop", "Slack", "zoom"] }
enabled = true
peak_threshold_db = -6.0      # short-window peak above this triggers cut
rms_target_db = -20.0         # long-term RMS target (slow path)
max_cut_db = 12.0             # never cut more than this
peak_attack_ms = 5.0
peak_release_ms = 500.0
rms_window_ms = 1500.0
# Controller-side knobs (all optional; defaults shown).
smoother_ms = 30.0            # anti-bounce smoother on max(peak,rms)
write_db_threshold = 0.5      # dB diff below which we don't fire a write
min_write_interval_ms = 100.0 # min ms between writes per stream (10 Hz cap)
defer_to_user = "ceiling"     # "ceiling" | "strict"

[[per_app.rules]]
match = { process_binary = ["firefox", "chromium", "google-chrome"] }
enabled = true
peak_threshold_db = -3.0      # browsers run hotter; raise the trigger
rms_target_db = -18.0

# Music, DAWs, games default to per-app off — they're either trusted
# to set their own level or routed bypass for a reason.
[[per_app.rules]]
match = { process_binary = ["spotify", "mpv", "ardour", "reaper", "qpwgraph", "carla"] }
enabled = false
```
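
To make the two controller-side write knobs concrete, here is a minimal sketch of a gate driven by `write_db_threshold` and `min_write_interval_ms`. The `WriteGate` type and its `offer` method are hypothetical names; the sketch assumes a write is suppressed when either condition holds, and that the very first value is always written.

```rust
/// Hypothetical write gate: suppresses volume writes that are too small
/// or too frequent.
struct WriteGate {
    db_threshold: f32,    // maps to `write_db_threshold`
    min_interval_ms: f64, // maps to `min_write_interval_ms`
    last_written_db: Option<f32>,
    last_write_ms: f64,
}

impl WriteGate {
    fn new(db_threshold: f32, min_interval_ms: f64) -> Self {
        WriteGate {
            db_threshold,
            min_interval_ms,
            last_written_db: None,
            last_write_ms: 0.0,
        }
    }

    /// Returns `Some(db)` when a write should fire at `now_ms`, else `None`.
    fn offer(&mut self, now_ms: f64, reduction_db: f32) -> Option<f32> {
        if let Some(prev) = self.last_written_db {
            let too_small = (reduction_db - prev).abs() < self.db_threshold;
            let too_soon = now_ms - self.last_write_ms < self.min_interval_ms;
            if too_small || too_soon {
                return None;
            }
        }
        self.last_written_db = Some(reduction_db);
        self.last_write_ms = now_ms;
        Some(reduction_db)
    }
}
```

With the defaults shown in the profile (0.5 dB, 100 ms), a stream can receive at most ten writes per second, and none at all while the smoothed value hovers within half a decibel of the last write.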

### Shipped profiles

| name | one-liner |
|---|---|
| `default` | Gentle transparent processing, sensible for daily use. |
| `night` | Aggressive: −20 LUFS, 4:1, fast release, narrow dynamic range. |
| `speech` | VoIP-focused; short attack, fast release, controlled dynamic range. |
| `transparent` | Limiter only. Compressor + AGC bypassed. Safety net only. |
| `bypass-all` | Routes everything directly to the real sink. The kill switch. |
| `spike-protection` | Minimal processing; high-threshold catch only. Untouched audio, hard guard against blasts. |
| `movie` | Wide-DR film: lifts dialogue, keeps action punchy but bounded. |
| `music` | Inter-track loudness leveling; routes music players *through* the bus. |
| `podcast` | Spoken-word playback: even narration loudness, smooth and unfatiguing. |
| `commute` | Listening in noise: heavy normalization + boost, kept loud. |
| `gaming` | Latency-first: games bypass, voice chat processed, notifications tamed per-app. |
| `party` | Loud room playback (anti-`night`): maximum loudness, dynamics sacrificed. |
| `broadcast-14` | Normalizes everything to −14 LUFS (streaming loudness) so sources match. |
| `quiet-hours` | More aggressive than `night`: very low ceiling, near-flat dynamics. |

The limiter section of `bypass-all` is irrelevant in practice (nothing
flows through `headroom-processed`), but its ceiling field is still
respected as a fail-safe in case a stream lands on the processed sink
anyway.

---

## 7. IPC

Transport: Unix-domain socket, `SOCK_STREAM`, `0600`, at
`$XDG_RUNTIME_DIR/headroom/control.sock`.

Wire protocol: **see `IPC.md`** for the full normative schema.
Summary: u32 BE length prefix + UTF-8 JSON payload. Three message
shapes — `Request` (id + op + args), `Response` (id + result|error),
`Event` (topic + data). Subscribers signal interest by topic; events
fan out to all subscribers with bounded per-subscriber queues. Slow
subscribers have events **dropped**; the count of dropped events is
itself published on the `daemon` topic so clients know they fell behind.
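
The framing summarised above (u32 big-endian length, then a UTF-8 JSON payload) can be sketched over any `Read`/`Write` pair. `IPC.md` remains the normative reference; the function names and the `MAX_FRAME_LEN` cap below are assumptions for illustration only.

```rust
use std::io::{self, Read, Write};

/// Hypothetical upper bound on a single frame, to reject corrupt prefixes.
const MAX_FRAME_LEN: u32 = 1 << 20;

/// Writes one frame: 4-byte big-endian length, then the JSON bytes.
fn write_frame<W: Write>(w: &mut W, json: &str) -> io::Result<()> {
    if json.len() > MAX_FRAME_LEN as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    let len = json.len() as u32;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(json.as_bytes())
}

/// Reads one frame and returns its payload as a UTF-8 string.
fn read_frame<R: Read>(r: &mut R) -> io::Result<String> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf);
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds cap"));
    }
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;
    String::from_utf8(payload).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Because both functions are generic over `Read` and `Write`, the same code works against a `UnixStream` in a client and against an in-memory buffer in tests.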

The first-party Rust wrapper is `headroom-client`, mirroring how
[`niri-ipc`](https://github.com/YaLTeR/niri/tree/main/niri-ipc) wraps
Niri's socket: a thin, no-magic crate that re-exports the wire types
from `headroom-ipc` and adds a blocking `Client` (and an optional async
`AsyncClient` behind a feature flag).

---

## 8. CLI

```
headroom status                              # current profile, sinks, levels
headroom daemon                              # run the daemon (systemd Type=simple)
headroom profile list | use <name> | show [name]
headroom route list
headroom route set   <app> processed|bypass  # persists in overlay.toml
headroom route unset <app>
headroom route stream <node-id> processed|bypass    # ad-hoc
headroom set <key> <value>                   # tweak active profile in place
headroom get <key>
headroom bypass on|off                       # global kill switch
headroom reload                              # reload profiles from disk
headroom monitor                             # live meter TUI (uses subscribe)
```

CLI is sync, blocks on `UnixStream`. Talks the same JSON wire as any
other client.

---

## 9. Crates

```
headroom/
├── flake.nix                  # devshell + package
├── Cargo.toml                 # workspace
├── PLAN.md                    # this file
├── IPC.md                     # wire-protocol schema (normative)
├── README.md
└── crates/
    ├── headroom-dsp/            # AGC + compressor + limiter (pure DSP, no PW)
    ├── headroom-ipc/            # wire types, framing, serde; no I/O
    ├── headroom-client/         # blocking client (+ optional async); thin
    ├── headroom-core/           # daemon: PW integration, routing, profiles, IPC server
    └── headroom-cli/            # `headroom` binary; depends on headroom-client
```

### External crates (final v0 dep list)

**Audio / DSP**
- `pipewire`, `libspa` — official PipeWire bindings.
- `ebur128` — measurement.
- `rtrb` — SPSC ring buffer (audio ↔ control).
- `basedrop` — RT-safe shared ownership.
- `assert_no_alloc` — debug-build tripwire.

**Plumbing**
- `serde`, `serde_json` — IPC + profile (de)serialization.
- `toml` — profile files (serde-based).
- `clap` (derive) — CLI.
- `tracing`, `tracing-subscriber`, `tracing-journald` — logs.
- `notify`, `notify-debouncer-mini` — profile hot-reload.
- `crossbeam-channel` — control-plane channels.
- `parking_lot` — mutexes.
- `signal-hook` — clean shutdown.
- `thiserror` — error types.

No `tokio`, no `zbus`, no `dbus-*`.

---

## 10. Nix

`flake.nix` ships:

- A **devshell** with rust toolchain (via `rust-overlay` for pinned
  channel; default to a stable release pinned in
  `rust-toolchain.toml`), `pkg-config`, `pipewire`'s dev outputs,
  `clang` (for bindgen if invoked by deps), `socat` (handy for poking
  the IPC), `jq`.
- A **package** output (`packages.<system>.default`) that builds the
  daemon + CLI with `rustPlatform.buildRustPackage`. v0 uses
  `cargoLock.lockFile`. Crane can come later if incremental builds in
  CI become a bottleneck.
- A `nixosModules.default` placeholder so packagers can wire the user
  unit later. Not implemented in v0 of the flake itself.

Intermediate dev work uses plain `cargo` inside `nix develop`. Final
builds and any CI go through `nix build`.

---

## 11. Phased implementation

The phases are roughly token-of-work units, not calendar weeks. **All
planned phases (0–8) are done as of 2026-05-21**; this section is
preserved as historical context + a reading guide to the commit log.
See [[headroom-project]] in team memory for the per-commit ledger.

**Phase 0 — scaffolding.** Flake, workspace, crate skeletons, README,
PLAN/IPC docs. *(done as part of this commit)*

**Phase 1 — IPC + client.** `headroom-ipc` (types, framing, codec) and
`headroom-client` (blocking `Client`) implemented against the schema in
`IPC.md`. Round-trip tests, fuzz the codec. *(this commit)*

**Phase 2 — DSP kernels.** `headroom-dsp` with limiter, compressor, AGC,
oversampler, envelope. Tested in isolation against synthesized
signals; limiter validated to hold a −0.1 dBTP ceiling on EBU TECH
3341 generators. *(this commit: limiter first)*

**Phase 3 — daemon core.** `headroom-core` brings up the
`headroom-processed` virtual sink, the bus filter (originally a
`pw_stream` pair + SPSC ring; rewritten to a single `pw_filter`
node on 2026-05-22; see PW gotchas #14, #17, #18 and the
`pipewire-filter` workspace crate),
the `preferred_real_sink` tracker, the registry subscriber, and the
routing engine. Hardcoded profile, no IPC server yet.

**Phase 4 — IPC server + profile manager.** Wire `headroom-core` to the
IPC schema. Profile loading + hot-reload. Slow AGC loop ticking on
real loudness measurements.

Sub-stages used in commits / TODOs:

- **4a–4d** — Unix socket server, op dispatch, mutating ops, event
  broadcaster.
- **4e** — `ProfileStore`: shipped + user profiles, atomic reload,
  user overlay at `$XDG_STATE_HOME/headroom/overlay.toml`. `profile.use`,
  `profile.reload`, `setting.set`, `route.set` all dispatch through it.
- **4f** — DSP parameter propagation: `setting.set` reaches the running
  filter via the `rtrb` control queue, so live profile/setting edits
  take effect without restart.
- **4h** — `preferred_real_sink` tracking: subscribe to
  `default.audio.sink`, snapshot the prior default, promote
  `headroom-processed`, retarget every bypassed stream on
  default-sink change, on hotplug, and on Bluetooth handoff. Also
  pins the filter's playback to the tracked real sink so processed
  audio follows when the user switches default, and resolves the
  real sink's node id from the registry for `status` reporting.
- **4i** — `route.stream <node-id> processed|bypass`: ad-hoc per-stream
  override that doesn't write a profile rule. Crosses the
  IPC-thread → PipeWire-thread boundary via a `crossbeam` channel
  drained by a 50 ms timer source on the main loop. State updates
  synchronously; metadata write follows ≤ ~50 ms later.

- **Slow AGC loop** — wraps up Phase 4. Audio-thread `AgcGain` stage
  sits at the head of the DSP chain (anti-zipper smoother around a
  per-sample multiplier). Filter pushes *pre-AGC* input samples into a
  dedicated measurement ring. An `AgcController` on the PipeWire main
  loop ticks at 50 ms: drains the ring into `ebur128` (Mode S | M |
  TRUE_PEAK), reads `[agc]` config from the active profile, computes
  `target_lufs − short_term_lufs` clamped to `[-max_cut_db,
  +max_boost_db]`, gates below `silence_threshold_lufs`, slow-smooths
  via leaky integrator, and pushes the result through `FilterControl`
  on the same `rtrb` channel `setting.set` uses.
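
The per-tick arithmetic of the slow AGC loop can be sketched as below. The field names mirror the `[agc]` profile keys; the function names are hypothetical. Two things are assumptions: that a gated tick returns no new target (so the caller holds the current gain), and that the leaky integrator uses a single exponential time constant passed in by the caller.

```rust
/// Subset of the `[agc]` profile section used by the controller.
struct AgcConfig {
    target_lufs: f64,
    silence_threshold_lufs: f64,
    max_boost_db: f64,
    max_cut_db: f64,
}

/// Desired gain in dB for one tick, or `None` when the input is below the
/// silence gate (assumption: the caller then keeps its current gain).
fn desired_gain_db(cfg: &AgcConfig, short_term_lufs: f64) -> Option<f64> {
    if short_term_lufs < cfg.silence_threshold_lufs {
        return None;
    }
    let error_db = cfg.target_lufs - short_term_lufs;
    Some(error_db.clamp(-cfg.max_cut_db, cfg.max_boost_db))
}

/// One leaky-integrator step toward `desired_db`.
/// `tick_ms` is the controller period (50 ms above); `time_constant_ms`
/// would come from the profile's attack or release setting.
fn smooth_gain_db(current_db: f64, desired_db: f64, tick_ms: f64, time_constant_ms: f64) -> f64 {
    let alpha = 1.0 - (-tick_ms / time_constant_ms).exp();
    current_db + alpha * (desired_db - current_db)
}
```

With a 50 ms tick and a 2000 ms time constant, each step closes about 2.5% of the remaining gap, which is why the loop is described as slow.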

### Tracked follow-ups (carried past their sub-stage)

Items deliberately deferred from earlier sub-stages so they don't get
lost. Pick up by name when the trigger that gates them fires.

- **Ephemeral overlay mutations.** *(4e follow-up.)* All `route.set`
  / `setting.set` changes are persisted to `overlay.toml`. A
  `--ephemeral` flag (or `--volatile`) on the CLI for one-shot tweaks
  that don't outlive the daemon was considered and dropped from v0
  for simplicity. Revisit if real users ask for it; the store-level
  change is a flag on the setter methods. **Dormant** — no user has
  asked through Phase 8.
- **Filter rate matching to the real sink.** *(F5 follow-up.)* §3.1
  documents the contract leak when the real sink runs at a
  non-48 kHz native rate. Closing it requires dynamic
  `FILTER_SAMPLE_RATE`, kernel rebuild on real-sink change
  (compressor + limiter coefficients are rate-dependent), and
  Layer A's `LAYER_A_BLOCK_DT_S` constant becoming dynamic too.
  Gated on a multi-rate hardware test bench — no point shipping
  the refactor without something to validate it against. **v1 scope.**
- ~~**Bus filter is two `pw_stream`s + an SPSC ring → per-quantum
  tremolo on shared-driver topologies.**~~ **Closed 2026-05-22 by
  rewrite to a single `pw_filter` node** (new in-house
  `pipewire-filter` workspace crate holding the unsafe FFI; one
  process callback with input→DSP→output ordering by construction;
  capture↔playback ring deleted entirely). The first soak showed
  that WP doesn't auto-link `pw_filter`, so the filter was
  restructured to 4 mono ports (canonical `module-filter-chain`
  shape) and the routing engine extended to wire it explicitly via
  `link-factory`. See §5.2 above and `pipewire-gotchas` #14/#17/#18.
- ~~**Filter playback BUSY spikes (periodic, ~10 s cadence).**~~
  **Closed in 8e (`d52cd6d`).** The instrumentation added by 8e
  did not reproduce the ~8×-baseline outlier pattern in a ~3 min
  release-build capture; steady state was ~2.2 ms / call at this
  hardware's quantum with max growing only to 1.3× baseline.
  `PlaybackTiming` stays so future regressions surface at WARN.
  Original observation may have been a transient WP/PW housekeeping
  artefact under a different config; no actionable code change.
- **Sub-millisecond dispatch primitive for spike-reactive writes.**
  *(Phase 6 optimisation, downgraded from prerequisite.)* The 4i
  `PwCommand` channel uses a 50 ms polling timer, fine for
  `route.stream` and slow AGC. Layer A's per-app
  `Props.channelVolumes` writes were originally feared to need a
  sub-ms wake primitive. After 6a/6b benches landed (see
  Phase 6 below) we re-evaluated: with a 5 ms polling timer and a 21 ms
  PipeWire quantum, the worst-case detection-to-write latency stays
  well inside one quantum, which is what PLAN §4.5 actually
  promises. Polling reuses existing infrastructure and is cheap
  (controller tick is ~30 ns; even at 200 Hz it's lost in the
  noise). The tighter primitive — `EventSource::signal` with an
  `unsafe impl Send` shim around `spa_loop_utils.signal_event`, or a
  pipe + `IoSource` — stays on the table as an optimisation if
  manual testing shows audible spike-leak artefacts. `pw::command`
  module docs still carry the constraint warning for future variants
  that might be tempted to share the 50 ms timer.

**Phase 5 — CLI + monitor TUI.** `headroom-cli` implements all the
subcommands above, plus a `monitor` TUI built on the meters
subscription.

**Phase 6 — Per-application level control (Layer A).** Per-managed-stream
tap creation, `AppLevelController` with peak + RMS envelopes,
`Props.channelVolumes` writer, user-volume deference logic,
`[per_app]` profile parsing, `headroom per-app …` CLI verbs, and a
per-stream meter event on the IPC. Land after the bus path is stable
so we have a baseline to compare against.

Sub-stages:

- **6a** — Pure DSP. `headroom_dsp::LevelEnvelopes`: two-tier (peak
  + RMS) block-rate detector, `max(peak_reduction, rms_reduction)`
  combined, clamped to `max_cut_db`. Allocation-free,
  block-rate-driven (audio thread emits one `(peak, mean_sq)` pair
  per quantum).
- **6b** — Daemon-side glue.
  `headroom_core::app_level::AppLevelController`: rule snapshot,
  envelopes, 30 ms anti-bounce smoother, 0.5 dB / 100 ms write
  gate, ceiling vs strict deference state.
  `app_level::evaluate` matches `[[per_app.rules]]` against
  `PwNodeInfo` using the same matcher the routing engine uses.
- **6c** — PipeWire tap + audio-thread analysis. **Mechanism**:
  per managed stream we create our own `pw_stream` (Direction::Input,
  F32LE stereo, rate left unspecified to negotiate with the source,
  `AUTOCONNECT` off, `NODE_DONT_RECONNECT`, `node.dont-move`),
  `connect()` with no target, `set_active(true)`. PipeWire creates
  our input ports from the declared format. We then build **explicit
  passive port-level links** via `link-factory` with
  `link.output.port` / `link.input.port` set to the source's and
  tap's port global IDs respectively, plus `link.passive = true`.
  **Why not `target.object` or `target_id`**: empirically (6c manual
  smoke) WirePlumber's policy refuses to wire `Stream/Output →
  Stream/Input` via any session-manager-mediated path — it logs no
  error, just doesn't act. The stream-level target was getting set
  on the node (`node.target = <source-id>`) but no link ever
  appeared. Going through `link-factory` with explicit port IDs
  bypasses the session manager entirely and uses PipeWire core
  directly. **Per managed stream**: one `pw_stream`, two `Link`
  proxies (one per channel), one `MeasurementSample` `rtrb`
  (capacity 64). Audio-thread `process` runs `peak = max(|x|)` and
  `mean_sq = Σx²/N` over the block, pushes one sample to the ring.
  **Lifecycle**: registry watcher sees a `Stream/Output/Audio`
  matching a `per_app` rule → spawn tap (ports come up
  asynchronously) → the Layer A drain timer (6d) retries link
  creation each tick until both port sets are visible on the
  registry → links built, stream transitions to `Streaming`,
  samples flow. On registry `global_remove` of the source, drop the
  `ManagedStream`; declaration order severs links first, then the
  tap stream + listener.
- **6d** — `Props.channelVolumes` writes + controller drain timer.
  A polling timer source on the PipeWire main loop ticks every 5 ms
  (200 Hz, CPU cost ≪ 0.1% of one core per the benches), iterates
  active controllers, drains each measurement ring, calls
  `process_block`, and on a `Some` return writes
  `Props.channelVolumes` via the bound `default` metadata
  (subject = source node id). The 5 ms tick guarantees a spike
  detected at quantum boundary `N` is written before quantum `N+1`
  starts on typical 21 ms quanta — see §4.5 reaction-time honesty
  table.
- **6e** — User-volume deference + per-stream meter events.
  Subscribe to `Props` param-change events on each managed stream.
  Distinguish daemon writes from external by comparing against
  `last_written_lin` (within 1e-4) — external changes apply
  ceiling-mode or strict-mode deference per the matched rule's
  `defer_to_user` field. Per-stream meters publish on the `meters`
  topic with the smoothed reduction, the peak/RMS envelope values,
  and the current applied `channelVolumes`.
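
The per-block analysis from 6c (`peak = max(|x|)`, `mean_sq = Σx²/N`) is small enough to show in full. This is a sketch under assumed names (`scan_block`, `peak_to_db`, `mean_sq_to_db`); the dB helpers and their floor value are illustrative additions, not part of the described tap.

```rust
/// Returns `(peak, mean_sq)` for one block of samples.
/// A single pass, no allocation: suitable for an audio-thread callback.
fn scan_block(samples: &[f32]) -> (f32, f32) {
    if samples.is_empty() {
        return (0.0, 0.0);
    }
    let mut peak = 0.0f32;
    let mut sum_sq = 0.0f32;
    for &x in samples {
        peak = peak.max(x.abs());
        sum_sq += x * x;
    }
    (peak, sum_sq / samples.len() as f32)
}

/// Converts a linear peak to dBFS (floored to avoid log of zero).
fn peak_to_db(peak: f32) -> f32 {
    20.0 * peak.max(1e-10).log10()
}

/// Converts a mean-square value to an RMS level in dBFS.
/// Uses 10*log10 because mean_sq is already a squared quantity.
fn mean_sq_to_db(mean_sq: f32) -> f32 {
    10.0 * mean_sq.max(1e-20).log10()
}
```

Pushing the squared mean rather than the RMS keeps the square root and logarithm off the audio thread; the daemon side can convert when it needs a dB value.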

**Validated cost budget (criterion microbenches, run 2026-05).**
PLAN §4.7 budgeted "~10 μs/quantum audio thread, few μs/measurement
daemon thread." Reality on this hardware:

| Bench | Time |
|---|---|
| Audio-thread peak + mean_sq scan, 1024-frame stereo block | 1.33 μs |
| `LevelEnvelopes::process_block` (daemon) | 18 ns |
| `AppLevelController::process_block` hot signal | 29 ns |
| `AppLevelController::process_block` quiet signal | 22 ns |

5 managed streams: audio thread ≈ 6.6 μs/quantum (0.03% of one
core at 21 ms quanta); daemon ≈ 145 ns/quantum. ~7-10× under the
PLAN budget, so the design has room for many more managed streams,
or for adding ebur128 / TRUE_PEAK to Layer A later if useful.
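
The five-stream figures follow from the bench table by simple scaling. The sketch below redoes that arithmetic; the 48 kHz rate is an assumption (it matches the 1024-frame block and the 21 ms quantum quoted above), and the function names are illustrative.

```rust
const SCAN_US_PER_STREAM: f64 = 1.33; // audio-thread scan, from the table
const CONTROLLER_NS_PER_STREAM: f64 = 29.0; // hot-signal controller, from the table
const FRAMES_PER_QUANTUM: f64 = 1024.0;
const SAMPLE_RATE_HZ: f64 = 48_000.0; // assumption

/// Quantum duration in milliseconds (about 21.3 ms for 1024 frames at 48 kHz).
fn quantum_ms() -> f64 {
    FRAMES_PER_QUANTUM / SAMPLE_RATE_HZ * 1000.0
}

/// Total audio-thread scan time per quantum, in microseconds.
fn audio_thread_us(streams: u32) -> f64 {
    SCAN_US_PER_STREAM * f64::from(streams)
}

/// Audio-thread scan time as a percentage of one core.
fn audio_thread_core_percent(streams: u32) -> f64 {
    audio_thread_us(streams) / (quantum_ms() * 1000.0) * 100.0
}

/// Total daemon-side controller time per quantum, in nanoseconds.
fn daemon_ns(streams: u32) -> f64 {
    CONTROLLER_NS_PER_STREAM * f64::from(streams)
}
```

For five streams this gives roughly 6.65 μs and 0.03% of one core on the audio thread, and 145 ns on the daemon side, in line with the numbers quoted above.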

**Manual latency validation (post-6c implementation).** PipeWire
scheduling can't be benched from Rust alone. Use:

- **`pw-top`** — note the source-node `QUANT` and any WAIT/BUSY or
  delay column before attaching the tap; attach Layer A; confirm
  the source-node numbers don't change. The tap appears as a new
  row with its own quantum; the test is whether the *app's* numbers
  degrade.
- **`qpwgraph`** / **`helvum`** — visually confirm the source node
  has two outgoing links (one to its original destination, one to
  our tap), both terminating correctly.
- **Ear** — connect/disconnect the tap on live audio. Crackles or
  dropouts on attach indicate the §4.1 sibling-fanout claim doesn't
  hold and the design needs revisiting.

If those three say "fine," the §4.1 promise is upheld in practice
and 6c is acceptance-tested. `jack_iodelay` and other true-round-trip
tools are overkill.

**Phase 7 — Packaging.** *Done — `c65c75b`.* `contrib/systemd/headroom.service`
(user-scope, Type=simple, After=pipewire.service, Restart=on-failure,
journald, LimitRTPRIO=20). The package's `postInstall` substitutes
the unit's `@bindir@` placeholder with an absolute store path and
copies `profiles/*.toml` to `share/headroom/profiles/`. Two Nix
modules: `nixosModules.default` (`programs.headroom.enable` —
binary on global PATH + `systemd.packages` for `systemctl --user`
discovery + hard assertion on `services.pipewire.enable`) and
`homeModules.default` (`services.headroom.enable` — symlinks
shipped profiles into `$XDG_CONFIG_HOME/headroom/profiles/`,
`extraProfiles` attrset for per-user overrides, writes the systemd
user unit). README rewritten with install + usage sections.

**Phase 8 — Hardening.** *Done — `9220143` + `d52cd6d` + verification.*
- **8a — `assert_no_alloc` on audio-thread callbacks (`9220143`).**
  `#[global_allocator] AllocDisabler` in `headroom-cli/src/main.rs`
  behind `cfg(debug_assertions)` (release strips it via the crate's
  default `disable_release`). The three RT callbacks
  (`capture_process`, `playback_process`, `tap_process`) wrap their
  body in `assert_no_alloc(|| inner(...))`. Verified by a deliberate
  `Vec::with_capacity` injection → SIGABRT on first audio callback;
  reverted before commit. Audio thread proven alloc-free under
  multi-thousand-callback live load.
- **8b — live profile-reload under signal flow (verification only).**
  Edit `$XDG_CONFIG_HOME/headroom/profiles/<active>.toml` while a
  sine plays: notify-debouncer-mini fires, `ProfileStore::reload`
  runs, `setting.set` propagates via `FilterControl`'s rtrb to the
  audio thread. Compressor GR went 0 → −9.3 dB ≈ 1 s after edit
  and back to 0 after restore; 180 meter ticks over 9 s with max
  inter-tick gap = exact 50.0 ms (the AGC period). No glitches.
- **8c — sink hotplug / default-sink change (verification only).**
  `wpctl set-default <other-sink>` while daemon runs:
  `on_metadata_property` fires, `adopt_new_real_sink` runs,
  filter.playback re-pinned via 4k explicit-link enforcement,
  `routing/real_sink_changed` emitted on the wire. Bounces back
  cleanly.
- **8d — multi-rate hardware (partial / deferred).** Filter is
  hardcoded F32 stereo @ 48 kHz; PipeWire's link layer inserts a
  resampler at the filter.playback → real-sink edge when rates
  differ; bus DSP stays at 48 kHz internally. Architecture is
  sound; real-hardware validation (USB DAC at 96k etc.) deferred
  until available.
- **8e — playback callback timing instrumentation (`d52cd6d`).**
  Lock-free `PlaybackTiming` atomics in `meters.rs`; AGC controller
  drains once per second and logs at WARN above
  `SPIKE_THRESHOLD_US = 5000`. The original ~10 s-cadence ~8×
  spike pattern from §11 follow-ups *did not reproduce* in a ~3 min
  release-build capture; steady state 2.2 ms / call at ~4 Hz,
  max climbed to only 1.3× baseline. Instrumentation kept so
  future regressions surface.
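
A lock-free timing accumulator of the kind 8e describes could look like the sketch below. Only `PlaybackTiming` and `SPIKE_THRESHOLD_US = 5000` come from the text above; the fields, `TimingSnapshot`, and the method names are hypothetical. The three counters are drained independently, so a callback landing mid-drain may be split across two snapshots, which is acceptable for diagnostics.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const SPIKE_THRESHOLD_US: u64 = 5000;

/// Written by the audio callback, drained by a slower control-side loop.
#[derive(Default)]
struct PlaybackTiming {
    calls: AtomicU64,
    total_us: AtomicU64,
    max_us: AtomicU64,
}

struct TimingSnapshot {
    calls: u64,
    mean_us: u64,
    max_us: u64,
}

impl PlaybackTiming {
    /// Audio-thread side: three relaxed atomic updates, no locks, no allocation.
    fn record(&self, elapsed_us: u64) {
        self.calls.fetch_add(1, Ordering::Relaxed);
        self.total_us.fetch_add(elapsed_us, Ordering::Relaxed);
        self.max_us.fetch_max(elapsed_us, Ordering::Relaxed);
    }

    /// Control side: reads and resets the counters.
    fn drain(&self) -> TimingSnapshot {
        let calls = self.calls.swap(0, Ordering::Relaxed);
        let total = self.total_us.swap(0, Ordering::Relaxed);
        let max_us = self.max_us.swap(0, Ordering::Relaxed);
        let mean_us = if calls == 0 { 0 } else { total / calls };
        TimingSnapshot { calls, mean_us, max_us }
    }
}

impl TimingSnapshot {
    /// True when the worst callback in this window exceeded the threshold.
    fn is_spike(&self) -> bool {
        self.max_us > SPIKE_THRESHOLD_US
    }
}
```

The control side would call `drain` once per second and log at WARN when `is_spike` returns true.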

---

## 12. Risks & open questions

These are the original v0 design risks — still useful as a checklist
for new contributors. Phase 4k/4l/8c have exercised the routing /
hotplug / default-sink branches; the bullets below are unchanged
since several of them remain live concerns for non-NixOS distros
and multi-rate hardware. See [[headroom-project]] in team memory
for current status per risk.

- **WirePlumber re-linking on device hotplug.** When a Bluetooth
  headset connects, WP re-evaluates linking. Headroom must re-pin its
  routed streams. Tractable; the registry events surface this.
- **Latency budget.** Processed path: one quantum hop (the filter)
  plus lookahead (~2 ms) plus 4× oversampling buffering ≈ 8–15 ms
  added to processed-path latency. Fine for video/voice. Bypass path:
  **zero added latency** — the stream rides the real sink directly.
- **Default-sink changes.** When the user switches the system default
  to a hardware sink, the daemon adopts it as `preferred_real_sink`,
  re-links the filter's playback, retargets bypassed streams, and
  re-asserts `headroom-processed` as the default for new streams.
  Watching `default.audio.sink` in the metadata is the trigger.
- **Sample-rate mismatch.** `headroom-processed`, the filter, and the
  real sink must agree, or PipeWire resamples behind our back. The
  filter should source its rate from the real sink and convert on the
  capture side only.
- **Surround content downmix vs. passthrough.** v0 punts: anything
  `>2ch` is force-bypassed regardless of profile rule. The bus
  filter is F32 stereo by construction and pulling a 5.1+ stream
  into it would either drop the centre/LFE/surround channels (with
  explicit links pairing only the first two ports) or run our DSP
  on a downmix that wasn't asked for. The check fires in
  `routing::evaluate` based on `PwNodeInfo.audio_channels` (parsed
  from the stream's `audio.channels` property). The explicit-link
  pairing in `apply_pending_routes` was generalised from `take(2)`
  to `take(min(src, dst))` so wide bypass to a wide real sink links
  all channels; narrower sinks let PipeWire's source-side adapter
  handle downmix as usual.

---

## 13. License

GPL-3.0-or-later for the daemon and CLI. `headroom-dsp` and `headroom-ipc`
are MPL-2.0 so third-party clients and plugin hosts can link them
without GPL contagion. (Re-evaluate when LSP-derived code is
introduced; current plan does not pull any.)