Phase 3 — bring up the daemon end-to-end through six checkpoints:
3a Module skeleton (error, profile, routing, runtime, pw/*)
3b Pure routing engine + 13 tests (no PipeWire dep)
3c PwContext: main loop, sigprocmask-block SIGTERM/SIGINT before
add_signal_local so signalfd actually picks them up
3d headroom-processed virtual sink via the adapter factory with
factory.name=support.null-audio-sink
3e Filter: two pw_streams (capture from monitor / playback to real
sink) with an rtrb SPSC ring between them. DSP chain
(Compressor → two-tier Limiter) runs in the playback callback.
Allocation-free; #![forbid(unsafe_code)] preserved via
bytemuck::try_cast_slice for the byte↔f32 reinterpretation.
3f Registry watcher binds the default metadata, evaluates new
Stream/Output/Audio nodes against profile rules, writes
target.object for processed routes. Self-stream guard skips
anything whose node.name starts with 'headroom-filter'.
Workspace deps added: pipewire = { features = ["v0_3_44"] } for the
modern TARGET_OBJECT key, libspa, rtrb, nix (sigprocmask), bytemuck.
Tests: 65 passing (28 dsp, 20 ipc, 4 client, 13 core). Clippy clean
at default level under -D warnings.
PLAN.md §5 renumbered to fix stale subsection labels (was 4.1–4.4
from before the per-app insertion).
Known limitations punted to Phase 4 (documented in commit history
and team memory):
- WirePlumber doesn't always honor late target.object writes once
a stream is already linked (timing race).
- preferred_real_sink dynamic tracking stubbed.
- No auto-promote of headroom-processed to system default.
- application.process.binary occasionally arrives in late metadata
updates after the global registers; routing logs show '?' until
we add a re-read.
Headroom
A Rust AGC + compressor + true-peak limiter for PipeWire. Per-application exclusion, profile-based presets, single-binary daemon, scriptable over a Unix-domain socket.
This document is the canonical plan. It supersedes the earlier conversational sketch.
1. Goals & non-goals
Goals
- Hard safety net. Output is guaranteed to stay below a configurable ceiling (default −0.1 dBTP) with proper inter-sample peak handling. This guarantee survives daemon misbehaviour, profile reloads, and bad routing decisions — it is enforced inline in the audio path.
- Per-application exclusion. Music players, games, and DAWs route around the processor; browsers, voice chat, and "everything else" go through it. Rules are app-level and live in profiles.
- Drop-in defaults. First-run experience: install, enable user service, done. No mandatory config. Power users edit TOML or use the CLI.
- Profiles for distinct listening scenarios (default / night / speech / transparent / bypass-all).
- Single binary. Daemon, filter, routing, and control loop all live in one process. The DSP kernels are a separate crate so they can be reused (LV2/standalone) later.
- Scriptable. Unix-domain-socket IPC with a documented JSON schema so anyone can write an alternative client (Qt/QuickShell panel, Eww widget, scripts). A first-party Rust crate (headroom-ipc) wraps it.
- Rust, lean dep tree. No NIH where mature crates exist, no bloat where they don't.
Non-goals (v0)
- Surround / >2-channel content. v0 is stereo only; >2ch is routed directly to the real sink, untouched by Headroom's filter chain.
- LV2/CLAP plugin distribution. The DSP crate is plugin-shaped so this is cheap to add later, but it's not a v0 deliverable.
- GUI. Third parties can build one against the IPC.
- Capture-side processing (microphone). v0 is playback only.
2. Architecture
Each app's audio takes one of four end-to-end paths, chosen by two orthogonal profile flags: a routing decision (processed vs. bypass) and a per-app level-control flag (on vs. off).
┌─── optional, opt-in per app (Layer A) ────────────────┐
│ │
│ ┌─► passive tap ─► peak + RMS ─► AppLevelController │
│ │ (sibling link in same quantum) │ │
│ │ │ │
│ │ Props.channelVolumes write ◄──────┘ │
│ │ │
└───┼───────────────────────────────────────────────────┘
│
│ APP STREAM NODE
│ ┌──────────────────────────┐
│ │ raw output │
app's audio ───►├──►│ × channelVolumes │──► output port
│ └──────────────────────────┘
│ │
└────────────────────────────────────────────│
│
routing decision (Layer B) │
target.object set by daemon │
│
┌─────────────────────────────────────────┴┐
▼ ▼
route = "bypass" route = "processed"
target.object = target.object =
preferred_real_sink headroom-processed
│ │
│ ▼
│ ┌─────────────────────┐
│ │ headroom-processed │
│ │ (virtual sink, the │
│ │ system default) │
│ └─────────┬───────────┘
│ ▼
│ ┌─────────────────────┐
│ │ headroom-filter │
│ │ (pw_stream pair) │
│ Layer C (bus DSP) │ AGC → compressor │
│ │ → soft → hard │
│ └─────────┬───────────┘
│ │
▼ ▼
preferred_real_sink ◄──────────────────────► (DAC)
The four end-to-end paths
| | Routing = bypass | Routing = processed |
|---|---|---|
| per-app off | ① true bypass — Headroom touches nothing on the signal path. Same latency as if Headroom weren't installed. | ③ bus DSP only — stream flows through headroom-processed and the inline chain. channelVolumes left at whatever the user/app set. |
| per-app on | ② per-app only — level-reactive channelVolumes writes, no graph hop. Zero added signal-path latency. | ④ full stack — per-app level control and bus DSP. Maximum protection. |
Path-by-path properties:
| Path | Signal-path latency added | Limiter contract? | Per-app gain ride? |
|---|---|---|---|
| ① bypass / per-app off | 0 | no | no |
| ② bypass / per-app on | 0 | no | yes (Layer A) |
| ③ processed / per-app off | filter hop + ~2 ms lookahead | yes (Layer C hard tier) | no |
| ④ processed / per-app on | filter hop + ~2 ms lookahead | yes (Layer C hard tier) | yes (Layer A) |
The two flags are independent. A competitive game's typical config
is ①: zero Headroom involvement in its audio. A user concerned about
notification dings on top of that game would put Discord on ② or ④
(so notifications get tamed via Discord's own channelVolumes)
while leaving the game on ①.
headroom-core (daemon, one process)
• per-app level controllers (Layer A)
• routing engine + preferred_real_sink (Layer B)
• slow AGC loop, profile manager (Layer C)
• IPC server
│
▼
$XDG_RUNTIME_DIR/headroom/control.sock
│
┌───────────┴───────────┐
▼ ▼
headroom CLI third-party clients
(Qt panel, widgets, …)
See §4 for Layer A's mechanics and §5 for the PipeWire-level details of Layers B and C.
One virtual sink, one daemon process
- headroom-processed — virtual sink. Set as the system default so new streams land in it by default. Its monitor is captured by headroom-filter, pushed through the DSP graph, and emitted to the current preferred_real_sink.
- No bypass sink. Streams marked route = "bypass" are pointed directly at preferred_real_sink via a target.object metadata write. They pay zero added latency vs. running without Headroom installed at all — there's no extra graph hop, no extra DSP. The word "bypass" in the profile DSL means "route directly to the real sink, untouched."
- The daemon owns:
  - the one virtual sink (created on startup, torn down on exit);
  - the filter (a pair of pw_streams — capture + playback — running on PipeWire's realtime audio thread, with the playback half targeting preferred_real_sink);
  - one AppLevelController per managed app stream (§4), each with its own passive pw_stream tap, peak/RMS envelopes, and Props.channelVolumes writer. Created/destroyed on stream lifecycle events.
  - preferred_real_sink tracking. The daemon watches the default.audio.sink metadata key. When the user changes the system default (via pavucontrol, wpctl set-default, etc.) to a hardware sink, the daemon (a) treats that sink as the new preferred_real_sink, (b) re-links headroom-filter's playback stream to it, and (c) rewrites target.object for every currently-bypassed stream so they follow. Hotplug / Bluetooth handoffs use the same machinery.
  - the slow AGC loop (reads loudness, writes gain target into the filter via an rtrb channel);
  - the routing engine (subscribes to the PipeWire registry, evaluates rules on new streams, writes target.object to the default metadata: either headroom-processed for processed streams or preferred_real_sink for bypassed streams);
  - the IPC server.
Why no headroom-bypass sink
An earlier iteration of the design had a second virtual sink
(headroom-bypass) that loopback'd to the real sink, so "bypassed"
streams routed to it. This added one PipeWire quantum of latency to
every bypassed stream for no functional benefit — module-loopback
buffers across the quantum boundary even when the DSP is a no-op.
Direct routing via target.object skips the hop entirely. The win is
real for competitive games, DAW monitoring, and music players: they
now ride exactly the same path they'd take if Headroom weren't
installed.
Why this is not the "analytical sink + adjust master volume"
shape originally proposed
Volume control via SPA Props updates is not sample-accurate. A true-peak
limiter needs a small internal delay line so gain reduction is applied
to the same samples that were analyzed. Therefore the brickwall must
be inline. The analytical-monitor approach is still used — for the
slow AGC loop, where multi-second time constants make control-plane
latency irrelevant — but it cannot own the ceiling.
Why a pw_stream pair, not an LV2 plugin in module-filter-chain
LV2 is not native to PipeWire; it's one of several plugin formats
module-filter-chain happens to host (via lilv). Using LV2 would split
Headroom into a plugin + a daemon + a filter-chain JSON, pull in a lilv
runtime, and force gain-target updates through a 32-bit-float control-port
abstraction. A pw_stream capture+playback pair is the same pattern
module-filter-chain itself uses internally, but written directly in
Rust against pipewire-rs, in the same process as the rest of the
daemon. One binary, no IPC for parameter updates, idiomatic Rust audio
thread. An LV2 wrapper of headroom-dsp remains a viable optional
deliverable for use in DAWs.
3. DSP
3.1 Two-tier true-peak limiter (headroom-dsp::limiter)
The limiter has two parallel tiers sharing the same upsampler, downsampler, delay line, and sliding peak buffer. Both run at the oversampled rate.
Hard tier — the safety contract. Output ceiling default
−0.1 dBTP, configurable. Instant attack on the gain envelope plus a
brief hold and a slow release. Two defensive clamp stages downstream
(once in the oversampled domain, once at the input rate after
downsampling) guarantee the contract numerically — the envelope can
misbehave and the contract still holds. Never bypassed, never
disabled.
Soft tier — the comfort cap. Targets a dynamic ceiling computed
as program_lufs + max_psr_db. Smooth attack/release envelope so the
gain reduction sounds like volume riding, not a slap. Pulls transients
to a comfortable peak-to-loudness ratio (default 14 dB) before they
ever threaten the hard ceiling. When the AGC hasn't yet provided a
program loudness (startup, after reset), the soft tier falls back to a
static ceiling. Disabled by omitting [limiter.soft] in a profile —
useful for the transparent profile where users want pure brickwall
behavior.
Algorithm (per oversampled sample, after upsampling):
- Push raw |s| into the sliding-window peak buffer; read the max-of-window.
- Soft tier computes target = soft_ceiling / window_peak (clamped to ≤ 1), runs through the smooth attack/release envelope, yields soft_gain.
- Hard tier predicts the worst-case effective peak after the soft tier acts (max of window_peak * soft_gain and the asymptote min(window_peak, soft_ceiling)), then sizes hard_target to keep that under the hard ceiling. Instant attack, hold, exponential release. Yields hard_gain.
- total_gain = min(soft_gain, hard_gain).
- Multiply the delayed sample by total_gain.
- Clamp at hard ceiling (defense-in-depth).
- Downsample, clamp again at hard ceiling at the input rate.
When the soft tier is doing its job, the hard tier's "predicted-post-soft" target stays above 1.0 and the hard tier never engages. When the soft tier is mid-attack (peak just arrived), the hard tier snaps in as a safety, then releases as the soft tier catches up.
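Steps 2 and 4 to 6 of the algorithm above can be sketched in a few lines of Rust. This is a minimal illustration, not Headroom's limiter: all function names are hypothetical, the envelopes, delay line, and oversampling are left out, and hard_gain is taken as an input because the text does not spell out how hard_target is sized.

```rust
/// Convert a level in dB to a linear amplitude factor.
fn db_to_lin(db: f32) -> f32 {
    10.0_f32.powf(db / 20.0)
}

/// Dynamic soft ceiling in dB: program loudness plus the allowed
/// peak-to-loudness ratio, or a static fallback while the AGC has not
/// reported a loudness yet (hypothetical helper).
fn soft_ceiling_db(program_lufs: Option<f32>, max_psr_db: f32, static_db: f32) -> f32 {
    match program_lufs {
        Some(lufs) => lufs + max_psr_db,
        None => static_db,
    }
}

/// Step 2: soft-tier target gain, clamped to <= 1 so it never boosts.
fn soft_target(window_peak: f32, soft_ceiling: f32) -> f32 {
    if window_peak <= soft_ceiling {
        1.0
    } else {
        soft_ceiling / window_peak
    }
}

/// Steps 4 to 6: take the smaller of the two tier gains, apply it to the
/// delayed sample, then clamp at the hard ceiling as defense-in-depth.
fn apply_total_gain(delayed: f32, soft_gain: f32, hard_gain: f32, hard_ceiling: f32) -> f32 {
    let total_gain = soft_gain.min(hard_gain);
    (delayed * total_gain).clamp(-hard_ceiling, hard_ceiling)
}
```

With the default numbers, a program loudness of −18 LUFS and a 14 dB peak-to-loudness ratio give a soft ceiling of −4 dB, and the −0.1 dBTP hard ceiling corresponds to a linear factor of roughly 0.9886.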
The compressor and AGC stages run before the limiter.
3.2 Feed-forward compressor (headroom-dsp::compressor)
Standard shape: log-domain detector (peak or RMS, switchable) → ratio + soft knee → attack/release envelope smoother → makeup gain → linear gain → apply to (small) delayed input. ~150 lines of clean code.
Defaults aimed at "gentle, transparent": threshold −24 dBFS, ratio 2.5:1, knee 6 dB, attack 10 ms, release 100 ms, makeup auto.
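For reference, the "ratio + soft knee" stage can be illustrated with the textbook quadratic soft-knee curve. The sketch below assumes that curve; the plan does not say which knee formula headroom-dsp uses, and static_gain_db is a hypothetical name. Detector, envelope smoothing, and makeup gain are omitted.

```rust
/// Static gain in dB (always <= 0) of a downward compressor with a
/// quadratic soft knee. All arguments are in dB except `ratio`.
fn static_gain_db(level_db: f32, threshold_db: f32, ratio: f32, knee_db: f32) -> f32 {
    let over = level_db - threshold_db;
    let out_db = if 2.0 * over < -knee_db {
        // Below the knee: unity gain.
        level_db
    } else if knee_db > 0.0 && 2.0 * over.abs() <= knee_db {
        // Inside the knee: quadratic blend between unity and the ratio slope.
        level_db + (1.0 / ratio - 1.0) * (over + knee_db / 2.0).powi(2) / (2.0 * knee_db)
    } else {
        // Above the knee: straight ratio.
        threshold_db + over / ratio
    };
    out_db - level_db
}
```

With the defaults above (threshold −24 dBFS, ratio 2.5:1, knee 6 dB), a −12 dBFS detector level maps to about −7.2 dB of gain reduction, and a level exactly at the threshold maps to about −0.45 dB.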
3.3 Slow AGC (headroom-core::agc)
Algorithmic descendant of EasyEffects' autogain.cpp. Runs outside
the audio thread, on a ~50 ms control tick.
- Feeds the audio thread's monitor tap into ebur128 with Mode::M | S | I | TRUE_PEAK.
- Computes target_gain_dB = target_lufs − measured_lufs.
- Smooths with separate attack/release coefficients (leaky integrator).
- Gates when momentary loudness < silence threshold.
- Soft-clamps so the AGC can never push more than ±N dB (profile knob).
- Writes the new gain target into the audio thread via an rtrb queue.
The AGC's gain is applied before the compressor. The compressor and limiter still own their own behaviour and ceilings.
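One control tick of the loop described above might look like the following sketch. It rests on several assumptions: AgcParams and agc_tick are hypothetical names, the clamp is a plain hard clamp rather than the soft clamp the plan calls for, and "attack" is taken to mean the gain moving downward (programme got louder) while "release" means it moving back up. The ebur128 measurement and the rtrb hand-off are not shown.

```rust
/// Subset of the [agc] profile knobs used by this sketch (hypothetical type).
struct AgcParams {
    target_lufs: f32,
    silence_threshold_lufs: f32,
    attack_ms: f32,
    release_ms: f32,
    max_boost_db: f32,
    max_cut_db: f32,
}

/// Advance the smoothed AGC gain (in dB) by one control tick.
fn agc_tick(gain_db: f32, measured_lufs: f32, tick_ms: f32, p: &AgcParams) -> f32 {
    // Gate: hold the current gain while the input is effectively silent.
    if measured_lufs < p.silence_threshold_lufs {
        return gain_db;
    }
    // Raw target, limited to the allowed boost/cut range.
    let target = (p.target_lufs - measured_lufs).clamp(-p.max_cut_db, p.max_boost_db);
    // Leaky integrator with a direction-dependent time constant.
    let tau_ms = if target < gain_db { p.attack_ms } else { p.release_ms };
    let alpha = 1.0 - (-tick_ms / tau_ms).exp();
    gain_db + alpha * (target - gain_db)
}
```

Calling agc_tick every ~50 ms with the latest loudness reading yields the gain target that would then be pushed to the audio thread.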
3.4 Measurement: ebur128
Mode::M | S | I | TRUE_PEAK. EBU TECH 3341/3342 conformant via the
ebur128 crate. Constructed on the daemon thread; fed from a ring-buffer
consumer that pulls from the audio thread. The audio thread allocates
nothing.
This is bus-level measurement only — used to drive the slow AGC loop and meter the processed sink output. Per-app measurement (§4) uses a different, much cheaper metric.
4. Per-application level control (Layer A)
An opt-in, near-zero-latency feedback loop that watches each managed
application's output stream and adjusts its Props.channelVolumes
multiplier in response to two parallel level metrics:
- a fast peak envelope that catches short bursts and sustained loud passages (think: a notification ding, a video that just got louder), and
- a slow RMS envelope that catches sustained loudness mismatches (think: "Discord is permanently louder than everything else even when nobody's shouting").
A stream's applied gain reduction is max(peak_reduction, rms_reduction) — whichever path is asking for more cut wins, and
recovery only happens when both paths agree the stream has settled.
This is the layer's whole point: the peak path handles transients
within one quantum; the RMS path keeps long-term inter-app loudness
balanced. Neither alone is enough.
Orthogonal to bus routing — a stream can be processed or bypassed and level-controlled independently. Its goal is "tame noisy apps without startling the listener and without making the chronic loudmouth permanently dominate," while the signal path itself stays untouched.
4.1 Why this is zero-latency
The per-app multiplier is the channelVolumes value PipeWire already
applies inside the app's stream node — it's the same number
pavucontrol's per-app slider writes to. Adjusting it doesn't insert
a graph node; nothing new sits between the app and its destination
sink. The only cost is that the analysis happens via a sibling
fanout link, not in the playback path: PipeWire schedules fanout
consumers in parallel within the same quantum, so the playback path's
timing is identical to the no-tap case.
┌──► passive tap (analysis only)
│ │
│ ▼
│ peak + RMS envelopes
│ (audio thread, sub-ms)
app stream ──────┤ │
(output port) │ ▼
│ rtrb push
│ │
│ ▼
│ AppLevelController (daemon thread)
│ │
│ │ Props.channelVolumes write
│ ▼ (back into the app stream node)
│ ┌─────────────────────┐
└──►│ app stream multiplies
│ by channelVolumes, │──► (its sink — Layer B)
│ then publishes. │
└─────────────────────┘
4.2 The metrics: peak + RMS, no LUFS
LUFS is the wrong measurement here. Its shortest window (momentary, 400 ms) blurs out exactly the transients we want to catch, and the K-weighting filter adds CPU for no benefit when we're trying to react fast. We also explicitly want a second path that targets sustained loudness — for that, plain mean-square RMS is the right cheap stand-in, not LUFS.
| Metric | Window | Job |
|---|---|---|
| Peak envelope — max(\|samples\|) per block, smoothed | ~100 ms attack window, ~500 ms release | Fast: catches a notification ding, a clip getting louder, a partner standing up and shouting. Triggers cut on peak_threshold_db (default −6 dBFS). |
| RMS envelope — block mean-square, smoothed | ~1–2 s | Slow: catches "this app is just chronically louder than everything else." Triggers cut on rms_target_db (default ≈ −20 dBFS RMS). |
Both are computed from the same raw buffer in the audio thread, so the audio-thread cost is one additional MAC accumulator and a max-scan per sample. Cost analysis in §4.7.
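The single pass over the buffer is small enough to show in full. A minimal sketch, with block_metrics as a hypothetical name and interleaved channels treated as one flat slice:

```rust
/// One pass over a block: returns (sample peak, mean-square).
/// Allocation-free, so it is safe to call from an audio callback.
fn block_metrics(samples: &[f32]) -> (f32, f32) {
    if samples.is_empty() {
        return (0.0, 0.0);
    }
    let mut peak = 0.0_f32;
    let mut sum_sq = 0.0_f32;
    for &x in samples {
        peak = peak.max(x.abs()); // max-scan
        sum_sq += x * x; // multiply-accumulate
    }
    (peak, sum_sq / samples.len() as f32)
}
```

For the block [0.5, -1.0, 0.5, 0.0] this returns a peak of 1.0 and a mean-square of 0.375.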
4.3 Architecture
For each managed playback stream (matched by routing rule — see §6):
- Audio thread (tap stream's process callback):
  - Pull the buffer from the fanout link.
  - peak = max(|samples|) over the block.
  - mean_sq = Σ(x*x) / n over the block.
  - Push {node_id, peak, mean_sq} to a per-stream rtrb.
- Daemon thread (AppLevelController per stream):
  - Drain the rtrb.
  - Update peak envelope (one-pole, fast α — attack within a block, release ~500 ms).
  - Update RMS envelope (one-pole, slow α — window ~1–2 s).
  - Compute peak_reduction_db and rms_reduction_db independently, then proposed = max(peak_reduction_db, rms_reduction_db).
  - Smooth toward proposed.
  - If the smoothed value is significantly different from last-written AND we're not rate-limited (~10 Hz max writes per stream), submit Props.channelVolumes update.
The recovery condition is intentionally both-paths-agree: a release on the peak path only counts toward unwinding gain reduction if the RMS path also reads quiet. This avoids the pumping artefact where a transient-heavy stream would rapidly release between transients only to be slapped back down on the next one.
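The daemon-thread arithmetic can be sketched as below. The plan does not define how each reduction is derived from its envelope, so this sketch assumes the simplest mapping (cut exactly the overshoot above the threshold or target). LevelParams, the function names, the 0.01 write threshold, and the 100 ms interval standing in for "~10 Hz" are all illustrative assumptions.

```rust
/// Per-rule knobs used by this sketch (hypothetical type).
struct LevelParams {
    peak_threshold_db: f32,
    rms_target_db: f32,
    max_cut_db: f32,
}

/// One-pole smoother: move `env` toward `input` by the fraction `alpha`.
fn one_pole(env: f32, input: f32, alpha: f32) -> f32 {
    env + alpha * (input - env)
}

/// Combine the two paths: whichever asks for more cut wins, capped at max_cut_db.
fn proposed_cut_db(peak_env: f32, mean_sq_env: f32, p: &LevelParams) -> f32 {
    let peak_db = 20.0 * peak_env.max(1e-9).log10();
    let rms_db = 10.0 * mean_sq_env.max(1e-18).log10();
    let peak_reduction_db = (peak_db - p.peak_threshold_db).max(0.0);
    let rms_reduction_db = (rms_db - p.rms_target_db).max(0.0);
    peak_reduction_db.max(rms_reduction_db).min(p.max_cut_db)
}

/// Convert a cut in dB to the linear multiplier written to channelVolumes.
fn cut_to_volume(cut_db: f32) -> f32 {
    10.0_f32.powf(-cut_db / 20.0)
}

/// Write only when the change is noticeable and the rate limit allows it.
fn should_write(new_volume: f32, last_written: f32, ms_since_last_write: f32) -> bool {
    (new_volume - last_written).abs() > 0.01 && ms_since_last_write >= 100.0
}
```

Because the proposed cut is the maximum of the two reductions, it can only fall once both paths have fallen, which is the both-paths-agree recovery behaviour described above.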
4.4 Honouring user-set volumes
The daemon subscribes to Props param-change events on each managed
stream. When a channelVolumes change arrives that's meaningfully
different from last_written_volume, it wasn't us — the user
adjusted via pavucontrol, a hotkey, an app's own UI, etc. The
controller then either:
- defers entirely (stops adjusting the stream until the user opts back in via headroom per-app reset <app>), or
- treats the user value as a ceiling (continues to cut on spikes but never raises above what the user wanted).
Default is the ceiling behaviour — it's the principle-of-least-surprise choice. Users who want strict deference set a profile flag.
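The two deference modes reduce to a small piece of state per stream. The sketch below is one possible shape, assuming a fixed tolerance for "meaningfully different"; Deference, DeferMode, and the 0.01 tolerance are hypothetical, and the PipeWire event plumbing is not shown.

```rust
#[derive(Clone, Copy, PartialEq)]
enum DeferMode {
    Ceiling,
    Strict,
}

/// Per-stream deference state (hypothetical type).
struct Deference {
    mode: DeferMode,
    last_written: f32,
    ceiling: f32,
    suspended: bool,
}

impl Deference {
    fn new(mode: DeferMode) -> Self {
        Deference { mode, last_written: 1.0, ceiling: 1.0, suspended: false }
    }

    /// Handle a channelVolumes value reported by a Props param-change event.
    fn on_props_change(&mut self, observed: f32) {
        const TOLERANCE: f32 = 0.01; // assumed threshold for "it wasn't us"
        if (observed - self.last_written).abs() <= TOLERANCE {
            return; // echo of our own write
        }
        match self.mode {
            DeferMode::Ceiling => self.ceiling = observed,
            DeferMode::Strict => self.suspended = true,
        }
        self.last_written = observed;
    }

    /// Volume to write for a wanted value, or None if the stream is hands-off.
    fn plan_write(&mut self, wanted: f32) -> Option<f32> {
        if self.suspended {
            return None;
        }
        let volume = wanted.min(self.ceiling);
        self.last_written = volume;
        Some(volume)
    }
}
```

In ceiling mode the controller keeps cutting below the user's value but never writes above it; in strict mode the first external change stops all writes until the state is reset.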
A historical concern: apps that fight back
Some PulseAudio-era apps (Discord most famously) used to read and
re-assert their own channelVolumes periodically, fighting any
external volume manager. The pattern produced a visible ping-pong
loop and effectively disabled per-app management.
The pattern is largely absent from modern PipeWire-native and
Electron-based apps in 2024+: in-app sliders write channelVolumes
only on user interaction, not on a timer. From Headroom's
perspective, those user-interaction writes are indistinguishable from
a pavucontrol slider move — both are legitimate external changes the
deference policy correctly yields to.
If a fight-back app does appear, the ceiling deference mode degrades gracefully:
- App produces hot output → Headroom cuts to 0.5.
- App writes channelVolumes = 1.0 back over our cut.
- Headroom detects the external change, marks the new value (1.0) as the ceiling, and stops actively writing.
- Layer A becomes effectively inert for that stream — there is no ping-pong, the user just doesn't get the per-app cut they were hoping for. The bus-level Layer C limiter (if engaged) still enforces the absolute output ceiling regardless.
Explicit pattern detection and rate-limiting of ceiling updates (e.g., "ignore ceiling-restoring writes that arrive within N seconds of our own writes") is deferred to v1, pending evidence from real-world testing that any modern app warrants it. The graceful degradation property is the v0 contract.
4.5 Reaction-time honesty
The signal-path latency is zero. The reaction latency to a spike is bounded by:
spike in block N ─► analysis (same quantum)
─► rtrb push (ns)
─► controller computes (μs)
─► Props write to pw main loop
─► applied to block N+1 of the app stream
So sustained loud passages are attenuated within ~one quantum (5–20 ms depending on the system's quantum). Isolated one-block transients still leak through — the first block carrying the spike plays with the old gain; subsequent blocks see the reduction. This is the irreducible cost of "no lookahead allowed." For absolute spike prevention you need lookahead, which means latency, which contradicts the constraint of this layer.
The bus-level Layer C limiter (§3.1) catches anything that would exceed the absolute ceiling regardless of whether Layer A has caught up. Layer A reduces workload on Layer C by pre-attenuating noisy apps; it doesn't replace it.
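Since the reaction bound is expressed in quanta, it helps to convert a quantum size to milliseconds. A small worked calculation, with quantum_ms as a hypothetical helper:

```rust
/// Duration of one PipeWire quantum in milliseconds.
fn quantum_ms(frames: u32, sample_rate_hz: u32) -> f64 {
    frames as f64 * 1000.0 / sample_rate_hz as f64
}
```

At 48 kHz, a 256-frame quantum lasts about 5.3 ms and a 480-frame quantum exactly 10 ms. A 1024-frame quantum lasts about 21.3 ms, so at that setting a one-block transient can run slightly past the 20 ms end of the range quoted above before the reduction applies.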
4.6 Layered budget summary
| Layer | Metric | Time scale | Signal-path latency added |
|---|---|---|---|
| A: per-app peak | sample peak per block | tens of ms | 0 |
| A: per-app RMS | block mean-square | seconds | 0 |
| C: inline soft tier | true-peak, lookahead | sub-ms | shared with hard tier |
| C: inline hard tier | true-peak, lookahead | sub-ms | ~2 ms lookahead |
| C: bus AGC | LUFS (ebur128) | many seconds | — (control plane only) |
Five distinct jobs, five distinct time scales, no two layers duplicate each other. Layer A is the cheapest line of defense and the only one that costs zero latency on the audio path.
4.7 Resource budget per stream
| | No TRUE_PEAK (recommended for Layer A) |
|---|---|
| Audio thread per quantum | ~10 μs (peak + RMS pass) |
| Daemon thread per measurement | ~few μs (HashMap lookup + envelope math) |
| Memory per controller | ~100 bytes |
| Memory per ebur128 (if enabled) | — N/A; Layer A doesn't use ebur128 |
At realistic stream counts (2–5 managed apps): <0.5% CPU total, <1 KB RAM total. Doesn't move the needle.
4.8 Lifecycle
- Stream appears with media.class = Stream/Output/Audio matching a [[per_app.rules]] pattern: create tap link (pw_link_create), spawn controller, register rtrb.
- Stream disappears (pw_registry::global_removed): tear down tap, drop controller, clean up rtrb.
- App restarts: new node_id → fresh controller. User-volume deference state is per-stream-instance, which is the right default.
5. PipeWire integration
5.1 Sinks
Created on daemon startup by emitting a pipewire.conf.d fragment into
$XDG_CONFIG_HOME/pipewire/pipewire.conf.d/headroom.conf (if not already
present) and reloading. Alternative: create them at runtime via
pw-loopback equivalents using pipewire-rs. v0 ships with the
runtime-creation path so the install footprint is "one binary, one
unit file."
Sink properties:
- headroom-processed: node.name=headroom-processed, media.class=Audio/Sink, audio.position=[FL,FR], node.description="Headroom (processed)". Promoted to system default on startup so new streams land in it by default.
There is no second sink. Bypassed streams are routed directly at the
current preferred_real_sink via target.object metadata writes
(see §5.3).
5.2 The filter
Two pw_streams:
- Capture stream linked to headroom-processed's monitor. Format: F32 LE, channels 2, rate matched to real sink, latency-quantum matched (default 1024 frames; configurable).
- Playback stream linked to the current preferred_real_sink. Same format.
process callback: pull a buffer from capture, run AGC gain →
compressor → limiter → push to playback. Allocation-free. Parameter
updates arrive over an rtrb SPSC queue from the control thread.
5.3 Routing
- On startup, write default.audio.sink in the default metadata to point at headroom-processed so new streams default to the processor. The previous value (the user's hardware sink) is captured as the initial preferred_real_sink.
- Subscribe to pw_registry global-added events.
- On any new node with media.class == "Stream/Output/Audio" and node.dont-move != true:
  - Read application.process.binary, application.name, pipewire.access.portal.app_id, media.role.
  - Evaluate routing rules from the active profile to decide processed vs. bypass.
  - Write target.object into the default metadata for the new stream: processed → headroom-processed's object.serial. bypass → preferred_real_sink's object.serial. WirePlumber honours this for any movable stream.
- Watch default.audio.sink metadata changes. When the user switches the system default to a hardware sink, the daemon:
  - records that sink as the new preferred_real_sink,
  - re-links headroom-filter's playback stream to it,
  - rewrites target.object for every currently-bypassed stream so they follow the new hardware,
  - re-asserts headroom-processed as the default for new streams (so subsequent app launches still land in the processor).
- Hotplug (sink appears/disappears) goes through the same code path.
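The rule-evaluation step can be kept free of any PipeWire dependency. The sketch below assumes first-match-wins ordering and matching on application.process.binary only; Route, Rule, and decide are hypothetical names, and the node.dont-move check is omitted. The self-stream guard mirrors the one noted in the Phase 3 commit message (node names starting with headroom-filter are skipped).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Route {
    Processed,
    Bypass,
}

/// One [[rules]] entry, reduced to the process_binary matcher.
struct Rule {
    process_binaries: Vec<&'static str>,
    route: Route,
}

/// Decide where a new stream goes. Returns None for streams the daemon
/// must leave alone (its own filter streams).
fn decide(
    node_name: &str,
    process_binary: Option<&str>,
    rules: &[Rule],
    default_route: Route,
) -> Option<Route> {
    // Self-stream guard: never re-route the filter's own streams.
    if node_name.starts_with("headroom-filter") {
        return None;
    }
    // First matching rule wins (assumed); otherwise fall back to the default.
    let matched = process_binary.and_then(|bin| {
        rules
            .iter()
            .find(|rule| rule.process_binaries.iter().any(|b| *b == bin))
    });
    Some(matched.map_or(default_route, |rule| rule.route))
}
```

A stream whose application.process.binary has not arrived yet falls through to the default route in this sketch, which matches the "safe default: anything unmatched is processed" intent of the shipped profile.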
5.4 Stream identification
| Property | Reliability | Use |
|---|---|---|
| application.process.binary | high (kernel-sourced) | primary key |
| application.name | medium | secondary / display |
| pipewire.access.portal.app_id | high (Flatpak only) | match sandboxed apps |
| media.role | low (most apps omit) | bonus signal only |
| media.class | structural | gate to playback streams |
6. Profiles
Location: $XDG_CONFIG_HOME/headroom/profiles/*.toml (overriding
shipped defaults in /usr/share/headroom/profiles/ if installed
system-wide). Hot-reloaded via notify-debouncer-mini.
Each profile is a complete listening scenario. Schema (headroom-core::profile):
name = "default"
description = "Gentle transparent processing for everyday use."
[agc]
enabled = true
target_lufs = -18.0 # ITU-R BS.1770 integrated target
attack_ms = 2000.0
release_ms = 800.0
silence_threshold_lufs = -70.0
max_boost_db = 12.0
max_cut_db = 12.0
[compressor]
enabled = true
detector = "peak" # "peak" | "rms"
threshold_db = -24.0
ratio = 2.5
knee_db = 6.0
attack_ms = 10.0
release_ms = 100.0
makeup_db = "auto" # number or "auto"
[limiter]
ceiling_dbtp = -0.1
lookahead_ms = 2.0
release_ms = 80.0
hold_ms = 5.0
oversample = 4 # 1 | 2 | 4 | 8 (1 disables ISP detection)
link = "stereo" # "stereo" | "dual-mono"
[meters]
publish_hz = 20.0
[[rules]]
match = { process_binary = ["spotify", "mpv", "ardour", "reaper", "qpwgraph"] }
route = "bypass"
[[rules]]
match = { process_binary = ["firefox", "chromium", "google-chrome", "Discord", "discord", "element-desktop", "Slack", "zoom", "WEBRTC VoiceEngine"] }
route = "processed"
[default_route]
route = "processed" # safe default: anything unmatched is processed
# ----------------------------------------------------------------------
# Per-application level control (Layer A). Orthogonal to routing — you
# can enable per-app on bypass-routed streams to get zero-latency
# level control (e.g. tame Discord notifications without touching
# the game's audio path).
# ----------------------------------------------------------------------
[per_app]
enabled = true # master switch; false disables Layer A entirely
default_enabled = false # for streams not matched by any rule below
# Per-rule knobs. Matches use the same key set as [[rules]] above.
[[per_app.rules]]
match = { process_binary = ["Discord", "discord", "element-desktop", "Slack", "zoom"] }
enabled = true
peak_threshold_db = -6.0 # short-window peak above this triggers cut
rms_target_db = -20.0 # long-term RMS target (slow path)
max_cut_db = 12.0 # never cut more than this
peak_attack_ms = 5.0
peak_release_ms = 500.0
rms_window_ms = 1500.0
defer_to_user = "ceiling" # "ceiling" | "strict"
[[per_app.rules]]
match = { process_binary = ["firefox", "chromium", "google-chrome"] }
enabled = true
peak_threshold_db = -3.0 # browsers run hotter; raise the trigger
rms_target_db = -18.0
# Music, DAWs, games default to per-app off — they're either trusted
# to set their own level or routed bypass for a reason.
[[per_app.rules]]
match = { process_binary = ["spotify", "mpv", "ardour", "reaper", "qpwgraph", "carla"] }
enabled = false
Shipped profiles
| name | one-liner |
|---|---|
| default | Gentle transparent processing, sensible for daily use. |
| night | Aggressive: −20 LUFS, 4:1, fast release, narrow dynamic range. |
| speech | VoIP-focused; short attack, fast release, slight rumble cut. |
| transparent | Limiter only. Compressor + AGC bypassed. Safety net only. |
| bypass-all | Routes everything directly to the real sink. The kill switch. |
The limiter section of bypass-all is irrelevant in practice (nothing
flows through headroom-processed), but its ceiling field is still
respected as a fail-safe in case a stream lands on the processed sink
anyway.
7. IPC
Transport: Unix-domain socket, SOCK_STREAM, 0600, at
$XDG_RUNTIME_DIR/headroom/control.sock.
Wire protocol: see IPC.md for the full normative schema.
Summary: u32 BE length prefix + UTF-8 JSON payload. Three message
shapes — Request (id + op + args), Response (id + result|error),
Event (topic + data). Subscribers signal interest by topic; events
fan out to all subscribers with bounded per-subscriber queues. Slow
subscribers have events dropped (overflow events count is itself
published on the daemon topic so clients know they fell behind).
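The framing layer (u32 big-endian length prefix followed by a UTF-8 JSON payload) can be sketched with the standard library alone. The function names are hypothetical, the JSON is treated as an opaque string, and IPC.md remains the normative definition of the message shapes.

```rust
/// Prefix a JSON payload with its byte length as a big-endian u32.
fn encode_frame(json: &str) -> Vec<u8> {
    let len = json.len() as u32;
    let mut out = Vec::with_capacity(4 + json.len());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(json.as_bytes());
    out
}

/// Try to read one frame from the front of `buf`.
/// Returns the payload and the number of bytes consumed, or None if the
/// buffer does not yet hold a complete frame or the payload is not UTF-8.
fn decode_frame(buf: &[u8]) -> Option<(&str, usize)> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let payload = buf.get(4..4 + len)?;
    let text = std::str::from_utf8(payload).ok()?;
    Some((text, 4 + len))
}
```

A reader on a SOCK_STREAM socket would append incoming bytes to a buffer and call decode_frame in a loop, draining the consumed bytes after each successful decode.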
The first-party Rust wrapper is headroom-client, mirroring how
niri-ipc wraps
Niri's socket: a thin, no-magic crate that re-exports the wire types
from headroom-ipc and adds a blocking Client (and an optional async
AsyncClient behind a feature flag).
8. CLI
headroom status # current profile, sinks, levels
headroom daemon # run the daemon (systemd Type=simple)
headroom profile list | use <name> | show [name]
headroom route list
headroom route set <app> processed|bypass # persists in user profile
headroom route unset <app>
headroom route stream <node-id> processed|bypass # ad-hoc
headroom set <key> <value> # tweak active profile in place
headroom get <key>
headroom bypass on|off # global kill switch
headroom reload # reload profiles from disk
headroom monitor # live meter TUI (uses subscribe)
CLI is sync, blocks on UnixStream. Talks the same JSON wire as any
other client.
9. Crates
headroom/
├── flake.nix # devshell + package
├── Cargo.toml # workspace
├── PLAN.md # this file
├── IPC.md # wire-protocol schema (normative)
├── README.md
└── crates/
├── headroom-dsp/ # AGC + compressor + limiter (pure DSP, no PW)
├── headroom-ipc/ # wire types, framing, serde; no I/O
├── headroom-client/ # blocking client (+ optional async); thin
├── headroom-core/ # daemon: PW integration, routing, profiles, IPC server
└── headroom-cli/ # `headroom` binary; depends on headroom-client
External crates (final v0 dep list)
Audio / DSP
- pipewire, libspa — official PipeWire bindings.
- ebur128 — measurement.
- rtrb — SPSC ring buffer (audio ↔ control).
- basedrop — RT-safe shared ownership.
- assert_no_alloc — debug-build tripwire.
Plumbing
- serde, serde_json — IPC + profile (de)serialization.
- serde-toml (toml) — profile files.
- clap (derive) — CLI.
- tracing, tracing-subscriber, tracing-journald — logs.
- notify, notify-debouncer-mini — profile hot-reload.
- crossbeam-channel — control-plane channels.
- parking_lot — mutexes.
- signal-hook — clean shutdown.
- thiserror — error types.
No tokio, no zbus, no dbus-*.
10. Nix
flake.nix ships:
- A devshell with rust toolchain (via rust-overlay for pinned channel; default to a stable release pinned in rust-toolchain.toml), pkg-config, pipewire's dev outputs, clang (for bindgen if invoked by deps), socat (handy for poking the IPC), jq.
- A package output (packages.<system>.default) that builds the daemon + CLI with rustPlatform.buildRustPackage. v0 uses cargoLock.lockFile. Crane can come later if incremental builds in CI become a bottleneck.
- A nixosModules.default placeholder so packagers can wire the user unit later. Not implemented in v0 of the flake itself.
Intermediate dev work uses plain cargo inside nix develop. Final
builds and any CI go through nix build.
11. Phased implementation
The phases are roughly token-of-work units, not calendar weeks.
Phase 0 — scaffolding. Flake, workspace, crate skeletons, README, PLAN/IPC docs. (done as part of this commit)
Phase 1 — IPC + client. headroom-ipc (types, framing, codec) and
headroom-client (blocking Client) implemented against the schema in
IPC.md. Round-trip tests, fuzz the codec. (this commit)
Phase 2 — DSP kernels. headroom-dsp with limiter, compressor, AGC,
oversampler, envelope. Tested in isolation against synthesized
signals; limiter validated to hold a −0.1 dBTP ceiling on EBU TECH
3341 generators. (this commit: limiter first)
Phase 3 — daemon core. headroom-core brings up the
headroom-processed virtual sink, the filter (pw_stream pair),
the preferred_real_sink tracker, the registry subscriber, and the
routing engine. Hardcoded profile, no IPC server yet.
Phase 4 — IPC server + profile manager. Wire headroom-core to the
IPC schema. Profile loading + hot-reload. Slow AGC loop ticking on
real loudness measurements.
Phase 5 — CLI + monitor TUI. headroom-cli implements all the
subcommands above, plus a monitor TUI built on the meters
subscription.
Phase 6 — Per-application level control (Layer A). Per-managed-stream
tap creation, AppLevelController with peak + RMS envelopes,
Props.channelVolumes writer, user-volume deference logic,
[per_app] profile parsing, headroom per-app … CLI verbs, and a
per-stream meter event on the IPC. Land after the bus path is stable
so we have a baseline to compare against.
Phase 7 — Packaging. systemd user unit, install paths, default profile install, basic NixOS module.
Phase 8 — Hardening. Latency budget verification on real hardware,
Bluetooth-handoff edge case, profile-reload while audio is flowing,
multi-rate hardware, allocation-tracer sweep with
assert_no_alloc in debug.
12. Risks & open questions
- WirePlumber re-linking on device hotplug. When a Bluetooth headset connects, WP re-evaluates linking. Headroom must re-pin its routed streams. Tractable; the registry events surface this.
- Latency budget. Processed path: one quantum hop (the filter) plus lookahead (~2 ms) plus 4× oversampling buffering ≈ 8–15 ms added to processed-path latency. Fine for video/voice. Bypass path: zero added latency — the stream rides the real sink directly.
- Default-sink changes. When the user switches the system default to a hardware sink, the daemon adopts it as preferred_real_sink, re-links the filter's playback, retargets bypassed streams, and re-asserts headroom-processed as the default for new streams. Watching default.audio.sink in the metadata is the trigger.
- Sample-rate mismatch. headroom-processed, the filter, and the real sink must agree, or PipeWire resamples behind our back. The filter should source its rate from the real sink and convert on the capture side only.
- Surround content downmix vs. passthrough. v0 punts: anything >2ch is routed directly to the real sink (bypass behaviour) regardless of profile rule. Documented behaviour.
13. License
GPL-3.0-or-later for the daemon and CLI. headroom-dsp and headroom-ipc
are MPL-2.0 so third-party clients and plugin hosts can link them
without GPL contagion. (Re-evaluate when LSP-derived code is
introduced; current plan does not pull any.)