bun:audio

WAV / MP3 / Opus codecs, biquads, FFT, mel spectrograms, voice activity detection, denoising, AGC, dynamics, and ALSA capture / playback.

ts
import audio from "bun:audio";

A from-scratch audio toolkit. Heavy codecs (libopus 1.6.1, minimp3, rnnoise) are vendored statically. The DSP surface is enough for a full voice-call pipeline plus the audio frontend that feeds Whisper STT.

File I/O

readWav(bytes) / writeWav(samples, opts)

WAV decode + encode. Handles 8/16/24/32-bit PCM, IEEE-float, mono / stereo / multichannel.

ts
const wav = audio.readWav(new Uint8Array(await Bun.file("clip.wav").arrayBuffer()));
// { sampleRate: 48000, channels: 1, samples: Float32Array, bitDepth: 16 }
const out = audio.writeWav(wav.samples, { sampleRate: 48000, channels: 1, bitDepth: 16 });
await Bun.write("normalized.wav", out);

decodeMp3(bytes)

minimp3-backed decoder. Returns { sampleRate, channels, samples } with PCM as interleaved Float32.
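
Interleaved means the channels alternate frame by frame (L0, R0, L1, R1, ...). As an illustration only, a hand-written split into planar channels could look like the sketch below. The helper name splitChannels is hypothetical and is not part of bun:audio; the module's own deinterleave(samples, n) covers this case.

```ts
// Hypothetical helper (not a bun:audio export): split interleaved PCM
// into one Float32Array per channel.
function splitChannels(samples: Float32Array, channels: number): Float32Array[] {
  if (channels < 1 || samples.length % channels !== 0) {
    throw new RangeError("sample count must be a multiple of the channel count");
  }
  const frames = samples.length / channels;
  const planar: Float32Array[] = [];
  for (let c = 0; c < channels; c++) planar.push(new Float32Array(frames));
  for (let i = 0; i < frames; i++) {
    for (let c = 0; c < channels; c++) {
      // Frame i, channel c lives at index i * channels + c.
      planar[c][i] = samples[i * channels + c];
    }
  }
  return planar;
}
```

For a stereo decode, splitChannels(decoded.samples, 2) would give [left, right].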

Opus codec

ts
const enc = new audio.OpusEncoder({ sampleRate: 48000, channels: 1, application: "voip" });
const dec = new audio.OpusDecoder({ sampleRate: 48000, channels: 1 });

const opus = enc.encode(f32Frame);          // Uint8Array
const f32 = dec.decode(opus);

application is "voip" | "audio" | "lowdelay". Frame sizes are the Opus standard (2.5 / 5 / 10 / 20 / 40 / 60 ms at 48 kHz). Bitrate, complexity, FEC, DTX, in-band PLC are all knobs on the encoder constructor; see source for the full option set.
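
The frame durations above translate into fixed sample counts per channel. The sketch below shows the arithmetic; opusFrameSamples is a hypothetical helper, and whether enc.encode requires exactly one frame per call is an assumption not stated here.

```ts
// Standard Opus frame durations in milliseconds.
const OPUS_FRAME_MS: readonly number[] = [2.5, 5, 10, 20, 40, 60];

// Hypothetical helper: samples per channel in one frame of the given duration.
function opusFrameSamples(frameMs: number, sampleRate = 48000): number {
  if (OPUS_FRAME_MS.indexOf(frameMs) === -1) {
    throw new RangeError(`unsupported Opus frame duration: ${frameMs} ms`);
  }
  return (sampleRate * frameMs) / 1000;
}
```

At 48 kHz a 20 ms frame is 960 samples per channel and a 2.5 ms frame is 120.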

Pair with bun:rtp for a wire-format Opus / RTP stream.

Biquad filters (RBJ Audio EQ Cookbook)

Stateless functions that return a new Float32Array:

| Function | Description |
| --- | --- |
| lowpass(samples, sr, freq, q?) | Q defaults to 0.707 (Butterworth). |
| highpass(samples, sr, freq, q?) | |
| bandpass(samples, sr, freq, q?) | |
| notch(samples, sr, freq, q?) | |

Each does a single second-order IIR pass — chain them for steeper rolloff.
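
For reference, a minimal sketch of what one such pass involves, using the low-pass formulas from the RBJ Audio EQ Cookbook and a Direct Form I loop. This is an independent illustration, not the module's source; the names lowpassCoeffs and biquadLowpass are hypothetical.

```ts
// RBJ cookbook low-pass coefficients, normalized so that a0 = 1.
function lowpassCoeffs(sampleRate: number, freq: number, q = Math.SQRT1_2) {
  const w0 = (2 * Math.PI * freq) / sampleRate;
  const cosW0 = Math.cos(w0);
  const alpha = Math.sin(w0) / (2 * q);
  const a0 = 1 + alpha;
  return {
    b0: (1 - cosW0) / 2 / a0,
    b1: (1 - cosW0) / a0,
    b2: (1 - cosW0) / 2 / a0,
    a1: (-2 * cosW0) / a0,
    a2: (1 - alpha) / a0,
  };
}

// One second-order IIR pass; returns a new buffer and leaves the input untouched.
function biquadLowpass(samples: Float32Array, sampleRate: number, freq: number, q?: number): Float32Array {
  const { b0, b1, b2, a1, a2 } = lowpassCoeffs(sampleRate, freq, q);
  const out = new Float32Array(samples.length);
  let x1 = 0, x2 = 0, y1 = 0, y2 = 0; // previous inputs and outputs
  for (let i = 0; i < samples.length; i++) {
    const x0 = samples[i];
    const y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
    out[i] = y0;
    x2 = x1; x1 = x0;
    y2 = y1; y1 = y0;
  }
  return out;
}

// Chaining two passes doubles the asymptotic rolloff from 12 to 24 dB/octave.
const steeper = (x: Float32Array) => biquadLowpass(biquadLowpass(x, 48000, 1000), 48000, 1000);
```

The same chaining applies to the module's functions, e.g. lowpass(lowpass(samples, sr, freq), sr, freq).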

Mixing, level, conversion

| Function | Description |
| --- | --- |
| mix(a, b, gainA?, gainB?) | Sample-wise mix into a new Float32Array. |
| normalize(samples, target?) | Scale to target peak. Default target = 0.95. |
| peak(samples) / rms(samples) | Whole-buffer level. |
| envelope(samples, windowMs, sampleRate) | Sliding-window RMS envelope. |
| i16ToF32(int16) / f32ToI16(float32) | PCM type conversion. |
| interleave(channels) / deinterleave(samples, n) | Frame-major ⇄ planar. |
| resample(samples, from, to) | Sinc-windowed resample. |
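
To make the level functions concrete, here is a rough sketch of peak, RMS, and peak normalization as they are conventionally defined. The names peakOf, rmsOf, and normalizeTo are hypothetical, and the handling of an all-zero buffer is an assumption; the module's own functions may differ in edge cases.

```ts
// Largest absolute sample value in the buffer.
function peakOf(samples: Float32Array): number {
  let p = 0;
  for (let i = 0; i < samples.length; i++) p = Math.max(p, Math.abs(samples[i]));
  return p;
}

// Root mean square over the whole buffer (0 for an empty buffer).
function rmsOf(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSq = 0;
  for (let i = 0; i < samples.length; i++) sumSq += samples[i] * samples[i];
  return Math.sqrt(sumSq / samples.length);
}

// Scale so the peak lands on `target`; a silent buffer stays silent (assumed).
function normalizeTo(samples: Float32Array, target = 0.95): Float32Array {
  const out = new Float32Array(samples.length);
  const p = peakOf(samples);
  if (p === 0) return out;
  const gain = target / p;
  for (let i = 0; i < samples.length; i++) out[i] = samples[i] * gain;
  return out;
}
```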

FFT

Cooley-Tukey radix-2, in place:

ts
const x = new Float32Array(1024);     // real input
const X = audio.fft(x);                // complex Float32Array, length 2048 (interleaved Re/Im)
const back = audio.ifft(X);            // round-trips to ~1e-5

fft accepts either a real signal (length must be power of two) or an interleaved-complex buffer (length must be even). ifft returns the real part of the inverse — the imaginary part is dropped.
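
The interleaved layout puts bin k at indices 2k (real) and 2k + 1 (imaginary). As a self-contained illustration of that layout and of the radix-2 algorithm, here is a textbook iterative Cooley-Tukey sketch. It is not the module's implementation, and toInterleavedComplex and fftInPlace are hypothetical names.

```ts
// Pack a real signal into interleaved [re, im, re, im, ...] form.
function toInterleavedComplex(real: Float32Array): Float32Array {
  const out = new Float32Array(real.length * 2);
  for (let i = 0; i < real.length; i++) out[2 * i] = real[i];
  return out;
}

// In-place forward FFT over an interleaved complex buffer of n = 2^m points.
function fftInPlace(buf: Float32Array): void {
  const n = buf.length / 2;
  if (n < 1 || (n & (n - 1)) !== 0) throw new RangeError("point count must be a power of two");
  // Bit-reversal permutation.
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      const tr = buf[2 * i], ti = buf[2 * i + 1];
      buf[2 * i] = buf[2 * j]; buf[2 * i + 1] = buf[2 * j + 1];
      buf[2 * j] = tr; buf[2 * j + 1] = ti;
    }
  }
  // Butterfly stages of size 2, 4, ..., n.
  for (let len = 2; len <= n; len <<= 1) {
    const ang = (-2 * Math.PI) / len;
    for (let start = 0; start < n; start += len) {
      for (let k = 0; k < len / 2; k++) {
        const wr = Math.cos(ang * k), wi = Math.sin(ang * k);
        const a = 2 * (start + k), b = 2 * (start + k + len / 2);
        const tr = buf[b] * wr - buf[b + 1] * wi;
        const ti = buf[b] * wi + buf[b + 1] * wr;
        buf[b] = buf[a] - tr; buf[b + 1] = buf[a + 1] - ti;
        buf[a] += tr; buf[a + 1] += ti;
      }
    }
  }
}
```

For an 8-point cosine with one cycle per buffer, the energy lands in bins 1 and 7, i.e. indices 2 and 14 of the interleaved output.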

Spectrograms

spectrogram(samples, { window, hop })

STFT magnitudes. Returns Float32Array[] — one frame per window position, each (window/2 + 1) long. Hann window applied before each FFT.
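
A sketch of the framing step that precedes each FFT, under two assumptions the text above does not settle: the Hann window is the periodic variant, and the signal is not padded, so only full windows count. All function names are hypothetical.

```ts
// Periodic Hann window (assumption: periodic rather than symmetric).
function hann(size: number): Float32Array {
  const w = new Float32Array(size);
  for (let n = 0; n < size; n++) w[n] = 0.5 - 0.5 * Math.cos((2 * Math.PI * n) / size);
  return w;
}

// Number of full windows that fit when no padding is applied (assumed).
function frameCount(length: number, window: number, hop: number): number {
  return length < window ? 0 : Math.floor((length - window) / hop) + 1;
}

// Slice the signal into Hann-weighted frames, ready for a per-frame FFT.
function windowedFrames(samples: Float32Array, window: number, hop: number): Float32Array[] {
  const w = hann(window);
  const count = frameCount(samples.length, window, hop);
  const frames: Float32Array[] = [];
  for (let f = 0; f < count; f++) {
    const frame = new Float32Array(window);
    for (let n = 0; n < window; n++) frame[n] = samples[f * hop + n] * w[n];
    frames.push(frame);
  }
  return frames;
}
```

With window = 256 each frame yields 256 / 2 + 1 = 129 magnitude bins, matching the (window/2 + 1) length above.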

melSpectrogram(samples, opts?)

Slaney-normalized triangular mel filterbank, the standard preprocessing frontend for Whisper-style speech models.

ts
const mel = audio.melSpectrogram(samples, {
  sampleRate: 16000,
  nMels: 80,
  windowSize: 400,
  hop: 160,
  nFft: 512,
  mode: "whisper",
});
// { frames: Float32Array[], nMels: 80, nFft: 512, hop: 160 }

| Option | Default | Description |
| --- | --- | --- |
| sampleRate | 16000 | Whisper's rate. |
| nMels | 80 | Whisper's count through large-v2; Whisper large-v3 uses 128. |
| windowSize | 400 | 25 ms at 16 kHz. |
| hop | 160 | 10 ms at 16 kHz. |
| nFft | nextPow2(windowSize) | Must be a power of 2 ≥ windowSize. |
| mode | "whisper" | "log10" returns dB-style log10(power). "whisper" clamps to 8 log10 units (80 dB) below the peak and rescales to ~[-1, 1]. |

The mel filter bank matches librosa.filters.mel(htk=False).
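
For readers unfamiliar with the htk=False variant: it uses the Slaney mel scale, which is linear below 1 kHz and logarithmic above. The sketch below reproduces that scale and the band-edge placement as librosa defines them; the function names are hypothetical, and this is an illustration rather than the module's source.

```ts
const F_SP = 200 / 3;                   // Hz per mel in the linear region
const MIN_LOG_HZ = 1000;                // where the log region starts
const MIN_LOG_MEL = MIN_LOG_HZ / F_SP;  // = 15 mel
const LOG_STEP = Math.log(6.4) / 27;    // log-region step size

function hzToMel(hz: number): number {
  return hz < MIN_LOG_HZ
    ? hz / F_SP
    : MIN_LOG_MEL + Math.log(hz / MIN_LOG_HZ) / LOG_STEP;
}

function melToHz(mel: number): number {
  return mel < MIN_LOG_MEL
    ? mel * F_SP
    : MIN_LOG_HZ * Math.exp(LOG_STEP * (mel - MIN_LOG_MEL));
}

// nMels triangular filters need nMels + 2 edges, evenly spaced in mel.
function melBandEdgesHz(nMels: number, fMin: number, fMax: number): number[] {
  const lo = hzToMel(fMin);
  const hi = hzToMel(fMax);
  const edges: number[] = [];
  for (let i = 0; i < nMels + 2; i++) {
    edges.push(melToHz(lo + ((hi - lo) * i) / (nMels + 1)));
  }
  return edges;
}
```

Filter m spans edges[m] to edges[m + 2] with its peak at edges[m + 1]; Slaney normalization then scales each triangle by 2 / (edges[m + 2] - edges[m]) so every band has roughly equal area.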

Voice activity detection

ts
const vad = audio.detectVoice(samples, { frameSize: 480, ratio: 3.0, noiseWindow: 100 });
// { energies: Float32Array, speech: boolean[], noiseFloor: number }

Adaptive RMS-vs-noise-floor classifier. The noise floor is a sliding-window minimum of frame energies; a frame is "speech" when its RMS exceeds noiseFloor × ratio. The defaults correspond to 30 ms frames (480 samples at 16 kHz) and a 3-second noise window (100 frames of 30 ms).
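
The classifier described above can be sketched in a few lines. This version is an independent illustration with assumptions of its own: the noise floor is recomputed per frame over the trailing noiseWindow frames (including the current one), and a small epsilon guards against digital silence. The name classifyFrames is hypothetical.

```ts
function classifyFrames(
  samples: Float32Array,
  frameSize = 480,
  ratio = 3.0,
  noiseWindow = 100,
): { energies: Float32Array; speech: boolean[] } {
  const nFrames = Math.floor(samples.length / frameSize);
  const energies = new Float32Array(nFrames);
  // Per-frame RMS energy.
  for (let f = 0; f < nFrames; f++) {
    let sumSq = 0;
    for (let i = 0; i < frameSize; i++) {
      const s = samples[f * frameSize + i];
      sumSq += s * s;
    }
    energies[f] = Math.sqrt(sumSq / frameSize);
  }
  const speech: boolean[] = [];
  for (let f = 0; f < nFrames; f++) {
    // Noise floor: minimum energy over the trailing window.
    let floor = Infinity;
    for (let k = Math.max(0, f - noiseWindow + 1); k <= f; k++) {
      floor = Math.min(floor, energies[k]);
    }
    // Epsilon (assumed) keeps an all-zero window from flagging everything.
    speech.push(energies[f] > Math.max(floor, 1e-6) * ratio);
  }
  return { energies, speech };
}
```

The nested minimum scan is O(frames × noiseWindow); a monotonic deque would bring that down to linear time if it mattered.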

For utterance-level segmentation (pre-roll, hangover, minimum length filtering) use speech.listen — it's a wrapper around detectVoice that yields one segment per speech burst.

Dynamics

In-place processors with persistent state — useful for live streams. Call .process(buffer) to apply, .reset() to clear state.

ts
const den = new audio.Denoiser();        // rnnoise, 480-sample frames at 48 kHz
den.process(f32);                         // suppresses background noise

const gain = new audio.Gain({ targetLevel: 0.1 });    // simple AGC
gain.process(f32);

const comp = new audio.Compressor({
  threshold: -20, ratio: 4, attack: 5, release: 50, knee: 6, makeupGain: 0,
});
comp.process(f32);

const lim = new audio.Limiter({ ceiling: -1, release: 50 });
lim.process(f32);

Compressor and Limiter run feed-forward dynamics with the same interface as the Gain class: process, reset, and persistent state. The Limiter is brick-wall: its envelope rises instantly (no attack smoothing), so the ceiling is enforced on every sample.
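
To illustrate why an instant-rise envelope guarantees the ceiling, here is a minimal brick-wall limiter sketch. It assumes ceiling is in dBFS and release in milliseconds, which the example above suggests but does not state; the class name SketchLimiter and its option names are hypothetical, and this is not the module's implementation.

```ts
class SketchLimiter {
  private env = 0;                     // envelope follower state
  private readonly ceiling: number;    // linear ceiling
  private readonly releaseCoef: number;

  constructor(opts: { ceilingDb: number; releaseMs: number; sampleRate: number }) {
    this.ceiling = Math.pow(10, opts.ceilingDb / 20);
    // One-pole decay reaching 1/e after releaseMs.
    this.releaseCoef = Math.exp(-1 / ((opts.releaseMs / 1000) * opts.sampleRate));
  }

  // In-place: rewrites buf and carries the envelope across calls.
  process(buf: Float32Array): void {
    for (let i = 0; i < buf.length; i++) {
      const level = Math.abs(buf[i]);
      // Instant rise, exponential fall: env is never below the current sample.
      this.env = Math.max(level, this.env * this.releaseCoef);
      if (this.env > this.ceiling) buf[i] *= this.ceiling / this.env;
    }
  }

  reset(): void {
    this.env = 0;
  }
}
```

Because env is at least |x| on every sample, the scaled output cannot exceed the ceiling beyond float32 rounding.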

OS audio I/O — Linux today

Live ALSA capture + playback. CoreAudio (macOS) and WASAPI (Windows) will follow with the same API surface.

devices()

Returns { name, description, id, type: "capture" | "playback" }[] from ALSA.

capture(opts)

ts
await using mic = await audio.capture({
  sampleRate: 16000,
  channels: 1,
  device: "default",        // or one of the ids from devices()
  bufferMs: 30,              // analysis frame length
});
for await (const frame of mic.frames()) {
  // frame is { samples: Float32Array, timestampMs: number }
}

mic is AsyncDisposable; await using releases the ALSA handle on scope exit. mic.frames() is an async iterator of float32 PCM frames; on the wire, ALSA delivers S16_LE, which is converted in-place.

play(opts)

ts
await using spk = await audio.play({ sampleRate: 48000, channels: 2 });
await spk.write(f32Frame);

spk.write returns when the frame is queued (not when it finishes playing). On scope exit, the buffer drains before close.

Limits