/ parabun

Parabun

A fork of Bun with extra runtime modules: a worker pool with shared typed arrays, raw CUDA and Metal kernels, SIMD primitives, V4L2 / ALSA capture, GGUF LLM inference, and statically-linked image / audio / CSV codecs.

These aren't npm packages — they're built into the runtime, so there's no node-gyp step and no per-platform binary distribution. Imports look the same as Bun's other built-ins (import gpu from "bun:gpu"). Plain .ts / .js files behave the same as upstream Bun.

$curl -fsSL https://raw.githubusercontent.com/airgap/parabun/main/install.sh | bash

Linux and macOS. Windows build is in progress. parabun self-update refreshes an existing install along with the VS Code extension.

$curl -fsSL https://raw.githubusercontent.com/airgap/parabun/main/install-extension.sh | bash

Installs the VS Code extension into any of code, cursor, or kiro found on $PATH. The extension provides the .pts / .pjs TextMate grammar and an LSP with hover, go-to-definition, purity diagnostics, memo hints, and operator documentation.

Module index

Three groups. Tier 0 is the primitive layer — SIMD, worker pool, GPU kernels (NVRTC / MSL), buffer pool. Tier 1 is the codec / capture / inference layer — image, audio, camera, CSV, GGUF LLM. Tier 2 is built on Tier 1: bun:vision (frames + motion), bun:speech (VAD utterances), bun:arrow (columnar tables + computes + Arrow IPC reader/writer; Parquet pending). Engines that need vendored deps (Whisper, Piper, ONNX detectors) throw with documented errors until those land.

Tier 2 — appssome engines stub

Runtime modules

bun:parallel

pmap and preduce chunk arrays across a persistent worker pool. Functions are serialized via fn.toString(), so they must be pure — no closures, no outer references. TypedArrays are passed through a SharedArrayBuffer, so postMessage transfers a handle rather than a copy.

parabun
import { pmap } from "bun:parallel";

pure function score(row) { return row.reduce((a, b) => a + b * b, 0); }

const rows = new Float32Array(new SharedArrayBuffer(1_000_000 * 4));
// ...fill rows...
const scores = await pmap(score, rows, { concurrency: 8 });

bun:simd

WebAssembly v128 kernels for Float32Array (f32x4) and Float64Array (f64x2). Inputs above 4 MiB are processed in place rather than copied into WASM memory. alloc() returns a typed array backed by the WASM linear memory for zero-copy use.

typescript
import { mulScalar, add, dot, sum } from "bun:simd";

const y = mulScalar(new Float32Array([1, 2, 3, 4]), 3); // [3, 6, 9, 12]
const z = add(a, b);
const d = dot(u, v);
const s = sum(a);
op (N=100k, f32) .map / .reduce tight loop bun:simd
mulScalar(a, 3) 808 µs 60 µs 30 µs
add(a, b) 884 µs 73 µs 40 µs
sum(a) 574 µs 43 µs 17 µs
dot(a, b) 716 µs 51 µs 24 µs

bun:gpu

Metal on macOS, CUDA on Linux and Windows, CPU fallback on hosts without a GPU. A matrix passed to gpu.hold() stays resident across matVec calls, so only the input vector crosses the host↔device boundary per call. Pure Float32ArrayFloat32Array functions are runtime-compiled to PTX (via NVRTC) or MSL (via newLibraryWithSource:) when the body fits a supported shape: arithmetic, ternary, Math.*.

typescript
import gpu from "bun:gpu";

const mat = gpu.alloc(M * K, "f32");
// ...fill mat...
const held = gpu.hold(mat);                   // uploaded once
for (const q of queries) {
  const scores = gpu.matVec(held, q, M, K); // no copy
}
gpu.release(held);

Beyond matVec / simdMap, bun:gpu ships conv2D, scan, reduce, argMin / argMax, histogram, and median / quantile — CPU correctness paths today, with optional CUDA / Metal hooks on the same dispatch surface for follow-up device kernels.

bun:arena

A pool of SharedArrayBuffer-backed typed arrays. bun:parallel and bun:pipeline draw from it so per-chunk work doesn't allocate a fresh buffer every time. Entries return to the pool at the end of an arena { } block or a pmap chunk, instead of waiting for the GC.

bun:pipeline

A chain of bun:simd calls (mulScalar, add, relu, …) is collapsed into a single pass at .run() time, so the intermediate arrays don't get allocated. If the input is large enough that GPU dispatch wins (gpu.winsForSize(...)), the fused chain runs as a single bun:gpu simdMap instead.

bun:signals

signal() is a reactive cell, computed() derives one from others, and effect() runs side effects when something it read changes. Reads inside an effect register a dependency; writes invalidate downstream and a microtask flush re-runs only the effects that observed a value that actually changed. Pairs with the signal / effect { } language extensions.

bun:rtp

RFC 3550 packet pack / parse and a jitter buffer. Built to sit under bun:audio's Opus encoder for a WebRTC-style send/receive path. rtp.pack({ payloadType, sequence, timestamp, ssrc, payload }) produces a wire-format packet; the jitter buffer reorders by sequence number with a configurable depth.

bun:image

A Sharp-class image module baked into the runtime — JPEG / PNG / WebP decode and encode (libjpeg-turbo, libpng, libwebp + libsharpyuv vendored statically), bilinear and Lanczos resize, separable Gaussian blur, unsharp-mask sharpen, Sobel edge-detect, 90 / 180 / 270 rotate, flip, crop, brightness / contrast / saturation adjust, threshold, invert, grayscale, per-channel histogram, and Porter-Duff source-over alpha compositing. No npm install sharp, no Node-ABI-versioned binary distribution.

typescript
import image from "bun:image";

const bytes = await Bun.file("photo.jpg").bytes();
const img = image.decode(bytes);
const small = image.resize(img, { width: 800, height: 600, kernel: "lanczos" });
const sharp = image.sharpen(small, { amount: 1.5 });
const webp = image.encode(sharp, { format: "webp", quality: 85 });
await Bun.write("photo.webp", webp);

bun:audio

A from-scratch audio toolkit: WAV / MP3 decode, Opus encode and decode (libopus 1.6.1), rnnoise-based denoiser, FFT, RBJ Audio EQ Cookbook biquads (lowpass / highpass / bandpass / notch), resample, STFT spectrogram, voice-activity detection, AGC, peak / RMS / windowed envelope, mix, normalize, interleave / deinterleave, and PCM type conversion. Heavy codecs (libopus, minimp3, rnnoise) ship statically; paired with bun:rtp, the surface is enough for a full voice-call capture pipeline short of OS audio I/O.

typescript
import audio from "bun:audio";
import rtp from "bun:rtp";

const enc = new audio.OpusEncoder({ sampleRate: 48000, channels: 1, application: "voip" });
const den = new audio.Denoiser();
const agc = new audio.Gain({ targetLevel: 0.1 });

for (const i16Frame of micFrames) {
  const f32 = audio.i16ToF32(i16Frame);     // OS audio → DSP space
  den.process(f32);                          // suppress noise (in place)
  agc.process(f32);                          // normalize loudness
  const opus = enc.encode(f32);
  send(rtp.pack({ payloadType: 111, sequence, timestamp, ssrc, payload: opus }));
}

bun:csv

Streaming RFC 4180 parser — async generator, full quote and escape handling, configurable delimiter, header mode that yields records keyed by column name, per-cell type inference (number / boolean / null). An opt-in parallel: true mode chunks the input across bun:parallel's worker pool when the input has no quoted cells and is large enough.

typescript
import csv from "bun:csv";

for await (const row of csv.parseCsv(Bun.file("rows.csv"), { header: true })) {
  process(row.id, row.name, row.score);
}
fixture serial (med) parallel (med) speedup
5 MB · 128k rows 152 ms 129 ms 1.18×
50 MB · 1.25M rows 1446 ms 1528 ms 0.95×
200 MB · 4.92M rows 5892 ms 6363 ms 0.93×

parallel: true is not a per-file speedup. The serial state machine is already memory-bandwidth-bound, and the parallel path's materialize-and-fork overhead grows with input size — so it helps a little at small files, breaks even around 50 MB, and gets worse from there. Use it to keep the event loop responsive while parsing (parsing N files concurrently does scale across cores), not because you expect bigger files to go faster. bench/parabun-csv-parallel/ reproduces these numbers.

bun:camera

V4L2 capture on Linux. camera.devices() reads /sys/class/video4linux/ and runs VIDIOC_QUERYCAP on each to filter to actual capture devices. camera.formats(path) enumerates the supported (format, width, height, fps) tuples. camera.open(...) mmaps the kernel ring buffer and starts streaming, and cam.frames() is an async iterator of frames. AVFoundation (macOS) and Media Foundation (Windows) backends are planned on the same JS surface.

bun:video

Scaffold only — the JS surface is in place (video.probe, video.decode, video.encode, video.decodeAll, with codec / container / acceleration options) but the native side hasn't been wired yet. The plan is libavcodec on desktop, V4L2 M2M on Pi 5, NVDEC/NVENC on Jetson, all behind the same JS API.

bun:llm

An in-tree GGUF runtime: file loader, byte-level BPE tokenizer, Llama and Qwen2 forward passes, greedy and nucleus sampling. Weights are mmap'd off disk; the residual stream and KV cache live on-device. Per-token traffic across PCIe is a 4-byte argmax. Q4_K and Q6_K matVec kernels use a 1-warp-per-row, 4-warps-per-block layout; QKV and Gate+Up projections are byte-concatenated at load time and dispatched as one matVec per layer.

typescript
import llm from "bun:llm";

using m = await llm.LLM.load("./Llama-3.2-1B-Instruct-Q4_K_M.gguf");

for await (const piece of m.chat([
  { role: "system", content: "You are helpful and concise." },
  { role: "user", content: "What is the capital of France?" },
])) {
  process.stdout.write(piece);
}
Llama-3.2-1B Q4_K_M · RTX 4070 Ti parabun ollama
greedy decode (device-only) 340 tok/s ~350 tok/s
greedy decode (logits DtoH) 275 tok/s
prompt prefill 295 tok/s

Numbers are within run-to-run noise of ollama on this model and hardware. Chat templates for Llama-3, ChatML, and Mistral-Instruct are detected from the GGUF's tokenizer.chat_template. Only the CUDA backend is wired in this module today; Metal kernels are pending.

llm.serve({ engine, modelId, port }) exposes any model (or anything else implementing .chat() / .generate() / .embed()) over an OpenAI-compatible HTTP API. Routes: GET /v1/models, POST /v1/chat/completions (sync and SSE streaming), POST /v1/completions, POST /v1/embeddings. Optional bearer auth and a FIFO concurrency gate (default 1). Default port is 11434, matching ollama's, so OpenAI clients that auto-discover a local ollama work unchanged.

bun:vision

vision.frames(stream, { decodeMjpg? }) takes a frame iterator from bun:camera (or any source yielding the same shape) and yields packed-RGBA8 frames. yuyv, nv12, and rgb24 are converted inline; mjpeg requires the caller to pass image.decode from bun:image (cross-builtin imports between bun: modules aren't supported, so dependencies are passed in at the call site). vision.detectMotion adds a downsampled-luma frame-diff estimator with temporal smoothing.

vision.detect (YOLO / SSD / RT-DETR) and vision.recognize (Tesseract / EasyOCR) are typed but throw — both need an ONNX runtime vendored before they can do anything. The interfaces are there so callers can write against them now and have them work later.

bun:speech

speech.listen(stream, { sampleRate }) takes an audio chunk iterator (bun:audio's capture stream, a file reader, anything yielding { samples }) and yields one utterance per detected speech burst. The classifier is RMS-against-an-adaptive-noise-floor, with pre-roll to catch word onsets, hangover to seal on silence, and a minimum-length filter to drop clicks and breath sounds.

speech.transcribe (Whisper) and speech.speak (Piper) are typed but throw. Whisper needs encoder-decoder transformer support in bun:llm (the current path is decoder-only); Piper needs libpiper or an ONNX runtime vendored.

bun:arrow

In-memory columnar tables. arrow.recordBatch({ ... }) takes a map of typed arrays (or plain arrays — types are inferred) and returns a RecordBatch with a Schema. Columns are typed-array views with optional validity bitmaps; arrow.table([...]) concatenates batches across one schema. Computes: sum, mean, min, max, count, variance, stddev, quantile, median, distinct, filter, groupBy. fromRows / toRows bridge between row-shaped JS data (e.g. bun:csv output) and the columnar form; concat materializes a table-wide column into one typed array.

arrow.toIPC(table) and arrow.fromIPC(bytes) serialize via the Arrow IPC streaming format — continuation-prefixed Schema + RecordBatch messages, FlatBuffers metadata, 8-byte-aligned body buffers, plus DictionaryBatch decode for dictionary-encoded columns (apache-arrow's default for utf8). The FlatBuffers builder/reader is hand-rolled in the module; no npm dep. Six types round-trip: int32, int64, float32, float64, bool (bit-packed on the wire), utf8 (offsets + bytes), each with optional validity bitmaps. Wire compatibility is verified against apache-arrow 21.1.0 in bench/parabun-arrow-ipc-interop/ — both directions (Parabun → apache-arrow and apache-arrow → Parabun) round-trip a 3-column batch including dictionary-encoded strings.

That means the bytes Parabun produces are the same wire format pyarrow, arrow-rs, nanoarrow, polars, and duckdb consume on the streaming path. Parquet remains pending — it's a separate format with its own thrift metadata and page-level encodings.

Example: LangChain VectorStore

ParabunVectorStore extends VectorStore from @langchain/core and implements the addVectors and similaritySearchVectorWithScore methods, so call sites that accept any VectorStore work against it without changes.

before
import { MemoryVectorStore }
  from "langchain/vectorstores/memory";

const store = new MemoryVectorStore(emb);
await store.addVectors(vectors, docs);
const hits = await store
  .similaritySearchVectorWithScore(q, 10);
after
import { ParabunVectorStore }
  from "./parabun-store.pjs";

const store = new ParabunVectorStore(emb);
await store.addVectors(vectors, docs);
const hits = await store
  .similaritySearchVectorWithScore(q, 10);
100k × 384 f32, top-10 add_ms score_ms vs LangChain
LangChain MemoryVectorStore 4.0 48.2 1.00×
ParabunVectorStore 82.7 15.9 2.83×

add_ms is higher because rows are packed into a single SAB Float32Array and normalized in place — one-time O(N·D) work amortized across subsequent queries. Top-K indices and scores match LangChain's to four decimal places.

Language extensions — .pts / .pjs

Files ending in .pts, .ptsx, .pjs, or .pjsx are parsed with additional desugarings. All output is standard JS; no runtime support is required, and the runtime modules above do not depend on any of this syntax. GitHub's TextMate grammars do not cover .pts; the VS Code / Cursor / Kiro extension provides the grammar and an LSP.

pure and memo

A pure function is rejected at parse time if it mutates an outer variable, reads this, or calls a known-impure global. Prefix pure with memo — or drop pure entirely and write memo as the declarator — and the result is cached by argument identity: 0-arg singleton, 1-arg Map, multi-arg nested Map chain. Recursive self-references route through the outer wrapper, so fib below runs the body 21 times for fib(20), not 21,891.

parabun
// declarator form — `memo` implies pure + function
memo fib(n: number): number {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// arrow form — same thing as an expression prefix
const normalize = memo (s: string) => s.trim().toLowerCase();

// async dedupes concurrent in-flight calls, evicts on reject
memo async fetchProfile(id: string) { return await db.users.get(id); }

signal, effect, ~>

signal NAME = <rhs> desugars to a Signal binding; bare reads rewrite to .get(), assignments to .set(). If the RHS references another in-scope signal, the binding auto-promotes to a read-only derived(). effect { ... } tracks every signal it reads as a dependency and re-runs on change. A ~> B is reactive binding — it desugars to effect(() => { B = A; }), so B stays in step with A and whatever signals A reads from.

parabun
signal count = 0;
signal doubled = count * 2;   // auto-derived

effect { console.log(count, doubled); }

count++;                           // effect re-runs: 1, 2

// bind signal value into a DOM-ish sink — updates track dep changes
count ~> el.innerHTML;

|>, ..!, ..&, ..=

x |> f is f(x). pure functions passed through |> get inlined at parse time — no call overhead. ..! / ..& are .catch / .finally in suffix position. ..= is = await in a declaration and disambiguates to an inclusive-range literal otherwise (0..5 excludes 5, 0..=5 includes it).

parabun
pure function sq(x: number) { return x * x; }

const result = 5 |> sq |> sq;   // 625 — both calls inlined

const json ..= fetch("/api").then(r => r.json())
  ..! err => console.error(err)      // .catch
  ..& () => console.log("done"); // .finally

for (const i of 0..=9) emit(i);                    // [0..9]

defer and arena

defer EXPR schedules EXPR to run when the enclosing block exits (return, throw, fall-through). Multiple defers dispose in LIFO order. defer await EXPR inside an async function awaits the cleanup. arena { ... } runs the block with the GC paused, then frees everything allocated inside on exit — useful for tight numeric loops with short-lived intermediate allocations.

parabun
function readConfig(path: string) {
  const fd = fs.openSync(path);
  defer fs.closeSync(fd);              // runs on every exit path
  return JSON.parse(fs.readFileSync(fd));
}

arena {
  const buf = new Float32Array(1_000_000);
  // ...numeric work...
}                                        // buf freed here, no GC pressure

Full grammar in LLMs.md, and the LSP carries arity-based "could be memo" / "memo probably not worth it" hints plus full purity diagnostics.

Roadmap

Parabun's positioning is to open typical JS performance bottlenecks via multithreading and GPU. The shipped modules — bun:parallel, bun:simd, bun:gpu, bun:pipeline, bun:arena, bun:signals, bun:llm, bun:image, bun:audio, bun:csv, bun:rtp — cover the typed-array, codec, and CPU/GPU-parallel surface; the remaining items below attack the next layer of "I have to shell out / use Python / write native code" pain points.

Each module ships behind a compile-time feature flag. The configurator generates a bun build --compile invocation with only the modules you check — production builds slim to whatever your app actually imports.

Status Module What it does
shipped bun:image JPEG / PNG / WebP decode + encode, resize (bilinear / Lanczos), blur / sharpen / edge-detect, rotate / flip / crop, adjust / threshold / invert / grayscale, histogram, alpha composite.
shipped bun:audio WAV / MP3 / Opus codecs, RBJ biquads, FFT, resample, spectrogram, VAD, denoiser (rnnoise), AGC, mix / normalize / envelope, planar ⇄ frame-major + i16 ⇄ f32 PCM helpers.
shipped bun:csv Streaming RFC 4180 parser. parallel: true is "off-the-main-thread", not a per-file speedup — see the table above.
shipped bun:rtp RFC 3550 packet pack/parse + jitter buffer. Transport for the codec stack.
shipped bun:gpu primitives conv2D, scan, reduce, argMin / argMax, histogram, median / quantile. CPU correctness paths today; CUDA / Metal hooks slot in via the existing dispatch.
shipped bun:camera V4L2 capture on Linux — devices(), formats(path), open(...) with an async-iterator frames() over kernel-mmapped buffers. AVFoundation + Media Foundation follow on the same surface.
shipped OS audio I/O Live ALSA capture + playback for bun:audio. devices() / capture(...) / play(...) with Float32 PCM streams, S16_LE on the wire. CoreAudio + WASAPI follow.
partial bun:gpu device kernels CUDA reduce (sum / min / max) + atomic-privatized histogram shipped. Scan, Metal mirror, and the rest of the secondary primitives still on CPU until wired.
partial bun:vision Tier 2 — frame stream + frame-diff motion detection ship today (vision.frames, vision.detectMotion). Detector (detect) and OCR (recognize) engines stub until ONNX runtime is vendored.
partial bun:speech Tier 2 — VAD-gated utterance segmentation ships today (speech.listen over any audio chunk iterator). Whisper STT (transcribe) pending encoder-decoder support in bun:llm; Piper TTS (speak) pending libpiper / ONNX vendor add.
partial bun:arrow Tier 2 — in-memory columnar tables (RecordBatch, Table, Column), type inference, validity bitmaps, computes (sum, mean, min, max, count, variance, stddev, quantile, median, distinct, filter, groupBy), fromRows / toRows for the row ↔ columnar bridge, and Arrow IPC streaming format with dictionary-batch decode (fromIPC / toIPC, hand-rolled FlatBuffers builder + reader; round-trips all six in-memory types with validity bitmaps; reads apache-arrow / pyarrow / arrow-rs / polars / duckdb output including Dictionary<Utf8>). Wire-compat verified against apache-arrow 21.1.0 in bench/parabun-arrow-ipc-interop/. Parquet pending.
in progress bun:video Tier 1 — JS surface scaffolded; libavcodec / V4L2 M2M / NVDEC native binding lands with hardware bring-up. Decode + encode + container muxing.
next bun:parallel v2 Closure-aware persistent worker pool + SharedArrayBuffer channels. Lifts today's pmap ceiling.
planned bun:image AVIF AVIF decode/encode (libavif + AOM / dav1d vendor add). Rounds out the codec coverage matrix.

bun:llm serves as proof-of-concept for the stack — built on bun:gpu + bun:simd + bun:parallel. Parabun is positioned as a perf runtime, not an AI runtime.

Scope. The added modules target typed-array numeric work, embarrassingly-parallel loops, and GPU-friendly matrix shapes. HTTP handlers, JSON parsing, and ordinary application code go through the same paths as upstream Bun — no changes in performance or behavior are expected there.