Parabun
A fork of Bun with four additional runtime modules — a worker pool, SIMD
primitives for typed arrays, a compute-only GPU surface, and a GGUF/Llama runtime — and parse-time extensions
for purity, memoization, and reactive bindings in .pts / .pjs files. Regular
.ts and .js files behave the same as in upstream Bun.
Runs on Linux and macOS; a Windows build is in progress. parabun self-update refreshes an existing install
along with the VS Code extension.
Installs the VS Code extension into any of code, cursor, or kiro found on
$PATH. The extension provides the .pts / .pjs TextMate grammar and an LSP
with hover, go-to-definition, purity diagnostics, memo hints, and operator documentation.
Runtime modules
bun:parallel
pmap and preduce chunk arrays across a persistent worker pool. Functions are
serialized via fn.toString(), so they must be pure — no closures, no outer references.
TypedArrays are passed through a SharedArrayBuffer, so postMessage transfers a
handle rather than a copy.
import { pmap } from "bun:parallel";
pure function score(row) { return row.reduce((a, b) => a + b * b, 0); }
const rows = new Float32Array(new SharedArrayBuffer(1_000_000 * 4));
// ...fill rows...
const scores = await pmap(score, rows, { concurrency: 8 });
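To see why closures cannot survive serialization, here is a minimal sketch in plain TypeScript that round-trips a function through its source text. `roundTrip` and `makeAdder` are hypothetical helpers for illustration, not bun:parallel APIs, and the real pool may rebuild functions differently.

```typescript
// Rebuild a function from its source text, roughly what a worker has to do
// when all it receives is fn.toString(). Hypothetical helper, not a bun:parallel API.
function roundTrip<A, R>(fn: (arg: A) => R): (arg: A) => R {
  return new Function(`return (${fn.toString()});`)() as (arg: A) => R;
}

// Self-contained: depends only on its argument, so the source text is enough.
const square = (x: number): number => x * x;

// Captures `capturedOffset` from an enclosing scope; the source text alone loses it.
function makeAdder(): (x: number) => number {
  const capturedOffset = 10;
  return (x: number): number => x + capturedOffset;
}

const squared = roundTrip(square)(4);

let closureFailure = "";
try {
  roundTrip(makeAdder())(4);
} catch (err) {
  closureFailure = (err as Error).name;
}

console.log(squared, closureFailure); // 16 ReferenceError
```

The rebuilt `square` works because everything it needs is in its own text. The rebuilt adder fails at call time because `capturedOffset` lived only in the scope that created it.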
bun:simd
WebAssembly v128 kernels for Float32Array (f32x4) and Float64Array (f64x2). Inputs
above 4 MiB are processed in place rather than copied into WASM memory. alloc() returns a
typed array backed by the WASM linear memory for zero-copy use.
import { mulScalar, add, dot, sum } from "bun:simd";
const y = mulScalar(new Float32Array([1, 2, 3, 4]), 3); // [3, 6, 9, 12]
const z = add(a, b);
const d = dot(u, v);
const s = sum(a);
| op (N=100k, f32) | .map / .reduce | tight loop | bun:simd |
|---|---|---|---|
| mulScalar(a, 3) | 808 µs | 60 µs | 30 µs |
| add(a, b) | 884 µs | 73 µs | 40 µs |
| sum(a) | 574 µs | 43 µs | 17 µs |
| dot(a, b) | 716 µs | 51 µs | 24 µs |
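For reference, the four operations have simple scalar definitions. The sketch below is a plain TypeScript baseline in the spirit of the tight loop column. The `*Loop` names are hypothetical, and the benchmark harness behind the table is not shown in this document.

```typescript
// Scalar reference versions of the four ops. Names are illustrative only.
function mulScalarLoop(a: Float32Array, k: number): Float32Array {
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] * k;
  return out;
}

function addLoop(a: Float32Array, b: Float32Array): Float32Array {
  if (a.length !== b.length) throw new RangeError("length mismatch");
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] + b[i];
  return out;
}

function dotLoop(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new RangeError("length mismatch");
  let acc = 0;
  for (let i = 0; i < a.length; i++) acc += a[i] * b[i];
  return acc;
}

function sumLoop(a: Float32Array): number {
  let acc = 0;
  for (let i = 0; i < a.length; i++) acc += a[i];
  return acc;
}

const scaled = mulScalarLoop(new Float32Array([1, 2, 3, 4]), 3); // [3, 6, 9, 12]
const total = sumLoop(new Float32Array([1, 2, 3, 4])); // 10
const inner = dotLoop(new Float32Array([1, 2, 3]), new Float32Array([4, 5, 6])); // 32
```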
bun:gpu
Metal on macOS, CUDA on Linux and Windows, CPU fallback on hosts without a GPU. A matrix passed to
gpu.hold() stays resident across matVec calls, so only the input vector crosses the
host↔device boundary per call. Pure Float32Array → Float32Array functions are
runtime-compiled to PTX (via NVRTC) or MSL (via newLibraryWithSource:) when the body fits a
supported shape: arithmetic, ternary, Math.*.
import gpu from "bun:gpu";
const mat = gpu.alloc(M * K, "f32");
// ...fill mat...
const held = gpu.hold(mat); // uploaded once
for (const q of queries) {
const scores = gpu.matVec(held, q, M, K); // no copy
}
gpu.release(held);
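As a mental model for what each matVec call returns, here is a CPU reference in plain TypeScript. It assumes the M × K matrix is stored row-major, which this document does not state, and `matVecReference` is a hypothetical name rather than a bun:gpu export.

```typescript
// CPU reference for a matrix-vector product: out[row] = sum over col of mat[row][col] * vec[col].
// Assumes row-major storage; the layout bun:gpu expects is not specified here.
function matVecReference(
  mat: Float32Array,
  vec: Float32Array,
  M: number,
  K: number,
): Float32Array {
  if (mat.length !== M * K || vec.length !== K) {
    throw new RangeError("shape mismatch");
  }
  const out = new Float32Array(M);
  for (let row = 0; row < M; row++) {
    const base = row * K;
    let acc = 0;
    for (let col = 0; col < K; col++) acc += mat[base + col] * vec[col];
    out[row] = acc;
  }
  return out;
}

const mat = new Float32Array([1, 2, 3, 4, 5, 6]); // 2 x 3
const query = new Float32Array([1, 0, -1]);
const scores = matVecReference(mat, query, 2, 3); // [-2, -2]
```

In the held-matrix pattern above, only the K floats of the query would need to cross the host-device boundary per call; the M × K matrix is reused.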
bun:llm
An in-tree GGUF runtime: file loader, byte-level BPE tokenizer, Llama and Qwen2 forward passes, greedy and
nucleus sampling. Weights are mmap'd off disk; the residual stream and KV cache live on-device.
Per-token traffic across PCIe is a 4-byte argmax. Q4_K and Q6_K matVec kernels use a 1-warp-per-row,
4-warps-per-block layout; QKV and Gate+Up projections are byte-concatenated at load time and dispatched as one
matVec per layer.
import llm from "bun:llm";
using m = await llm.LLM.load("./Llama-3.2-1B-Instruct-Q4_K_M.gguf");
for await (const piece of m.chat([
{ role: "system", content: "You are helpful and concise." },
{ role: "user", content: "What is the capital of France?" },
])) {
process.stdout.write(piece);
}
| Llama-3.2-1B Q4_K_M · RTX 4070 Ti | parabun | ollama |
|---|---|---|
| greedy decode (device-only) | 340 tok/s | ~350 tok/s |
| greedy decode (logits DtoH) | 275 tok/s | — |
| prompt prefill | 295 tok/s | — |
Numbers are within run-to-run noise of ollama on this model and hardware. Chat templates for Llama-3, ChatML,
and Mistral-Instruct are detected from the GGUF's tokenizer.chat_template. Only the CUDA backend
is wired in this module today; Metal kernels are pending.
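The fused QKV dispatch rests on a simple identity: stacking weight matrices row-wise and running one matVec yields the individual outputs back to back. The sketch below shows that in unquantized f32 on the CPU. It is not the Q4_K kernel, and `matVec` and `concatRows` are hypothetical helpers.

```typescript
// Row-major matrix-vector product (CPU, f32), used only to illustrate the identity.
function matVec(w: Float32Array, x: Float32Array, rows: number, cols: number): Float32Array {
  const out = new Float32Array(rows);
  for (let r = 0; r < rows; r++) {
    let acc = 0;
    for (let c = 0; c < cols; c++) acc += w[r * cols + c] * x[c];
    out[r] = acc;
  }
  return out;
}

// Concatenate row-major matrices that share a column count into one taller matrix.
function concatRows(...mats: Float32Array[]): Float32Array {
  const total = mats.reduce((n, m) => n + m.length, 0);
  const fused = new Float32Array(total);
  let offset = 0;
  for (const m of mats) {
    fused.set(m, offset);
    offset += m.length;
  }
  return fused;
}

const cols = 2;
const wq = new Float32Array([1, 0, 0, 1]); // 2 x 2
const wk = new Float32Array([2, 2]); // 1 x 2
const wv = new Float32Array([0, 3]); // 1 x 2
const x = new Float32Array([5, 7]);

const fusedWeights = concatRows(wq, wk, wv); // 4 x 2, built once at load time
const qkv = matVec(fusedWeights, x, 4, cols); // one dispatch instead of three
const q = qkv.subarray(0, 2); // [5, 7]
const k = qkv.subarray(2, 3); // [24]
const v = qkv.subarray(3, 4); // [21]
```

The slices are views into the single output buffer, so splitting the result costs no extra copies.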
Example: LangChain VectorStore
ParabunVectorStore extends VectorStore from @langchain/core and
implements the addVectors and similaritySearchVectorWithScore methods, so call sites
that accept any VectorStore work against it without changes.
// Before: LangChain's in-memory store
import { MemoryVectorStore } from "langchain/vectorstores/memory";
const store = new MemoryVectorStore(emb);
await store.addVectors(vectors, docs);
const hits = await store.similaritySearchVectorWithScore(q, 10);
// After: same call sites, different store
import { ParabunVectorStore } from "./parabun-store.pjs";
const store = new ParabunVectorStore(emb);
await store.addVectors(vectors, docs);
const hits = await store.similaritySearchVectorWithScore(q, 10);
| 100k × 384 f32, top-10 | add_ms | score_ms | vs LangChain |
|---|---|---|---|
| LangChain MemoryVectorStore | 4.0 | 48.2 | 1.00× |
| ParabunVectorStore | 82.7 | 15.9 | 2.83× |
add_ms is higher because rows are packed into a single SharedArrayBuffer-backed Float32Array and
normalized in place. This is one-time O(N·D) work, amortized across subsequent queries. Top-K indices and
scores match LangChain's to four decimal places.
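The add/score trade-off can be sketched in plain TypeScript: once rows are unit-normalized at insert time, cosine similarity at query time is a single dot product per row. `packNormalized` and `topK` are hypothetical names, and this sketch normalizes while packing rather than reproducing ParabunVectorStore's actual code.

```typescript
// Pack rows into one SharedArrayBuffer-backed Float32Array, unit-normalizing each row.
function packNormalized(vectors: number[][], dim: number): Float32Array {
  const packed = new Float32Array(new SharedArrayBuffer(vectors.length * dim * 4));
  vectors.forEach((vec, row) => {
    let norm = 0;
    for (let j = 0; j < dim; j++) norm += vec[j] * vec[j];
    norm = Math.sqrt(norm) || 1; // avoid dividing an all-zero row by zero
    for (let j = 0; j < dim; j++) packed[row * dim + j] = vec[j] / norm;
  });
  return packed;
}

// Score every row against the query and return the k best [rowIndex, cosine] pairs.
function topK(packed: Float32Array, dim: number, query: number[], k: number): Array<[number, number]> {
  const qNorm = Math.sqrt(query.reduce((s, x) => s + x * x, 0)) || 1;
  const scored: Array<[number, number]> = [];
  for (let row = 0; row * dim < packed.length; row++) {
    let dot = 0;
    for (let j = 0; j < dim; j++) dot += packed[row * dim + j] * query[j];
    scored.push([row, dot / qNorm]);
  }
  return scored.sort((a, b) => b[1] - a[1]).slice(0, k);
}

const packed = packNormalized([[3, 4], [1, 0], [0, 2]], 2);
const hits = topK(packed, 2, [0, 1], 2); // row 2 first (cosine 1), then row 0 (cosine about 0.8)
```

A full sort is used here for brevity; a real top-K over 100k rows would more likely keep a bounded heap.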
Language extensions — .pts / .pjs
Files ending in .pts, .ptsx, .pjs, or .pjsx are parsed with
additional desugarings. All output is standard JS; no runtime support is required, and the runtime modules above
do not depend on any of this syntax. GitHub's TextMate grammars do not cover .pts; the
VS Code / Cursor / Kiro extension provides the grammar and an LSP.
pure and memo
A pure function is rejected at parse time if it mutates an outer variable, reads this,
or calls a known-impure global. Prefix pure with memo — or drop
pure entirely and write memo as the declarator — and the result is cached by argument
identity: 0-arg singleton, 1-arg Map, multi-arg nested Map chain. Recursive
self-references route through the outer wrapper, so fib below runs the body 21 times for
fib(20), not 21,891.
// declarator form — `memo` implies pure + function
memo fib(n: number): number {
return n < 2 ? n : fib(n - 1) + fib(n - 2);
}
// arrow form — same thing as an expression prefix
const normalize = memo (s: string) => s.trim().toLowerCase();
// async dedupes concurrent in-flight calls, evicts on reject
memo async fetchProfile(id: string) { return await db.users.get(id); }
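The 21-versus-21,891 claim can be checked with a hand-written equivalent in plain TypeScript. This is a sketch of the 1-arg Map case only. `memo1` and `bodyRuns` are hypothetical names, and the code the parser actually emits may differ.

```typescript
let bodyRuns = 0;

// Hypothetical helper: a 1-arg cache keyed by argument identity in a Map.
function memo1<A, R>(body: (arg: A) => R): (arg: A) => R {
  const cache = new Map<A, R>();
  return (arg: A): R => {
    if (cache.has(arg)) return cache.get(arg) as R;
    const value = body(arg);
    cache.set(arg, value);
    return value;
  };
}

// The recursive calls go through `fib`, the memoized wrapper, not the raw body.
const fib: (n: number) => number = memo1((n: number): number => {
  bodyRuns++;
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
});

const fib20 = fib(20);
console.log(fib20, bodyRuns); // 6765 21
```

The body runs once for each distinct argument from 0 through 20. Without the cache, the same recursion makes 21,891 calls.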
signal, effect, ~>
signal NAME = <rhs> desugars to a Signal binding; bare reads rewrite to
.get(), assignments to .set(). If the RHS references another in-scope signal, the
binding auto-promotes to a read-only derived(). effect { ... } tracks every signal it
reads as a dependency and re-runs on change. A ~> B is a reactive binding: it desugars to
effect(() => { B = A; }), so B stays in step with A and whatever
signals A reads from.
signal count = 0;
signal doubled = count * 2; // auto-derived
effect { console.log(count, doubled); }
count++; // effect re-runs: 1, 2
// bind signal value into a DOM-ish sink — updates track dep changes
count ~> el.innerHTML;
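A minimal sketch of the kind of runtime these desugarings could target, in plain TypeScript. The `Signal`, `derived`, and `effect` shapes here are assumptions for illustration. This `derived` is an uncached getter, and Parabun's emitted code may batch or cache differently.

```typescript
type Runner = () => void;
let activeEffect: Runner | null = null;

class Signal<T> {
  private value: T;
  private subscribers = new Set<Runner>();
  constructor(initial: T) {
    this.value = initial;
  }
  get(): T {
    // Reading inside an effect registers that effect as a dependent.
    if (activeEffect) this.subscribers.add(activeEffect);
    return this.value;
  }
  set(next: T): void {
    if (Object.is(next, this.value)) return;
    this.value = next;
    for (const run of Array.from(this.subscribers)) run();
  }
}

// Read-only derived value: recomputes on read, so reads inside an effect
// subscribe that effect to the underlying signals.
function derived<T>(compute: () => T): { get(): T } {
  return { get: compute };
}

function effect(body: () => void): void {
  const run: Runner = () => {
    const previous = activeEffect;
    activeEffect = run;
    try {
      body();
    } finally {
      activeEffect = previous;
    }
  };
  run();
}

const log: string[] = [];
const count = new Signal(0);
const doubled = derived(() => count.get() * 2); // signal doubled = count * 2
effect(() => {
  log.push(`${count.get()} ${doubled.get()}`); // effect { ... }
});
count.set(count.get() + 1); // count++

const el = { innerHTML: "" };
effect(() => {
  el.innerHTML = String(count.get()); // count ~> el.innerHTML
});
count.set(5);
```

After this runs, `log` holds "0 0", "1 2", "5 10" and `el.innerHTML` is "5".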
|>, ..!, ..&, ..=
x |> f is f(x). pure functions passed through |> get
inlined at parse time — no call overhead. ..! / ..& are .catch /
.finally in suffix position. ..= is = await in a declaration and
disambiguates to an inclusive-range literal otherwise (0..5 excludes 5, 0..=5 includes
it).
pure function sq(x: number) { return x * x; }
const result = 5 |> sq |> sq; // 625 — both calls inlined
const json ..= fetch("/api").then(r => r.json())
..! err => console.error(err) // .catch
..& () => console.log("done"); // .finally
for (const i of 0..=9) emit(i); // [0..9]
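Written out by hand in plain TypeScript, the operators above correspond roughly to the following. `rangeOf` and `loadJson` are hypothetical stand-ins, and the parser's real output (including how it inlines pure functions) is not shown in this document.

```typescript
const sq = (x: number): number => x * x;

// 5 |> sq |> sq: each stage becomes the argument of the next call.
const piped = sq(sq(5)); // 625

// Range literals modeled as a generator: 0..5 stops before 5, 0..=5 includes it.
function* rangeOf(start: number, end: number, inclusive: boolean): Generator<number> {
  const stop = inclusive ? end + 1 : end;
  for (let i = start; i < stop; i++) yield i;
}
const exclusive = Array.from(rangeOf(0, 5, false)); // [0, 1, 2, 3, 4]
const inclusive = Array.from(rangeOf(0, 5, true)); // [0, 1, 2, 3, 4, 5]

// const json ..= source ..! onError ..& onDone
// reads as: await, with .catch and .finally chained on the promise.
async function loadJson(source: Promise<string>): Promise<unknown> {
  const json = await source
    .then((text) => JSON.parse(text))
    .catch((err) => console.error(err))
    .finally(() => console.log("done"));
  return json;
}
```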
defer and arena
defer EXPR schedules EXPR to run when the enclosing block exits (return, throw,
fall-through). Multiple defers dispose in LIFO order. defer await EXPR inside an async function
awaits the cleanup. arena { ... } runs the block with the GC paused, then frees everything
allocated inside on exit — useful for tight numeric loops with short-lived intermediate allocations.
function readConfig(path: string) {
const fd = fs.openSync(path);
defer fs.closeSync(fd); // runs on every exit path
return JSON.parse(fs.readFileSync(fd, "utf8"));
}
arena {
const buf = new Float32Array(1_000_000);
// ...numeric work...
} // buf freed here, no GC pressure
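The defer semantics (run on every exit path, LIFO order) can be approximated in plain TypeScript with a cleanup stack and try/finally. `withDefers` is a hypothetical helper for illustration, not what the parser emits. It covers synchronous cleanups only and does not model arena.

```typescript
// Run `body`, then run every registered cleanup in reverse order,
// whether the body returned or threw.
function withDefers<T>(body: (defer: (cleanup: () => void) => void) => T): T {
  const cleanups: Array<() => void> = [];
  try {
    return body((cleanup) => {
      cleanups.push(cleanup);
    });
  } finally {
    while (cleanups.length > 0) cleanups.pop()!();
  }
}

const order: string[] = [];

const answer = withDefers((defer) => {
  defer(() => order.push("first registered"));
  defer(() => order.push("second registered"));
  order.push("body");
  return 42;
});

let caught = "";
try {
  withDefers((defer) => {
    defer(() => order.push("cleanup after throw"));
    throw new Error("boom");
  });
} catch (err) {
  caught = (err as Error).message;
}
```

After this runs, `order` is "body", "second registered", "first registered", "cleanup after throw", and the error still reaches the caller.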
The full grammar is in LLMs.md. The LSP carries arity-based "could be memo" / "memo probably not worth it" hints plus full purity diagnostics.
Roadmap
Parabun aims to relieve typical JS performance bottlenecks through multithreading and GPU compute. The modules above are the foundation; the next set targets the most common "I have to shell out / use Python / write native code" pain points in JS-land.
Each module ships behind a compile-time feature flag. Production builds slim down to only what your app imports: heavy codecs (ffmpeg, Apache Arrow) are opt-in, and server / Lambda / Workers builds stay minimal.
| Status | Module | What it does |
|---|---|---|
| in flight | bun:image | JPEG / PNG / WebP / AVIF decode + GPU-accelerated resize. Sharp-class but bundleable. |
| next | bun:gpu kernels | 2D convolution, FFT, sort, scan, histogram. Composes into image / audio / video / data work. |
| next | bun:csv + bun:arrow | Parallel CSV parse + columnar (Parquet / Arrow IPC) with SIMD column ops. The "5 GB CSV" story. |
| next | bun:parallel v2 | Closure-aware persistent worker pool + SharedArrayBuffer channels. Lifts today's pmap ceiling. |
| planned | bun:audio | FFT, spectrograms, IIR / FIR filters, resampling. Currently no good options in JS. |
| planned | bun:video | ffmpeg-class transcode / thumbnail / concat as a runtime module. No more which ffmpeg. |
| planned | bun:camera | Live video capture (V4L2 / AVFoundation / Media Foundation). Makes Parabun a real embedded runtime. |
bun:llm serves as a proof of concept for the stack: it is built on bun:gpu +
bun:simd + bun:parallel. Parabun is positioned as a perf runtime, not an AI runtime.