Parabun

A fork of Bun with four additional runtime modules — a worker pool, SIMD primitives for typed arrays, a compute-only GPU surface, and a GGUF/Llama runtime — and parse-time extensions for purity, memoization, and reactive bindings in .pts / .pjs files. Regular .ts and .js files behave the same as in upstream Bun.

$ curl -fsSL https://raw.githubusercontent.com/airgap/parabun/main/install.sh | bash

Supported on Linux and macOS; a Windows build is in progress. Running parabun self-update refreshes an existing install along with the VS Code extension.

$ curl -fsSL https://raw.githubusercontent.com/airgap/parabun/main/install-extension.sh | bash

Installs the VS Code extension into any of code, cursor, or kiro found on $PATH. The extension provides the .pts / .pjs TextMate grammar and an LSP with hover, go-to-definition, purity diagnostics, memo hints, and operator documentation.

Runtime modules

bun:parallel

pmap and preduce chunk arrays across a persistent worker pool. Functions are serialized via fn.toString(), so they must be pure — no closures, no outer references. TypedArrays are passed through a SharedArrayBuffer, so postMessage transfers a handle rather than a copy.

parabun
import { pmap } from "bun:parallel";

pure function score(row) { return row.reduce((a, b) => a + b * b, 0); }

const rows = new Float32Array(new SharedArrayBuffer(1_000_000 * 4));
// ...fill rows...
const scores = await pmap(score, rows, { concurrency: 8 });
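
To see why serialization through fn.toString() rules out closures, the sketch below rebuilds a function from its source text, roughly as a worker receiving the string would have to. It is plain TypeScript and not Parabun's worker-pool code; roundTrip and makeScaler are hypothetical names used only for illustration.

```typescript
// Rebuild a function from its source text, as a worker receiving the
// string would have to. Hypothetical helper, not Parabun's pool code.
function roundTrip<A, R>(fn: (arg: A) => R): (arg: A) => R {
  return new Function(`return (${fn.toString()});`)() as (arg: A) => R;
}

// Self-contained: everything it needs arrives through its parameter.
const square = (x: number): number => x * x;

// Closes over `scale`, which lives only in the scope that created it.
function makeScaler(scale: number): (x: number) => number {
  return (x) => x * scale;
}

const squareCopy = roundTrip(square);
const scalerCopy = roundTrip(makeScaler(3));

const squared = squareCopy(4); // the source text alone is enough

let failure = "none";
try {
  scalerCopy(4);
} catch (err) {
  // `scale` did not travel with the source text.
  failure = err instanceof Error ? err.name : String(err);
}
console.log(squared, failure); // 16 ReferenceError
```

The rebuilt copy of square works because its text is the whole function. The copy of the scaler fails at call time because the captured variable exists only in the original scope.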

bun:simd

WebAssembly v128 kernels for Float32Array (f32x4) and Float64Array (f64x2). Inputs above 4 MiB are processed in place rather than copied into WASM memory. alloc() returns a typed array backed by the WASM linear memory for zero-copy use.

typescript
import { mulScalar, add, dot, sum } from "bun:simd";

const y = mulScalar(new Float32Array([1, 2, 3, 4]), 3); // [3, 6, 9, 12]
const z = add(a, b);
const d = dot(u, v);
const s = sum(a);

| op (N=100k, f32) | .map / .reduce | tight loop | bun:simd |
| --- | --- | --- | --- |
| mulScalar(a, 3) | 808 µs | 60 µs | 30 µs |
| add(a, b) | 884 µs | 73 µs | 40 µs |
| sum(a) | 574 µs | 43 µs | 17 µs |
| dot(a, b) | 716 µs | 51 µs | 24 µs |
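
The benchmark code behind these timings is not shown here. As an assumption about what the two baseline columns measure, a ".map / .reduce" version and a "tight loop" version of the same operations might look like the sketch below (all function names are illustrative). A SIMD kernel computes the same quantities, up to floating-point rounding, several lanes at a time.

```typescript
// Callback-based baseline: one closure call per element.
function sumReduce(a: Float32Array): number {
  return a.reduce((acc, x) => acc + x, 0);
}

// Tight-loop baseline: plain indexed loop, no callbacks.
function sumLoop(a: Float32Array): number {
  let acc = 0;
  for (let i = 0; i < a.length; i++) acc += a[i];
  return acc;
}

// Dot product as a tight loop over two equal-length arrays.
function dotLoop(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new RangeError("length mismatch");
  let acc = 0;
  for (let i = 0; i < a.length; i++) acc += a[i] * b[i];
  return acc;
}

// Scalar multiply via .map, which allocates a new Float32Array.
function mulScalarMap(a: Float32Array, k: number): Float32Array {
  return a.map((x) => x * k);
}

const left = new Float32Array([1, 2, 3, 4]);
const right = new Float32Array([5, 6, 7, 8]);
console.log(sumReduce(left), sumLoop(left), dotLoop(left, right)); // 10 10 70
console.log(Array.from(mulScalarMap(left, 3)));                    // [3, 6, 9, 12]
```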

bun:gpu

Metal on macOS, CUDA on Linux and Windows, CPU fallback on hosts without a GPU. A matrix passed to gpu.hold() stays resident across matVec calls, so only the input vector crosses the host↔device boundary per call. Pure Float32Array → Float32Array functions are runtime-compiled to PTX (via NVRTC) or MSL (via newLibraryWithSource:) when the body fits a supported shape: arithmetic, ternary, Math.*.

typescript
import gpu from "bun:gpu";

const mat = gpu.alloc(M * K, "f32");
// ...fill mat...
const held = gpu.hold(mat);                   // uploaded once
for (const q of queries) {
  const scores = gpu.matVec(held, q, M, K); // no copy
}
gpu.release(held);
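
As a reference for what a call like gpu.matVec(held, q, M, K) computes, here is a CPU sketch in plain TypeScript. It assumes the matrix is stored row-major as M rows of K floats, which the text above does not state; matVecCpu is a hypothetical name and not part of bun:gpu.

```typescript
// CPU reference: out[i] = sum over k of mat[i * K + k] * vec[k].
// Assumes row-major storage (M rows of K floats each).
function matVecCpu(mat: Float32Array, vec: Float32Array, M: number, K: number): Float32Array {
  if (mat.length !== M * K || vec.length !== K) {
    throw new RangeError("shape mismatch");
  }
  const out = new Float32Array(M);
  for (let i = 0; i < M; i++) {
    let acc = 0;
    const row = i * K;
    for (let k = 0; k < K; k++) acc += mat[row + k] * vec[k];
    out[i] = acc;
  }
  return out;
}

// 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] times the vector [1, 0, -1].
const mat2x3 = new Float32Array([1, 2, 3, 4, 5, 6]);
const query = new Float32Array([1, 0, -1]);
const rowScores = matVecCpu(mat2x3, query, 2, 3);
console.log(Array.from(rowScores)); // [-2, -2]
```

The matrix is the large operand here (M·K floats against K for the vector), which is why keeping it resident on the device and sending only the vector per call pays off.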

bun:llm

An in-tree GGUF runtime: file loader, byte-level BPE tokenizer, Llama and Qwen2 forward passes, greedy and nucleus sampling. Weights are mmap'd off disk; the residual stream and KV cache live on-device. Per-token traffic across PCIe is a 4-byte argmax. Q4_K and Q6_K matVec kernels use a 1-warp-per-row, 4-warps-per-block layout; QKV and Gate+Up projections are byte-concatenated at load time and dispatched as one matVec per layer.

typescript
import llm from "bun:llm";

using m = await llm.LLM.load("./Llama-3.2-1B-Instruct-Q4_K_M.gguf");

for await (const piece of m.chat([
  { role: "system", content: "You are helpful and concise." },
  { role: "user", content: "What is the capital of France?" },
])) {
  process.stdout.write(piece);
}

| Llama-3.2-1B Q4_K_M · RTX 4070 Ti | parabun | ollama |
| --- | --- | --- |
| greedy decode (device-only) | 340 tok/s | ~350 tok/s |
| greedy decode (logits DtoH) | 275 tok/s | |
| prompt prefill | 295 tok/s | |

The device-only decode figure is within run-to-run noise of ollama on this model and hardware. Chat templates for Llama-3, ChatML, and Mistral-Instruct are detected from the GGUF's tokenizer.chat_template. Only the CUDA backend is wired in this module today; Metal kernels are pending.
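
Greedy and nucleus (top-p) sampling are standard decoding strategies; the sketch below shows both over a logits array on the CPU. It is illustrative only: the sampler API of bun:llm is not shown in this document, so argmax, sampleNucleus, and the rand parameter are assumed names.

```typescript
// Greedy decoding: index of the largest logit.
function argmax(logits: Float32Array): number {
  if (logits.length === 0) throw new RangeError("empty logits");
  let best = 0;
  for (let i = 1; i < logits.length; i++) {
    if (logits[i] > logits[best]) best = i;
  }
  return best;
}

// Nucleus sampling: sample from the smallest set of tokens whose
// cumulative probability reaches topP. `rand` is injectable for testing.
function sampleNucleus(
  logits: Float32Array,
  topP: number,
  rand: () => number = Math.random,
): number {
  // Softmax numerators, with the max subtracted for numerical stability.
  const max = logits[argmax(logits)];
  const weights = Array.from(logits, (l) => Math.exp(l - max));
  const total = weights.reduce((a, b) => a + b, 0);

  // Token ids sorted by descending probability.
  const order = weights.map((_, i) => i).sort((a, b) => weights[b] - weights[a]);

  // Keep the smallest prefix whose probability mass reaches topP.
  const kept: number[] = [];
  let mass = 0;
  for (const id of order) {
    kept.push(id);
    mass += weights[id] / total;
    if (mass >= topP) break;
  }

  // Draw from the kept tokens, renormalized to the kept mass.
  let r = rand() * mass;
  for (const id of kept) {
    r -= weights[id] / total;
    if (r <= 0) return id;
  }
  return kept[kept.length - 1];
}

const logits = new Float32Array([0, 2, 1]);
console.log(argmax(logits));             // 1
console.log(sampleNucleus(logits, 0.5)); // 1: token 1 alone holds about 0.67 of the mass
```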

Example: LangChain VectorStore

ParabunVectorStore extends VectorStore from @langchain/core and implements the addVectors and similaritySearchVectorWithScore methods, so call sites that accept any VectorStore work against it without changes.

before
import { MemoryVectorStore } from "langchain/vectorstores/memory";

const store = new MemoryVectorStore(emb);
await store.addVectors(vectors, docs);
const hits = await store.similaritySearchVectorWithScore(q, 10);

after
import { ParabunVectorStore } from "./parabun-store.pjs";

const store = new ParabunVectorStore(emb);
await store.addVectors(vectors, docs);
const hits = await store.similaritySearchVectorWithScore(q, 10);

| 100k × 384 f32, top-10 | add_ms | score_ms | vs LangChain |
| --- | --- | --- | --- |
| LangChain MemoryVectorStore | 4.0 | 48.2 | 1.00× |
| ParabunVectorStore | 82.7 | 15.9 | 2.83× |

add_ms is higher because rows are packed into a single SharedArrayBuffer-backed Float32Array and normalized in place: one-time O(N·D) work amortized across subsequent queries. Top-K indices and scores match LangChain's to four decimal places.
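
The packing and normalization step can be sketched as follows. This is an illustrative reconstruction, not the code in parabun-store.pjs: packNormalized and topK are hypothetical names, and scoring uses a plain loop where the real store would presumably dispatch to the runtime modules.

```typescript
// Pack N rows of dimension D into one SharedArrayBuffer-backed Float32Array
// and scale each row to unit length, so cosine similarity becomes a dot product.
function packNormalized(rows: number[][], D: number): Float32Array {
  const packed = new Float32Array(new SharedArrayBuffer(rows.length * D * 4));
  rows.forEach((row, i) => {
    packed.set(row, i * D);
    let norm = 0;
    for (let k = 0; k < D; k++) norm += packed[i * D + k] ** 2;
    norm = Math.sqrt(norm) || 1; // leave all-zero rows untouched
    for (let k = 0; k < D; k++) packed[i * D + k] /= norm;
  });
  return packed;
}

// Score a query against every packed row; return the k best [index, score] pairs.
function topK(packed: Float32Array, D: number, q: number[], k: number): [number, number][] {
  const qNorm = Math.hypot(...q) || 1;
  const scored: [number, number][] = [];
  for (let i = 0; i < packed.length / D; i++) {
    let dot = 0;
    for (let j = 0; j < D; j++) dot += packed[i * D + j] * q[j];
    scored.push([i, dot / qNorm]);
  }
  return scored.sort((x, y) => y[1] - x[1]).slice(0, k);
}

const packedRows = packNormalized([[3, 4], [1, 0], [0, 2]], 2);
const best = topK(packedRows, 2, [0, 1], 2);
console.log(best.map(([i]) => i)); // [2, 0]
```

Normalizing once at insert time means each query costs one dot product per row, with no per-row norm computation.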

Language extensions — .pts / .pjs

Files ending in .pts, .ptsx, .pjs, or .pjsx are parsed with additional desugarings. All output is standard JS; no runtime support is required, and the runtime modules above do not depend on any of this syntax. GitHub's TextMate grammars do not cover .pts; the VS Code / Cursor / Kiro extension provides the grammar and an LSP.

pure and memo

A pure function is rejected at parse time if it mutates an outer variable, reads this, or calls a known-impure global. Prefix pure with memo — or drop pure entirely and write memo as the declarator — and the result is cached by argument identity: 0-arg singleton, 1-arg Map, multi-arg nested Map chain. Recursive self-references route through the outer wrapper, so fib below runs the body 21 times for fib(20), not 21,891.

parabun
// declarator form — `memo` implies pure + function
memo fib(n: number): number {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// arrow form — same thing as an expression prefix
const normalize = memo (s: string) => s.trim().toLowerCase();

// async dedupes concurrent in-flight calls, evicts on reject
memo async fetchProfile(id: string) { return await db.users.get(id); }
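
One way to picture the one-argument case is a Map-backed wrapper whose recursive calls go back through the wrapper. The sketch below is hand-written TypeScript, not the parser's actual output; memo1, memoAsync1, bodyRuns, and loads are hypothetical names.

```typescript
// One-argument memo: results cached in a Map keyed by argument identity.
function memo1<A, R>(body: (arg: A) => R): (arg: A) => R {
  const cache = new Map<A, R>();
  return (arg) => {
    if (cache.has(arg)) return cache.get(arg) as R;
    const result = body(arg);
    cache.set(arg, result);
    return result;
  };
}

let bodyRuns = 0;
// The recursive references name the wrapper, so inner calls hit the cache.
const fib: (n: number) => number = memo1((n: number): number => {
  bodyRuns++;
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
});

console.log(fib(20), bodyRuns); // 6765 21

// Async variant: cache the promise so concurrent callers share one
// in-flight call, and evict the entry if that call rejects.
function memoAsync1<A, R>(body: (arg: A) => Promise<R>): (arg: A) => Promise<R> {
  const cache = new Map<A, Promise<R>>();
  return (arg) => {
    const hit = cache.get(arg);
    if (hit) return hit;
    const pending = body(arg);
    cache.set(arg, pending);
    pending.catch(() => cache.delete(arg));
    return pending;
  };
}

let loads = 0;
const fetchProfile = memoAsync1(async (id: string) => {
  loads++;
  return { id };
});
const first = fetchProfile("u1");
const second = fetchProfile("u1");
console.log(first === second, loads); // true 1
```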

signal, effect, ~>

signal NAME = <rhs> desugars to a Signal binding; bare reads rewrite to .get(), assignments to .set(). If the RHS references another in-scope signal, the binding auto-promotes to a read-only derived(). effect { ... } tracks every signal it reads as a dependency and re-runs on change. A ~> B is a reactive binding: it desugars to effect(() => { B = A; }), so B stays in step with A and whatever signals A reads from.

parabun
signal count = 0;
signal doubled = count * 2;   // auto-derived

effect { console.log(count, doubled); }

count++;                           // effect re-runs: 1, 2

// bind signal value into a DOM-ish sink — updates track dep changes
count ~> el.innerHTML;
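
To make the desugaring concrete, here is a minimal signal runtime with the same surface (.get(), .set(), derived(), effect()), followed by a hand expansion of the snippet above. It is a sketch under assumptions: Parabun's actual runtime is not shown in this document, the derived() here recomputes on every read instead of caching, and el is a plain object standing in for a DOM node.

```typescript
type Runner = () => void;
let activeRunner: Runner | null = null;

class Signal<T> {
  private value: T;
  private readonly subscribers = new Set<Runner>();

  constructor(initial: T) {
    this.value = initial;
  }

  get(): T {
    // Reading inside an effect records that effect as a dependent.
    if (activeRunner) this.subscribers.add(activeRunner);
    return this.value;
  }

  set(next: T): void {
    if (Object.is(next, this.value)) return;
    this.value = next;
    for (const run of [...this.subscribers]) run();
  }
}

// Run `body` now, and again whenever a signal it read changes.
function effect(body: () => void): void {
  const run: Runner = () => {
    const previous = activeRunner;
    activeRunner = run;
    try {
      body();
    } finally {
      activeRunner = previous;
    }
  };
  run();
}

// Read-only derived value, recomputed on each read (no caching in this sketch).
function derived<T>(compute: () => T): { get(): T } {
  return { get: compute };
}

// signal count = 0; signal doubled = count * 2;
const count = new Signal(0);
const doubled = derived(() => count.get() * 2);

// effect { console.log(count, doubled); }
const seen: string[] = [];
effect(() => { seen.push(`${count.get()}, ${doubled.get()}`); });

// count ~> el.innerHTML;
const el = { innerHTML: "" };
effect(() => { el.innerHTML = String(count.get()); });

count.set(count.get() + 1); // count++
console.log(seen, el.innerHTML); // ["0, 0", "1, 2"] "1"
```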

|>, ..!, ..&, ..=

x |> f is f(x). pure functions passed through |> get inlined at parse time — no call overhead. ..! / ..& are .catch / .finally in suffix position. ..= is = await in a declaration and disambiguates to an inclusive-range literal otherwise (0..5 excludes 5, 0..=5 includes it).

parabun
pure function sq(x: number) { return x * x; }

const result = 5 |> sq |> sq;   // 625 — both calls inlined

const json ..= fetch("/api").then(r => r.json())
  ..! err => console.error(err)      // .catch
  ..& () => console.log("done"); // .finally

for (const i of 0..=9) emit(i);                    // [0..9]
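
Read as plain TypeScript, the operators above correspond roughly to the forms below. This is a hand expansion that assumes ..! and ..& attach to the promise before it is awaited; rangeInclusive and loadJson are hypothetical helpers, and the range literal's actual lowering is not described in this document.

```typescript
function sq(x: number): number {
  return x * x;
}

// 5 |> sq |> sq
const result = sq(sq(5)); // 625

// const json ..= p ..! onError ..& onDone
async function loadJson(p: Promise<unknown>, events: string[]): Promise<unknown> {
  const json = await p
    .catch((err) => { events.push(`error: ${String(err)}`); }) // ..!
    .finally(() => { events.push("done"); });                  // ..&
  return json;
}

// 0..=9 as an iterable that includes its upper bound.
function* rangeInclusive(start: number, end: number): Generator<number> {
  for (let i = start; i <= end; i++) yield i;
}

const emitted: number[] = [];
for (const i of rangeInclusive(0, 9)) emitted.push(i);
console.log(result, emitted.length); // 625 10

const events: string[] = [];
loadJson(Promise.resolve({ ok: true }), events).then((json) => {
  console.log(json, events);
});
```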

defer and arena

defer EXPR schedules EXPR to run when the enclosing block exits (return, throw, fall-through). Multiple defers dispose in LIFO order. defer await EXPR inside an async function awaits the cleanup. arena { ... } runs the block with the GC paused, then frees everything allocated inside on exit — useful for tight numeric loops with short-lived intermediate allocations.

parabun
import fs from "node:fs";

function readConfig(path: string) {
  const fd = fs.openSync(path);
  defer fs.closeSync(fd);              // runs on every exit path
  return JSON.parse(fs.readFileSync(fd, "utf8"));
}

arena {
  const buf = new Float32Array(1_000_000);
  // ...numeric work...
}                                        // buf freed here, no GC pressure
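
The exit-path and LIFO guarantees of defer match what nested try/finally blocks give in plain TypeScript. The following is a hand-written equivalent for two defers, not the parser's actual output; the trace array and the resource names are made up for illustration.

```typescript
const trace: string[] = [];

// Equivalent of:
//   acquire a; defer release a;
//   acquire b; defer release b;
//   ...body...
function work(fail: boolean): string {
  trace.push("acquire a");
  try {
    trace.push("acquire b");
    try {
      if (fail) throw new Error("boom");
      return "ok";
    } finally {
      trace.push("release b"); // registered last, runs first
    }
  } finally {
    trace.push("release a"); // runs on return and on throw alike
  }
}

const outcome = work(false);
try {
  work(true);
} catch {
  trace.push("caught");
}
console.log(outcome, trace);
```

Both calls release b before a, and the throwing call still runs both cleanups before the error reaches the caller.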

The full grammar is in LLMs.md. The LSP adds arity-based "could be memo" / "memo probably not worth it" hints on top of full purity diagnostics.

Roadmap

Parabun aims to open up typical JS performance bottlenecks through multithreading and the GPU. The modules above are the foundation; the next set attacks the most common "I have to shell out / use Python / write native code" pain points in JS-land.

Each module ships behind a compile-time feature flag. Production builds slim down to only what your app imports: heavy codecs (ffmpeg, Apache Arrow) are opt-in, and server / Lambda / Workers builds stay minimal.

| Status | Module | What it does |
| --- | --- | --- |
| in flight | bun:image | JPEG / PNG / WebP / AVIF decode + GPU-accelerated resize. Sharp-class but bundleable. |
| next | bun:gpu kernels | 2D convolution, FFT, sort, scan, histogram. Composes into image / audio / video / data work. |
| next | bun:csv + bun:arrow | Parallel CSV parse + columnar (Parquet / Arrow IPC) with SIMD column ops. The "5 GB CSV" story. |
| next | bun:parallel v2 | Closure-aware persistent worker pool + SharedArrayBuffer channels. Lifts today's pmap ceiling. |
| planned | bun:audio | FFT, spectrograms, IIR / FIR filters, resampling. Currently no good options in JS. |
| planned | bun:video | ffmpeg-class transcode / thumbnail / concat as a runtime module. No more which ffmpeg. |
| planned | bun:camera | Live video capture (V4L2 / AVFoundation / Media Foundation). Makes Parabun a real embedded runtime. |

bun:llm serves as a proof of concept for the stack: it is built on bun:gpu + bun:simd + bun:parallel. Parabun is positioned as a perf runtime, not an AI runtime.

Scope. The added modules target typed-array numeric work, embarrassingly-parallel loops, and GPU-friendly matrix shapes. HTTP handlers, JSON parsing, and ordinary application code go through the same paths as upstream Bun — no changes in performance or behavior are expected there.