bun:vision

Frame stream conversion and motion detection. The detector and OCR engines are stubs until an ONNX runtime is vendored.

```ts
import vision from "bun:vision";
```

A Tier 2 wrapper that turns any camera frame iterator into packed RGBA8 frames, plus a frame-diff motion estimator. The detector and OCR engines are typed but stubbed: they need a vendored ONNX runtime before they can do anything.

frames(stream, opts?)

Takes a RawFrame iterator (e.g. from bun:camera) and yields { width, height, data: Uint8Array } packed-RGBA8 frames.

```ts
import camera from "bun:camera";
import image from "bun:image";
import vision from "bun:vision";

const cam = await camera.open({ device: "/dev/video0", width: 1280, height: 720, fps: 30 });
for await (const frame of vision.frames(cam.frames(), { decodeMjpg: image.decode })) {
  // frame.data is RGBA8: feed it to image, detector, recorder, anything
}
```

Supported pixel formats:

| Format | Conversion |
| --- | --- |
| yuyv | YUV 4:2:2 → RGBA |
| nv12 | YUV 4:2:0 → RGBA |
| rgb24 | RGB → RGBA (alpha=255) |
| rgba | passthrough |
| mjpg | passes through decodeMjpg(frame.data). Required: the caller passes image.decode from bun:image (cross-builtin imports between bun: modules aren't supported, so the dependency is injected here). |
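
To make the yuyv row concrete, here is a minimal sketch of a YUV 4:2:2 to RGBA conversion. It is not the bun:vision implementation: the function names `yuyvToRgba` and `clamp8` are hypothetical, and the BT.601 limited-range coefficients are an assumption, since this page does not say which color matrix the module uses.

```ts
// Clamp a floating-point channel value into the 0..255 byte range.
function clamp8(v: number): number {
  return v < 0 ? 0 : v > 255 ? 255 : Math.round(v);
}

// Convert a packed YUYV (YUV 4:2:2) buffer to packed RGBA8.
// Assumption: BT.601 limited-range coefficients; the real module may differ.
function yuyvToRgba(src: Uint8Array, width: number, height: number): Uint8Array {
  if (width % 2 !== 0) throw new Error("YUYV requires an even width");
  if (src.length < width * height * 2) throw new Error("YUYV buffer too short");

  const dst = new Uint8Array(width * height * 4);
  // Each 4-byte group [Y0, U, Y1, V] encodes two horizontally adjacent
  // pixels that share one chroma sample.
  for (let i = 0, o = 0; i < width * height * 2; i += 4, o += 8) {
    const u = src[i + 1] - 128;
    const v = src[i + 3] - 128;
    for (let p = 0; p < 2; p++) {
      const y = 1.164 * (src[i + p * 2] - 16);
      dst[o + p * 4 + 0] = clamp8(y + 1.596 * v);             // R
      dst[o + p * 4 + 1] = clamp8(y - 0.392 * u - 0.813 * v); // G
      dst[o + p * 4 + 2] = clamp8(y + 2.017 * u);             // B
      dst[o + p * 4 + 3] = 255;                               // A, always opaque
    }
  }
  return dst;
}
```

The output buffer is twice the size of the input (4 bytes per pixel instead of 2), which is worth keeping in mind when sizing buffers for high-resolution streams.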

detectMotion(stream, opts?)

Frame-diff motion estimator. Downsamples to a luma image (configurable scale), diffs against the previous frame, applies temporal smoothing, and yields { frame, motion: number } where motion is the fraction of pixels that changed beyond a threshold.

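```ts
// `cam` is the camera handle opened in the frames() example above.
// No decodeMjpg is passed, so this assumes the camera delivers a non-MJPG format.
// `saveFrame` stands in for a caller-supplied function.
for await (const { frame, motion } of vision.detectMotion(vision.frames(cam.frames()), {
  threshold: 30,
  smoothing: 0.6,
  scale: 4,
})) {
  if (motion > 0.05) saveFrame(frame);
}
```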

| Option | Default | Description |
| --- | --- | --- |
| threshold | 30 | Per-pixel luma delta below which a pixel is considered unchanged. |
| smoothing | 0.5 | EMA factor on the motion signal. 0 = raw, 1 = frozen. |
| scale | 4 | Downsample factor for the luma image. Higher = cheaper + less sensitive to fine motion. |
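
The pipeline described above (downsample to luma, diff against the previous frame, smooth) can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `toLuma` and `createMotionEstimator` are hypothetical names, the downsample is a plain nearest-neighbour pick, the luma weights are an integer BT.601 approximation, and a delta exactly equal to the threshold is counted as changed.

```ts
interface Frame { width: number; height: number; data: Uint8Array }

// Downsample packed RGBA8 to a luma image by sampling every `scale`-th pixel.
// Assumption: nearest-neighbour sampling; the real module may average instead.
function toLuma(frame: Frame, scale: number): Uint8Array {
  const w = Math.max(1, Math.floor(frame.width / scale));
  const h = Math.max(1, Math.floor(frame.height / scale));
  const out = new Uint8Array(w * h);
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      const i = (y * scale * frame.width + x * scale) * 4;
      // Integer luma approximation: weights 77 + 150 + 29 sum to 256.
      out[y * w + x] =
        (77 * frame.data[i] + 150 * frame.data[i + 1] + 29 * frame.data[i + 2]) >> 8;
    }
  }
  return out;
}

// Returns a stateful function mapping each new frame to a smoothed motion
// value in 0..1. Defaults mirror the option table above.
function createMotionEstimator(
  opts: { threshold?: number; smoothing?: number; scale?: number } = {},
): (frame: Frame) => number {
  const { threshold = 30, smoothing = 0.5, scale = 4 } = opts;
  let prev: Uint8Array | null = null;
  let ema = 0;
  return (frame: Frame): number => {
    const luma = toLuma(frame, scale);
    let raw = 0; // the first frame has nothing to diff against
    if (prev !== null && prev.length === luma.length) {
      let changed = 0;
      for (let i = 0; i < luma.length; i++) {
        if (Math.abs(luma[i] - prev[i]) >= threshold) changed++;
      }
      raw = changed / luma.length;
    }
    prev = luma;
    // Exponential moving average: smoothing 0 gives the raw signal,
    // smoothing 1 never moves from its initial value.
    ema = smoothing * ema + (1 - smoothing) * raw;
    return ema;
  };
}
```

With this formulation a single large change decays geometrically over the following frames, so a higher `smoothing` value trades responsiveness for fewer one-frame false positives.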

detect(frame, opts) — stub

Object detection — YOLO / SSD / RT-DETR. Throws:

bun:vision.detect: object-detection engines (YOLO / SSD / RT-DETR) require ONNX runtime as a vendored dep — not yet wired. Tracked in the roadmap as bun:vision (Tier 2).

Once ONNX is vendored, callers pass an ONNX model path and a label set; the function returns { boxes: [{x, y, w, h}], scores: number[], labels: string[] }.
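
Since detect is not implemented yet, the following only sketches how a caller might consume the planned return value. The `Box` and `DetectResult` type names and the `topDetections` helper are hypothetical, and the code assumes that `boxes`, `scores`, and `labels` are parallel arrays (index i in each describes the same detection), which this page implies but does not state.

```ts
interface Box { x: number; y: number; w: number; h: number }
interface DetectResult { boxes: Box[]; scores: number[]; labels: string[] }

// Zip the three arrays into per-detection records, keep those at or above
// `minScore`, and sort them from most to least confident.
// Assumption: the arrays are parallel and of equal length.
function topDetections(result: DetectResult, minScore: number) {
  const n = result.boxes.length;
  if (result.scores.length !== n || result.labels.length !== n) {
    throw new Error("boxes, scores, and labels must have the same length");
  }
  return result.boxes
    .map((box, i) => ({ box, score: result.scores[i], label: result.labels[i] }))
    .filter((d) => d.score >= minScore)
    .sort((a, b) => b.score - a.score);
}
```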

recognize(frame, opts) — stub

OCR (Tesseract / EasyOCR). Behaves like detect: it throws with a documented message until the engine is wired. When implemented, it returns { text, words: [{ text, bbox, confidence }] }.
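
Because both stubs throw today, code that wants to run now and pick up the engines later can wrap the call. The helper below is a hypothetical sketch, not part of bun:vision. This page does not say whether the stubs throw synchronously or return a rejected promise, so the wrapper handles both.

```ts
type Attempt<T> = { ok: true; value: T } | { ok: false; reason: string };

// Run `call` and convert either a synchronous throw or a rejected promise
// into a plain result object, so the caller can branch without try/catch.
async function attempt<T>(call: () => T | Promise<T>): Promise<Attempt<T>> {
  try {
    return { ok: true, value: await call() };
  } catch (err) {
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  }
}
```

A caller would pass something like `() => vision.recognize(frame, opts)` and fall back to another path when `ok` is false; the `reason` field then carries the stub's documented message.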

Composing

The frames that vision.frames yields have the same shape that detectMotion, detect, and recognize consume. Anything that yields packed-RGBA8 frames fits: file readers, RTSP unwrappers, GStreamer bridges. The cross-module dependency injection (decodeMjpg) extends to detectors and OCR engines too: when those land, callers pass an engine handle in rather than the module reaching for it.
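
As an illustration of "anything that yields packed-RGBA8 fits", here is a sketch of a custom frame source that never touches a camera. All names (`Frame`, `solidFrame`, `isPackedRgba8`, `syntheticFrames`) are hypothetical, and the sketch assumes consumers accept any async iterable of { width, height, data } objects, as the paragraph above suggests.

```ts
interface Frame { width: number; height: number; data: Uint8Array }

// Build a packed-RGBA8 frame filled with a single colour.
function solidFrame(
  width: number,
  height: number,
  rgba: [number, number, number, number],
): Frame {
  const data = new Uint8Array(width * height * 4);
  for (let i = 0; i < data.length; i += 4) data.set(rgba, i);
  return { width, height, data };
}

// Check the one invariant every consumer relies on: 4 bytes per pixel, tightly packed.
function isPackedRgba8(frame: Frame): boolean {
  return frame.data.length === frame.width * frame.height * 4;
}

// A synthetic source: `count` grey frames ramping from black to white.
async function* syntheticFrames(
  count: number,
  width: number,
  height: number,
): AsyncGenerator<Frame> {
  for (let i = 0; i < count; i++) {
    const level = count > 1 ? Math.round((i * 255) / (count - 1)) : 0;
    yield solidFrame(width, height, [level, level, level, 255]);
  }
}
```

Under that assumption, `syntheticFrames(10, 64, 48)` could stand in for `vision.frames(cam.frames())` when testing a motion pipeline without hardware.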

Limits