
# bun:csv

Streaming RFC 4180 CSV parser. Async generator, full quote / escape handling, optional parallel mode.

```ts
import csv from "bun:csv";
```

The module has a single export, `parseCsv(input, opts?)`, which returns an async iterable of rows. The parser is a state machine over UTF-8 bytes; it never materializes the full file in memory, regardless of size.
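
For intuition, here is a toy version of such a state machine. This is an illustration over a single in-memory chunk, not parabun's native streaming parser (which carries state across chunks); three states cover unquoted cells, quoted cells, and the doubled-quote escape:

```ts
// Illustration only: a minimal RFC 4180 state machine over bytes. All
// structural bytes (',', '"', '\n', '\r') are ASCII, so multi-byte UTF-8
// payload passes through untouched.
type State = "field" | "quoted" | "quoteInQuoted";

function* parseBytes(bytes: Uint8Array): Generator<string[]> {
  const COMMA = 0x2c, QUOTE = 0x22, LF = 0x0a, CR = 0x0d;
  const dec = new TextDecoder();
  let state: State = "field";
  let cell: number[] = [];
  let row: string[] = [];
  const endCell = () => { row.push(dec.decode(Uint8Array.from(cell))); cell = []; };

  for (const b of bytes) {
    if (state === "quoted") {
      if (b === QUOTE) state = "quoteInQuoted";
      else cell.push(b);            // delimiters and newlines are data here
      continue;
    }
    if (state === "quoteInQuoted") {
      if (b === QUOTE) { cell.push(QUOTE); state = "quoted"; continue; } // "" escape
      state = "field";              // that quote closed the cell; b falls through
    }
    if (b === QUOTE && cell.length === 0) state = "quoted";
    else if (b === COMMA) endCell();
    else if (b === LF) { endCell(); yield row; row = []; }
    else if (b !== CR) cell.push(b);
  }
  if (cell.length > 0 || row.length > 0) { endCell(); yield row; } // no trailing \n
}
```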

## `parseCsv(input, opts?)`

`input` is the byte source to parse; the common case is a `Bun.file()` handle:

```ts
import csv from "bun:csv";

for await (const row of csv.parseCsv(Bun.file("data.csv"), { header: true })) {
  console.log(row.id, row.name, row.score);
}
```
| Option | Default | Description |
| --- | --- | --- |
| `header` | `false` | When `true`, the first row supplies the column names and subsequent rows are emitted as objects keyed by column. When `false`, rows are `string[]`. |
| `delimiter` | `","` | Single-character cell separator. |
| `quote` | `"\""` | Single-character quote that wraps cells with embedded delimiters / newlines. |
| `escape` | same as `quote` | RFC 4180 doubles the quote (`""`) to escape; some dialects use `\"`. |
| `comment` | none | If set, lines starting with this character are skipped. |
| `inferTypes` | `true` (with `header`) | Per-cell type inference: numeric → `number`, `true` / `false` → `boolean`, empty / `null` → `null`. Plain strings pass through. |
| `parallel` | `false` | See below. |

Without `header`, every row is an array of strings; no inference runs, which keeps the fast path simple.
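
As a worked example using only options from the table, here is a no-header parse of a hypothetical `;`-delimited file with `#` comment lines:

```ts
// header defaults to false, so rows come back as string[] (no inference).
for await (const cells of csv.parseCsv(Bun.file("export.csv"), {
  delimiter: ";",
  comment: "#",   // skip lines starting with '#'
})) {
  const [id, name, score] = cells; // all strings on the no-header fast path
  console.log(id, name, Number(score));
}
```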

## Parallel mode

`parallel: true` chunks the input across `bun:parallel`'s worker pool when the input has no quoted cells (the byte-boundary heuristic, sketched below, can't split safely otherwise) and runs the parse off the main thread.

```ts
for await (const row of csv.parseCsv(Bun.file("data.csv"), { header: true, parallel: true })) {
  // row processed off the main thread
}
```
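
Here is how such a byte-boundary split can work; this is my own illustration, not parabun's code:

```ts
// Illustration only: split a buffer into ~equal chunks, snapping each
// boundary forward to the next '\n' byte. This is only a valid row boundary
// when no cell is quoted, since a quoted cell may contain a raw newline;
// that is why parallel mode bails out on quoted input.
function chunkBoundaries(buf: Uint8Array, workers: number): number[] {
  const boundaries = [0];
  const target = Math.ceil(buf.length / workers);
  for (let i = 1; i < workers; i++) {
    let pos = Math.min(i * target, buf.length);
    while (pos < buf.length && buf[pos] !== 0x0a) pos++; // snap to '\n'
    boundaries.push(Math.min(pos + 1, buf.length));
  }
  boundaries.push(buf.length);
  return boundaries; // worker k parses buf[boundaries[k] .. boundaries[k+1])
}
```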

Parallel mode is not a per-file speedup. The serial state machine is already memory-bandwidth-bound, and the parallel path's materialize-and-fork overhead grows with input size. A sweep on a 16-core x86 release build:

| Fixture | Serial (median) | Parallel (median) | Speedup |
| --- | --- | --- | --- |
| 5 MB · 128k rows | 152 ms | 129 ms | 1.18× |
| 50 MB · 1.25M rows | 1446 ms | 1528 ms | 0.95× |
| 200 MB · 4.92M rows | 5892 ms | 6363 ms | 0.93× |

Use `parallel: true` to keep the event loop responsive while parsing (parsing N files concurrently does scale across cores, as shown below), not because you expect bigger files to go faster. `bench/parabun-csv-parallel/` reproduces the numbers.
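
Parsing several files at once is where the worker pool pays off. A sketch (the file names and the aggregation are placeholders):

```ts
// Each parse runs off the main thread, so N files can use N cores.
async function sumScores(path: string): Promise<number> {
  let total = 0;
  for await (const row of csv.parseCsv(Bun.file(path), { header: true, parallel: true })) {
    total += row.score; // a number, via inferTypes
  }
  return total;
}

const totals = await Promise.all(["a.csv", "b.csv", "c.csv"].map(sumScores));
```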

## Bridging to columnar

`bun:csv` rows pair naturally with `bun:arrow`'s `fromRows`:

```ts
import csv from "bun:csv";
import arrow from "bun:arrow";

const rows: any[] = [];
for await (const row of csv.parseCsv(Bun.file("data.csv"), { header: true })) rows.push(row);

const tbl = arrow.fromRows(rows);
arrow.mean(tbl.column("score"));
```

For very large CSVs, batch the bridge: call `arrow.fromRows` every N rows instead of materializing them all first.
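
A sketch of that batching, assuming `arrow.mean` returns a plain number (the batch size is illustrative):

```ts
// Bridge in batches of 64k rows so memory stays bounded; combine the
// per-batch aggregates instead of building one giant table.
const BATCH = 65_536;
let rows: any[] = [];
let sum = 0, count = 0;

const flush = () => {
  if (rows.length === 0) return;
  const tbl = arrow.fromRows(rows);
  sum += arrow.mean(tbl.column("score")) * rows.length; // batch mean -> batch sum
  count += rows.length;
  rows = [];
};

for await (const row of csv.parseCsv(Bun.file("data.csv"), { header: true })) {
  rows.push(row);
  if (rows.length >= BATCH) flush();
}
flush();

const mean = count ? sum / count : 0;
```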

## Limits