
Voice assistant

A turn-taking voice agent over parabun:assistant whose tool actually does something physical. One factory call wires the whole pipeline (mic → VAD → Whisper STT → Llama with tool-grammar → Piper TTS → speaker); one reactive call-binding pipes the live state (“idle” / “listening” / “thinking” / “speaking”) into a status line; one setLight tool drives a real GPIO line.

This is the Parabun-only end of the spectrum — everything substantial is native (Whisper, Llama on GPU, ALSA capture/playback, gpiochip uAPI v2). Para’s contribution is the reactive state signal exposed by the bot and the tools dispatch table.

Looking for a non-hardware integration? See Voice assistant + MCP — bridge any MCP server’s tool catalog into the bot without wrapping each tool by hand.

import assistant from "parabun:assistant";
import gpio from "parabun:gpio";
await using chip = await gpio.openDefaultChip();
await using led = chip.line(17, { mode: "out", initial: 0 });
await using bot = await assistant.create({
  llm: process.env.ASSISTANT_LLM, // path to Llama-3.2-1B-Instruct-Q4_K_M.gguf
  stt: process.env.ASSISTANT_STT, // path to ggml-tiny.en.bin
  tts: process.env.ASSISTANT_TTS, // path to en_US-lessac-medium.onnx
  system: "You are a concise voice assistant. You can turn the light on and off.",
  tools: [{
    name: "setLight",
    description: "Turn the light on or off.",
    schema: { type: "object", properties: { on: { type: "boolean" } }, required: ["on"] },
    run: ({ on }) => { led.write(on ? 1 : 0); return `light ${on ? "on" : "off"}`; },
  }],
});
// Live status line — `->` re-calls process.stdout.write whenever bot.state changes.
`\r[${bot.state.get().padEnd(10)}]` -> process.stdout.write;
await bot.run();
  • bot.state is a Signal<"idle" | "listening" | "thinking" | "speaking"> — the bot updates it as it moves through the loop. You can read it directly (.get()), bind it to a UI sink, or build derived state on top of it.
  • A -> fn is the reactive call-binding operator. Every time the LHS expression changes (because bot.state updated), fn is re-called with the new value. Here it writes the current state to stdout, padded for alignment so the line redraws cleanly. The same pattern would feed an OLED screen, an MQTT topic, or a server-sent-events channel.
  • The setLight tool is a plain function. The bot calls into it when the LLM emits a constrained tool-call. The string it returns is fed back to the model so the spoken response can confirm what happened. led.write is the side-effecting line — no reactivity, no event bus.
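
To make the signal-and-binding idea concrete without Parabun's syntax, here is a minimal sketch in plain TypeScript. It is not how Parabun implements signals or `->`. `SketchSignal`, `bind`, and `frames` are hypothetical names, and the sketch assumes a binding fires once immediately and then on every change.

```typescript
// Minimal signal sketch; NOT Parabun's implementation. All names are hypothetical.
type Listener<T> = (value: T) => void;

class SketchSignal<T> {
  private value: T;
  private listeners = new Set<Listener<T>>();

  constructor(initial: T) {
    this.value = initial;
  }

  get(): T {
    return this.value;
  }

  set(next: T): void {
    if (Object.is(next, this.value)) return; // unchanged value: nothing is re-called
    this.value = next;
    for (const fn of this.listeners) fn(next);
  }

  // Calls fn once with the current value (an assumption of this sketch),
  // then again on every change. Returns a function that removes the binding.
  bind(fn: Listener<T>): () => void {
    fn(this.value);
    this.listeners.add(fn);
    return () => {
      this.listeners.delete(fn);
    };
  }
}

type BotState = "idle" | "listening" | "thinking" | "speaking";

const state = new SketchSignal<BotState>("idle");
const frames: string[] = []; // stands in for process.stdout

// Roughly what the status-line binding does: format the state, hand it to a sink.
const unbind = state.bind((s) => {
  frames.push(`\r[${s.padEnd(10)}]`);
});

state.set("listening");
state.set("listening"); // ignored: same value
state.set("thinking");
unbind();
state.set("speaking"); // not recorded: the binding was removed
```

Swapping the `frames.push` sink for an OLED driver call or an MQTT publish leaves the rest of the sketch unchanged, which is the point of binding to a signal rather than polling it.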

You need three model files locally — the assistant doesn’t bundle them:

ASSISTANT_LLM=$HOME/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf \
ASSISTANT_STT=$HOME/models/ggml-tiny.en.bin \
ASSISTANT_TTS=$HOME/models/en_US-lessac-medium.onnx \
parabun src/agent.pts
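
Because the three paths arrive through environment variables, a typo tends to surface late, when a model fails to load. A small preflight check can catch that earlier. The helper below is a hypothetical sketch and not part of `parabun:assistant`. It takes the environment and a file-existence predicate as arguments so it can be exercised without touching the disk.

```typescript
// Hypothetical preflight helper; not part of parabun:assistant.
const REQUIRED_MODELS = ["ASSISTANT_LLM", "ASSISTANT_STT", "ASSISTANT_TTS"] as const;

type Env = Record<string, string | undefined>;

// Returns one message per variable that is unset or points at a missing file.
// An empty array means all three model paths look usable.
function checkModels(env: Env, fileExists: (path: string) => boolean): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_MODELS) {
    const path = env[name];
    if (!path) {
      problems.push(`${name} is not set`);
    } else if (!fileExists(path)) {
      problems.push(`${name} points at a missing file: ${path}`);
    }
  }
  return problems;
}
```

In the agent script this could be called as `checkModels(process.env, existsSync)`, with `existsSync` imported from `node:fs`, exiting with the messages before `assistant.create` is reached.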

Speak after the status line shows [listening ]. Voice activity detection segments your utterance, Whisper transcribes it, the LLM responds (with a grammar that restricts output to one of your declared tools or plain text), Piper synthesizes the response, and ALSA plays it. Say “turn the light on” and the LED on BCM 17 lights up.
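
The "one of your declared tools or plain text" step can be pictured as a parse followed by a table lookup. The sketch below is an illustration under assumptions: the JSON shape `{"tool": ..., "args": {...}}` is a made-up encoding, not the format `parabun:assistant` uses, and `parseOutput`, `dispatch`, and `writes` are hypothetical names.

```typescript
// Hypothetical shapes; the real wire format of parabun:assistant is not shown in this guide.
type Tool = {
  name: string;
  run: (args: Record<string, unknown>) => string;
};

type ModelOutput =
  | { kind: "text"; text: string }
  | { kind: "tool"; name: string; args: Record<string, unknown> };

// Assumed encoding: a tool call is a JSON object {"tool": "...", "args": {...}};
// anything else is treated as text to be spoken.
function parseOutput(raw: string): ModelOutput {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const { tool, args } = parsed as { tool?: unknown; args?: unknown };
      if (typeof tool === "string" && typeof args === "object" && args !== null) {
        return { kind: "tool", name: tool, args: args as Record<string, unknown> };
      }
    }
  } catch {
    // Not JSON: fall through and treat the output as plain text.
  }
  return { kind: "text", text: raw };
}

// Looks the tool up by name and runs it. The returned string is what would be
// fed back to the model (tool call) or spoken directly (plain text).
function dispatch(tools: Tool[], out: ModelOutput): string {
  if (out.kind === "text") return out.text;
  const tool = tools.find((t) => t.name === out.name);
  if (!tool) return `unknown tool: ${out.name}`;
  return tool.run(out.args);
}

// An array stands in for the GPIO line so the sketch runs without hardware.
const writes: number[] = [];
const tools: Tool[] = [{
  name: "setLight",
  run: ({ on }) => {
    writes.push(on ? 1 : 0);
    return `light ${on ? "on" : "off"}`;
  },
}];
```

With a grammar constraining generation, the "unknown tool" branch should be unreachable in practice. The sketch keeps it because the parser here has no such guarantee.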

What the example needs in order to run:

  • Linux + ALSA for capture/playback (default device works; pass audio: { input: "hw:1,0" } to override)
  • NVIDIA GPU with CUDA for the LLM. Whisper + Piper run on CPU.
  • A microphone and speakers. USB conference mics work well.
  • Raspberry Pi 5 or similar — openDefaultChip() finds RP1 first, then falls back to /dev/gpiochip0. BCM 17 → 220 Ω → LED → ground.
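
As a quick sanity check on the 220 Ω resistor, the LED current follows from Ohm's law. The sketch below assumes a 3.3 V GPIO high level and a forward voltage of about 2.0 V, which is typical for a red LED but is an assumption here; check the datasheet for the part you use. `ledCurrentMilliamps` is a hypothetical helper.

```typescript
// Ohm's law for an LED with a series resistor: I = (Vsupply - Vforward) / R.
function ledCurrentMilliamps(supplyVolts: number, forwardVolts: number, resistorOhms: number): number {
  return ((supplyVolts - forwardVolts) / resistorOhms) * 1000;
}

// 3.3 V GPIO high, ~2.0 V forward drop (assumed), 220 Ω from the wiring above.
const milliamps = ledCurrentMilliamps(3.3, 2.0, 220);
```

Under those assumptions the result is roughly 6 mA, enough to light a standard indicator LED visibly.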