
Voice assistant + MCP

The headline voice assistant declares one inline tool wired to a GPIO pin. That works when you own the action. When the action lives behind a tool surface someone else maintains — anything that speaks the Model Context Protocol — bridging it into the bot is one line.

assistant.create({ tools }) accepts a structural type with .tools: ToolDescriptor[] and .call(name, args). @para/mcp connection objects fit that shape natively, so dropping a connection into tools: [...] flattens every tool the server exposes into the bot’s catalog. The grammar-constrained generation, JSON dispatch, and round-tripped result work exactly as they do for inline tools.
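To make that structural type concrete, here is a minimal sketch in plain TypeScript. Only `.tools` and `.call(name, args)` come from the description above; the interface name `ToolSource`, the fields on `ToolDescriptor`, and the `clockSource` object are assumptions made for illustration, not the library's published types.

```typescript
// Hypothetical shapes: only `.tools` and `.call(name, args)` are taken from
// the text; the descriptor fields are assumed for illustration.
interface ToolDescriptor {
  name: string;
  description: string;
  schema: Record<string, unknown>; // JSON Schema describing the arguments
}

interface ToolSource {
  tools: ToolDescriptor[];
  call(name: string, args: Record<string, unknown>): Promise<unknown>;
}

// An in-memory stand-in with the same structural shape an MCP connection
// is described as having. No subprocess or network is involved.
const clockSource: ToolSource = {
  tools: [
    {
      name: "now",
      description: "Return the current time as an ISO 8601 string.",
      schema: { type: "object", properties: {} },
    },
  ],
  async call(name, _args) {
    if (name !== "now") throw new Error(`unknown tool: ${name}`);
    return new Date().toISOString();
  },
};
```

Because the check is structural, anything with these two members should be usable wherever a connection is, which also makes it easy to stub a tool surface in tests.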

Context7 is an MCP server from Upstash that fetches up-to-date library and framework docs by ID. Pair it with the voice loop and you have an assistant that can answer “how do I set a Cloudflare Workers KV key with a TTL?” using the current docs, not whatever its base model was trained on a year ago.

import assistant from "parabun:assistant";
import mcp from "@para/mcp";

// Spawn Context7 as a subprocess; @para/mcp speaks JSON-RPC over its stdio.
// CONTEXT7_API_KEY is optional (raises rate limits); works without one.
await using docs = await mcp.connect("stdio", "npx", {
  args: ["-y", "@upstash/context7-mcp"],
  env: { ...process.env, CONTEXT7_API_KEY: process.env.CONTEXT7_API_KEY ?? "" },
});

await using bot = await assistant.create({
  llm: process.env.ASSISTANT_LLM,
  stt: process.env.ASSISTANT_STT,
  tts: process.env.ASSISTANT_TTS,
  system: `You answer programming questions out loud. When the user mentions a library,
framework, or API, call resolve-library-id to find its Context7 ID and then
query-docs to fetch current documentation. Quote sparingly — the user is listening,
not reading. One or two sentences per answer.`,
  tools: [docs], // ← every Context7 tool is now reachable
});

// Live status line, plus a count of how many tools the bot can see.
derived header = `\r[${bot.state.get().padEnd(10)}] (${bot.tools.length} tools)`;
header -> process.stdout.write;

await bot.run();

Speak after [listening ] shows up. Try “how do I cache a fetch response in Hono for five minutes?” — the LLM emits a constrained resolve-library-id({ libraryName: "hono" }) call, then query-docs({ context7CompatibleLibraryID, query: "cache fetch 5 minutes" }), gets fresh docs back, and synthesizes a one-sentence spoken answer.
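For orientation, this is roughly what one of those calls looks like on the wire. The message shape follows the MCP specification's `tools/call` method over JSON-RPC 2.0, and the stdio transport sends one JSON message per line. How @para/mcp builds and numbers its requests internally is not documented here, so the helper names `toolCallRequest` and `encodeFrame` are hypothetical.

```typescript
// Shape of a JSON-RPC 2.0 request for MCP's `tools/call` method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build the request object for one tool invocation (hypothetical helper).
function toolCallRequest(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The stdio transport is newline-delimited: one JSON message per line.
function encodeFrame(req: ToolCallRequest): string {
  return JSON.stringify(req) + "\n";
}

// The first call from the example above, as it might be written to the
// subprocess's stdin.
const firstFrame = encodeFrame(
  toolCallRequest(1, "resolve-library-id", { libraryName: "hono" }),
);
```

The server answers with a response carrying the same `id`, which is how the result is matched back to the call that produced it.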

  • docs.tools is populated during the initialize handshake — the bot snapshots it and includes every entry in the JSON-schema-constrained generation grammar. Tool-call requests route through docs.call(name, args); the JSON result is fed back to the model so the spoken response reflects what actually happened.
  • docs.alive is a Signal<boolean> — false when the subprocess exits or the WebSocket drops. Bind it to a status indicator or wrap bot.run() in effect { if (docs.alive.get()) … } for auto-reconnect.
  • bot.tools is the flattened catalog (inline + MCP). Each entry carries source: "inline" | "mcp" so a UI can render where each came from.
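The flattening described in the list above can be sketched as follows. This is an illustration of the idea, not the library's implementation: the type names, the `flattenCatalog` function, and the stub data are all assumptions; only the `source: "inline" | "mcp"` tag comes from the text.

```typescript
interface ToolDescriptor {
  name: string;
  description: string;
  schema: Record<string, unknown>;
}
interface InlineTool extends ToolDescriptor {
  run(args: Record<string, unknown>): unknown;
}
interface ToolSource {
  tools: ToolDescriptor[];
  call(name: string, args: Record<string, unknown>): Promise<unknown>;
}
type CatalogEntry = ToolDescriptor & { source: "inline" | "mcp" };

// Expand each MCP-style source into its tools and tag every entry with
// where it came from (hypothetical helper).
function flattenCatalog(entries: Array<InlineTool | ToolSource>): CatalogEntry[] {
  return entries.flatMap((entry): CatalogEntry[] =>
    "tools" in entry
      ? entry.tools.map((t) => ({ ...t, source: "mcp" as const }))
      : [{
          name: entry.name,
          description: entry.description,
          schema: entry.schema,
          source: "inline" as const,
        }],
  );
}

// Stub data standing in for a Context7 connection and one inline tool.
const stubDocs: ToolSource = {
  tools: [
    { name: "resolve-library-id", description: "Find a library ID.", schema: { type: "object" } },
    { name: "query-docs", description: "Fetch documentation.", schema: { type: "object" } },
  ],
  async call() { return null; },
};
const stubLight: InlineTool = {
  name: "setLight",
  description: "Toggle an LED.",
  schema: { type: "object" },
  run: () => "ok",
};

const catalog = flattenCatalog([stubDocs, stubLight]);
```

A UI could then group `catalog` by `source` to show which tools are local and which arrived over MCP.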

Or use the memory server for persistent recall


Context7 is one option. The official @modelcontextprotocol/server-memory gives the bot a knowledge graph it writes to and reads from across sessions — say “remember that my dog Biscuit is allergic to chicken” and the assistant will surface that fact months later when you ask “what can I feed Biscuit?”.

await using mem = await mcp.connect("stdio", "npx", {
  args: ["-y", "@modelcontextprotocol/server-memory"],
  env: { ...process.env, MEMORY_FILE_PATH: `${process.env.HOME}/.assistant-memory.jsonl` },
});

await using bot = await assistant.create({
  // llm, stt, tts hold the same model paths as in the Context7 example.
  llm, stt, tts,
  system: `You are a personal assistant with persistent memory. Use create_entities and
add_observations to remember facts the user shares. Use search_nodes before
answering personal questions. Speak in one or two sentences.`,
  tools: [mem],
});

Zero configuration, zero API keys. MEMORY_FILE_PATH is the only setting, and it is optional: left unset, it defaults to a file next to the server.
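As an illustration of what the model would need to emit for the Biscuit example, here is a sketch of the argument payloads for the three tools named in the system prompt. The field names (`entities`, `entityType`, `observations`, `entityName`, `contents`, `query`) follow the memory server's README as I understand it; treat them as an assumption and check the schemas the server actually advertises. The helper function names are hypothetical.

```typescript
interface EntityInput {
  name: string;
  entityType: string;
  observations: string[];
}

// Arguments for create_entities: introduce an entity with its known facts.
function createEntitiesArgs(name: string, entityType: string, facts: string[]): { entities: EntityInput[] } {
  return { entities: [{ name, entityType, observations: facts }] };
}

// Arguments for add_observations: attach new facts to an existing entity.
function addObservationsArgs(entityName: string, contents: string[]) {
  return { observations: [{ entityName, contents }] };
}

// Arguments for search_nodes: a free-text query over the graph.
function searchNodesArgs(query: string) {
  return { query };
}

// "Remember that my dog Biscuit is allergic to chicken."
const rememberBiscuit = createEntitiesArgs("Biscuit", "dog", ["allergic to chicken"]);
// A later fact about the same entity.
const moreAboutBiscuit = addObservationsArgs("Biscuit", ["likes salmon treats"]);
// "What can I feed Biscuit?" should trigger a lookup first.
const findBiscuit = searchNodesArgs("Biscuit");
```

With the connection from the snippet above, each payload would be passed as the second argument to `mem.call(name, args)`.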

tools: [...] accepts the union — inline descriptors and MCP connections side by side. The headline GPIO example becomes:

// gpio, llm, stt, and tts come from the headline voice assistant example;
// docs and mem are the MCP connections opened in the snippets above.
await using chip = await gpio.openDefaultChip();
await using led = chip.line(17, { mode: "out", initial: 0 });

await using bot = await assistant.create({
  llm, stt, tts,
  tools: [
    docs, // every Context7 tool
    mem,  // every memory tool
    {
      name: "setLight",
      description: "Toggle the local LED wired to BCM 17.",
      schema: { type: "object", properties: { on: { type: "boolean" } }, required: ["on"] },
      run: ({ on }) => { led.write(on ? 1 : 0); return `local LED ${on ? "on" : "off"}`; },
    },
  ],
});

The bot sees one merged catalog; the LLM picks among them at each turn — answer a docs question, recall a fact, flip a pin.
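One way to picture the routing behind that merged catalog: find whichever entry owns the requested name, then either run the inline function or forward to the source's `call`. This is a sketch under assumed types, with an in-memory variable standing in for the GPIO line; the `dispatch` function is hypothetical, and the library's real dispatch logic may differ (for example in how it handles name collisions).

```typescript
interface ToolDescriptor {
  name: string;
  description: string;
  schema: Record<string, unknown>;
}
interface InlineTool extends ToolDescriptor {
  run(args: Record<string, unknown>): unknown;
}
interface ToolSource {
  tools: ToolDescriptor[];
  call(name: string, args: Record<string, unknown>): Promise<unknown>;
}

// Route one tool call to the first entry that declares the name.
async function dispatch(
  entries: Array<InlineTool | ToolSource>,
  name: string,
  args: Record<string, unknown>,
): Promise<unknown> {
  for (const entry of entries) {
    if ("tools" in entry) {
      if (entry.tools.some((t) => t.name === name)) return entry.call(name, args);
    } else if (entry.name === name) {
      return entry.run(args);
    }
  }
  throw new Error(`no tool named ${name}`);
}

// Stand-ins: a fake docs source and an LED modeled as a plain variable.
let ledLevel = 0;
const mixedEntries: Array<InlineTool | ToolSource> = [
  {
    tools: [{ name: "query-docs", description: "Fetch documentation.", schema: { type: "object" } }],
    async call(_name, callArgs) { return `docs for ${String(callArgs["query"])}`; },
  },
  {
    name: "setLight",
    description: "Toggle the stand-in LED.",
    schema: { type: "object", properties: { on: { type: "boolean" } }, required: ["on"] },
    run: ({ on }) => { ledLevel = on ? 1 : 0; return `local LED ${on ? "on" : "off"}`; },
  },
];
```

Calling `dispatch(mixedEntries, "setLight", { on: true })` takes the inline branch, while `"query-docs"` is forwarded to the source; the caller does not need to know which is which.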

mcp.connect("ws", url) is the same surface for servers that speak MCP over WebSocket text frames:

await using hub = await mcp.connect("ws", "wss://mcp.context7.com/mcp", {
  headers: { authorization: `Bearer ${process.env.CONTEXT7_API_KEY}` },
});

Either transport works in tools: [...].

# CONTEXT7_API_KEY is optional; it raises Context7's rate limit.
ASSISTANT_LLM=$HOME/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf \
ASSISTANT_STT=$HOME/models/ggml-tiny.en.bin \
ASSISTANT_TTS=$HOME/models/en_US-lessac-medium.onnx \
CONTEXT7_API_KEY=… \
  parabun src/agent.pts

The status line will show (N tools) once the MCP handshake completes; speak after [listening ]. The LLM is constrained to call one of those N tools or reply in plain text — no malformed JSON, no hallucinated tool names.
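The document does not say how the constraint is encoded, so the following is only one plausible way to express "call one of these N tools or reply in plain text" as a single JSON Schema built from the catalog. The `replySchema` function and the `tool` / `arguments` / `text` field names are assumptions for illustration, not the format the assistant module actually uses.

```typescript
interface ToolDescriptor {
  name: string;
  description: string;
  schema: Record<string, unknown>;
}

// Build a schema that admits either a plain-text reply or a call to exactly
// one known tool whose arguments match that tool's own schema.
function replySchema(tools: ToolDescriptor[]): { oneOf: Record<string, unknown>[] } {
  const textVariant: Record<string, unknown> = {
    type: "object",
    properties: { text: { type: "string" } },
    required: ["text"],
    additionalProperties: false,
  };
  const callVariants = tools.map((t): Record<string, unknown> => ({
    type: "object",
    // `const` pins the name, so a tool outside the catalog cannot validate.
    properties: { tool: { const: t.name }, arguments: t.schema },
    required: ["tool", "arguments"],
    additionalProperties: false,
  }));
  return { oneOf: [textVariant, ...callVariants] };
}

const exampleSchema = replySchema([
  {
    name: "setLight",
    description: "Toggle an LED.",
    schema: { type: "object", properties: { on: { type: "boolean" } }, required: ["on"] },
  },
  {
    name: "query-docs",
    description: "Fetch documentation.",
    schema: { type: "object", properties: { query: { type: "string" } }, required: ["query"] },
  },
]);
```

A decoder constrained to such a schema can only produce output that parses and names a cataloged tool, which is the property the paragraph above describes.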

Hardware requirements are the same as for the headline voice assistant: Linux with ALSA, an NVIDIA GPU for the LLM, and a mic plus speakers. The MCP server runs wherever it normally runs (subprocess, daemon, WebSocket endpoint); the bot doesn’t care.