Know when they're done talking.

Client-side semantic endpointing for voice apps. Detect turn completion, thinking pauses, and interrupts — entirely in the browser.

//Benefits

Why Utterance?

No cloud dependency

Everything runs in the browser. No servers, no API keys, no network requests for audio processing.

No network latency

On-device inference means no round trip to a server. Decisions happen in milliseconds.

Privacy first

Audio never leaves the user’s device. No recording, no uploading, no third-party processing.

Lightweight model

Small ONNX model that loads fast and runs efficiently. Designed for real-time performance on any device.

Framework agnostic

Works with any JavaScript framework. Use it with React, Vue, vanilla JS, or any voice SDK (see the React sketch below).

Simple event API

Just listen for turnEnd, pause, and interrupt events. Get building in minutes, not hours.
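
Because the detector is plain JavaScript, wrapping it for a framework takes a few lines. Below is a minimal React hook sketch that assumes only the API shown in the quick start (the constructor, on, and start); the hook name, the TurnState type, and the teardown comment are illustrative placeholders, so adjust them to the real API.

use-turn-state.ts
import { useEffect, useState } from "react";
import { Utterance } from "@utterance/core";

// Hypothetical UI state for this sketch.
type TurnState = "listening" | "thinking" | "done" | "interrupted";

export function useTurnState(): TurnState {
  const [state, setState] = useState<TurnState>("listening");

  useEffect(() => {
    const detector = new Utterance();

    // Map the detector's events onto UI state.
    detector.on("turnEnd", () => setState("done"));
    detector.on("pause", () => setState("thinking"));
    detector.on("interrupt", () => setState("interrupted"));

    // start() is awaited in the quick start, so treat it as fire-and-forget here.
    void detector.start();

    return () => {
      // Tear down with the library's stop/dispose method here (not shown in
      // the quick start, so this is left as a placeholder).
    };
  }, []);

  return state;
}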

//Quick start

Install and start detecting in seconds.

npm install @utterance/core
index.ts
import { Utterance } from "@utterance/core";

const detector = new Utterance();

detector.on("turnEnd", (result) => {
  console.log("User is done speaking", result.confidence);
});

detector.on("pause", (result) => {
  console.log("User is thinking...", result.duration);
});

detector.on("interrupt", () => {
  console.log("User wants to speak — stop AI response");
});

await detector.start();
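
A usage example: in an AI voice agent, the typical wiring is to cut off playback on interrupt, keep listening through a pause, and only respond once turnEnd fires with enough confidence. The sketch below assumes your own stopSpeaking and sendToAgent functions and an illustrative 0.8 confidence threshold; only the detector calls come from the quick start above.

agent.ts
import { Utterance } from "@utterance/core";

// Placeholders for your own audio/agent layer (not part of @utterance/core).
const stopSpeaking = () => { /* cut off TTS playback here */ };
const sendToAgent = async () => { /* forward the finished turn to your agent */ };

const detector = new Utterance();

detector.on("interrupt", () => {
  // The user started talking over the AI: stop playback immediately.
  stopSpeaking();
});

detector.on("pause", () => {
  // A thinking pause, not a finished turn: keep listening, don't respond yet.
});

detector.on("turnEnd", (result) => {
  // Respond only when the model is reasonably sure the turn is over.
  // 0.8 is an arbitrary threshold for this sketch.
  if (result.confidence > 0.8) {
    void sendToAgent();
  }
});

await detector.start();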

Open source. Community driven.

MIT licensed. Free forever. Star us on GitHub, join the Discord, or open a PR.