Utterance

Every voice app faces the same problem: it can't tell when you're done talking.

You pause to think, and it cuts you off. You take a breath, and it responds too soon. You want to interrupt, and it keeps going.

Current solutions either detect silence (Silero VAD, ricky0123/vad) without understanding intent, or use server-side AI (OpenAI Realtime, AssemblyAI) that adds latency and costs.

Utterance is different. It runs a lightweight ML model entirely on the client and recognizes the difference between a thinking pause and a completed turn. No cloud. No delay. No per-minute fees.
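The page doesn't show Utterance's actual API, so the following is only a minimal sketch of what client-side endpointing could look like. The `utterance` package name, the `Utterance` class, and its `create`/`on`/`process` methods are assumptions for illustration, not a documented interface.

```ts
// Hypothetical sketch -- the "utterance" package, the Utterance class, and
// its create()/on()/process() methods are assumed names, not documented API.
import { Utterance } from "utterance";

async function startEndpointing(): Promise<void> {
  // Load the on-device model once; all inference stays in the browser.
  const detector = await Utterance.create();

  detector.on("turnEnd", ({ confidence }: { confidence: number }) => {
    // Fires when the model judges that the speaker finished their turn,
    // not merely paused to think; confidence is in [0, 1].
    console.log(`Turn complete (confidence ${confidence.toFixed(2)})`);
  });

  // Tap the microphone and stream mono PCM frames to the detector.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(stream);
  const tap = ctx.createScriptProcessor(4096, 1, 1); // simple tap, fine for a sketch

  tap.onaudioprocess = (event) => {
    detector.process(event.inputBuffer.getChannelData(0)); // Float32Array frame
  };

  source.connect(tap);
  tap.connect(ctx.destination);
}

startEndpointing();
```

Because inference runs in the same process that captures the audio, there is no network round-trip between the end of speech and the turn-end event.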

Key Features

  • Semantic endpointing — understands thinking pauses vs. turn completion
  • Interrupt detection — knows when a user wants to interject
  • Confidence scoring — returns a probability (0–1) for each detection (see the sketch after this list)
  • Client-side only — no cloud, no latency, no API costs
  • Lightweight — model under 5MB, inference under 50ms
  • Framework agnostic — works with any voice stack
  • Privacy first — audio never leaves the device
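The interrupt-detection and confidence-scoring bullets suggest a threshold-driven control loop. Here is a hypothetical sketch of how that could look in a voice agent; the event names, payload shape, and the `stopSpeaking`/`handOffTurn` stubs are assumptions, not a documented interface.

```ts
// Hypothetical sketch of confidence thresholds in a voice-agent loop.
// Event names and payloads are assumed, not documented API; the two stubs
// below stand in for whatever TTS/agent plumbing an app already has.
import { Utterance } from "utterance";

const stopSpeaking = (): void => { /* cancel TTS playback (app-specific) */ };
const handOffTurn = (): void => { /* send the finished utterance to the agent */ };

async function wireTurnTaking(): Promise<void> {
  const detector = await Utterance.create();

  detector.on("interrupt", ({ confidence }: { confidence: number }) => {
    // The user is talking over the assistant: cut playback quickly,
    // but only when the model is fairly sure it is a real interjection.
    if (confidence > 0.8) stopSpeaking();
  });

  detector.on("turnEnd", ({ confidence }: { confidence: number }) => {
    // A thinking pause should score low and a completed turn high, so a
    // confidence threshold replaces a fixed silence timeout.
    if (confidence > 0.6) handOffTurn();
  });
}

wireTurnTaking();
```

The exact thresholds are illustrative; an app would tune them against its own tolerance for cut-offs versus slow responses.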

Comparison

| Feature | Silero VAD | ricky0123/vad | Picovoice Cobra | OpenAI Realtime | Utterance |
|---|---|---|---|---|---|
| Detects speech vs. silence | ✓ | ✓ | ✓ | ✓ | ✓ |
| Semantic pause detection | ✗ | ✗ | ✗ | ✓ | ✓ |
| Interrupt detection | ✗ | ✗ | ✗ | ✓ | ✓ |
| Runs client-side | ✓ | ✓ | ✓ | ✗ | ✓ |
| No API costs | ✓ | ✓ | ✗ | ✗ | ✓ |
| Privacy (audio stays local) | ✓ | ✓ | ✓ | ✗ | ✓ |
