Livescriber — Shaun Zhang

Problem

VoxTel handles meetings after the fact. There’s a separate, harder problem: showing a transcript on screen during a live conversation, with speaker turns, latency low enough to be useful, and output stable enough not to flicker as the model revises itself.

Approach

A Rust front end owns the audio buffer, streams chunks into a whisper.cpp pipeline, and reconciles partial transcripts using a sliding window. Speaker turns come from VoxTel’s voice-print library. The hard work is not the model; it is the harness around long sessions.
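
To make the reconciliation step concrete, here is a minimal sketch of one common policy for stabilizing streaming output: a word is committed only once two consecutive partial hypotheses agree on it. This is an illustration, not necessarily how Livescriber does it. The names `Reconciler`, `update`, `committed`, and `tentative` are hypothetical, and the sketch assumes each hypothesis covers only the audio after the last committed word.

```rust
/// Hypothetical reconciler (not Livescriber's actual code).
/// Words are committed only after two consecutive hypotheses agree on them,
/// so text that has been shown as final is never revised.
#[derive(Default)]
pub struct Reconciler {
    /// Words already shown as final.
    committed: Vec<String>,
    /// Uncommitted words from the most recent hypothesis.
    previous: Vec<String>,
}

impl Reconciler {
    /// Feed the latest hypothesis for the audio that follows the committed
    /// text (assumption: the caller decodes only that region).
    /// Returns the words newly committed by this update.
    pub fn update(&mut self, hypothesis: &[&str]) -> Vec<String> {
        // Length of the common prefix between the last hypothesis and this one.
        let agreed = self
            .previous
            .iter()
            .zip(hypothesis.iter())
            .take_while(|(a, b)| a.as_str() == **b)
            .count();

        // The agreed prefix becomes final.
        let newly: Vec<String> = hypothesis[..agreed]
            .iter()
            .map(|w| w.to_string())
            .collect();
        self.committed.extend(newly.iter().cloned());

        // Everything after the agreed prefix stays tentative until confirmed.
        self.previous = hypothesis[agreed..]
            .iter()
            .map(|w| w.to_string())
            .collect();
        newly
    }

    /// Words that will no longer change.
    pub fn committed(&self) -> &[String] {
        &self.committed
    }

    /// Words from the latest hypothesis that may still be revised.
    pub fn tentative(&self) -> &[String] {
        &self.previous
    }
}
```

A display layer built on this would render `committed()` as fixed text and `tentative()` in a lighter style. The trade-off is latency: each word waits for one extra hypothesis before it becomes final, and a revision that arrives after a word is committed is lost.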

Stack

Rust for the buffer and reconciliation, WebSockets for the wire format, WebAudio for browser capture, whisper.cpp on the server. Diarization plugs into VoxTel’s existing identity store.

What shipped

Nothing yet. Pre-release. Currently used by two people (me and one beta tester) for live captioning during demos.

What’s next

Public alpha once the reconciliation pass stops dropping mid-sentence revisions. Then a hosted demo so people can try it without compiling Rust.