Storytelling engine — Shaun Zhang

Problem

A community storytelling nonprofit wanted to scale long-form member-success stories without losing the founder’s voice, fabricating details, or burying editors in unstructured drafts. The bottleneck wasn’t the writing — it was grading and gating quality at volume.

Approach

Every draft is scored by the pipeline before a human sees it: an 8-point pass/fail check (all story blocks present and ordered, self-reported figures labeled, no fabricated claims, verified public data points) plus a 5-point rubric (voice fidelity, narrative clarity, grounding, dignity, movement significance). Drafts below threshold are kicked back with the failing criteria listed. The pipeline never publishes on its own: an editor approves, then the member explicitly approves, before anything goes live.
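The scoring gate described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the type names, the criterion identifiers, and the rubric threshold of 4 are all assumptions, as is the reading that each rubric dimension is scored from 1 to 5.

```typescript
// Hypothetical shapes; the real criteria live in the private repository.
type PassFailResult = { criterion: string; passed: boolean };
type RubricScores = Record<string, number>; // assumed: each dimension scored 1-5

interface GateDecision {
  forwardToEditor: boolean; // true only when nothing failed
  failing: string[];        // what gets sent back with a rejected draft
}

function gateDraft(
  checks: PassFailResult[],
  rubric: RubricScores,
  minRubricScore = 4, // assumed threshold, not stated in the write-up
): GateDecision {
  // Any failed pass/fail criterion blocks the draft outright.
  const failedChecks = checks.filter((c) => !c.passed).map((c) => c.criterion);

  // Rubric dimensions below the threshold are reported alongside them.
  const lowDimensions = Object.entries(rubric)
    .filter(([, score]) => score < minRubricScore)
    .map(([name, score]) => `${name} (${score} < ${minRubricScore})`);

  const failing = [...failedChecks, ...lowDimensions];
  return { forwardToEditor: failing.length === 0, failing };
}
```

Returning the list of failing criteria, rather than a bare boolean, is what lets a rejected draft go back with actionable feedback instead of a silent rejection.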

Stack

A custom Next.js app with Supabase (Postgres + Auth) and Google login, Anthropic Claude as both the generator and the eval judge, deployed on Vercel. The pass/fail and rubric criteria live in version control next to the prompts; a CLI regression runner grades the whole submission bank end-to-end.
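A regression runner of the kind mentioned above might look something like this sketch. The judge is injected as a plain async function so that no model API call is shown or implied; `Submission`, `Grade`, `Judge`, and `runRegression` are hypothetical names, not the project's actual interfaces.

```typescript
// Hypothetical types for illustration only.
type Submission = { id: string; draft: string };
type Grade = { passed: boolean; failing: string[] };
type Judge = (submission: Submission) => Promise<Grade>;

interface RegressionReport {
  total: number;
  passed: number;
  failures: { id: string; failing: string[] }[];
}

// Grades every submission in the bank and collects the ones that fail.
async function runRegression(
  bank: Submission[],
  judge: Judge,
): Promise<RegressionReport> {
  const failures: RegressionReport["failures"] = [];
  for (const submission of bank) {
    // Sequential on purpose: simple to reason about and gentle on rate limits.
    const grade = await judge(submission);
    if (!grade.passed) {
      failures.push({ id: submission.id, failing: grade.failing });
    }
  }
  return { total: bank.length, passed: bank.length - failures.length, failures };
}

// A CLI wrapper could use this to fail the run when any submission regresses.
function exitCodeFor(report: RegressionReport): number {
  return report.failures.length > 0 ? 1 : 0;
}
```

Because the criteria and prompts are versioned together, a run like this can be repeated after any prompt change to see which stored submissions start failing.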

What shipped

A beta batch of eleven member stories ran end-to-end: voice fidelity averaged 4.7/5, opening scenes landed in ten of eleven, and there were zero fabricated metrics across the batch, even where intake data was missing. That last result pointed to the real risk: where intake has gaps, the model reconstructs plausible detail to fill them. A hard intake gate and explicit member approval therefore became mandatory. The client green-lit a pilot conditional on four gates: intake validation, automated quantitative checks, member approval, and press-tier source traceability.
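As a rough illustration of two of those gates, the sketch below blocks generation when required intake fields are missing and enforces the editor-then-member approval order. The intake field names and the status values are assumptions made for the example; the actual intake form and workflow states are not described here.

```typescript
// Hypothetical intake fields; the real form is not public.
interface Intake {
  memberName?: string;
  storySummary?: string;
  selfReportedFigures?: string[];
}

const REQUIRED_FIELDS: (keyof Intake)[] = ["memberName", "storySummary"];

// Returns the required fields that are absent or blank. A non-empty result
// means generation should not start, so the model has no gaps to fill.
function missingIntakeFields(intake: Intake): string[] {
  return REQUIRED_FIELDS.filter((field) => {
    const value = intake[field];
    return value === undefined || (typeof value === "string" && value.trim() === "");
  });
}

// Assumed workflow states: the editor must approve before the member can.
type ReviewStatus = "awaiting_editor" | "awaiting_member" | "approved";

function approve(status: ReviewStatus, actor: "editor" | "member"): ReviewStatus {
  if (status === "awaiting_editor" && actor === "editor") return "awaiting_member";
  if (status === "awaiting_member" && actor === "member") return "approved";
  throw new Error(`${actor} cannot approve a story in state ${status}`);
}
```

Modeling approval as an ordered state transition, instead of two independent flags, makes it impossible for a story to reach "approved" without both sign-offs in sequence.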

What’s next

Build the four gates, then move from beta to the live pilot.

Source is private (client work). The panel can answer questions about this project's stack and outcomes without leaking client identity.