Supafone Labs Docs
Why Supafone Labs exists
TL;DR
A voice agent is one mind on a stopwatch. To sound human it answers in under a second — so the model that talks can never afford to think. Yet everything that decides whether a call succeeds is thinking: hearing distress in a voice, catching a switch to Spanish, stopping the agent from confirming a booking whose API call silently failed. That's an architecture problem, not a prompt problem.
Supafone Labs is the supervisor with a headset: a slower second mind that rides beside the call instead of inside its latency budget, reads every turn, and slides a silent note back through your platform's own channel. The caller never hears it. The agent reads it mid-call.
This isn't a hunch; it's an assembly of peer-reviewed results: dual-process talker/reasoner agents (DeepMind); the finding that models can't reliably self-correct, which is why the supervisor must be external; generator/verifier splits (Cobbe 2021, Lightman 2023, Baker 2025); inference-time multi-model oversight (Sakana AB-MCTS); and feedback-driven prompt optimization (OPRO, DSPy, TextGrad). All 24 papers are annotated on the research page; the full synthesis is the whitepaper (PDF).
The whole integration is three steps: ① get a key, ② feed Supafone Labs your platform's events, ③ deliver the whisper it hands back. First-class Python and TypeScript. That's it — everything below is detail.
The Supafone Labs class
Everything is one object. You describe the shape of events coming in (the ears) and the platform to whisper out to (the mouth); the reasoning core between them is identical on every stack.
import supafone_labs
brain = supafone_labs.SupafoneLabs(
    provider="deepgram",      # EARS: shape of the raw events coming IN
    inject_via="ultravox",    # MOUTH: native format to compile whispers OUT to
    scenario="legal_intake",  # a named prompt profile layered under the reasoning core
    oracle_instructions="Never quote fees. Acknowledge injury before logistics.",
    mode="return",            # observe() returns the action; it does not auto-deliver it
)
result = await brain.observe(raw_event) # feed one platform event
whisper = result.actions[0] if result.actions else None
# deliver `whisper` through your platform's channel — or set mode="apply" to auto-inject
The algorithm, in five steps — the same loop for all 13 platforms:
raw event ─▶ ingest adapter parses that provider's format (EARS · provider=)
─▶ reasoning core belief_prompt → perception state
directive_prompt → coaching decision (identical everywhere)
─▶ egress adapter compiles to the native channel (MOUTH · inject_via=)
─▶ result.actions[0] the silent whisper — never spoken
Full architecture diagram (SVG) →
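The loop above can be sketched in plain Python. This is an illustrative stand-in, not the package's implementation: `Action`, `Result`, `ingest`, `reason`, and `egress` are hypothetical names, the reasoning step is a keyword rule rather than the belief and directive prompts, and the event and payload shapes are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    channel: str   # platform the whisper is compiled for
    payload: dict  # message in that platform's (assumed) native shape


@dataclass
class Result:
    actions: list = field(default_factory=list)


def ingest(provider: str, raw_event: dict) -> Optional[str]:
    """EARS: pull the utterance out of a provider-shaped event (shapes assumed)."""
    if provider == "deepgram":
        return raw_event.get("content")
    return raw_event.get("text")


def reason(utterance: str) -> Optional[str]:
    """Stand-in for the reasoning core: a keyword rule instead of an LLM call."""
    if "hurt" in utterance.lower():
        return "Acknowledge the injury before logistics."
    return None


def egress(inject_via: str, directive: str) -> Action:
    """MOUTH: compile the directive for the target platform (payload shape assumed)."""
    return Action(channel=inject_via, payload={"type": "whisper", "text": directive})


def observe(raw_event: dict, provider: str, inject_via: Optional[str] = None) -> Result:
    target = inject_via or provider  # inject_via defaults to provider
    utterance = ingest(provider, raw_event)
    directive = reason(utterance) if utterance else None
    return Result(actions=[egress(target, directive)] if directive else [])


result = observe({"content": "I got hurt in the crash"},
                 provider="deepgram", inject_via="ultravox")
whisper = result.actions[0] if result.actions else None
print(whisper)
```

The point of the split is that only `ingest` and `egress` vary per platform; the middle step is shared.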
| Parameter | What it controls |
|---|---|
| provider | Ears. Which adapter parses the raw events you feed observe(). Auto-detected from the agent object when omitted. |
| inject_via | Mouth. The platform whose native channel the whisper is compiled to. Defaults to provider; set it when you tap one stack (e.g. a Deepgram audio fork) but inject into another (e.g. Ultravox). |
| scenario | A named prompt profile (legal_intake, medical_frontdesk, sales_outbound, support, generic) layered under the reasoning core as guardrails. |
| oracle_instructions | Free-text coaching rules folded into the directive prompt: your firm's non-negotiables. |
| mode | "apply" (default) auto-injects through the resolved channel; "return" hands you result.actions to deliver yourself. |
| oracle_model | Any claude-* / gpt-* / grok-* id, or a hosted alias. The serving vendor is inferred from the prefix. |
Same code, three modes: with SUPAFONE_LABS_API_KEY the oracle, TTS, and multilingual STT run hosted; without it they run on your own vendor keys; with neither, on deterministic offline fakes. Per-platform examples below.
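A minimal sketch of how that three-way choice could be resolved from the environment. `resolve_run_mode` and the mode labels are hypothetical, and of the vendor variables only DEEPGRAM_API_KEY is named in this document; the others are assumed. The package may also decide per service rather than globally, so treat this as a coarse approximation.

```python
import os
from typing import Mapping

# Only DEEPGRAM_API_KEY appears in the docs; the other variable names are assumptions.
VENDOR_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "XAI_API_KEY", "DEEPGRAM_API_KEY")


def resolve_run_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return 'hosted', 'byo', or 'offline' in the order described above."""
    if env.get("SUPAFONE_LABS_API_KEY"):
        return "hosted"   # oracle, TTS, and multilingual STT run on the hosted API
    if any(env.get(name) for name in VENDOR_KEYS):
        return "byo"      # run on your own vendor keys
    return "offline"      # deterministic offline fakes


print(resolve_run_mode({"SUPAFONE_LABS_API_KEY": "sl_live_example"}))
print(resolve_run_mode({"OPENAI_API_KEY": "sk-example"}))
print(resolve_run_mode({}))
```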
Quickstart
# 1.
pip install supafone-labs[all]
# 2. three lines:
import supafone_labs
brain = supafone_labs.supercharge(my_agent)  # platform auto-detected
result = await brain.observe(raw_event)      # feed your platform's events
# 3. result.actions[0] is the ready-to-send native whisper. Deliver it. Done.
// No SDK needed: it's one fetch.
const API = "https://api.labs.supafone.ai";
const { text } = await fetch(`${API}/v1/oracle/complete`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.SUPAFONE_LABS_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "supafone-labs-oracle",
    messages: [
      { role: "system", content: "You are a silent supervisor for a live voice agent. Return ONE short correction." },
      { role: "user", content: transcriptSoFar },
    ],
  }),
}).then(r => r.json());
// `text` is the whisper. Inject it via your platform's silent channel.
Auth & free minutes
Every request authenticates with Authorization: Bearer sl_live_…. Self-serve signup grants 5 free minutes, no card. The key is the account — check balance, usage, and logs in the console.
Billing by the minute
| Action | Billed |
|---|---|
| Oracle completion | 1 second per call |
| TTS | ≈ seconds of speech produced (chars ÷ 15) |
| Prerecorded STT | audio duration |
| Live STT session | wall-clock session time |
Top up with a $10 / 400-minute pack or subscribe for $49 / 2,000 minutes monthly. BYO vendor keys always work in the open-source package — the cloud is convenience, not lock-in.
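As a worked example of the table above, the following sketch adds up billed seconds for one call and prices them at the two per-minute rates implied by the pack and the subscription. `billed_seconds` is a hypothetical helper, and any rounding the service applies is not modeled.

```python
def billed_seconds(oracle_calls: int = 0, tts_chars: int = 0,
                   stt_audio_s: float = 0.0, live_stt_s: float = 0.0) -> float:
    """Sum billed seconds using the rates in the billing table."""
    return (oracle_calls * 1.0   # 1 second per oracle completion
            + tts_chars / 15     # TTS: characters / 15, roughly seconds of speech
            + stt_audio_s        # prerecorded STT: audio duration
            + live_stt_s)        # live STT: wall-clock session time


PACK_USD_PER_MIN = 10 / 400           # $10 for 400 minutes
SUBSCRIPTION_USD_PER_MIN = 49 / 2000  # $49 for 2,000 minutes

# A 5-minute call on live STT, with 20 oracle completions and 600 characters of TTS:
seconds = billed_seconds(oracle_calls=20, tts_chars=600, live_stt_s=300)
minutes = seconds / 60
print(f"{minutes:.1f} min, ${minutes * PACK_USD_PER_MIN:.3f} on the pack, "
      f"${minutes * SUBSCRIPTION_USD_PER_MIN:.3f} on the subscription")
```

Here 20 + 40 + 300 = 360 billed seconds, or 6 minutes, which comes to about $0.15 at the pack rate.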
POST /v1/signup
curl -X POST $API/v1/signup -H "Content-Type: application/json" \
-d '{"email": "you@company.com"}'
# -> { "key": "sl_live_…", "free_minutes": 5.0 }
POST /v1/oracle/complete
Hosted LLM completion, routed by model. supafone-labs-oracle always resolves to the current best default; alternatively, name any Claude / GPT / Grok model. Routing is by prefix, so new models work the day they ship. GET /v1/models lists what's live (refreshed hourly from each vendor).
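# Python
import os
import httpx
API = "https://api.labs.supafone.ai"
KEY = os.environ["SUPAFONE_LABS_API_KEY"]
r = httpx.post(f"{API}/v1/oracle/complete",
               headers={"Authorization": f"Bearer {KEY}"},
               json={"model": "claude-sonnet-4-6", "messages": [...]})
r.raise_for_status()  # surface auth or balance errors instead of a KeyError below
directive = r.json()["text"]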
POST /v1/tts
Four engines, one namespace: supafone-labs-calm-en (and friends), any aura-* voice, or engine:voice_id for deepgram, cartesia, elevenlabs, inworld. GET /v1/voices lists the catalog.
# TypeScript
const audio = await fetch(`${API}/v1/tts`, {
method: "POST",
headers: { Authorization: `Bearer ${KEY}`, "Content-Type": "application/json" },
body: JSON.stringify({ voice: "elevenlabs:", text: "Right away." }),
}).then(r => r.arrayBuffer());
POST /v1/stt
# by URL
curl -X POST $API/v1/stt -H "Authorization: Bearer $KEY" \
-H "Content-Type: application/json" -d '{"url": "https://…/call.wav"}'
# or raw bytes
curl -X POST $API/v1/stt -H "Authorization: Bearer $KEY" \
-H "Content-Type: audio/wav" --data-binary @call.wav
WS /v1/stt/live — live multilingual transcription
The Deepgram nova-3 language=multi tap, with no Deepgram account needed: stream audio in, get language-tagged Results out. The Python package uses this automatically when SUPAFONE_LABS_API_KEY is set and no DEEPGRAM_API_KEY is present.
# TypeScript
import WebSocket from "ws";
const ws = new WebSocket(
`wss://api.labs.supafone.ai/v1/stt/live?api_key=${KEY}` +
`&language=multi&encoding=linear16&sample_rate=16000`);
ws.on("message", (m) => {
const d = JSON.parse(m.toString());
const alt = d.channel?.alternatives?.[0];
if (alt?.transcript) console.log(alt.languages, alt.transcript, d.is_final);
});
// then: ws.send(pcmChunk) per audio frame
# Python: the package does the wiring
from supafone_labs.stt import MultilingualCallTap
tap = MultilingualCallTap(brain, session_id=call_id)
await tap.feed(track="inbound", payload_b64=frame)  # Twilio media frame
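If you consume the websocket yourself from Python rather than through the package, the message handling in the TypeScript snippet could be mirrored as below. `parse_result` is a hypothetical helper, and the sample message only contains the fields that snippet reads (`channel.alternatives[0].transcript`, `languages`, `is_final`); the full message schema is not shown here.

```python
import json
from typing import Optional


def parse_result(message: str) -> Optional[dict]:
    """Extract language tags and transcript from one Results message, if present."""
    data = json.loads(message)
    alternatives = (data.get("channel") or {}).get("alternatives") or []
    if not alternatives or not alternatives[0].get("transcript"):
        return None  # metadata, keep-alives, or empty interim results
    alt = alternatives[0]
    return {
        "languages": alt.get("languages", []),
        "transcript": alt["transcript"],
        "is_final": bool(data.get("is_final")),
    }


# Hand-written sample in the assumed shape, not captured from the API.
sample = json.dumps({
    "is_final": True,
    "channel": {"alternatives": [{"transcript": "hola, necesito ayuda",
                                  "languages": ["es"]}]},
})
print(parse_result(sample))
```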
Usage, balance, and the whisper audit log
GET /v1/usage            # today's request counts
GET /v1/billing/balance  # minutes remaining + top-up links
GET /v1/logs?limit=100   # every whispered instruction / spoken line / transcript, timestamped + billed
The logs are the audit trail: when your second mind whispered "Caller is distressed — acknowledge the injury before logistics", that exact text is in /v1/logs with a timestamp and its cost. The console renders it live.
Framework permutations
Every integration is the same three lines — construct, observe, act on the returned actions. What changes per platform is only which raw events you feed and where you deliver the compiled action. Full runnable files live in examples/.
Ultravox (speech-to-speech)
brain = supafone_labs.SupafoneLabs(provider="ultravox")
result = await brain.observe(ws_event)  # their WS data messages
# deliver: result.actions[0] -> inject_message on the Ultravox call WS
Vapi (server webhook)
@app.post("/vapi/webhook")
async def vapi(payload: dict):
    result = await brain.observe(payload)  # {"message": {...}} envelope
    if result.actions:  # reply with assistant_override
        return {"assistant": {"firstMessageMode": "assistant-speaks-first",
                              "model": {"messages": [{"role": "system",
                                  "content": result.actions[0].payload["instruction"]}]}}}
    return {}
Retell (custom-LLM websocket)
# inside your /llm-websocket/{call_id} handler
result = await brain.observe(msg)  # interaction_type messages
if result.actions:  # prepend before your LLM turn
    llm_messages.insert(0, result.actions[0].payload)  # {"role":"system","content":...}
ElevenLabs Agents
result = await brain.observe(frame)  # user_transcript / agent_response
if result.actions:  # their native silent channel
    await ws.send(json.dumps(result.actions[0].payload))  # {"type":"contextual_update",...}
Deepgram Voice Agent
result = await brain.observe(msg)  # ConversationText / FunctionCallRequest
if result.actions:
    await agent_ws.send(json.dumps(result.actions[0].payload))  # {"type":"UpdatePrompt",...}
OpenAI GPT-Realtime / xAI Grok (S2S)
result = await brain.observe(event)  # GA + beta event names both parse
if result.actions:  # patch live instructions
    await ws.send(json.dumps({"type": "session.update",
        "session": {"instructions": base_prompt + "\n" +
                    result.actions[0].payload["instructions_append"]}}))
Pipecat (frame pipeline)
result = await brain.observe({"frame": "TranscriptionFrame", "text": f.text, ...})
if result.actions:  # push a context frame
    await task.queue_frames([LLMMessagesAppendFrame(
        messages=result.actions[0].payload["messages"], run_llm=False)])
LiveKit Agents
@session.on("user_input_transcribed")
def on_transcribed(ev):
    asyncio.create_task(handle(ev))

async def handle(ev):
    result = await brain.observe({"type": "user_input_transcribed",
                                  "transcript": ev.transcript, "is_final": ev.is_final})
    if result.actions:
        chat_ctx.add_message(**result.actions[0].payload)  # role=system append
Raw audio / any SIP trunk (the tap)
tap = MultilingualCallTap(brain, session_id=call_sid)  # nova-3 multi, both tracks
await tap.feed(track=frame["track"], payload_b64=frame["payload"])
Anything else
brain = supafone_labs.SupafoneLabs(provider="my_platform") # unknown -> generic adapter
await brain.observe({"type": "message", "role": "user", "text": "..."})
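For a platform without an adapter, one approach is to map its events into the generic message shape shown above before calling observe(). The sketch below assumes a made-up source event with `speaker` and `utterance` fields; `to_generic_event` is hypothetical, and the "assistant" role value is an assumption, since only "user" appears in this document.

```python
def to_generic_event(platform_event: dict) -> dict:
    """Map a custom platform event onto the generic {"type", "role", "text"} shape."""
    # Hypothetical source fields: "speaker" ("caller" or "bot") and "utterance".
    role = "user" if platform_event.get("speaker") == "caller" else "assistant"
    return {
        "type": "message",
        "role": role,
        "text": platform_event.get("utterance", ""),
    }


event = to_generic_event({"speaker": "caller", "utterance": "Necesito una cita."})
print(event)
# then: await brain.observe(event)
```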
SIP providers
Supafone Labs is transport-agnostic — it needs either a transcript stream or forkable call audio, and every serious SIP/telephony layer exposes one:
| Provider | How Supafone Labs taps it |
|---|---|
| Twilio | <Stream track="both_tracks"> media streams → MultilingualCallTap (the production-proven path) |
| Telnyx | Media streaming API (WS fork) → tap |
| SignalWire | SWML/LaML media streams → tap |
| Vonage | Voice API websocket audio → tap |
| Plivo | AudioStream → tap |
| LiveKit SIP | SIP → LiveKit room → the LiveKit adapter directly (no extra STT needed) |
| Jambonz | listen/transcribe verbs → tap or generic webhook adapter |
| FreeSWITCH / Asterisk | mod_audio_fork / ARI externalMedia → tap |
| SIPREC-capable PBXes | SIPREC recording fork → tap (both legs arrive pre-separated) |
Rule of thumb: if your voice-AI platform sits on the SIP leg (Vapi, Retell, Bland all do), use its adapter and skip the tap. If you own the SIP leg yourself, fork the audio to the tap.
Pick your second-brain model
brain = supafone_labs.SupafoneLabs(
    provider="ultravox",
    oracle_model="claude-sonnet-4-6",  # serving vendor inferred from the model-id prefix
)
# or "gpt-4.1-mini" (OpenAI), "grok-4-fast" (xAI), "supafone-labs-oracle" (hosted)
# live discovery — never hardcode a model list:
models = await supafone_labs.discover_oracle_models()
# queries Anthropic/OpenAI/xAI/hosted /v1/models with YOUR keys, cached hourly
Routing is by model-id prefix, so a model released tomorrow works without updating the package. The static table in config.py is an offline fallback only.
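Prefix routing of this kind can be sketched in a few lines. `infer_vendor` and the vendor labels are hypothetical, and the fallback to a hosted route for unrecognized ids is an assumption; the package's own lookup may differ.

```python
PREFIX_TO_VENDOR = {
    "claude-": "anthropic",
    "gpt-": "openai",
    "grok-": "xai",
}


def infer_vendor(model_id: str) -> str:
    """Pick the serving vendor from the model-id prefix alone."""
    for prefix, vendor in PREFIX_TO_VENDOR.items():
        if model_id.startswith(prefix):
            return vendor
    # Assumption: aliases such as supafone-labs-oracle fall through to the hosted route.
    return "hosted"


for model in ("claude-sonnet-4-6", "gpt-4.1-mini", "grok-4-fast", "supafone-labs-oracle"):
    print(model, "->", infer_vendor(model))
```

Because the lookup never consults a list of known models, an id that did not exist when the code was written still routes correctly as long as it keeps the vendor's prefix.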
Prompts that always improve
Every finished call is scored against ground truth — did the booking the agent confirmed actually verify? Did the intake form actually send? Were the end-of-call claims backed by tool results? Reports flow in automatically; one call to the optimizer turns them into a better standing directive — a persistent coaching preamble injected into every future call. Your base prompt is never touched.
# Python: reports happen automatically on session end (agent_label groups them)
brain = supafone_labs.SupafoneLabs(provider="vapi", agent_label="intake")
# Shell: run one optimization step whenever you like (billed as one oracle call)
curl -X POST $API/v1/optimizer/improve -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent": "intake"}'
# -> {"version": 4, "text": "Verify every booking before confirming. ...",
#     "rationale": "Recent calls confirmed unverified bookings."}
# the package picks the new version up automatically on the next call
GET /v1/optimizer/standing?agent=intake  # current version
GET /v1/optimizer/reports?agent=intake   # the scored call history
Why this design: your system prompt lives wherever it lives — a Vapi dashboard field, a code template, a DB row. Supafone Labs never needs to own it, because the whisper channel is a prompt surface it already controls on every platform. The optimizer improves that layer; your prompt stays yours.
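To make the scoring idea concrete, here is a toy version of checking end-of-call claims against tool results. It is not the service's scoring method: `score_call`, the claim labels, and the report fields are all hypothetical.

```python
def score_call(claims: list, tool_results: dict) -> dict:
    """Compare what the agent claimed with what its tool calls actually returned."""
    # A claim counts as verified only if a tool result explicitly confirms it.
    unverified = [claim for claim in claims if not tool_results.get(claim, False)]
    score = 1.0 if not claims else 1 - len(unverified) / len(claims)
    return {"score": score, "unverified_claims": unverified}


report = score_call(
    claims=["booking_confirmed", "intake_form_sent"],
    tool_results={"booking_confirmed": False, "intake_form_sent": True},
)
print(report)
```

A report like this one, where the agent confirmed a booking that never verified, is the kind of evidence that could justify a standing directive such as "Verify every booking before confirming."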
Custom prompts
brain = supafone_labs.SupafoneLabs(
    provider="ultravox",
    oracle_instructions="Coach for a bilingual personal-injury intake desk. "
                        "Prioritize empathy before logistics; never quote fees.",
    # or replace the cores wholesale:
    directive_prompt="You are the coaching core... Return ONLY JSON with keys ...",
    belief_prompt="You are the perception core... Return ONLY JSON with keys ...",
)
oracle_instructions appends operator guidance to both cores; the full prompt overrides are for when you want a different oracle personality entirely. Custom prompts, custom model, custom guardrails — same degrade-safe loop.
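One way a caller might keep its own side of the loop degrade-safe is to bound observe() with a timeout and treat any failure as "no whisper". This is a sketch of that idea, not the package's behavior: `observe_safely`, the 50 ms budget, and the stand-in brain classes are all hypothetical.

```python
import asyncio
from types import SimpleNamespace


class SlowBrain:
    """Stand-in for a brain whose observe() overruns the budget."""
    async def observe(self, event):
        await asyncio.sleep(0.2)
        return SimpleNamespace(actions=["late whisper"])


class FastBrain:
    """Stand-in for a brain that answers within the budget."""
    async def observe(self, event):
        return SimpleNamespace(actions=["on-time whisper"])


async def observe_safely(brain, event, timeout_s: float = 0.05):
    """Return brain.observe(event), or an empty result on timeout or error."""
    try:
        return await asyncio.wait_for(brain.observe(event), timeout=timeout_s)
    except Exception:  # includes asyncio.TimeoutError
        return SimpleNamespace(actions=[])


slow = asyncio.run(observe_safely(SlowBrain(), {"type": "message"}))
fast = asyncio.run(observe_safely(FastBrain(), {"type": "message"}))
print(slow.actions, fast.actions)
```

With this wrapper a slow or failing supervisor costs the call a missed whisper at most, never a stalled turn.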