Supafone Labs Docs

Why Supafone Labs exists

TL;DR

A voice agent is one mind on a stopwatch. To sound human it answers in under a second — so the model that talks can never afford to think. Yet everything that decides whether a call succeeds is thinking: hearing distress in a voice, catching a switch to Spanish, stopping the agent from confirming a booking whose API call silently failed. That's an architecture problem, not a prompt problem.

Supafone Labs is the supervisor with a headset: a slower second mind that rides beside the call instead of inside its latency budget, reads every turn, and slides a silent note back through your platform's own channel. The caller never hears it. The agent reads it mid-call.

No prompt surgery. Prompts freeze at call-start; the moment that matters is the one they didn't anticipate. Supafone Labs corrects the agent live, with no redeploys.

Every platform, one contract. One canonical event in, one whisper out, compiled to whatever you run: 13 adapters across S2S, pipeline, and framework stacks.

It improves itself. Calls are scored against ground truth; the optimizer rewrites a standing directive it owns on every platform. Your base prompt is never touched.

Degrade-safe by construction. The oracle runs behind a timeout, off the hot path. A stalled model yields no note and the call proceeds exactly as before. Tested, not promised.
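
A minimal sketch of the degrade-safe idea, assuming an asyncio-based oracle call. The names `whisper_or_none`, `stalled_oracle`, `fast_oracle`, and the 0.05-second budget are hypothetical and only illustrate the pattern; they are not the package's internals.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def whisper_or_none(
    ask_oracle: Callable[[str], Awaitable[str]],  # hypothetical oracle call
    transcript: str,
    timeout_s: float,
) -> Optional[str]:
    """Return the oracle's note, or None if it stalls or fails."""
    try:
        return await asyncio.wait_for(ask_oracle(transcript), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None  # stalled model: no note, the call proceeds unchanged
    except Exception:
        return None  # an oracle error must never reach the live call


async def stalled_oracle(transcript: str) -> str:
    await asyncio.sleep(10)  # simulates a model that does not answer in time
    return "too late"


async def fast_oracle(transcript: str) -> str:
    return "Acknowledge the injury before logistics."


note_from_stalled = asyncio.run(whisper_or_none(stalled_oracle, "hello", 0.05))
note_from_fast = asyncio.run(whisper_or_none(fast_oracle, "hello", 0.05))
```

In a live call the coroutine would typically be scheduled as a background task, so the talking agent never waits on it.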

This isn't a hunch; it's an assembly of peer-reviewed results: dual-process talker/reasoner agents (DeepMind); the finding that models can't reliably self-correct, which is why the supervisor must be external; generator/verifier splits (Cobbe 2021, Lightman 2023, Baker 2025); inference-time multi-model oversight (Sakana AB-MCTS); and feedback-driven prompt optimization (OPRO, DSPy, TextGrad). All 24 papers are annotated on the research page, and the full synthesis is in the whitepaper (PDF).

The whole integration is three steps: ① get a key, ② feed Supafone Labs your platform's events, ③ deliver the whisper it hands back. First-class Python and TypeScript. That's it — everything below is detail.

The SupafoneLabs class

Everything is one object. You describe the shape of events coming in (the ears) and the platform to whisper out to (the mouth); the reasoning core between them is identical on every stack.

import supafone_labs

brain = supafone_labs.SupafoneLabs(
    provider="deepgram",       # EARS — shape of the raw events coming IN
    inject_via="ultravox",     # MOUTH — native format to compile whispers OUT to
    scenario="legal_intake",   # a named prompt profile layered under the reasoning core
    oracle_instructions="Never quote fees. Acknowledge injury before logistics.",
    mode="return",             # observe() returns the action; it does not auto-deliver it
)

result = await brain.observe(raw_event)   # feed one platform event
whisper = result.actions[0] if result.actions else None
# deliver `whisper` through your platform's channel — or set mode="apply" to auto-inject

The algorithm, in five steps — the same loop for all 13 platforms:

raw event  ─▶  ingest adapter   parses that provider's format        (EARS · provider=)
           ─▶  reasoning core   belief_prompt  → perception state
                                directive_prompt → coaching decision   (identical everywhere)
           ─▶  egress adapter   compiles to the native channel        (MOUTH · inject_via=)
           ─▶  result.actions[0]  the silent whisper — never spoken

Full architecture diagram (SVG) →
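
A toy, synchronous sketch of that loop, to make the data flow concrete. Everything here is a hypothetical stand-in: the `INGEST` and `EGRESS` tables, the `Action` and `Result` classes, and the keyword check inside `reasoning_core` (the real cores are LLM prompts, and the real `observe()` is async).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    payload: dict  # native message for the target platform


@dataclass
class Result:
    actions: List[Action] = field(default_factory=list)


# Hypothetical ingest adapters: provider format -> canonical event.
INGEST: Dict[str, Callable[[dict], dict]] = {
    "generic": lambda raw: {"role": raw["role"], "text": raw["text"]},
}

# Hypothetical egress adapters: directive text -> native payload.
EGRESS: Dict[str, Callable[[str], dict]] = {
    "generic": lambda text: {"role": "system", "content": text},
}


def reasoning_core(event: dict) -> str:
    """Stand-in for belief_prompt + directive_prompt; returns "" for no note."""
    belief = {"distressed": "hurt" in event["text"].lower()}  # perception state
    return "Acknowledge the injury before logistics." if belief["distressed"] else ""


def observe(raw: dict, provider: str = "generic", inject_via: str = "") -> Result:
    event = INGEST[provider](raw)                         # EARS
    directive = reasoning_core(event)                     # identical on every stack
    if not directive:
        return Result()                                   # nothing to whisper
    payload = EGRESS[inject_via or provider](directive)   # MOUTH, defaults to provider
    return Result(actions=[Action(payload)])


result = observe({"role": "user", "text": "I got hurt in the crash"})
```

Only the two lookup tables vary per platform in this sketch, which mirrors the claim that the reasoning core is the same everywhere.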

Parameter	What it controls
`provider`	Ears. Which adapter parses the raw events you feed `observe()`. Auto-detected from the agent object when omitted.
`inject_via`	Mouth. The platform whose native channel the whisper is compiled to. Defaults to `provider` — set it when you tap one stack (e.g. a Deepgram audio fork) but inject into another (e.g. Ultravox).
`scenario`	A named prompt profile (`legal_intake`, `medical_frontdesk`, `sales_outbound`, `support`, `generic`) layered under the reasoning core as guardrails.
`oracle_instructions`	Free-text coaching rules folded into the directive prompt — your firm's non-negotiables.
`mode`	`"apply"` (default) auto-injects through the resolved channel; `"return"` hands you `result.actions` to deliver yourself.
`oracle_model`	Any `claude-*` / `gpt-*` / `grok-*` id, or a hosted alias. The serving vendor is inferred from the prefix.

Same code, three modes: with SUPAFONE_LABS_API_KEY the oracle, TTS, and multilingual STT run hosted; without it they run on your own vendor keys; with neither, on deterministic offline fakes. Per-platform examples below.

Quickstart

# 1. pip install "supafone-labs[all]"   (quoted so shells like zsh don't expand the brackets)
# 2. three lines:
import supafone_labs

brain = supafone_labs.supercharge(my_agent)      # platform auto-detected
result = await brain.observe(raw_event)       # feed your platform's events
# 3. result.actions[0] is the ready-to-send native whisper. Deliver it. Done.

// No SDK needed — it's one fetch.
const API = "https://api.labs.supafone.ai";

const { text } = await fetch(`${API}/v1/oracle/complete`, {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.SUPAFONE_LABS_API_KEY}`,
             "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "supafone-labs-oracle",
    messages: [
      { role: "system", content: "You are a silent supervisor for a live voice agent. Return ONE short correction." },
      { role: "user", content: transcriptSoFar },
    ],
  }),
}).then(r => r.json());

// `text` is the whisper — inject it via your platform's silent channel.

Auth & free minutes

Every request authenticates with Authorization: Bearer sl_live_…. Self-serve signup grants 5 free minutes, no card. The key is the account — check balance, usage, and logs in the console.

Billing by the minute

Action	Billed
Oracle completion	1 second per call
TTS	≈ seconds of speech produced (chars ÷ 15)
Prerecorded STT	audio duration
Live STT session	wall-clock session time

Top up with a $10 / 400-minute pack or subscribe for $49 / 2,000 minutes monthly. BYO vendor keys always work in the open-source package — the cloud is convenience, not lock-in.
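
A rough estimator for the billing table above, as a sketch only. It assumes usage is summed in seconds and converted to minutes with no rounding or minimum increments, which these docs do not specify; the `Usage` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    oracle_calls: int = 0
    tts_chars: int = 0
    prerecorded_stt_s: float = 0.0
    live_stt_s: float = 0.0


def billed_seconds(u: Usage) -> float:
    return (
        u.oracle_calls * 1.0      # 1 second per oracle completion
        + u.tts_chars / 15        # approx. seconds of speech (chars / 15)
        + u.prerecorded_stt_s     # audio duration
        + u.live_stt_s            # wall-clock session time
    )


def billed_minutes(u: Usage) -> float:
    return billed_seconds(u) / 60


PACK_PRICE_PER_MIN = 10 / 400            # $10 top-up pack, 400 minutes
SUBSCRIPTION_PRICE_PER_MIN = 49 / 2000   # $49 monthly plan, 2,000 minutes

# Example: 120 oracle calls, 900 characters of TTS, a 10-minute live STT session.
usage = Usage(oracle_calls=120, tts_chars=900, live_stt_s=600)
minutes = billed_minutes(usage)          # 120 + 60 + 600 = 780 s = 13 minutes
```

At the pack rate that example comes to 13 × $0.025 = $0.325.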

curl -X POST $API/v1/signup -H "Content-Type: application/json" \
  -d '{"email": "you@company.com"}'
# -> { "key": "sl_live_…", "free_minutes": 5.0 }

POST /v1/oracle/complete

Hosted LLM completion, routed by model. supafone-labs-oracle always resolves to the current best default; you can also name any Claude / GPT / Grok model. Routing is by prefix, so new models work the day they ship. GET /v1/models lists what's live (the list is refreshed hourly from each vendor).

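# Python
import os
import httpx

API = "https://api.labs.supafone.ai"
KEY = os.environ["SUPAFONE_LABS_API_KEY"]
r = httpx.post(f"{API}/v1/oracle/complete",
    headers={"Authorization": f"Bearer {KEY}"},
    json={"model": "claude-sonnet-4-6", "messages": [...]})  # replace [...] with your messages
r.raise_for_status()   # surface auth or balance errors instead of a KeyError below
directive = r.json()["text"]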

POST /v1/tts

Four engines, one namespace: supafone-labs-calm-en (and friends), any aura-* voice, or engine:voice_id for deepgram, cartesia, elevenlabs, inworld. GET /v1/voices lists the catalog.

// TypeScript
const audio = await fetch(`${API}/v1/tts`, {
  method: "POST",
  headers: { Authorization: `Bearer ${KEY}`, "Content-Type": "application/json" },
  body: JSON.stringify({ voice: "elevenlabs:<voice_id>", text: "Right away." }),  // <voice_id> is a placeholder
}).then(r => r.arrayBuffer());
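
To illustrate the voice namespace, here is a small parser sketch. It is not the service's actual resolver: the mapping of `aura-*` names to the Deepgram engine and the `"hosted"` label for `supafone-labs-*` voices are assumptions, and `parse_voice` is a hypothetical helper.

```python
from typing import Tuple

ENGINES = {"deepgram", "cartesia", "elevenlabs", "inworld"}


def parse_voice(voice: str) -> Tuple[str, str]:
    """Split a voice name into (engine, voice_id)."""
    if ":" in voice:
        engine, voice_id = voice.split(":", 1)
        if engine not in ENGINES:
            raise ValueError(f"unknown engine: {engine!r}")
        if not voice_id:
            raise ValueError("voice_id is empty")
        return engine, voice_id
    if voice.startswith("aura-"):
        return "deepgram", voice   # assumption: aura-* names are Deepgram Aura voices
    if voice.startswith("supafone-labs-"):
        return "hosted", voice     # assumption: house voices resolve server-side
    raise ValueError(f"unrecognized voice: {voice!r}")
```

Validating the name client-side like this catches an empty `engine:` suffix before a request is billed.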

POST /v1/stt

# by URL
curl -X POST $API/v1/stt -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" -d '{"url": "https://…/call.wav"}'
# or raw bytes
curl -X POST $API/v1/stt -H "Authorization: Bearer $KEY" \
  -H "Content-Type: audio/wav" --data-binary @call.wav

WS /v1/stt/live — live multilingual transcription

This is the Deepgram nova-3 language=multi tap, with no Deepgram account required: stream audio in, get language-tagged Results out. The Python package uses this automatically when SUPAFONE_LABS_API_KEY is set and no DEEPGRAM_API_KEY is present.

// TypeScript
import WebSocket from "ws";
const ws = new WebSocket(
  `wss://api.labs.supafone.ai/v1/stt/live?api_key=${KEY}` +
  `&language=multi&encoding=linear16&sample_rate=16000`);
ws.on("message", (m) => {
  const d = JSON.parse(m.toString());
  const alt = d.channel?.alternatives?.[0];
  if (alt?.transcript) console.log(alt.languages, alt.transcript, d.is_final);
});
// then: ws.send(pcmChunk)  per audio frame

# Python — the package does the wiring
from supafone_labs.stt import MultilingualCallTap
tap = MultilingualCallTap(brain, session_id=call_id)
await tap.feed(track="inbound", payload_b64=frame)   # Twilio media frame
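
If you consume the live socket from Python without the package, the message handling in the TypeScript example could be mirrored roughly as follows. The field names (`channel`, `alternatives`, `languages`, `transcript`, `is_final`) are taken from that example; `parse_result` and the sample message are hypothetical.

```python
import json
from typing import List, Optional, Tuple


def parse_result(message: str) -> Optional[Tuple[List[str], str, bool]]:
    """Return (languages, transcript, is_final), or None when there is no transcript."""
    data = json.loads(message)
    alternatives = (data.get("channel") or {}).get("alternatives") or []
    if not alternatives or not alternatives[0].get("transcript"):
        return None  # metadata or an empty interim result
    alt = alternatives[0]
    return alt.get("languages", []), alt["transcript"], bool(data.get("is_final"))


# Hand-written sample in the shape the TypeScript handler reads.
sample = json.dumps({
    "is_final": True,
    "channel": {"alternatives": [
        {"transcript": "hola, necesito ayuda", "languages": ["es"]},
    ]},
})
parsed = parse_result(sample)
```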

Usage, balance, and the whisper audit log

GET /v1/usage            # today's request counts
GET /v1/billing/balance  # minutes remaining + top-up links
GET /v1/logs?limit=100   # every whispered instruction / spoken line / transcript, timestamped + billed

The logs are the audit trail: when your second mind whispered "Caller is distressed — acknowledge the injury before logistics", that exact text is in /v1/logs with a timestamp and its cost. The console renders it live.

Framework permutations

Every integration is the same three lines — construct, observe, act on the returned actions. What changes per platform is only which raw events you feed and where you deliver the compiled action. Full runnable files live in examples/.

Ultravox (speech-to-speech)

brain = supafone_labs.SupafoneLabs(provider="ultravox")
result = await brain.observe(ws_event)          # their WS data messages
# deliver: result.actions[0] -> inject_message on the Ultravox call WS

Vapi (server webhook)

@app.post("/vapi/webhook")
async def vapi(payload: dict):
    result = await brain.observe(payload)        # {"message": {...}} envelope
    if result.actions:                           # reply with assistant_override
        return {"assistant": {"firstMessageMode": "assistant-speaks-first",
                              "model": {"messages": [{"role": "system",
                                        "content": result.actions[0].payload["instruction"]}]}}}
    return {}

Retell (custom-LLM websocket)

# inside your /llm-websocket/{call_id} handler
result = await brain.observe(msg)                # interaction_type messages
if result.actions:                               # prepend before your LLM turn
    llm_messages.insert(0, result.actions[0].payload)   # {"role":"system","content":...}

ElevenLabs Agents

result = await brain.observe(frame)              # user_transcript / agent_response
if result.actions:                               # their native silent channel
    await ws.send(json.dumps(result.actions[0].payload))  # {"type":"contextual_update",...}

Deepgram Voice Agent

result = await brain.observe(msg)                # ConversationText / FunctionCallRequest
if result.actions:
    await agent_ws.send(json.dumps(result.actions[0].payload))  # {"type":"UpdatePrompt",...}

OpenAI GPT-Realtime / xAI Grok (S2S)

result = await brain.observe(event)              # GA + beta event names both parse
if result.actions:                               # patch live instructions
    await ws.send(json.dumps({"type": "session.update",
        "session": {"instructions": base_prompt + "\n" +
                    result.actions[0].payload["instructions_append"]}}))

Pipecat (frame pipeline)

result = await brain.observe({"frame": "TranscriptionFrame", "text": f.text})  # plus any other frame fields
if result.actions:                               # push a context frame
    await task.queue_frames([LLMMessagesAppendFrame(
        messages=result.actions[0].payload["messages"], run_llm=False)])

LiveKit Agents

@session.on("user_input_transcribed")
def on_transcribed(ev):
    asyncio.create_task(handle(ev))
async def handle(ev):
    result = await brain.observe({"type": "user_input_transcribed",
                                  "transcript": ev.transcript, "is_final": ev.is_final})
    if result.actions:
        chat_ctx.add_message(**result.actions[0].payload)   # role=system append

Raw audio / any SIP trunk (the tap)

tap = MultilingualCallTap(brain, session_id=call_sid)   # nova-3 multi, both tracks
await tap.feed(track=frame["track"], payload_b64=frame["payload"])
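
The `frame` dict above has to come from somewhere. For Twilio Media Streams, a sketch of pulling it out of a raw WebSocket message might look like this; `twilio_media_frame` is a hypothetical helper, and the sample message is hand-written in the shape of Twilio's `media` events.

```python
import json
from typing import Optional


def twilio_media_frame(message: str) -> Optional[dict]:
    """Return {"track", "payload"} for media messages, None for everything else."""
    data = json.loads(message)
    if data.get("event") != "media":
        return None  # "connected", "start", "stop", ... carry no audio
    media = data["media"]
    return {"track": media["track"], "payload": media["payload"]}


sample = json.dumps({
    "event": "media",
    "streamSid": "MZ-example",  # made-up identifier
    "media": {"track": "inbound", "chunk": "1", "timestamp": "5", "payload": "AAAA"},
})
frame = twilio_media_frame(sample)
```

Each non-None `frame` can then be passed to `tap.feed(track=frame["track"], payload_b64=frame["payload"])`.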

Anything else

brain = supafone_labs.SupafoneLabs(provider="my_platform")   # unknown -> generic adapter
await brain.observe({"type": "message", "role": "user", "text": "..."})

SIP providers

Supafone Labs is transport-agnostic — it needs either a transcript stream or forkable call audio, and every serious SIP/telephony layer exposes one:

Provider	How Supafone Labs taps it
Twilio	`<Stream track="both_tracks">` media streams → `MultilingualCallTap` (the production-proven path)
Telnyx	Media streaming API (WS fork) → tap
SignalWire	SWML/LaML media streams → tap
Vonage	Voice API websocket audio → tap
Plivo	AudioStream → tap
LiveKit SIP	SIP → LiveKit room → the LiveKit adapter directly (no extra STT needed)
Jambonz	listen/transcribe verbs → tap or generic webhook adapter
FreeSWITCH / Asterisk	mod_audio_fork / ARI externalMedia → tap
SIPREC-capable PBXes	SIPREC recording fork → tap (both legs arrive pre-separated)

Rule of thumb: if your voice-AI platform sits on the SIP leg (Vapi, Retell, Bland all do), use its adapter and skip the tap. If you own the SIP leg yourself, fork the audio to the tap.

Pick your second-brain model

brain = supafone_labs.SupafoneLabs(
    provider="ultravox",
    oracle_model="claude-sonnet-4-6",     # serving vendor inferred from the model-id prefix
)
# or "gpt-4.1-mini" (OpenAI), "grok-4-fast" (xAI), "supafone-labs-oracle" (hosted)

# live discovery — never hardcode a model list:
models = await supafone_labs.discover_oracle_models()
# queries Anthropic/OpenAI/xAI/hosted /v1/models with YOUR keys, cached hourly

Routing is by model-id prefix, so a model released tomorrow works without updating the package. The static table in config.py is an offline fallback only.
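
Prefix routing can be sketched in a few lines. The table and the vendor labels below are illustrative assumptions, not the contents of config.py, and `infer_vendor` is a hypothetical name.

```python
# Hypothetical prefix table; the package's real routing may differ.
PREFIX_TO_VENDOR = {
    "claude-": "anthropic",
    "gpt-": "openai",
    "grok-": "xai",
    "supafone-labs-": "hosted",
}


def infer_vendor(model_id: str) -> str:
    """Pick the serving vendor from the model id's prefix."""
    for prefix, vendor in PREFIX_TO_VENDOR.items():
        if model_id.startswith(prefix):
            return vendor
    raise ValueError(f"no vendor known for model id {model_id!r}")
```

Because only the prefix is inspected, an id the table has never seen, such as a made-up `gpt-next-preview`, still routes to the right vendor.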

Prompts that always improve

Every finished call is scored against ground truth — did the booking the agent confirmed actually verify? Did the intake form actually send? Were the end-of-call claims backed by tool results? Reports flow in automatically; one call to the optimizer turns them into a better standing directive — a persistent coaching preamble injected into every future call. Your base prompt is never touched.

# reports happen automatically on session end (agent_label groups them)
brain = supafone_labs.SupafoneLabs(provider="vapi", agent_label="intake")

# run one optimization step whenever you like (billed as one oracle call):
curl -X POST $API/v1/optimizer/improve -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" -d '{"agent": "intake"}'
# -> {"version": 4, "text": "Verify every booking before confirming. ...",
#     "rationale": "Recent calls confirmed unverified bookings."}

# the package picks the new version up automatically on the next call
GET /v1/optimizer/standing?agent=intake   # current version
GET /v1/optimizer/reports?agent=intake    # the scored call history
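
What "scored against ground truth" could mean in practice, as a sketch under stated assumptions: the report schema (`claims`, `tool_results`) and the scoring rule are hypothetical, chosen only to show claims being checked against tool outcomes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CallReport:
    claims: List[str]              # what the agent told the caller it did
    tool_results: Dict[str, bool]  # ground truth: did each action succeed?


def unverified_claims(report: CallReport) -> List[str]:
    """Claims with no successful tool result behind them."""
    return [c for c in report.claims if not report.tool_results.get(c, False)]


def score_call(report: CallReport) -> float:
    """Fraction of the agent's claims backed by a successful tool result."""
    if not report.claims:
        return 1.0  # nothing claimed, nothing unverified
    backed = len(report.claims) - len(unverified_claims(report))
    return backed / len(report.claims)


report = CallReport(
    claims=["booking_confirmed", "intake_form_sent"],
    tool_results={"booking_confirmed": False, "intake_form_sent": True},
)
score = score_call(report)
```

A list like `unverified_claims(report)` is the kind of evidence an optimizer step could turn into a directive such as "Verify every booking before confirming."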

Why this design: your system prompt lives wherever it lives — a Vapi dashboard field, a code template, a DB row. Supafone Labs never needs to own it, because the whisper channel is a prompt surface it already controls on every platform. The optimizer improves that layer; your prompt stays yours.

Custom prompts

brain = supafone_labs.SupafoneLabs(
    provider="ultravox",
    oracle_instructions="Coach for a bilingual personal-injury intake desk. "
                        "Prioritize empathy before logistics; never quote fees.",
    # or replace the cores wholesale:
    directive_prompt="You are the coaching core... Return ONLY JSON with keys ...",
    belief_prompt="You are the perception core... Return ONLY JSON with keys ...",
)

oracle_instructions appends operator guidance to both cores; the full prompt overrides are for when you want a different oracle personality entirely. Custom prompts, custom model, custom guardrails — same degrade-safe loop.