1 · A fast talker, a slow reasoner
The model on the latency clock cannot also be the model that deliberates. Kahneman's dual-process theory, built as an agent.
2024 · Google DeepMind
Agents Thinking Fast and Slow: A Talker-Reasoner Architecture
A fast conversational "Talker" paired with a slower "Reasoner" that plans and maintains the beliefs the Talker acts on. The closest published precedent for this product.
In Supafone Labs: your voice agent is the Talker; the oracle is the Reasoner; the belief state is the shared memory.
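The split described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation or Supafone's: `BeliefState`, `reasoner_update`, and `talker_reply` are hypothetical names, and the keyword check stands in for a real (slow) model call.

```python
from dataclasses import dataclass, field


@dataclass
class BeliefState:
    """Shared memory: written by the Reasoner, read by the Talker."""
    facts: dict = field(default_factory=dict)
    next_step: str = "find out what the caller needs"


def reasoner_update(belief: BeliefState, transcript: list[str]) -> None:
    """Slow path. A keyword rule stands in for a deliberate model call."""
    last = transcript[-1].lower() if transcript else ""
    if "refund" in last:
        belief.facts["intent"] = "refund"
        belief.next_step = "ask for the order number"


def talker_reply(belief: BeliefState, user_utterance: str) -> str:
    """Fast path. Replies at once from whatever beliefs exist right now."""
    return f"Understood. Next I will {belief.next_step}."
```

The Talker never waits for the Reasoner: it reads the belief state as it stands, and the Reasoner's update shows up in a later turn.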
2020 · IBM Research
Thinking Fast and Slow in AI
The standard position paper mapping System 1 / System 2 onto AI design: fast reactive components, slow deliberative supervision.
2 · Supervision must be external
A model grading its own live output — with no outside signal — often makes things worse. That's the argument for a second mind, not a longer prompt.
2023 · ICLR 2024 · the key negative result
Large Language Models Cannot Self-Correct Reasoning Yet
Intrinsic self-correction without external feedback frequently degrades performance. External feedback is what makes correction work.
In Supafone Labs: the supervisor is a separate model with separate context and ground-truth tool results — exactly the external signal this paper says is required.
2023 · CMU / AI2
Self-Refine: Iterative Refinement with Self-Feedback
A single model iteratively critiques and revises its own output, improving results across tasks. The supervisor applies the same refine-from-feedback mechanism from outside.
2023 · Northeastern / MIT
Reflexion: Language Agents with Verbal Reinforcement Learning
Agents that store verbal reflections on failures and improve later attempts — the memory-of-mistakes pattern behind post-call reports.
3 · A second model checking the first works
Across math, reasoning, and safety: a dedicated checker catches errors the generator misses, in some cases even when the checker is the smaller model.
2021 · OpenAI
Training Verifiers to Solve Math Word Problems
A separate verifier ranking a generator's outputs beats just making the generator bigger — the original generator/verifier split.
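The generator/verifier split amounts to sampling several candidates and keeping the one the verifier scores highest. A minimal sketch, assuming a hypothetical `toy_verifier`; in the paper the verifier is a trained model, not a hand-written rule.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], verifier: Callable[[str], float]) -> str:
    """Return the candidate the verifier scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=verifier)


def toy_verifier(solution: str) -> float:
    """Hypothetical verifier: 1.0 if the final answer to 12 * 7 is right, else 0.0."""
    answer = solution.rsplit("=", 1)[-1].strip()
    return 1.0 if answer == "84" else 0.0


# Stand-ins for samples drawn from a generator model.
samples = ["12 * 7 = 74", "12 * 7 = 84", "12 * 7 = 94"]
chosen = best_of_n(samples, toy_verifier)
```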
2023 · OpenAI
Let's Verify Step by Step
Supervising each step of a process beats judging only the final outcome.
In Supafone Labs: the oracle judges every turn of the call, not just the end-of-call summary.
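The difference between judging every turn and judging only the outcome can be shown with a toy call. This is a sketch under stated assumptions: `toy_judge` is a hypothetical keyword rule standing in for a judging model, and the function names are illustrative.

```python
from typing import Callable, Optional


def first_bad_turn(turns: list[str], judge: Callable[[str], bool]) -> Optional[int]:
    """Process supervision: check every turn, return the index of the first failure."""
    for i, turn in enumerate(turns):
        if not judge(turn):
            return i
    return None


def outcome_ok(turns: list[str], judge: Callable[[str], bool]) -> bool:
    """Outcome supervision: look only at how the call ended."""
    return judge(turns[-1])


def toy_judge(turn: str) -> bool:
    """Hypothetical rule: a turn fails if it makes a promise the agent cannot back."""
    return "guarantee" not in turn.lower()


call = [
    "Hello, how can I help?",
    "We guarantee a refund today.",
    "Is there anything else?",
]
```

Here the outcome check passes because the call ends politely, while the per-turn check flags turn 1.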
2025 · OpenAI
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
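A weaker LLM monitoring a stronger reasoning model's chain-of-thought catches reward hacking during real training runs. The paper also warns that optimizing directly against the monitor teaches the model to hide its intent.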
In Supafone Labs: the oracle can be a small, cheap model and still catch a bigger agent's failures.
2023 · Meta
Shepherd: A Critic for Language Model Generation
A purpose-trained 7B critic that critiques other models' outputs at quality competitive with far larger judges.
4 · Models overseeing models, at inference time
Including Sakana AI's adjacent work: models that correct and build on each other, with no retraining, can beat any single model working alone.
2025 · Sakana AI
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search
AB-MCTS (open-sourced as TreeQuest): multiple frontier models cooperate at inference time, correcting each other's attempts — the combination beat every individual model on ARC-AGI-2.
In Supafone Labs: the same principle at call time — two models on one problem beat one model alone.
2025 · Sakana AI
Reinforcement Learning Teachers of Test Time Scaling
A "teacher" model trained not to solve tasks but to guide a student model — Sakana's closest work to a dedicated helper-model role.
2025 · Sakana AI · Institute of Science Tokyo
Transformer²: Self-adaptive LLMs
A two-pass inference framework: a dispatcher identifies the task, then RL-trained "expert" vectors reweight the model's singular values in real time — the model adapts itself per request, beating LoRA with fewer parameters.
In Supafone Labs: adaptation at inference time, not training time — the same philosophy as whisper injection.
2024 · Sakana AI
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Includes an automated LLM reviewer critiquing another model's generated papers with near-human accuracy.
v2 (2025) adds agentic tree search.
2023 · MIT / Google
Improving Factuality and Reasoning in Language Models through Multiagent Debate
Having model instances critique each other's answers measurably improves factuality; disagreement between models is signal.
2018 · OpenAI
AI safety via debate
Agents exposing each other's flaws for a judge — the foundational adversarial-oversight framing.
2022 · Anthropic
Constitutional AI: Harmlessness from AI Feedback
Models critiquing and revising outputs against explicit written principles — the pattern behind operator guardrails enforced by a critic.
5 · Prompts that improve from measured feedback
Score real outcomes, hand them to a critic, get a better prompt, version it. The standing-directive loop is this lineage, run on live calls.
2025 · Sakana AI · UBC
Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents
Agents that rewrite their own code, keep the variants that empirically perform better, and improve open-endedly: SWE-bench 20.0% → 50.0%, Polyglot 14.2% → 30.7%. Strong evidence that agents improve when graded against real outcomes.
In Supafone Labs: the same evolutionary shape at prompt scale — versioned standing directives, kept only when measured call scores improve.
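The keep-only-if-it-measures-better rule can be sketched as a small version history. This is an illustrative assumption about the shape of the loop, not Supafone's data model: `DirectiveVersion` and `consider` are hypothetical names, and the scores are made up.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DirectiveVersion:
    version: int
    text: str
    mean_score: float


def consider(history: list[DirectiveVersion], candidate_text: str,
             call_scores: list[float]) -> bool:
    """Append the candidate as a new version only if its measured mean beats the current one."""
    current = history[-1]
    candidate_score = mean(call_scores)
    if candidate_score > current.mean_score:
        history.append(DirectiveVersion(current.version + 1, candidate_text, candidate_score))
        return True
    return False


history = [DirectiveVersion(1, "Be helpful.", 0.6)]
rejected = consider(history, "Be brief.", [0.5, 0.6])
accepted = consider(history, "Confirm the order number first.", [0.7, 0.9])
```

A candidate that does not beat the current version leaves the history untouched, so the active directive never regresses on the measured score.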
2023 · Google DeepMind
Large Language Models as Optimizers
OPRO: an LLM iteratively proposes better prompts from a trajectory of scored attempts.
In Supafone Labs: /v1/optimizer/improve is OPRO over your post-call reports — scores in, better standing directive out.
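An OPRO-style loop shows scored past attempts to a proposer model and asks for something better. The sketch below is a minimal illustration and says nothing about how `/v1/optimizer/improve` is implemented or called: `build_meta_prompt`, `optimize`, `propose`, and `score` are hypothetical names, with `propose` standing in for an LLM call and `score` for measured call outcomes.

```python
from typing import Callable

Trajectory = list[tuple[str, float]]


def build_meta_prompt(trajectory: Trajectory, keep: int = 5) -> str:
    """List the best `keep` directives in ascending score order, then ask for a better one."""
    top = sorted(trajectory, key=lambda pair: pair[1])[-keep:]
    shown = [f"directive: {text}\nscore: {score:.2f}" for text, score in top]
    return (
        "Previous directives and their scores, worst to best.\n\n"
        + "\n\n".join(shown)
        + "\n\nWrite a new directive that scores higher."
    )


def optimize(trajectory: Trajectory, propose: Callable[[str], str],
             score: Callable[[str], float], steps: int = 3) -> tuple[str, float]:
    """Propose, score, and record `steps` new directives; return the best pair seen."""
    trajectory = list(trajectory)  # copy so the caller's list is not mutated
    for _ in range(steps):
        candidate = propose(build_meta_prompt(trajectory))
        trajectory.append((candidate, score(candidate)))
    return max(trajectory, key=lambda pair: pair[1])
```

Sorting worst to best puts the strongest examples closest to the instruction, which is the ordering the OPRO paper uses in its meta-prompt.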
2023 · Stanford
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Prompts as parameters optimized against metrics, not hand-tuned strings.
2024 · Stanford
TextGrad: Automatic "Differentiation" via Text
Natural-language feedback backpropagated as "textual gradients" through compound AI systems.
2023 · Microsoft
Automatic Prompt Optimization with "Gradient Descent" and Beam Search
ProTeGi: critiques of concrete errors act as gradients that edit the prompt.
6 · The industry already puts a model beside the model
Production guardrail systems converged on the same shape: a separate runtime component watching the live model's traffic.
2023 · NVIDIA
NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails
A programmable runtime layer between the user and the LLM enforcing dialogue rails.
2023 · Meta
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
A dedicated safeguard model classifying a conversational model's traffic in real time.
In Supafone Labs: the same runtime position — but instead of blocking, it coaches.
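The contrast between blocking and coaching can be sketched with the same classifier feeding two different policies. Everything here is an assumption for illustration: `classify` is a hypothetical keyword rule standing in for a safeguard model, and `Verdict` is not an API of Llama Guard or of Supafone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    allowed: bool
    note: Optional[str] = None


def classify(draft: str) -> Optional[str]:
    """Hypothetical rule standing in for a safeguard model. Returns an issue label or None."""
    if "guaranteed" in draft.lower():
        return "unsupported promise"
    return None


def blocking_guard(draft: str) -> Verdict:
    """Classic guardrail: a flagged draft is stopped."""
    issue = classify(draft)
    return Verdict(allowed=issue is None, note=issue)


def coaching_guard(draft: str) -> Verdict:
    """Coaching variant: the call continues and the agent receives a correction."""
    issue = classify(draft)
    if issue is None:
        return Verdict(allowed=True)
    return Verdict(allowed=True, note=f"Avoid the {issue}; restate what the policy says.")
```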