Inside Voice


Teaching LLMs to think before they speak by simulating an internal voice




Role: Conception, Programming and Prototyping
Duration: 3 weeks


The Space Between Prompt and Response


Months before the emergence of OpenAI's o1 and its sophisticated reasoning capabilities, I began exploring how to systematically improve LLM reasoning through structured internal dialogue. While o1 would later demonstrate what advanced reasoning in language models can look like, this early work focused on understanding and deliberately shaping how models think.

From "Let's think step by step" to chain-of-thought prompting and tree-of-thought reasoning, researchers have demonstrated that how we prompt LLMs dramatically affects their reasoning capabilities. But these techniques often focus on structuring the output rather than the thinking process itself. What if we could go deeper, understanding and enhancing the actual cognitive processes within these models?


The journey from intuition to understanding: like sunlight breaking through layers of thought


Trust Falls and Book Clubs: How Humans Really Think


Ask someone "Who do you trust most in your life?" and you'll get an immediate response - a name, a face, a feeling. But ask them "Why?" and something fascinating happens. There's a pause. You can almost see their mind shifting gears as they begin to analyze years of interactions, patterns of behavior, shared experiences, moments of reliability and vulnerability. This gap - between the immediate, emotional response and the careful, reasoned analysis - reveals something profound about how our minds work.

Take another example: "What's your favorite book?" The title comes instantly, carried on a wave of emotion and memory. But explaining why that particular book matters often leads to a deeper exploration - of literary techniques, personal growth, historical context, and how it changed your worldview. The initial response is automatic; the explanation requires a different kind of thinking entirely.


Fast and Slow: The Two Systems of Human Thought


Our minds employ two distinct thinking systems (beautifully detailed in Daniel Kahneman's "Thinking, Fast and Slow"). System 1 is our fast, intuitive processor - recognizing faces, understanding social situations, making split-second decisions. When you instantly know who you trust most, that's System 1. System 2 is our analytical engine - slower, deliberative, energy-intensive. When you break down why you trust someone, examining specific instances and counterexamples - that's System 2 taking over. These systems work in concert: System 1 provides quick judgments and pattern recognition, while System 2 examines and refines those initial responses. Neither is superior; they serve different but equally crucial purposes in cognition.


The Power of Talking to Yourself


What's particularly fascinating is the role of internal dialogue in bridging these two modes of thought. When we "think things through," we're not just being methodical - we're literally having a conversation with ourselves. This internal dialogue serves multiple purposes:
  • It helps recruit relevant information from different parts of our memory
  • It creates structured paths for examining our own thinking
  • It forces us to articulate and therefore clarify our reasoning
  • It allows us to catch and correct our own biases and logical flaws


Breaking the Prompt Barrier


While techniques like chain-of-thought and tree-of-thought have shown the value of structured thinking, they often operate at the surface level - providing a framework for organizing outputs. What we needed was a way to engage the model's deeper cognitive processes, to activate and utilize the vast knowledge embedded in its latent space.

These observations led me to develop a system that mirrors this cognitive architecture. Instead of just prompting for answers, my approach creates an internal dialogue that activates both systems. The fascinating part is how this dialogue seems to work - it's not about creating consciousness or qualia, but about methodically activating and organizing knowledge embedded in the model's latent space. Here's how it looks in practice:
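
A minimal sketch of the loop, assuming an OpenAI-style chat client; the prompt wording, function names, and model choice here are illustrative, not the production system:

```python
# A minimal sketch of the internal-dialogue loop. The client interface,
# prompt wording, and model choice are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def chat(system: str, user: str) -> str:
    """One model turn; the 'inner voice' is simply additional turns."""
    response = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model works for this sketch
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return response.choices[0].message.content

def inside_voice(question: str) -> str:
    # Stage 1: fast, intuitive answer (the System 1 analogue).
    intuition = chat(
        "Answer immediately and briefly, from first impression alone.",
        question,
    )
    # Stage 2: deliberate self-examination (the System 2 analogue).
    critique = chat(
        "You are the speaker's inner voice. Examine the intuitive answer: "
        "what supports it, what biases or counterarguments does it ignore, "
        "what relevant knowledge has not been recruited yet?",
        f"Question: {question}\nIntuitive answer: {intuition}",
    )
    # Stage 3: synthesis into the reply the user actually sees.
    return chat(
        "Merge the intuition and the critique into one considered answer. "
        "Keep what survived scrutiny; drop what did not.",
        f"Question: {question}\nIntuition: {intuition}\nCritique: {critique}",
    )
```

Only the final synthesis is returned; the intuition and the critique stay internal, which is where the project gets its name.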




From Curiosity to Testing


My journey with this approach started with simple curiosity. I found myself wondering: does this internal dialogue actually improve the quality of thinking, or am I just making the process more complex? The only way to find out was to test it.

I started with informal experiments, working with a small group of colleagues (n=5) who helped evaluate the system's responses across different types of questions. We weren't aiming for rigorous academic validation - this was more like a group of curious researchers poking at an interesting idea to see what happened.
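
A simple way to run that kind of comparison is a blind A/B harness; everything in this sketch (question set, function names, rating flow) is hypothetical, shown only to indicate the shape of the exercise:

```python
# Hypothetical harness for blind A/B comparison of a baseline prompt versus
# the internal-dialogue pipeline. Nothing here reflects the original study.
import random

def blind_pairs(questions, baseline_fn, dialogue_fn, seed=0):
    """Yield (question, shuffled answers, hidden labels) so a rater can pick
    the better answer without knowing which system produced it."""
    rng = random.Random(seed)
    for q in questions:
        pair = [("baseline", baseline_fn(q)), ("inside_voice", dialogue_fn(q))]
        rng.shuffle(pair)
        labels = [label for label, _ in pair]  # set aside for unblinding
        yield q, [text for _, text in pair], labels
```

Each rater records a preference per pair; unblinding afterwards shows which system those preferences favored.
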
The results surprised us. While not scientifically conclusive, we noticed consistent patterns that suggested we were onto something interesting.




The most fascinating improvements weren't in the accuracy of answers but in the quality of reasoning. The system started showing something that looked remarkably like careful deliberation - catching its own biases, considering counterarguments, and often arriving at more nuanced conclusions.


Technical Implementation


The core of the implementation lies in structuring the dialogue to progressively build context and activate different aspects of the model's knowledge. Each stage of the dialogue is carefully designed to engage different processing modes:
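
The original stage prompts aren't reproduced here, but the pipeline's shape is easy to convey: each stage appends to a running transcript, so later stages reason over everything the earlier ones surfaced. The stage names and instructions below are illustrative assumptions:

```python
# Shape of the staged pipeline: every stage reads the accumulated transcript,
# so context builds progressively. Stage names and prompts are illustrative.
from openai import OpenAI

client = OpenAI()

STAGES = [
    ("recruit",   "List every fact, memory, or concept that seems even "
                  "loosely relevant. Do not answer yet."),
    ("structure", "Organize what was recruited into lines of argument for "
                  "and against each candidate answer."),
    ("challenge", "Attack the strongest line of argument: name its biases, "
                  "missing evidence, and plausible counterexamples."),
    ("resolve",   "Weigh everything above and give the final, considered "
                  "answer with a short justification."),
]

def run_dialogue(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    reply = ""
    for name, instruction in STAGES:
        # Each stage instruction is injected into the same conversation,
        # so the model sees its own earlier output as context.
        messages.append({"role": "system", "content": f"[{name}] {instruction}"})
        reply = client.chat.completions.create(
            model="gpt-4o", messages=messages
        ).choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
    return reply  # output of the final "resolve" stage
```

Because the transcript persists across stages, a "challenge" stage can attack arguments the "structure" stage actually made, rather than a generic restatement of the question.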


Looking Forward


This work opens up fascinating questions about artificial intelligence and cognition. Can we develop even more sophisticated ways to engage both systems of thinking? How can we better balance intuitive and analytical processing? As models like o1 demonstrate increasingly sophisticated reasoning capabilities, understanding and enhancing their cognitive processes becomes ever more crucial.