The Polyglot Mind of AI: Why Models Show 'Thoughts' in Multiple Languages

June 23, 2025

A curious phenomenon has been noted by users of advanced AI models such as OpenAI's o3-pro and Google's Gemini: non-English words or phrases appearing within their 'thought processes,' or intermediate reasoning steps, even when the interaction is primarily in English. For instance, a model might insert Bengali or Hindi words mid-reasoning. This has prompted questions about why it occurs and whether it is an error. The consensus that emerged from the discussion is that it is an inherent characteristic of how these models work rather than a malfunction.

Why AI 'Thinks' in Multiple Languages

Several key factors contribute to this behavior:

  • It's Not Necessarily an 'Error': Many experts and users propose that this is not a bug, but rather an artifact of how these complex systems are built and operate. LLMs don't 'think' in a single, fixed language like humans often perceive themselves to do.
  • Multilingual by Design: These models are trained on vast datasets containing text from numerous languages. This multilingual training naturally equips them to process and generate text in various tongues.
  • Internal Efficiency (Tokens and Latent Space): The way LLMs represent and process information internally (their 'latent space') can mean that certain concepts or sequences are more efficiently encoded, or more easily reached, using tokens from a language other than the user's input or expected output language. As one commenter put it, the model may take a path through other character sets during output decoding if the token probabilities justify it.
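The efficiency point can be made concrete with a toy example. Real tokenizers are learned from data, but the sketch below uses a tiny hypothetical vocabulary and a greedy longest-match tokenizer to show how the same concept can cost three tokens in one language and one in another, which is the kind of asymmetry that could nudge a probability-driven decoder toward a different script:

```python
# Toy illustration with a HYPOTHETICAL vocabulary (not any real model's
# tokenizer): the same concept can cost a different number of tokens
# depending on language, so a decoder that follows token probabilities
# may find the "cheaper" language's tokens mid-reasoning.

TOY_VOCAB = {
    # English subword pieces
    "un", "break", "able",
    # A single Hindi word for the same concept (illustrative entry)
    "अटूट",
}

def greedy_tokenize(text, vocab):
    """Greedy longest-match tokenization over a toy vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest possible piece starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to single characters
            i += 1
    return tokens

print(greedy_tokenize("unbreakable", TOY_VOCAB))  # ['un', 'break', 'able']
print(greedy_tokenize("अटूट", TOY_VOCAB))          # ['अटूट']
```

Three tokens versus one: if each token is a separate sampling step, the single-token path is a shorter route through the model's output space, even though both spell out the same idea.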

Insights from Research: The DeepSeek R1 Case

The DeepSeek R1 paper (arXiv:2501.12948) was frequently referenced in the discussion. Key findings from this paper include:

  • Observation of 'thought language mixing' as a natural occurrence in their model's reasoning process.
  • Experiments that suppressed this language mixing (via a language-consistency reward during training) led to a slight decline in the model's performance on its primary reasoning tasks.
  • This suggests that allowing the model to use its full multilingual capabilities internally can be beneficial for reasoning, even if the intermediate steps appear less tidy to a human observer.

Parallels with Human Multilingualism

Many contributors drew analogies to how multilingual humans often behave:

  • Code-Switching: Bilingual or multilingual individuals frequently switch between languages within a single conversation, sometimes even mid-sentence, especially if they know their interlocutor is also multilingual. This is often done for efficiency, to express a concept that has a more precise or concise term in another language, or because a particular topic was learned or is usually discussed in that language (e.g., technical terms often remaining in English for non-native English speakers).
  • Intentional Mixed-Language Interaction: Some users reported intentionally using mixed languages when interacting with models like ChatGPT and observed the models reciprocating this behavior, sometimes even adopting the secondary language for conversation titles if initiated with a non-English greeting.

The Nature of 'Thought Process' Displays

It's also important to consider what is being shown when a model displays its 'thoughts.'

  • One commenter, NoahZuniga, suggested that for models like o3-pro, the visible 'thought process' is actually a summary generated by another model. If this summarization model makes an error or is less constrained, it might pass through or even introduce multilingual elements.
  • OpenAI has previously mentioned using summarizer or sanitizer models for Chain-of-Thought traces before displaying them to users. Relaxing these sanitization steps could also lead to more raw, multilingual outputs becoming visible.
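To make the summarizer idea concrete, here is a minimal sketch of such a display pipeline. The function names and the script-detection heuristic are hypothetical, illustrating the mechanism rather than OpenAI's actual implementation: if the summarizer passes raw trace lines through largely verbatim, any mixed-language fragments in the hidden chain of thought survive into what the user sees.

```python
import re

def summarize_trace(raw_cot: str, max_lines: int = 3) -> str:
    """Stand-in for a summarizer model: keep only the first few
    non-empty steps of the raw chain-of-thought trace."""
    lines = [ln for ln in raw_cot.splitlines() if ln.strip()]
    return "\n".join(lines[:max_lines])

def contains_devanagari_or_bengali(text: str) -> bool:
    """Crude check for the scripts discussed in the article
    (Devanagari U+0900-U+097F, Bengali U+0980-U+09FF)."""
    return bool(re.search(r"[\u0900-\u09FF]", text))

# A hypothetical raw trace with one mixed-language step
# ("सोचो" is Hindi for "think").
raw = (
    "Step 1: parse the question\n"
    "सोचो: compare both options\n"
    "Step 3: pick the stronger answer"
)

summary = summarize_trace(raw)
# The summarizer kept the Hindi fragment, so it reaches the user.
print(contains_devanagari_or_bengali(summary))  # True
```

A stricter sanitizer could filter or translate such lines before display; relaxing that step is exactly what would make the raw multilingual reasoning visible.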

Why Specific Languages like Hindi/Bengali?

The prominence of languages like Hindi and Bengali in these observations was attributed to:

  • Large Volume of Training Data: The internet contains substantial amounts of text in these languages.
  • Bilingual User Base and Data: India, for example, has a large population of bilingual (or multilingual) English speakers who also use Hindi, Bengali, and other regional languages. Their online interactions, which form part of the training data, naturally include code-switching.

Broader Implications and Perspectives

  • Linguistic Relativity: Some pondered whether LLMs are, in a sense, discovering a form of linguistic relativity, where certain concepts are 'easier' or more natural to express or process in one language than another. One user noted getting more 'colorful' answers in Finnish about specific niche topics.
  • A Contrasting View: At least one commenter expressed concern, viewing this as a problem of control. They argued that if an LLM incorporates unwanted linguistic behaviors, it's difficult to make it 'unlearn' them, unlike deleting data from a traditional database.
  • Research Opportunities: Studying these multilingual reasoning patterns could offer insights into how LLMs represent concepts across languages and potentially even into language-specific or culture-specific reasoning patterns.

In conclusion, the appearance of non-English 'thoughts' in LLMs is a complex behavior stemming from their training, architecture, and optimization goals. While it can be surprising, it's often a sign of the model leveraging its full capabilities. Further research and transparency from model developers will continue to shed light on these fascinating internal workings.
