Research8h ago

AI Researchers Uncover How Chatbots Perceive Their Own Thoughts vs. Yours

LessWrongJune 22, 20261 min brief

In brief

AI researchers have made a significant discovery about how large language models (LLMs) distinguish between their own thoughts and the words of others in a conversation.
By examining the structure of inputs that these models receive, they found that everything an LLM processes-whether it's a user's message, its own previous responses, or even tool outputs-is just a single continuous string of text.
- This means the model doesn't have a separate memory like humans do; instead, it relies on this stream to generate its responses.
The researchers highlighted how modifying this input string can drastically change an LLM's behavior.
For instance, deleting a turn in the conversation or rewriting previous messages alters the model's "memories." This understanding has important implications for both security and the development of more reliable AI systems.
- It also opens new avenues for exploring how these models process roles and interactions within conversations.
Looking ahead, this research could lead to better ways to control and secure AI systems against manipulation.
By understanding how LLMs perceive their own thoughts versus external input, developers can create safeguards against potential vulnerabilities and build more transparent AI tools.

Read full story at LessWrong →

More briefs