Concept

Toolformer

A language model that teaches itself to use external tools - an early demonstration that language models can learn when and how to call APIs from only a handful of demonstrations, without explicit human labelling of where tool use helps.

Added May 18, 2026

Language models have a fundamental limitation: their knowledge is frozen at training time. On their own they cannot look up current information or access external systems, and they are unreliable at precise calculations. The standard solution is to give them tools - search engines, calculators, code interpreters - and train them to use those tools. But training this behaviour traditionally required large amounts of human-labelled data specifying exactly when and how to use each tool.

Toolformer, published by researchers at Meta AI in 2023, demonstrated a more elegant approach: the model annotates its own training data with tool calls. Prompted with a handful of examples per tool, it first proposes candidate tool invocations - places in the training text where calling a tool might help. Each candidate is then actually executed (calling a calculator, searching Wikipedia, etc.), and the model checks whether seeing the call and its result lowers its prediction loss on the text that follows. Only candidates that make the continuation meaningfully easier to predict are kept, and the model is then fine-tuned on the augmented text.
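
The filtering step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `Candidate`, `keep_call`, `toy_loss`, the bracketed call syntax, and the threshold value are all assumptions made for the example, and `toy_loss` is a crude stand-in for a real model's loss on the continuation.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: loss of `continuation` given `prefix` (lower is better).
# A real system would compute this from the language model's token probabilities.
LossFn = Callable[[str, str], float]


@dataclass
class Candidate:
    prefix: str        # text before the proposed tool call
    continuation: str  # text after the proposed tool call
    call: str          # e.g. "Calculator(17 + 25)"
    result: str        # what the tool returned, e.g. "42"


def keep_call(c: Candidate, loss_fn: LossFn, threshold: float) -> bool:
    """Keep a call only if seeing its result lowers the loss by at least `threshold`."""
    with_result = loss_fn(f"{c.prefix}[{c.call} -> {c.result}] ", c.continuation)
    no_call = loss_fn(c.prefix, c.continuation)
    call_only = loss_fn(f"{c.prefix}[{c.call}] ", c.continuation)
    # Compare against the better of the two baselines, so the call is kept
    # because of its result, not merely because a call marker is present.
    baseline = min(no_call, call_only)
    return baseline - with_result >= threshold


def toy_loss(prefix: str, continuation: str) -> float:
    """Toy stand-in for a model: one unit of loss per continuation word not seen in the prefix."""
    seen = set(re.findall(r"\w+", prefix))
    return float(sum(1 for w in re.findall(r"\w+", continuation) if w not in seen))


helpful = Candidate("The sum of 17 and 25 is ", "42 in total.", "Calculator(17 + 25)", "42")
unhelpful = Candidate("The sum of 17 and 25 is ", "42 in total.", "Calendar()", "Monday")

print(keep_call(helpful, toy_loss, threshold=0.5))    # True
print(keep_call(unhelpful, toy_loss, threshold=0.5))  # False
```

The calculator call survives because its result makes the continuation easier to predict; the calendar call is discarded because it does not help.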

Through this self-supervised process, Toolformer learns which situations benefit from each tool, with no human labelling of where tool calls belong beyond the few demonstrations per tool. It discovers that factual statements benefit from search, that numerical calculations benefit from the calculator, that reasoning about dates benefits from a calendar tool, and so on.

The model that results is better at answering factual questions, more accurate at arithmetic, and able to retrieve current information - all while remaining a general language model. Tool calls are emitted inline during generation: the model writes a call, the tool's result is inserted into the text, and generation continues with that result in context.
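
A rough sketch of how inline tool calls might be executed is shown here. It is simplified: it fills in results on a finished string, whereas a real system would pause decoding at the call, run the tool, and resume. The bracketed `[Tool(args)]` syntax, the `fill_tool_calls` helper, and the small `calculator` function are assumptions for illustration.

```python
import ast
import operator
import re
from typing import Callable, Dict

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expr: str) -> str:
    """Evaluate +, -, *, / on numeric literals without using eval()."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return f"{ev(ast.parse(expr, mode='eval').body):.2f}"


# Matches illustrative markers such as "[Calculator(400 / 1400)]".
CALL = re.compile(r"\[(\w+)\((.*?)\)\]")


def fill_tool_calls(text: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Run each known tool call found in `text` and append its result inside the brackets."""
    def run(match: re.Match) -> str:
        name, arg = match.group(1), match.group(2)
        if name not in tools:
            return match.group(0)  # leave unknown calls untouched
        return f"[{name}({arg}) -> {tools[name](arg)}]"
    return CALL.sub(run, text)


text = "Out of 1400 participants, 400 [Calculator(400 / 1400)] passed the test."
print(fill_tool_calls(text, {"Calculator": calculator}))
# Out of 1400 participants, 400 [Calculator(400 / 1400) -> 0.29] passed the test.
```

Keeping the result inside the text means the model can condition on it when generating the words that follow.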

Toolformer was influential less as a deployed system and more as a proof of concept: language models can learn tool use through their own exploration rather than requiring human supervision. This insight informed much of the subsequent work on function calling and tool-augmented language models that led to the agentic AI systems being built today.

Analogy

A student who discovers for themselves when to use reference books. Instead of being told "use the encyclopaedia for historical facts and the calculator for maths," they try both tools on practice problems and notice which one gives better results. Over time they develop intuition for which tool fits which situation - and they learn this without a teacher explicitly labelling every example.

Real-world example

When Toolformer was tested on fact completion, arithmetic word problems, and question answering, it often outperformed much larger models that did not have tool access. A 6.7-billion-parameter Toolformer beat GPT-3 (175 billion parameters) on several benchmarks, notably arithmetic, by calling a calculator or looking things up rather than relying on knowledge stored in its parameters. This demonstrated that tool access can partially substitute for scale.

Why it matters

Toolformer's approach to self-supervised tool learning helped shape how later AI systems are taught to use tools. The core principle - that a model can generate its own training signal for tool use by checking whether tool calls improve its predictions - carries over to the fine-tuning techniques used to teach models to call functions reliably.

Related concepts

Agentic AI, Foundation Model, MCP (Model Context Protocol)
