
Amazon's Breakthrough in Teaching AI Better Decision-Making

Amazon Science · May 26, 2026 · 1 min brief

In brief

- Amazon researchers have developed a new method to improve how large language models (LLMs) make decisions.
- They introduced two techniques, set-supervised fine-tuning (SSFT) and global forking policy optimization (GFPO), that help LLMs generate diverse reasoning paths, leading to more accurate answers.
- These methods avoid a problem called "mode collapse," where a model tends to repeat the same reasoning approach.
- The researchers used "global forking tokens" to steer the model into different thinking modes.
- This lets the model explore several strategies for solving a problem, which is crucial for tasks like coding or math.
- Their tests showed that using SSFT and GFPO together boosted accuracy by 5% to 7% on standard benchmarks.
- Looking ahead, this advance could make AI systems more reliable in real-world applications where accurate decision-making is essential.
- Developers should watch how these techniques are adopted across industries to gauge their effect on AI performance.
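
The forking-token idea in the brief can be illustrated with a toy sketch. Nothing here comes from the Amazon work itself: the token names, the `solve_with_fork` stand-in for a model call, and the majority-vote step are all assumptions made for illustration. The point is only that conditioning on a different token selects a different reasoning strategy, so one flawed path does not decide the answer.

```python
from collections import Counter

# Hypothetical forking tokens; the brief does not name the real ones.
FORK_TOKENS = ["<fork_1>", "<fork_2>", "<fork_3>"]


def solve_with_fork(a: int, b: int, fork_token: str) -> int:
    """Toy stand-in for an LLM: each token triggers a different strategy for a * b."""
    if fork_token == "<fork_1>":
        # Strategy 1: multiply directly.
        return a * b
    if fork_token == "<fork_2>":
        # Strategy 2: split a into tens and ones, multiply each part, then add.
        tens, ones = divmod(a, 10)
        return tens * 10 * b + ones * b
    if fork_token == "<fork_3>":
        # Strategy 3: repeated addition with a deliberate off-by-one bug,
        # standing in for a reasoning path that goes wrong.
        return sum(a for _ in range(b - 1))
    raise ValueError(f"unknown fork token: {fork_token}")


def explore(a: int, b: int) -> dict[str, int]:
    """Run one reasoning path per forking token."""
    return {token: solve_with_fork(a, b, token) for token in FORK_TOKENS}


def majority_vote(answers: dict[str, int]) -> int:
    """Pick the most common answer (an assumed aggregation step, not from the brief)."""
    return Counter(answers.values()).most_common(1)[0][0]


if __name__ == "__main__":
    paths = explore(17, 6)
    print(paths)
    print(majority_vote(paths))
```

A model suffering from mode collapse would behave like calling the same strategy three times: if that strategy were the flawed one, no amount of resampling would recover the right answer.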

Terms in this brief

set-supervised fine-tuning (SSFT): A fine-tuning method in which a large language model is further trained so that it explores diverse reasoning paths instead of repeating the same approach.
global forking policy optimization (GFPO): An optimization technique that uses "global forking tokens" to guide a model into different thinking modes, so it can try several strategies on problem-solving tasks such as coding or math.

