Research · 2w ago

AI Training Reveals Surprising Patterns in Learning and Forgetting

arXiv CS.LG

In brief

  • AI researchers have found that LoRA, a popular method for fine-tuning large language models, causes the models to "unlearn" certain training examples.
    • The effect is concentrated on contested items: those where human annotators disagreed on the correct answer.
  • During training, the loss on these contested examples increased, meaning the model became less accurate at predicting them over time (a rough sketch of this bookkeeping follows this list).
    • This matters because it exposes a blind spot in how models are trained and evaluated on ambiguous data.
  • The study examined six models, four encoder-based and two decoder-only, and found consistent patterns across all of them.
  • The effect was strongest in the decoder-only models, which showed the tightest correlation between annotation disagreement and training loss.
  • The researchers tentatively attribute the effect to noise introduced during fine-tuning, but note that more investigation is needed to fully explain it.
  • For now, developers and researchers should watch how their models handle contested data to avoid unintended forgetting of important information.
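
To make the finding concrete, here is a minimal sketch, not the paper's code, of the kind of bookkeeping involved: tracking each example's loss before and after LoRA fine-tuning and correlating the change with an annotator-disagreement score. The model name, toy data, LoRA settings, and the disagreement measure are all illustrative assumptions; it uses the Hugging Face transformers and peft libraries.

    import torch
    from torch.nn.functional import cross_entropy
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model
    from scipy.stats import spearmanr

    model_name = "gpt2"  # assumption: a stand-in for any decoder-only model
    tok = AutoTokenizer.from_pretrained(model_name)
    tok.pad_token = tok.eos_token
    base = AutoModelForCausalLM.from_pretrained(model_name)
    model = get_peft_model(base, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

    # Toy data: each example carries a disagreement score in [0, 1], e.g.
    # 1 minus the share of annotators who chose the majority label.
    examples = [
        {"text": "Review: it was fine, I suppose. Sentiment: positive", "disagreement": 0.45},
        {"text": "Review: absolutely loved it. Sentiment: positive",    "disagreement": 0.05},
        {"text": "Review: not sure how I feel. Sentiment: negative",    "disagreement": 0.50},
    ]
    texts = [ex["text"] for ex in examples]

    def per_example_loss(batch_texts):
        """Mean next-token cross-entropy for each example (no batch reduction)."""
        enc = tok(batch_texts, return_tensors="pt", padding=True)
        labels = enc.input_ids.masked_fill(enc.attention_mask == 0, -100)
        logits = model(**enc).logits[:, :-1].transpose(1, 2)   # (B, vocab, T-1)
        targets = labels[:, 1:]                                # (B, T-1)
        tok_loss = cross_entropy(logits, targets, ignore_index=-100, reduction="none")
        n_tok = (targets != -100).sum(dim=1).clamp(min=1)
        return tok_loss.sum(dim=1) / n_tok                     # one loss per example

    with torch.no_grad():
        loss_before = per_example_loss(texts)

    opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=2e-4)
    for epoch in range(3):                                     # tiny loop for illustration
        opt.zero_grad()
        per_example_loss(texts).mean().backward()
        opt.step()

    with torch.no_grad():
        delta = per_example_loss(texts) - loss_before          # > 0 means "unlearned"

    rho, p = spearmanr([ex["disagreement"] for ex in examples], delta.tolist())
    print(f"disagreement vs. loss change: Spearman rho = {rho:.2f} (p = {p:.3f})")

A positive correlation here would mirror the paper's reported pattern: the more annotators disagreed on an example, the more its loss rose during fine-tuning.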

Terms in this brief

LoRA
Low-Rank Adaptation: a method for efficiently fine-tuning large language models by freezing the original weights and training only small low-rank matrices added alongside them, which makes fine-tuning much cheaper in compute and memory. This lets a model be adapted to a specific task without retraining all of its parameters; a minimal sketch follows.
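
The sketch below shows the core mechanism in PyTorch. The layer size, rank, and scaling factor are illustrative assumptions; real implementations (such as Hugging Face's peft library) add dropout, weight merging, and per-module targeting on top of this.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen linear layer plus a trainable low-rank update:
        y = base(x) + x (B A)^T * (alpha / r). Only A and B are trained."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False                  # original weights stay frozen
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))  # BA = 0 at init
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

    layer = LoRALinear(nn.Linear(768, 768), r=8)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable: {trainable:,} of {total:,} parameters")   # about 2% of the layer

Because B is initialized to zero, the adapted layer starts out identical to the original one; training then moves only the roughly 12 thousand adapter parameters rather than the roughly 590 thousand in the full layer.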

Read full story at arXiv CS.LG
