
AI Alignment Crisis: Most Safety Experts Not Focusing on Ensuring Superintelligent AIs Follow Human Instructions

LessWrong | June 12, 2026 | 1 min brief

In brief

A recent analysis reveals that the majority of AI safety experts are not working on ensuring superintelligent AIs align with human values, a critical task known as "alignment." While some groups, like the Alignment Research Center and Sequent, focus on this issue, they represent a small fraction of the broader AI safety community.
Most others engage in indirect work such as capability evaluations, risk assessments, and policy development.
This lack of direct alignment efforts raises concerns about how prepared we are for advanced AI systems.
Only a few projects, such as COT-monitoring, aim to make today's models behave well, which might help with future alignment challenges.
While this work is valuable, it’s not enough to ensure that superintelligent AIs will follow human instructions.
The AI community needs to prioritize more direct alignment research to avoid potential risks as AI capabilities grow.
Watch for upcoming discussions and initiatives addressing this critical gap in AI safety efforts.

Terms in this brief

COT-monitoring: Chain-of-thought monitoring, a method that inspects the intermediate reasoning steps an AI model writes out and looks for signs of unintended or unsafe behavior. Checking those steps is meant to make current AI systems behave more predictably and safely.
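
As a rough illustration of what checking reasoning steps can look like, here is a minimal Python sketch. It is a toy, not how any lab's monitor works: the phrase list, the `StepFlag` type, and the `monitor_chain_of_thought` function are all hypothetical names invented for this example, and a real monitor would more likely rely on a trained classifier or a second model than on keyword matching.

```python
from dataclasses import dataclass

# Hypothetical warning phrases, chosen only for illustration.
SUSPICIOUS_PHRASES = ("bypass the check", "hide this from", "pretend to comply")


@dataclass
class StepFlag:
    step_index: int  # position of the flagged step in the reasoning trace
    phrase: str      # the phrase that triggered the flag


def monitor_chain_of_thought(steps: list[str]) -> list[StepFlag]:
    """Return one flag per (step, phrase) match, in step order."""
    flags = []
    for i, step in enumerate(steps):
        lowered = step.lower()
        for phrase in SUSPICIOUS_PHRASES:
            if phrase in lowered:
                flags.append(StepFlag(i, phrase))
    return flags


# Example reasoning trace (made up for this sketch).
reasoning = [
    "The user wants the tests to pass.",
    "I could bypass the check instead of fixing the bug.",
    "Better to fix the bug properly.",
]
flags = monitor_chain_of_thought(reasoning)
for flag in flags:
    print(f"step {flag.step_index}: matched '{flag.phrase}'")
```

The monitor only reads the reasoning and reports matches. What happens next, such as blocking the action or escalating to a human reviewer, is a separate design decision.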
