Constitutional AI

An alignment technique developed by Anthropic in which an AI model critiques and revises its own outputs against a written set of principles.

Constitutional AI, introduced by Anthropic in 2022, is an approach to AI alignment that uses a written "constitution" — a set of principles — to guide model behavior. The model critiques its own outputs against these principles and revises them to be more helpful and harmless.
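The critique-and-revise loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's implementation: the principles, the prompt wording, the `critique_and_revise` function, and the `toy_model` stand-in are all hypothetical, and a real system would call a language model where `generate` is used.

```python
from typing import Callable, List

# Hypothetical principles for illustration; not Anthropic's actual constitution.
PRINCIPLES: List[str] = [
    "Avoid content that helps someone cause harm.",
    "Be as helpful as possible within that limit.",
]


def critique_and_revise(
    prompt: str,
    generate: Callable[[str], str],
    principles: List[str] = PRINCIPLES,
) -> str:
    """Draft a response, then critique and revise it once per principle."""
    response = generate(prompt)
    for principle in principles:
        # Ask the model to critique its own response against one principle.
        critique = generate(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique the response against this principle: {principle}"
        )
        # Ask the model to rewrite the response in light of that critique.
        response = generate(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return response


# Toy stand-in for a language model (assumption): returns canned strings
# and records each call so the control flow can be inspected.
calls: List[str] = []


def toy_model(text: str) -> str:
    calls.append(text)
    if "Rewrite the response" in text:
        return "revised answer"
    if "Critique the response" in text:
        return "critique text"
    return "draft answer"


final = critique_and_revise("How do I pick a lock?", toy_model)
print(final)
```

With the toy model, the loop makes one drafting call plus one critique and one revision per principle. In a training setting, the revised responses would then serve as fine-tuning data.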

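This reduces the need for human-labeled data in safety training. Instead of people rating each output, the model generates the critiques, revisions, and preference judgments itself by applying the constitution, so a previously manual process scales with compute rather than with annotator time. Humans still write the principles.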

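Claude's origin: Constitutional AI is a core technique behind Anthropic's Claude models. It combines reinforcement learning from AI feedback (RLAIF) with constitutional principles: the model's own principle-guided judgments, rather than human ratings, supply the preference signal used in reinforcement learning.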
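To make the "AI feedback" part concrete, the sketch below shows one way a model could label which of two responses better follows a principle, producing a preference record that could later train a reward or preference model. The `label_preference` function, the prompt wording, the record fields, and the `toy_judge` stand-in are assumptions for illustration. Using a hard A/B choice is a simplification, and real pipelines may differ.

```python
from typing import Callable, Dict


def label_preference(
    prompt: str,
    response_a: str,
    response_b: str,
    principle: str,
    judge: Callable[[str], str],
) -> Dict[str, str]:
    """Ask a judge model which response better follows the principle."""
    question = (
        f"Principle: {principle}\n"
        f"Prompt: {prompt}\n"
        f"(A) {response_a}\n"
        f"(B) {response_b}\n"
        "Which response better follows the principle? Answer with A or B."
    )
    answer = judge(question).strip().upper()
    if answer not in ("A", "B"):
        # Refuse to guess when the judge's answer cannot be parsed.
        raise ValueError(f"Unexpected judge answer: {answer!r}")
    if answer == "A":
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


# Toy judge (assumption): always prefers option B.
def toy_judge(question: str) -> str:
    return "B"


record = label_preference(
    prompt="How do I pick a lock?",
    response_a="Here are step-by-step instructions for breaking into a house.",
    response_b="If you are locked out of your own home, a locksmith can help.",
    principle="Avoid content that helps someone cause harm.",
    judge=toy_judge,
)
print(record["chosen"])
```

Many such records, labeled by the model rather than by people, form the preference dataset that stands in for human ratings during the reinforcement learning stage.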

Constitutional AI demonstrates that carefully designed training processes can teach models to be more aligned without relying entirely on expensive human feedback. It's one of the most promising directions in practical alignment.
