
Reinforcement Learning

A machine learning paradigm where an agent learns to take actions in an environment to maximize cumulative reward.

Reinforcement Learning (RL) trains agents to make sequential decisions by rewarding good actions and penalizing bad ones. Unlike supervised learning, there are no labeled correct answers β€” the agent learns through trial and error.

RL powers game-playing systems (AlphaGo, OpenAI Five), robotic control, and modern LLM fine-tuning through RLHF. The core loop: the agent observes a state, selects an action, receives a reward and the next state, and updates its policy.

Key components: agent, environment, state, action, reward, and policy. The agent's goal is to find a policy that maximizes long-term cumulative reward.
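The loop and components above can be sketched in a few lines. This is a minimal illustration with a made-up toy environment and a do-nothing agent (the `CoinFlipEnv` and `RandomAgent` names are invented for this example, not from any library):

```python
import random

class CoinFlipEnv:
    """Toy environment: the agent tries to guess a coin flip. Illustrative only."""
    def reset(self):
        self.target = random.choice([0, 1])
        return 0  # single dummy state

    def step(self, action):
        reward = 1.0 if action == self.target else 0.0
        self.target = random.choice([0, 1])  # re-flip for the next step
        return 0, reward  # (next state, reward)

class RandomAgent:
    def act(self, state):
        return random.choice([0, 1])  # policy: pick an action given the state

    def update(self, state, action, reward, next_state):
        pass  # a real agent would improve its policy from this experience

env, agent = CoinFlipEnv(), RandomAgent()
state = env.reset()
total_reward = 0.0
for _ in range(100):
    action = agent.act(state)                         # agent selects action
    next_state, reward = env.step(action)             # environment returns reward
    agent.update(state, action, reward, next_state)   # agent updates its policy
    total_reward += reward
    state = next_state
```

Every RL algorithm, from tabular Q-learning to PPO, fills in the `act` and `update` methods differently; the surrounding interaction loop stays essentially the same.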

Modern RL algorithms include Q-learning, policy gradients, PPO (Proximal Policy Optimization), and actor-critic methods. PPO is especially widely used because of its training stability and relative simplicity of implementation.
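Of these, tabular Q-learning is the easiest to show in full. The sketch below runs it on a tiny invented chain environment (states 0–3, action 1 moves right, reward 1.0 on reaching state 3); the hyperparameter values are illustrative, not canonical:

```python
import random

n_states, n_actions = 4, 2
Q = [[0.0] * n_actions for _ in range(n_states)]   # Q-table, initialized to zero
alpha, gamma, epsilon = 0.5, 0.9, 0.1              # learning rate, discount, exploration

def env_step(s, a):
    """Deterministic chain: action 1 moves right, action 0 moves left."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == n_states - 1 else 0.0    # reward only on reaching the goal
    return s2, reward

for _ in range(500):                               # episodes
    s = 0
    while s != n_states - 1:
        # epsilon-greedy action selection (ties broken toward action 1)
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = 1 if Q[s][1] >= Q[s][0] else 0
        s2, r = env_step(s, a)
        # Q-learning update: move Q(s,a) toward the bootstrapped target r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
```

After training, the values propagate backward from the goal: Q at state 2 for "right" approaches 1.0, state 1 approaches gamma * 1.0 = 0.9, and state 0 approaches gamma^2 = 0.81, so the greedy policy walks straight to the goal.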
