VII. Reinforcement Learning & Robotics (17 concepts)

Advanced

AI that learns by doing, and the systems that let it operate in the physical world.

Actor-Critic

A reinforcement learning architecture that combines a policy network (the actor, which chooses actions) with a value network (the critic, which estimates how good the current state or state-action pair is). Using the critic's estimate to judge each action reduces the variance of the actor's gradient updates and typically makes learning more stable than pure policy-gradient methods such as REINFORCE.
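A minimal tabular sketch of one-step actor-critic, assuming a hypothetical five-state chain in which the agent earns a reward of 1 for reaching the right end. The actor is a softmax over per-state action preferences. The critic is a table of state values, and its one-step TD error also serves as the advantage signal that scales the actor's update. All names (`step`, `train_actor_critic`, the learning rates) are illustrative and not taken from any library.

```python
import math
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9


def step(state, action):
    """Toy environment: reward 1 for reaching the right end, 0 otherwise."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def softmax(prefs):
    m = max(prefs)                              # subtract max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(episodes=500, alpha_actor=0.1, alpha_critic=0.1, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: action preferences per state
    V = [0.0] * N_STATES                           # critic: state-value estimates
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 100:            # cap episode length
            probs = softmax(theta[s])
            a = 0 if rng.random() < probs[0] else 1
            s2, r, done = step(s, a)
            # Critic: one-step TD error (no bootstrap from a terminal state)
            target = r if done else r + GAMMA * V[s2]
            delta = target - V[s]
            V[s] += alpha_critic * delta
            # Actor: gradient of log softmax w.r.t. preferences is 1[b == a] - pi(b|s)
            for b in ACTIONS:
                grad = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += alpha_actor * delta * grad
            s = s2
            steps += 1
    return theta, V
```

In deep actor-critic methods both tables are replaced by neural networks, but the update keeps this shape: the critic learns from TD errors, and the actor follows the policy gradient weighted by them.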

Deep Q-Network

The breakthrough algorithm from DeepMind (2013) that combined Q-learning with deep neural networks. It let RL agents learn to play Atari games directly from raw pixel inputs and outperform expert human players on several of them, and it is widely credited with igniting the field of deep reinforcement learning.
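A full DQN needs a neural-network library, so this is only a sketch of the two pieces around the network: an experience-replay buffer, and the regression target y = r + γ · max over a' of Q_target(s', a'). It assumes Q-functions are plain callables that map a state to a list of per-action values. These are hypothetical stand-ins for the online network and for a separate, periodically updated target network, a common stabilizing trick in DQN implementations.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples.

    Uniform random sampling reduces the correlation between consecutive samples."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def dqn_targets(batch, target_q, gamma=0.99):
    """Targets y = r + gamma * max_a' Q_target(s', a'); no bootstrapping past terminal steps."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets


def td_loss(batch, online_q, targets):
    """Mean squared error between the online Q(s, a) and the fixed targets."""
    errors = [(online_q(state)[action] - y) ** 2
              for (state, action, _r, _s2, _d), y in zip(batch, targets)]
    return sum(errors) / len(errors)
```

In a real agent, `online_q` would be a network trained by gradient descent on `td_loss` over sampled minibatches. `target_q` would be a frozen copy of it, refreshed every fixed number of steps. Many implementations also swap the squared error for a Huber loss.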

Exploration vs Exploitation

The central dilemma of reinforcement learning: whether to exploit strategies already known to yield good reward, or to explore untried actions that might reveal better ones. The right balance depends on the task and on how much the agent has already learned, so there is no universally correct answer.
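One of the simplest ways to manage this tradeoff is ε-greedy action selection. With probability ε the agent picks a random action (explore); otherwise it picks the action with the highest current estimate (exploit). The sketch below assumes a hypothetical multi-armed bandit with Gaussian rewards of unit variance. The arm means and ε = 0.1 are illustrative choices, not recommended values.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    rng = random.Random(seed)
    q = [0.0] * len(true_means)   # estimated mean reward per arm
    n = [0] * len(true_means)     # number of pulls per arm
    total = 0.0
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)  # hypothetical noisy reward
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]          # incremental sample-mean update
        total += reward
    return q, n, total / steps
```

With ε = 0 the agent can lock onto whichever arm looked best early. With ε = 1 it never uses what it has learned. Intermediate values, or schedules that decay ε over time, trade the two off.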

All concepts

O

Offline Reinforcement Learning
A variant of reinforcement learning that learns a policy entirely from a static dataset of previously collected experience, with no environment interaction during training. This makes RL possible from historical logs when real-world exploration is impossible or dangerous.
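A minimal sketch of the core idea: tabular Q-learning replayed repeatedly over a fixed list of logged (state, action, reward, next_state, done) tuples, with no environment ever queried. The dataset below is hypothetical. This naive approach ignores distributional shift: it can overvalue actions that rarely or never appear in the data. Practical offline RL algorithms such as Conservative Q-Learning (CQL) or BCQ add constraints to guard against exactly that.

```python
from collections import defaultdict


def offline_q_learning(dataset, n_actions, gamma=0.9, alpha=0.5, sweeps=200):
    """Q-learning over a static dataset; every update comes from logged experience."""
    Q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(sweeps):
        for s, a, r, s_next, done in dataset:
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
    return dict(Q)


def greedy_policy(Q):
    """Pick the highest-valued action in every state that appears in Q."""
    return {s: max(range(len(q)), key=lambda a: q[a]) for s, q in Q.items()}


# Hypothetical log from some earlier behaviour policy
dataset = [
    ("A", 0, 0.0, "B", False),
    ("A", 1, 0.0, "A", False),
    ("B", 0, 1.0, "END", True),
    ("B", 1, 0.0, "A", False),
]
```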

Q

Q-Learning
A foundational reinforcement learning algorithm that learns the value of state-action pairs directly from experience, without needing a model of the environment. This lets an agent discover an optimal policy through trial and error.
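A minimal tabular sketch of the standard Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ · max over a' of Q(s', a') − Q(s, a)], assuming a hypothetical six-state chain with a reward of 1 at the right end and ε-greedy exploration. Because the target uses the best next action rather than the one actually taken, Q-learning is off-policy. Ties between equal estimates are broken at random, so the untrained agent does not get stuck always choosing the same action.

```python
import random

N_STATES = 6    # hypothetical chain: states 0..5, state 5 is terminal
N_ACTIONS = 2   # 0 = left, 1 = right


def step(s, a):
    """Toy environment: reward 1 for reaching the right end, 0 otherwise."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=200, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(N_ACTIONS)                    # explore
            else:
                best = max(Q[s])                                # exploit, random tie-break
                a = rng.choice([x for x in range(N_ACTIONS) if Q[s][x] == best])
            s2, r, done = step(s, a)
            # Off-policy TD target: bootstrap from the best next action
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q
```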

R

Reward Shaping
The practice of adding supplementary reward signals to a reinforcement learning environment to make learning faster and more reliable, guiding the agent toward useful behaviours before sparse natural rewards can be observed.
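One well-studied form is potential-based shaping (Ng, Harada and Russell, 1999). It adds F = γ · Φ(s') − Φ(s) to the environment reward for some potential function Φ over states, and this form is known to leave the optimal policy unchanged. The sketch assumes a hypothetical potential equal to the negative distance to a goal on a line. It also assumes the common convention that the potential of a terminal state is 0.

```python
def shaped_reward(r, s, s_next, potential, gamma, done=False):
    """Potential-based shaping: r + gamma * phi(s') - phi(s), with phi(terminal) = 0."""
    phi_next = 0.0 if done else potential(s_next)
    return r + gamma * phi_next - potential(s)


def distance_potential(goal):
    """Hypothetical potential: higher (less negative) the closer the state is to the goal."""
    return lambda s: -abs(goal - s)


phi = distance_potential(goal=4)
# Moving from 0 to 1 earns a positive shaping bonus even though the environment reward is 0
bonus = shaped_reward(0.0, 0, 1, phi, gamma=0.9)
```

Over any trajectory, the discounted shaping terms telescope to γ^T · Φ(s_T) − Φ(s_0). That is why the bonus guides exploration without changing which policy is best.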

W

World Models
Learned neural network representations of an environment's dynamics that let an RL agent simulate future outcomes in its "mind" and plan ahead without additional real-world experience, which can greatly improve sample efficiency.
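World models of this kind are usually neural networks. To keep the idea to a few lines, the sketch below swaps in a count-based tabular model. It estimates transition probabilities and mean rewards from real transitions, then plans with value iteration entirely inside the learned model. The class and function names and the example transitions are hypothetical, and nothing here reflects a specific published world-model architecture.

```python
from collections import defaultdict


class TabularWorldModel:
    """Count-based estimate of P(s' | s, a) and mean reward (stand-in for a neural model)."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.reward_sum = defaultdict(float)                 # (s, a) -> summed reward
        self.visits = defaultdict(int)                       # (s, a) -> observations

    def update(self, s, a, r, s_next):
        self.counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def predict(self, s, a):
        """Return ([(s_next, probability), ...], expected reward) for an observed (s, a)."""
        n = self.visits[(s, a)]
        probs = [(s2, c / n) for s2, c in self.counts[(s, a)].items()]
        return probs, self.reward_sum[(s, a)] / n


def imagined_q(model, V, s, a, gamma):
    """One-step lookahead in the model; states missing from V (e.g. terminal) count as 0."""
    probs, r = model.predict(s, a)
    return r + gamma * sum(p * V.get(s2, 0.0) for s2, p in probs)


def plan_with_model(model, states, actions, gamma=0.9, iters=100):
    """Value iteration using only the learned model, never the real environment."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        for s in states:
            qs = [imagined_q(model, V, s, a, gamma) for a in actions if model.visits[(s, a)] > 0]
            if qs:
                V[s] = max(qs)
    return V


def best_action(model, V, s, actions, gamma=0.9):
    seen = [a for a in actions if model.visits[(s, a)] > 0]
    return max(seen, key=lambda a: imagined_q(model, V, s, a, gamma))
```

Usage: feed a handful of real transitions to `update`, call `plan_with_model`, and act with `best_action`. Every additional planning iteration costs compute but no further real-world interaction.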