In Lecture 14 we move from supervised learning to reinforcement learning (RL), in which an agent must learn to interact with an environment in order to maximize its reward. We formalize reinforcement learning using the language of Markov Decision Processes (MDPs), policies, value functions, and Q-value functions. We discuss different algorithms for reinforcement learning, including Q-learning, policy gradients, and actor-critic methods. We show how deep reinforcement learning has been used to play Atari games and, in AlphaGo, to achieve superhuman performance at Go.
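To make the Q-learning idea mentioned above concrete, here is a minimal tabular sketch. It is not taken from the lecture: the five-state chain environment, the `step` helper, the hyperparameters, and the uniform random behavior policy are all assumptions chosen for illustration. The lecture's deep RL examples replace the table with a neural network.

```python
import random

# Hypothetical toy MDP (an assumption, not from the lecture):
# states 0..4 on a line, state 4 is terminal.
N_STATES = 5
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
GAMMA = 0.9        # discount factor
ALPHA = 0.5        # learning rate


def step(state, action):
    """Deterministic transition; reward 1 only on entering the terminal state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, max_steps=100, seed=0):
    """Tabular Q-learning with a uniform random behavior policy (off-policy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Bellman target: do not bootstrap from a terminal state.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            if done:
                break
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the learned table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = q_learning()
print(greedy_policy(q_table))
```

Because Q-learning is off-policy, the agent can explore at random and still learn values for the greedy policy, which is why the sketch needs no epsilon-greedy schedule.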