Research talk: Breaking the deadly triad with a target network
Speaker: Shangtong Zhang, PhD Student, Oxford University
The deadly triad refers to the instability of an off-policy reinforcement learning (RL) algorithm that uses function approximation and bootstrapping simultaneously, and it is a major challenge in off-policy RL. Join PhD student Shangtong Zhang, from the WhiRL group at the University of Oxford, to learn how a target network can be used as a tool for theoretically breaking the deadly triad. Together, you’ll explore three topics. The first is a theoretical account of the conventional wisdom that a target network stabilizes training. The second is a novel target network update rule that augments the commonly used Polyak-averaging-style update with two projections. The third is how a target network can be used in linear off-policy RL algorithms, in both prediction and control settings and for both discounted and average-reward Markov decision processes (MDPs).
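The abstract names the ingredients (linear function approximation, off-policy bootstrapping, and a Polyak-averaging target update augmented with two projections) but does not spell out the rule. As a rough illustration only, here is a minimal Python sketch of off-policy linear TD(0) that bootstraps from a target network. The function names, step sizes, ball radii, and the placement of the two projections are all our assumptions. This is not the update rule presented in the talk.

```python
import math
import random


def project_to_ball(v, radius):
    """Euclidean projection of v onto the ball {x : ||x||_2 <= radius}."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= radius:
        return list(v)
    return [x * (radius / norm) for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def td_step_with_target(w, theta, phi, reward, phi_next, rho,
                        gamma=0.9, alpha=0.05, beta=0.01,
                        r_online=10.0, r_target=10.0):
    """One hypothetical off-policy linear TD(0) step using a target network.

    w        -- online weights
    theta    -- target-network weights
    rho      -- importance-sampling ratio pi(a|s) / mu(a|s)
    r_online, r_target -- assumed radii of the two projection balls
    """
    # The bootstrap term uses the slowly moving target weights theta, not w.
    delta = reward + gamma * dot(phi_next, theta) - dot(phi, w)
    w_new = [wi + alpha * rho * delta * fi for wi, fi in zip(w, phi)]

    # Polyak-style averaging toward the online weights. One projection is
    # applied to w before averaging and another to the averaged result
    # (assumed placement, for illustration only).
    w_proj = project_to_ball(w_new, r_online)
    theta_new = [ti + beta * (wi - ti) for ti, wi in zip(theta, w_proj)]
    theta_new = project_to_ball(theta_new, r_target)
    return w_new, theta_new


# Usage: random transitions over three states with hypothetical features
# and hypothetical importance ratios.
random.seed(0)
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
w, theta = [0.0, 0.0], [0.0, 0.0]
for _ in range(1000):
    s, s_next = random.randrange(3), random.randrange(3)
    rho = random.choice([0.5, 2.0])
    w, theta = td_step_with_target(w, theta, features[s], 1.0,
                                   features[s_next], rho)
print("online:", w, "target:", theta)
```

Because the target weights are kept inside a bounded ball, the bootstrap term `gamma * dot(phi_next, theta)` stays bounded for bounded features even if the online weights wander. This offers one intuition for why a target network can help, but it is not a convergence argument.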
Learn more about the 2021 Microsoft Research Summit: