Deep Deterministic Policy Gradients are Easy in PyTorch

In this tutorial we will code a deep deterministic policy gradient (DDPG) agent in PyTorch to solve the continuous lunar lander environment. DDPG combines ideas from deep Q-learning and actor-critic methods into an algorithm that can handle environments with continuous action spaces. We will have an actor network that learns the (deterministic) policy, coupled with a critic network that learns the action-value function. We will make use of a replay buffer to improve sample efficiency, as well as target netw
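
To make the replay buffer idea concrete, here is a minimal sketch using only the Python standard library. The class name `ReplayBuffer`, its method names, and the dummy transitions are illustrative assumptions, not the implementation used in the tutorial; a real agent would typically store NumPy arrays and convert sampled batches to PyTorch tensors.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions.

    Hypothetical helper for illustration; names are assumptions.
    """

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once capacity is reached
        self.memory = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up temporal correlation
        batch = random.sample(self.memory, batch_size)
        # Regroup: list of transitions -> one tuple per field
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    # Dummy transitions; a real agent would store environment observations
    buffer.store([float(step)], [0.0], 1.0, [float(step + 1)], False)

states, actions, rewards, next_states, dones = buffer.sample(batch_size=2)
```

Because the buffer is capped at three entries, only the three most recent transitions remain after five stores. Reusing stored transitions across many updates is what lets an off-policy method like DDPG learn more from each environment step.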
