Aleksei Petrenko: SampleFactory and high-throughput reinforcement learning

Data Fest Online 2020, Reinforcement Learning track

Sample Factory: Egocentric 3D Control from Pixels at 100 000 FPS with Asynchronous Reinforcement Learning

The quest for sample efficiency in general-purpose RL algorithms has proven to be rather challenging. The level of results in RL has been rising largely because of the increased amount of compute that research labs are willing to use in their projects. As a result, SOTA-level results have become increasingly unreachable for regular researchers. Our goal is to bring deep RL back to the community by improving the efficiency of training and reducing the cost of data collection. We present SampleFactory, an on-policy RL training system optimized for speed. By maximizing the hardware utilization of our algorithm, we approach 150000 FPS of training on a single machine, 10x faster than many popular frameworks. Our agents trained with SampleFactory APPO approach human-level performance in challenging and immersive 3D games.
