Alexander Panin: Variational Information Maximizing Exploration,

When it comes to solving practical problems, performance of reinforcement learning algorithms usually depends highly on efficient environment exploration. However, classical exploration strategies (e-greedy, boltzmann) have several common drawbacks that jeopardize training speed. Informally, if you want to learn to program in java, having already learned python, randomly mistyping 10% of characters (e-greedy) and keeping those that compiled will likely yield poor results. We’d like to describe a method devi
В начало