AI/ML Seminar Series: Roy Fox (1/10/2022)

UCI AI/ML Seminar Series

Roy Fox
Assistant Professor
Department of Computer Science
University of California, Irvine

Curiously effective ensemble and double-oracle reinforcement-learning methods

Ensemble methods for reinforcement learning have gained attention in recent years, due to their ability to represent model uncertainty and use it to guide exploration and to reduce value estimation bias. We present MeanQ, a very simple ensemble method with improved performance, and show how it reduces estimation variance enough to operate without a stabilizing target network. Curiously, MeanQ is theoretically *almost* equivalent to a non-ensemble state-of-the-art method that it significantly outperforms, raising questions about the interaction between uncertainty estimation, representation, and resampling.

In adversarial environments, where a second agent attempts to minimize the first’s rewards, double-oracle (DO) methods grow a population of policies
