Sample Efficient Reinforcement Learning Through Learning From Demonstrations In Minecraft


Sample inefficiency of deep reinforcement learning methods is a major obstacle for their use in real-world applications. In this work, we show how human demonstrations can improve final performance of agents on the Minecraft minigame ObtainDiamond with only 8M frames of environment interaction. We propose a training procedure where policy networks are first trained on human data and later fine-tuned by reinforcement learning. Using a policy exploitation mechanism, experience replay and an additional loss against catastrophic forgetting, our best agent was able to achieve a mean score of 48. Our proposed solution placed 3rd in the NeurIPS MineRL Competition for Sample-Efficient Reinforcement Learning.

Cite this Paper
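The abstract's fine-tuning recipe combines a reinforcement learning objective with an auxiliary loss on human demonstrations to guard against catastrophic forgetting. A minimal sketch of such a combined loss is shown below; this is an illustration of the general idea, not the authors' implementation, and `bc_weight`, `log_prob`, and `combined_loss` are hypothetical names.

```python
import math

def log_prob(logits, action):
    # Log-probability of `action` under a softmax over the logits,
    # computed with the max-subtraction trick for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[action] - log_z

def combined_loss(logits, rl_action, advantage, demo_action, bc_weight=0.1):
    # Policy-gradient term on an action sampled during environment
    # interaction, weighted by its advantage estimate.
    pg_loss = -advantage * log_prob(logits, rl_action)
    # Auxiliary behavioural-cloning term on a demonstration action,
    # which keeps the policy close to the pretrained one.
    bc_loss = -log_prob(logits, demo_action)
    return pg_loss + bc_weight * bc_loss
```

In practice the two terms would be averaged over minibatches drawn from the replay buffer and the demonstration dataset respectively, with `bc_weight` tuned so the cloning term restrains, rather than dominates, the RL update.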



BibTeX

@InProceedings{pmlr-v123-scheller20a,
  title     = {Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft},
  author    = {Scheller, Christian and Schraner, Yanick and Vogel, Manfred},
  booktitle = {Proceedings of the NeurIPS 2019 Competition and Demonstration Track},
  pages     = {67--76},
  year      = {2020},
  editor    = {Escalante, Hugo Jair and Hadsell, Raia},
  volume    = {123},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--14 Dec},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v123/scheller20a/scheller20a.pdf},
  url       = {https://proceedings.mlr.press/v123/scheller20a.html}
}

Endnote

%0 Conference Paper
%T Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft
%A Christian Scheller
%A Yanick Schraner
%A Manfred Vogel
%B Proceedings of the NeurIPS 2019 Competition and Demonstration Track
%C Proceedings of Machine Learning Research
%D 2020
%E Hugo Jair Escalante
%E Raia Hadsell
%F pmlr-v123-scheller20a
%I PMLR
%P 67--76
%U https://proceedings.mlr.press/v123/scheller20a.html
%V 123

APA

Scheller, C., Schraner, Y. & Vogel, M. (2020). Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft. Proceedings of the NeurIPS 2019 Competition and Demonstration Track, in Proceedings of Machine Learning Research 123:67-76. Available from https://proceedings.mlr.press/v123/scheller20a.html.