Building Taskable Reinforcement Learning Agents via Formal Languages and Automata

17 November 2022 | Online | 16:00 | Sheila McIlraith (University of Toronto)

Abstract

Reinforcement Learning (RL) is proving to be a powerful technique for building sequential decision-making systems in cases where the complexity of the underlying environment is difficult to model. Two challenges that face RL are reward specification and sample complexity. Specification of a reward function -- a mapping from state to numeric value -- can be challenging, particularly when reward-worthy behaviour is complex and temporally extended. Further, when reward is sparse, it can require millions of exploratory episodes for an RL agent to converge to a policy of reasonable quality. In this talk I'll show how formal languages and automata can be used to represent complex non-Markovian reward functions. I'll present the notion of a Reward Machine, an automata-based structure that provides a normal form representation for reward functions, exposing function structure in a manner that greatly expedites learning. Finally, I'll also show how these machines can be generated via symbolic planning or learned from data, solving (deep) RL problems that otherwise could not be solved.
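To make the idea of an automata-based reward function concrete, here is a minimal Python sketch. It is an illustration written for this page, not code from the talk: the class name `RewardMachine`, the state names `u0`, `u1`, `done`, and the "fetch coffee, then reach the office" task are all assumptions chosen for the example. The machine tracks progress through the task, so the reward for reaching the office depends on what happened earlier, which a reward defined on the current environment state alone cannot express.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Hypothetical, simplified reward machine (illustrative only)."""
    initial: str
    terminal: set
    # (machine state, proposition) -> (next machine state, reward)
    transitions: dict = field(default_factory=dict)

    def step(self, u, true_props):
        """Advance from machine state u given the propositions that hold now."""
        for prop in sorted(true_props):
            if (u, prop) in self.transitions:
                return self.transitions[(u, prop)]
        return u, 0.0  # no matching edge: stay in place, zero reward

    def run(self, trace):
        """Feed a sequence of proposition sets; return final state and total reward."""
        u, total = self.initial, 0.0
        for true_props in trace:
            if u in self.terminal:
                break
            u, r = self.step(u, true_props)
            total += r
        return u, total


# Assumed toy task: reward 1 only if "coffee" is observed before "office".
rm = RewardMachine(
    initial="u0",
    terminal={"done"},
    transitions={
        ("u0", "coffee"): ("u1", 0.0),
        ("u1", "office"): ("done", 1.0),
    },
)

in_order = rm.run([set(), {"coffee"}, set(), {"office"}])
out_of_order = rm.run([{"office"}, {"coffee"}])
```

In this sketch, visiting the office before picking up coffee earns nothing, while the same office observation after coffee earns the reward. An RL agent that is given the machine state alongside the environment state can treat the combined problem as Markovian, which is one way the exposed structure could help learning.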

Bio
Sheila McIlraith is a Professor in the Department of Computer Science at the University of Toronto, a Canada CIFAR AI Chair (Vector Institute), and an Associate Director and Research Lead at the Schwartz Reisman Institute for Technology and Society. Prior to joining U of T, McIlraith spent six years as a Research Scientist at Stanford University, and one year at Xerox PARC. McIlraith's research is in the area of AI sequential decision making broadly construed, with a focus on human-compatible AI. She is a long-time member of the knowledge representation community, serving in various roles from program co-chair in 2012 to now serving on the board of directors. McIlraith is a Fellow of the ACM and the Association for the Advancement of Artificial Intelligence (AAAI).
