Specification-Guided Reinforcement Learning

Faculty: Rajeev Alur, Osbert Bastani, and Dinesh Jayaraman

Problem: To synthesize control policies for robotic tasks using RL, user must specify rewards as numerical values associated with states. Such reward engineering requires expertise and is error prone.

 

Solution: Allow user to express intent using high-level logical specifications. 

Always (not spill) and Eventually (cooked pasta)