Robotics and Biology Laboratory

On Decomposability in Robot Reinforcement Learning

Sebastian Höfer


Reinforcement learning is a computational framework that enables machines to learn from trial-and-error interaction with the environment. In recent years, reinforcement learning has been successfully applied to a wide variety of problem domains, including robotics. However, the success of reinforcement learning applications in robotics relies on a variety of assumptions, such as the availability of large amounts of training data, highly accurate models of the robot and the environment, and prior knowledge about the task.

In this thesis, we study several of these assumptions and investigate how to relax them. To that end, we look at these assumptions from two angles. On the one hand, we study them in two concrete applications of reinforcement learning in robotics: ball catching and learning to manipulate articulated objects. On the other hand, we develop an abstract explanatory framework that relates the assumptions to the decomposability of problems and solutions. Taken together, the concrete case studies and the abstract explanatory framework enable us to make suggestions on how to relax the previously stated assumptions and how to design more effective solutions to robot reinforcement learning problems.

The first case study deals with the problem of ball catching: how to run most effectively to catch a projectile, such as a baseball, that is in the air for a long period of time. The question of the best solution to the ball catching problem has been the subject of intense scientific debate for almost 50 years. It turns out that this debate is not focused on the ball catching problem alone but revolves around the broader question of whether heuristic or optimization-based approaches are better suited for solving such problems in general. In this thesis, we study the ball catching problem as an instance of the heuristics-vs.-optimality debate. We study two types of approaches to the ball catching problem, one commonly considered heuristic and one based on optimization, and investigate their properties using both a theoretical analysis and a set of simulation experiments. This investigation shows that neither of the two types of approaches can be regarded as superior with respect to the ball catching problem, as each makes different assumptions and is thus better suited for different variations of the problem. This result raises the question of what the key difference between these two types of approaches to ball catching is. We show that optimality is not a relevant criterion for distinguishing between them: we demonstrate that the approach to ball catching that is commonly considered heuristic can be phrased as optimal under task-general assumptions. This motivates our search for a more adequate explanatory framework for distinguishing between these solutions, and we discuss whether decomposability offers such a framework at the end of the thesis.

The second case study deals with the problem of learning to manipulate articulated objects. Articulated objects, such as doors, laptops and drawers, are composed of rigid bodies that are connected by joints. In this thesis, we address the questions of how to discover the kinematic structure of unknown articulated objects, how to learn simple push and pull actions for actuating the detected joints, and how to identify the functional dependencies between joints, for example locking mechanisms. The solutions to these questions require reasoning about object parts and their relationships. We therefore resort to a learning paradigm that is well-suited for performing such reasoning: relational reinforcement learning. In order to tightly integrate relational learning with the perceptual and motor skills required to operate manipulated objects, we propose two novel learning approaches: task-sensitive learning of relational forward models, and an approach for tightly coupling the learning of relational forward models with the learning of action parameters. We demonstrate the effectiveness of these approaches in simulated and real-world robotic manipulation experiments.

In the last part of this thesis, we generalize the lessons learned from the two case studies to robot reinforcement learning and decision making problems. To that end, we introduce the spectrum of decomposability as an explanatory framework for characterizing problems and solutions in decision making. This framework regards decomposability as a property that varies along a spectrum and suggests that the decomposition of a problem has a significant impact on the ability to find an adequate solution. From that, we conclude that the inability to find an effective solution can result either from a premature, inadequate decomposition of the problem, or from approaching a non-decomposable problem by fully decomposing it. To support our view, we revisit the two case studies in light of decomposability and provide additional evidence from the literature in artificial intelligence, cognitive science and neuroscience. We conclude this thesis by making suggestions on how to address the assumptions required to successfully apply reinforcement learning in robotics.

June 2017