Robotics and Biology Laboratory

Estimating Robust Affordances from Vision by Combining Multiple Models



Patrick Lowin

Vito Mengers

Oliver Brock


Robots operating in human environments require a thorough understanding of how the objects around them can be manipulated: specifically, how these objects can be grasped and then moved. Following Gibson [1], we call these combined properties of graspability and movability the objects' affordances. Efficient and safe interaction hinges on the robot's ability to anticipate these affordances before making contact.

While affordances can be estimated during interaction [5], estimating them beforehand from passive RGBD data is inherently challenging. Existing pre-interaction methods rely either on object geometry [2,3] or on visual appearance [4]. However, these approaches often degrade in real-world scenarios because their visual input is ambiguous.

This thesis proposes a novel approach that exploits two properties of the problem: its temporal structure, namely that affordances remain consistent over time, and the complementary nature of geometry- and appearance-based methods. We will recursively estimate beliefs over affordances separately with each of two existing predictors and then fuse these beliefs. In this way, the thesis aims to improve the accuracy and robustness of affordance estimation.
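To illustrate recursive estimation followed by fusion, here is a minimal sketch that replaces the particle beliefs with a discrete belief over three hypothetical joint types. It assumes that affordances are static (so the prediction step of the Bayes filter is the identity), that each predictor yields a per-frame likelihood p(z_t | a), and that the evidence of the two predictors is conditionally independent given the affordance. Under these assumptions, two posteriors computed from a shared prior combine as p(a | Z_geo, Z_app) ∝ p(a | Z_geo) p(a | Z_app) / p(a). The hypothesis names, function names, and likelihood values are made up for illustration and do not come from the thesis.

```python
import numpy as np

# Hypothetical discrete affordance hypotheses (illustration only).
HYPOTHESES = ["rigid", "prismatic", "revolute"]

def recursive_update(belief, likelihood):
    """One Bayes-filter step for a static state: the prediction step is the
    identity because affordances are assumed constant over time."""
    posterior = belief * likelihood
    return posterior / posterior.sum()

def fuse(belief_a, belief_b, prior):
    """Combine two posteriors computed from the same prior, assuming the two
    predictors' observations are conditionally independent given the state."""
    fused = belief_a * belief_b / prior
    return fused / fused.sum()

prior = np.full(len(HYPOTHESES), 1.0 / len(HYPOTHESES))

# Made-up per-frame likelihoods p(z_t | a); the second geometry frame is misleading.
geometry_likelihoods = [np.array([0.2, 0.3, 0.5]),
                        np.array([0.1, 0.6, 0.3]),
                        np.array([0.2, 0.2, 0.6])]
appearance_likelihoods = [np.array([0.3, 0.3, 0.4]),
                          np.array([0.2, 0.3, 0.5]),
                          np.array([0.3, 0.4, 0.3])]

belief_geo, belief_app = prior.copy(), prior.copy()
for lik_geo, lik_app in zip(geometry_likelihoods, appearance_likelihoods):
    belief_geo = recursive_update(belief_geo, lik_geo)
    belief_app = recursive_update(belief_app, lik_app)

fused = fuse(belief_geo, belief_app, prior)
print(HYPOTHESES[int(np.argmax(fused))])  # → revolute
```

Accumulating evidence over time lets the filter recover from the single misleading geometry frame, and fusing the two beliefs sharpens the result further than either predictor alone.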

To represent the diverse distributions of affordances in the environment, we will use particle-based belief representations for the recursive estimators. This turns the problem of connecting the two estimators into the task of weighting each particle set based on the other. We will evaluate the effectiveness of this approach by analyzing its performance, particularly in failure cases, on the real-world RBO dataset of articulated objects.
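The weighting scheme between the particle sets is not specified here. One plausible scheme, shown as a minimal sketch, assumes that both estimators output weighted particle sets over the same affordance parameters and reweights each set by a Gaussian kernel density estimate (KDE) of the other. The functions `kde` and `cross_weight`, the bandwidth, and the 1-D toy parameter are hypothetical and not taken from the thesis.

```python
import numpy as np

def kde(points, weights, queries, bandwidth):
    """Weighted Gaussian KDE of `points`, evaluated at each row of `queries`.

    The normalizing constant is omitted because it cancels when the resulting
    particle weights are renormalized.
    """
    diff = queries[:, None, :] - points[None, :, :]           # (Q, P, D)
    sq_dist = np.sum(diff ** 2, axis=-1)                       # (Q, P)
    return np.exp(-0.5 * sq_dist / bandwidth ** 2) @ weights   # (Q,)

def cross_weight(particles_a, weights_a, particles_b, weights_b,
                 bandwidth=0.1, eps=1e-12):
    """Hypothetical fusion step: reweight each particle set by the density the
    *other* set assigns to it.

    Both updates use the incoming weights, so the result does not depend on
    which set is processed first. `eps` keeps weights from collapsing to zero
    when the two sets do not overlap at all.
    """
    new_a = weights_a * (kde(particles_b, weights_b, particles_a, bandwidth) + eps)
    new_b = weights_b * (kde(particles_a, weights_a, particles_b, bandwidth) + eps)
    return new_a / new_a.sum(), new_b / new_b.sum()

# Toy 1-D example: each particle is a hypothetical joint parameter (e.g. an axis angle).
rng = np.random.default_rng(0)
n = 50
# Geometry-based set: ambiguous between 0.0 and 1.5.
particles_geo = np.concatenate([rng.normal(0.0, 0.05, n),
                                rng.normal(1.5, 0.05, n)])[:, None]
# Appearance-based set: ambiguous between 1.5 and 3.0.
particles_app = np.concatenate([rng.normal(1.5, 0.05, n),
                                rng.normal(3.0, 0.05, n)])[:, None]
uniform = np.full(2 * n, 1.0 / (2 * n))

w_geo, w_app = cross_weight(particles_geo, uniform, particles_app, uniform)
# Both weighted means end up close to the shared mode at 1.5.
print(float(w_geo @ particles_geo[:, 0]), float(w_app @ particles_app[:, 0]))
```

Each set is individually ambiguous, but only the mode they share keeps significant weight after cross-weighting. In a full particle filter, one would typically resample after this step to avoid weight degeneracy.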


[1] J. J. Gibson. "The Theory of Affordances." In: The Ecological Approach to Visual Perception. Houghton Mifflin, Boston, 1979, p. 127.

[2] B. Eisner, H. Zhang, and D. Held. "FlowBot3D: Learning 3D Articulation Flow to Manipulate Articulated Objects." In: Conference on Robot Learning. PMLR, pp. 1038-1049.

[3] K. Mo et al. "Where2Act: From Pixels to Actions for Articulated 3D Objects." In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 6813-6823.

[4] M. Goyal et al. "Human Hands as Probes for Interactive Object Understanding." In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 3293-3303.

[5] R. Martín-Martín and O. Brock. "Coupled Recursive Estimation for Online Interactive Perception of Articulated Objects." The International Journal of Robotics Research, 41(8), 2022, pp. 741-777.