In this thesis, we propose a method to estimate objectness based on cross-modal fusion of information from motion and appearance. Objectness is a property that describes how parts of a visual scene can be grouped into objects.
It is encoded in both motion and appearance, and the two modalities often provide complementary information. We therefore fuse this information with two interconnected recursive estimators: one estimates objectness from motion as a probabilistic clustering of feature points, the other estimates objectness from appearance as a probabilistic segmentation.
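The coupling between the two estimators can be illustrated with a minimal sketch. All names and the simple convex-combination fusion rule below are illustrative assumptions, standing in for the probabilistic clustering and segmentation described above: each estimator runs its own predict/update cycle and uses the other estimator's current estimate as an additional prior during its update.

```python
class RecursiveEstimator:
    """Generic predict/update loop over a scalar stand-in for a
    probabilistic objectness estimate (hypothetical simplification)."""

    def __init__(self, state=0.0):
        self.state = state

    def predict(self):
        # Propagate the state forward in time (identity model here).
        return self.state

    def update(self, measurement, prior_from_other):
        # Fuse the own measurement, the own previous state, and the
        # other estimator's estimate; a fixed convex combination stands
        # in for proper Bayesian fusion.
        self.state = 0.5 * measurement + 0.3 * self.state + 0.2 * prior_from_other
        return self.state


def fuse(motion_meas, appearance_meas, steps=3):
    """Run two interconnected recursive estimators for a few cycles."""
    motion = RecursiveEstimator()      # clustering of feature points
    appearance = RecursiveEstimator()  # probabilistic segmentation
    for _ in range(steps):
        m_prior, a_prior = motion.predict(), appearance.predict()
        motion.update(motion_meas, a_prior)
        appearance.update(appearance_meas, m_prior)
    return motion.state, appearance.state
```

With identical, constant measurements both estimates converge toward the measurement value, e.g. `fuse(1.0, 1.0)` yields `(0.875, 0.875)` after three cycles; the interconnection matters once the two modalities disagree.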
Our objectness estimation is specifically tailored to the problem of kinematic structure estimation (KSE), in which we estimate the kinematic joints between the objects in the environment. These joints can be estimated by analyzing object trajectories, but this requires an objectness estimation that is consistent over time, real-time capable, covers all moving objects, and provides probabilistic estimates. Our two interconnected recursive estimators fulfill all of these requirements.
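To make the connection between trajectories and kinematic joints concrete, the following sketch recovers the axis of a planar revolute joint from an object trajectory. It is not the estimator used in this thesis, only an illustration: an algebraic (Kasa) circle fit, with the hypothetical function name `fit_revolute_center`, applied to 2D trajectory points of an object rotating about a fixed axis.

```python
import numpy as np


def fit_revolute_center(points):
    """Fit a circle to 2D trajectory points (algebraic Kasa fit).

    Uses the linearization x^2 + y^2 = 2*a*x + 2*b*y + c, where
    (a, b) is the circle center and c = r^2 - a^2 - b^2.
    """
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    rhs = x**2 + y**2
    a, b, c = np.linalg.lstsq(A, rhs, rcond=None)[0]
    radius = np.sqrt(c + a**2 + b**2)
    return np.array([a, b]), radius
```

A trajectory sampled from an object swinging about an axis at (2, 3) with radius 1 recovers that axis position; noisy or inconsistent objectness assignments would corrupt exactly this trajectory input, which is why temporal consistency of the objectness estimate matters for KSE.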
We use our objectness estimation to extend the online multimodal interactive perception system for KSE, and we evaluate both our estimation of objectness and its impact on the estimated kinematic joints on the RBO dataset of articulated objects. The results show that we improve the estimation of objectness; we therefore hypothesize a corresponding improvement in the estimated kinematic joints, but observe it only under certain conditions. We analyze these conditions further and provide insights into the connection between objectness and kinematic joints, as well as into interconnected recursive estimation loops in general.