Humans interact robustly with objects in the 3D world without complicated 3D sensors such as lidars. Instead, they rely only on the 2D sensors in their eyes. Compared (rather naively) to widely available camera sensors, the human retina is vastly less capable in aspects such as resolution and refresh rate. How, then, can humans interact with the 3D world so robustly?
One way is to exploit regularities in 3D space that are unlocked by actively moving in specific ways. Gaze fixation is one such movement of the body and eyes, and it can be shown to be very useful for extracting relevant 3D properties of the world. Gaze fixation is the act of looking at one object at a time while moving, just as you do in daily life: you actively focus your attention while tracking the objects you want to interact with.
Gaze fixation entails simultaneously sensing and acting, which makes it an active vision behavior. This behavior robustly finds the distance to the fixated object and to any immediate obstacles, better than any commonly available 3D sensor. Moreover, if the camera is mounted near the robot end-effector, fixation allows controlling the robot in 4 DOF instead of the general 6 DOF. The reduced DOF and the extraction of robust 3D properties make a fixating robot an ideal candidate for interacting with the real world without building any sophisticated world model.
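To give a feel for why fixation can yield distance, here is a minimal sketch of one textbook geometric relation, not necessarily the method used in our lab. It assumes a static target and a camera that rotates just enough to keep the target on its optical axis. Under that assumption, the line of sight turns at an angular rate equal to the camera's speed perpendicular to the gaze direction divided by the distance to the target. The function name `fixation_distance` and its parameters are hypothetical and chosen only for illustration.

```python
import math


def fixation_distance(velocity, gaze_dir, omega):
    """Estimate the distance to a fixated, static point (illustrative sketch).

    velocity: camera linear velocity (vx, vy, vz) in m/s
    gaze_dir: vector along the optical axis, pointing at the target
    omega:    angular speed in rad/s at which the gaze must rotate
              to keep the target fixated
    """
    # Normalize the gaze direction.
    norm = math.sqrt(sum(g * g for g in gaze_dir))
    if norm == 0.0:
        raise ValueError("gaze_dir must be non-zero")
    u = [g / norm for g in gaze_dir]

    # Remove the velocity component along the line of sight;
    # only the perpendicular part makes the line of sight rotate.
    along = sum(v * c for v, c in zip(velocity, u))
    v_perp = [v - along * c for v, c in zip(velocity, u)]
    speed_perp = math.sqrt(sum(c * c for c in v_perp))

    # Without rotation there is no parallax and distance is unobservable.
    if abs(omega) < 1e-9:
        raise ValueError("no gaze rotation: distance is not observable")

    # Assumed relation for a static target: omega = speed_perp / distance.
    return speed_perp / abs(omega)


# Hypothetical numbers: sideways motion at 0.1 m/s with a gaze rotation
# of 0.2 rad/s.
d = fixation_distance((0.1, 0.0, 0.0), (0.0, 0.0, 1.0), 0.2)
```

Note that motion straight toward the target produces no gaze rotation, so this relation alone says nothing about distance in that case. A real system would also have to deal with noise and moving targets.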
In our lab, we have already endowed a robot with fixation capabilities and employed certain heuristics that enable it to pick up objects of unknown size and shape. However, these simple heuristics do not take advantage of the 3D geometry of objects (also perceived through gaze fixation) to facilitate a successful grasp. For example, a grasp strategy that works for cubes and spheres will not always work for cylinders (such as a bottle) unless the gripper is aligned parallel to the axis of the cylinder.
In this thesis you will,
Contact: Aravind Battaje, Oliver Brock