Distance estimation using fixation and an event camera

Humans navigate and interact with the 3D world robustly without complex 3D sensors such as lidar, relying instead on the 2D sensors in their eyes. Compared (rather naively) with widely available camera sensors, the human retina has vastly diminished capabilities in areas such as resolution and refresh rate. How, then, can humans interact with the 3D world so robustly?

