Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?
…off-the-shelf models struggle with multi-turn visual history, and performance drops sharply when viewpoint reproduction requires body translation rather than in-place rotation, exposing a gap in mapping spatial discrepancies…