Section 01
[Introduction] Three-Step Nav: A Three-Step Method for the Visual Navigation Challenges of Large Models
Vision-and-Language Navigation (VLN) agents driven by multimodal large models often deviate from the instructed route or stop too early. Three-Step Nav proposes a three-step protocol, "Look Ahead - Look Now - Look Back", that achieves state-of-the-art zero-shot performance without fine-tuning and addresses these core failure modes of existing VLN agents.