VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning
VOGUE, a method that shifts exploration to the visual input space by quantifying policy sensitivity to visual perturbations, enhances reasoning in multimodal large language models.