Hello, I'm Rui Liu

I am a Ph.D. student in Computer Science at the University of Maryland, College Park, working with Prof. Pratap Tokekar and Prof. Ming Lin. Previously, I earned my bachelor’s degree from Shanghai Jiao Tong University. I have also interned at Apple, Tencent AI Lab and Tencent Robotics X.

I work on Multimodal Learning, Reinforcement Learning, and Imitation Learning. My work spans Multimodal Large Language Model (MLLM) Reasoning, Robotics, and Autonomous Driving. I aim to build autonomous AI agents.


Selected Publications

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

Preprint

VOGUE shifts exploration to the visual input space by quantifying policy sensitivity to visual perturbations, enhancing reasoning in multimodal large language models.

Self-Rewarding Vision-Language Model via Reasoning Decomposition

Preprint

Vision-SR1 uses RL to enhance reasoning in vision-language models by decomposing the process into visual perception and language reasoning stages, improving accuracy and reducing hallucinations.

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Preprint

An RL framework that enhances LLM reasoning capabilities by enabling parallel thinking through a progressive curriculum, leading to significant performance improvements on math benchmarks.

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

Preprint

CDE enhances Reinforcement Learning with Verifiable Rewards (RLVR) by using intrinsic curiosity signals from the actor and critic to improve exploration and reduce premature convergence in LLMs.

Adaptive Conformal Guidance for Learning under Uncertainty

Preprint

A broadly applicable framework that dynamically modulates guidance signals according to their associated uncertainty, offering a simple yet effective way to incorporate uncertainty-aware guidance across diverse machine learning systems.

CAML: Collaborative Auxiliary Modality Learning for Multi-Agent Systems

The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS, 2025)

A multi-modal, multi-agent framework that enables agents to collaborate and share multimodal data during training while allowing inference with reduced modalities at test time, making it especially suitable for deployment in resource-constrained environments.

MMCD: Multi-Modal Collaborative Decision-Making for Connected Autonomy with Knowledge Distillation

2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS, 2025)

A multi-modal collaborative decision-making approach for connected autonomy based on knowledge distillation.

Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis

Preprint

A thorough iteration complexity analysis for the risk-sensitive policy gradient method, focusing on the REINFORCE algorithm and employing the exponential utility function.

IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition

2025 IEEE International Conference on Robotics and Automation (ICRA, 2025)

A multi-dimensional representation learning approach that integrates visual, physical, temporal, and geometric representations to enhance the robustness and generalizability of imitation learning for food acquisition.

LAVA: Long-horizon Visual Action based Food Acquisition

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS, 2024)
Best Paper Award at the ICRA 2024 Workshop on Cooking Robotics: Perception and Motion Planning

A Long-horizon Visual Action-based (LAVA) approach for acquiring liquid, semisolid, and deformable foods.

Data-Driven Distributionally Robust Optimal Control with State-Dependent Noise

2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS, 2023)

A data-driven technique for estimating the uncertainty distribution and the KL-divergence bound for distributionally robust optimal control (DROC).

Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types

2024 IEEE International Conference on Robotics and Automation (ICRA, 2024) Workshop

An adaptive visual imitation learning approach that exhibits adaptability and robustness across different bowl configurations and diverse food types for robotic scooping.