Education
University of Maryland, College Park
Ph.D. in Computer Science, working with Prof. Pratap Tokekar and Prof. Ming Lin
Ph.D. in Computer Science, working with Prof. Pratap Tokekar and Prof. Ming Lin
Aug 2022 - Now
University of Maryland, College Park
Master's in Computer Science
May 2024
Shanghai Jiao Tong University
B.S. in Mechanical Engineering
June 2020
Employment
Tencent AI Lab
Research Intern. Working on multimodal LLM reasoning.
May 2025 - Nov 2025
Bellevue, WA
Apple
PhD Intern. Worked on ML algorithm development and data analysis.
May 2023 - Aug 2023
Cupertino, CA
University of Maryland, College Park
Graduate Research Assistant. Working on AI, ML, and robotics.
Aug 2022 - Now
College Park, MD
Tencent Robotics X
Research Intern. Worked on quadruped robot algorithm development and gait planning.
Jun 2020 - Nov 2020
Shenzhen, China
The Chinese University of Hong Kong
Research Intern. Worked on surgical robotics motion planning.
Jul 2019 - Sep 2019
Hong Kong, China
Shanghai Jiao Tong University
Research Assistant. Worked on electric vehicle heat pump systems.
Mar 2019 - May 2020
Shanghai, China
Publications
VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning
VOGUE, a method that shifts exploration to the visual input space by quantifying policy sensitivity to visual perturbations, enhances reasoning in multimodal large language models.
Preprint
Self-Rewarding Vision-Language Model via Reasoning Decomposition
Vision-SR1 uses RL to enhance reasoning in vision-language models by decomposing the process into visual perception and language reasoning stages, improving accuracy and reducing hallucinations.
Preprint
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
An RL framework that enhances LLM reasoning capabilities by enabling parallel thinking through a progressive curriculum, leading to significant performance improvements on math benchmarks.
Preprint
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
CDE enhances Reinforcement Learning with Verifiable Rewards (RLVR) by using intrinsic curiosity signals from the actor and critic to improve exploration and reduce premature convergence in LLMs.
Preprint
Adaptive Conformal Guidance for Learning under Uncertainty
A broadly applicable framework that dynamically modulates guidance signals based on their associated uncertainty, offering a simple yet effective way to incorporate uncertainty-aware guidance across diverse machine learning systems.
Preprint
CAML: Collaborative Auxiliary Modality Learning for Multi-Agent Systems
A multi-modal multi-agent framework that enables agents to collaborate and share multimodal data during training while allowing inference with reduced modalities during testing, which is especially beneficial for deployment in resource-constrained environments.
NeurIPS, 2025
MMCD: Multi-Modal Collaborative Decision-Making for Connected Autonomy with Knowledge Distillation
A multi-modal collaborative decision-making approach for connected autonomy that uses knowledge distillation.
IROS, 2025
Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis
A thorough iteration complexity analysis of the risk-sensitive policy gradient method, focusing on the REINFORCE algorithm with the exponential utility function.
Preprint
IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition
A multi-dimensional representation learning approach that integrates visual, physical, temporal, and geometric representations to enhance the robustness and generalizability of imitation learning for food acquisition.
ICRA, 2025
LAVA: Long-horizon Visual Action based Food Acquisition
Long-horizon Visual Action-based (LAVA) food acquisition of liquid, semisolid, and deformable foods.
IROS, 2024
Data-Driven Distributionally Robust Optimal Control with State-Dependent Noise
A data-driven technique for estimating the uncertainty distribution and a bound on the KL divergence for distributionally robust optimal control (DROC).
IROS, 2023
Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types
An adaptive visual imitation learning approach for robotic scooping that remains robust across different bowl configurations and diverse food types.
ICRA, 2024