Research

Reinforcement learning, embodied AI, and robotic manipulation

My research explores how learning algorithms can enable robots to autonomously acquire complex skills in the real world. I focus on reinforcement learning systems that are data-efficient, robust to uncertainty, and deployable on physical robots. The long-term goal is to build embodied AI systems that can learn continuously from interaction, bridging the gap between machine learning theory and real robotic deployment.

Research Areas

Reinforcement Learning

Designing algorithms that improve sample efficiency and exploration in environments with sparse rewards and partial observability. Key topics:

  • Offline RL
  • Hybrid offline–online learning
  • Goal-conditioned RL
  • Curriculum learning

Embodied AI

Developing learning systems where perception, control, and reasoning are tightly integrated for real-world interaction. Focus areas:

  • policy learning for manipulation
  • perception-aware control
  • adaptive policies for dynamic environments

Robotic Manipulation

Building learning-based control policies that transfer from simulation to physical robot platforms. Platforms and simulators include:

  • Flexiv Rizon-4
  • UR10
  • MuJoCo
  • PyBullet

Representative Research

Hybrid Reinforcement Learning

MOORL introduces a meta-policy framework that integrates offline datasets with online exploration. The method improves stability and learning efficiency while addressing distributional shift in offline data.
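The specifics of MOORL's meta-policy are not detailed here, but the core idea of blending offline data with online exploration can be sketched with a hybrid replay buffer that draws each training batch partly from a fixed offline dataset and partly from freshly collected experience. The class name and `offline_ratio` parameter are illustrative assumptions, not part of MOORL.

```python
import random

class HybridReplayBuffer:
    """Mixes a fixed offline dataset with a growing online buffer.

    `offline_ratio` sets the fraction of each batch drawn from offline
    data; the remainder comes from online experience. Anchoring batches
    to offline data is one common way to limit distributional shift.
    """

    def __init__(self, offline_data, offline_ratio=0.5, seed=0):
        self.offline = list(offline_data)
        self.online = []
        self.offline_ratio = offline_ratio
        self.rng = random.Random(seed)

    def add(self, transition):
        # Store a freshly collected (state, action, reward, ...) tuple.
        self.online.append(transition)

    def sample(self, batch_size):
        n_off = int(batch_size * self.offline_ratio)
        n_on = batch_size - n_off
        batch = self.rng.choices(self.offline, k=n_off)
        if self.online:
            batch += self.rng.choices(self.online, k=n_on)
        else:
            # Before any online experience exists, fall back to offline data.
            batch += self.rng.choices(self.offline, k=n_on)
        return batch
```

In practice the mixing ratio can be annealed toward online data as exploration produces enough on-policy experience.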

Reward-Free Imitation Learning

ReLOAD proposes intrinsic reward generation using Random Network Distillation, enabling policy learning from unlabeled trajectories without manually designed rewards.
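Random Network Distillation (RND), which ReLOAD builds on, rewards states that a trained predictor cannot yet reproduce from a frozen random target network: familiar states yield low prediction error, novel states yield high error. Below is a minimal NumPy sketch of that mechanism using linear networks; the class name and dimensions are assumptions for illustration, not ReLOAD's implementation.

```python
import numpy as np

class RNDReward:
    """Random Network Distillation intrinsic reward.

    The target network is randomly initialised and frozen; the predictor
    is trained online to match it. The prediction error serves as an
    intrinsic reward that decays as a state becomes familiar.
    """

    def __init__(self, obs_dim, feat_dim=16, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W_target = rng.normal(size=(obs_dim, feat_dim))        # frozen
        self.W_pred = rng.normal(size=(obs_dim, feat_dim)) * 0.1    # trained
        self.lr = lr

    def reward(self, obs):
        target = obs @ self.W_target
        pred = obs @ self.W_pred
        err = pred - target
        # One gradient step on 0.5 * ||err||^2 w.r.t. the predictor weights.
        self.W_pred -= self.lr * np.outer(obs, err)
        return float(0.5 * np.sum(err ** 2))
```

Because the reward is self-generated from prediction error, no manually designed task reward is needed, which is the property ReLOAD exploits for learning from unlabeled trajectories.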

Curriculum Learning for RL

TEACH introduces a teacher–student learning framework where temporal variance in value estimates drives goal selection, accelerating learning in sparse-reward tasks.
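The teacher's goal-selection rule can be sketched as follows: track recent value estimates for each candidate goal and propose the goal whose estimates vary most over time, on the assumption that high temporal variance marks the frontier where the student is actively learning. The class name, window size, and tie-breaking below are illustrative assumptions, not TEACH's exact algorithm.

```python
from collections import deque
import statistics

class VarianceGoalTeacher:
    """Proposes the goal whose recent value estimates show the highest
    temporal variance, a proxy for learning progress: mastered goals and
    hopeless goals both have flat value estimates, while goals at the
    frontier of the student's ability fluctuate."""

    def __init__(self, goals, window=10):
        # Keep a sliding window of value estimates per candidate goal.
        self.history = {g: deque(maxlen=window) for g in goals}

    def record(self, goal, value_estimate):
        self.history[goal].append(value_estimate)

    def propose(self):
        def score(goal):
            vals = self.history[goal]
            # Goals with too little data score infinity so they get tried first.
            return statistics.pvariance(vals) if len(vals) > 1 else float("inf")
        return max(self.history, key=score)
```

A sparse-reward student paired with such a teacher spends its interaction budget on goals it can almost reach, rather than on ones it has already solved or cannot yet attempt.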

Active Perception

An RL-based system that learns to reposition cameras to maximize visual signal quality, improving perception performance under occlusion and noise.
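One simple proxy for "visual signal quality" is the variance of the image Laplacian, a standard focus/detail measure; a greedy viewpoint chooser over that score stands in for the learned repositioning policy. Both function names below are illustrative assumptions about such a system, not its actual implementation.

```python
import numpy as np

def sharpness(image):
    """Variance of the discrete Laplacian over the image interior.

    Higher values indicate more visible structure; occluded or blurred
    views score low, so this works as a reward for view quality.
    """
    lap = (-4.0 * image[1:-1, 1:-1]
           + image[:-2, 1:-1] + image[2:, 1:-1]
           + image[1:-1, :-2] + image[1:-1, 2:])
    return float(lap.var())

def best_view(images_by_pose):
    """Greedy stand-in for a learned policy: pick the camera pose whose
    current image yields the highest sharpness reward."""
    return max(images_by_pose, key=lambda pose: sharpness(images_by_pose[pose]))
```

An RL agent replaces this exhaustive greedy search with a policy that moves the camera toward high-reward poses without rendering every candidate view.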

Research Pipeline

Research pipeline: algorithm design → simulation training → perception integration → real-world robotic deployment.
My work spans this full pipeline, from algorithm design and simulation experiments to deployment on real robotic platforms.

Current Research Direction

At Nanyang Technological University (NTU), I work on test-time adaptation for diffusion policy models, aiming to make robotic manipulation policies more robust to real-world uncertainty and distribution shift.

This work combines:

  • generative policy models
  • reinforcement learning
  • perception-aware control
  • sim-to-real transfer

Research Vision

Future autonomous systems must operate in complex physical environments with limited supervision.
My research aims to develop learning algorithms that enable robots to:

  • learn from small amounts of interaction data
  • adapt to new environments and tasks
  • integrate perception and control
  • operate reliably in real-world settings