We present a new embodied question answering (EQA) dataset with open-vocabulary questions.
We study the use of pre-trained visual representations (PVRs) to train robots for real-world tasks.
We present the largest and most comprehensive empirical study of visual foundation models for Embodied AI (EAI).
We propose a combined simulation and real-world benchmark for the problem of Open-Vocabulary Mobile Manipulation (OVMM).
We present a modular system that can perform well on the Instance ImageNav task in both simulation and the real world.
We present Habitat-Matterport 3D Semantics (HM3DSEM), the largest dataset of 3D real-world spaces with densely annotated semantics.
We present a single neural network architecture composed of task-agnostic components (ViTs, convolutions, and LSTMs) that achieves state-of-the-art results on both the ImageNav and ObjectNav tasks without task-specific modules.
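To make the composition concrete, here is a minimal, illustrative PyTorch sketch of how such task-agnostic components (a convolutional patch embedding, a ViT-style transformer encoder, and an LSTM over time) might be wired into a navigation policy. The input resolution, layer sizes, and action count are assumptions for illustration, not the paper's exact design.

```python
# Illustrative sketch only: assumes 128x128 RGB frames, 16x16 patches, and
# 4 discrete actions; all sizes are arbitrary placeholders.
import torch
import torch.nn as nn

class VisualPolicy(nn.Module):
    def __init__(self, image_size=128, patch=16, dim=192, lstm_dim=512, num_actions=4):
        super().__init__()
        num_patches = (image_size // patch) ** 2
        # Convolutional patch embedding (the "convolutions" component).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        # ViT-style transformer encoder over the patch tokens.
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=6, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # LSTM integrates per-frame features over time into a recurrent state.
        self.lstm = nn.LSTM(input_size=dim, hidden_size=lstm_dim, batch_first=True)
        # Linear head maps the recurrent state to action logits.
        self.action_head = nn.Linear(lstm_dim, num_actions)

    def forward(self, frames, hidden=None):
        # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        x = self.patch_embed(frames.flatten(0, 1))        # (b*t, dim, H/p, W/p)
        x = x.flatten(2).transpose(1, 2) + self.pos_embed  # (b*t, patches, dim)
        x = self.encoder(x).mean(dim=1).view(b, t, -1)     # pooled per-frame feature
        out, hidden = self.lstm(x, hidden)                 # temporal integration
        return self.action_head(out), hidden               # per-step action logits

logits, state = VisualPolicy()(torch.randn(2, 8, 3, 128, 128))
print(logits.shape)  # torch.Size([2, 8, 4])
```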
We present a last-mile navigation module that plugs into prior policies, improving image-goal navigation results in both simulation and real-robot experiments.
In this work, we propose OVRL, a two-stage representation learning strategy for visual navigation tasks in Embodied AI.
In this work, we develop a gradient-based meta-learning algorithm for efficient, online continual learning that is robust and scalable to real-world visual benchmarks.
In this work, we develop a novel formulation based on reinforcement learning that generates fail-safe trajectories while using monocular SLAM for localization.