Project 2

August 11th, 2023

Foundation Models for Robotics

Project Location:

The project can be completed at either of the following sites: Black Mountain (ACT) or Pullenvale (QLD).

Desirable skills:

  • Proficiency in Python and PyTorch
  • Knowledge of deep learning
  • Strong oral and written communication skills

Supervisory project team:

Brendan Tidd, Nick Lawrance, Can Peng, Lars Petersson, Jiajun Liu, Josh Pinskier, Tirtha Bandy, David Howard, David Hall, Stephen Hausler, Piotr Koniusz, Stano Funiak, Moshiur Farazi and Peyman Moghadam

Contact person:

Research Scientist, Data61 / Project Supervisor

Project description:

Create a step-change in the capability, performance and utility of robotic systems through the creation and deployment of foundation models grounded in the physical world with coupled world representations and actions. Foundation models for vision and language are already well advanced; however, there is comparatively little work on foundation models for robotics more generally, and many areas of robotics still rely largely on traditional approaches. In this project we aim to develop Large Robotics Models (LRMs) grounded in the physical world with coupled world representations and actions. LRMs fundamentally reshape the traditional robotics stack into a foundation stack that can generalise to new embodiments, sensing, environments, and tasks without being explicitly trained on them.

This project supports a cohort of up to 6 PhD scholarships across the 4 themes listed below. Students are expected to work independently as well as collaboratively within the PhD cohort and the broader teams of scientists and engineers. Students are expected to develop novel algorithms, implement prototypes, and publish and present their work in high-impact journals and conferences.

Theme 1: Develop a novel self-supervised pre-training method for joint representations that binds multimodal data streams such as video, audio, text, lidar, and inertial measurement units (IMUs) with action embeddings (i.e., motions, interactions), without the need for manual supervision. Complementary information across modalities can be used to develop the novel self-supervised learning methods needed for real-world settings. (Supervisory team: Lars Petersson, Piotr Koniusz, Peyman Moghadam)
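
As a rough illustration of the kind of objective Theme 1 describes, the PyTorch sketch below binds two time-aligned modality streams (here a lidar feature vector and an action embedding) with a symmetric contrastive InfoNCE-style loss. The modality choices, encoder sizes and hyperparameters are illustrative assumptions, not the project's prescribed method.

```python
# Minimal sketch: self-supervised multimodal alignment with a contrastive
# (InfoNCE-style) objective over paired lidar and action windows (assumed setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Small MLP encoder projecting one modality into a shared embedding space."""
    def __init__(self, in_dim: int, embed_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss that binds time-aligned pairs across modalities."""
    logits = z_a @ z_b.t() / temperature               # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage: align lidar features with action embeddings from the same time window.
lidar_enc, action_enc = ModalityEncoder(1024), ModalityEncoder(32)
lidar, actions = torch.randn(16, 1024), torch.randn(16, 32)
loss = info_nce(lidar_enc(lidar), action_enc(actions))
loss.backward()
```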

Theme 2: Given a collection of generic pre-trained Large Foundation Models, distil this knowledge into smaller, platform-specific navigation models that are deployed on robots in situ. The models take language, visual and goal-oriented task instructions to perform navigation tasks. The resulting policies should be robust to variations in sensors and environments, and should continually and actively incorporate new experience data without forgetting. (Supervisory team: Nick Lawrance, Jiajun Liu)
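
A minimal PyTorch sketch of the distillation step Theme 2 describes is shown below: a compact student navigation policy is trained to match the soft action distribution of a large frozen teacher. The observation and action dimensions, the discrete action space, and the placeholder teacher logits are all assumptions for illustration.

```python
# Minimal sketch: distilling a large frozen teacher policy into a compact
# student navigation policy via a soft-target KL loss (assumed interfaces).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentPolicy(nn.Module):
    """Compact on-robot policy mapping fused (visual + goal) features to action logits."""
    def __init__(self, obs_dim: int = 128, n_actions: int = 5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Standard knowledge-distillation KL between temperature-softened distributions."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

# Toy usage with random stand-ins for the frozen foundation model's outputs.
student = StudentPolicy()
obs = torch.randn(32, 128)
with torch.no_grad():
    teacher_logits = torch.randn(32, 5)   # placeholder for the teacher's action logits
loss = distillation_loss(student(obs), teacher_logits)
loss.backward()
```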

Theme 3: Develop an AI-in-the-loop reinforcement learning pipeline for robot control. Current methods for learning robot behaviours require painstaking manual construction and reward tuning, are highly inefficient with collected data, and degrade heavily when transferred from simulation to the real world. In this project, a foundation model decides what behaviour is needed for a task and trains a robust policy with minimal human input by selecting an appropriate reference motion (motion capture, YouTube video, or scripted), the milestones for a curriculum, and the reward signal. (Supervisory team: Brendan Tidd, Tirtha Bandy)
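
To make the idea in Theme 3 concrete, the Python sketch below shows two ingredients a foundation model could propose automatically: a reference-motion tracking reward and a curriculum milestone check. The state layout, the Gaussian-shaped reward and the thresholds are illustrative assumptions; a real pipeline would plug these into a full reinforcement learning loop.

```python
# Minimal sketch: reference-motion tracking reward and curriculum milestone check,
# the kind of reward/curriculum a foundation model might propose (assumed form).
import numpy as np

def tracking_reward(state: np.ndarray, reference: np.ndarray, sigma: float = 0.25) -> float:
    """Exponentiated negative distance to the reference pose at the current timestep."""
    return float(np.exp(-np.sum((state - reference) ** 2) / (2 * sigma ** 2)))

def milestone_reached(episode_returns, threshold: float) -> bool:
    """Advance the curriculum once recent performance clears the proposed milestone."""
    return len(episode_returns) >= 10 and float(np.mean(episode_returns[-10:])) > threshold

# Toy usage: score a noisy rollout against a (here random) reference trajectory.
rng = np.random.default_rng(0)
reference_traj = rng.normal(size=(100, 12))       # e.g. retargeted motion-capture poses
rollout = reference_traj + 0.1 * rng.normal(size=(100, 12))
episode_return = sum(tracking_reward(s, r) for s, r in zip(rollout, reference_traj))
print(episode_return, milestone_reached([episode_return] * 10, threshold=30.0))
```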

Theme 4: Develop high-fidelity and compact implicit representations using Neural Radiance Fields (NeRFs) to generate a data-driven, physics-informed simulation that scales the coupled world representation and action spaces. Bridge the simulation-to-real-world gap by developing a Real2Real data-driven simulation from multimodal data streams. (Supervisory team: Josh Pinskier, David Howard, David Hall, Can Peng, Stephen Hausler)
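
As background for Theme 4, the PyTorch sketch below shows the core NeRF building block: a positionally encoded MLP that maps a 3D point to colour and density. Layer sizes and encoding frequencies are illustrative, and a complete pipeline would add view directions, ray sampling and volume rendering.

```python
# Minimal sketch: a tiny NeRF-style implicit representation (illustrative sizes).
import torch
import torch.nn as nn

def positional_encoding(x: torch.Tensor, n_freqs: int = 6) -> torch.Tensor:
    """Map coordinates to sin/cos features so the MLP can capture high-frequency detail."""
    feats = [x]
    for i in range(n_freqs):
        feats += [torch.sin(2.0 ** i * x), torch.cos(2.0 ** i * x)]
    return torch.cat(feats, dim=-1)

class TinyNeRF(nn.Module):
    def __init__(self, n_freqs: int = 6, hidden: int = 128):
        super().__init__()
        in_dim = 3 * (1 + 2 * n_freqs)
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                 # (r, g, b, density)
        )
        self.n_freqs = n_freqs

    def forward(self, xyz: torch.Tensor):
        out = self.mlp(positional_encoding(xyz, self.n_freqs))
        rgb, density = torch.sigmoid(out[..., :3]), torch.relu(out[..., 3:])
        return rgb, density

# Toy usage: query colour and density for a batch of sampled 3D points along rays.
model = TinyNeRF()
rgb, density = model(torch.rand(1024, 3))
print(rgb.shape, density.shape)
```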

The students will interact with an experienced team of engineers and researchers towards a common capability output, including participating in regular sprints for Science Digital. The students will be encouraged to grow their personal brands as researchers through engagement with the research community and industry partners via presentations at international conferences. The project also provides the opportunity for a shared experience across the cohort of students working on this project, allowing them to interact and develop their research skills together in a collaborative environment with experienced CSIRO researchers.