Robot Mission Summary with Video Foundation Models

May 28th, 2025

When robots return from an autonomous mission, or when an operator has stepped away for some time to attend to another task, the operator needs to catch up on what the robots have been doing in their absence. Currently, this requires the operator to manually review raw data, such as hours of video footage, to extract useful insights. This is time-consuming, incurs high cognitive load, and requires expertise and training.

In this work, we investigate how Video Foundation Models can be used to allow robots to generate multimodal mission summaries and answer operators’ queries. Our user study showed that interactive robot mission summaries can help novice users perform supervisory tasks, such as identifying occurrences of objects and events of interest during the robot’s autonomous exploration.

This interactive robot summary enhances intermittent supervision of robots, allowing an operator to stay informed without having to remain attentive throughout the mission. Thus, it enables operators to manage multiple robots over extended deployment periods.

This work has been published in IEEE Robotics and Automation Letters: K. Katuwandeniya, L. Tian and D. Kulić, “‘What Did the Robot Do in My Absence?’ Video Foundation Models to Enhance Intermittent Supervision,” in IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3222-3229, April 2025, doi: 10.1109/LRA.2025.3539118. https://ieeexplore.ieee.org/abstract/document/10873818

For more information, please check out the project website: https://kavindie.github.io/what-did-the-robot-do-in-my-absence/

To request a transcript, please contact us.

Video Abstract