Foundation Model

A large-scale neural network pre-trained on broad data (internet text, images, videos) that can be adapted to downstream tasks with minimal fine-tuning. In robotics, foundation models supply visual representations (DINOv2), language grounding (CLIP, LLMs), and world models (video prediction). Vision-language-action models (VLAs) such as RT-2, OpenVLA, and π0 go further and output robot actions directly. The promise is that the data cost of learning is paid once during pre-training and then amortized across tasks and embodiments.
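One common way to adapt a pre-trained model with minimal fine-tuning is to keep its weights frozen and train only a small task-specific head on its features. The sketch below illustrates that pattern in plain Python. Everything in it is an assumption made for illustration: `make_frozen_backbone` is a hypothetical stand-in (a fixed random projection) rather than a real pre-trained model, and the regression task is a toy.

```python
import math
import random


def make_frozen_backbone(in_dim, feat_dim, seed=0):
    """Hypothetical stand-in for a pre-trained encoder: fixed random tanh features."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(feat_dim)]

    def backbone(x):
        # Weights are captured once and never updated: the backbone is "frozen".
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return backbone


def predict(head, bias, feats):
    """Linear head applied to backbone features."""
    return sum(h * f for h, f in zip(head, feats)) + bias


def mse(head, bias, features, targets):
    """Mean squared error of the head over a dataset of precomputed features."""
    errors = [(predict(head, bias, f) - y) ** 2 for f, y in zip(features, targets)]
    return sum(errors) / len(errors)


def train_head(features, targets, lr=0.05, epochs=100):
    """Fit only the linear head with per-sample gradient steps on squared error."""
    head, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for feats, y in zip(features, targets):
            err = predict(head, bias, feats) - y
            head = [h - lr * err * f for h, f in zip(head, feats)]
            bias -= lr * err
    return head, bias


# Toy downstream task (made up for illustration): regress x[0] + x[1].
rng = random.Random(1)
inputs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
targets = [x[0] + x[1] for x in inputs]

backbone = make_frozen_backbone(in_dim=2, feat_dim=8)
features = [backbone(x) for x in inputs]  # computed once; the backbone is not trained

initial_loss = mse([0.0] * 8, 0.0, features, targets)
head, bias = train_head(features, targets)
final_loss = mse(head, bias, features, targets)
print(f"head-only training: MSE {initial_loss:.3f} -> {final_loss:.3f}")
```

Only the 8 head weights and the bias are learned here, which is why the adaptation is cheap. With a real foundation model the same split applies, with the backbone's features replacing the random projection.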

Robot Learning, Vision-Language
