VLA

Vision-Language-Action model — a foundation model that takes visual observations and language instructions as input and directly outputs robot actions. VLAs unify perception, language understanding, and control in a single neural network. Examples include RT-2, OpenVLA, Octo, and π0. VLAs represent the current frontier of generalist robot learning, often framed as the pursuit of a 'GPT moment' for robotics.
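To make the interface concrete, here is a minimal sketch of the (image, instruction) → action mapping the definition describes. All names, types, and shapes below are illustrative assumptions, not the API of any real VLA such as RT-2 or OpenVLA; a real model would run a vision-language backbone where the dummy return is.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    image: List[List[List[float]]]  # H x W x 3 camera frame (toy representation)
    instruction: str                # natural-language command

@dataclass
class Action:
    delta_xyz: List[float]  # end-effector translation
    delta_rpy: List[float]  # end-effector rotation
    gripper: float          # 0.0 = open, 1.0 = closed

class ToyVLA:
    """Stand-in for a vision-language-action policy: a single forward
    pass maps a visual observation plus a language instruction directly
    to a low-level robot action."""

    def predict(self, obs: Observation) -> Action:
        # A real VLA would encode the image and instruction with a
        # vision-language model here; we return a fixed dummy action
        # just to show the input/output contract.
        return Action(delta_xyz=[0.0, 0.0, 0.0],
                      delta_rpy=[0.0, 0.0, 0.0],
                      gripper=0.0)

policy = ToyVLA()
obs = Observation(image=[[[0.0, 0.0, 0.0]]],
                  instruction="pick up the red block")
action = policy.predict(obs)
```

The key point the sketch captures is that perception, language grounding, and control live in one model call rather than separate pipeline stages.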

Robot Learning · VLA · Vision-Language
