Method and Apparatus for Vision-based Natural Language Instruction Generation

Listed on

2026-06-22

1.41

CI (SI)

0.54

TR (N)

2.62

AI Model for Robot Manipulation Trajectory Control

This technology describes a method for building an AI model that uses visual grounding to extract object categories, positions, and attributes from images, converts this information into natural language instructions, and uses those instructions to plan and control a robot's manipulation trajectory.
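
The listing does not detail how a grounded object position becomes a manipulation trajectory. The Python sketch below shows one common approach under stated assumptions: the visual grounding model returns a pixel bounding box, a depth value is available at its centre, the camera intrinsics (`Intrinsics`) and a camera-to-base transform (`rot`, `trans`) are known from calibration, and the robot performs a straight top-down pick. All names and numbers are hypothetical, and the patented manipulation module may plan trajectories quite differently.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels


@dataclass
class Intrinsics:
    """Hypothetical pinhole camera parameters, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def box_center(box: Box) -> Tuple[float, float]:
    """Pixel centre of a grounded bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def back_project(u: float, v: float, depth: float, k: Intrinsics) -> Point:
    """Lift pixel (u, v) at metric depth into the camera frame (pinhole model)."""
    return ((u - k.cx) * depth / k.fx, (v - k.cy) * depth / k.fy, depth)


def to_base(p: Point, rot: Sequence[Sequence[float]], trans: Point) -> Point:
    """Apply an assumed camera-to-base transform: p_base = R @ p_cam + t."""
    x, y, z = (sum(rot[i][j] * p[j] for j in range(3)) + trans[i] for i in range(3))
    return (x, y, z)


def plan_pick(target: Point, hover: float = 0.10, steps: int = 5) -> List[Point]:
    """Straight-line descent from `hover` metres above the target, then back up.

    Assumes the base frame has +z pointing up; ignores gripper orientation,
    collisions and joint limits.
    """
    x, y, z = target
    descent = [(x, y, z + hover * (steps - i) / steps) for i in range(steps + 1)]
    return descent + descent[-2::-1]  # retrace the descent, skipping the duplicate bottom pose


# Usage with made-up calibration values: an overhead camera looking straight down.
k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
u, v = box_center((300.0, 220.0, 340.0, 260.0))  # box returned by the grounding model
p_cam = back_project(u, v, depth=0.5, k=k)
rot = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
target = to_base(p_cam, rot, trans=(0.4, 0.0, 0.75))
waypoints = plan_pick(target)
print(target, len(waypoints))  # prints (0.4, 0.0, 0.25) 11
```

A real system would typically hand these Cartesian waypoints to a motion planner or inverse-kinematics solver rather than execute them directly.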

Existing robot control methods required operators to enter object coordinates and task details manually. As a result, object positions had to be fixed in advance, and generating commands for multiple objects was inefficient.

This technology proposes a method for generating a training dataset and then using it to train an AI model. The method is built on a first framework (GVCCI) comprising: a visual feature extraction module that recognizes objects in an image and extracts their features; an instruction generation module that produces context-appropriate natural language instructions; a visual grounding model that infers the target object and its position from an image and an instruction; and a manipulation module that plans the trajectory of a robot arm.
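
The dataset-generation idea can be sketched in Python as follows. This is an illustration under assumptions, not the GVCCI implementation: the feature extraction module is assumed to return each object's category, bounding box, and attribute words, and the instruction templates in `TEMPLATES` are hypothetical. Each generated (image, instruction, target box) triple serves as one training example for the visual grounding model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels

# Hypothetical instruction templates; the listing does not give the real ones.
TEMPLATES = ["pick up the {desc}", "grab the {desc}", "hand me the {desc}"]


@dataclass
class DetectedObject:
    """Assumed output of the visual feature extraction module for one object."""
    category: str
    box: Box
    attributes: List[str] = field(default_factory=list)


@dataclass
class GroundingSample:
    """One (image, instruction, target box) training example for visual grounding."""
    image_id: str
    instruction: str
    target_box: Box


def describe(obj: DetectedObject) -> str:
    """Noun phrase made of the attribute words followed by the category."""
    return " ".join(obj.attributes + [obj.category])


def build_samples(image_id: str, objects: List[DetectedObject]) -> List[GroundingSample]:
    """Turn every recognized object into one training sample per template."""
    samples = []
    for obj in objects:
        desc = describe(obj)
        for template in TEMPLATES:
            samples.append(GroundingSample(image_id, template.format(desc=desc), obj.box))
    return samples


# Training data accumulates as new images are processed.
dataset: List[GroundingSample] = []
scene = [
    DetectedObject("cup", (10.0, 20.0, 60.0, 80.0), ["red"]),
    DetectedObject("bowl", (100.0, 20.0, 180.0, 90.0), ["large", "blue"]),
]
dataset.extend(build_samples("img_0001", scene))
print(len(dataset), dataset[0].instruction)  # prints 6 pick up the red cup
```

Because the target box is known at generation time, no human labelling is needed for these samples; that is the property that lets such a dataset keep growing as more images arrive.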

Key Features:

Provides a visual recognition-based natural language instruction generation method
Recognizes at least one object within an image
Extracts object features and generates natural language instructions for each object from those features, according to predefined criteria and requirements
Adapts quickly even with limited image data and improves its instruction generation as training data accumulates
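
One plausible reading of the "predefined criteria" is that each instruction must identify its target unambiguously among the recognized objects. The Python sketch below assumes that criterion and a hypothetical escalation order: category alone, then color plus category, then a leftmost/rightmost qualifier, skipping the object if it is still ambiguous. The `Obj` fields and the ordering are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Obj:
    """Hypothetical per-object features from the recognition step."""
    category: str
    color: str
    x_center: float  # horizontal centre of the bounding box, in pixels


def refer(idx: int, scene: List[Obj]) -> Optional[str]:
    """Shortest phrase that singles out scene[idx], or None if none is found.

    Assumed criteria, tried in order: category; color + category;
    then leftmost/rightmost among objects sharing both.
    """
    target = scene[idx]
    rivals = [o for i, o in enumerate(scene) if i != idx and o.category == target.category]
    if not rivals:
        return f"the {target.category}"
    rivals = [o for o in rivals if o.color == target.color]
    phrase = f"{target.color} {target.category}"
    if not rivals:
        return f"the {phrase}"
    if all(target.x_center < o.x_center for o in rivals):
        return f"the leftmost {phrase}"
    if all(target.x_center > o.x_center for o in rivals):
        return f"the rightmost {phrase}"
    return None  # still ambiguous: skip rather than emit a misleading instruction


def generate_instructions(scene: List[Obj]) -> List[str]:
    """One 'pick up ...' instruction per object that can be referred to uniquely."""
    out = []
    for i in range(len(scene)):
        phrase = refer(i, scene)
        if phrase is not None:
            out.append(f"pick up {phrase}")
    return out


scene = [
    Obj("cup", "red", 50.0),
    Obj("cup", "red", 200.0),
    Obj("cup", "blue", 120.0),
    Obj("bowl", "white", 300.0),
]
for line in generate_instructions(scene):
    print(line)
```

Preferring the shortest unambiguous phrase keeps instructions natural, while dropping objects that cannot be singled out avoids training the grounding model on labels that point at the wrong object.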

This technology was developed with support from the Institute of Information & Communications Technology Planning & Evaluation (IITP) through a self-directed AI research project focused on solving novel problems.

Seoul National University

Jang Byung-tak | Kim Jung-hyun

Document

Date of application:

2024-02-21

Patent registration number:

10-2903342

Industry

software

robot•automation

Technology