New Robot Control Model Uses Sketches to Enhance Instructions
Robotics researchers are constantly looking for better ways to help robots understand and follow instructions. Recent advances in language and vision models have brought significant progress, but these models still have clear limits. A new study by researchers at Stanford University and Google DeepMind proposes an alternative: giving robots their instructions as sketches.
The researchers developed a model called RT-Sketch, which uses sketches to control robots. The model performs on par with language- and image-conditioned agents under normal conditions but outperforms them in situations where language and image goals fall short. Sketches provide rich spatial information that helps the robot carry out tasks without being distracted by the clutter of realistic images or confused by the ambiguity of natural language instructions.
One of the main advantages of sketches is that they convey precise spatial arrangements without the pixel-level detail of an image. Language instructions are often ambiguous for tasks that require spatial precision, while images may be unavailable or may contain too much irrelevant information. Sketches strike a balance: they carry spatial information yet remain easy to produce and interpret.
The idea of conditioning robots on sketches grew out of the researchers' work on enabling robots to interpret assembly manuals, such as those for IKEA furniture. For such spatially precise tasks, language instructions can be highly ambiguous, and pre-recorded images of the desired scene may not exist. Sketches give humans a convenient and expressive way to specify goals to robots.
RT-Sketch is based on Robotics Transformer 1 (RT-1), a deep learning architecture developed by Google DeepMind that takes camera images and a natural-language instruction as input and outputs robot actions. RT-Sketch replaces the language instruction with a visual goal, which can be either a sketch or an image. The researchers trained the model on the RT-1 dataset, which consists of VR-teleoperated demonstrations of various tasks.
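The article does not describe how RT-Sketch fuses the goal with the current camera view, so the following is only an illustrative sketch under that caveat. One simple way to condition a vision-based policy on a visual goal instead of a text instruction is to stack the current frame and the goal image (or sketch) along the channel axis before passing them to the image encoder. Whether RT-Sketch does exactly this is an assumption here; `stack_goal` and the nested-list image format are hypothetical, chosen only to keep the example dependency-free.

```python
# Hypothetical image format: H rows x W columns, each pixel a list of channel values.
Pixel = list[float]
Image = list[list[Pixel]]


def stack_goal(observation: Image, goal: Image) -> Image:
    """Concatenate a camera frame and a goal image channel-wise.

    Both inputs must share height and width. Each output pixel holds the
    observation's channels followed by the goal's channels, so one vision
    backbone sees "where things are" and "where they should go" together.
    """
    if len(observation) != len(goal):
        raise ValueError("observation and goal must have the same height")
    stacked: Image = []
    for obs_row, goal_row in zip(observation, goal):
        if len(obs_row) != len(goal_row):
            raise ValueError("observation and goal must have the same width")
        # List concatenation joins the channel vectors of matching pixels.
        stacked.append([obs_px + goal_px for obs_px, goal_px in zip(obs_row, goal_row)])
    return stacked


if __name__ == "__main__":
    # A 2x2 RGB camera frame and a 2x2 single-channel (grayscale) sketch.
    frame = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
             [[0.7, 0.8, 0.9], [1.0, 1.0, 1.0]]]
    sketch = [[[0.0], [1.0]],
              [[1.0], [0.0]]]
    combined = stack_goal(frame, sketch)
    print(len(combined), len(combined[0]), len(combined[0][0]))  # 2 2 4
```

Because the goal simply occupies extra input channels, swapping a sketch for a photo only changes the channel count the encoder expects, which is one reason this kind of fusion is a common baseline for goal-conditioned policies.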
To create sketches for training, the researchers used a generative adversarial network (GAN) to convert images into hand-drawn-style sketches. They then augmented these generated sketches to mimic the natural variation of real hand-drawn sketches. Given an image of the current scene and a rough sketch of the desired arrangement of objects, the trained RT-Sketch model generates a sequence of robot commands to reach the goal.
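The article does not say which augmentations were applied. As an illustrative assumption, the sketch below represents a drawing as strokes of normalized (x, y) points and applies a small random rotation, scale change, and shift, plus per-point jitter, to imitate how different people might draw the same goal slightly differently. `augment_sketch` and all of its parameters are hypothetical; an actual pipeline would more likely operate on the rasterized GAN output.

```python
import math
import random

Point = tuple[float, float]   # normalized canvas coordinates in [0, 1]
Stroke = list[Point]


def augment_sketch(strokes: list[Stroke], rng: random.Random,
                   max_rotation_deg: float = 10.0, max_scale: float = 0.1,
                   max_shift: float = 0.05, jitter: float = 0.01) -> list[Stroke]:
    """Return a randomly perturbed copy of a sketch (hypothetical augmentation)."""
    # One global transform per sketch: the whole drawing is slightly rotated,
    # resized, and moved, as if drawn by a different hand.
    angle = math.radians(rng.uniform(-max_rotation_deg, max_rotation_deg))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def clamp(v: float) -> float:
        # Keep every point on the canvas.
        return min(1.0, max(0.0, v))

    augmented: list[Stroke] = []
    for stroke in strokes:
        new_stroke: Stroke = []
        for x, y in stroke:
            # Rotate and scale about the canvas center (0.5, 0.5), then shift.
            cx, cy = x - 0.5, y - 0.5
            rx = scale * (cos_a * cx - sin_a * cy) + 0.5 + dx
            ry = scale * (sin_a * cx + cos_a * cy) + 0.5 + dy
            # Small independent wobble per point mimics an unsteady line.
            rx += rng.gauss(0.0, jitter)
            ry += rng.gauss(0.0, jitter)
            new_stroke.append((clamp(rx), clamp(ry)))
        augmented.append(new_stroke)
    return augmented


if __name__ == "__main__":
    rng = random.Random(42)
    sketch = [[(0.2, 0.5), (0.8, 0.5)]]  # a single horizontal stroke
    for _ in range(3):
        print(augment_sketch(sketch, rng))
```

Seeding a `random.Random` instance makes each augmented variant reproducible, which helps when debugging a training set built from many perturbed copies of each generated sketch.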
The researchers evaluated RT-Sketch in several scenarios, including tabletop and countertop manipulation tasks. The model performed on par with image- and language-conditioned models in most situations but outperformed language-conditioned models when goals could not be expressed clearly in language. It also proved useful in cluttered environments, where detailed goal images can confuse image-conditioned models.
RT-Sketch could be applied to a range of spatial tasks, such as using a mobile robot to arrange objects or furniture in a new space. It may also help with long-horizon tasks that must be broken into step-by-step subgoals. In future work, the researchers plan to explore combining sketches with other modalities such as language, images, and human gestures. They believe sketches could convey information beyond the visual scene, including motion, subgoals, constraints, and semantic labels.
Overall, RT-Sketch shows that sketches can be an effective way to specify goals for robot control, particularly for tasks that require spatial precision. Further research may reveal additional ways to incorporate sketches into robotic systems.
VentureBeat’s mission is to provide technical decision-makers with knowledge about transformative enterprise technology and to facilitate transactions.