DEEPMIND AND STANFORD’S NEW ROBOT CONTROL MODEL FOLLOWS INSTRUCTIONS FROM SKETCHES

Ben Dickson (@BenDee983), March 11, 2024, 1:41 PM

Image credit: RT-Sketch

Recent advances in language and vision models have driven great progress in building robotic systems that can follow instructions given as text descriptions or images. However, there are limits to what language- and image-based instructions can accomplish.

A new study by researchers at Stanford University and Google DeepMind suggests using sketches as instructions for robots. Sketches carry rich spatial information that helps a robot carry out its tasks without getting confused by the clutter of realistic images or the ambiguity of natural language instructions.
The researchers created RT-Sketch, a model that uses sketches to control robots. It performs on par with language- and image-conditioned agents in normal conditions and outperforms them in situations where language and image goals fall short.

WHY SKETCHES?

While language is an intuitive way to specify goals, it becomes inconvenient when a task requires precise manipulation, such as placing objects in a specific arrangement.

Images, on the other hand, can depict the desired goal of the robot in full detail. However, a goal image is often unavailable in advance, and a pre-recorded goal image can contain too many details. A model trained on goal images might therefore overfit to its training data and fail to generalize to other environments.

“The original idea of conditioning on sketches actually stemmed from early-on brainstorming about how we could enable a robot to interpret assembly manuals, such as IKEA furniture schematics, and perform the necessary manipulation,” Priya Sundaresan, Ph.D. student at Stanford University and lead author of the paper, told VentureBeat. “Language is often extremely ambiguous for these kinds of spatially precise tasks, and an image of the desired scene is not available beforehand.”

The team chose sketches because they are minimal, easy to collect, and rich with information. Sketches convey spatial information that would be hard to express in natural language instructions, and they can specify desired spatial arrangements without needing to preserve pixel-level detail as an image does. At the same time, they can help models learn which objects are relevant to the task, which leads to more generalizable capabilities.

“We view sketches as a stepping stone towards more convenient but expressive ways for humans to specify goals to robots,” Sundaresan said.

RT-SKETCH

RT-Sketch is one of many new robotics systems that use transformers, the deep learning architecture behind large language models (LLMs). It builds on Robotics Transformer 1 (RT-1), a model developed by DeepMind that takes language instructions as input and generates commands for robots. RT-Sketch modifies this architecture to replace the natural language input with visual goals, including sketches and images.
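At a high level, swapping language conditioning for a visual goal means the policy encodes a goal sketch alongside the current camera observation and predicts discretized robot actions. The following is a minimal, hypothetical PyTorch illustration of that idea; the module names, dimensions, and action discretization are assumptions made for illustration and do not reproduce the actual RT-1 or RT-Sketch code.

```python
# Illustrative only: a minimal PyTorch sketch of goal-sketch conditioning.
# This is NOT the actual RT-1/RT-Sketch code; the module names, sizes, and
# the per-dimension action discretization below are assumptions.
import torch
import torch.nn as nn

class SketchConditionedPolicy(nn.Module):
    def __init__(self, embed_dim=512, n_action_dims=7, n_bins=256):
        super().__init__()
        # A shared CNN encodes both the camera observation and the hand-drawn
        # goal sketch into token embeddings.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=8, stride=8), nn.ReLU(),
            nn.Conv2d(64, embed_dim, kernel_size=1),
        )
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        # One classification head per action dimension over discretized bins.
        self.action_heads = nn.ModuleList(
            [nn.Linear(embed_dim, n_bins) for _ in range(n_action_dims)]
        )

    def forward(self, observation, goal_sketch):
        # Encode both images into tokens; the goal-sketch tokens take the
        # place that language tokens occupy in a language-conditioned policy.
        obs_tokens = self.image_encoder(observation).flatten(2).transpose(1, 2)
        goal_tokens = self.image_encoder(goal_sketch).flatten(2).transpose(1, 2)
        fused = self.transformer(torch.cat([goal_tokens, obs_tokens], dim=1))
        pooled = fused.mean(dim=1)
        # Logits over bins for each action dimension (e.g., arm pose, gripper).
        return [head(pooled) for head in self.action_heads]

policy = SketchConditionedPolicy()
obs = torch.randn(1, 3, 128, 128)     # current camera frame
sketch = torch.randn(1, 3, 128, 128)  # hand-drawn goal sketch
action_logits = policy(obs, sketch)   # 7 tensors of shape (1, 256)
```

The point of the illustration is the interface: the same kind of network that would consume language tokens instead consumes an encoded goal sketch, and everything downstream of the encoder can stay the same.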
To train the model, the researchers used the RT-1 dataset, which includes 80,000 recordings of VR-teleoperated demonstrations of tasks such as moving and manipulating objects, opening and closing cabinets, and more. First, however, they had to create sketches from the demonstrations. They selected 500 training examples and produced hand-drawn sketches from the final video frame of each. They then used these sketches and the corresponding video frames, along with other image-to-sketch examples, to train a generative adversarial network (GAN) that can create sketches from images.

[Image: The GAN generates sketches from images]

They used the GAN to create goal sketches for training the RT-Sketch model, and augmented the generated sketches with various colorspace and affine transforms to simulate the variation in hand-drawn sketches. The RT-Sketch model was then trained on the original recordings paired with the sketch of the goal state.

The trained model takes an image of the scene and a rough sketch of the desired arrangement of objects. In response, it generates a sequence of robot commands to reach the desired goal.

“RT-Sketch could be useful in spatial tasks where describing the intended goal would take longer to say in words than a sketch, or in cases where an image may not be available,” Sundaresan said.

[Image: RT-Sketch takes in visual instructions and generates action commands for robots]

For example, if you want to set a dinner table, a language instruction like “put the utensils next to the plate” is ambiguous when there are multiple sets of forks and knives and many possible placements. Using a language-conditioned model would require multiple interactions and corrections. Meanwhile, having an image of the desired scene would require solving the task in advance. With RT-Sketch, you can instead provide a quickly drawn sketch of how you expect the objects to be arranged.

“RT-Sketch could also be applied to scenarios such as arranging or unpacking objects and furniture in a new space with a mobile robot, or any long-horizon tasks such as multi-step folding of laundry where a sketch can help visually convey step-by-step subgoals,” Sundaresan said.
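As a concrete illustration of the sketch-augmentation step in the training pipeline described above, the colorspace and affine perturbations might look something like the following. This is a minimal sketch assuming torchvision; the specific transform choices and parameter values are assumptions, not the settings reported by the researchers.

```python
# Illustrative only: one plausible way to perturb GAN-generated goal sketches
# with affine and colorspace jitter, mimicking the variation in hand-drawn
# sketches. All parameter values here are assumptions for illustration.
from torchvision import transforms

sketch_augmentation = transforms.Compose([
    # Affine jitter: small rotations, translations, and scale changes stand in
    # for different people drawing the same goal slightly differently.
    transforms.RandomAffine(degrees=5, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    # Colorspace jitter: varies stroke darkness and contrast so the policy does
    # not latch onto the GAN's particular line style.
    transforms.ColorJitter(brightness=0.3, contrast=0.3),
    transforms.ToTensor(),
])

# Usage (hypothetical): `goal_sketch` is a PIL image produced by the
# image-to-sketch GAN; the augmented tensor is what the policy trains on.
# augmented_sketch = sketch_augmentation(goal_sketch)
```

Augmentation like this matters because each training sketch comes from a single GAN rendering, while at test time the goal may be drawn by hand with arbitrary stroke styles and slight misalignments.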
RT-SKETCH IN ACTION

The researchers evaluated RT-Sketch in different scenes across six manipulation skills, including moving objects near one another, knocking cans sideways or placing them upright, and closing and opening drawers. RT-Sketch performs on par with image- and language-conditioned models for tabletop and countertop manipulation. Meanwhile, it outperforms language-conditioned models in scenarios where goals can’t be expressed clearly with language instructions, and it handles scenes cluttered with visual distractors that can confuse image-conditioned models.

“This suggests that sketches are a happy medium; they are minimal enough to avoid being affected by visual distractors, but are expressive enough to preserve semantic and spatial awareness,” Sundaresan said.

In the future, the researchers will explore broader applications of sketches, such as complementing them with other modalities like language, images, and human gestures. DeepMind already has several other robotics models that work with multiple modalities, and it will be interesting to see how they can be improved with the findings of RT-Sketch. The researchers will also explore the versatility of sketches beyond capturing visual scenes.

“Sketches can convey motion via drawn arrows, subgoals via partial sketches, constraints via scribbles, or even semantic labels via scribbled text,” Sundaresan said. “All of these can encode useful information for downstream manipulation that we have yet to explore.”

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.