
DEEPMIND AND STANFORD’S NEW ROBOT CONTROL MODEL FOLLOWS INSTRUCTIONS FROM SKETCHES

Ben Dickson @BenDee983
March 11, 2024 1:41 PM

Credit: RT-Sketch

Recent advances in language and vision models have helped make great progress in
creating robotic systems that can follow instructions from text descriptions or
images. However, there are limits to what language- and image-based instructions
can accomplish.

A new study by researchers at Stanford University and Google DeepMind suggests
using sketches as instructions for robots. Sketches provide rich spatial
information that helps the robot carry out its tasks without getting confused by
the clutter of realistic images or the ambiguity of natural language
instructions.


The researchers created RT-Sketch, a model that uses sketches to control robots.
It performs on par with language- and image-conditioned agents in normal
conditions and outperforms them in situations where language and image goals
fall short.


WHY SKETCHES?

While language is an intuitive way to specify goals, it can become inconvenient
when the task requires precise manipulations, such as placing objects in
specific arrangements. 


On the other hand, images are efficient at depicting the desired goal of the
robot in full detail. However, a goal image is often unavailable in advance, and
a pre-recorded goal image can contain too many details. Therefore, a model
trained on goal images might overfit to its training data and fail to generalize
to other environments.

“The original idea of conditioning on sketches actually stemmed from early-on
brainstorming about how we could enable a robot to interpret assembly manuals,
such as IKEA furniture schematics, and perform the necessary manipulation,”
Priya Sundaresan, Ph.D. student at Stanford University and lead author of the
paper, told VentureBeat. “Language is often extremely ambiguous for these kinds
of spatially precise tasks, and an image of the desired scene is not available
beforehand.” 

The team decided to use sketches as they are minimal, easy to collect, and rich
with information. On the one hand, sketches provide spatial information that
would be hard to express in natural language instructions. On the other,
sketches can provide specific details of desired spatial arrangements without
needing to preserve pixel-level details as in an image. At the same time, they
can help models learn to tell which objects are relevant to the task, which
results in more generalizable capabilities.

“We view sketches as a stepping stone towards more convenient but expressive
ways for humans to specify goals to robots,” Sundaresan said.


RT-SKETCH

RT-Sketch is one of many new robotics systems that use transformers, the deep
learning architecture behind large language models (LLMs). RT-Sketch is based on
Robotics Transformer 1 (RT-1), a model developed by DeepMind that takes language
instructions as input and generates commands for robots. RT-Sketch modifies this
architecture to replace the natural language input with visual goals, including
sketches and images.
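To make the idea of goal conditioning concrete, here is a minimal sketch, in PyTorch, of a transformer policy that consumes tokens from both the current camera image and a visual goal (a sketch or image) and outputs low-level actions. The module names, token counts, and seven-dimensional action space are illustrative assumptions, not the authors' actual architecture.

# Minimal PyTorch sketch of a goal-conditioned policy in the spirit of RT-Sketch.
# Everything here (module names, token counts, 7-dim action) is an assumption
# for illustration; the real RT-1/RT-Sketch architecture differs in detail.
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_layers=6, n_actions=7):
        super().__init__()
        # Shared convolutional tokenizer for the observation and the goal sketch.
        self.tokenizer = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=4), nn.ReLU(),
            nn.Conv2d(64, d_model, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),  # 16 tokens per image
        )
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_model, n_actions)  # e.g. end-effector deltas + gripper

    def _tokens(self, img):
        feats = self.tokenizer(img)               # (B, d_model, 4, 4)
        return feats.flatten(2).transpose(1, 2)   # (B, 16, d_model)

    def forward(self, observation, goal_sketch):
        # The goal sketch takes the slot that RT-1 reserves for the language embedding.
        tokens = torch.cat([self._tokens(observation), self._tokens(goal_sketch)], dim=1)
        hidden = self.transformer(tokens)
        return self.action_head(hidden.mean(dim=1))

# Usage (shapes only): GoalConditionedPolicy()(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))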


To train the model, the researchers used the RT-1 dataset, which includes 80,000
recordings of VR-teleoperated demonstrations of tasks such as moving and
manipulating objects, opening and closing cabinets, and more. First, however,
they had to create sketches from the demonstrations. For this, they selected 500
training examples and created hand-drawn sketches from the final video frame of
each. They then used these sketches and the corresponding video frames, along
with other image-to-sketch examples, to train a generative adversarial network
(GAN) that can create sketches from images.

A GAN generates sketches from images
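As a rough illustration of that image-to-sketch step, the snippet below sketches one training step of a pix2pix-style conditional GAN on (final video frame, hand-drawn sketch) pairs. The generator and discriminator are placeholder modules, and the loss weighting is an assumption rather than the paper's recipe.

# Hedged sketch of an image-to-sketch conditional GAN training step, assuming
# pix2pix-style losses. `generator` maps frames to sketches; `discriminator`
# scores (frame, sketch) pairs. Both are placeholder nn.Modules.
import torch
import torch.nn.functional as F

def gan_step(generator, discriminator, g_opt, d_opt, frames, sketches, l1_weight=100.0):
    # Discriminator: real hand-drawn pairs vs. generated pairs.
    fake = generator(frames).detach()
    d_real = discriminator(frames, sketches)
    d_fake = discriminator(frames, fake)
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: fool the discriminator while staying close to the human sketch.
    fake = generator(frames)
    d_pred = discriminator(frames, fake)
    g_loss = (F.binary_cross_entropy_with_logits(d_pred, torch.ones_like(d_pred)) +
              l1_weight * F.l1_loss(fake, sketches))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()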

They used the trained GAN to generate goal sketches for training the RT-Sketch
model. They also augmented these generated sketches with various colorspace and
affine transforms to simulate variations in hand-drawn sketches. The RT-Sketch
model was then trained on the original recordings and the sketches of the goal
states.
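The article only names "colorspace and affine transforms," so the parameter ranges in the following torchvision snippet are guesses at what such an augmentation pipeline could look like, not the paper's values.

# Illustrative augmentation for generated goal sketches; the specific ranges
# below are assumptions.
from torchvision import transforms

sketch_augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.05),
    transforms.RandomAffine(degrees=5, translate=(0.05, 0.05), scale=(0.9, 1.1), shear=5),
])
# augmented_sketch = sketch_augment(generated_sketch)  # PIL image or tensor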

The trained model takes an image of the scene and a rough sketch of the desired
arrangement of objects. In response, it generates a sequence of robot commands
to reach the desired goal.
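In practice that amounts to something like the closed-loop rollout below, where a hypothetical policy and robot interface stand in for whatever the real system uses.

# Hedged sketch of rolling out a sketch-conditioned policy in a closed loop.
# `policy` and `robot` are hypothetical interfaces used only for illustration.
def rollout(policy, robot, goal_sketch, max_steps=200):
    for _ in range(max_steps):
        observation = robot.get_camera_image()        # current scene image
        action = policy(observation, goal_sketch)     # e.g. end-effector delta + gripper command
        robot.apply_action(action)
        if robot.task_complete():
            break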


“RT-Sketch could be useful in spatial tasks where describing the intended goal
would take longer to say in words than a sketch, or in cases where an image may
not be available,” Sundaresan said. 

RT-Sketch takes in visual instructions and generates action commands for robots

For example, if you want to set a dinner table, language instructions like “put
the utensils next to the plate” could be ambiguous with multiple sets of forks
and knives and many possible placements. Using a language-conditioned model
would require multiple interactions and corrections to the model. At the same
time, having an image of the desired scene would require solving the task in
advance. With RT-Sketch, you can instead provide a quickly drawn sketch of how
you expect the objects to be arranged.

“RT-Sketch could also be applied to scenarios such as arranging or unpacking
objects and furniture in a new space with a mobile robot, or any long-horizon
tasks such as multi-step folding of laundry where a sketch can help visually
convey step-by-step subgoals,” Sundaresan said. 


RT-SKETCH IN ACTION


The researchers evaluated RT-Sketch in different scenes across six manipulation
skills, including moving objects near one another, knocking cans sideways or
placing them upright, and opening and closing drawers.

RT-Sketch performs on par with image- and language-conditioned models for
tabletop and countertop manipulation. Meanwhile, it outperforms
language-conditioned models in scenarios where goals can’t be expressed clearly
with language instructions. It is also suitable for scenarios where the
environment is cluttered with visual distractors and image-based instructions
can confuse image-conditioned models.

“This suggests that sketches are a happy medium; they are minimal enough to
avoid being affected by visual distractors, but are expressive enough to
preserve semantic and spatial awareness,” Sundaresan said.

In the future, the researchers will explore broader applications of sketches,
such as complementing them with other modalities like language, images, and
human gestures. DeepMind already has several other multimodal robotics models,
and it will be interesting to see how they can be improved with the findings of
RT-Sketch. The researchers will also explore the versatility of sketches beyond
capturing visual scenes.

“Sketches can convey motion via drawn arrows, subgoals via partial sketches,
constraints via scribbles, or even semantic labels via scribbled text,”
Sundaresan said. “All of these can encode useful information for downstream
manipulation that we have yet to explore.”




