Effective URL: https://arstechnica.com/information-technology/2023/09/telling-ai-model-to-take-a-deep-breath-causes-math-scores-to-soar...

BABY STEPS, BABY STEPS —


TELLING AI MODEL TO “TAKE A DEEP BREATH” CAUSES MATH SCORES TO SOAR IN STUDY


DEEPMIND USED AI MODELS TO OPTIMIZE THEIR OWN PROMPTS, WITH SURPRISING RESULTS.

Benj Edwards - 9/19/2023, 11:38 PM


Google DeepMind researchers recently developed a technique to improve math
ability in AI language models like ChatGPT by using other AI models to improve
prompting, the written instructions that tell the AI model what to do. They
found that using human-style encouragement improved math skills dramatically, in
line with earlier results.

In a paper called "Large Language Models as Optimizers" listed this month on
arXiv, DeepMind scientists introduced Optimization by PROmpting (OPRO), a method
to improve the performance of large language models (LLMs) such as OpenAI’s
ChatGPT and Google’s PaLM 2. This new approach sidesteps the limitations of
traditional math-based optimizers by using natural language to guide LLMs in
problem-solving. "Natural language" is a fancy way of saying everyday human
speech.



"Instead of formally defining the optimization problem and deriving the update
step with a programmed solver," the researchers write, "we describe the
optimization problem in natural language, then instruct the LLM to iteratively
generate new solutions based on the problem description and the previously found
solutions."

Typically, in machine learning, algorithms such as derivative-based optimizers
guide the process of improving an AI model's performance. Imagine a model's
error as a curve on a graph: The goal is to find the lowest point on this curve
because that's where the model makes the fewest mistakes. By using the slope of
the curve to make adjustments, the optimizer helps the model get closer and
closer to that ideal low point, making it more accurate and efficient at
whatever task it's designed to do.
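
As a rough illustration of the slope-following idea described above, here is a minimal Python sketch of gradient descent on a one-dimensional curve. The error curve, learning rate, and step count are assumptions chosen for illustration; none of them come from the DeepMind paper.

```python
def loss(x: float) -> float:
    """Illustrative error curve (an assumption): its lowest point is at x = 3."""
    return (x - 3.0) ** 2


def slope(x: float) -> float:
    """Derivative of the illustrative error curve above."""
    return 2.0 * (x - 3.0)


def gradient_descent(start: float, learning_rate: float = 0.1, steps: int = 100) -> float:
    """Repeatedly take a small step downhill, against the slope."""
    x = start
    for _ in range(steps):
        x -= learning_rate * slope(x)
    return x


if __name__ == "__main__":
    best = gradient_descent(start=10.0)
    print(f"x = {best:.4f}, loss = {loss(best):.6f}")
```

Each step moves the guess a little in the direction that lowers the error, so the guess settles near the bottom of the curve.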


Rather than relying on formal mathematical definitions to perform this task,
OPRO uses "meta-prompts" described in natural language to set the stage for the
optimization process. The LLM then generates candidate solutions based on the
problem’s description and previous solutions, and it tests them by assigning
each a quality score.

In OPRO, two large language models play different roles: a scorer LLM evaluates
the objective function, such as accuracy, while an optimizer LLM generates new
solutions based on past results and a natural language description. The
researchers evaluated different pairings of scorer and optimizer LLMs, including
models like PaLM 2 and GPT variants. OPRO can optimize prompts for the scorer
LLM by having the optimizer iteratively generate higher-scoring prompts. These
scores help the system identify the best solutions, which are then added back
into the "meta-prompt" for the next round of optimization.
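
The scorer/optimizer loop described above can be sketched in Python as follows. This is a toy illustration, not the paper's implementation: `toy_scorer` and `make_toy_optimizer` are hypothetical stand-ins for real LLM calls, and the meta-prompt wording, `top_k`, and `rounds` are assumptions.

```python
import itertools
from typing import Callable, List, Tuple

Scored = List[Tuple[str, float]]


def build_meta_prompt(task: str, scored: Scored, top_k: int = 3) -> str:
    """Describe the task and list the best prompts found so far with their scores."""
    best = sorted(scored, key=lambda pair: pair[1])[-top_k:]
    lines = [task, "", "Previous prompts and their scores:"]
    lines += [f"text: {text!r} score: {score:.1f}" for text, score in best]
    lines.append("Write a new prompt that scores higher.")
    return "\n".join(lines)


def opro_loop(task: str, optimizer: Callable[[str], str],
              scorer: Callable[[str], float], seed_prompt: str,
              rounds: int = 3) -> Tuple[str, float]:
    """Alternate between proposing a prompt (optimizer) and rating it (scorer)."""
    scored: Scored = [(seed_prompt, scorer(seed_prompt))]
    for _ in range(rounds):
        meta_prompt = build_meta_prompt(task, scored)
        candidate = optimizer(meta_prompt)
        scored.append((candidate, scorer(candidate)))
    return max(scored, key=lambda pair: pair[1])


def toy_scorer(prompt: str) -> float:
    """Hypothetical scorer: a real one would measure accuracy on math problems."""
    keywords = ("step by step", "deep breath")
    return 100.0 * sum(k in prompt.lower() for k in keywords) / len(keywords)


def make_toy_optimizer() -> Callable[[str], str]:
    """Hypothetical optimizer: a real one would be an LLM reading the meta-prompt."""
    candidates = itertools.cycle([
        "Solve the problem.",
        "Let's think step by step.",
        "Take a deep breath and work on this problem step by step.",
    ])

    def optimizer(meta_prompt: str) -> str:
        return next(candidates)

    return optimizer


if __name__ == "__main__":
    best = opro_loop("Improve accuracy on math word problems.",
                     make_toy_optimizer(), toy_scorer, seed_prompt="")
    print(best)
```

In a real setup, both stand-ins would be replaced by calls to language models, and the scorer would run the candidate prompt against a set of training problems.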


“TAKE A DEEP BREATH AND WORK ON THIS STEP BY STEP”

Perhaps the most intriguing part of the DeepMind study is the impact of specific
phrases on the output. Phrases like "let's think step by step" prompted each AI
model to produce more accurate results when tested against math problem data
sets. (This technique became widely known in May 2022 thanks to a now-famous
paper titled "Large Language Models are Zero-Shot Reasoners.")



Consider a simple word problem, such as, "Beth bakes four two-dozen batches of
cookies in a week. If these cookies are shared among 16 people equally, how many
cookies does each person consume?" The 2022 paper found that rather than feeding
a chatbot a word problem like this by itself, you get better results if you
prefix it with "Let's think step by step" and then paste in the problem. The
accuracy of the AI model's results almost always improves, and the technique
works well with ChatGPT.
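
To make the example above concrete, here is a small Python sketch that builds the prefixed prompt and works out the expected answer by hand. The function names and the blank line between prefix and problem are assumptions for illustration; no chatbot is called.

```python
PREFIX = "Let's think step by step."
PROBLEM = (
    "Beth bakes four two-dozen batches of cookies in a week. "
    "If these cookies are shared among 16 people equally, "
    "how many cookies does each person consume?"
)


def build_prompt(problem: str, prefix: str = PREFIX) -> str:
    """Place the encouragement phrase ahead of the word problem."""
    return f"{prefix}\n\n{problem}"


def cookies_per_person(batches: int = 4, dozens_per_batch: int = 2,
                       people: int = 16) -> int:
    """Work the problem step by step: batches -> total cookies -> share each."""
    total_cookies = batches * dozens_per_batch * 12  # 4 * 2 * 12 = 96
    return total_cookies // people                   # 96 / 16 = 6


if __name__ == "__main__":
    print(build_prompt(PROBLEM))
    print("Expected answer:", cookies_per_person())
```

The correct answer is 6 cookies per person, which is what a model's step-by-step reasoning should arrive at.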


Interestingly, in this latest study, DeepMind researchers found "Take a deep
breath and work on this problem step by step" to be the most effective prompt
when used with Google's PaLM 2 language model. The phrase achieved the top
accuracy score of 80.2 percent in tests against GSM8K, which is a data set of
grade-school math word problems. By comparison, PaLM 2, without any special
prompting, scored only 34 percent accuracy on GSM8K, and the classic "Let’s
think step by step" prompt scored 71.8 percent accuracy.
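
For a sense of scale, the following Python sketch computes the gaps between the three GSM8K accuracy figures quoted above, in percentage points. The dictionary keys and function name are assumptions for illustration; the numbers are the ones reported in this article.

```python
# GSM8K accuracy (percent) for PaLM 2, as quoted in the article.
GSM8K_ACCURACY = {
    "no special prompt": 34.0,
    "Let's think step by step": 71.8,
    "Take a deep breath and work on this problem step by step": 80.2,
}


def point_gain(baseline: str, improved: str) -> float:
    """Difference between two accuracy scores, in percentage points."""
    return round(GSM8K_ACCURACY[improved] - GSM8K_ACCURACY[baseline], 1)


if __name__ == "__main__":
    top = "Take a deep breath and work on this problem step by step"
    print(point_gain("Let's think step by step", top))  # 8.4
    print(point_gain("no special prompt", top))         # 46.2
```

The new phrase beats the classic prompt by 8.4 points and the unprompted baseline by 46.2 points.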

So why does this work? Obviously, large language models can't take a deep breath
because they don't have lungs or bodies. They don't think and reason like
humans, either. What "reasoning" they do (and "reasoning" is a contentious term
among some, though it is readily used as a term of art in AI) is borrowed from a
massive data set of language phrases scraped from books and the web. That data
includes sources such as Q&A forums, which contain many examples of "let's take
a deep breath" or "think step by step" appearing before more carefully reasoned
solutions. Those phrases may help the LLM tap into better answers or produce
better examples of reasoning or problem-solving from the data set it absorbed
into its neural network during training.

Even though working out the best ways to give LLMs human-like encouragement is
slightly puzzling to us, that's not a problem for OPRO because the technique
utilizes large language models to discover these more effective prompting
phrases. DeepMind researchers think that the biggest win for OPRO is its ability
to sift through many possible prompts to find the one that gives the best
results for a specific problem. This could allow people to produce far more
useful or accurate results from LLMs in the future.



Benj Edwards is an AI and Machine Learning Reporter for Ars Technica. In his
free time, he writes and records music, collects vintage computers, and enjoys
nature. He lives in Raleigh, NC.
