Baby steps, baby steps: Telling AI model to “take a deep breath” causes math scores to soar in study

DeepMind used AI models to optimize their own prompts, with surprising results.

Benj Edwards - 9/19/2023, 11:38 PM

Google DeepMind researchers recently developed a technique to boost math ability in AI language models like ChatGPT by using other AI models to improve prompting, the written instructions that tell the AI model what to do. They found that using human-style encouragement improved math skills dramatically, in line with earlier results.

In a paper called "Large Language Models as Optimizers" listed this month on arXiv, DeepMind scientists introduced Optimization by PROmpting (OPRO), a method to improve the performance of large language models (LLMs) such as OpenAI’s ChatGPT and Google’s PaLM 2. This new approach sidesteps the limitations of traditional math-based optimizers by using natural language (a fancy way of saying everyday human speech) to guide LLMs in problem-solving.

"Instead of formally defining the optimization problem and deriving the update step with a programmed solver," the researchers write, "we describe the optimization problem in natural language, then instruct the LLM to iteratively generate new solutions based on the problem description and the previously found solutions."
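The "programmed solver" that the quote contrasts OPRO with is typically a derivative-based method such as gradient descent. The following is a minimal sketch, not anything taken from the paper: the toy curve `loss(x) = (x - 3)**2`, the learning rate, and the step count are illustrative assumptions. It repeatedly steps against the slope of a curve until it settles at the curve's lowest point.

```python
from collections.abc import Callable


def gradient_descent(
    slope: Callable[[float], float],
    x0: float,
    learning_rate: float = 0.1,
    steps: int = 100,
) -> float:
    """Walk downhill on a curve by repeatedly stepping against its slope."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * slope(x)  # move opposite the slope, scaled by the learning rate
    return x


def loss(x: float) -> float:
    """Toy 'error curve' whose lowest point is at x = 3 (illustrative only)."""
    return (x - 3.0) ** 2


def loss_slope(x: float) -> float:
    """Derivative of loss(x), i.e. the slope of the curve at x."""
    return 2.0 * (x - 3.0)


best_x = gradient_descent(loss_slope, x0=0.0)
print(f"x = {best_x:.4f}, loss = {loss(best_x):.6f}")
```

Each step moves `x` by a fraction of the current slope, so the steps shrink as the curve flattens near its minimum. With these settings the distance to `x = 3` shrinks by a factor of 0.8 per step. OPRO replaces this hand-derived update rule with an LLM that proposes the next candidate in plain language.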
Typically in machine learning, derivative-based optimizers guide improvements to an AI model's performance. Imagine a model's performance as a curve on a graph: the goal is to find the lowest point on this curve, because that's where the model makes the fewest mistakes. By using the slope of the curve to make adjustments, the optimizer moves the model closer and closer to that low point, making it more accurate at whatever task it's designed to do.

Rather than relying on formal mathematical definitions to perform this task, OPRO uses "meta-prompts" written in natural language to set the stage for the optimization process. An LLM then generates candidate solutions based on the problem's description and previous solutions, and each candidate is tested and assigned a quality score.

In OPRO, two large language models play different roles: a scorer LLM evaluates the objective function (such as accuracy), while an optimizer LLM generates new solutions based on past results and a natural language description. The researchers evaluated different pairings of scorer and optimizer LLMs, including PaLM 2 and GPT variants. OPRO can optimize prompts for the scorer LLM by having the optimizer iteratively generate higher-scoring prompts. These scores help the system identify the best solutions, which are then added back into the meta-prompt for the next round of optimization.

“Take a deep breath and work on this step by step”

Perhaps the most intriguing part of the DeepMind study is the impact of specific phrases on the output. Phrases like "let's think step by step" prompted each AI model to produce more accurate results when tested against math problem data sets.
(This technique became widely known in May 2022 thanks to a now-famous paper titled "Large Language Models are Zero-Shot Reasoners.")

Consider a simple word problem, such as, "Beth bakes four two-dozen batches of cookies in a week. If these cookies are shared among 16 people equally, how many cookies does each person consume?" The 2022 paper found that instead of feeding a chatbot a word problem like this by itself, you can add the phrase "Let's think step by step" to the prompt along with the problem. The accuracy of the AI model's results almost always improves, and the trick works well with ChatGPT.

Interestingly, in this latest study, DeepMind researchers found "Take a deep breath and work on this problem step by step" to be the most effective prompt when used with Google's PaLM 2 language model. The phrase achieved the top accuracy score of 80.2 percent in tests against GSM8K, a data set of grade-school math word problems. By comparison, PaLM 2 without any special prompting scored only 34 percent accuracy on GSM8K, and the classic "Let’s think step by step" prompt scored 71.8 percent.

So why does this work? Obviously, large language models can't take a deep breath because they don't have lungs or bodies. They don't think and reason like humans, either. What "reasoning" they do ("reasoning" is a contentious term among some, though it is readily used as a term of art in AI) is borrowed from a massive data set of language phrases scraped from books and the web. That includes things like Q&A forums, which contain many examples of "let's take a deep breath" or "think step by step" before more carefully reasoned solutions. Those phrases may help the LLM tap into better answers or produce better examples of reasoning or problem-solving from the data it absorbed into its neural network during training.
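The prompt search behind these numbers can be sketched as a short loop. This is a minimal illustration under stated assumptions, not DeepMind's implementation: `opro`, `toy_optimizer`, and `toy_scorer` are hypothetical names, and the meta-prompt wording and the `keep_top` cutoff are invented for the example. In a real run, the optimizer would be an LLM reading the meta-prompt, and the scorer would run the scorer LLM over GSM8K. Here, the scorer simply looks up the PaLM 2 accuracies reported above, and the optimizer proposes phrases from that same fixed list.

```python
from collections.abc import Callable

# PaLM 2 accuracies on GSM8K as quoted in the article; "" means no instruction.
REPORTED_ACCURACY = {
    "": 34.0,
    "Let's think step by step": 71.8,
    "Take a deep breath and work on this problem step by step": 80.2,
}


def toy_scorer(instruction: str) -> float:
    """Hypothetical stand-in for running the scorer LLM over a benchmark."""
    return REPORTED_ACCURACY.get(instruction, 0.0)


def toy_optimizer(meta_prompt: str) -> list[str]:
    """Hypothetical stand-in for the optimizer LLM.

    A real optimizer would read meta_prompt and write new instructions;
    this stub ignores it and proposes every known non-empty phrase.
    """
    return [phrase for phrase in REPORTED_ACCURACY if phrase]


def opro(
    task_description: str,
    optimizer: Callable[[str], list[str]],
    scorer: Callable[[str], float],
    seed_instructions: list[str],
    rounds: int = 3,
    keep_top: int = 8,
) -> tuple[str, float]:
    """Iteratively ask the optimizer for better instructions, scoring each one."""
    history = {text: scorer(text) for text in seed_instructions}
    for _ in range(rounds):
        # Meta-prompt: the task description plus the best (instruction, score)
        # pairs found so far, listed from lowest to highest score.
        top = sorted(history.items(), key=lambda pair: pair[1])[-keep_top:]
        examples = "\n".join(f"text: {text}\nscore: {score:.1f}" for text, score in top)
        meta_prompt = (
            f"{task_description}\n\n{examples}\n\n"
            "Write a new instruction that scores higher than all of the above."
        )
        for candidate in optimizer(meta_prompt):
            if candidate not in history:  # score only candidates not seen before
                history[candidate] = scorer(candidate)
    best = max(history, key=history.get)
    return best, history[best]


best, score = opro(
    "Write an instruction that helps a model solve grade-school math word problems.",
    toy_optimizer,
    toy_scorer,
    seed_instructions=[""],
)
print(f"{score:.1f}%  {best}")
```

Because the toy optimizer can only suggest phrases it already knows, this run only demonstrates the bookkeeping: the loop scores each new candidate, feeds the best ones back into the meta-prompt, and returns the highest-scoring instruction. In the real system, the interesting behavior comes from an LLM writing genuinely new phrasings in response to the meta-prompt.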
Even though the best ways to give LLMs human-like encouragement can seem puzzling to us, that's not a problem for OPRO, because the technique uses large language models to discover these more effective prompting phrases. DeepMind researchers think that the biggest win for OPRO is its ability to sift through many possible prompts to find the one that gives the best results for a specific problem. This could allow people to produce far more useful or accurate results from LLMs in the future.

Benj Edwards is an AI and Machine Learning Reporter for Ars Technica. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.