Source: https://www.wired.com/story/openai-o1-strawberry-problem-reasoning/
OpenAI Announces a New AI Model, Code-Named Strawberry, That Solves Difficult Problems Step by Step

The ChatGPT maker reveals details of what’s officially known as OpenAI o1, which shows that AI needs more than scale to advance.

Will Knight | Business | Sep 12, 2024, 1:05 PM

Photo-Illustration: WIRED Staff/Getty Images

OpenAI made the last big breakthrough in artificial intelligence by increasing the size of its models to dizzying proportions when it introduced GPT-4 last year. Today the company announced a new advance that signals a shift in approach: a model that can “reason” logically through many difficult problems and is significantly smarter than existing AI without a major scale-up. The new model, dubbed OpenAI o1, can solve problems that stump existing AI models, including OpenAI’s most powerful existing model, GPT-4o.
Rather than summon up an answer in one step, as a large language model normally does, it reasons through the problem, effectively thinking out loud as a person might, before arriving at the right result. “This is what we consider the new paradigm in these models,” Mira Murati, OpenAI’s chief technology officer, tells WIRED. “It is much better at tackling very complex reasoning tasks.”

The new model was code-named Strawberry within OpenAI, and it is not a successor to GPT-4o but rather a complement to it, the company says. Murati says that OpenAI is currently building its next master model, GPT-5, which will be considerably larger than its predecessor. But while the company still believes that scale will help wring new abilities out of AI, GPT-5 is likely to also include the reasoning technology introduced today. “There are two paradigms,” Murati says. “The scaling paradigm and this new paradigm. We expect that we will bring them together.”

LLMs typically conjure their answers from huge neural networks fed vast quantities of training data. They can exhibit remarkable linguistic and logical abilities, but they traditionally struggle with surprisingly simple problems, such as rudimentary math questions that involve reasoning. Murati says OpenAI o1 uses reinforcement learning, which involves giving a model positive feedback when it gets answers right and negative feedback when it does not, in order to improve its reasoning process. “The model sharpens its thinking and fine-tunes the strategies that it uses to get to the answer,” she says. Reinforcement learning has enabled computers to play games with superhuman skill and do useful tasks like designing computer chips. The technique is also a key ingredient for turning an LLM into a useful and well-behaved chatbot.
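The feedback loop Murati describes can be illustrated with a toy sketch. This is not OpenAI’s training code; the strategy names, reward scheme, and epsilon-greedy update are invented for illustration. The learner samples a “reasoning strategy,” receives +1 for a correct answer and −1 otherwise, and gradually shifts toward the strategy that earns the most reward:

```python
import random

# Toy reinforcement-learning loop: reward correct answers, penalize wrong ones.
# The strategies and their hidden success rates are hypothetical stand-ins for
# different reasoning approaches a model might try.
STRATEGIES = ["guess", "work_backwards", "step_by_step"]
TRUE_SUCCESS = {"guess": 0.2, "work_backwards": 0.5, "step_by_step": 0.9}

def train(episodes=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    value = {s: 0.0 for s in STRATEGIES}  # learned reward estimate per strategy
    for _ in range(episodes):
        # Epsilon-greedy: mostly exploit the best-looking strategy, sometimes explore.
        if rng.random() < 0.1:
            s = rng.choice(STRATEGIES)
        else:
            s = max(STRATEGIES, key=value.get)
        # Positive feedback for a right answer, negative for a wrong one.
        reward = 1.0 if rng.random() < TRUE_SUCCESS[s] else -1.0
        # Nudge the estimate toward the observed reward.
        value[s] += lr * (reward - value[s])
    return value

if __name__ == "__main__":
    learned = train()
    print(max(learned, key=learned.get))  # the learner settles on the most reliable strategy
```

After a few thousand episodes the value estimates separate, and the step-by-step strategy dominates, which is the intuition behind rewarding a model’s reasoning process rather than only imitating human text.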
Mark Chen, vice president of research at OpenAI, demonstrated the new model to WIRED, using it to solve several problems that its prior model, GPT-4o, cannot. These included an advanced chemistry question and the following mind-bending mathematical puzzle: “A princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present ages. What is the age of the prince and princess?” (The correct answer is that the prince is 30 and the princess is 40.)

“The [new] model is learning to think for itself, rather than kind of trying to imitate the way humans would think,” as a conventional LLM does, Chen says.

OpenAI says its new model performs markedly better on a number of problem sets, including ones focused on coding, math, physics, biology, and chemistry. On the American Invitational Mathematics Examination (AIME), a test for math students, GPT-4o solved on average 12 percent of the problems while o1 got 83 percent right, according to the company.

The new model is slower than GPT-4o, and OpenAI says it does not always perform better, in part because, unlike GPT-4o, it cannot search the web and it is not multimodal, meaning it cannot parse images or audio.

Improving the reasoning capabilities of LLMs has been a hot topic in research circles for some time. Indeed, rivals are pursuing similar lines of research. In July, Google announced AlphaProof, a project that combines language models with reinforcement learning to solve difficult math problems.
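The puzzle’s stated answer can be checked by direct arithmetic, working through the riddle clause by clause under its conventional reading (unwinding the nested “when” clauses from the inside out):

```python
# Verify the prince-and-princess puzzle for the stated answer (princess 40, prince 30).
princess, prince = 40, 30

# "...when the princess's age was half the sum of their present ages"
past_princess_age = (princess + prince) // 2      # 35
years_ago = princess - past_princess_age          # that was 5 years ago
prince_age_then = prince - years_ago              # the prince was 25

# "...when the princess is twice as old as the prince was" at that past moment
future_princess_age = 2 * prince_age_then         # 50
years_ahead = future_princess_age - princess      # 10 years from now
prince_age_at_that_time = prince + years_ahead    # the prince will then be 40

# "A princess is as old as the prince will be when..."
assert princess == prince_age_at_that_time
print("consistent:", princess, "==", prince_age_at_that_time)
```

Unwinding the clauses symbolically gives the relation 3 × princess = 4 × prince, so any ages in a 4:3 ratio satisfy the riddle; 40 and 30 are the intended whole-number instance.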
AlphaProof was able to learn how to reason over math problems by looking at correct answers. A key challenge with broadening this kind of learning is that there are not correct answers for everything a model might encounter. Chen says OpenAI has succeeded in building a reasoning system that is much more general. “I do think we have made some breakthroughs there; I think it is part of our edge,” Chen says. “It’s actually fairly good at reasoning across all domains.”

Noah Goodman, a professor at Stanford who has published work on improving the reasoning abilities of LLMs, says the key to more generalized training may involve using a “carefully prompted language model and handcrafted data” for training. He adds that being able to consistently trade the speed of results for greater accuracy would be a “nice advance.”

Yoon Kim, an assistant professor at MIT, says how LLMs solve problems currently remains somewhat mysterious, and even if they perform step-by-step reasoning there may be key differences from human intelligence. This could be crucial as the technology becomes more widely used. “These are systems that would be potentially making decisions that affect many, many people,” he says. “The larger question is, do we need to be confident about how a computational model is arriving at the decisions?”

The technique introduced by OpenAI today also may help ensure that AI models behave well. Murati says the new model has shown itself to be better at avoiding producing unpleasant or potentially harmful output by reasoning about the outcome of its actions. “If you think about teaching children, they learn much better to align to certain norms, behaviors, and values once they can reason about why they’re doing a certain thing,” she says.
Oren Etzioni, a professor emeritus at the University of Washington and a prominent AI expert, says it’s “essential to enable LLMs to engage in multi-step problem solving, use tools, and solve complex problems.” He adds, “Pure scale up will not deliver this.” Etzioni says, however, that there are further challenges ahead. “Even if reasoning were solved, we would still have the challenge of hallucination and factuality.”

OpenAI’s Chen says that the new reasoning approach developed by the company shows that advancing AI need not cost ungodly amounts of compute power. “One of the exciting things about the paradigm is we believe that it’ll allow us to ship intelligence cheaper,” he says, “and I think that really is the core mission of our company.”

Will Knight is a senior writer for WIRED, covering artificial intelligence. He writes the AI Lab newsletter, a weekly dispatch from beyond the cutting edge of AI. He was previously a senior editor at MIT Technology Review, where he wrote about fundamental advances in AI and China’s AI...