OpenAI Announces a New AI Model, Code-Named Strawberry, That Solves Difficult
Problems Step by Step
Will Knight
Business
Sep 12, 2024 1:05 PM



The ChatGPT maker reveals details of what’s officially known as OpenAI o1, which
shows that AI needs more than scale to advance.
Photo-Illustration: WIRED Staff/Getty Images

OpenAI made the last big breakthrough in artificial intelligence by increasing
the size of its models to dizzying proportions when it introduced GPT-4 last
year. The company today announced a new advance that signals a shift in
approach—a model that can “reason” logically through many difficult problems and
is significantly smarter than existing AI without a major scale-up.

The new model, dubbed OpenAI o1, can solve problems that stump existing AI
models, including OpenAI’s most powerful existing model, GPT-4o. Rather than
summon up an answer in one step, as a large language model normally does, it
reasons through the problem, effectively thinking out loud as a person might,
before arriving at the right result.
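The “thinking out loud” behavior described above is in the spirit of what researchers call chain-of-thought reasoning. The toy sketch below (our own illustration, not OpenAI’s implementation) works a multiplication the way a person might on paper, recording intermediate steps on a scratchpad before committing to a final answer:

```python
# A toy illustration of "thinking out loud": instead of returning an
# answer in one step, the solver records intermediate steps (a scratchpad)
# and derives the final result from them, chain-of-thought style.

def multiply_with_scratchpad(a: int, b: int):
    steps = []
    # Decompose b into tens and ones, as a person might on paper.
    tens, ones = divmod(b, 10)
    partial_tens = a * tens * 10
    steps.append(f"{a} x {tens * 10} = {partial_tens}")
    partial_ones = a * ones
    steps.append(f"{a} x {ones} = {partial_ones}")
    answer = partial_tens + partial_ones
    steps.append(f"{partial_tens} + {partial_ones} = {answer}")
    return steps, answer

steps, answer = multiply_with_scratchpad(17, 24)
for line in steps:
    print(line)
print("answer:", answer)  # 408
```

The point of the sketch is only that each intermediate step is made explicit and checkable before the final answer is produced, rather than the result appearing in a single opaque leap.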



“This is what we consider the new paradigm in these models,” Mira Murati,
OpenAI’s chief technology officer, tells WIRED. “It is much better at tackling
very complex reasoning tasks.”


The new model was code-named Strawberry within OpenAI, and it is not a successor
to GPT-4o but rather a complement to it, the company says.

Murati says that OpenAI is currently building its next master model, GPT-5,
which will be considerably larger than its predecessor. But while the company
still believes that scale will help wring new abilities out of AI, GPT-5 is
likely to also include the reasoning technology introduced today. “There are two
paradigms,” Murati says. “The scaling paradigm and this new paradigm. We expect
that we will bring them together.”



LLMs typically conjure their answers from huge neural networks fed vast
quantities of training data. They can exhibit remarkable linguistic and logical
abilities, but traditionally struggle with surprisingly simple problems such as
rudimentary math questions that involve reasoning.



Murati says OpenAI o1 uses reinforcement learning, which involves giving a model
positive feedback when it gets answers right and negative feedback when it does
not, in order to improve its reasoning process. “The model sharpens its thinking
and fine-tunes the strategies that it uses to get to the answer,” she says.
Reinforcement learning has enabled computers to play games with superhuman skill
and do useful tasks like designing computer chips. The technique is also a key
ingredient for turning an LLM into a useful and well-behaved chatbot.
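The feedback loop Murati describes can be sketched as a toy bandit problem: two candidate answering strategies, a +1/−1 reward for right/wrong answers, and a preference value that shifts toward whichever strategy earns more reward. This is purely illustrative and assumes nothing about OpenAI’s actual training setup:

```python
import random

# Toy illustration (not OpenAI's method): reinforcement learning from
# outcome feedback. Each "arm" is a problem-solving strategy; the learner
# gets +1 when its answer is right and -1 when it is wrong, and shifts
# preference toward the strategy that earns more reward.

def answer_directly(a, b):
    # Deliberately flawed strategy: guesses without working through the sum.
    return a + b + random.choice([-1, 0, 1])

def reason_step_by_step(a, b):
    # Reliable strategy: computes the sum explicitly, term by term.
    total = 0
    for x in (a, b):
        total += x
    return total

def train(episodes=500, lr=0.1, seed=0):
    random.seed(seed)
    preference = {"direct": 0.0, "stepwise": 0.0}
    strategies = {"direct": answer_directly, "stepwise": reason_step_by_step}
    for _ in range(episodes):
        a, b = random.randint(0, 9), random.randint(0, 9)
        # Epsilon-greedy: mostly exploit the preferred strategy, sometimes explore.
        if random.random() < 0.1:
            name = random.choice(list(strategies))
        else:
            name = max(preference, key=preference.get)
        guess = strategies[name](a, b)
        reward = 1.0 if guess == a + b else -1.0
        # Nudge the chosen strategy's preference toward its observed reward.
        preference[name] += lr * (reward - preference[name])
    return preference

prefs = train()
print(prefs)  # the "stepwise" strategy ends with the higher preference
```

After training, the preference for the careful step-by-step strategy dominates, which is the core intuition: reward on outcomes steers the model toward reasoning procedures that reliably reach correct answers.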

Mark Chen, vice president of research at OpenAI, demonstrated the new model to
WIRED, using it to solve several problems that its prior model, GPT-4o, cannot.
These included an advanced chemistry question and the following mind-bending
mathematical puzzle: “A princess is as old as the prince will be when the
princess is twice as old as the prince was when the princess’s age was half the
sum of their present ages. What is the age of the prince and princess?” (The
correct answer is that the prince is 30 and the princess is 40.)
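The riddle’s arithmetic can be verified directly. The small script below is our own formalization of the puzzle’s chain of “when” clauses, using the fact that the age difference between the two stays constant over time:

```python
# Verify the age riddle quoted above. Let d = princess - prince (constant).
# Each "when" clause steps to a different point in time; at every point,
# prince's age = princess's age - d.

def satisfies(prince, princess):
    d = princess - prince
    # "...when the princess's age was half the sum of their present ages":
    princess_then = (prince + princess) / 2
    prince_then = princess_then - d
    # "...when the princess is twice as old as the prince was [then]":
    princess_later = 2 * prince_then
    prince_later = princess_later - d
    # "A princess is as old as the prince will be [at that later time]":
    return princess == prince_later

print(satisfies(30, 40))  # True: the stated answer checks out
# The condition reduces to prince:princess = 3:4, so (30, 40) is the
# intended answer, though any pair in that ratio satisfies the wording:
print([(p, q) for p in range(1, 50) for q in range(1, 50) if satisfies(p, q)][:3])
# [(3, 4), (6, 8), (9, 12)]
```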

“The [new] model is learning to think for itself, rather than kind of trying to
imitate the way humans would think,” as a conventional LLM does, Chen says.

OpenAI says its new model performs markedly better on a number of problem sets,
including ones focused on coding, math, physics, biology, and chemistry. On the
American Invitational Mathematics Examination (AIME), a test for math students,
GPT-4o solved on average 12 percent of the problems while o1 got 83 percent
right, according to the company.


The new model is slower than GPT-4o, and OpenAI says it does not always perform
better—in part because, unlike GPT-4o, it cannot search the web and it is not
multimodal, meaning it cannot parse images or audio.

Improving the reasoning capabilities of LLMs has been a hot topic in research
circles for some time. Indeed, rivals are pursuing similar research lines. In
July, Google announced AlphaProof, a project that combines language models with
reinforcement learning for solving difficult math problems.

AlphaProof was able to learn how to reason over math problems by looking at
correct answers. A key challenge with broadening this kind of learning is that
there are not correct answers for everything a model might encounter. Chen says
OpenAI has succeeded in building a reasoning system that is much more general.
“I do think we have made some breakthroughs there; I think it is part of our
edge,” Chen says. “It’s actually fairly good at reasoning across all domains.”



Noah Goodman, a professor at Stanford who has published work on improving the
reasoning abilities of LLMs, says the key to more generalized training may
involve using a “carefully prompted language model and handcrafted data” for
training. He adds that being able to consistently trade the speed of results for
greater accuracy would be a “nice advance.”



Yoon Kim, an assistant professor at MIT, says how LLMs solve problems currently
remains somewhat mysterious, and even if they perform step-by-step reasoning
there may be key differences from human intelligence. This could be crucial as
the technology becomes more widely used. “These are systems that would be
potentially making decisions that affect many, many people,” he says. “The
larger question is, do we need to be confident about how a computational model
is arriving at the decisions?”

The technique introduced by OpenAI today also may help ensure that AI models
behave well. Murati says the new model has shown itself to be better at avoiding
producing unpleasant or potentially harmful output by reasoning about the
outcome of its actions. “If you think about teaching children, they learn much
better to align to certain norms, behaviors, and values once they can reason
about why they’re doing a certain thing,” she says.

Oren Etzioni, a professor emeritus at the University of Washington and a
prominent AI expert, says it’s “essential to enable LLMs to engage in multi-step
problem solving, use tools, and solve complex problems.” He adds, “Pure scale up
will not deliver this.” Etzioni says, however, that there are further challenges
ahead. “Even if reasoning were solved, we would still have the challenge of
hallucination and factuality.”

OpenAI’s Chen says that the new reasoning approach developed by the company
shows that advancing AI need not cost ungodly amounts of compute power. “One of
the exciting things about the paradigm is we believe that it’ll allow us to ship
intelligence cheaper,” he says, “and I think that really is the core mission of
our company.”








Will Knight is a senior writer for WIRED, covering artificial intelligence. He
writes the AI Lab newsletter, a weekly dispatch from beyond the cutting edge of
AI—sign up here. He was previously a senior editor at MIT Technology Review,
where he wrote about fundamental advances in AI and China’s AI...

Topics: OpenAI, artificial intelligence






