
OLMoE: An Open, Small, and State-of-the-Art Mixture-of-Experts Model

Ai2 · Published in Ai2 Blog · 3 min read · 3 hours ago


We’re introducing OLMoE, jointly developed with Contextual AI, the first mixture-of-experts model to join the OLMo family. OLMoE brings two important aspects to the space of truly open models: it is the first model on the Pareto frontier of performance and size, and it is released with open data, code, evaluations, logs, and intermediate training checkpoints. Over the last few years, mixture-of-experts architectures have become a core technology used by closed AI labs to train and serve leading language models (LMs) more efficiently. We’ve seen similar gains in our training stack, where this MoE model trained 2x faster than equivalent dense models.



OLMoE is a sparse MoE model with 1 billion active and 7 billion total parameters. It was trained on 5 trillion tokens from a new data mix that incorporates lessons from Ai2’s Dolma and builds heavily on DataComp-Baseline. We performed extensive experimentation on many crucial MoE details, including the routing algorithm, auxiliary loss functions, and sparse upcycling. A comparison of the model with other Ai2 OLMo models and models in similar size categories is shown below.
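To make the architecture concrete, here is a minimal sketch of a sparse MoE feed-forward layer with top-k routing and a Switch-Transformer-style load-balancing auxiliary loss. It is illustrative only: the layer sizes are hypothetical, and the exact routing algorithm and auxiliary losses used for OLMoE are the ones studied in the paper.

```python
# Minimal sketch of a sparse mixture-of-experts feed-forward layer with
# top-k routing and a load-balancing auxiliary loss. Sizes are illustrative,
# not OLMoE's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    def __init__(self, d_model=1024, d_ff=2048, n_experts=64, top_k=8):
        super().__init__()
        self.n_experts = n_experts
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (n_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)              # (n_tokens, n_experts)
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)  # (n_tokens, top_k)
        topk_probs = topk_probs / topk_probs.sum(-1, keepdim=True)

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_pos, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_pos.numel() == 0:
                continue  # no tokens routed to this expert in this batch
            out[token_pos] += topk_probs[token_pos, slot].unsqueeze(-1) * expert(x[token_pos])

        # Load-balancing auxiliary loss: penalize correlation between the
        # fraction of tokens dispatched to each expert and the mean router
        # probability, which pushes routing toward uniform expert usage.
        dispatch_frac = F.one_hot(topk_idx, self.n_experts).float().mean(dim=(0, 1))
        aux_loss = self.n_experts * torch.sum(dispatch_frac * probs.mean(dim=0))
        return out, aux_loss


# Example: route a batch of 16 token embeddings through the layer.
layer = SparseMoELayer()
y, aux = layer(torch.randn(16, 1024))
```

Only the top-k experts run for each token, which is what keeps the active parameter count (1B) far below the total parameter count (7B).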



The release of this model is also accompanied by a preview version of our new Tulu 3 post-training pipeline. This version adds instruction data from Hugging Face’s human-written No Robots dataset, math and code data, and a subset of Nvidia’s Daring Anteater synthetic data. This mix gives noticeable improvements across math, code, and instruction-following evaluations; the resulting model then goes through standard UltraFeedback preference tuning with Direct Preference Optimization (DPO). A comparison of the gains from supervised fine-tuning (SFT) and DPO is shown below.
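For readers unfamiliar with the preference-tuning step, the sketch below shows the core DPO objective: it rewards the policy for assigning relatively higher likelihood than a frozen reference model to the chosen response over the rejected one. This is a generic illustration rather than our training code; the log-probabilities and the beta value are placeholder inputs.

```python
# Minimal sketch of the Direct Preference Optimization (DPO) loss. It assumes
# per-example sequence log-probabilities of the chosen and rejected responses
# under the policy and a frozen reference model have already been computed;
# beta is an illustrative value, not a tuned hyperparameter.
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: log-ratio of policy vs. reference for each response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss on the reward margin: minimized when the policy prefers
    # the chosen response more strongly than the reference model does.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


# Example with dummy log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -10.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-13.5, -9.9]))
```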



We are releasing many variants and checkpoints of this model to enable multiple
directions of LM research.

 * 244 checkpoints for the pretrained model, one every 5000 steps.
 * The annealed and unannealed checkpoints.
 * Fine-tuned versions of both the annealed and unannealed base models.
 * Fine-tuned versions with and without load balancing through the experts.
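As one example of how these artifacts can be used, the sketch below loads a released checkpoint with the Hugging Face transformers library. The repository id and revision are assumptions for illustration; the exact identifiers, and the transformers version required for OLMoE support, are documented in the code release.

```python
# Minimal sketch of loading a released OLMoE checkpoint with Hugging Face
# transformers. The repo id and revision below are assumed for illustration;
# see the code release for the actual identifiers and required versions.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "allenai/OLMoE-1B-7B-0924"   # assumed Hub id for the pretrained model
model = AutoModelForCausalLM.from_pretrained(repo, revision="main")
tokenizer = AutoTokenizer.from_pretrained(repo)

inputs = tokenizer("Mixture-of-experts models are", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Intermediate pretraining checkpoints can be selected the same way by passing the corresponding revision name instead of "main".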

For more details, check out:

 * Paper
 * Code
 * Twitter thread



Follow @allen_ai on Twitter/X, and subscribe to the Ai2 Newsletter to stay
current on news and research coming out of Ai2.





