OLMoE: An Open, Small, and State-of-the-Art Mixture-of-Experts Model

Ai2 · Published in Ai2 Blog · 3 min read


We’re introducing OLMoE, jointly developed with Contextual AI, the first mixture-of-experts model to join the OLMo family. OLMoE brings two important things to the space of truly open models: it is the first such model to sit on the Pareto frontier of performance and size, and it is released with open data, code, evaluations, logs, and intermediate training checkpoints. Over the last few years, mixture-of-experts architectures have become a core technology that closed AI labs use to train and serve leading language models (LMs) more efficiently. We’ve seen similar gains in our own training stack, where this MoE model trained 2x faster than an equivalent dense model.



OLMoE is a sparse MoE model with 1 billion active and 7 billion total parameters. It was trained on 5 trillion tokens from a new data mix that incorporates lessons from Ai2’s Dolma and builds heavily on DataComp-Baseline. We performed extensive experimentation on many crucial MoE details, including the routing algorithm, auxiliary loss functions, and sparse upcycling. A comparison of the model to other Ai2 OLMo models and models in similar size categories is below.
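
To make the sparsity concrete, here is a minimal sketch of a sparse MoE layer with top-k token routing and a Switch-style load-balancing auxiliary loss. This illustrates the general technique rather than OLMoE’s actual implementation; the expert count, top-k value, and layer sizes are placeholder assumptions, not OLMoE’s published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative sparse MoE layer with top-k token routing.

    The expert count, top-k, and hidden sizes are placeholders,
    not OLMoE's published configuration.
    """

    def __init__(self, d_model=1024, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        probs = self.router(x).softmax(dim=-1)          # (num_tokens, n_experts)
        top_p, top_i = probs.topk(self.top_k, dim=-1)   # each token picks k experts
        top_p = top_p / top_p.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (top_i == e).nonzero(as_tuple=True)
            if token_idx.numel() > 0:
                # Only tokens routed to expert e are run through it.
                out[token_idx] += top_p[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])

        # Auxiliary load-balancing loss: pushes the router to spread tokens
        # evenly across experts instead of collapsing onto a few.
        load = F.one_hot(top_i[:, 0], probs.size(-1)).float().mean(dim=0)
        importance = probs.mean(dim=0)
        aux_loss = probs.size(-1) * (load * importance).sum()
        return out, aux_loss
```

Only the experts a token is routed to run for that token, which is why the active parameter count (1B) can be much smaller than the total parameter count (7B).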



The release of this model is also accompanied by a preview version of our new Tulu 3 post-training pipeline. This version adds instruction data from HuggingFace’s No Robots human-written data, math and code data, and a subset of Nvidia’s Daring Anteater synthetic data. The mix gives noticeable improvements across math, code, and instruction-following evaluations; the resulting model then goes through standard UltraFeedback preference tuning with Direct Preference Optimization (DPO). A comparison of the gains from supervised fine-tuning (SFT) and DPO is shown below.
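
For readers less familiar with DPO: it tunes the policy directly on preference pairs, rewarding it for favoring the chosen response over the rejected one more strongly than a frozen reference (SFT) model does. The snippet below is a rough sketch of that loss, not the actual Tulu pipeline code; the beta value and the toy log-probabilities are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss for a batch of preference pairs.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference (SFT) model.
    beta controls how far the policy may drift from the reference.
    """
    policy_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    # Reward the policy for preferring the chosen response more strongly
    # than the reference model does.
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()

# Toy usage with made-up log-probabilities for a batch of three pairs.
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.0, -9.5, -15.2]),
    policy_rejected_logps=torch.tensor([-13.1, -9.9, -14.8]),
    ref_chosen_logps=torch.tensor([-12.4, -9.7, -15.0]),
    ref_rejected_logps=torch.tensor([-12.9, -10.1, -15.1]),
)
```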



We are releasing many variants and checkpoints of this model to enable multiple
directions of LM research.

 * 244 checkpoints for the pretrained model, one every 5000 steps.
 * The annealed and unannealed checkpoints.
 * Fine-tuned versions on both the annealed and unannealed base models.
 * Fine-tuned versions with and without load balancing through the experts.
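
As a minimal sketch of how the released checkpoints could be pulled down with Hugging Face transformers: the repository id and the revision (branch) name below are assumptions for illustration, not confirmed identifiers; see the paper and code links for the exact names.

```python
# Hypothetical loading sketch; the repo id and revision are assumed, not confirmed.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "allenai/OLMoE-1B-7B-0924"   # assumed Hugging Face repo id
revision = "main"                       # intermediate checkpoints could be selected via other revisions

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)

prompt = "Mixture-of-experts models are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```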

For more details, check out:

 * Paper
 * Code
 * Twitter thread



Follow @allen_ai on Twitter/X, and subscribe to the Ai2 Newsletter to stay
current on news and research coming out of Ai2.




Tags: LLM, Open Source, Mixture of Experts, OLMo
