OLMoE: An Open, Small, and State-of-the-Art Mixture-of-Experts Model

Ai2 · Ai2 Blog · 3 min read

We're introducing OLMoE, jointly developed with Contextual AI, the first mixture-of-experts model to join the OLMo family. OLMoE brings two important qualities to the space of truly open models: it is the first model on the Pareto frontier of performance and size, and it is released with open data, code, evaluations, logs, and intermediate training checkpoints.

Over the last few years, mixture-of-experts (MoE) architectures have become a core technology used by closed AI labs to train and serve leading language models (LMs) more efficiently. We've seen similar gains in our training stack, where this MoE model trained 2x faster than equivalent dense models. OLMoE is a sparse MoE model with 1 billion active and 7 billion total parameters. It was trained on 5 trillion tokens from a new data mix that incorporates lessons from Ai2's Dolma and builds heavily on DataComp-Baseline. We performed extensive experimentation on many crucial MoE details, including the routing algorithm, auxiliary loss functions, and sparse upcycling. A comparison of the model to other Ai2 OLMo models and models in similar size categories is below.

The release of this model is also accompanied by a preview version of our new Tulu 3 post-training pipeline. This version includes additional instruction data from Hugging Face's No Robots human data, math and code data, and a subset of Nvidia's Daring Anteater synthetic data. This mix gives noticeable improvements across math, code, and instruction-following evaluations; the resulting model is then put through the standard UltraFeedback preference tuning with Direct Preference Optimization (DPO). A comparison of the gains from supervised fine-tuning (SFT) and DPO is shown below.

We are releasing many variants and checkpoints of this model to enable multiple directions of LM research:

* 244 checkpoints for the pretrained model, one every 5,000 steps.
* The annealed and unannealed checkpoints.
* Fine-tuned versions of both the annealed and unannealed base models.
* Fine-tuned versions with and without load balancing through the experts.

For more details, check out:

* Paper
* Code
* Twitter thread

Follow @allen_ai on Twitter/X, and subscribe to the Ai2 Newsletter to stay current on news and research coming out of Ai2.
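To make the sparse-MoE design discussed above more concrete, here is a minimal PyTorch sketch of an MoE feed-forward layer with top-k routing and a Switch-Transformer-style load-balancing auxiliary loss. The dimensions, expert count, top-k value, and loss form are illustrative only and are not OLMoE's actual configuration; see the paper and code for the real details.

```python
# Minimal sketch of a sparse MoE feed-forward layer with top-k routing and an
# auxiliary load-balancing loss. All hyperparameters are illustrative and do
# NOT reflect OLMoE's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.n_experts, self.top_k = n_experts, top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)        # routing probabilities
        weights, idx = probs.topk(self.top_k, dim=-1)    # top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = idx[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])

        # Load-balancing auxiliary loss: encourages the router to spread each
        # token's top-1 assignment evenly across experts.
        frac_tokens = F.one_hot(idx[:, 0], self.n_experts).float().mean(0)
        frac_probs = probs.mean(0)
        aux_loss = self.n_experts * (frac_tokens * frac_probs).sum()
        return out, aux_loss
```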
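For readers unfamiliar with the preference-tuning stage, the core DPO objective can be written in a few lines. This is a generic sketch of the loss on summed response log-probabilities, not a drop-in piece of the Tulu post-training code, and the beta value is illustrative.

```python
# Generic sketch of the DPO objective: given policy and reference log-probs for
# a chosen and a rejected response, push the policy toward the chosen one.
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """All arguments are 1-D tensors of per-example log-probs summed over tokens."""
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -F.logsigmoid(logits).mean()
```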
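Finally, a hypothetical example of pulling one of the intermediate pretraining checkpoints with Hugging Face transformers, assuming the checkpoints are exposed as Hub revisions. The repository id and revision name below are placeholders; the actual identifiers are listed with the paper and code release.

```python
# Hypothetical sketch of loading an intermediate pretraining checkpoint.
# Repo id and revision are placeholders; a recent transformers version may be required.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "allenai/OLMoE"       # placeholder repository id
revision = "step245000"      # placeholder revision for one of the 244 checkpoints

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, revision=revision)
```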