OLMoE: An Open, Small, and State-of-the-Art Mixture-of-Experts Model

Ai2 · Published in Ai2 Blog · 3 min read

We're introducing OLMoE, jointly developed with Contextual AI, the first mixture-of-experts model to join the OLMo family. OLMoE brings two important aspects to the space of truly open models: it is the first model to sit on the Pareto frontier of performance and size, and it is released with open data, code, evaluations, logs, and intermediate training checkpoints.

Over the last few years, mixture-of-experts (MoE) architectures have become a core technology used by closed AI labs to more efficiently serve and train leading language models (LMs). We've seen similar gains in our training stack, where this MoE model trained 2x faster than equivalent dense models.

OLMoE is a sparse MoE model with 1 billion active and 7 billion total parameters. It was trained on 5 trillion tokens from a new data mix that incorporates lessons from Ai2's Dolma and builds heavily on DataComp-Baseline. We performed extensive experimentation on many crucial MoE details, including the routing algorithm, auxiliary loss functions, and sparse upcycling (a minimal routing sketch appears after the checkpoint list below). A comparison of the model to other Ai2 OLMo models and models of similar size is below.

The release of this model is also accompanied by a preview version of our new Tulu 3 post-training pipeline. This version includes additional instruction data from HuggingFace's No Robots human data, math and code data, and a subset of Nvidia's Daring Anteater synthetic data. This mix gives noticeable improvements across math, code, and instruction-following evaluations; the resulting model is then tuned on the standard UltraFeedback preference data with Direct Preference Optimization (DPO, sketched below). A comparison of the gains from supervised fine-tuning (SFT) and DPO is shown below.

We are releasing many variants and checkpoints of this model to enable multiple directions of LM research:

* 244 checkpoints for the pretrained model, one every 5000 steps (see the loading example below).
* The annealed and unannealed checkpoints.
* Fine-tuned versions on both the annealed and unannealed base models.
* Fine-tuned versions with and without load balancing through the experts.
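For readers unfamiliar with the mechanics discussed above, the following is a minimal, illustrative sketch of sparse top-k token-choice routing with a load-balancing auxiliary loss, written in PyTorch. It shows the general family of techniques the routing and auxiliary-loss experiments refer to; it is not OLMoE's actual implementation, and all layer sizes and expert counts are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Minimal sparse MoE feed-forward layer with top-k token-choice routing
    and a load-balancing auxiliary loss. Sizes are placeholders, not OLMoE's
    actual configuration."""

    def __init__(self, d_model=512, d_ff=1024, n_experts=8, k=2):
        super().__init__()
        self.n_experts, self.k = n_experts, k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                  # x: (tokens, d_model)
        probs = self.router(x).softmax(dim=-1)             # (tokens, n_experts)
        topk_probs, topk_idx = probs.topk(self.k, dim=-1)  # each token picks k experts

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            hit = (topk_idx == e)                          # (tokens, k)
            rows = hit.any(dim=-1)                         # tokens routed to expert e
            if rows.any():
                gate = (topk_probs * hit).sum(-1, keepdim=True)  # routing weight
                out[rows] += gate[rows] * expert(x[rows])

        # Load-balancing auxiliary loss: fraction of tokens sent to each expert
        # times the mean router probability for that expert, summed over experts.
        # Typically added to the LM loss with a small coefficient.
        frac = F.one_hot(topk_idx, self.n_experts).float().sum(1).mean(0) / self.k
        aux_loss = self.n_experts * (frac * probs.mean(0)).sum()
        return out, aux_loss

# Toy usage: route 16 "tokens" through the layer.
layer = TopKMoELayer()
y, aux = layer(torch.randn(16, 512))
print(y.shape, aux.item())
```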
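For the preference-tuning stage mentioned above, the DPO objective can be written in a few lines. The sketch below is the generic formulation of the loss, not the Tulu pipeline's implementation, and beta=0.1 is a common default rather than a value reported here.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs.
    Inputs are the summed log-probabilities of the chosen / rejected responses
    under the policy being trained and the frozen reference (SFT) model."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between the chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```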
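To experiment with the released checkpoints, the standard Hugging Face `transformers` loading path with a `revision` argument should apply (intermediate checkpoints are typically exposed as repository branches). The model ID and revision name below are assumptions for illustration; check the released repository for the exact identifiers, and note that a `transformers` release new enough to include the OLMoE architecture is required.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Both identifiers below are assumptions for illustration; consult the
# released repository for the exact model ID and checkpoint branch names.
MODEL_ID = "allenai/OLMoE-1B-7B-0924"   # assumed Hub repo name
REVISION = "main"                        # swap in an intermediate-checkpoint branch

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=REVISION)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, revision=REVISION)

prompt = "Mixture-of-experts language models are"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```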
For more details, check out:

* Paper
* Code
* Twitter thread

Follow @allen_ai on Twitter/X, and subscribe to the Ai2 Newsletter to stay current on news and research coming out of Ai2.