pub.aimind.so
Submitted URL: https://pub.aimind.so/this-is-why-you-cant-use-llama-2-d33701ce0766?gi=24110ddbd1af
Effective URL: https://pub.aimind.so/this-is-why-you-cant-use-llama-2-d33701ce0766?gi=1cedfc3e2a40
Submission: On September 07 via api from US — Scanned from DE
THIS IS WHY YOU CAN’T USE LLAMA-2

HOW ONE OPEN-SOURCE PROJECT IS DEMOCRATISING ACCESS TO LLMS

John Adeojo · Published in AI Mind · 5 min read · Aug 15

Image by author: Generated with Midjourney

OPEN-SOURCE FOUNDATION MODELS

We have seen an explosion of open-source foundation models, with the likes of Llama-2, Falcon, and Bloom, to name a few. However, the largest of these models are practically impossible for a person of modest means to use. Large language models have vast numbers of parameters: the largest version of Llama-2, for instance, has 70 billion. At this scale, hardware requirements are a significant barrier for most researchers, hobbyists, and engineers. If you’re reading this, I gather you have probably tried, and been unable, to use these models. Let’s look at the hardware requirements for Meta’s Llama-2 to understand why that is.

WHY YOU CAN’T USE LLAMA-2

Photo by Ilias Gainutdinov on Unsplash

Loading a model at full precision, i.e. 32-bit (float-32), on a GPU for downstream training or inference costs about 4GB of memory per 1 billion parameters¹. So just loading Llama-2 at its full 70 billion parameters costs around 280GB of memory.

Edit: Llama-2 is actually published in 16-bit, not 32-bit (although many LLMs are published in 32-bit). The arithmetic works the same way: at 2 bytes per parameter, loading Llama-2 70B would cost 140GB.

There is the option to load models at lower precision, at some sacrifice in performance. Loading in 8-bit costs 1GB of memory per billion parameters, which would still require 70GB of GPU memory for Llama-2.

And we haven’t even got to fine-tuning. Fine-tuning with the AdamW optimiser requires a further 8 bytes of GPU memory per parameter for optimiser state. For Llama-2 70B, that is an additional 560GB of GPU memory. In total, fine-tuning the model would require between 630GB (loading in 8-bit) and 840GB (loading in full precision) of GPU memory.

For many, access to GPUs is done via Google Colab. At the time of writing, the highest-spec GPU available in Colab is the A100…
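As a quick sanity check on the numbers above, here is a minimal back-of-the-envelope script (my sketch, not from the article) that reproduces the loading and fine-tuning estimates, using only the per-parameter costs the article quotes: 4/2/1 bytes to hold the weights at 32-/16-/8-bit, plus roughly 8 bytes per parameter of AdamW optimiser state.

```python
# Back-of-the-envelope GPU memory estimates, using the per-parameter
# costs quoted in the article: 4 bytes (32-bit), 2 bytes (16-bit),
# 1 byte (8-bit) to hold the weights, plus roughly 8 bytes per
# parameter of AdamW optimiser state when fine-tuning.

BYTES_PER_PARAM = {"32-bit": 4, "16-bit": 2, "8-bit": 1}
ADAMW_STATE_BYTES = 8

def load_gb(n_params: float, precision: str) -> float:
    """Memory (GB) just to hold the weights at the given precision."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

def finetune_gb(n_params: float, precision: str) -> float:
    """Weights plus AdamW optimiser state (gradients and activations excluded)."""
    return load_gb(n_params, precision) + n_params * ADAMW_STATE_BYTES / 1e9

llama2_70b = 70e9
for precision in BYTES_PER_PARAM:
    print(f"{precision}: load {load_gb(llama2_70b, precision):.0f}GB, "
          f"fine-tune {finetune_gb(llama2_70b, precision):.0f}GB")
```

This reproduces the article’s figures: 280GB, 140GB, and 70GB to load, and 630GB to 840GB to fine-tune depending on the loading precision.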
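Where do AdamW’s 8 bytes per parameter come from? In a typical float-32 setup the optimiser keeps two moment tensors (exp_avg and exp_avg_sq) for every parameter, i.e. 4 + 4 bytes. A tiny PyTorch check, with a toy linear layer standing in for an LLM, makes this visible:

```python
import torch

# AdamW keeps two fp32 moment tensors per parameter (exp_avg and
# exp_avg_sq), i.e. roughly 8 bytes of optimiser state per parameter.
model = torch.nn.Linear(1000, 1000)
optimiser = torch.optim.AdamW(model.parameters())

# Run one step so the optimiser state gets materialised.
loss = model(torch.randn(8, 1000)).sum()
loss.backward()
optimiser.step()

state_bytes = sum(
    t.numel() * t.element_size()
    for state in optimiser.state.values()
    for t in state.values()
    if torch.is_tensor(t)
)
n_params = sum(p.numel() for p in model.parameters())
print(f"{state_bytes / n_params:.1f} bytes of optimiser state per parameter")
# -> 8.0
```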
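As for the 8-bit loading route, one way to do it (a sketch, not the article’s code) is the Hugging Face transformers integration with bitsandbytes. This assumes you have been granted access to the gated meta-llama/Llama-2-70b-hf checkpoint, have installed accelerate and bitsandbytes, and have on the order of 70GB of GPU memory across your devices:

```python
# Sketch: loading Llama-2 70B in 8-bit via transformers + bitsandbytes.
# Assumes: pip install transformers accelerate bitsandbytes, plus access
# to the gated meta-llama checkpoint on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-70b-hf"  # gated: requires accepting Meta's licence

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # ~1 byte per parameter
    device_map="auto",  # shard the weights across whatever GPUs are visible
)

prompt = tokenizer("Open-source foundation models are", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**prompt, max_new_tokens=20)[0]))
```

Even at 8-bit, 70GB is beyond any single consumer GPU, which is precisely the article’s point; quantisation merely brings the model within reach of a multi-GPU node.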