

THIS IS WHY YOU CAN’T USE LLAMA-2


HOW ONE OPEN-SOURCE PROJECT IS DEMOCRATISING ACCESS TO LLMS

John Adeojo

Published in AI Mind · 5 min read · Aug 15


Image by author: Generated with Midjourney


OPEN-SOURCE FOUNDATION MODELS

We have seen an explosion of open-source foundation models: Llama-2, Falcon,
and Bloom, to name a few. However, the largest of these models are all but
impossible for a person of modest means to use.

Large language models have, by definition, an enormous number of parameters.
Take Llama-2, for instance: its largest version has 70 billion parameters.

At this scale, the hardware requirements are a significant barrier for most
researchers, hobbyists, and engineers.

If you’re reading this, you have probably tried, and failed, to run these
models yourself. Let’s look at the hardware requirements for Meta’s Llama-2
to understand why.


WHY YOU CAN’T USE LLAMA-2


Photo by Ilias Gainutdinov on Unsplash

To load a model at full precision, i.e. 32-bit (float-32), onto a GPU for
downstream training or inference costs about 4GB of memory per 1 billion
parameters¹. So just loading Llama-2 at 70 billion parameters costs around
280GB of memory at full precision.

Edit: Llama-2 is actually published in 16-bit, not 32-bit (although many LLMs
are published in 32-bit). The arithmetic works the same way: at 2 bytes per
parameter, it would cost 140GB to load Llama-2 70B.
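
The arithmetic is simple enough to sketch in a few lines of Python. This is a back-of-the-envelope estimate that counts only the weights, ignoring activations and other overhead:

# Approximate GPU memory needed just to hold the weights.
# Billions of parameters × bytes per parameter = gigabytes.
PARAMS_BILLION = 70  # Llama-2 70B

bytes_per_param = {"float32": 4, "float16": 2, "int8": 1}

for precision, nbytes in bytes_per_param.items():
    gb = PARAMS_BILLION * nbytes
    print(f"{precision}: ~{gb} GB")

# float32: ~280 GB, float16: ~140 GB, int8: ~70 GB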

Now, there is the option to load models at lower precision (at some sacrifice
in performance). Loading in 8-bit costs 1GB of memory per billion parameters,
which still means 70GB of GPU memory just to load Llama-2 70B.
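
For reference, here is a minimal sketch of what 8-bit loading looks like with the Hugging Face transformers library and bitsandbytes. This is one common route, not necessarily the author’s setup, and it assumes you have been granted access to Meta’s gated Llama-2 weights on the Hub:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the bitsandbytes and accelerate packages are installed,
# and that access to Meta's gated Llama-2 repo has been approved.
model_id = "meta-llama/Llama-2-70b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,  # quantise the weights to 8-bit as they load
    device_map="auto",  # shard layers across whatever GPUs are visible
)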

And we haven’t even got to fine-tuning. Fine-tuning with the AdamW optimiser
requires an additional 8 bytes of GPU memory per parameter for optimiser
state. For Llama-2 70B, that means an extra 560GB of GPU memory. In total,
fine-tuning would require between 630GB and 840GB, depending on the precision
at which the weights are loaded.
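
Again as a back-of-the-envelope sketch, counting weights plus AdamW state only (gradients and activations would add still more on top):

params_billion = 70

# AdamW keeps two float32 moment estimates per parameter: ~8 bytes total.
adamw_gb = 8 * params_billion  # 560 GB of optimiser state

for label, weight_bytes in [("8-bit weights", 1), ("float32 weights", 4)]:
    total_gb = weight_bytes * params_billion + adamw_gb
    print(f"{label}: ~{total_gb} GB")  # 630 GB and 840 GB respectively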

For many, access to GPUs comes via Google Colab. At the time of writing, the
highest-spec GPU available in Colab is the A100…
