

YOU CAN NOW RUN THE MOST POWERFUL OPEN SOURCE AI MODELS LOCALLY ON MAC M4
COMPUTERS, THANKS TO EXO LABS

Carl Franzen (@carlfranzen)
November 13, 2024 12:02 PM

When it comes to generative AI, Apple’s efforts have seemed largely concentrated
on mobile — namely Apple Intelligence running on iOS 18, the latest operating
system for the iPhone.

But as it turns out, the new Apple M4 computer chip, available in the new Mac
Mini and MacBook Pro models announced at the end of October 2024, is excellent
hardware for running the most powerful open source foundation large language
models (LLMs) yet released, including Meta’s Llama 3.1 405B, Nvidia’s Nemotron
70B, and Qwen 2.5 Coder-32B.


In fact, Alex Cheema, co-founder of Exo Labs, a startup founded in March 2024 to
(in his words) “democratize access to AI” through open source multi-device
computing clusters, has already done it.

As he shared on the social network X recently, the UK-based Cheema connected
four Mac Mini M4 devices (retail value of $599.00 each) plus a single MacBook Pro
M4 Max (retail value of $1,599.00) with Exo’s open source software to run
Alibaba’s software developer-optimized LLM Qwen 2.5 Coder-32B.

With a total retail cost of around $5,000, Cheema’s cluster is still
significantly cheaper than even a single coveted Nvidia H100 GPU (retail price
of $25,000-$30,000).


THE VALUE OF RUNNING AI ON LOCAL COMPUTE CLUSTERS RATHER THAN THE WEB

While many AI consumers are used to visiting websites such as OpenAI’s ChatGPT
or using mobile apps that connect to the web, there are significant cost,
privacy, security, and behavioral benefits to running AI models locally on
devices the user or enterprise owns and controls, without a web connection.

Cheema said Exo Labs is still building out its enterprise-grade software
offerings, but he’s aware of several companies already using Exo software to run
local compute clusters for AI inference, and he believes the practice will
spread from individuals to enterprises in the coming years. For now, anyone with
coding experience can get started by visiting Exo’s GitHub repository (repo) and
downloading the software themselves.

“The way AI is done today involves training these very large models that require
immense compute power,” Cheema explained to VentureBeat in a video call
interview earlier today. “You have GPU clusters costing tens of billions of
dollars, all connected in a single data center with high interconnects, running
six-month-long training sessions. Training large AI models is highly
centralized, limited to a few companies that can afford the scale of compute
required. And even after the training, running these models effectively is
another centralized process.”

By contrast, Exo hopes to allow “people to own their models and control what
they’re doing. If models are only running on servers in massive data centers,
you lose transparency and control over what’s happening.”

For example, he noted that he fed his own direct and private messages into a
local LLM so he could ask it questions about those conversations without fear of
them leaking onto the open web.

“Personally, I wanted to use AI on my own messages to do things like ask, ‘Do I
have any urgent messages today?’ That’s not something I want to send to a
service like GPT,” he noted.


USING M4’S SPEED AND LOW POWER CONSUMPTION TO AI’S ADVANTAGE

Exo’s recent success owes much to Apple’s M4 chip. Available in regular, Pro and
Max variants, the M4 family offers what Apple calls “the world’s fastest CPU
core” and the best performance on single-threaded tasks (those running on a
single CPU core, whereas the M4 series has 10 or more cores).

Because the M4’s specs had been teased and leaked earlier, and a version of the
chip was already shipping in the iPad Pro, Cheema was confident that the M4
would work well for his purposes.

“I already knew, ‘we’re going to be able to run these models,’” Cheema told
VentureBeat.

Indeed, according to figures shared on X, Exo Labs’ Mac Mini M4 cluster runs
Qwen 2.5 Coder-32B at 18 tokens per second and Nemotron-70B at 8 tokens per
second. (Tokens are numerical representations of chunks of text, such as whole
words, word fragments, or individual characters and digits: the AI’s native
language.)

Exo also saw success using earlier Mac hardware, connecting two MacBook Pro M3
computers to run the Llama 3.1 405B model at more than 5 tokens per second.
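To put these throughput figures in perspective, the short Python sketch below converts each reported rate into the time needed to generate one response. The 500-token response length is a hypothetical value chosen for illustration, and the calculation covers steady-state decoding only, so real-world latency (including prompt processing) would be somewhat higher. Because the article reports the 405B figure as "more than 5" tokens per second, its result is an upper bound on time.

```python
# Decoding rates reported in the article, in tokens per second.
REPORTED_RATES = {
    "Qwen 2.5 Coder-32B (Mac Mini M4 cluster)": 18.0,
    "Nemotron-70B (Mac Mini M4 cluster)": 8.0,
    # Reported as "more than 5", so the computed time is an upper bound.
    "Llama 3.1 405B (two MacBook Pro M3s)": 5.0,
}

RESPONSE_TOKENS = 500  # hypothetical response length, for illustration only


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to decode num_tokens at a constant rate (prompt processing ignored)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


for setup, rate in REPORTED_RATES.items():
    secs = seconds_to_generate(RESPONSE_TOKENS, rate)
    print(f"{setup}: about {secs:.1f} s for {RESPONSE_TOKENS} tokens")
```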

This demonstration shows how AI inference workloads can be handled efficiently
without relying on cloud infrastructure, making AI more accessible for privacy-
and cost-conscious consumers and enterprises alike. For enterprises working in
highly regulated industries, or those simply conscious of cost, that still want
to leverage the most powerful AI models, Exo Labs’ demos show a viable path
forward.

For enterprises with a high tolerance for experimentation, Exo is offering
bespoke services, including installing its software on Mac equipment and
shipping the equipment. A full enterprise offering is expected within the next
year.


THE ORIGINS OF EXO LABS: TRYING TO SPEED UP AI WORKLOADS WITHOUT NVIDIA GPUS

Cheema, a University of Oxford physics graduate who previously worked in
distributed systems engineering for web3 and crypto companies, was motivated to
launch Exo Labs in March 2024 after growing frustrated by how slowly his machine
learning research ran on his own computer.

“Initially, it just started off as just a curiosity,” Cheema told VentureBeat.
“I was doing some machine learning research and I wanted to speed up my
research. It was taking a long time to run stuff on my old MacBook, so I was
like, ‘okay, I have a few other devices laying around. Maybe old devices from a
few friends here…is there any way I can use their devices?’ And instead of it
taking a day to run this thing, ideally, it takes a few hours. So then, that
kind of turned into this more general system that allows you to distribute any
AI workload over multiple machines. Usually you would run basically something on
just one device, but if you want to get the speed up, and deliver more tokens
per second from your model, or you want to speed up your training run, then the
only option you really have to do that is to go out to more devices.”

However, even after gathering the requisite devices, some he had lying around
and some borrowed from friends, Cheema discovered another issue: bandwidth.

“The problem with that is now you have this communication between the devices
which is really slow,” he explained to VentureBeat. “So there’s a lot of hard
technical problems there that are very similar to the kind of distributed
systems problems that I was working on in my past.”

As a result, he and his co-founder, Mohamed “Mo” Baioumy, developed a new
software tool, Exo, that distributes AI workloads across multiple devices for
those lacking Nvidia GPUs, and ultimately open sourced it on GitHub in July 2024
under the GNU General Public License, which permits commercial and paid use as
long as anyone who distributes the software also makes its source code
available.
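The article does not describe how Exo divides a model between machines. As a minimal sketch, assuming a pipeline-style split in which each device holds a contiguous block of a model’s layers sized in proportion to its memory, the idea might look like this in Python. The `Device` class, the `partition_layers` function, the 64-layer model, and the per-device memory figures are all hypothetical, not Exo’s actual API or configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """A machine in the cluster (hypothetical; not an Exo type)."""
    name: str
    memory_gb: float  # memory available for model weights


def partition_layers(devices: list[Device], num_layers: int) -> dict[str, range]:
    """Give each device a contiguous block of layers sized by its share of total memory.

    Hypothetical sketch of a pipeline-style split, not Exo's actual code.
    Boundaries are rounded from cumulative memory shares, so blocks stay
    contiguous and together cover every layer exactly once.
    """
    total_memory = sum(d.memory_gb for d in devices)
    if not devices or total_memory <= 0:
        raise ValueError("need at least one device with positive memory")
    plan: dict[str, range] = {}
    start, cumulative = 0, 0.0
    for i, device in enumerate(devices):
        cumulative += device.memory_gb
        # The last device takes whatever remains, absorbing rounding error.
        if i == len(devices) - 1:
            end = num_layers
        else:
            end = round(num_layers * cumulative / total_memory)
        plan[device.name] = range(start, end)
        start = end
    return plan


# Hypothetical cluster: four Mac Minis and one MacBook Pro with made-up memory sizes.
cluster = [Device(f"mac-mini-{n}", 64) for n in range(1, 5)] + [Device("macbook-pro", 128)]
for name, layers in partition_layers(cluster, num_layers=64).items():
    print(f"{name}: layers {layers.start}-{layers.stop - 1} ({len(layers)} layers)")
```

With a split like this, each device passes only intermediate activations (not model weights) to the next device for every generated token, which keeps per-token network traffic small but means every hop between devices adds latency.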

Since then, Exo’s popularity on GitHub has climbed steadily, and the company has
raised an undisclosed amount in funding from private investors.


BENCHMARKS TO GUIDE THE NEW WAVE OF LOCAL AI INNOVATORS

To further support adoption, Exo Labs is preparing to launch a free benchmarking
website next week.

The site will provide detailed comparisons of hardware setups, including
single-device and multi-device configurations, allowing users to identify the
best solutions for running LLMs based on their needs and budget.

Cheema emphasized the importance of real-world benchmarks, pointing out that
theoretical estimates often misrepresent actual capabilities.

“Our goal is to provide clarity and encourage innovation by showcasing tested
setups that anyone can replicate,” he added.
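Real-world benchmarks of this kind ultimately come down to timing token generation on actual hardware. A minimal Python sketch of such a measurement follows; `measure_tokens_per_second` and `simulated_stream` are hypothetical helpers, not part of Exo or its benchmarking site, and a real model’s streaming output could be passed in place of the simulation.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_tokens_per_second(
    token_stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> tuple[int, float]:
    """Consume a token stream and return (token_count, tokens_per_second).

    Timing starts when the first token arrives, so prompt processing and
    time-to-first-token are excluded: the rate reflects decoding speed only.
    """
    count = 0
    first_at = last_at = 0.0
    for _ in token_stream:
        now = clock()
        if count == 0:
            first_at = now
        last_at = now
        count += 1
    elapsed = last_at - first_at
    if count < 2 or elapsed <= 0:
        raise ValueError("need at least two tokens arriving over time to measure a rate")
    # count - 1 inter-token intervals fit between the first and last token.
    return count, (count - 1) / elapsed


def simulated_stream(num_tokens: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a real model's streaming output."""
    for i in range(num_tokens):
        time.sleep(delay_s)  # pretend each token takes delay_s to decode
        yield f"tok{i}"


if __name__ == "__main__":
    count, rate = measure_tokens_per_second(simulated_stream(20, 0.01))
    print(f"{count} tokens at ~{rate:.1f} tokens/s")
```

Injecting the clock keeps the function deterministic under test; in real use the default `time.perf_counter` is a monotonic, high-resolution timer suited to short intervals.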

Correction: this article originally mistakenly stated Cheema was based in Dubai,
when in fact he was only visiting. We have since updated the piece and regret
the error.


© 2024 VentureBeat. All rights reserved.