venturebeat.com
https://venturebeat.com/ai/you-can-now-run-the-most-powerful-open-source-ai-models-locally-on-mac-m4-computers-thanks-to...
YOU CAN NOW RUN THE MOST POWERFUL OPEN SOURCE AI MODELS LOCALLY ON MAC M4 COMPUTERS, THANKS TO EXO LABS

Carl Franzen @carlfranzen
November 13, 2024 12:02 PM

Credit: VentureBeat made with Midjourney

When it comes to generative AI, Apple’s efforts have seemed largely concentrated on mobile — namely Apple Intelligence running on iOS 18, the latest operating system for the iPhone. But as it turns out, the new Apple M4 chip — available in the new Mac Mini and MacBook Pro models announced at the end of October 2024 — is excellent hardware for running the most powerful open source foundation large language models (LLMs) yet released, including Meta’s Llama-3.1 405B, Nvidia’s Nemotron 70B, and Qwen 2.5 Coder-32B.

In fact, Alex Cheema, co-founder of Exo Labs, a startup founded in March 2024 to (in his words) “democratize access to AI” through open source multi-device computing clusters, has already done it. As he shared recently on the social network X, the UK-based Cheema connected four Mac Mini M4 devices (retail value of $599.00 each) plus a single MacBook Pro M4 Max (retail value of $1,599.00) with Exo’s open source software to run Alibaba’s software developer-optimized LLM Qwen 2.5 Coder-32B. With the total cost of Cheema’s cluster at around $5,000 retail, it is still significantly cheaper than even a single coveted Nvidia H100 GPU (retail price of $25,000-$30,000).
THE VALUE OF RUNNING AI ON LOCAL COMPUTE CLUSTERS RATHER THAN THE WEB

While many AI consumers are used to visiting websites such as OpenAI’s ChatGPT, or mobile apps that connect to the web, there are substantial cost, privacy, security, and behavioral benefits to running AI models locally on devices the user or enterprise controls and owns — without a web connection.

Cheema said Exo Labs is still building out its enterprise-grade software offerings, but he is aware of several companies already using Exo software to run local compute clusters for AI inference — and believes the practice will spread from individuals to enterprises in the coming years. For now, anyone with coding experience can get started by visiting Exo’s GitHub repository (repo) and downloading the software themselves.

“The way AI is done today involves training these very large models that require immense compute power,” Cheema explained to VentureBeat in a video call interview earlier today. “You have GPU clusters costing tens of billions of dollars, all connected in a single data center with high interconnects, running six-month-long training sessions. Training large AI models is highly centralized, limited to a few companies that can afford the scale of compute required. And even after the training, running these models effectively is another centralized process.”

By contrast, Exo hopes to allow “people to own their models and control what they’re doing. If models are only running on servers in massive data centers, you lose transparency and control over what’s happening.”

As an example, he noted that he fed his own direct and private messages into a local LLM so he could ask it questions about those conversations, without fear of them leaking onto the open web. “Personally, I wanted to use AI on my own messages to do things like ask, ‘Do I have any urgent messages today?’ That’s not something I want to send to a service like GPT,” he noted.
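Tools like Exo typically expose a locally running model through an OpenAI-style HTTP API, so the “ask questions about my own messages” workflow stays entirely on the user’s own hardware. The sketch below illustrates that pattern; the endpoint URL, port, and model identifier are assumptions for illustration, not details confirmed in the article — check the project’s repo for the actual interface.

```python
import json
import urllib.request

# Hypothetical local endpoint: tools like Exo commonly serve an
# OpenAI-compatible chat API on one node of the cluster. The host,
# port, and model name below are illustrative assumptions.
LOCAL_URL = "http://localhost:8000/v1/chat/completions"

def build_request(prompt: str, model: str = "qwen-2.5-coder-32b") -> dict:
    """Build an OpenAI-style chat payload for a locally hosted model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }

def ask_local_model(prompt: str) -> str:
    """Send the prompt to the local node; nothing leaves the LAN."""
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        LOCAL_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the request never touches a third-party server, prompts containing private data — like Cheema’s direct messages — are only ever seen by hardware the user owns.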
USING M4’S SPEED AND LOW POWER CONSUMPTION TO AI’S ADVANTAGE

Exo’s recent success has been thanks to Apple’s M4 chip — available in regular, Pro and Max variants — which offers what Apple calls “the world’s fastest GPU core” and the best performance on single-threaded tasks (those operating on a single CPU core, whereas the M4 series has 10 or more). Because the M4’s specs had been teased and leaked earlier, and a version was already shipping in the iPad, Cheema was confident the chip would work well for his purposes. “I already knew, ‘we’re going to be able to run these models,'” Cheema told VentureBeat.

Indeed, according to figures shared on X, Exo Labs’ Mac Mini M4 cluster runs Qwen 2.5 Coder 32B at 18 tokens per second and Nemotron-70B at 8 tokens per second. (Tokens are the numerical representations of letter, word and numeral strings — the AI’s native language.) Exo also saw success with earlier Mac hardware, connecting two MacBook Pro M3 computers to run the Llama 3.1-405B model at more than 5 tokens per second.

The demonstration shows how AI training and inference workloads can be handled efficiently without relying on cloud infrastructure, making AI more accessible for privacy- and cost-conscious consumers and enterprises alike. For enterprises working in highly regulated industries, or even those simply conscious of cost, who still want to leverage the most powerful AI models, Exo Labs’ demos show a viable path forward. For enterprises with a high tolerance for experimentation, Exo offers bespoke services, including installing and shipping its software on Mac equipment. A full enterprise offering is expected in the next year.
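To put those throughput figures in practical terms, a short back-of-the-envelope sketch: at a steady decode rate, response latency scales linearly with reply length. The 500-token reply length used here is an illustrative assumption, not a figure from the article.

```python
def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to produce num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_second

# Throughput figures reported for Exo's Mac Mini M4 cluster:
QWEN_RATE = 18.0      # Qwen 2.5 Coder 32B, tokens/second
NEMOTRON_RATE = 8.0   # Nemotron-70B, tokens/second

# Time to stream a ~500-token reply (roughly a few hundred words):
print(round(generation_time(500, QWEN_RATE), 1))      # 27.8
print(round(generation_time(500, NEMOTRON_RATE), 1))  # 62.5
```

In other words, the cluster delivers a substantial code-generation answer from a 32B-parameter model in under half a minute — slow compared to hosted GPU inference, but workable for interactive local use.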
THE ORIGINS OF EXO LABS: TRYING TO SPEED UP AI WORKLOADS WITHOUT NVIDIA GPUS

Cheema, a University of Oxford physics graduate who previously worked in distributed systems engineering for web3 and crypto companies, was motivated to launch Exo Labs in March 2024 after finding himself stymied by the slow progress of machine learning research on his own computer.

“Initially, it just started off as just a curiosity,” Cheema told VentureBeat. “I was doing some machine learning research and I wanted to speed up my research. It was taking a long time to run stuff on my old MacBook, so I was like, ‘okay, I have a few other devices laying around. Maybe old devices from a few friends here…is there any way I can use their devices?’ And instead of it taking a day to run this thing, ideally, it takes a few hours. So then, that kind of turned into this more general system that allows you to distribute any AI workload over multiple machines. Usually you would run basically something on just one device, but if you want to get the speed up, and deliver more tokens per second from your model, or you want to speed up your training run, then the only option you really have to do that is to go out to more devices.”

However, even once he gathered the requisite devices, those he had lying around and others borrowed from friends, Cheema discovered another issue: bandwidth. “The problem with that is now you have this communication between the devices which is really slow,” he explained to VentureBeat.
“So there’s a lot of hard technical problems there that are very similar to the kind of distributed systems problems that I was working on in my past.”

As a result, he and his co-founder, Mohamed “Mo” Baioumy, developed a new software tool, Exo, that distributes AI workloads across multiple devices for those lacking Nvidia GPUs, and ultimately open sourced it on GitHub in July under the GNU General Public License, which permits commercial and paid usage as long as the user retains and makes available a copy of the source code. Since then, Exo has seen its popularity climb steadily on GitHub, and the company has raised an undisclosed amount of funding from private investors.

BENCHMARKS TO GUIDE THE NEW WAVE OF LOCAL AI INNOVATORS

To further support adoption, Exo Labs is preparing to launch a free benchmarking website next week. The site will provide detailed comparisons of hardware setups, including single-device and multi-device configurations, allowing users to identify the best solutions for running LLMs based on their needs and budget. Cheema emphasized the importance of real-world benchmarks, pointing out that theoretical estimates often misrepresent actual capabilities. “Our goal is to provide clarity and encourage innovation by showcasing tested setups that anyone can replicate,” he added.

Correction: this article originally mistakenly stated Cheema was based in Dubai, when in fact he was only visiting. We have since updated the piece and regret the error.
© 2024 VentureBeat. All rights reserved.