venturebeat.com
URL:
https://venturebeat.com/ai/apple-researchers-achieve-breakthroughs-in-multimodal-ai-as-company-ramps-up-investments/
Submission: On March 24 via manual from SG — Scanned from SG
APPLE RESEARCHERS ACHIEVE BREAKTHROUGHS IN MULTIMODAL AI AS COMPANY RAMPS UP INVESTMENTS

Michael Nuñez (@MichaelFNunez), March 15, 2024 1:31 PM

Image credit: VentureBeat, made with Midjourney

Apple researchers have developed new methods for training large language models on both text and images, enabling more powerful and flexible AI systems, in what could be a significant advance for artificial intelligence and for future Apple products.
The work, described in a research paper titled “MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training” that was quietly posted to arxiv.org this week, demonstrates how carefully combining different types of training data and model architectures can lead to state-of-the-art performance on a range of AI benchmarks.

“We demonstrate that for large-scale multimodal pre-training using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art few-shot results across multiple benchmarks,” the researchers explain. By training models on a diverse dataset spanning visual and linguistic information, the MM1 models were able to excel at tasks like image captioning, visual question answering, and natural language inference.
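The article does not describe how such a mix is implemented. As a rough illustration of what combining several data sources during pre-training can look like, here is a minimal Python sketch that assumes a simple weighted sampler: for each slot in a batch it first picks a source (image-caption, interleaved image-text, or text-only) with probability proportional to that source's weight, then draws an example from that source's pool. The function name `mixed_batch`, the source labels, the toy pools, and the weights are all hypothetical, chosen only for illustration; they are not Apple's code or the proportions reported for MM1.

```python
import random
from collections import Counter

# Hypothetical, arbitrary weights for illustration only.
# These are NOT the mixture proportions used to train MM1.
MIXTURE = {
    "image_caption": 0.4,
    "interleaved_image_text": 0.4,
    "text_only": 0.2,
}


def mixed_batch(pools: dict[str, list[str]], weights: dict[str, float],
                batch_size: int, seed: int = 0) -> list[tuple[str, str]]:
    """Build one batch: pick a data source per slot with probability
    proportional to its weight, then draw an example uniformly from
    that source's pool. Returns (source, example) pairs."""
    rng = random.Random(seed)  # seeded so the same batch can be reproduced
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=batch_size)
    return [(src, rng.choice(pools[src])) for src in picks]


if __name__ == "__main__":
    # Toy placeholder pools; a real pipeline would stream tokenized documents.
    pools = {
        "image_caption": ["<img> a dog running on a beach"],
        "interleaved_image_text": ["Intro text <img> more text <img> conclusion"],
        "text_only": ["A paragraph of plain web text."],
    }
    batch = mixed_batch(pools, MIXTURE, batch_size=10_000)
    counts = Counter(src for src, _ in batch)
    for name, w in MIXTURE.items():
        print(f"{name}: target {w:.2f}, observed {counts[name] / len(batch):.3f}")
```

Changing the weights changes how often the model sees each kind of example during training; the researchers' finding is that getting this balance right matters for few-shot performance.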
SCALING VISUAL COMPONENTS IS KEY

The researchers also found that the choice of image encoder and the resolution of input images had a major impact on model performance. “We show that the image encoder together with image resolution and the image token count has substantial impact, while the vision-language connector design is of comparatively negligible importance,” they said. This suggests that continued scaling and refinement of the visual components of these multimodal models will be key to unlocking further gains.

Surprisingly, the largest MM1 model, with 30 billion parameters, exhibited strong in-context learning abilities, allowing it to perform multi-step reasoning over multiple input images using few-shot “chain-of-thought” prompting. This points to the potential for large multimodal models to tackle complex, open-ended problems that require grounded language understanding and generation.

APPLE’S BILLION-DOLLAR AI BET

The MM1 research comes as Apple has been ramping up its investments in artificial intelligence in an effort to catch up with rivals like Google, Microsoft, and Amazon, which have raced ahead in integrating generative AI capabilities into their products. The company is on track to spend $1 billion per year on AI development, according to a recent Bloomberg report.

Sources say Apple is working on a large language model framework called “Ajax” as well as a chatbot known internally as “Apple GPT.” The goal is to integrate these technologies into Siri, Messages, Apple Music, and other apps and services.
For example, AI could be used to auto-generate personalized playlists, assist developers in writing code, or engage in open-ended conversation and task completion.

“We view AI and machine learning as fundamental technologies, and they’re integral to virtually every product that we ship,” Apple CEO Tim Cook said during a recent earnings call. “I’m not going to get into details about what it is, because — as you know, we don’t — we really don’t do that. But you can bet that we’re investing, we’re investing quite a bit, we’re going to do it responsibly and it will — you will see product advancements over time that where the — those technologies are at the heart of them.”

THE HIGH STAKES OF THE AI ARMS RACE

Apple has a history of being a fast follower rather than a first mover when it comes to major technology shifts. But with AI poised to transform every aspect of the digital landscape, the stakes are high for the iPhone maker to stay competitive.

The MM1 research shows that Apple has the talent and resources to make cutting-edge advances. But it remains to be seen whether the notoriously secretive company can move quickly enough to keep pace in the escalating AI arms race.

Many eyes will be on Apple’s Worldwide Developers Conference in June, where the company is expected to unveil new AI-powered features and developer tools. In the meantime, smaller AI advances coming out of Apple’s research labs, such as the Keyframer animation tool and various performance enhancements, show that steady progress is being made behind the scenes.

As Cook hinted during a recent earnings call: “We’re excited to share details of our ongoing work in AI later this year.” That work, it is now clear, includes ambitious efforts to master multimodal intelligence at the largest scales. The age of pervasively helpful and human-like AI may arrive sooner than we think, and Apple intends to play a major part in shaping it.
© 2024 VentureBeat. All rights reserved.