venturebeat.com
192.0.66.2
Public Scan
Submitted URL: https://cmszf04.na1.hubspotlinks.com/Ctc/W1+113/cMsZf04/VWx9bv1VpRfnW1xhnVF4VSGdhW3S9mTt50V_QLMkNlkr5nKvpV3Zsc37CgHcHW3QhFtH8b9c2VN3g...
Effective URL: https://venturebeat.com/ai/mit-researchers-develop-self-learning-language-models-that-outperform-larger-counterparts/?ut...
Submission: On July 14 via api from GB — Scanned from GB
Text Content
MIT RESEARCHERS DEVELOP SELF-LEARNING LANGUAGE MODELS THAT OUTPERFORM LARGER COUNTERPARTS

Victor Dey
June 1, 2023 7:00 AM

Researchers at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) report an advance in language modeling in a field dominated by large language models (LLMs). The CSAIL team has developed an approach that challenges the conventional belief that smaller models have limited capabilities. The research introduces a scalable, self-learning model that outperforms counterparts up to 500 times its size on specific language understanding tasks, without relying on human-generated annotations.

The algorithm developed by the MIT team, named “SimPLE” (Simple Pseudo-Label Editing), uses self-training, a technique that lets the model learn from its own predictions and so eliminates the need for additional annotated training data. SimPLE was devised to tackle the problem of inaccurate labels generated during self-training.

The research team claims that this approach significantly improves the model’s performance across various tasks, surpassing notable models such as Google’s LaMDA and FLAN, as well as GPT models.

A REVOLUTION (BUT LIMITED IN SCOPE)

In their paper Entailment as Robust Self-Learners, the MIT research team argues that while recent advances in language generation with LLMs have been revolutionary, these models have a distinct limitation when it comes to understanding tasks.

“Digital calculators are better than GPT-4 in arithmetic because they are designed based on arithmetic principles,” Hongyin Luo, MIT CSAIL postdoctoral associate and research lead author, told VentureBeat. “Our small model is trained to grasp the core principle of language understanding, contextual entailment, while LLMs do not explicitly learn about it. With a clear goal of learning contextual entailment, the parameter efficiency of our model is much higher than LLMs, thus achieving good performance on NLU tasks.”

The research also states that, simply put, a competent contextual entailment model must also excel as a natural language understanding (NLU) model.

Moreover, the CSAIL team believes that the implications of this research go beyond performance gains. It challenges the prevailing notion that larger models are inherently superior and highlights the potential of smaller models as equally powerful, more environmentally sustainable alternatives.

ENHANCING LANGUAGE MODEL UNDERSTANDING THROUGH TEXTUAL ENTAILMENT

The MIT CSAIL team focused on textual entailment to improve the model’s comprehension of diverse language tasks. Textual entailment is a relationship between two sentences: if one sentence (the premise) is true, the other (the hypothesis) is likely to be true as well.

By training a model to recognize these relationships, the researchers were able to write prompts that test whether specific information is entailed by a given sentence or phrase across various tasks. This zero-shot adaptation made the model considerably more versatile.

MIT’s Luo told VentureBeat that although LLMs have showcased impressive abilities in generating language, art and code, they carry considerable computational costs and privacy risks when handling sensitive data. Conversely, smaller models have historically fallen behind their larger counterparts in multi-tasking and weakly supervised tasks.

To address these challenges, the MIT CSAIL researchers used a natural language-based logical inference dataset to develop smaller models that outperformed much larger ones. In addition, by incorporating the concept of textual entailment, the researchers gave the models the ability to comprehend a broad spectrum of tasks.

ADAPTING WITHOUT ADDITIONAL TRAINING

These models were trained to determine whether specific information is entailed by a given sentence or phrase, which enables them to adapt to various tasks without additional training.
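
To make the idea of adapting through entailment concrete, here is a minimal Python sketch of how a classification task can be recast as an entailment check. It is an illustration only, not the researchers' code: the label templates, the cue-word lists and the `toy_entailment_score` function are all hypothetical stand-ins, and a real system would replace the toy scorer with a trained entailment model.

```python
import re
from typing import Callable, Dict

# Hypothetical templates: each candidate label becomes a hypothesis sentence.
TEMPLATES: Dict[str, str] = {
    "positive": "The review is positive.",
    "negative": "The review is negative.",
}

# Hypothetical cue words used only by the toy scorer below.
POSITIVE_CUES = {"great", "loved", "excellent"}
NEGATIVE_CUES = {"bad", "boring", "terrible"}


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for a trained entailment model (assumption, not the paper's model).

    Returns the fraction of cue words for the hypothesis polarity
    that occur in the premise.
    """
    words = set(re.findall(r"[a-z']+", premise.lower()))
    cues = POSITIVE_CUES if "positive" in hypothesis else NEGATIVE_CUES
    return len(words & cues) / len(cues)


def classify(
    text: str,
    templates: Dict[str, str],
    scorer: Callable[[str, str], float],
) -> str:
    """Pick the label whose hypothesis is scored as most entailed by the text."""
    scores = {label: scorer(text, hyp) for label, hyp in templates.items()}
    return max(scores, key=scores.get)


print(classify("I loved this film, the acting was excellent.",
               TEMPLATES, toy_entailment_score))  # prints: positive
```

Switching to a different task, such as news categorization, would only require a new set of templates; the scorer itself stays unchanged, which is the sense in which no additional training is needed.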

“The benefit of self-training is that the model can automatically label a large amount of data (create pseudo-labels), but the risk is that the pseudo-labels contain wrong predictions, which might mislead the model or cause overfitting,” said Luo. “Our SimPLE method outperforms all self-training baselines. The method combines two classic AI strategies for robustness, uncertainty estimation and voting, and provides a more accurate set of predictions.”

Luo explained that language model training has traditionally required data annotated manually by humans or through LLM APIs. However, human annotators often label sensitive data, which compromises privacy. Additionally, transmitting data to third-party annotators or OpenAI’s API may result in the inadvertent exposure of highly sensitive information.

“Our method allows data annotation without seeing the data,” he explained. “An annotator only needs to write a template that describes the task. With this template, our system predicts the relationship between the response and the question, generating high-quality labels. By doing this, the dataset is annotated without sharing any data with the annotator.”

REDEFINING AI MODEL DEVELOPMENT THROUGH SELF-TRAINING

MIT’s research team asserts that the collection of smaller models is versatile across a wide array of AI tasks, ranging from sentiment classification to news categorization, and is exceptionally proficient at discerning the relationship between two textual components. The models can also infer sentiment from statements and determine the subject matter of news articles from their content. The researchers achieved these results by recasting various NLU tasks as entailment tasks.

According to Luo, the self-trained entailment models, which comprise 350 million parameters, outperform supervised language models with 137 to 175 billion parameters.
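
The self-training risk Luo describes, that wrong pseudo-labels can mislead the model, can be illustrated with a small Python sketch that combines voting with a crude uncertainty filter. This is a generic illustration and not the SimPLE algorithm itself: it assumes each unlabeled example already has several predictions (for instance from repeated stochastic forward passes, which is an assumption here), and the function names and the 0.75 agreement threshold are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def vote_pseudo_label(
    predictions: List[str], min_agreement: float = 0.75
) -> Tuple[Optional[str], float]:
    """Majority-vote over several predictions for one example.

    The share of votes won by the top label serves as a rough confidence
    estimate; the label is withheld (None) when agreement is too low.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    label, count = Counter(predictions).most_common(1)[0]
    agreement = count / len(predictions)
    return (label if agreement >= min_agreement else None), agreement


def build_pseudo_labeled_set(
    examples: List[Tuple[str, List[str]]], min_agreement: float = 0.75
) -> List[Tuple[str, str]]:
    """Keep only the examples whose voted label passes the agreement filter."""
    kept = []
    for text, predictions in examples:
        label, _ = vote_pseudo_label(predictions, min_agreement)
        if label is not None:
            kept.append((text, label))
    return kept


unlabeled = [
    ("Great service and friendly staff.", ["pos", "pos", "pos", "neg"]),
    ("It was fine, I guess.", ["pos", "neg", "neg", "pos"]),
]
print(build_pseudo_labeled_set(unlabeled))
```

In this sketch the second example is dropped because its predictions split evenly, so an unreliable pseudo-label never reaches the next round of training.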

He believes that this work has the potential to redefine the AI and ML landscape by providing a language modeling solution that is more scalable, dependable and cost-effective.

“The core of the model is predicting entailment relations, while LLMs predict ‘how to make things read similar to the training data.’ This makes our model more suitable and efficient for language understanding,” Luo added. “Our model performs better than LLMs and traditional BERT-based models trained with human-generated labels.”

PAVING THE WAY FOR COST-EFFICIENT LANGUAGE MODEL TRAINING

The paper that outlines this research, authored by Luo, James Glass and Yoon Kim, is scheduled to be presented in July at the Meeting of the Association for Computational Linguistics in Toronto, Canada. The project received support from the Hong Kong Innovation AI program.

The research aims to lay the groundwork for future AI technologies that prioritize scalability, privacy preservation and sustainability.

Luo said that the model contains only 1/500th of the parameters of GPT-3-175B, which makes it significantly easier to deploy and faster at inference. The CSAIL team emphasized that, with this research, organizations can deploy efficient, robust multi-task models without compromising data privacy or relying on expensive computational resources.

“Our next step involves employing the entailment models in various language-related tasks,” said Luo. “Currently, we are engaged in co-training with LLMs to leverage their advantages and further enhance the capabilities of our efficient self-trained models.
Additionally, we are working on applying entailment models to measure the alignment between a claim and fact/moral principles, which benefits detecting machine- and human-generated misinformation, hate speech and stereotypes.”