Effective URL: https://venturebeat.com/ai/encord-offers-tool-to-automatically-detect-errors-in-training-data/
Submission: On November 24 via api from US — Scanned from DE
ENCORD TACKLES GROWING PROBLEM OF UNLABELED DATA

Taryn Plumb (@taryn_plumb), June 1, 2022 6:00 AM

There’s an interesting give and take with machine learning (ML) models. Just as humans increasingly rely on them, they rely on us. While they can dramatically speed up and refine human processes, they also need to be fed the correct information, by humans, to do that job.

“They don’t have common sense, they only learn from what you tell them,” said Eric Landau, cofounder and CTO of London computer vision company Encord, which claims to have developed the first-ever tool to help address the fundamental, growing and time-consuming problem of unlabeled data.

The inherent quandary is that there’s no shortage of data in the world; it continues to accumulate by the day, hour and minute. However, much of that data remains unlabeled and is thus unusable by ML models. Humans must often do the labeling, and natural human distraction can lead to errors that then have to be corrected, resulting in double work.

“It’s relying on human judgment to correct other errors in human judgment,” Landau said. “If you aren’t careful about feeding a model the exact right annotations, this will have negative consequences in the real world.”

THE URGENT NEED FOR DATA QUALITY

Today, the London-based Encord announced the general release of its data quality assessment technology, which automatically detects errors within annotated training data.
This can make AI development less expensive, less time-consuming and less difficult to scale, Landau said.

“Model building can be a slow, arduous process,” he said. “Data quality is an urgent need for machine learning teams. We wanted to speed up that process, to make it easier for teams to build models a lot quicker.”

The technology uses micro models: small, neural-network-based models that can be finely targeted so that big models can train on large amounts of data. It is agnostic to use case, so users can feed in whatever data is essential to their application.

“These are small, targeted models that are good at one thing, not very general,” Landau said.

For example, for dashcams that detect road signs, several micro models can be strung together, each of which understands the signs for, say, certain U.S. states or European cities.

The tool also applies self-supervised learning, a technique that is growing in popularity. Only the “most egregious” cases are passed back to human reviewers, which helps the tool operate robustly and makes the best use of human time.

The technology is being used by specialist computer vision companies including Teton.AI and SurgEase, as well as by the healthcare institutions King’s College London and St Thomas’ Hospital. Landau said he sees ML use cases across a variety of areas, from satellite imaging to radiology.

“It doesn’t require humans to review every single data point. It’s a general approach to the problem but it’s extremely important,” he said. “As far as we know, it is the first of its kind automated label quality assessment tool for computer vision.”

A DATA-CENTRIC APPROACH

Founded in 2020, Encord is backed by CRV, Y Combinator, WndrCo and Crane Venture Partners, and in May 2022 it was named to CB Insights’ AI 100 list of the most innovative artificial intelligence startups.

Still, several much larger companies are tackling the same data labeling issues, including Scale AI and Snorkel.
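The label-quality workflow Landau describes, with narrowly targeted micro models checking human annotations and only the most egregious cases going back to people, can be sketched roughly as below. This is a hypothetical Python illustration under stated assumptions, not Encord’s implementation or API: the names `Annotation`, `make_lookup_model`, `score_annotations` and `select_for_review`, the per-region routing of road-sign models, the 0.9 threshold and the review budget are all invented for illustration, and the toy “micro models” are lookup tables standing in for small trained neural networks. The self-supervised training step is not modeled. Each annotation is checked by the micro model for its region; a confident disagreement with the human label is treated as a likely error, and only the highest-scoring disagreements are queued for human review.

```python
# Hypothetical sketch of micro-model label-error triage; not Encord's API.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A "micro model" here is any callable returning (predicted_label, confidence)
# for one narrow domain. Real ones would presumably be small neural networks.
MicroModel = Callable[[str], Tuple[str, float]]


@dataclass
class Annotation:
    item_id: str
    region: str    # hypothetical routing key, e.g. "US" or "EU"
    features: str  # stand-in for the image crop being labeled
    label: str     # the human-provided label being checked


def make_lookup_model(table: Dict[str, Tuple[str, float]]) -> MicroModel:
    """Toy stand-in for a trained micro model: looks predictions up in a table."""
    def model(features: str) -> Tuple[str, float]:
        return table.get(features, ("unknown", 0.0))
    return model


def score_annotations(
    annotations: List[Annotation],
    micro_models: Dict[str, MicroModel],
) -> List[Tuple[float, Annotation, str]]:
    """Return (suspicion, annotation, predicted_label) for each disagreement.

    Suspicion is the micro model's confidence in a label that differs from
    the human label; annotations the model agrees with are not returned.
    """
    flagged = []
    for ann in annotations:
        model = micro_models.get(ann.region)
        if model is None:
            continue  # no specialist model for this region; skip it here
        predicted, confidence = model(ann.features)
        if predicted != ann.label:
            flagged.append((confidence, ann, predicted))
    # Most confident disagreements first: these are the likeliest label errors.
    flagged.sort(key=lambda entry: entry[0], reverse=True)
    return flagged


def select_for_review(
    flagged: List[Tuple[float, Annotation, str]],
    threshold: float = 0.9,
    budget: int = 2,
) -> List[str]:
    """Escalate only the most egregious cases, capped at a human review budget."""
    return [ann.item_id for score, ann, _ in flagged if score >= threshold][:budget]


us_model = make_lookup_model({
    "red_octagon": ("stop", 0.99),
    "yellow_diamond": ("warning", 0.95),
})
eu_model = make_lookup_model({
    "red_octagon": ("stop", 0.98),
    "blue_circle": ("mandatory", 0.60),
})

annotations = [
    Annotation("img-1", "US", "red_octagon", "stop"),     # model agrees
    Annotation("img-2", "US", "yellow_diamond", "stop"),  # confident disagreement
    Annotation("img-3", "EU", "red_octagon", "yield"),    # confident disagreement
    Annotation("img-4", "EU", "blue_circle", "warning"),  # low-confidence disagreement
]

flagged = score_annotations(annotations, {"US": us_model, "EU": eu_model})
print(select_for_review(flagged))  # prints ['img-3', 'img-2']
```

Here `img-4` is flagged but not escalated, because its micro model is only 60% confident; a real system would presumably tune the threshold and budget to a team’s review capacity.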
But Encord is clearly on the upswing: its tools have been used by King’s College London, Memorial Sloan Kettering Cancer Center and Stanford Medical Centre to help process 3x more images and reduce experiment duration by 80%, Landau said.

The company has helped hospitals annotate videos of pre-cancerous polyps, increasing efficiency by 6.4 times on average. It has automated 97% of labels, helping clinicians become 16x more efficient at labeling medical images. According to Landau, it has even loftier plans: to accelerate medical research by 100x.

He emphasized the importance of focusing on the data rather than the model, a concept that is gaining ground in AI. The longstanding practice has been the “model-centric” approach, but focusing on data is “more acute.”

“What you’re feeding the model is the most important thing,” he said. “The quality of the model is the quality of the data.”

That’s because if you think only about models and how to fix them, you lose perspective on the problem you’re trying to solve, he pointed out. When data is improperly annotated, models learn the wrong thing and people can get hurt: a polyp may be overlooked in a gastroenterology video, for example, or a model may fail to identify when a patient in an elderly care home has fallen, not to mention the numerous issues with autonomous vehicles.

Particularly when building medical diagnostic AI systems, scientists require training data that spans all demographics: ages, nationalities and other characteristics.

“You can only do that with data-centric models,” Landau said. “If you just think about the model, not the data, you will lose that.”
© 2023 VentureBeat. All rights reserved.