Effective URL: https://venturebeat.com/ai/encord-offers-tool-to-automatically-detect-errors-in-training-data/



ENCORD TACKLES GROWING PROBLEM OF UNLABELED DATA

Taryn Plumb@taryn_plumb
June 1, 2022 6:00 AM

Image Credit: ipopba // Getty Images




There’s an interesting give and take with machine learning (ML) models.

Just as humans increasingly rely on them, they rely on us. While they can
dramatically speed up and hone human processes, they also need to be fed the
correct information – by humans – to be able to do this job.

“They don’t have common sense, they only learn from what you tell them,” said
Eric Landau, cofounder and CTO of London computer vision company Encord, which
claims to have developed the first-ever tool to help address a fundamental,
growing, time-consuming problem of unlabeled data.

The inherent quandary is that there’s no shortage of data in the world – it only
continues to accumulate by the day, hour, minute. However, much of that data
remains unlabeled and is thus unusable by ML models. Humans must often do the
labeling, and natural human distraction can lead to errors that then have to be
corrected, resulting in double work.



“It’s relying on human judgment to correct other errors in human judgment,”
Landau said. “If you aren’t careful about feeding a model the exact right
annotations, this will have negative consequences in the real world.”


THE URGENT NEED FOR DATA QUALITY

Today, London-based Encord announced the general release of its data quality
assessment technology, which automatically detects errors within annotated
training data. This can help make AI development less expensive, less
time-consuming and easier to scale, Landau said.

“Model building can be a slow, arduous process,” he said. “Data quality is an
urgent need for machine learning teams. We wanted to speed up that process, to
make it easier for teams to build models a lot quicker.”

The technology uses micro models based on neural networks that can be finely
targeted so that big models can train on large amounts of data. It is agnostic
to use case, so users feed in whatever that essential data may be. “These are
small, targeted models that are good at one thing, not very general,” Landau
said.

For example, for dashcams that detect road signs, several micro models can be
strung together that each individually understand signs for, say, certain U.S.
states or European cities.
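The idea of stringing narrow models together can be sketched in a few lines of
Python. This is an illustration only, not Encord's implementation: the two
"micro models" below are hypothetical rule-based stand-ins for small neural
networks, and the names `california_signs`, `paris_signs`, `run_chain` and the
`min_conf` threshold are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

Prediction = Tuple[str, float]  # (label, confidence in [0, 1])
MicroModel = Callable[[dict], Optional[Prediction]]


def california_signs(sample: dict) -> Optional[Prediction]:
    # Hypothetical stand-in for a small network that only knows California signs.
    if sample.get("region") == "US-CA" and sample.get("shape") == "octagon":
        return ("stop", 0.97)
    return None


def paris_signs(sample: dict) -> Optional[Prediction]:
    # Hypothetical stand-in for a small network that only knows Paris signs.
    if sample.get("region") == "FR-PAR" and sample.get("shape") == "triangle":
        return ("yield", 0.93)
    return None


def run_chain(models: List[MicroModel], sample: dict,
              min_conf: float = 0.9) -> Optional[Prediction]:
    """Ask every micro model; keep the most confident answer at or above min_conf."""
    best: Optional[Prediction] = None
    for model in models:
        pred = model(sample)
        if pred is None or pred[1] < min_conf:
            continue  # this model does not recognise the sample
        if best is None or pred[1] > best[1]:
            best = pred
    return best


if __name__ == "__main__":
    chain = [california_signs, paris_signs]
    print(run_chain(chain, {"region": "US-CA", "shape": "octagon"}))
    print(run_chain(chain, {"region": "DE-BER", "shape": "circle"}))
```

Each model stays small and specific; coverage comes from adding more of them to
the chain rather than from making any single one more general.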

The tool also applies the growing technique of self-supervised learning. Only
the “most egregious” cases are passed back to human eyes to help it operate most
robustly and optimize human time.
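A minimal sketch of that triage step, under stated assumptions: it covers only
the routing of suspicious labels to a reviewer, not the self-supervised
training itself, and it is not Encord's method. The `Annotation` record, the
`egregious_cases` function and the 0.9 threshold are hypothetical. A label is
flagged when a model disagrees with the human annotator and is highly confident
in its own answer.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    sample_id: str
    human_label: str
    model_label: str
    model_confidence: float  # model's confidence in model_label, in [0, 1]


def egregious_cases(annotations: List[Annotation],
                    threshold: float = 0.9) -> List[Annotation]:
    """Return confident disagreements, most confident first, for human review."""
    flagged = [
        a for a in annotations
        if a.human_label != a.model_label and a.model_confidence >= threshold
    ]
    return sorted(flagged, key=lambda a: a.model_confidence, reverse=True)


if __name__ == "__main__":
    batch = [
        Annotation("img_1", "polyp", "no_polyp", 0.92),
        Annotation("img_2", "polyp", "polyp", 0.99),
        Annotation("img_3", "no_polyp", "polyp", 0.98),
        Annotation("img_4", "polyp", "no_polyp", 0.55),
    ]
    for case in egregious_cases(batch):
        print(case.sample_id, case.human_label, case.model_label)
```

Everything below the threshold is left alone, so reviewers see only the handful
of labels most likely to be wrong instead of every data point.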



The technology is being used by specialist computer vision companies including
Teton.AI and SurgEase, as well as healthcare institutions King’s College London
and St Thomas’ Hospital. Landau said he sees ML use cases across a variety of
areas, from satellite imaging to radiology.

“It doesn’t require humans to review every single data point. It’s a general
approach to the problem but it’s extremely important,” he said. “As far as we
know, it is the first-of-its-kind automated label quality assessment tool for
computer vision.”


A DATA-CENTRIC APPROACH

Founded in 2020, Encord is backed by CRV, Y Combinator, WndrCo and Crane
Venture Partners, and in May 2022 it was named to CB Insights’ AI 100 list of
the most innovative artificial intelligence startups.



Still, several much larger companies are tackling the same data labeling
issues, including Scale AI and Snorkel. But Encord is clearly on the upswing:
Its tools have been used by King’s College London, Memorial Sloan Kettering
Cancer Center and Stanford Medical Center to help process 3x more images and
reduce experiment duration by 80%, Landau said.

The company has helped hospitals annotate pre-cancerous polyp videos, increasing
efficiency by an average of 6.4 times. It has automated 97% of labels to help
clinicians become 16x more efficient at labeling medical images. It has even
loftier plans to accelerate medical research by 100x, according to Landau.

He emphasized the importance of relying on the data, rather than the model – a
growing concept in AI use cases. The longstanding practice has been the
“model-centric” approach, but focusing on data is “more acute.”

“What you’re feeding the model is the most important thing,” he said. “The
quality of the model is the quality of the data.”



That’s because if you just think about models and how to fix them, you lose
sight of the issue you’re trying to solve, he pointed out. When data is
improperly annotated, models learn the wrong thing and individuals can get hurt.
Examples include a polyp overlooked in a gastroenterology video, a model that
can’t identify when a patient in an elderly care home has fallen, and the
numerous issues with autonomous vehicles.

Particularly in building medical diagnostic AI systems, scientists require
training data from all types of demographics – ages, nationalities,
characteristics. “You can only do that with data-centric models,” Landau said.
“If you just think about the model, not the data, you will lose that.”


© 2023 VentureBeat. All rights reserved.