GETTING PEOPLE TALKING: MICROSOFT IMPROVES AI QUALITY AND EFFICIENCY OF
TRANSLATOR USING NVIDIA TRITON

Microsoft aims to be the first to put a class of powerful AI transformer models
into production using Azure with NVIDIA GPUs and Triton inference software.
March 22, 2022 by Shankar Chandrasekaran


When your software can evoke tears of joy, you spread the cheer.

So, Translator, a Microsoft Azure Cognitive Service, is applying some of the
world’s largest AI models to help more people communicate.

“There are so many cool stories,” said Vishal Chowdhary, development manager for
Translator.

Like the five-day sprint to add Haitian Creole to power apps that helped aid
workers after Haiti suffered a 7.0 earthquake in 2010. Or the grandparents who
choked up in their first session using the software to speak live with remote
grandkids who spoke a language they did not understand.


AN AMBITIOUS GOAL

“Our vision is to eliminate barriers in all languages and modalities with this
same API that’s already being used by thousands of developers,” said Chowdhary.

With some 7,000 languages spoken worldwide, it’s an ambitious goal.

So, the team turned to a powerful, and complex, tool — a mixture of experts
(MoE) AI approach.

It’s a state-of-the-art member of the class of transformer models driving rapid
advances in natural language processing. And with 5 billion parameters, it’s 80x
larger than the biggest model the team has in production for natural-language
processing.
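The article doesn't detail the model's internals, but the core mixture-of-experts idea — a small gating network that activates only a few experts per input — can be sketched in plain Python. The gate weights, toy experts and top-1 routing below are illustrative, not Microsoft's architecture:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_weights, experts, top_k=1):
    """Mixture-of-experts forward pass for a single input vector.

    The gate scores every expert, but only the top_k experts actually
    run. This sparsity is what lets MoE models grow to billions of
    parameters without per-input compute growing at the same rate.
    """
    # Gate: one dot-product score per expert.
    scores = [sum(w * xi for w, xi in zip(ws, x)) for ws in gate_weights]
    probs = softmax(scores)
    # Route to the top_k highest-probability experts only.
    chosen = sorted(range(len(probs)), key=lambda i: -probs[i])[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)  # only the selected experts do any work
        out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
    return out, chosen

# Three toy "experts" that just scale their input by different factors.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
output, chosen = moe_forward([2.0, 0.1], gate_weights, experts)
```

Only the chosen expert's parameters touch the input, which is why a 5-billion-parameter MoE model need not cost 80x the compute of a 62-million-parameter dense one per request.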

MoE models are so compute-intensive, it’s hard to find anyone who’s put them
into production. In an initial test, CPU-based servers couldn’t meet the team’s
requirement to use them to translate a document in one second.


A 27X SPEEDUP

Then the team ran the test on accelerated systems with NVIDIA Triton Inference
Server, part of the NVIDIA AI Enterprise 2.0 platform announced this week at
GTC.

“Using NVIDIA GPUs and Triton we could do it, and do it efficiently,” said
Chowdhary.

In fact, the team was able to achieve up to a 27x speedup over non-optimized GPU
runtimes.

“We were able to build one model to perform different language understanding
tasks — like summarizing, text generation and translation — instead of having to
develop separate models for each task,” said Hany Hassan Awadalla, a principal
researcher at Microsoft who supervised the tests.


HOW TRITON HELPED

Microsoft’s models break down a big job like translating a stack of documents
into many small tasks of translating hundreds of sentences. Triton’s dynamic
batching feature pools these many requests to make best use of a GPU’s muscle.
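The pooling described above — many small sentence-level requests drained into one GPU-sized batch — can be sketched roughly as follows. The queue, names and batch size are illustrative; Triton's real dynamic batcher also waits a configurable time window for more requests to arrive:

```python
from collections import deque

def drain_batch(queue, max_batch_size):
    """Pull up to max_batch_size pending requests into one batch.

    Triton's dynamic batcher does this server-side, so the GPU executes
    one large batched inference instead of many small, inefficient ones.
    """
    batch = []
    while queue and len(batch) < max_batch_size:
        batch.append(queue.popleft())
    return batch

# A document arrives as ten sentence-level requests; the batcher
# groups them into GPU-friendly batches of at most four.
pending = deque(f"sentence-{i}" for i in range(10))
batches = []
while pending:
    batches.append(drain_batch(pending, max_batch_size=4))
```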

The team praised Triton’s ability to run any model in any mode using CPUs, GPUs
or other accelerators.
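In Triton, that choice of processor is a configuration detail rather than a code change. A hypothetical `config.pbtxt` for a translation model might look like this — the model name, backend and numbers are illustrative, not Microsoft's actual settings:

```
name: "translator_moe"
backend: "fastertransformer"
max_batch_size: 64

# Pool individually arriving requests, waiting briefly for more
# so the GPU runs fewer, larger batches.
dynamic_batching {
  preferred_batch_size: [ 16, 32 ]
  max_queue_delay_microseconds: 100
}

# Swap KIND_GPU for KIND_CPU to serve the same model on CPUs instead.
instance_group [
  { count: 1, kind: KIND_GPU }
]
```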

“It seems very well thought out with all the features I wanted for my scenario,
like something I would have developed for myself,” said Chowdhary, whose team
has been developing large-scale distributed systems for more than a decade.

Under the hood, two software components were key to Triton’s success. NVIDIA
extended FasterTransformer — a software layer that handles inference
computations — to support MoE models. CUTLASS, an NVIDIA math library, helped
implement the models efficiently.


PROVEN PROTOTYPE IN FOUR WEEKS

Though the tests were complex, the team worked with NVIDIA engineers to get an
end-to-end prototype with Triton up and running in less than a month.

“That’s a really impressive timeline to make a shippable product — I really
appreciate that,” said Awadalla.

And though it was the team’s first experience with Triton, “we used it to ship
the MoE models by rearchitecting our runtime environment without a lot of
effort, and now I hope it becomes part of our long-term host system,” Chowdhary
added.


TAKING THE NEXT STEPS

The accelerated service will arrive in judicious steps, initially for document
translation in a few major languages.

“Eventually, we want our customers to get the goodness of these new models
transparently in all our scenarios,” said Chowdhary.

The work is part of a broad initiative at Microsoft. It aims to fuel advances
across a wide sweep of its products such as Office and Teams, as well as those
of its developers and customers from small one-app companies to Fortune 500
enterprises.

Paving the way, Awadalla’s team published research in September on training MoE
models with up to 200 billion parameters on NVIDIA A100 Tensor Core GPUs. Since
then, the team has accelerated that work another 8x by using 80GB versions of the
A100 GPUs on models with more than 300 billion parameters.

“The models will need to get larger and larger to better represent more
languages, especially for ones where we don’t have a lot of data,” Awadalla
said.

Categories: Deep Learning
Tags: Artificial Intelligence | Customer Stories | GTC 2022 | Inference




Copyright © 2022 NVIDIA Corporation