www.modular.com
52.17.119.105 (Public Scan)

Submitted URL: http://www.modular.com/
Effective URL: https://www.modular.com/
Submission: On April 02 via api from US — Scanned from DE

Form analysis: 1 form found in the DOM

Name: email-form (method: GET)

<form id="email-form" name="email-form" data-name="Email Form" method="get" class="perf_opt-inner" data-wf-page-id="65ca59ca70c7535b94cecd16" data-wf-element-id="67b62da9-adf4-db36-d0ac-d94cd878dd29" aria-label="Email Form">
  <div class="margin-bottom margin-medium">
    <div class="w-layout-vflex perf_opt-models">
      <div class="margin-bottom margin-xsmall">
        <div class="text-size-metadata">POPULAR ModelS</div>
      </div>
      <div class="perf_opt-models-mask w-dyn-list">
        <div role="list" class="perf_opt-models-inner w-dyn-items">
          <div role="listitem" class="w-dyn-item"><label data-type="AMD c6a.16xlarge" data-values-1="" data-values-2="2.2" class="perf_opt-item w-radio is-active"><input type="radio" name="models" id="radio-10" data-name="models"
                class="w-form-formradioinput hide w-radio-input" value="Radio"><span class="w-form-label" for="radio-10">Llama2 7b</span></label></div>
          <div role="listitem" class="w-dyn-item"><label data-type="AMD c6a.16xlarge" data-values-1="" data-values-2="3.0" class="perf_opt-item w-radio"><input type="radio" name="models" id="radio-10" data-name="models"
                class="w-form-formradioinput hide w-radio-input" value="Radio"><span class="w-form-label" for="radio-10">Mistral 7b</span></label></div>
          <div role="listitem" class="w-dyn-item"><label data-type="AMD c6a.16xlarge" data-values-1="" data-values-2="3.3" class="perf_opt-item w-radio"><input type="radio" name="models" id="radio-10" data-name="models"
                class="w-form-formradioinput hide w-radio-input" value="Radio"><span class="w-form-label" for="radio-10">StarCoder 7b</span></label></div>
          <div role="listitem" class="w-dyn-item"><label data-type="Intel c6i.4xlarge" data-values-1="" data-values-2="2.0" class="perf_opt-item w-radio"><input type="radio" name="models" id="radio-10" data-name="models"
                class="w-form-formradioinput hide w-radio-input" value="Radio"><span class="w-form-label" for="radio-10">WavLM Large</span></label></div>
          <div role="listitem" class="w-dyn-item"><label data-type="AWS c6g.8xlarge" data-values-1="1.8" data-values-2="4.1" class="perf_opt-item w-radio"><input type="radio" name="models" id="radio-10" data-name="models"
                class="w-form-formradioinput hide w-radio-input" value="Radio"><span class="w-form-label" for="radio-10">Stable Diffusion UNet</span></label></div>
          <div role="listitem" class="w-dyn-item"><label data-type="AWS c7g.4xlarge" data-values-1="2.6" data-values-2="3.5" class="perf_opt-item w-radio"><input type="radio" name="models" id="radio-10" data-name="models"
                class="w-form-formradioinput hide w-radio-input" value="Radio"><span class="w-form-label" for="radio-10">RoBERTa Base Seqlen 128</span></label></div>
          <div role="listitem" class="w-dyn-item"><label data-type="Intel c5.4xlarge" data-values-1="2.0" data-values-2="1.5" class="perf_opt-item w-radio"><input type="radio" name="models" id="radio-10" data-name="models"
                class="w-form-formradioinput hide w-radio-input" value="Radio"><span class="w-form-label" for="radio-10">CLIP-ViT Large Patch14</span></label></div>
          <div role="listitem" class="w-dyn-item"><label data-type="AMD c5a.8xlarge" data-values-1="4.4" data-values-2="1.5" class="perf_opt-item w-radio"><input type="radio" name="models" id="radio-10" data-name="models"
                class="w-form-formradioinput hide w-radio-input" value="Radio"><span class="w-form-label" for="radio-10">DLRM RMC2</span></label></div>
          <div role="listitem" class="w-dyn-item"><label data-type="Intel c6i.4xlarge" data-values-1="3.1" data-values-2="1.4" class="perf_opt-item w-radio"><input type="radio" name="models" id="radio-10" data-name="models"
                class="w-form-formradioinput hide w-radio-input" value="Radio"><span class="w-form-label" for="radio-10">GPT-2 Small Seqlen 128</span></label></div>
          <div role="listitem" class="w-dyn-item"><label data-type="AMD c5a.4xlarge" data-values-1="2.1" data-values-2="1.4" class="perf_opt-item w-radio"><input type="radio" name="models" id="radio-10" data-name="models"
                class="w-form-formradioinput hide w-radio-input" value="Radio"><span class="w-form-label" for="radio-10">BERT Large Uncased Seqlen 256</span></label></div>
        </div>
      </div>
    </div>
  </div>
</form>

Text Content

FIRST STEP IN Mojo🔥 OPEN SOURCE 🚀
INTRODUCING MAX & ALL THE MODCON '23 ANNOUNCEMENTS
READ BLOG

MAX PLATFORM ACCELERATES THE PACE OF AI.

IT'S PROGRAMMABLE


We rebuilt the modern AI software stack from the ground up to boost any AI
pipeline on any hardware.


Get started

Learn More

TRUSTED BY ORGANIZATIONS


01
BENEFITS


PROGRAMMABLE, PERFORMANT & PORTABLE


FULL PROGRAMMABILITY

MAX is built on top of Mojo from the ground up to empower AI engineers to unlock
the full potential of AI hardware by combining the usability of Python, the
safety of Rust, and the performance of C.


UNPARALLELED PERFORMANCE

MAX unlocks state-of-the-art performance for your AI models. Its
next-generation compiler lets you extend and optimize your AI pipelines
without rewriting them.


SEAMLESS PORTABILITY

Seamlessly move your models and AI pipelines to any hardware target,
maximizing your performance-to-cost ratio and avoiding vendor lock-in.


02
PERFORMANCE


UNPARALLELED LATENCY & COST SAVINGS

MAX unlocks state-of-the-art latency and throughput for your AI pipelines,
including generative models, helping you move them into production quickly
and realize massive cost savings on your cloud bill.

POPULAR MODELS
Llama2 7b
Mistral 7b
StarCoder 7b
WavLM Large
Stable Diffusion UNet
RoBERTa Base Seqlen 128
CLIP-ViT Large Patch14
DLRM RMC2
GPT-2 Small Seqlen 128
BERT Large Uncased Seqlen 256
Modular is 1.7x faster than TensorFlow when running Stable Diffusion UNet on
CPU.

Modular is 2.2x faster than PyTorch when running Llama2 7b on AMD c6a.16xlarge.



Do these numbers seem too good to be true? View in more detail, then sign up to
compare locally.

Explore our performance

03
PRODUCT


AN INTEGRATED AI 
DEVELOPER EXPERIENCE

The Modular Accelerated Xecution (MAX) platform is a unified set of tools and
libraries that provides everything you need to deploy low-latency,
high-throughput, real-time AI inference pipelines into production.


MAX COMPONENTS


 * MOJO
   
   A programming language that combines the usability of Python with the
   performance of C, unlocking unparalleled programmability of AI hardware and
   extensibility of AI models for all AI engineers.
   
   Learn about Mojo
   
   * Mojo Docs
     
     
   
   * Mojo Community
     
     
   


 * MAX ENGINE
   
   A model inference runtime and API library that executes all your AI pipelines
   on any hardware with unparalleled performance and cost savings.
   
   Learn about MAX Engine
   
   * MAX Engine Docs
     
     
   
   * MAX Engine Github Repo
     
     
   


 * MAX SERVING
   
   A model serving library for the MAX Engine that provides full
   interoperability with existing serving systems (e.g., Triton) and
   seamlessly deploys within existing container infrastructure (e.g.,
   Kubernetes).
   
   Learn about MAX Serving
   
   * MAX Serving Docs
     
     
   
   * Get Started
     
     
   


04
USE CASES


INCREDIBLY EASY TO GET STARTED

from max import engine

# Load your model
session = engine.InferenceSession()
model = session.load(MODEL_PATH)

# Prepare the inputs, then run an inference
outputs = model.execute(**inputs)
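
To make "prepare the inputs" concrete, here is a minimal sketch, assuming a
hypothetical ONNX model file and hypothetical input names ("input_ids",
"attention_mask"); substitute whatever names your own model declares.

import numpy as np
from max import engine

session = engine.InferenceSession()
model = session.load("bert-base.onnx")  # hypothetical model file

# Inputs are passed as arrays keyed by the model's input names
# (the names below are hypothetical).
inputs = {
    "input_ids": np.zeros((1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
}

outputs = model.execute(**inputs)
# Assuming a dict-like result keyed by output name:
print(outputs.keys())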


from max.graph import Dim, Module, MOTensor

@value
struct LLM:
    var params: ModelParams

    fn build(inout self, inout m: Module):
        var g = m.graph(
            "llm",
            TypeTuple(MOTensor(DType.float32, Dim.dynamic(), Dim.dynamic()))
        )
        ...
        g.output(reshape(next_token, self.batch))


from max.engine import InferenceSession

# Load the three Stable Diffusion sub-models into one session.
var sess = InferenceSession()
var txt_enc = sess.load_model('txt-encoder')
var img_dec = sess.load_model('img-decoder')
var img_dif = sess.load_model('img-diffuser')

var latent = ...
for step in range(n_steps):
    var prev = latent
    # Reassign (not redeclare) so the result survives the loop.
    latent = execute(img_dif, latent)
    var pred = ...
    latent = ...

# Decode the final latent into pixels and build an image.
var decoded = execute(img_dec, latent)
var pixels = decoded.to_numpy()
var img = Image.fromarray(pixels, 'RGB')






QUICK PERFORMANCE WINS

Use our Python or C API to replace your current TensorFlow, PyTorch, or ONNX
inference calls with MAX Engine. With 3 lines of code you can execute your AI
models up to 5x faster across a variety of CPU architectures (Intel, AMD, ARM).
Additionally, use MAX Serving as a drop-in replacement for your NVIDIA Triton
Inference Server.
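
As an illustration of that swap, here is a hedged sketch migrating an ONNX
Runtime call to MAX Engine; the model path and the inputs dictionary are
placeholders.

# Before: ONNX Runtime
import onnxruntime as ort

ort_session = ort.InferenceSession("model.onnx")
outputs = ort_session.run(None, inputs)

# After: MAX Engine, per the three-line claim above
from max import engine

session = engine.InferenceSession()
model = session.load("model.onnx")
outputs = model.execute(**inputs)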




EXTEND & OPTIMIZE YOUR MODELS

Once you're using MAX Engine, you can optimize your performance further by using
Mojo to write custom ops or build your whole model in Mojo, using the MAX Graph
API (for inference).




FULL STACK ON MAX

Beyond inference performance in MAX Engine, you can further optimize the rest of
your AI pipeline by migrating your data pre/post-processing code and application
code to Mojo. Over time, we will add more tools and libraries to MAX that
accelerate development for other parts of your AI stack.
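
As a rough sketch of the starting point for such a migration, the pre- and
post-processing below are plain Python/NumPy; these are the stages the
paragraph above suggests moving into Mojo over time. The model file and the
"pixel_values" input name are hypothetical.

import numpy as np
from max import engine

session = engine.InferenceSession()
model = session.load("classifier.onnx")  # hypothetical model

# Pre-processing (a candidate to migrate to Mojo):
image = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in for a real image
x = (image.astype(np.float32) / 255.0 - 0.5) / 0.5  # scale and normalize
x = x[np.newaxis, ...]                              # add a batch dimension

outputs = model.execute(pixel_values=x)  # input name is hypothetical

# Post-processing (another candidate to migrate to Mojo),
# assuming a dict-like result keyed by output name:
logits = next(iter(outputs.values()))
label = int(np.argmax(logits, axis=-1)[0])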


Get started




LATEST ABOUT MODULAR

Developer


THE NEXT BIG STEP IN MOJO🔥 OPEN SOURCE

March 28, 2024

Developer


LEVERAGING MAX ENGINE'S DYNAMIC SHAPE CAPABILITIES

March 28, 2024

Product


MAX 24.2 IS HERE! WHAT’S NEW?

March 28, 2024

Developer


DEPLOYING MAX ON AMAZON SAGEMAKER

March 27, 2024

Developer


SEMANTIC SEARCH WITH MAX ENGINE

March 21, 2024

Engineering


HOW TO BE CONFIDENT IN YOUR PERFORMANCE BENCHMARKING

March 19, 2024




WHY MODULAR?


01


BUILT BY THE WORLD’S AI EXPERTS

Our team has built most of the world’s existing AI infrastructure, including
TensorFlow, PyTorch, ONNX, and XLA, and we’ve built and scaled dev tools like
Swift, LLVM, and MLIR. Now we’re focused on rebuilding AI infrastructure for the
world.


02


REINVENTED FROM THE GROUND UP

To unlock the next wave of AI innovation, we started with a "first
principles" approach to building the lowest layers of the AI stack, because
we can't keep piling layers of complexity on top of already over-complicated
existing solutions.


03


INFRASTRUCTURE THAT JUST WORKS

We build technology that meets you where you are. We don’t require you to
rewrite your models, workflows, or application code, grapple with confusing
converters, or be a hardware expert to take advantage of bleeding-edge
technology.






TRY MAX RIGHT NOW

Up and running, for free, in 5 minutes.

Get started

Book a demo


Copyright © 2024 Modular Inc. Terms, Privacy & Acceptable Use.
