AVERAGEMIND

2021-06-27
distributed system


HPC VS CLOUD COMPUTING

This article compares several differences between HPC (high-performance computing) and cloud computing. It is the script for an online video.

You may have heard about cloud computing and HPC here and there.

Are you curious about the differences between these two techniques?

After several years of groundwork in both areas, the answer to this question has become clearer to me.

(In this video) We try to explain the differences between cloud computing and HPC from several aspects: the machines and their maintainers, the users and their associated programs, how to access each platform, and the typical software stacks.

Both areas are still evolving quickly; we just try to share some critical ideas and essential information.

Generally speaking, what cloud computing and HPC have in common is that you run your program on multiple machines, often hundreds or thousands of computers. Obviously, one person or a small institution does not have the funds or the capability to build a cluster with that many machines, so these clusters are usually maintained by specialized institutions.

Let's first look at the differences between the institutions that maintain the machines, or computing nodes, used for cloud computing and HPC.


MACHINES AND THEIR MAINTAINER

Cloud computing:

Cloud computing services are mainly provided by commercial companies.

The figure on the left shows the main cloud computing service providers; you probably know these companies very well: Amazon, Microsoft, Google, etc.

https://cloud.google.com/gartner-cloud-infrastructure-as-a-service

HPC:

In contrast, the figure on the right shows the main providers of HPC systems in the USA. We mainly list the national laboratories under the DOE (Department of Energy). These institutions are well known to anyone with a scientific-computing background. Such government-funded institutions are the main force behind building HPC systems and have played an important role in history; for example, Oak Ridge and Los Alamos National Laboratory were two of the sites involved in the Manhattan Project.

The difference in maintainers influences the properties of the platforms and the functions they serve. Let's look at the details of the machines and the services they provide.

Cloud computing:

Providers usually do not publish details such as the performance metrics of their clusters; instead, they care more about the types of services they offer.

Machines with different configurations are created for users on demand, based on virtualization techniques. The underlying physical machine pool is also heterogeneous, unlike an HPC system, which is generally built from homogeneous machines.

The figure on the left shows the featured products of the Google Cloud service. In addition to virtual machines, they also provide versatile data-management and data-processing capabilities.

HPC:

There is a ranking called the TOP500 list, which compares the performance of different HPC systems according to standard benchmark programs. It is like the Olympic Games of the HPC world, and the ranking changes significantly every year.

As shown in the figure on the right, it lists the current ranking.

https://www.top500.org/lists/top500/2020/11/

Currently, the number-one system on the list is the Fugaku supercomputer, maintained by RIKEN (Japan's largest comprehensive research institution, renowned for high-quality research in a diverse range of scientific disciplines).

The second is the Summit system, maintained by Oak Ridge National Laboratory.

In fact, most of these systems are maintained by national institutions.

Rmax and Rpeak are two important metrics.

A system's Rmax score describes its maximal achieved performance (how many floating-point operations per second it executes on the benchmark); its Rpeak score describes its theoretical peak performance (https://kb.iu.edu/d/bbzo).
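Rpeak can be estimated directly from the hardware specification: every core is assumed to retire its maximum number of floating-point operations on every cycle. A minimal sketch; the node count, core count, clock, and FLOPs-per-cycle figures below are illustrative, not any real machine's spec sheet:

```python
def rpeak_flops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak: every core retires its maximum FLOPs every cycle."""
    return nodes * cores_per_node * clock_ghz * 1e9 * flops_per_cycle

# Illustrative numbers only (a mid-size cluster with wide vector units):
peak = rpeak_flops(nodes=1000, cores_per_node=48, clock_ghz=2.0, flops_per_cycle=16)
print(f"{peak / 1e15:.2f} PFlop/s")  # 1.54 PFlop/s
```

Rmax, by contrast, is measured by actually running the benchmark, so it is always below this theoretical ceiling.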

This figure shows the national labs in the USA:

https://www.energy.gov/science/science-innovation/office-science-national-laboratories

Another interesting topic is the users of cloud computing and HPC.


SERVE TO WHOM (USERS)

Cloud computing:

Cloud computing mainly serves IT companies. A startup may not want to spend money buying its own machines; it can simply rent computing nodes from a cloud provider at low cost and in a flexible way. The cloud provider has a professional team to maintain these services. Anyone who pays can rent or buy machines from a cloud provider.

https://aws.amazon.com/what-is-aws/

The figure on the left lists some key information about the customers they serve. Companies such as retailers and financial firms may need to rent many machines to maintain their user information or to provide essential online services, such as login systems, websites, or the electronic transaction systems that support their business.

HPC:

HPC mainly serves domain scientists, who use it to solve numerical problems such as scientific simulations. They build models and run them on HPC systems at large scale with parallel computing; it is common to use thousands of machines and cores to run a single simulation. Scientists or research teams at universities collaborate with the research institutions that own the HPC systems; if their research goals overlap, the maintainer will allocate a specific amount of core time to a project. In any case, the goal of HPC is research, so these are basically non-profit services.

The scientific projects that use Summit:

https://www.olcf.ornl.gov/leadership-science/

The figure on the right shows some key projects running on the Summit supercomputer; the main areas are biology, physics, fusion, nuclear science, earth science, etc. The styles of these projects are quite different from those of cloud computing, illustrated by the figure on the left.

In summary, the core value of cloud computing is to serve the customer's business, and different software is built to serve this goal. For example, you need a database to store and index user information, smart analytics to process the data as needed, and a backend with associated security services to support the website. The main value of HPC is to serve domain science; typical projects include scientific simulations, data visualization and analytics, and the associated I/O or high-speed networking services that transfer data between the different stages of a workflow.


HOW TO ACCESS IT FROM USER’S PERSPECTIVE

Cloud computing:

Cloud computing services are commonly divided into several layers.

The first layer is called IaaS, infrastructure as a service: the cloud provider assigns a virtualized computing node to the user. When you rent nodes at the IaaS level, those nodes belong entirely to you, though they may be virtualized. You have root permission on your computing nodes and can configure (and debug) them as needed. The cloud provider only supplies the computing, networking, and storage resources you request; you pay more if you want a more powerful node.

The second layer is called PaaS, platform as a service. In this layer, the user just provides a configuration file describing how to run their executables, and the platform takes care of resource scheduling and the high availability of the program.

The next layer is called SaaS, software as a service. In this case, the user just calls an API to interact with a service provided by the cloud provider, such as a storage service or a data-monitoring service.

HPC:

Common HPC contains login node and compute node, if you can get an account on a
particular HPC, you can login to the login node. Every user have a same view
when they access the HPC, theritically speaking, they can use all noeds on this
machine (there are some constrains on the total number of the core time, and
different partition may have different running time limitations). Anyway, it is
just like a membership, when they give you a account to access this machine, you
could share high quality service they provided. large disk space, high speed
node etc. You need to submit jobs to the scheduler queue, the scheduler is
responsible for assigning the node to every user. They do not use the
virtulization techiques, since the computing power is abundant and luxury for
most of the users. You do not have the root permission, if you need to install
particular softwares that requries the root privilage, you may send a ticket to
maintainer for help or consulting any issues regarding the usage. You will be
charged (core time) only when your job is scheduled to run.
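In practice, "submitting a job to the scheduler queue" usually means writing a batch script for a scheduler such as Slurm. A minimal sketch that composes such a script from Python; the job name, account, and command are made up for illustration:

```python
def slurm_script(job_name, nodes, walltime, command, account="PROJ123"):
    """Compose a minimal Slurm batch script; the scheduler reads the
    #SBATCH directives to decide when and where the job runs."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={walltime}",    # wall-clock limit, HH:MM:SS
        f"#SBATCH --account={account}",  # core time is charged to this project
        command,
    ])

# A hypothetical 4-node simulation run:
print(slurm_script("my-sim", nodes=4, walltime="01:30:00",
                   command="srun ./simulation"))
```

The generated file would be handed to `sbatch`, after which the job waits in the queue until the scheduler finds free nodes for it.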

For example, if you look at a Summit node, it is really lavish and powerful:
https://docs.olcf.ornl.gov/systems/summit_user_guide.html#summit-user-guide

More parameters are listed on this page:
https://www.olcf.ornl.gov/summit/

Core time is tied to your project. When the PI starts a project, the core-time allocation is usually fixed, and when the project finishes, you cannot access the HPC resource anymore. That is why researchers constantly need to write proposals and apply for new projects; only in this way can they get computing resources and funding from the government.
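The accounting behind a core-time allocation is simple arithmetic: a job's charge is nodes × cores per node × wall-clock hours. A small sketch; the allocation size and job shape are made-up numbers:

```python
def core_hours(nodes, cores_per_node, hours):
    """Core-time charge for one job: every allocated core counts,
    for the whole wall-clock duration of the run."""
    return nodes * cores_per_node * hours

allocation = 1_000_000  # hypothetical project allocation, in core-hours
job = core_hours(nodes=100, cores_per_node=32, hours=6)  # 19200 core-hours
print(f"one run costs {job} core-hours; "
      f"{allocation // job} such runs fit in the allocation")
```

This is why large runs are planned carefully: a handful of full-scale simulations can consume a whole year's allocation.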

Unlike the 24/7 high availability of cloud computing services, most HPC systems undergo periodic maintenance, and you cannot access the system while it is offline. I think that is definitely a good reason to take a rest sometimes.


SOFTWARE STACKS

Let's quickly go through some typical software used in these two areas.

Cloud computing:

IaaS: OpenStack, KVM, virtualization
PaaS: Kubernetes, containers, Docker
SaaS: more versatile, customized cloud services, such as Lambda-style function services

Cloud platforms also have more flexible charging and scheduling policies; for instance, you can get more resources on demand when you need them. This is hard to implement on HPC. HPC scheduling is plan-driven: you plan everything when you submit a job, and you have to live with the limitations of the queue scheduler, such as long waiting times, since the HPC resources are shared by many people at the same time.
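The on-demand model can be illustrated with simple billing arithmetic: you scale capacity up for a spike and pay only for the hours used. A sketch; the hourly rate is made up and is not any provider's actual price:

```python
def on_demand_cost(vcpus, hours, usd_per_vcpu_hour=0.05):
    """On-demand billing: pay only for the capacity actually rented."""
    return vcpus * hours * usd_per_vcpu_hour

# Scale from 8 to 64 vCPUs for a 3-hour traffic spike, then back down:
baseline = on_demand_cost(8, 24)      # a full day at baseline capacity
spike    = on_demand_cost(64 - 8, 3)  # the extra capacity, 3 hours only
print(f"day's bill: ${baseline + spike:.2f}")
```

On a plan-driven HPC allocation there is no equivalent of this mid-run scale-up; the job keeps the node count it requested at submission time.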

Cloud computing may also involve all kinds of database software, big-data services such as Spark, and backend services that expose RPC calls. Java, Python, and Go are popular languages in this area.

HPC:

Slurm: the job scheduler; all users log in through the login nodes and share the same pool of compute nodes
module: to configure the environment
Spack: to install packages

Simulations may use MPI and RDMA heavily; C/C++ and Fortran are the mainstream languages in this area.
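The core idea behind MPI-style parallelism is splitting a global domain evenly among ranks, with each rank working on its own block. A small pure-Python sketch of the block decomposition an MPI code would typically perform (a real code would do this in C or Fortran and then exchange halo data between neighboring ranks):

```python
def block_partition(n, nranks):
    """Split indices 0..n-1 into nranks contiguous blocks, the way an MPI
    code assigns a slice of the global grid to each rank. The first
    (n % nranks) ranks each get one extra element."""
    base, extra = divmod(n, nranks)
    blocks, start = [], 0
    for rank in range(nranks):
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

# 10 grid points over 4 ranks -> block sizes 3, 3, 2, 2
for rank, idx in enumerate(block_partition(10, 4)):
    print(f"rank {rank}: points {list(idx)}")
```

Each rank then runs the same program on its own block, which is why a single simulation can spread across thousands of cores.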

In summary, cloud computing services and HPC share the same genes but have different souls: both are built on CPUs and GPUs, but they serve different users and projects.


OTHER REQUIREMENTS

HPC:

Powerful computing: a user may access thousands of cores in a short period of time.

The latest HPC compute nodes come with GPUs on every node, which would be a luxury for a commercial company.

The system may be updated once or twice a month, which is acceptable for scientific projects, since they are mostly offline computation.

Cloud computing:

Basically, providers need to offer 24/7 availability; high availability is an important issue for a cloud provider, since even a few minutes offline can cause huge losses for customers. There is a tradeoff here: if you work on an HPC system, you may never need to be on call or update the system at midnight, but if you work in cloud computing, you may need to be on call around the clock to handle unexpected issues.


NEW TREND

The boundary is becoming less clear.

Scientists may rent commercial nodes for their long-running services to meet high-availability requirements. Commercial companies may also develop specialized HPC clusters for commercial use, and the major cloud providers now offer HPC clusters to their users.

In summary, they are like parallel or distributed computing in different worlds.


OTHER REFERENCES

EC2 instance types
https://aws.amazon.com/ec2/physicalcores/

Google solutions
https://cloud.google.com/compute#section-9

Google products
https://cloud.google.com/products


© 2014 - 2023 zhe | Theme: Cafe
Contact: godenwangzhe@gmail.com