blog.paramg.com Open in urlscan Pro
162.159.153.4 Public Scan

Back to summary

Submitted URL:
https://blog.paramg.com/building-distributed-high-scale-computing-applications-using-orleans-81cf96be03e9
Effective URL:
https://blog.paramg.com/building-distributed-high-scale-computing-applications-using-orleans-81cf96be03e9?gi=13135ad81051
Submission: On December 23 via manual (December 23rd 2022, 2:02:33 pm UTC) from PL — Scanned from DE

Form analysis
0 forms found in the DOM

Text Content

Open in app

Write

Paramesh Gunasekaran
Follow

May 9, 2021

·
12 min read
·

Listen

Save

BUILDING DISTRIBUTED HIGH-SCALE COMPUTING APPLICATIONS USING ORLEANS

This article not only attempts to explain the complexities in developing
distributed application using traditional approach but also provides Actor Model
as an alternate approach, introducing a Microsoft implementation known as
Orleans and backs with a working prototype using Orleans framework. The full
source code of this working prototype is available at
go.paramg.com/orleans/prototype with the presentation slides at
go.paramg.com/orleans/presentation.

INTRODUCTION

In today’s world, systems and applications are expected to serve with very
high-throughput, respond in real-time, always available 24/7 without any
downtime or reduced capacity, automatically scale up and down depending upon
real-time customer volume. These attributes are no longer reserved for only
high-end enterprise-level or research-oriented systems but also for every simple
customer-facing day-to-day regular application. Indeed each one of these
attributes pose an independent and complex problem landscape that has been under
academic research for several years with multiple possible solutions depending
on the use case.

> High-throughput, real-time response, always-available and auto-scaling are the
> normal

Unfortunately, every application developer is forced to solve and implement
solutions for these problems along with their actual business solutions. This
results in the developer implementing these solutions in varying degrees of
completeness and repeating it over for every application or business domain as
there is no standardized way of solving these problems and reusing for
developers. Orleans is an attempt from Microsoft to solve these problems using a
well-known pattern called Actor-Model and abstract away the complexity of
solving these core problems from developers and making it available for reuse
for any application or business domains.

TERMINOLOGIES

Before we dwell into Orleans, the differences in its approach from traditional
one, how it works, its benefits and limitations, where and when to use and when
not to use it, we should take a look at a few terminologies to set a common
vocabulary and concepts around the same.

1. Clustering is a technique of linking many computing machines together to act
like a single system
2. Fault Tolerance is how a system handles and recovers from failures, fault
occultation tolerance before service degradation
3. Consistency is ensuring transaction can only bring database from one valid
state to another maintaining database invariant (ACID)
4. Eventual Consistency informally guarantees if no new updates are made to a
given data item, eventually all accesses to that item will return the last
updated value (BASE)
5. Availability is extent to which system is operational, functional and usable
fulfilling business requirements
6. Partition Tolerance is extent to which a cluster continues to function even
if there is a “partition” (communication break) between two nodes (both
nodes are up, but can’t communicate)

TRADITIONAL APPROACH

Systems are traditionally designed using n-tier architectures, say for example,
a front-end application running in the customer premise such as a web browser,
desktop application, or firmware of a hardware device communicates with the
server using any well-known protocol such as HTTP, AQMP or native TCP which
processes requests and responds to the caller or kickoff any next set of
actions. The application server may choose to persist its state in a simple
filesystem or a relational database system. Usually, when a computation
bottleneck occurs in receiving requests, it is customary to add a stateless
service in between the client and the application server and scale out the
stateless service in order to achieve high-throughput by increasing the number
of requests the system can handle. Unfortunately, there is a tipping point when
the bottleneck shifts from the number of client requests that the system can
handle to the number of transactions that the database can handle rendering a
contention on the database.

Traditional N-Tier Architecture

There are ways to solve this bottleneck which typically includes splitting up of
the application server into smaller and more manageable chunks. Patterns such as
micro-services or service-oriented architecture might help in this landscape.
But this poses other problems such as achieving the right granularity of
services, designing the right level of abstraction of these service interfaces,
how these services communicate with each other and even then this approach only
simply pushes the bottleneck issue to another day when a database bottleneck can
occur in one of those services. Patterns such as CQRS (Command Query
Responsibility Segregation) and NoSQL databases might help with some of its
features like sharding or replication but comes with a tradeoff of eventual
consistency which may not be tolerable for all business cases.

ACTOR MODEL

In order to achieve concurrency in the system which in turn achieves higher
throughput, the fundamental idea here is move away from modelling the system
using a n-tier architecture and using a threading model in the application
server, than to use the actor model to describe the application logic as actors
in the system. The actor model is a conceptual model describing the approach for
implementing concurrent computations. An actor encapsulates the unit of
computation along with its own isolated state required for performing the
computation. The actor forms the core building block for performing an operation
in a concurrent manner. The actor model observes the following core principles.

1. An actor is the unit of concurrent computation
2. An actor can spawn other actors in the system
3. An actor interacts with other actors in the system by sending messages
4. An actor stores incoming messages in a mailbox for processing preserving its
order
5. An actor has its own isolated state immutable by other actors or any
external system
6. An actor changes its own internal state by processing incoming messages
7. An actor processes incoming messages in a sequential order

Orleans Grain Communication

I can’t recommend this video enough, a chat with Carl Hewitt, the author of
Actor Model himself.

Chat with Carl Hewitt, author of Actor Model

ORLEANS FRAMEWORK

Orleans is a framework and a Microsoft implementation of Actor Model.
Technically, it is termed as Virtual Actor Model since the framework describes
the actors as virtual. The actors are virtual because they are merely volatile
.NET objects resident in memory. The framework abstracts how actors are created,
managed and communicate with each other thus providing a much cleaner interface
with developers for creating and working with them. Orleans refers to actors as
grains which form the basic building block of an Orleans-based application. An
Orleans grain consists of its identity (represented in C# as GUID or long),
behavior (class methods) and its state (class properties). Orleans framework
provides a grain runtime called silo which is responsible for hosting and
managing grains as well as its communication between them.

Orleans Grain Composition

Orleans allows persistence of grain state using one of its built-in storage
providers such as SQL Server or Azure Table Storage, or by allowing to write a
custom storage provider.

OBSERVING CONSTRAINTS

Orleans archives concurrency safely without using a threading model by observing
the following Actor Model constraints. These constraints are well baked into
Orleans framework so that developers have very minimal chance of breaking them.

1. The execution of code in a grain is single-threaded and asynchronous
2. A grain cannot access other grain properties or state directly
3. Communication between grains is by using messages that are passed between
them
4. Shared state between grains is not allowed
5. Messages are queued for a grain if more than one message arrives for a grain

By observing these constraints, Orleans facilitates developers to build
applications without concurrency or threading (thread-locking) issues such as
race conditions or deadlocks as the execution is always single-threaded.

WHEN TO USE ORLEANS?

Orleans like any other software framework has its sweet spot in terms of when
and where it can be used to reap its maximum benefits. The following are some
good-fit use-case for using the Orleans framework.

1. Significant number (hundreds, millions, billions, and even trillions) of
loosely coupled entities. To put the number in perspective, Orleans can
easily create a grain for every person on Earth in a small cluster, so long
as a subset of that total number is active at any point in time. For
example, user profiles, purchase orders, application/game sessions, stocks.
2. Entities are small enough to be single-threaded. For example, determine if
stock should be purchased based on current price.
3. Workload is interactive. For example, request-response, start or monitor or
complete.
4. More than one server is expected or may be required. Orleans runs on a
cluster which is expanded by adding servers to expand the cluster.
5. Global coordination is not needed or on a smaller scale between a few
entities at a time. Scalability and performance of execution is achieved by
parallelizing and distributing a large number of mostly independent tasks
with no single point of synchronization.

WHEN NOT TO USE ORLEANS?

And like any other software framework, its usage is not omnipotent and can be a
nightmare if used in totally wrong situations. The following are some bad-fit
use-case where Orleans framework should not be considered for use.

1. Memory must be shared between entities because in Orleans each grain
maintains its own states and cannot not be shared.
2. A small number of large entities that may be multi-threaded. A micro-service
may be a better option when supporting complex logic in a single service.
3. Global coordination and/or consistency is needed. Such global coordination
would severely limit performance of an Orleans-based application. Orleans
was built to easily scale to a global scale without the need of in-depth
manual coordination.
4. Operations that run for a long time such as batch jobs, Single Instruction
Multiple Data (SIMD) tasks which depends on the need of the application and
may be a fit for Orleans

PROTOTYPE

The prototype attempts to model a fictional math device that generates random
numbers between a specified minimum and maximum integer range every specified
time interval for the server to compute various mathematical functions such as
prime, factorial, mean, variance and standard deviation. The context diagram
below shows the overall architecture and how each project of the prototype fits
into the landscape of the system.

Prototype Architecture

The full source code for this working prototype can be found at GitHub from the
bookmark go.paramg.com/orleans/prototype

The prototype consists of the following components

1. Device, modeled as a single contained .NET Framework v4.8 WinForms
application that communication with the REST API endpoint sending the
generated random numbers
2. REST API, modeled as a .NET Core WebApi that receives random numbers and the
mathematical function to use, forwards it to the Orleans server to be
processed, receives the response and responds back to the device
3. Orleans Server, modeled as a .NET Framework v4.8 WinForms application that
hosts grains, per function and per device, and computes the required
function to output result
4. Database, a SQL Server database that is used by Orleans framework to store
grain state

DEVICE

The device modeled as .NET framework based WinForms application, allows multiple
instances to be started simulating multiple devices. The device expects the REST
API is present at the localhost and port 5000 and always talks to 127.0.0.1. The
application displays its device identifier, a GUID that is auto-generated every
time a new instance is started and allows to specify the lower and upper limits
of random numbers to be generated, the time interval between each round of
random number, and also the type of function to be used for processing the
generated number on the server.

Device Simulator Process

API

REST API Swagger Document

The REST API, modeled as .NET Core WebApi provides a self-describing Swagger
endpoint and also the following HTTP endpoints.

> GET /devices/{id}

This endpoint returns the status of detail including the status (Not Started or
Running) of the device specified

> GET /devices/{id}/start

This endpoint is called by the device when it is started and provides an
indication to the API so that the API marks the device status to Running.

> GET /devices/{id}/stop

This endpoint is called by the device when it is stopped and provides an
indication to the API so that the API marks the device status to Not Started.

> POST /devices

This endpoint is called periodically by the device to provide the random number
generated and the associated function to be used for processing by the server.
This endpoint returns the processed result of the server so that it can be
displayed on the device.

> DELETE /devices/{id}

This endpoint is called by the device when it is shutdown (process exited) so
that the API removes its references held.

SERVER

The Orleans server modeled as .NET framework based WinForms application allows
to start multiple instances listening on different TCP ports with out-of-the-box
clustering enabled. This facilitates to demonstrate how multiple instances of
Orleans server clusters itself and also the fault-tolerance of the cluster, that
is how an unavailability of one of the servers does not after the entire cluster
and the requests can be still processed by others.

Orleans Server Process

DATABASE

The prototype uses SQL Storage Provider to store Grains state, cluster
membership and reminders in SQL Server database. Scripts to provision the
necessary database with its schema is also provided. When a server gets
provisioned and starts, Orleans adds a record in the membership tables with its
availability status. It is this table that an Orleans client uses to discover
instances of server and its availability status.

Orleans Database Schema

PRODUCTION ARCHITECTURE

The following diagram describes how the prototype can be used as a starting
point and a model application to develop a real-world distributed application
that can be deployed at cloud-scale. To illustrate a production architecture of
a real-world application, we assume a requirement of an alarm management system
to receive temperature readings from millions of field IoT devices and to
trigger an alarm if the average temperature of the device reading is greater
than a predefined threshold over a period of time. The architecture assumes that
the system shall be deployed to AWS Cloud and scaled across the globe using
different data centers hosted on various AWS regions.

Typical Production Architecture with Orleans Server

The requests from a field device are load-balanced at the DNS which resolves to
the nearest data center hosting the alarm management system. The system provides
an API endpoint which may be developed using any general purpose language such
as Java, C#, Python or NodeJS for accepting the temperature reading HTTP
requests, packaged as an application image using any containerization
technologies such as Docker which are hosted and scaled using an container
orchestrator such as AWS Fargate. The choice of AWS Fargate may be due to the
fact that the API endpoint is a simple stateless application and does not
require any complex requirements or interactions with the rest of the system,
and can be scaled virtually infinitely with minimal maintenance overhead. The
API endpoint then transforms and forwards the request to the application server
running as Orleans Grains which are hosted by Orleans Silos that are packaged
and deployed as Kubernetes pods or services. The choice of a AWS managed
Kubernetes (Elastic Kubernetes Service — EKS) cluster may be due to the fact
that the application server (modeled as Orleans Grains) may require complex
scaling and upgrade requirements. The application server may trigger an alarm by
communicating to an external system based on the incoming temperature reading
and number of previous readings. Further, the application server grains may
choose to persist its current state whenever its state changes, to an multi-AZ
cloud-scale DynamoDB database in order to survive any restarts of the
application server (Kubernetes pods or service) or deactivation of the grains by
Orleans Silo. Thus, this architecture facilitates scaling at every tier of the
system, out-of-box clustering of the application server nodes and retention of
application grains in memory in order to provide high-throughput and
availability.

WHAT’S NEXT?

This article introduces the problems in developing distributed applications with
tradeoffs required when following a traditional approach, and introduces how
Orleans might be able to help simplify and abstract the complexity and letting
the developer to focus on the business logic. Though this article provides a
starting point with respect to concepts and a working prototype, there are few
more advanced features and concepts available in Orleans as Microsoft continues
to develop the framework actively. I might choose to update the prototype and
this article with an illustration of one or more advanced features as my time
permits and continue to explore more. Some of the advanced features of Orleans
include:

1. Grain Optimization
2. Stateless Worker
3. Re-entrant
4. Immutable Messages
5. Timers and Reminders
6. Observers
7. Interceptors
8. Transactions
9. Grain Services
10. Streaming

REFERENCES

The following are a few noteworthy references, articles and whitepapers on
Orleans, Actor Model or distributed systems in general.

ORLEANS HOMEPAGE

https://dotnet.github.io/orleans/index.html

MICROSOFT VIRTUAL ACTOR MODEL PUBLICATION

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/Orleans-MSR-TR-2014-41.pdf

CONCURRENT PROGRAMMING FOR SCALABLE WEB ARCHITECTURE — BENJAMIN ERB

https://berb.github.io/diploma-thesis/original/054_actors.html

Distributed Computing
Microsoft Orleans
Actor Model
C Sharp Programming
Prototype

blog.paramg.com Open in urlscan Pro 162.159.153.4 Public Scan

Form analysis 0 forms found in the DOM

Text Content

blog.paramg.com Open in urlscan Pro
162.159.153.4 Public Scan

Form analysis
0 forms found in the DOM