blog.paramg.com
Open in
urlscan Pro
162.159.153.4
Public Scan
Submitted URL: https://blog.paramg.com/building-distributed-high-scale-computing-applications-using-orleans-81cf96be03e9
Effective URL: https://blog.paramg.com/building-distributed-high-scale-computing-applications-using-orleans-81cf96be03e9?gi=13135ad81051
Submission: On December 23 via manual from PL — Scanned from DE
Effective URL: https://blog.paramg.com/building-distributed-high-scale-computing-applications-using-orleans-81cf96be03e9?gi=13135ad81051
Submission: On December 23 via manual from PL — Scanned from DE
Form analysis
0 forms found in the DOMText Content
Open in app Sign up Sign In Write Sign up Sign In Paramesh Gunasekaran Follow May 9, 2021 · 12 min read · Listen Save BUILDING DISTRIBUTED HIGH-SCALE COMPUTING APPLICATIONS USING ORLEANS This article not only attempts to explain the complexities in developing distributed application using traditional approach but also provides Actor Model as an alternate approach, introducing a Microsoft implementation known as Orleans and backs with a working prototype using Orleans framework. The full source code of this working prototype is available at go.paramg.com/orleans/prototype with the presentation slides at go.paramg.com/orleans/presentation. INTRODUCTION In today’s world, systems and applications are expected to serve with very high-throughput, respond in real-time, always available 24/7 without any downtime or reduced capacity, automatically scale up and down depending upon real-time customer volume. These attributes are no longer reserved for only high-end enterprise-level or research-oriented systems but also for every simple customer-facing day-to-day regular application. Indeed each one of these attributes pose an independent and complex problem landscape that has been under academic research for several years with multiple possible solutions depending on the use case. > High-throughput, real-time response, always-available and auto-scaling are the > normal Unfortunately, every application developer is forced to solve and implement solutions for these problems along with their actual business solutions. This results in the developer implementing these solutions in varying degrees of completeness and repeating it over for every application or business domain as there is no standardized way of solving these problems and reusing for developers. Orleans is an attempt from Microsoft to solve these problems using a well-known pattern called Actor-Model and abstract away the complexity of solving these core problems from developers and making it available for reuse for any application or business domains. TERMINOLOGIES Before we dwell into Orleans, the differences in its approach from traditional one, how it works, its benefits and limitations, where and when to use and when not to use it, we should take a look at a few terminologies to set a common vocabulary and concepts around the same. 1. Clustering is a technique of linking many computing machines together to act like a single system 2. Fault Tolerance is how a system handles and recovers from failures, fault occultation tolerance before service degradation 3. Consistency is ensuring transaction can only bring database from one valid state to another maintaining database invariant (ACID) 4. Eventual Consistency informally guarantees if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value (BASE) 5. Availability is extent to which system is operational, functional and usable fulfilling business requirements 6. Partition Tolerance is extent to which a cluster continues to function even if there is a “partition” (communication break) between two nodes (both nodes are up, but can’t communicate) TRADITIONAL APPROACH Systems are traditionally designed using n-tier architectures, say for example, a front-end application running in the customer premise such as a web browser, desktop application, or firmware of a hardware device communicates with the server using any well-known protocol such as HTTP, AQMP or native TCP which processes requests and responds to the caller or kickoff any next set of actions. The application server may choose to persist its state in a simple filesystem or a relational database system. Usually, when a computation bottleneck occurs in receiving requests, it is customary to add a stateless service in between the client and the application server and scale out the stateless service in order to achieve high-throughput by increasing the number of requests the system can handle. Unfortunately, there is a tipping point when the bottleneck shifts from the number of client requests that the system can handle to the number of transactions that the database can handle rendering a contention on the database. Traditional N-Tier Architecture There are ways to solve this bottleneck which typically includes splitting up of the application server into smaller and more manageable chunks. Patterns such as micro-services or service-oriented architecture might help in this landscape. But this poses other problems such as achieving the right granularity of services, designing the right level of abstraction of these service interfaces, how these services communicate with each other and even then this approach only simply pushes the bottleneck issue to another day when a database bottleneck can occur in one of those services. Patterns such as CQRS (Command Query Responsibility Segregation) and NoSQL databases might help with some of its features like sharding or replication but comes with a tradeoff of eventual consistency which may not be tolerable for all business cases. ACTOR MODEL In order to achieve concurrency in the system which in turn achieves higher throughput, the fundamental idea here is move away from modelling the system using a n-tier architecture and using a threading model in the application server, than to use the actor model to describe the application logic as actors in the system. The actor model is a conceptual model describing the approach for implementing concurrent computations. An actor encapsulates the unit of computation along with its own isolated state required for performing the computation. The actor forms the core building block for performing an operation in a concurrent manner. The actor model observes the following core principles. 1. An actor is the unit of concurrent computation 2. An actor can spawn other actors in the system 3. An actor interacts with other actors in the system by sending messages 4. An actor stores incoming messages in a mailbox for processing preserving its order 5. An actor has its own isolated state immutable by other actors or any external system 6. An actor changes its own internal state by processing incoming messages 7. An actor processes incoming messages in a sequential order Orleans Grain Communication I can’t recommend this video enough, a chat with Carl Hewitt, the author of Actor Model himself. Chat with Carl Hewitt, author of Actor Model ORLEANS FRAMEWORK Orleans is a framework and a Microsoft implementation of Actor Model. Technically, it is termed as Virtual Actor Model since the framework describes the actors as virtual. The actors are virtual because they are merely volatile .NET objects resident in memory. The framework abstracts how actors are created, managed and communicate with each other thus providing a much cleaner interface with developers for creating and working with them. Orleans refers to actors as grains which form the basic building block of an Orleans-based application. An Orleans grain consists of its identity (represented in C# as GUID or long), behavior (class methods) and its state (class properties). Orleans framework provides a grain runtime called silo which is responsible for hosting and managing grains as well as its communication between them. Orleans Grain Composition Orleans allows persistence of grain state using one of its built-in storage providers such as SQL Server or Azure Table Storage, or by allowing to write a custom storage provider. OBSERVING CONSTRAINTS Orleans archives concurrency safely without using a threading model by observing the following Actor Model constraints. These constraints are well baked into Orleans framework so that developers have very minimal chance of breaking them. 1. The execution of code in a grain is single-threaded and asynchronous 2. A grain cannot access other grain properties or state directly 3. Communication between grains is by using messages that are passed between them 4. Shared state between grains is not allowed 5. Messages are queued for a grain if more than one message arrives for a grain By observing these constraints, Orleans facilitates developers to build applications without concurrency or threading (thread-locking) issues such as race conditions or deadlocks as the execution is always single-threaded. WHEN TO USE ORLEANS? Orleans like any other software framework has its sweet spot in terms of when and where it can be used to reap its maximum benefits. The following are some good-fit use-case for using the Orleans framework. 1. Significant number (hundreds, millions, billions, and even trillions) of loosely coupled entities. To put the number in perspective, Orleans can easily create a grain for every person on Earth in a small cluster, so long as a subset of that total number is active at any point in time. For example, user profiles, purchase orders, application/game sessions, stocks. 2. Entities are small enough to be single-threaded. For example, determine if stock should be purchased based on current price. 3. Workload is interactive. For example, request-response, start or monitor or complete. 4. More than one server is expected or may be required. Orleans runs on a cluster which is expanded by adding servers to expand the cluster. 5. Global coordination is not needed or on a smaller scale between a few entities at a time. Scalability and performance of execution is achieved by parallelizing and distributing a large number of mostly independent tasks with no single point of synchronization. WHEN NOT TO USE ORLEANS? And like any other software framework, its usage is not omnipotent and can be a nightmare if used in totally wrong situations. The following are some bad-fit use-case where Orleans framework should not be considered for use. 1. Memory must be shared between entities because in Orleans each grain maintains its own states and cannot not be shared. 2. A small number of large entities that may be multi-threaded. A micro-service may be a better option when supporting complex logic in a single service. 3. Global coordination and/or consistency is needed. Such global coordination would severely limit performance of an Orleans-based application. Orleans was built to easily scale to a global scale without the need of in-depth manual coordination. 4. Operations that run for a long time such as batch jobs, Single Instruction Multiple Data (SIMD) tasks which depends on the need of the application and may be a fit for Orleans PROTOTYPE The prototype attempts to model a fictional math device that generates random numbers between a specified minimum and maximum integer range every specified time interval for the server to compute various mathematical functions such as prime, factorial, mean, variance and standard deviation. The context diagram below shows the overall architecture and how each project of the prototype fits into the landscape of the system. Prototype Architecture The full source code for this working prototype can be found at GitHub from the bookmark go.paramg.com/orleans/prototype The prototype consists of the following components 1. Device, modeled as a single contained .NET Framework v4.8 WinForms application that communication with the REST API endpoint sending the generated random numbers 2. REST API, modeled as a .NET Core WebApi that receives random numbers and the mathematical function to use, forwards it to the Orleans server to be processed, receives the response and responds back to the device 3. Orleans Server, modeled as a .NET Framework v4.8 WinForms application that hosts grains, per function and per device, and computes the required function to output result 4. Database, a SQL Server database that is used by Orleans framework to store grain state DEVICE The device modeled as .NET framework based WinForms application, allows multiple instances to be started simulating multiple devices. The device expects the REST API is present at the localhost and port 5000 and always talks to 127.0.0.1. The application displays its device identifier, a GUID that is auto-generated every time a new instance is started and allows to specify the lower and upper limits of random numbers to be generated, the time interval between each round of random number, and also the type of function to be used for processing the generated number on the server. Device Simulator Process API REST API Swagger Document The REST API, modeled as .NET Core WebApi provides a self-describing Swagger endpoint and also the following HTTP endpoints. > GET /devices/{id} This endpoint returns the status of detail including the status (Not Started or Running) of the device specified > GET /devices/{id}/start This endpoint is called by the device when it is started and provides an indication to the API so that the API marks the device status to Running. > GET /devices/{id}/stop This endpoint is called by the device when it is stopped and provides an indication to the API so that the API marks the device status to Not Started. > POST /devices This endpoint is called periodically by the device to provide the random number generated and the associated function to be used for processing by the server. This endpoint returns the processed result of the server so that it can be displayed on the device. > DELETE /devices/{id} This endpoint is called by the device when it is shutdown (process exited) so that the API removes its references held. SERVER The Orleans server modeled as .NET framework based WinForms application allows to start multiple instances listening on different TCP ports with out-of-the-box clustering enabled. This facilitates to demonstrate how multiple instances of Orleans server clusters itself and also the fault-tolerance of the cluster, that is how an unavailability of one of the servers does not after the entire cluster and the requests can be still processed by others. Orleans Server Process DATABASE The prototype uses SQL Storage Provider to store Grains state, cluster membership and reminders in SQL Server database. Scripts to provision the necessary database with its schema is also provided. When a server gets provisioned and starts, Orleans adds a record in the membership tables with its availability status. It is this table that an Orleans client uses to discover instances of server and its availability status. Orleans Database Schema PRODUCTION ARCHITECTURE The following diagram describes how the prototype can be used as a starting point and a model application to develop a real-world distributed application that can be deployed at cloud-scale. To illustrate a production architecture of a real-world application, we assume a requirement of an alarm management system to receive temperature readings from millions of field IoT devices and to trigger an alarm if the average temperature of the device reading is greater than a predefined threshold over a period of time. The architecture assumes that the system shall be deployed to AWS Cloud and scaled across the globe using different data centers hosted on various AWS regions. Typical Production Architecture with Orleans Server The requests from a field device are load-balanced at the DNS which resolves to the nearest data center hosting the alarm management system. The system provides an API endpoint which may be developed using any general purpose language such as Java, C#, Python or NodeJS for accepting the temperature reading HTTP requests, packaged as an application image using any containerization technologies such as Docker which are hosted and scaled using an container orchestrator such as AWS Fargate. The choice of AWS Fargate may be due to the fact that the API endpoint is a simple stateless application and does not require any complex requirements or interactions with the rest of the system, and can be scaled virtually infinitely with minimal maintenance overhead. The API endpoint then transforms and forwards the request to the application server running as Orleans Grains which are hosted by Orleans Silos that are packaged and deployed as Kubernetes pods or services. The choice of a AWS managed Kubernetes (Elastic Kubernetes Service — EKS) cluster may be due to the fact that the application server (modeled as Orleans Grains) may require complex scaling and upgrade requirements. The application server may trigger an alarm by communicating to an external system based on the incoming temperature reading and number of previous readings. Further, the application server grains may choose to persist its current state whenever its state changes, to an multi-AZ cloud-scale DynamoDB database in order to survive any restarts of the application server (Kubernetes pods or service) or deactivation of the grains by Orleans Silo. Thus, this architecture facilitates scaling at every tier of the system, out-of-box clustering of the application server nodes and retention of application grains in memory in order to provide high-throughput and availability. WHAT’S NEXT? This article introduces the problems in developing distributed applications with tradeoffs required when following a traditional approach, and introduces how Orleans might be able to help simplify and abstract the complexity and letting the developer to focus on the business logic. Though this article provides a starting point with respect to concepts and a working prototype, there are few more advanced features and concepts available in Orleans as Microsoft continues to develop the framework actively. I might choose to update the prototype and this article with an illustration of one or more advanced features as my time permits and continue to explore more. Some of the advanced features of Orleans include: 1. Grain Optimization 2. Stateless Worker 3. Re-entrant 4. Immutable Messages 5. Timers and Reminders 6. Observers 7. Interceptors 8. Transactions 9. Grain Services 10. Streaming REFERENCES The following are a few noteworthy references, articles and whitepapers on Orleans, Actor Model or distributed systems in general. ORLEANS HOMEPAGE https://dotnet.github.io/orleans/index.html MICROSOFT VIRTUAL ACTOR MODEL PUBLICATION https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/Orleans-MSR-TR-2014-41.pdf CONCURRENT PROGRAMMING FOR SCALABLE WEB ARCHITECTURE — BENJAMIN ERB https://berb.github.io/diploma-thesis/original/054_actors.html Distributed Computing Microsoft Orleans Actor Model C Sharp Programming Prototype 53 53 53 MORE FROM PARAMESH GUNASEKARAN Follow Software Architect, Security Evangelist — Microsoft .NET | Cloud and DevOps Oct 3, 2018 IMPLEMENTING SERVICE MESH USING ISTIO ON KUBERNETES Introduction About few months ago, while I was working on designing infrastructure platform and developing ASP.NET Core based microservices application, I was introduced to the world of service mesh and Istio by a coworker. Since then, I was intrigued by service mesh as an architectural concept and fascinated how Istio enables… Kubernetes 5 min read Kubernetes 5 min read -------------------------------------------------------------------------------- Love podcasts or audiobooks? Learn on the go with our new app. Try Knowable RECOMMENDED FROM MEDIUM Elecard Company ELECARD WEBINAR “QOS QOE MONITORING STRATEGIES FOR SPORTS/ESPORTS STREAMING” MeasureMatch MEASUREMATCH CLIENT ACCOUNT TERMS OF SERVICE. THEY’RE IMPORTANT. HERE’S WHY. (PART I OF III) Meruja Selvamanikkam UPGRADING YOUR ENTERPRISE INTEGRATOR: AN OVERVIEW Hasan Basri Bayat in ITNEXT ALL FLUTTER LOCAL DATABASES IN ONE ARTICLE Tom Herold in scalableminds NEW ANNOTATION AND USABILITY FEATURES IN WEBKNOSSOS Hashir Hassan in codeburst MASTER THE USER AUTHENTICATION IN DJANGO — ALLAUTH Damien Joyce LOOK AT YOUR DATA, DEVOPS IS NOT JUST MEASUREMENT BUT KNOWING WHAT TO MEASURE. Ukanah Dean UNDEFINED REFERENCE TO `MAIN’. C COMPILE ERROR AboutHelpTermsPrivacy -------------------------------------------------------------------------------- GET THE MEDIUM APP PARAMESH GUNASEKARAN 32 Followers Software Architect, Security Evangelist — Microsoft .NET | Cloud and DevOps Follow MORE FROM MEDIUM Anton Stöckl in ITNEXT ESSENTIAL FEATURES OF AN EVENT STORE FOR EVENT SOURCING Sunil Kumar in JavaScript in Plain English MY SALARY INCREASED 13 TIMES IN 5 YEARS — HERE IS HOW I DID IT Dineshchandgr WHY DO WE NEED A DATABASE CONNECTION POOL? -EVERY PROGRAMMER MUST KNOW Ugur Yagmur in CodeX 5 HARSH TRUTHS FROM LINUS TORVALDS Help Status Writers Blog Careers Privacy Terms About Text to speech To make Medium work, we log user data. By using Medium, you agree to our Privacy Policy, including cookie policy.