Effective URL: https://medium.com/homeaway-tech-blog/supporting-multi-region-deployments-in-the-hybrid-cloud-683559fb42c0
SUPPORTING MULTI-REGION DEPLOYMENTS IN THE HYBRID CLOUD

RELEASING MICROSERVICES EFFICIENTLY AND RELIABLY AT SCALE

Jacob Patterson · Published in HomeAway Tech Blog · 8 min read · Nov 9, 2018

Here at HomeAway, we strive to provide a highly available hybrid cloud platform to ease the operations burden for product-focused developers. The platform currently supports three distinct runtime environments (test, stage, production), each containing a set of physically isolated data centers defined as regions. While maintaining an ecosystem with such a large amount of physical and logical isolation enables important features such as high availability deployments and geo-aware routing, it is difficult to interact with such a distributed set of services.

As a quick end-user example, consider how one might handle deploying an active-active set-up across six availability zones within two regions. Each scheduler is logically restricted to its resource pool within the availability zone, and so six calls to the six unique container schedulers are required to deploy the requested application. Doing this manually for every new versioned release of a deployment would quickly lose viability, especially for development teams that release updates multiple times per day. Additionally, having each development team roll their own automation around this kind of management rapidly reduces the stability of the platform altogether.

In order to centralize some of the deployment-level management inherent to a cloud platform, we created the Ministry of Truth.

[Figure: University of London’s Senate House, architectural inspiration for Orwell’s 1984 Ministry of Truth.]

The Ministry of Truth (MoT) consists of a collection of microservices running in every region responsible for distributing and consolidating events pertinent to deployments and container orchestration within a hybrid-cloud platform.
I’ll provide insight into the rules of the game and how we accomplish this task in a production-isolated infrastructure.

PROBLEM SPACE

Given a central datastore, a central API, and several collections of microservices, provide conventions so that messages may be relayed between all three parts in an eventually consistent manner.

This post will go into how MoT forwards user requests to regional agents, sends messages between microservices, and pushes the data to be stored through a persistence flow. We try to avoid implementation specifics and application details, but include some where a comprehensive example needs them.

AGENTS

In general, we strive to build our agents to perform one simple, lightweight action, triggered by an event from a data source and potentially publishing the result to a corresponding data sink.

For example, we gather data from Consul, our platform’s service-discovery component. In order to accomplish a part of the data gathering, the InstanceConsulStateAgent does the following:

1. Read in ConsulAppState from the ConsulStateAgent
2. Convert the ConsulAppState data into a collection of InstanceInfo records
3. Compare each InstanceInfo record with the previous record by key
4. Publish the latest record downstream on creation, update, or deletion by key

[Figure: Partial view of MoT consul flow]

In short, this lets us know if the Consul service data has changed, separating the filter from the downstream persistence components. We have similar flows for some of the other primary platform services, Marathon and Mesos.

The MoT microservices are collections of semantically similar agents bundled into Dropwizard apps and grouped with other utilities or models shared among the agents. Some examples include:

* mot-deployment-agents: responsible for sending deployment requests to the scheduler, grouping data from other sources into deployment state, mending deployments, etc.
* mot-consul-agents: responsible for collecting data from Consul, enabling traffic to route to services in the catalog, etc.

The microservices themselves can be bundled as well. Let’s take a look at a high-level overview of the three major MoT layers.

A MEAL IN 3 COURSES

You can think of MoT as having three primary components with special communication channels between them:

1. The centralized API / service layer that handles all database reads and some writes in addition to any API requests from users.
2. The regional component specific to the datacenter. These are the local microservices that handle implementation details. In the case of MoT, this is primarily multi-region container orchestration and multi-source data aggregation.
3. The persistence layer consisting of archival agents that store any data surfaced through the API endpoints.

[Figure: A three part view of MoT]

Some flows, such as configuration updates, only ever reside in the service layer, as there is no need to interact with the regional services or deployments. Others, such as enabling traffic for an existing deployment, heavily involve all 3 components of MoT.

KAFKA ETIQUETTE

In order to transfer data between the channels, we use Kafka topics as the primary event bus. Let’s take a brief glimpse at the rules of thumb, and then perform a deep dive into the flow of a deployment request to get an example of how it all integrates.

In a multi-region architecture with a centralized API, we must forward information across regions. We opted to incur the cross-region penalty when consuming from topics in a remote cluster. More specifically, we follow the three rules below.

NEVER PRODUCE TO A TOPIC OUTSIDE OF YOUR DATACENTER

When an event has been processed by an agent, we want to push the result record to a topic as quickly as possible, and move on to the next record. This rule keeps us from making unnecessary connections to foreign regions and adding latency between each processed event.
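As a rough illustration of this first rule, here is a minimal sketch of a producer wrapper that refuses writes aimed at any cluster other than its own. It is not MoT code: `LocalOnlyProducer`, `ForeignProduceError`, and the in-memory `log` that stands in for Kafka are all hypothetical names invented for this example, and only the cluster and topic names come from the post.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class ForeignProduceError(RuntimeError):
    """Raised when an agent tries to produce outside its own datacenter."""


@dataclass
class LocalOnlyProducer:
    """Hypothetical wrapper that only accepts writes for the local cluster."""

    local_cluster: str
    # In-memory stand-in for Kafka (an assumption for this sketch):
    # maps (cluster, topic) to the list of records produced there.
    log: Dict[Tuple[str, str], List[str]] = field(default_factory=dict)

    def send(self, cluster: str, topic: str, record: str) -> None:
        # Rule 1: never produce to a topic outside of your datacenter.
        if cluster != self.local_cluster:
            raise ForeignProduceError(
                f"refusing to produce to {cluster!r} from {self.local_cluster!r}"
            )
        self.log.setdefault((cluster, topic), []).append(record)


# An agent running in test-us-east-1 publishes its result locally.
producer = LocalOnlyProducer(local_cluster="test-us-east-1")
producer.send(
    "test-us-east-1",
    "mot-deployment-complete-events",
    "deployment-123 complete",
)
```

Calling `producer.send("production-us-east-1", ...)` from this agent would raise `ForeignProduceError`, so any cross-region hop has to happen on the consuming side instead.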
CONSUME FROM THE LOCAL KAFKA CLUSTER WHENEVER POSSIBLE

There are many topics within the MoT architecture that service communication between microservices or agents in the regional Kafka cluster. If the data stream doesn’t require outside information from other regions, stay local.

PUT AN IDENTIFYING SUFFIX ON TOPICS TO BE CONSUMED FROM A FOREIGN DATACENTER

Some messages must be propagated to the other regions. In this particular instance, we append a suffix of the form -<appEnvironment>-<region>.

TYING IT ALL TOGETHER WITH AN EXAMPLE

Let’s take a look at how creating a new deployment in test-us-east-1 takes place throughout the system.

Requesting a deployment:

1. A user sends a POST to mot.homeawaycorp.com/deployments to deploy an app to the test-us-east-1 region.
2. The MoT service layer validates the structure of the deployment request and produces it to the region-specific topic mot-deployment-launch-events-test-us-east-1 in the production-us-east-1 Kafka cluster.
3. The central→regional mirrormakers consume the region-specific topics from the production-us-east-1 Kafka cluster, then produce the records to the stage-us-east-1 Kafka cluster.
4. The MoT regional layer deployed in test-us-east-1 consumes records from the mot-deployment-launch-events-test-us-east-1 topic in the stage-us-east-1 Kafka cluster, performs some business logic, then produces to the mot-deployment-complete-events topic in the test-us-east-1 Kafka cluster.

As a short aside, let us clarify the reasoning for including the stage-us-east-1 Kafka cluster at all. We elected to mirror topics from the central production cluster to a pseudo-central cluster in the non-production environments. This limits the number of firewall exceptions that violate the production | non-production boundary. Furthermore, using mirrormakers at all violates our first rule of Kafka etiquette, as we must mirror the source record to a foreign region’s cluster.
Breaking this rule from a central cluster to the non-prod central cluster limits the number of foreign regions for which we break Kafka etiquette.

Great! So now we’ve successfully taken a launch request from a user hitting an API in production-us-east-1 and piped it to a launch request against the regional scheduler in test-us-east-1. The record produced to the mot-deployment-complete-events topic marks the end of the deployment request flow as triggered by the user’s request.

However, the story does not stop there. If we were to take a look at the deployment’s dashboard for the app, there would not be much to see. The only data persisted so far is a metadata shell storing some information pulled from the initial request in the DeploymentOperationAgent’s business logic. Let’s take a look at the persistence of instance data and deployment state based on data collected from the regional systems and services.

Persisting deployment state to the datastore:

1. Marathon, Mesos, and Consul are continually polled for data about the state of an instance. For example, this state data might include the runtime host and port of an AppInstance, the most recent HealthCheckResult blob, or the set of Consul tags associated with the service. The MoT regional layer aggregates and transforms these data streams into MoT models meant to be persisted. The records to be persisted are produced to specific topics, for example mot-deployment-state-change in the test-us-east-1 Kafka cluster.
2. Preconfigured regional→central mirrormakers on the same hosts as the central→regional mirrormakers mirror the topics from the regional Kafka clusters to the production-us-east-1 Kafka cluster. The destination topic has a suffix of .<appenv>-<region> and thus is named mot-deployment-state-change.test-us-east-1 in the production-us-east-1 Kafka cluster.
3. The MoT persistence layer has a MultiRegionAgentFactory pattern that spins up an ArchivalAgent for each region in MultiPaaS.
For our example, we will have a DeploymentStateArchivalAgent with a target region of test-us-east-1. The DeploymentStateArchivalAgent consumes from the mot-deployment-state-change.test-us-east-1 topic in the production-us-east-1 Kafka cluster and persists the data to the Cassandra cluster in the production-us-east-1 region.

Awesome! Now if a user were to do a GET on the supported deployment API endpoints, they would expect to see the aggregated result of any instance and deployment state collected and persisted by the above flow.

Now we’ve seen patterns of both dispersing information from our central region to our regional services and collecting it from the regional services back into the central region.

THE DESIGN IS SIMPLER IN PRODUCTION

Now that we’ve looked at how MoT propagates data over the production | non-production boundary, we can take a look at the simpler flow of requests between two production regions.

Production-to-production communication no longer necessitates the use of mirrormakers. Each time a record needs to be pulled between Kafka clusters, the destination region’s consumer can poll the source cluster and localize the data for further processing.

The data persistence flow between production regions is likewise simplified.

CONCLUSIONS

I hope this blog has served you well. In decoupling the persistence, service, and regional layers, we’ve allowed for reduced blast radii in outage scenarios and minimized the amount of data we must send between regions. Best of luck with your journeys in the hybrid cloud!