Effective URL:
https://medium.com/homeaway-tech-blog/supporting-multi-region-deployments-in-the-hybrid-cloud-683559fb42c0
SUPPORTING MULTI-REGION DEPLOYMENTS IN THE HYBRID CLOUD


RELEASING MICROSERVICES EFFICIENTLY AND RELIABLY AT SCALE

Jacob Patterson


Published in HomeAway Tech Blog · 8 min read · Nov 9, 2018


Here at HomeAway, we strive to provide a highly available hybrid cloud platform
to ease the operations burden for product-focused developers. The platform
currently supports three distinct runtime environments (test, stage,
production), each containing a set of physically isolated data centers defined
as regions. While maintaining an ecosystem with such a large amount of physical
and logical isolation enables important features such as high availability
deployments and geo-aware routing, it is difficult to interact with such a
distributed set of services.

As a quick end-user example, consider how one might handle deploying an
active-active set-up across six availability zones within two regions. Each
scheduler is logically restricted to its resource pool within the availability
zone, and so six calls to the six unique container schedulers are required to
deploy the requested application. Doing this manually for every new versioned
release of a deployment quickly becomes unworkable, especially for development
teams that release updates multiple times per day. Additionally, having each
development team roll its own automation around this kind of management
rapidly erodes the stability of the platform as a whole.
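
To make the fan-out concrete, here is a minimal sketch of what "six calls to six schedulers" looks like when scripted. The region and zone names, the app name, and the fake_submit function are hypothetical stand-ins; no real scheduler API is called.

```python
from itertools import product

# Hypothetical topology: two regions with three availability zones each.
REGIONS = ["us-east-1", "us-west-2"]
ZONES = ["a", "b", "c"]


def scheduler_targets(regions, zones):
    """Return one (region, zone) pair per container scheduler."""
    return list(product(regions, zones))


def deploy_everywhere(app, version, targets, submit):
    """Issue one scheduler call per target and collect the results."""
    return [submit(region, zone, app, version) for region, zone in targets]


def fake_submit(region, zone, app, version):
    # Stand-in for a real scheduler call; it only describes the request.
    return f"{app}:{version} -> scheduler[{region}{zone}]"


targets = scheduler_targets(REGIONS, ZONES)
calls = deploy_everywhere("my-app", "1.4.2", targets, fake_submit)
for call in calls:
    print(call)
```

Every release repeats all six calls, which is the kind of repetition a central service can absorb on behalf of each team.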

In order to centralize some of the deployment-level management inherent to a
cloud platform, we created the Ministry of Truth.


University of London’s Senate House, architectural inspiration for Orwell’s 1984
Ministry of Truth.

The Ministry of Truth (MoT) consists of a collection of microservices running in
every region responsible for distributing and consolidating events pertinent to
deployments and container orchestration within a hybrid-cloud platform. I’ll
provide insight into the rules of the game and how we accomplish this task in a
production-isolated infrastructure.


PROBLEM SPACE

Given a central datastore, a central API, and several collections of
microservices, provide conventions so that messages may be relayed between all
three parts in an eventually consistent manner. This post will go into how MoT
forwards user requests to regional agents, sends messages between microservices,
and pushes the data to be stored through a persistence flow. We try to avoid
implementation specifics and application details, but include them where a
comprehensive example requires it.


AGENTS

In general, we strive to build our agents to perform one simple, lightweight
action, triggered by an event from a data source and potentially publishing the
result to a corresponding data sink. For example, we gather data from Consul,
our platform’s service-discovery component. In order to accomplish a part of the
data gathering, the InstanceConsulStateAgent does the following:

 1. Read in ConsulAppState from the ConsulStateAgent
 2. Convert the ConsulAppState data into a collection of InstanceInfo records
 3. Compare each InstanceInfo record with the previous record by key
 4. Publish the latest record downstream on creation, update, or deletion by key
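
The four steps above can be sketched as a small compare-by-key filter. This is an illustration only, not the agent's actual code: the InstanceInfo fields, the shape of the input dict, and the choice to publish None for a deleted key are all assumptions.

```python
from typing import Dict, Iterator, NamedTuple, Optional, Tuple


class InstanceInfo(NamedTuple):
    # Hypothetical fields; the post does not give the real record shape.
    key: str
    host: str
    port: int


def convert(consul_app_state: dict) -> Dict[str, InstanceInfo]:
    """Step 2: turn a ConsulAppState-like dict into InstanceInfo records by key."""
    return {
        item["id"]: InstanceInfo(item["id"], item["host"], item["port"])
        for item in consul_app_state["instances"]
    }


def diff_by_key(
    previous: Dict[str, InstanceInfo], latest: Dict[str, InstanceInfo]
) -> Iterator[Tuple[str, str, Optional[InstanceInfo]]]:
    """Steps 3 and 4: yield (event, key, record) for every key that changed."""
    for key, record in latest.items():
        if key not in previous:
            yield ("created", key, record)
        elif previous[key] != record:
            yield ("updated", key, record)
    for key in sorted(previous.keys() - latest.keys()):
        yield ("deleted", key, None)  # assumption: deletions carry no record


previous = convert({"instances": [
    {"id": "web-1", "host": "10.0.0.1", "port": 8080},
    {"id": "web-2", "host": "10.0.0.2", "port": 8080},
]})
latest = convert({"instances": [
    {"id": "web-1", "host": "10.0.0.1", "port": 9090},
    {"id": "web-3", "host": "10.0.0.3", "port": 8080},
]})
events = list(diff_by_key(previous, latest))
for event in events:
    print(event)
```

Unchanged keys produce no event, so downstream consumers only see real changes.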


Partial view of MoT Consul flow

In short, this lets us know if the Consul service data has changed, separating
the filter from the downstream persistence components. We have similar flows for
some of the other primary platform services, Marathon and Mesos. The MoT
microservices are collections of semantically similar agents bundled into
Dropwizard apps and grouped with other utilities or models shared among the
agents. Some examples include:

 * mot-deployment-agents: responsible for sending deployment requests to the
   scheduler, grouping data from other sources into deployment state, mending
   deployments, etc.
 * mot-consul-agents: responsible for collecting data from Consul, enabling
   traffic to route to services in the catalog, etc.

The microservices themselves can be bundled as well. Let’s take a look at a
high-level overview of the three major MoT layers.


A MEAL IN 3 COURSES

You can think of MoT as having three primary components with special
communication channels between them:

 1. The centralized API / service layer that handles all database reads and some
    writes in addition to any API requests from users.
 2. The regional component specific to the datacenter. These are the local
    microservices that handle implementation details. In the case of MoT, this
    is primarily multi-region container orchestration and multi-source data
    aggregation.
 3. The persistence layer consisting of archival agents that store any data
    served by the API endpoints.


A three part view of MoT

Some flows, such as configuration updates, only ever reside in the service layer,
as there is no need to interact with the regional services or deployments.
Others, such as enabling traffic for an existing deployment, heavily involve all
three components of MoT.


KAFKA ETIQUETTE

To transfer data between these components, we use Kafka topics as the
primary event bus. Let’s take a brief look at the rules of thumb, and then
perform a deep dive into the flow of a deployment request to get an example of
how it all integrates.

In a multi-region architecture with a centralized API, we must forward
information across regions. We opted to incur the cross region penalty when
consuming from topics in a remote cluster. More specifically, we follow the
three rules below.


NEVER PRODUCE TO A TOPIC OUTSIDE OF YOUR DATACENTER

When an event has been processed by an agent, we want to push the result record
to a topic as quickly as possible, and move on to the next record. This rule
keeps us from making unnecessary connections to foreign regions and adding
latency between each processed event.


CONSUME FROM THE LOCAL KAFKA CLUSTER WHENEVER POSSIBLE

There are many topics within the MoT architecture that service communication
between microservices or agents in the regional Kafka cluster. If the data
stream doesn’t require outside information from other regions, stay local.
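
The first two rules can be expressed as simple guards around cluster selection. The sketch below is an assumption-laden illustration: the Agent class, check_produce, and plan_consume are hypothetical names, and cluster identity is reduced to a datacenter string rather than a real Kafka client configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    datacenter: str  # e.g. "test-us-east-1"


def check_produce(agent: Agent, target_cluster: str) -> str:
    """Rule 1: refuse to produce to any cluster outside the agent's datacenter."""
    if target_cluster != agent.datacenter:
        raise ValueError(
            f"{agent.name} may not produce to {target_cluster}; "
            f"its local cluster is {agent.datacenter}"
        )
    return target_cluster


def plan_consume(agent: Agent, topic_cluster: str) -> dict:
    """Rule 2: report whether a read stays local or pays the cross-region cost."""
    return {
        "cluster": topic_cluster,
        "cross_region": topic_cluster != agent.datacenter,
    }


agent = Agent("mot-consul-agents", "test-us-east-1")
print(check_produce(agent, "test-us-east-1"))
print(plan_consume(agent, "test-us-east-1"))
print(plan_consume(agent, "stage-us-east-1"))
```

The asymmetry is deliberate: writes are always local, so any cross-region latency lands on the consumer that chose to read remotely.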


PUT AN IDENTIFYING SUFFIX ON TOPICS TO BE CONSUMED FROM A FOREIGN DATACENTER

Some messages must be propagated to the other regions. For these topics, we
append a suffix of the form:

-<appEnvironment>-<region>
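
As a small illustration of the naming convention, the helpers below build and parse such topic names. Both function names are hypothetical, and parsing assumes the environment is one of the three runtime environments named earlier (test, stage, production).

```python
ENVIRONMENTS = ("test", "stage", "production")  # runtime environments from the post


def regional_topic(base: str, app_environment: str, region: str) -> str:
    """Append the -<appEnvironment>-<region> suffix to a base topic name."""
    return f"{base}-{app_environment}-{region}"


def split_regional_topic(topic: str, environments=ENVIRONMENTS):
    """Return (base, app_environment, region), or None if no suffix is found."""
    best = None
    for env in environments:
        marker = f"-{env}-"
        index = topic.rfind(marker)
        # Prefer the right-most marker so a base name containing an
        # environment word is not mistaken for the suffix.
        if index > 0 and (best is None or index > best[0]):
            best = (index, env, topic[index + len(marker):])
    if best is None:
        return None
    index, env, region = best
    return topic[:index], env, region


name = regional_topic("mot-deployment-launch-events", "test", "us-east-1")
print(name)
print(split_regional_topic(name))
print(split_regional_topic("mot-deployment-complete-events"))
```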


TYING IT ALL TOGETHER WITH AN EXAMPLE

Let’s take a look at how creating a new deployment in test-us-east-1 takes place
throughout the system.

Requesting a deployment:


 1. A user sends a POST to mot.homeawaycorp.com/deployments to deploy an app to
    the test-us-east-1 region.
 2. The MoT service layer will validate the structure of the deployment request,
    producing it to the region-specific topic
    mot-deployment-launch-events-test-us-east-1 in the production-us-east-1
    Kafka cluster.
 3. The central→regional mirrormakers will consume the region-specific topics
    from the production-us-east-1 Kafka cluster then produce records to the
    stage-us-east-1 Kafka cluster.
 4. The MoT Regional Layer deployed in test-us-east-1 will consume records from
    the mot-deployment-launch-events-test-us-east-1 topic in the stage-us-east-1
    Kafka cluster, perform some business logic, then produce to the
    mot-deployment-complete-events topic in the test-us-east-1 Kafka cluster.
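
The hops in steps 2 through 4 can be traced with an in-memory stand-in for the three Kafka clusters. This is a toy model under stated assumptions: the Cluster class, the mirror function, the request payload, and the "launched" status are hypothetical, and no Kafka client is involved.

```python
from collections import defaultdict


class Cluster:
    """In-memory stand-in for a Kafka cluster: topic name -> list of records."""

    def __init__(self, name):
        self.name = name
        self.topics = defaultdict(list)

    def produce(self, topic, record):
        self.topics[topic].append(record)

    def consume(self, topic):
        return list(self.topics[topic])


def mirror(source, destination, topic):
    """Copy every record of one topic from the source to the destination."""
    for record in source.consume(topic):
        destination.produce(topic, record)


central = Cluster("production-us-east-1")
pseudo_central = Cluster("stage-us-east-1")
regional = Cluster("test-us-east-1")
launch_topic = "mot-deployment-launch-events-test-us-east-1"

# Step 2: the service layer produces the validated request centrally.
central.produce(launch_topic, {"app": "my-app", "region": "test-us-east-1"})

# Step 3: the central-to-regional mirror copies the region-specific topic.
mirror(central, pseudo_central, launch_topic)

# Step 4: the regional layer consumes remotely, then produces locally.
for record in pseudo_central.consume(launch_topic):
    regional.produce("mot-deployment-complete-events", {**record, "status": "launched"})

print(regional.consume("mot-deployment-complete-events"))
```

Note that only the mirror writes across clusters; the regional layer reads from a remote cluster but writes its result to its own.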

As a short aside, let us clarify the reasoning for including the stage-us-east-1
Kafka cluster at all. We elected to mirror topics from the central production
cluster to a pseudo-central cluster in the non-production environments. This
limits the number of firewall exceptions that violate the production |
non-production boundary. Furthermore, using mirrormakers at all violates our
first rule of Kafka etiquette, as we must mirror the source record to a foreign
region’s cluster. Breaking this rule from a central cluster to the non-prod
central cluster limits the number of foreign regions for which we break Kafka
etiquette.

Great! So now we’ve successfully taken a launch request from a user hitting an
API in production-us-east-1 and piped it to a launch request against the
regional scheduler in test-us-east-1. The record produced to the
mot-deployment-complete-events topic marks the end of the deployment request
flow as triggered by the user’s request. However, the story does not stop there.
If we were to take a look at the deployment’s dashboard for the app, there would
not be much to see. The only data persisted so far was a metadata shell storing
some information pulled from the initial request in the
DeploymentOperationAgent’s business logic. Let’s take a look at the persistence
of instance data and deployment state based on data collected from the regional
systems and services.

Persisting deployment state to the datastore:


 1. Marathon, Mesos, and Consul are continually polled for data about the state
    of an instance. For example, this state data might include the runtime host
    and port of an AppInstance, the most recent HealthCheckResult blob or the
    set of Consul tags associated with the service. The MoT regional layer
    aggregates and transforms these data streams into MoT models meant to be
    persisted. The records to be persisted are produced to specific topics, for
    example mot-deployment-state-change in the test-us-east-1 Kafka cluster.
 2. Preconfigured regional→central mirrormakers on the same hosts as the
    central→regional mirrormakers mirror the topics from the regional Kafka
    clusters to the production-us-east-1 Kafka cluster. The destination topic
    has a suffix of the form .<appenv>-<region> and thus is named
    mot-deployment-state-change.test-us-east-1 in the production-us-east-1 Kafka
    cluster.
 3. The MoT persistence layer has a MultiRegionAgentFactory pattern that spins
    up an ArchivalAgent for each region in MultiPaaS. For our example, we will
    have a DeploymentStateArchivalAgent with a target region of test-us-east-1.
    The DeploymentStateArchivalAgent consumes from the
    mot-deployment-state-change.test-us-east-1 topic in the production-us-east-1
    Kafka cluster and persists the data to the Cassandra cluster in the
    production-us-east-1 region.
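
A rough sketch of the factory idea in step 3 follows: one archival agent per region, each bound to that region's mirrored topic. The class and function signatures, the record payload, and the use of a plain list in place of Cassandra are assumptions made for illustration.

```python
def mirrored_topic(topic: str, app_env: str, region: str) -> str:
    """Name of a regional topic after it is mirrored to the central cluster."""
    return f"{topic}.{app_env}-{region}"


class ArchivalAgent:
    """Consumes one region's mirrored topic and persists its records."""

    def __init__(self, base_topic, target_region, datastore):
        app_env, region = target_region.split("-", 1)
        self.topic = mirrored_topic(base_topic, app_env, region)
        self.target_region = target_region
        self.datastore = datastore

    def archive(self, central_topics):
        for record in central_topics.get(self.topic, []):
            self.datastore.append((self.target_region, record))


def multi_region_agent_factory(base_topic, regions, datastore):
    """Spin up one ArchivalAgent per region."""
    return [ArchivalAgent(base_topic, region, datastore) for region in regions]


datastore = []  # stand-in for the Cassandra cluster
agents = multi_region_agent_factory(
    "mot-deployment-state-change", ["test-us-east-1", "stage-us-east-1"], datastore
)
central_topics = {
    "mot-deployment-state-change.test-us-east-1": [{"app": "my-app", "state": "RUNNING"}],
}
for agent in agents:
    agent.archive(central_topics)
print(datastore)
```

Adding a region then means adding one entry to the region list rather than writing a new consumer.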

Awesome! Now if a user were to do a GET on the supported deployment API
endpoints, they would expect to see the aggregated result of any instance and
deployment state collected and persisted by the above flow. Now we’ve seen
patterns of both dispersing information from our central region to our regional
services and collecting it from the regional services back into the central
region.


THE DESIGN IS SIMPLER IN PRODUCTION

Now that we’ve looked at how MoT propagates data over the production |
non-production boundary, we can take a look at the simpler flow of requests
between two production regions.



Production-to-production communication no longer necessitates the use of
mirrormakers. Each time a record needs to be pulled between Kafka clusters, the
destination region’s consumer can poll the source cluster and localize the data
for further processing.
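
One way to summarize the difference between the two cases is a small decision function: mirror only when a record crosses the production | non-production boundary, otherwise let the destination consume from the source cluster directly. The function names, the returned labels, and the production-us-west-2 region are hypothetical.

```python
def environment_of(datacenter: str) -> str:
    """Extract the runtime environment from a name like 'test-us-east-1'."""
    return datacenter.split("-", 1)[0]


def transfer_plan(source: str, destination: str) -> str:
    """Pick how records move from the source cluster to the destination."""
    if source == destination:
        return "local"
    source_is_prod = environment_of(source) == "production"
    destination_is_prod = environment_of(destination) == "production"
    if source_is_prod != destination_is_prod:
        # Crossing the boundary: mirror so firewall exceptions stay few.
        return "mirrormaker"
    # Same side of the boundary: the destination's consumer polls the source.
    return "remote-consume"


print(transfer_plan("production-us-east-1", "production-us-west-2"))
print(transfer_plan("production-us-east-1", "stage-us-east-1"))
print(transfer_plan("stage-us-east-1", "test-us-east-1"))
```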

The data persistence flow between production regions is likewise simplified:




CONCLUSIONS

I hope this blog has served you well. In decoupling the persistence, service,
and regional layers, we’ve allowed for reduced blast radii in outage scenarios
and minimized the amount of data we must send between regions. Best of luck with
your journeys in the hybrid cloud!







