Submitted URL: https://servicemesh.io/
Effective URL: https://buoyant.io/service-mesh-manifesto

THE SERVICE MESH


WHAT EVERY SOFTWARE ENGINEER NEEDS TO KNOW ABOUT THE WORLD’S MOST OVER-HYPED
TECHNOLOGY

William Morgan


INTRODUCTION

If you’re a software engineer working anywhere near backend systems, the term
“service mesh” has probably infiltrated your consciousness some time over the
past few years. Thanks to a strange confluence of events, this phrase has been
rolling around the industry like a giant Katamari ball, glomming on successively
bigger pieces of marketing and hype and showing no signs of stopping any time
soon.

The service mesh was born in the murky, trend-infested waters of the cloud
native ecosystem, which unfortunately means that a huge amount of service mesh
content ranges from “low-calorie fluff” to—to use a technical term—“basically
bullshit”. But there’s some real, concrete, and important value to the service
mesh, if you can cut through all the noise.

In this guide I’m going to attempt just that: to provide an honest, deep,
engineer-focused guide to the service mesh. I’m going to cover not just the what
but also the why and the why now. Finally, I’m going to attempt to describe why
I think this particular technology has attracted such a crazy level of hype,
which is an interesting story in and of itself.


WHO AM I?

Hi there. I’m William Morgan. I am one of the creators of Linkerd, the very
first service mesh project and the project that gave birth to the term service
mesh itself. (Sorry!) I’m also the CEO of Buoyant, a startup that builds cool
service mesh stuff like Linkerd and Buoyant Cloud.

As you might imagine, I am very biased and have some strong opinions on this
topic. That said, I’m going to keep the editorializing to a minimum (except in
one section, “Why do people talk so much about this?”, where I’ll unveil some
opinions) and write this guide to be as objective as possible. When I need
concrete examples I’ll primarily rely on Linkerd, but where I know of
differences in other mesh implementations, I’ll call them out.

Ok. On to the good stuff!


WHAT IS A SERVICE MESH?

For all the hype, the service mesh is architecturally pretty straightforward.
It’s nothing more than a bunch of userspace proxies, stuck “next” to your
services (we’ll talk about what “next” means in a bit), plus a set of management
processes. The proxies are referred to as the service mesh’s data plane, and the
management processes as its control plane. The data plane intercepts calls
between services and “does stuff” with these calls; the control plane
coordinates the behavior of the proxies, and provides an API for you, the
operator, to manipulate and measure the mesh as a whole.



What are these proxies? They’re Layer 7-aware TCP proxies, just like HAProxy and
NGINX. The choice of proxy varies; Linkerd uses a Rust “micro-proxy” simply
called Linkerd2-proxy that we built specifically for the service mesh. Other
meshes use different proxies; Envoy is a common choice. But the choice of proxy
is an implementation detail. (Edit January 2020: see “Why Linkerd Doesn’t Use
Envoy” for more on why Linkerd uses Linkerd2-proxy rather than Envoy.)

What do these proxies do? They proxy calls to and from the services, of course.
(Strictly speaking, they act as both “proxies” and “reverse proxies”, handling
both incoming and outgoing calls.) And they implement a feature set that focuses
on the calls between services. This focus on traffic between services is what
differentiates service mesh proxies from, say, API gateways or ingress proxies,
which focus on calls from the outside world into the cluster as a whole.

So that’s the data plane. The control plane is simpler: it’s a set of components
that provide whatever machinery the data plane needs to act in a coordinated
fashion, including service discovery, TLS certificate issuing, metrics
aggregation, and so on. The data plane calls the control plane to inform its
behavior; the control plane in turn provides an API to allow the user to modify
and inspect the behavior of the data plane as a whole.
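To make the split concrete, here’s a toy model in Python of the relationship
described above. It’s a sketch under loose assumptions, not how Linkerd is
built: `ControlPlane`, `Proxy`, and their methods are hypothetical names, and
the “network call” is just a function passed in. The control plane owns service
discovery and metrics aggregation; each proxy asks it where to send a call and
reports what happened.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ControlPlane:
    """Toy control plane (hypothetical): service discovery plus metrics aggregation."""
    endpoints: dict = field(default_factory=dict)  # service name -> list of addresses
    counts: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0]))  # service -> [ok, failed]

    def register(self, service: str, addresses: list) -> None:
        self.endpoints[service] = list(addresses)

    def resolve(self, service: str) -> list:
        # Service discovery: a proxy asks where instances of `service` live.
        return self.endpoints.get(service, [])

    def report(self, service: str, ok: bool) -> None:
        # Metrics aggregation: proxies push per-call outcomes here.
        self.counts[service][0 if ok else 1] += 1

    def success_rate(self, service: str):
        ok, failed = self.counts[service]
        total = ok + failed
        return ok / total if total else None


@dataclass
class Proxy:
    """Toy data-plane proxy (hypothetical) sitting next to one service instance."""
    control_plane: ControlPlane

    def call(self, service: str, send: Callable[[str], bool]) -> bool:
        # `send` stands in for the real network call; it returns True on success.
        addresses = self.control_plane.resolve(service)
        ok = bool(addresses) and send(addresses[0])
        self.control_plane.report(service, ok)
        return ok


cp = ControlPlane()
cp.register("bar", ["10.0.0.7:8080"])
foo_proxy = Proxy(cp)
foo_proxy.call("bar", lambda addr: True)
foo_proxy.call("bar", lambda addr: False)
print(cp.success_rate("bar"))  # 0.5
```

In a real mesh the proxies wouldn’t consult the control plane on every request;
they’d typically cache discovery results and receive updates, and metrics would
be collected by something like the Prometheus instance mentioned in the Linkerd
description. The per-call lookup just keeps the sketch short.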

Here’s a diagram of Linkerd’s control plane and data plane. You can see that the
control plane has several different components, including a small Prometheus
instance that aggregates metrics data from the proxies, as well as components
such as destination (service discovery), identity (certificate authority), and
public-api (web and CLI endpoints). The data plane, by contrast, is just a
single linkerd-proxy next to an application instance. This is just the logical
diagram; when deployed, you may end up with three replicas of each control plane
component but hundreds or thousands of data plane proxies.

(The blue boxes in this diagram represent Kubernetes pod boundaries. You can see
that the linkerd-proxy containers actually run in the same pod as the
application containers. This pattern is known as a sidecar container.)
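The sidecar pattern is easy to see if you treat a pod spec as plain data. The
Python sketch below adds a proxy container next to the application container
when the pod opts in. The `linkerd.io/inject: enabled` annotation is the real
opt-in Linkerd uses; everything else (the `inject_sidecar` function, the
placeholder image, the bare-bones container entry) is a hypothetical
simplification. Linkerd’s actual injector runs as a Kubernetes admission webhook
and adds considerably more, such as the setup that redirects the pod’s traffic
through the proxy.

```python
import copy

INJECT_ANNOTATION = "linkerd.io/inject"  # real Linkerd opt-in annotation


def inject_sidecar(pod: dict) -> dict:
    """Hypothetical sketch: return a new pod spec with a proxy sidecar if the pod opted in.

    Pods that did not opt in are returned unchanged (same object).
    """
    annotations = pod.get("metadata", {}).get("annotations", {})
    if annotations.get(INJECT_ANNOTATION) != "enabled":
        return pod
    injected = copy.deepcopy(pod)  # never mutate the caller's spec
    containers = injected["spec"]["containers"]
    if any(c["name"] == "linkerd-proxy" for c in containers):
        return injected  # already injected: keep injection idempotent
    containers.append({
        "name": "linkerd-proxy",  # runs in the same pod as the app container
        "image": "example.invalid/proxy:placeholder",  # hypothetical image
    })
    return injected


pod = {
    "metadata": {"name": "foo-7d9c", "annotations": {INJECT_ANNOTATION: "enabled"}},
    "spec": {"containers": [{"name": "foo", "image": "example.invalid/foo:1.0"}]},
}
meshed = inject_sidecar(pod)
print([c["name"] for c in meshed["spec"]["containers"]])  # ['foo', 'linkerd-proxy']
```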



The architecture of the service mesh has a couple of big implications. For one,
since the proxy feature set is designed for service-to-service calls, the service
mesh really only makes sense if your application is built as services. You could
use it with a monolith, but it would be a whole lot of machinery to run a single
proxy, and the feature set wouldn’t be a great fit.

Another consequence is that the service mesh is going to require lots and lots
of proxies. In fact, Linkerd adds one linkerd-proxy per instance of every
service. (Some other mesh implementations add one proxy per node / host / VM.
It’s a lot either way.) This heavy use of proxies itself has several
implications:

 1. Whatever these data plane proxies are, they’d better be fast. You’re adding
    two proxy hops to every call, one on the client side and one on the server
    side.
 2. Also, the proxies need to be small and light. Each one will consume memory
    and CPU, and this consumption will scale linearly with your application.
 3. You’re going to need a system for deploying and updating lots of proxies.
    You don’t want to have to do this by hand.
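These implications are easy to put rough numbers on. Here’s a
back-of-the-envelope calculator in Python; every input (service count, replicas,
nodes, memory per proxy) is a made-up assumption for illustration, not a
measurement of Linkerd or any other mesh.

```python
def mesh_footprint(instances_per_service: dict, nodes: int,
                   proxy_mem_mib: float, per_node: bool = False) -> dict:
    """Rough proxy count and total proxy memory (all inputs are assumptions).

    per_node=False models one sidecar per service instance;
    per_node=True models one proxy per node / host / VM.
    """
    proxies = nodes if per_node else sum(instances_per_service.values())
    return {"proxies": proxies, "memory_mib": proxies * proxy_mem_mib}


def proxy_hops(service_calls: int) -> int:
    # Each service-to-service call crosses a client-side and a server-side proxy.
    return 2 * service_calls


# Hypothetical cluster: 40 services x 5 replicas on 20 nodes, 20 MiB per proxy.
services = {f"svc-{i}": 5 for i in range(40)}
print(mesh_footprint(services, nodes=20, proxy_mem_mib=20))  # {'proxies': 200, 'memory_mib': 4000}
print(proxy_hops(4))  # 8
```

Per-node proxies mean far fewer instances, but each one carries a whole node’s
traffic, so the memory-per-proxy input would differ in that model; the sketch
doesn’t try to capture that.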

But, at least at the 10,000-foot level, that’s really all there is to the service
mesh: you deploy a ton of userspace proxies to “do stuff” to internal,
service-to-service traffic, and you use the control plane to change their
behavior and to query the data they generate.

Now let’s move on to the why.


WHY DOES THE SERVICE MESH MAKE SENSE?

If you’re encountering the idea of service mesh for the first time, you can be
forgiven if your first reaction is mild horror. The design of the service mesh
means that not only does it add latency to your application, it also consumes
resources and introduces a whole bunch of machinery. One minute you’re
installing a service mesh, the next you’re suddenly on the hook for operating
hundreds or thousands of proxies. Why would anyone want to do this?

There are two parts to the answer. The first is that the operational cost of
deploying these proxies can be greatly reduced, thanks to some other changes
that are happening in the ecosystem. Lots more on that later.

The more important answer is that this design is actually a great way to
introduce additional logic into the system. That’s not only because there are a
ton of features you can add right there, but also because you can add them
without changing the ecosystem. In fact, the entire service mesh model is
predicated on this very insight: that, in a multi-service system, regardless of
what individual services actually do, the traffic between them is an ideal
insertion point for functionality.



For example, Linkerd, like most meshes, has a Layer 7 feature set focused
primarily on HTTP calls, including HTTP/2 and gRPC.[1] The feature set is broad,
but can be divided into three classes:

 1. Reliability features. Request retries, timeouts, canaries (traffic
    splitting/shifting), etc.
 2. Observability features. Aggregation of success rates, latencies, and request
    volumes for each service, or individual routes; drawing of service topology
    maps; etc.
 3. Security features. Mutual TLS, access control, etc.
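Canaries via traffic splitting boil down to weighted routing: send most requests
to the stable version and a small, adjustable fraction to the new one. A minimal
Python sketch, assuming hypothetical backend names `bar` and `bar-canary` and a
90/10 split; real meshes take these weights as declarative configuration rather
than code.

```python
import bisect
import itertools
import random


def make_splitter(weights: dict):
    """Build a weighted chooser, e.g. {"bar": 90, "bar-canary": 10} (hypothetical names)."""
    backends = list(weights)
    cumulative = list(itertools.accumulate(weights[b] for b in backends))
    total = cumulative[-1]

    def choose(r: float) -> str:
        # r is a uniform draw in [0, 1); map it onto the cumulative weights.
        return backends[bisect.bisect_right(cumulative, r * total)]

    return choose


choose = make_splitter({"bar": 90, "bar-canary": 10})
rng = random.Random(7)
picks = [choose(rng.random()) for _ in range(10_000)]
print(picks.count("bar-canary") / len(picks))  # roughly 0.1
```

Shifting traffic is then just a matter of moving the weights (90/10, then 50/50,
then 0/100) while watching the canary’s success rate.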



Many of these features operate at the request level (hence the “L7 proxy”). For
example, if service Foo makes an HTTP call to service Bar, the linkerd-proxy on
Foo’s side can load balance that call intelligently across all the instances of
Bar based on the observed latency of each one; it can retry the request if it
fails and if it’s idempotent; it can record the response code and latency; and
so on. Similarly, the linkerd-proxy on Bar’s side can reject the call if it’s
not allowed, or is over the rate limit; it can record latency from its
perspective; and so on.
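Here’s a rough Python sketch of the client-side decisions described above. The
text doesn’t name a balancing algorithm; this one uses power-of-two-choices over
an exponentially weighted moving average (EWMA) of observed latency, a common
latency-aware approach, simplified to a fixed smoothing factor with no time
decay. `EwmaBalancer`, `should_retry`, the endpoint names, and the choice to
retry only on 5xx responses are all assumptions for illustration. The idempotent
method set follows HTTP semantics (safe methods plus PUT and DELETE).

```python
import random

# Idempotent methods per HTTP semantics: the safe methods plus PUT and DELETE.
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"}


class EwmaBalancer:
    """Hypothetical latency-aware balancer: power of two choices over EWMA latency."""

    def __init__(self, endpoints, alpha=0.3, initial_ms=50.0, rng=None):
        self.alpha = alpha
        self.ewma = {e: initial_ms for e in endpoints}
        self.rng = rng or random.Random()

    def observe(self, endpoint, latency_ms):
        # Blend the new latency sample into the running average.
        prev = self.ewma[endpoint]
        self.ewma[endpoint] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def pick(self):
        # Sample two distinct endpoints and send to the faster-looking one.
        if len(self.ewma) == 1:
            return next(iter(self.ewma))
        a, b = self.rng.sample(list(self.ewma), 2)
        return a if self.ewma[a] <= self.ewma[b] else b


def should_retry(method: str, status: int) -> bool:
    # Retry only idempotent requests that failed with a server error (assumed policy).
    return method.upper() in IDEMPOTENT_METHODS and 500 <= status < 600


lb = EwmaBalancer(["bar-1", "bar-2", "bar-3"], rng=random.Random(1))
for _ in range(5):
    lb.observe("bar-1", 12.0)
    lb.observe("bar-2", 15.0)
    lb.observe("bar-3", 250.0)  # a slow instance
print(lb.pick())  # never "bar-3": it loses every pairwise comparison
```

Sampling two endpoints rather than always taking the global minimum keeps every
client from stampeding onto the same “fastest” instance at once.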



The proxies can “do stuff” at the connection level too. For example, Foo’s
linkerd-proxy can initiate a TLS connection and Bar’s linkerd-proxy can
terminate it, and both sides can validate the other’s TLS certificate.[2] This
provides not just encryption between services, but a cryptographically secure
form of service identity—Foo and Bar can “prove” they are who they say they are.
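For a feel of what “both sides validate the certificate” means in code, here’s a
sketch using Python’s standard `ssl` module. The function names and the
certificate/key file paths are hypothetical; in a mesh, the control plane’s
identity component issues these certificates and the proxies handle them, so
application code never sees any of this. The key line is the server setting
`verify_mode = ssl.CERT_REQUIRED`: ordinary TLS only authenticates the server,
and requiring a client certificate is what makes it mutual.

```python
import ssl


def server_mtls_context(ca_file=None, cert_file=None, key_file=None) -> ssl.SSLContext:
    """Server-side (Bar's proxy) context that requires a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject peers without a valid certificate
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)  # this proxy's own identity
    return ctx


def client_mtls_context(ca_file=None, cert_file=None, key_file=None) -> ssl.SSLContext:
    """Client-side (Foo's proxy) context that also presents its own certificate."""
    # SERVER_AUTH defaults already verify the server's chain and hostname.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)
    return ctx


server_ctx = server_mtls_context()  # real use: pass hypothetical CA/cert/key paths
client_ctx = client_mtls_context()
```

The client would then wrap its connection with
`client_ctx.wrap_socket(sock, server_hostname=...)`; the hostname check against
the server’s certificate is what lets Foo confirm it’s really talking to Bar.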

Whether they’re at the request or at the connection level, one important thing
to note is that the features of the service mesh are all operational in nature.
There isn’t anything in Linkerd about transforming the semantics of the request
payload, e.g. adding fields to a JSON blob or transforming a protobuf. This is
an important distinction that we’ll touch on again when we talk about ESBs and
middleware.

So that’s the set of features that the service mesh can provide. But why not
just implement them directly in the application? Why bother with the proxies at
all?


WHY IS THE SERVICE MESH A GOOD IDEA?

While the feature set is interesting, the core value of the service mesh is not
actually in the features. After all, we could implement these features directly
in the applications themselves. (In fact, we’ll see later that this was the
genesis of the service mesh.) If I had to put it into a single sentence, the
value of the service mesh comes down to this: The service mesh gives you
features that are critical for running modern server-side software in a way
that’s uniform across your stack and decoupled from application code.

Let’s take that one bit at a time.

Features that are critical for running modern server-side software. If you are
building a transactional, server-side application that is connected to the
public Internet and takes requests from the outside world and responds to them
within some short timeframe—think web apps, API servers, and the bulk of modern
server-side software—and if you are building this system as a collection of
services which talk to each other in a synchronous fashion, and if you are
continually modifying this software to add more functionality, and if you are
tasked with keeping this system running even while you’re modifying it—then
congratulations, you are building modern server-side software. And all those
glorious features listed above actually turn out to be critical for you. The
application must be reliable; it must be secure; and you must be able to observe
what it’s doing. And that’s exactly what the service mesh helps with.

(Ok, I snuck an opinion in there: that this approach is the modern way to build
server-side software. There are people in the world today who are building
monoliths or “reactive microservices” and other things that don’t fit into the
definition above, who hold a different opinion.)

Uniform across your stack. The features provided by the service mesh aren’t just
critical, they apply to every service in your application, regardless of what
language the service is written in, what framework it uses, who wrote it, how it
was deployed, or any other detail of development or deployment.

Decoupled from application code. Finally, the service mesh doesn’t just provide
features uniformly across your stack, it does so in a way that requires no
application changes. The fundamental ownership of the service mesh
functionality—including the operational ownership of configuration, updates,
operation, maintenance, etc—lies purely at the platform level, independent of
the application. The application can change without the service mesh being
involved, and the service mesh can change without the application being
involved.

In short: not only does the service mesh provide vital features, it does so in a
way that’s global, uniform, and independent of the application. And so while
yes, the features of the service mesh could be implemented in the service code
(even as a library that was linked into every service), this approach would
not provide the decoupling and uniformity that’s at the heart of the service
mesh value prop.

And all you have to do is add a lot of proxies! I promise that we’re going to
talk about the operational cost of adding all these proxies very soon. But
first, we need a pit stop to examine this idea of decoupling from the
perspective of people.


WHO DOES THE SERVICE MESH HELP?

As inconvenient as it may be, it turns out that in order for technology to
actually have an impact, it must be adopted by human beings. So who adopts the
service mesh? Who benefits from it?

If you’re building what I’ve described above as modern server-side software,
you can roughly think of your team as divided into service owners, who are in
the business of building the business logic, and platform owners, who are
building the internal platform on which these services run. In small
organizations, these may be the same people, but as the organization gets larger
these roles typically get more defined and even further subdivided. (There’s a
lot more to be said here about the changing nature of devops, the organizational
impact of microservices, etc. But for now let’s take these descriptions as a
given.)

Seen through this lens, the immediate beneficiaries of the service mesh are the
platform owners. The goal of the platform team, after all, is to build the
internal platform on which the service owners can run their business logic, and
to do so in a way that keeps the service owners as independent as possible from
the gory details of operationalization. The service mesh not only provides
features that are critical for accomplishing this, it does so in a way that
doesn’t, in turn, incur a dependency on service owners.

The service owners also benefit, albeit in a more indirect way. The goal of the
service owner is to be as productive as possible in building the logic of the
business, and the fewer operational mechanics they have to worry about, the
easier that is. Rather than being on the hook for implementing e.g. retry
policies or TLS, they can focus purely on business logic concerns and trust that
the platform will take care of the rest. That’s a big plus for them as well.

The organizational value of the decoupling between platform and service owners
can’t be overstated. In fact, I think it might be the key reason why the service
mesh is valuable.

We learned this lesson when one of our earliest Linkerd adopters told us just
why they were adopting a service mesh: because it allowed them to “not have to
talk to people”. This was a platform team at a large company that was migrating
to Kubernetes. Because their app handled sensitive information, they wanted to
encrypt all communication on the clusters. There were hundreds of services and
hundreds of developer teams, and they were not looking forward to convincing
each dev team to add TLS to their roadmap. By installing Linkerd, they shifted
ownership of the feature out of the hands of developers, for whom it was an
imposition, and into the hands of the platform team, for whom it was a top-level
priority. Linkerd didn’t solve a technical problem for them so much as it solved
an organizational problem.



In short, the service mesh is less a solution to a technical problem than it is
a solution to a socio-technical problem.[3]


DOES THE SERVICE MESH SOLVE ALL MY PROBLEMS?

Yes. Er, no!

If you look at the three classes of features outlined above—reliability,
security, and observability—it should be clear that the service mesh is not a
complete solution for any of these domains. While Linkerd can retry requests
when it knows that they are idempotent, it can’t make decisions about what to
return to the user if a service is entirely down—the application must make these
decisions. While Linkerd can report success rates, etc., it can’t look inside a
service and report internal metrics—the application must have instrumentation.
And while Linkerd can do things like mutual TLS “for free”, there’s a lot more
to a security solution than just that.

The features in those domains that the service mesh does provide are the ones
that are platform features. By this I mean features that are:

 1. Independent of business logic. The way that traffic latency histograms are
    computed for calls between Foo and Bar is totally independent of why Foo is
    calling Bar in the first place.
 2. Difficult to implement correctly. Linkerd’s retries are parameterized with
    sophisticated things like retry budgets because the naive approach to
    retries is a sure path to “retry storms” and other distributed system
    failure modes.
 3. Most effective when implemented uniformly. The mechanics of mutual TLS only
    really make sense when everyone is doing them.
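A retry budget caps retries as a fraction of recent regular traffic, so a
struggling service doesn’t get hit with multiplied load exactly when it can
least afford it. Here’s a minimal Python sketch, assuming a sliding time window,
a retry ratio, and a small per-second floor so low-traffic services can still
retry. `RetryBudget` and its parameter names are hypothetical and this is not
Linkerd’s implementation, though Linkerd’s documented retry budgets take similar
knobs (a ratio, a minimum rate, and a time window).

```python
from collections import deque


class RetryBudget:
    """Hypothetical retry budget over a sliding window of `ttl_sec` seconds."""

    def __init__(self, retry_ratio=0.2, min_retries_per_sec=10, ttl_sec=10.0):
        self.retry_ratio = retry_ratio  # extra retries allowed per regular request
        self.min_retries_per_sec = min_retries_per_sec  # floor for low-traffic services
        self.ttl_sec = ttl_sec  # how long a request or retry counts toward the budget
        self._requests = deque()
        self._retries = deque()

    def _expire(self, now: float) -> None:
        # Drop timestamps that have aged out of the window.
        cutoff = now - self.ttl_sec
        for q in (self._requests, self._retries):
            while q and q[0] <= cutoff:
                q.popleft()

    def record_request(self, now: float) -> None:
        self._expire(now)
        self._requests.append(now)

    def try_retry(self, now: float) -> bool:
        # Time is passed in explicitly so the sketch is deterministic and testable.
        self._expire(now)
        allowance = (self.min_retries_per_sec * self.ttl_sec
                     + self.retry_ratio * len(self._requests))
        if len(self._retries) < allowance:
            self._retries.append(now)
            return True
        return False  # budget exhausted: fail fast instead of piling on


budget = RetryBudget(retry_ratio=0.25, min_retries_per_sec=0, ttl_sec=10.0)
for _ in range(100):
    budget.record_request(now=0.0)
granted = sum(budget.try_retry(now=0.0) for _ in range(50))
print(granted)  # 25
print(budget.try_retry(now=30.0))  # False: the window has emptied
```

Once the window passes with no fresh traffic, the earlier requests stop counting
toward the budget, so retries dry up rather than amplifying an outage.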

Because these features are implemented at the proxy layer, rather than at the
application layer, the service mesh provides them at the platform, not
application, level. It doesn’t matter what language the services are written in,
or what framework they use, or who wrote them, or how they got there. The
proxies function independent of all that, and the ownership of this
functionality—including the operational ownership of configuration, updates,
operation, maintenance, etc—lies purely at the platform level.




EXAMPLE FEATURES OF THE SERVICE MESH

| Domain        | Service mesh                    | Platform (non-service mesh)  | Application                                          |
|---------------|---------------------------------|------------------------------|------------------------------------------------------|
| Observability | Service success rates           | Log aggregation              | Instrumentation of internal feature usage            |
| Reliability   | Request retries                 | Multiple replicas of dataset | Handling of failure when an entire component is down |
| Security      | Mutual TLS between all services | Encryption of data at rest   | Ensuring users only have access to their own data    |

To summarize: the service mesh is not a complete solution to reliability, or to
observability, or to security. The broader ownership of those domains
necessarily involves service owners, ops and SRE teams, and other parts of the
organization. The service mesh can only provide a platform-layer “slice” of each
domain.


WHY DOES THE SERVICE MESH MAKE SENSE NOW?

At this point you may be saying to yourself: ok, if this service mesh thing is
so great, why weren’t we rolling out millions of proxies in our stack ten years ago?

There’s a shallow answer to this, which is that ten years ago everyone was
building monoliths, and so no one needed a service mesh. Which is true, but I
think misses the point. Even ten years ago, the concept of “microservices” as a
feasible way of building high-scale systems was widely discussed, and was
publicly being put into practice at companies like Twitter, Facebook, Google,
and Netflix. The general sentiment, at least in the parts of the industry I was
exposed to, was that microservices were the “right way” to build high-scale
systems, even if gosh they were really painful to do.



Of course, while there were companies operating microservices ten years ago,
they were by and large not installing proxies everywhere to form a service mesh.
If you looked closely, though, they were doing something related: many of these
organizations mandated the use of a specific internal library for network
communication (sometimes called a “fat client” library). Netflix had Hystrix,
Google had the Stubby libraries, and Twitter had Finagle. Finagle, for example,
was mandatory for every new service at Twitter, handled both client and server
sides of the connection, and implemented retries, and request routing, and load
balancing, and instrumentation. It provided a consistent layer of reliability
and observability across the entire Twitter stack, independent of what the
service itself actually did. Sure, it only worked for JVM languages, and it had
a programming model that you had to build your whole app around, but the
operational features it provided were almost exactly those of the service mesh.[4]
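To make the “fat client” idea concrete, here is a minimal Python sketch of the kind of wrapper such a library puts around every outbound call: round-robin load balancing across instances, retries on connection failure, and basic instrumentation. The class name, the modeling of endpoints as plain functions, and the retry policy are hypothetical simplifications; this is not Finagle’s or Hystrix’s actual API.

```python
import itertools
import time
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical model: an "endpoint" is a function that takes a request and returns a
# response, raising ConnectionError when that service instance is unreachable.
Endpoint = Callable[[str], str]


class FatClient:
    """Toy fat-client library: load balancing, retries, and metrics on every call."""

    def __init__(self, endpoints: Sequence[Endpoint], max_attempts: int = 3) -> None:
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        if max_attempts < 1:
            raise ValueError("max_attempts must be >= 1")
        self._endpoints = itertools.cycle(endpoints)  # round-robin load balancing
        self.max_attempts = max_attempts
        self.stats: Counter = Counter()               # success / failure / retry counts
        self.latencies: List[float] = []              # seconds, one entry per attempt

    def call(self, request: str) -> str:
        last_error: Exception = ConnectionError("no attempts made")
        for attempt in range(self.max_attempts):
            endpoint = next(self._endpoints)
            start = time.perf_counter()
            try:
                response = endpoint(request)
            except ConnectionError as err:
                last_error = err
                self.stats["failure"] += 1
                if attempt + 1 < self.max_attempts:
                    self.stats["retry"] += 1  # move on to the next instance in the rotation
                continue
            finally:
                self.latencies.append(time.perf_counter() - start)  # record every attempt
            self.stats["success"] += 1
            return response
        raise last_error  # every attempt failed


if __name__ == "__main__":
    def healthy(req: str) -> str:
        return "ok: " + req

    def broken(req: str) -> str:
        raise ConnectionError("instance down")

    client = FatClient([broken, healthy])
    print(client.call("GET /timeline"))  # first attempt fails, the retry succeeds
    print(dict(client.stats))
```

The service code only ever sees client.call(...); the operational behavior lives entirely in the library. That is also the catch described above: every service has to link the same library, which for Finagle meant running on the JVM.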

So ten years ago, not only did we have microservices, we had proto-service-mesh
libraries that solved many of the same problems that the service mesh solves
today. But we didn’t have the service mesh. Something else needed to change
first.



And that’s where the deeper answer lies, buried in another change of the past
ten years: a dramatic reduction in the cost of deploying microservices. The
companies I’ve listed above who were publicly
using microservices a decade ago—Twitter, Netflix, Facebook, Google—were
companies of immense scale and immense resources. They had not just the need but
the talent to build, deploy, and operate significant microservice applications.
The sheer amount of engineering time and energy that went into Twitter’s
migration from monolith to microservices boggles the imagination,[5] and this sort
of infrastructural maneuver was essentially impossible for smaller companies.

Contrast that to today, where you might encounter startups with a 5:1 or even
10:1 ratio of microservices to developers—and what’s more, they are equipped to
handle it. If running 50 microservices is a plausible approach for a 5-person
startup, then clearly something has reduced the cost of adopting microservices.



The dramatic reduction in the cost of operating microservices is a result of one
thing: the rise in the adoption of containers and container orchestrators. And
this is where the deeper answer to the question of what change has enabled the
service mesh lies. What’s made the service mesh operationally viable is the same
thing that’s making microservices operationally viable: Kubernetes and Docker.

Why? Well, Docker solves one big thing: the packaging problem. By allowing you
to package your app and its (non-network) runtime dependencies into a container,
your app is now a fungible unit that can be thrown around and run anywhere. By
the same token, Docker makes it exponentially easier to run a polyglot stack:
because the container is an atomic unit of execution, for deploy and operational
purposes it doesn’t really matter what’s inside the container, and whether it’s
a JVM app or a Node app or Go or Python or Ruby. You just run it.

Kubernetes solves the next step: now that I have a bunch of “executable things”,
and I also have a bunch of “things that can execute these executable things”
(aka machines), I need a mapping between them. In a broad sense, you give
Kubernetes a bunch of containers and a bunch of machines, and it figures out
this mapping. (Which of course is a dynamic and ever-shifting thing, as new
containers roll through the system, machines come in and out of operation, and
so on. But Kubernetes figures it out.)
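As a toy illustration of that mapping, the sketch below assigns containers to machines by CPU request using first-fit decreasing bin packing. It is a deliberate simplification under stated assumptions: Kubernetes actually schedules pods rather than bare containers, and its scheduler filters and scores nodes on many more dimensions than CPU. The Machine type, the function name, and the millicore numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Machine:
    name: str
    cpu_millis: int       # allocatable CPU, in millicores
    used_millis: int = 0  # CPU already promised to containers placed here


def schedule(requests: Dict[str, int], machines: List[Machine]) -> Dict[str, Optional[str]]:
    """Map each container (name -> CPU request in millicores) to a machine name.

    First-fit decreasing: place the largest requests first, each on the first machine
    with enough spare CPU. Containers that fit nowhere map to None (Kubernetes would
    leave such a pod Pending). Updates each machine's used_millis in place.
    """
    assignment: Dict[str, Optional[str]] = {}
    for name, cpu in sorted(requests.items(), key=lambda item: item[1], reverse=True):
        assignment[name] = None
        for machine in machines:
            if machine.used_millis + cpu <= machine.cpu_millis:
                machine.used_millis += cpu
                assignment[name] = machine.name
                break
    return assignment


if __name__ == "__main__":
    nodes = [Machine("node-a", 1000), Machine("node-b", 500)]
    print(schedule({"web": 600, "api": 400, "db": 450, "cron": 200}, nodes))
    # {'web': 'node-a', 'db': 'node-b', 'api': 'node-a', 'cron': None}
```

Because schedule records usage on the Machine objects, calling it again with the same list keeps packing onto them; a fresh list models a cluster with nothing scheduled yet.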

Once you have Kubernetes going, the deploy-time cost of running one service is
not that much different from running ten services, and in fact not that
different from 100 services. Combine that with the container as packaging
mechanism that encourages polyglot implementations, and the result is a ton of
new applications that are implemented as microservices written in a variety of
languages—exactly the environment the service mesh is most suited for.



And so finally we come to why the service mesh is feasible now: the very same
uniformity that Kubernetes provides for services is directly applicable to the
operational challenges of the service mesh. You package the proxies into
containers, you tell Kubernetes to stick ’em everywhere, and voilà! You got
yourself a service mesh, with all the deploy-time mechanics handled for you by
Kubernetes.[6]
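The “stick ’em everywhere” step can be sketched as a function that rewrites every Pod manifest so that a proxy container runs alongside the application. This is a minimal sketch under stated assumptions: the container name and image are hypothetical placeholders, and a real injector (Linkerd, for example, uses a mutating admission webhook that modifies pods as they are created) also has to redirect the pod’s traffic through the proxy, which is omitted here.

```python
import copy
from typing import Any, Dict, List

# Hypothetical placeholder proxy container; the name and image are not Linkerd's.
PROXY_CONTAINER: Dict[str, Any] = {
    "name": "mesh-proxy",
    "image": "registry.example.com/mesh-proxy:0.1",
}


def inject_sidecar(pod: Dict[str, Any], proxy: Dict[str, Any] = PROXY_CONTAINER) -> Dict[str, Any]:
    """Return a copy of a Pod manifest with the proxy added to spec.containers.

    The input manifest is left untouched, and injecting an already-injected pod is a no-op.
    """
    injected = copy.deepcopy(pod)
    containers = injected.setdefault("spec", {}).setdefault("containers", [])
    if all(c.get("name") != proxy["name"] for c in containers):
        containers.append(copy.deepcopy(proxy))
    return injected


def inject_everywhere(pods: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """One proxy per pod, for every pod in the list."""
    return [inject_sidecar(pod) for pod in pods]


if __name__ == "__main__":
    pods = [
        {"metadata": {"name": "web"}, "spec": {"containers": [{"name": "app", "image": "web:1.0"}]}},
        {"metadata": {"name": "users"}, "spec": {"containers": [{"name": "app", "image": "users:2.3"}]}},
    ]
    for pod in inject_everywhere(pods):
        print(pod["metadata"]["name"], [c["name"] for c in pod["spec"]["containers"]])
```

Making injection idempotent matters in practice, since the same manifest may pass through the injector more than once.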

To summarize: the service mesh makes sense now, as opposed to 10 years ago,
because the rise of Kubernetes and Docker has both dramatically increased the
need to run a service mesh, by making it easy to build your application as a
polyglot microservices architecture, and dramatically reduced the cost of
running one, by providing mechanisms for deploying and maintaining fleets of
sidecar proxies.


WHY DO PEOPLE TALK SO MUCH ABOUT THE SERVICE MESH?

Content warning: In this section, I resort to speculation, conjecture, inside
baseball, and opinion.

One need only search for “service mesh” to encounter a Kafkaesque fever dream
of a landscape, full of confusing projects, low-calorie recycled content, and
general echo chamber distortion. All shiny new tech has a certain level of this,
but the service mesh seems to have a particularly bad case. Why is that?



Well, partly it’s my fault. I’ve done my best to talk up Linkerd and the service
mesh at every opportunity, over countless blog posts and podcasts and articles
like this one. But I’m not that powerful. To really answer this question, I have
to talk about the service mesh landscape. And it’s impossible to talk about the
landscape without talking about one project in particular: Istio, an open source
service mesh that’s billed as a collaboration between Google, IBM, and Lyft.[7]



Two things are remarkable about Istio. First, the sheer amount of
marketing effort that Google, in particular, is placing behind it. In my
estimation, the majority of people who know about the service mesh today were
introduced to it through Istio. The second remarkable thing is just how poorly
Istio has been received. Obviously I have a horse in this race, but trying to be
as objective as I can, it seems to me that Istio has developed a pretty public
backlash in a way that’s uncommon (though not unheard of[8]) for an open source
project.[9]

Leaving aside my personal theories as to why that’s happening, I believe it’s
Google’s involvement here that is really the reason that the service mesh space
is so hype-y. Specifically, a) Istio being promoted so heavily by Google; b) its
corresponding lackluster reception; and c) the recent meteoric rise of
Kubernetes, still fresh on everyone’s minds, have combined to form a kind of
heady, oxygen-free environment where the capacity for rational thought is
extinguished and only a weird kind of cloud-native tulip mania remains.

From the Linkerd perspective, of course, this is… I guess I would describe it as
a mixed blessing. I mean, it’s great that the service mesh is a “thing” now—this
was not the case in 2016 when Linkerd first got off the ground, and it was
really hard to get anyone to pay attention. We don’t have that problem any more!
But it sucks that the service mesh landscape is so confusing and it’s so hard to
understand even which projects are service meshes, never mind which one fits
your use case the best. That does everyone a disservice. (And there are
certainly situations where Istio or another project would be the right choice
over Linkerd—it’s far from a one-size-fits-all solution.)

On the Linkerd side, our strategy has been to ignore the noise, continue
focusing on solving real problems for our community, and basically wait for the
whole thing to blow over. The level of hype will eventually subside and we can
all get on with our lives.

In the meantime, though, we’re all going to have to suffer through this
together.


SO… SHOULD I, A HUMBLE SOFTWARE ENGINEER, CARE ABOUT THE SERVICE MESH?

If you’re a software engineer, here’s my basic rubric for whether you should
care about the service mesh.

If you are in a pure business-logic-implementin’ developer role: No, you don’t
really need to care about the service mesh. I mean, you’re certainly welcome to
care, but ideally the service mesh won’t directly affect anything in your life.
Keep building that sweet, sweet business logic that gets everyone around you
paid.

If you are in a platform role in an org that is using Kubernetes: Yes, you 100%
should care. Unless you are adopting K8s purely to run a monolith or to do batch
processing (in which case, I would seriously ask the question of why K8s),
you’re going to end up in a situation where you have lots of microservices, all
written by other people, all talking to each other, all tied together into one
unholy bundle of runtime dependencies, and you’re going to need a way to deal
with that. Since you’re on Kubernetes, you will have several service mesh
options, and you should have an informed opinion about which ones or even
whether you want any of them at all. (Start with Linkerd.)

If you are in a platform role in an org that is NOT using Kubernetes, but IS
“doing microservices”: Yes, you should care, but it’s going to be complicated.
Sure, you could get the value of the service mesh by deploying lots of proxies
everywhere, but the nice part of Kubernetes is the deployment model, and your
ROI equation is going to look very different if you have to manage these proxies
yourself.

If you are in a platform role in an org that is “doing monoliths”: No, you
probably don’t need to care. If you are operating a monolith, or even a
“collection of monoliths” that have well-defined and infrequently-changing
communication patterns, then the service mesh will not add very much and you can
probably just ignore it and hope it goes away.


CONCLUSION

The service mesh probably doesn’t actually hold the title of “the World’s Most
Over-Hyped Technology”; that dubious distinction likely goes to Bitcoin or AI.
Maybe it’s merely in the top 5. But if you can cut through the layers of noise,
there’s some real value to be had for anyone who’s building applications on
Kubernetes.

Finally, I’d love for you to try Linkerd—it should take about 60 seconds to
install on a Kubernetes cluster, even just a Minikube on your laptop—and you can
see for yourself exactly what I’m talking about.

Want to give Linkerd a try? You can download and run the production-ready
Buoyant Enterprise for Linkerd in minutes. Get started today!


FAQS


IF I JUST IGNORE THIS WHOLE SERVICE MESH THING WILL IT JUST GO AWAY?

Sadly, the service mesh is here to stay.


BUT I DON’T WANT TO USE A SERVICE MESH

Then don’t. But see my guide above as to whether you need to understand it.


ISN’T THIS JUST ESB / MIDDLEWARE ALL OVER AGAIN?

One big difference is that the service mesh focuses on operational logic, not
business logic. That was the downfall of the enterprise service bus. Keeping
that separation is critical for the service mesh avoiding the same fate.


HOW IS THIS DIFFERENT FROM API GATEWAYS?

There are a million articles about this, but basically it comes down to: API
gateways handle ingress concerns, not intra-cluster concerns, and often deal
with a certain amount of business logic, e.g. “user X is only allowed to make 30
requests a day”.
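For contrast, here is that quota rule written out as a minimal in-memory Python sketch. The class name, the in-memory counter, and the per-calendar-day reset are hypothetical choices; the point is that the rule is keyed on a user, a business concept, rather than on the services and connections a service mesh reasons about.

```python
from collections import defaultdict
from datetime import date
from typing import DefaultDict, Optional, Tuple


class DailyQuota:
    """Allow each user at most `limit` requests per calendar day (hypothetical sketch).

    Counts live in process memory and old days are never evicted; a real gateway
    would keep them in a shared store with expiry.
    """

    def __init__(self, limit: int = 30) -> None:
        self.limit = limit
        self._counts: DefaultDict[Tuple[str, date], int] = defaultdict(int)

    def allow(self, user: str, day: Optional[date] = None) -> bool:
        key = (user, day if day is not None else date.today())
        if self._counts[key] >= self.limit:
            return False  # over quota: a gateway might reject with HTTP 429
        self._counts[key] += 1
        return True


if __name__ == "__main__":
    quota = DailyQuota(limit=30)
    today = date(2024, 1, 1)
    results = [quota.allow("user-x", today) for _ in range(31)]
    print(results.count(True), results[-1])  # 30 False
```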


IS ENVOY A SERVICE MESH?

No, Envoy is a proxy. Envoy can be used to make a service mesh (and many other
things; it’s a general-purpose proxy). But it’s not a service mesh.

I wrote a blog post about why Linkerd doesn’t use Envoy.


IS NETWORK SERVICE MESH A SERVICE MESH?

No. Despite the name, and the talented engineers behind it, it’s not a service
mesh. (Marketing is fun, right?)


WILL THE SERVICE MESH HELP MY REACTIVE, ASYNCHRONOUS MESSAGE QUEUE-BASED SYSTEM?

Yes. At least, Linkerd’s ability to provide things like mTLS, policy, byte-level
metrics, etc. will be very useful. Other features, such as request-level load
balancing and latency monitoring, are specific to request/response protocols
such as HTTP and won’t be useful, but typically your message broker will have
other ways of achieving those goals.
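The difference between byte-level and request-level features shows up in a minimal sketch of what a proxy can do with a protocol it doesn’t parse: it copies opaque bytes and can count them, but it has no request boundaries to retry, balance, or time. The relay function below is hypothetical and uses blocking file-like objects for simplicity; a real proxy works on sockets, asynchronously.

```python
import io
from typing import BinaryIO


def relay(src: BinaryIO, dst: BinaryIO, chunk_size: int = 4096) -> int:
    """Copy an opaque byte stream from src to dst and return the number of bytes copied.

    Nothing here understands the protocol, so the same loop works for a message-broker
    connection or any other TCP payload. That is why byte counts are available for any
    traffic, while per-request latency or retries need the proxy to parse requests.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    upstream = io.BytesIO(b"\x00\x01opaque-broker-frame" * 1000)
    downstream = io.BytesIO()
    print(relay(upstream, downstream))
```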


WHICH SERVICE MESH SHOULD I USE?

Linkerd. Duh.


I THINK THIS ARTICLE SUCKS / I THINK YOU SUCK

Please share this link with all your friends so that they can see just how much
it sucks / I suck.


THANKS AND CREDITS

As you might’ve guessed from the title, this article was inspired by Jay Kreps’s
fantastic treatise on logs, The Log: What every software engineer should know
about real-time data’s unifying abstraction. I met Jay when I interviewed at
LinkedIn almost a decade ago and he’s been an inspiration ever since.

While I like to call myself a Linkerd maintainer, the reality is that I am
mostly “maintainer of Linkerd’s README.md”. Linkerd today is the work of many
many many many people, and would not be possible without the amazing community
of contributors and adopters.

Finally, a special shoutout to the creator of Linkerd, Oliver Gould (primus
inter pares), who took the plunge with me on this whole service mesh thing many
years ago.


FOOTNOTES

 1. From Linkerd’s perspective, gRPC is basically the same as HTTP/2; you just
    happen to be using protobuf in the payload. From the developer’s
    perspective, of course, it’s quite different.
 2. “Mutual” means that the client’s certificate is also validated. This is as
    opposed to “regular” TLS, e.g. between a web browser and a web server, which
    typically only validates the server’s certificate.
 3. Thanks to Cindy Sridharan for introducing me to this term.
 4. In fact, the first version of Linkerd was simply Finagle wrapped up in proxy
    form.
 5. As does, frankly, the fact that it succeeded.
 6. At least, at the 10,000-ft level. There’s a lot more to it than this, of
    course.
 7. These three companies play very different roles: Lyft’s involvement seems to
    be in name only; they were the originator of Envoy but don’t appear to use
    Istio or even contribute to it. IBM contributes to Istio and also uses it.
    Google contributes heavily but as far as I can tell doesn’t actually use
    Istio.
 8. Systemd comes to mind. The comparison has been made, several times.
 9. In practice, Istio appears to have issues not just with complexity and UX
    but with performance. During our third-party Linkerd benchmark evaluation,
    for example, evaluators were able to find situations where Istio’s tail
    latency was 100x that of Linkerd, as well as low-resource environments where
    Linkerd happily chugged along but Istio completely stopped functioning.


Copyright © 2024 Buoyant, Inc.


