FINE-TUNE KAFKA PERFORMANCE WITH THE KAFKA OPTIMIZATION THEOREM

May 3, 2022 | Bilgin Ibryam, Red Hat Product Manager | Tags: Kafka

Table of contents:
* Optimization goals for Kafka
* Kafka priorities and the CAP theorem
* Kafka primitives
* The Kafka optimization theorem
* Summary

The performance of your Apache Kafka environment will be affected by many factors, including choices such as the number of partitions, the number of replicas, producer acknowledgments, and the message batch sizes you provision. But different applications have different requirements. Some prioritize latency over throughput, and some do the opposite. Similarly, some applications put a premium on durability, whereas others care more about availability. This article introduces a way of thinking about these tradeoffs and how to design your message flows, using a model I call the Kafka optimization theorem.

OPTIMIZATION GOALS FOR KAFKA

Like other systems that handle large amounts of data, Kafka balances the opposing goals of latency and throughput. Latency, in this article, refers to how long it takes to process a single message, whereas throughput refers to how many messages can be processed in a given time period. Architectural choices that improve one goal tend to degrade the other, so they can be seen as opposite ends of a spectrum.

Two other opposed goals are durability and availability. Durability (which in our context also includes consistency) determines how robust the entire system is in the face of failure, whereas availability determines how likely it is to be running. There are tradeoffs between these two goals as well. Thus, the main performance considerations for Kafka can be represented as in Figure 1. The diagram consists of two axes, each with one of the opposing goals at each end. For each axis, performance lies somewhere between the two ends.

Figure 1: Kafka performance involves two orthogonal axes: availability versus durability and latency versus throughput.

KAFKA PRIORITIES AND THE CAP THEOREM

The CAP theorem, which balances consistency (C), availability (A), and partition tolerance (P), is one of the foundational theorems in distributed systems.

Note: The word partition in CAP is entirely different from its use in Kafka. In CAP, a partition is a rupture in the network that causes two communicating peers to be temporarily unable to communicate. In Kafka, a partition is simply a division of data to permit parallelism.

Partition tolerance in the CAP sense is always required (because one can't guarantee that a partition won't happen), so the only real choice in CAP is whether to prioritize CP or AP. But CAP is sometimes extended with other considerations. In the absence of a partition, one can ask what else (E) happens and choose between optimizing for latency (L) or consistency (C). The complete list of considerations is called PACELC (Figure 2). PACELC explains how some systems, such as Amazon's Dynamo, favor availability and lower latency over consistency, whereas ACID-compliant systems favor consistency over availability or lower latency.

Figure 2: Besides the familiar CAP principle, one can choose between latency and consistency when there is no partitioning.
KAFKA PRIMITIVES

Every time you configure a Kafka cluster, create a topic, or configure a producer or a consumer, you are making a choice of latency versus throughput and of availability versus durability. To explain why, we have to look into the main primitives in Kafka and the role that client applications play in an event flow.

PARTITIONS

Topic partitions are the unit of parallelism in Kafka and are shown on the horizontal axis in Figure 1. In the broker and the clients, reads and writes to different partitions can be done fully in parallel. While many factors can limit throughput, a higher number of partitions for a topic generally enables higher throughput, and a smaller number leads to lower throughput. However, a very large number of partitions creates more metadata that needs to be passed and processed across all brokers and clients. This increased metadata can degrade end-to-end latency unless more resources are added to the brokers. This is a somewhat simplified description of how partitions work in Kafka, but it serves to demonstrate the main tradeoff that the number of partitions introduces in our context.

REPLICAS

Replicas determine the number of copies (including the leader) of each topic partition in a cluster and are shown on the vertical axis in Figure 1. Data durability is ensured by placing replicas of a partition on different brokers. A higher number of replicas ensures that the data is copied to more brokers and offers better durability in the case of broker failure. On the other hand, a lower number of replicas reduces data durability, but in certain circumstances can increase availability for producers or consumers by tolerating more broker failures. Availability for a consumer is determined by the availability of in-sync replicas, whereas availability for a producer is determined by the minimum number of in-sync replicas (min.isr). With a low number of replicas, the availability of the overall data flow depends on which brokers fail (whether one of them hosts the leader of the partition of interest) and whether the other brokers are in sync. For our purposes, we can assume that fewer replicas lead to higher application availability and lower data durability.

PRODUCERS AND CONSUMERS

Partitions and replicas represent the broker side of the picture, but they are not the only primitives that influence the tradeoffs between throughput and latency and between durability and availability. Other participants that shape the main Kafka optimization tradeoffs are found in the client applications. Topics are used by producers that send messages and consumers that read those messages. Producers and consumers also state their preference between throughput versus latency and durability versus availability through various configuration options. It is the combination of topic and client application configurations (and other cluster-level configurations, such as leader election) that defines what your application is optimized for.

Consider a flow of events consisting of a producer, a consumer, and a topic. Optimizing such an event flow for average latency on the client side requires tuning the producer and consumer to exchange smaller batches of messages. The same flow can be tuned for average throughput by configuring the producer and consumer for larger batches of messages. Producers and consumers involved in a message flow also state their preference for durability or availability. Before turning to those client-side options, the short sketch below shows how the broker-side primitives described above are set when a topic is created.
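To make the partition and replica tradeoff concrete, here is a minimal sketch using the Kafka AdminClient from the standard Apache Kafka Java client. The topic name, partition count, and bootstrap address are illustrative assumptions, not values prescribed by the article; only the shape of the call matters.

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Illustrative bootstrap address; replace with your cluster's listeners.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // 12 partitions: more parallelism and potential throughput, at the cost of
            // extra metadata; 3 replicas: each partition is copied to three brokers for durability.
            NewTopic orders = new NewTopic("orders", 12, (short) 3);
            // min.insync.replicas=2: producers using acks=all need two in-sync copies to succeed.
            orders.configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(orders)).all().get();
        }
    }
}

Raising the partition count in this call moves the flow toward the throughput end of the horizontal axis; raising the replication factor and min.insync.replicas moves it toward the durability end of the vertical axis.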
A producer that favors durability over availability can demand a higher number of acknowledgments by specifying acks=all. A lower number of acknowledgments (acks=0 or acks=1, which is lower than a typical min.isr) could lead to higher availability from the producer's point of view by tolerating a higher number of broker failures. However, this lower number reduces data durability in the case of catastrophic events affecting the brokers.

The consumer configurations influencing the vertical dimension are not as straightforward as the producer configurations; they depend on the consumer application logic. A consumer can favor higher consistency by committing message consumption offsets more often, or even individually. This does not affect the durability of the records in the broker, but it does affect how consumed records are tracked and can prevent duplicate message processing in the consumer. Alternatively, the consumer can favor availability by increasing various timeouts and tolerating broker failures for longer periods of time.

THE KAFKA OPTIMIZATION THEOREM

We have defined the main actors involved in an event flow as a producer, a consumer, and a topic. We've also defined the opposing goals we have to optimize for: throughput versus latency and durability versus availability. Given these parameters, the Kafka optimization theorem states that any Kafka data flow makes the tradeoffs shown in Figure 3 in throughput versus latency and durability versus availability.

Figure 3: Choices for each parameter determine where a Kafka application stands on the axes defined in Figure 1.

THE KAFKA OPTIMIZATION THEOREM AND THE PRIMARY CONFIGURATION OPTIONS

For simplicity, we put producer and consumer preferences on the same axes as topic replicas and partitions in Figure 3. Optimizing for a particular goal is easier when these primitives are aligned. For example, optimizing an end-to-end data flow for low latency is best achieved with smaller producer and consumer message batches combined with a small number of partitions. A data flow optimized for higher throughput would have larger producer and consumer message batches and a higher number of partitions for parallel processing. A data flow optimized for durability would have a higher number of replicas and require a higher number of producer acknowledgments and granular consumer commits. A data flow optimized for availability would prefer a smaller number of replicas and a smaller number of producer acknowledgments with larger timeouts.

In practice, there is no required correlation among the configurations of partitions, replicas, producers, and consumers. It is possible to have a large number of replicas and a high min.isr for a topic, but a producer that requires only 0 or 1 acknowledgments. Or you could configure a higher number of partitions because your application logic requires it, together with small producer and consumer message batches. These are valid scenarios in Kafka, and one of its strengths is that it's a highly configurable and flexible eventing system that satisfies many use cases. Nevertheless, the framework proposed here can serve as a mental model for the main Kafka primitives and how they relate to the optimization dimensions. If you understand the foundational forces, you can tune them in ways specific to your application and understand the effects.
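To show how client applications state these preferences, here is a minimal sketch, assuming the standard Apache Kafka Java client. The bootstrap address and group id are illustrative assumptions, and serializers/deserializers are omitted because no client is actually constructed; the point is which configuration keys express which end of each axis.

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;

public class ClientPreferenceSketch {

    // Durability-leaning producer: wait for acknowledgment from all in-sync replicas.
    static Properties durabilityLeaningProducer() {
        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative address
        p.put(ProducerConfig.ACKS_CONFIG, "all");                         // favor durability
        p.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");          // avoid duplicates on retry
        return p;
    }

    // Availability-leaning producer: leader-only acknowledgment tolerates more broker failures.
    static Properties availabilityLeaningProducer() {
        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative address
        p.put(ProducerConfig.ACKS_CONFIG, "1");                           // favor availability over durability
        p.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "false");         // idempotence would otherwise demand acks=all
        p.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "300000");       // keep retrying through longer disruptions
        return p;
    }

    // Consistency-leaning consumer: disable auto-commit and commit offsets explicitly,
    // per record or per small batch, so consumption is tracked precisely.
    static Properties consistencyLeaningConsumer() {
        Properties c = new Properties();
        c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative address
        c.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-processor");        // illustrative group id
        c.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // commit explicitly after processing
        return c;
    }
}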
THE GOLDEN RATIO

The proposed Kafka optimization theorem deliberately avoids stating numbers, instead laying out the relationships between the main primitives and the direction of change. The theorem is not intended to serve as a concrete optimization configuration, but rather as a guide that shows how reducing or increasing a primitive's configuration influences the direction of the optimization. Still, sharing a few of the most common configuration options accepted as industry best practices can be useful as a starting point and a demonstration of the theorem in practice.

Kafka clusters today, whether run on premises with something like Strimzi or consumed as a fully managed service such as OpenShift Streams for Apache Kafka, are almost always deployed within a single region. A production-grade Kafka cluster is typically spread across three availability zones with a replication factor of 3 (RF=3) and minimum in-sync replicas of 2 (min.isr=2). These values ensure a good level of data durability during happy times and good availability for client applications during temporary disruptions. This configuration represents a solid middle ground, because specifying min.isr=3 would prevent producers from producing even when a single broker is affected, and specifying min.isr=1 would affect both producer and consumer when a leader is down. Typically, this replication configuration is accompanied by acks=all on the producer side and by the default offset commit configuration for the consumers.

While there are commonalities in the consistency and availability tradeoffs among different applications, throughput and latency requirements vary. The number of partitions for a topic is influenced by the shape of the data, the data processing logic, and its ordering requirements. At the same time, the number of partitions dictates the maximum parallelism and message throughput you can achieve. As a consequence, there is no good default number or range for the partition count.

By default, Kafka clients are optimized for low latency. We can observe that from the default producer values (batch.size=16384, linger.ms=0, compression.type=none) and the default consumer values (fetch.min.bytes=1, fetch.max.wait.ms=500). Prior to Kafka 3, the producer default was acks=1; it has since changed to acks=all, creating a preference for durability rather than availability or low latency. Optimizing the client applications for throughput would require increasing wait times and batch sizes five to tenfold and examining the results (a sketch of such an experiment appears at the end of this article). Knowing the default values and what they are optimized for is a good starting point for your service optimization goals.

SUMMARY

CAP is a great theorem for failure scenarios. While failures are a given and partitioning will always happen, we have to optimize applications for happy paths too. This article introduces a simplified model explaining how Kafka primitives can be used for performance optimization. The article deliberately focuses on the main primitives and a selection of configuration options to demonstrate its principles. In real life, more factors influence your application's performance metrics. Encryption, compression, and memory allocation affect latency and throughput. Transactions, exactly-once semantics, retries, message flushes, and leader election affect consistency and availability. You might have broker primitives (replicas and partitions) and client primitives (batch sizes and acknowledgments) optimized for competing tradeoffs.
Finding the ideal Kafka configuration for your application will require experimentation, and the Kafka optimization theorem will guide you in the journey.
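As one possible starting point for that experimentation, the sketch below contrasts the latency-oriented client defaults quoted above with a throughput-oriented override along the lines the article suggests (batching and wait times increased several-fold). The bootstrap address and the specific multipliers are illustrative assumptions, not recommendations; measure the results for your own workload.

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;

public class ThroughputExperimentSketch {
    public static void main(String[] args) {
        // Producer defaults are latency-oriented: batch.size=16384, linger.ms=0, compression.type=none.
        // Increase batching roughly five to tenfold as a throughput experiment.
        Properties producer = new Properties();
        producer.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // illustrative address
        producer.put(ProducerConfig.BATCH_SIZE_CONFIG, String.valueOf(16384 * 8)); // ~128 KB batches
        producer.put(ProducerConfig.LINGER_MS_CONFIG, "20");                       // wait briefly so batches can fill
        producer.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");               // trade CPU for network throughput
        producer.put(ProducerConfig.ACKS_CONFIG, "all");                           // keep the durability-oriented default

        // Consumer defaults are also latency-oriented: fetch.min.bytes=1, fetch.max.wait.ms=500.
        // Ask the broker for larger fetches to raise consumer throughput.
        Properties consumer = new Properties();
        consumer.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // illustrative address
        consumer.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, String.valueOf(64 * 1024));
        consumer.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "1000");

        System.out.println("Producer overrides: " + producer);
        System.out.println("Consumer overrides: " + consumer);
    }
}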