MQTT VS KAFKA: AN IOT ADVOCATE'S PERSPECTIVE (PART 1 – THE BASICS)

By Jay Clifford / April 20, 2022 / Community, Developer, IoT / 9 minutes

With the Kafka Summit fast approaching, I thought it was time to get my hands dirty and see what it's all about. As an advocate for IoT, I had heard about Kafka but was too embedded in protocols like MQTT to investigate further. For the uninitiated (like me), the two protocols seem extremely similar, if not outright competing. However, I have learned this is far from the case: in many situations they actually complement one another. In this blog series, I hope to summarize what Kafka and MQTT are and how they can both fit into an IoT architecture. To help explain some of the concepts, I thought it would be practical to use a past scenario:

In the previous blog, we discussed a scenario where we wanted to monitor emergency fuel generators. We created a simulator with the InfluxDB Python Client library to send generator data to InfluxDB Cloud.
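For context, a client-library write of that kind looks roughly like the following. This is a minimal sketch only: the URL, token, org, bucket and measurement names are placeholders, not values from the original demo.

# Minimal sketch of a write with the InfluxDB Python Client library.
# All connection details below are placeholders (assumptions).
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

with InfluxDBClient(url="https://cloud2.influxdata.com", token="<TOKEN>", org="<ORG>") as client:
    write_api = client.write_api(write_options=SYNCHRONOUS)
    point = (Point("generator")                 # measurement (assumed name)
             .tag("generatorID", "generator1")  # metadata becomes a tag
             .field("temperature", 36.0)        # meter readings become fields
             .field("pressure", 70.5)
             .field("fuel", 400.0))
    write_api.write(bucket="generators", record=point)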
For this blog, I decided to reuse that simulator but replace the client library with an MQTT publisher and a Kafka producer to understand the core mechanics behind each. You can find the code for this demo here.

UNDERSTANDING THE BASICS

So what is Kafka? Kafka is described as an event streaming platform. It conforms to a publisher-subscriber architecture with the added benefit of data persistence (to understand more of the fundamentals, check out this blog). Kafka also promotes some pretty great benefits within the IoT sector:

* High throughput
* High availability
* Connectors to well-known third-party platforms

So why would I not just build my entire IoT platform using Kafka? Well, it boils down to a few key issues:

1. Kafka is built for stable networks with good infrastructure.
2. It does not provide key data delivery features such as Keep-Alive and Last Will.

Having said this, let's go ahead and compare the implementation of a basic Kafka producer to that of an MQTT publisher within the context of the emergency generator demo.

Assumptions: For the purposes of this demo, I will be making use of the Mosquitto MQTT broker and the Confluent platform (Kafka). We will not cover the initial creation/setup here, but you can consult these instructions accordingly:

1. Mosquitto Broker
2. Confluent (I highly recommend using the free trial of Confluent Cloud to sense-check whether Kafka is right for you before bogging yourself down in an on-prem setup)

INITIALIZATION

Let's start with the initialization of our MQTT publisher and Kafka producer:

MQTT

The minimum requirements for an MQTT publisher (omitting security) are as follows:

1. Host: The address/IP of the platform hosting the Mosquitto server.
2. Port: The port the MQTT publisher talks to; usually 1883 for basic connectivity, 8883 for TLS.
3. Keep Alive Interval: The amount of time in seconds allowed between communications.

self.client.connect(host=self.mqttBroker, port=self.port, keepalive=MQTT_KEEPALIVE_INTERVAL)
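In isolation, that one line really is the whole setup. Here is a self-contained sketch using paho-mqtt, whose Client.connect signature matches the call above; the broker address and keep-alive value are assumptions for illustration:

import paho.mqtt.client as mqtt

MQTT_KEEPALIVE_INTERVAL = 45  # seconds of silence before the broker assumes the client is gone (assumed value)

client = mqtt.Client()                      # paho generates a random client ID if none is given
client.connect(host="localhost", port=1883, keepalive=MQTT_KEEPALIVE_INTERVAL)
client.loop_start()                         # background network loop services keep-alive pings and acks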
KAFKA

There was a little more background work when it came to Kafka. We had to establish connectivity to two different Kafka entities:

1. Kafka cluster: This is a given, as we will be sending our payload here.
2. Schema registry: The registry lies outside the scope of the Kafka cluster. It handles the storing and delivery of topic schemas. In other words, it forces producers to deliver data in a format that is expected by the Kafka consumer. More on this later.

So let's set up connectivity to both entities:

Schema registry

schema_registry_conf = {'url': 'https://psrc-8vyvr.eu-central-1.aws.confluent.cloud',
                        'basic.auth.user.info': '<USERNAME>:<PASSWORD>'}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)

The breakdown:

* url: The address of your schema registry. Confluent supports the creation and hosting of registries.
* authentication: Like any repository, it includes basic security to keep your schema designs secure.

Kafka cluster

self.json_serializer = JSONSerializer(self.schema_str, schema_registry_client, engine_to_dict)
self.p = SerializingProducer({
    'bootstrap.servers': 'pkc-41wq6.eu-west-2.aws.confluent.cloud:9092',
    'sasl.mechanism': 'PLAIN',
    'security.protocol': 'SASL_SSL',
    'sasl.username': '######',
    'sasl.password': '######',
    'error_cb': error_cb,
    'key.serializer': StringSerializer('utf_8'),
    'value.serializer': self.json_serializer
})

The breakdown:

1. bootstrap.servers: In short, this address points to Confluent Cloud hosting our Kafka cluster; more specifically, to a Kafka broker (a cluster is made up of one or more brokers). Bootstrap refers to the producer using this initial broker to discover and establish its presence within the full cluster.
2. sasl.*: Simple Authentication and Security Layer (SASL); these settings are a minimum requirement for connecting to Confluent Kafka. I won't cover them here, as they are of no interest to our overall comparison.
3. error_cb: Registers a callback for handling Kafka errors.
4. key.serializer: This describes how the message key will be stored within Kafka. Keys are an extremely important part of how Kafka handles payloads. More on this in the next blog.
5. value.serializer: We will cover this next; in short, we must describe what type of data our producer will be sending. This is why defining our schema registry is so important.

TOPICS AND DELIVERY

Now that we have initialized our MQTT publisher and Kafka producer, it's time to send our emergency generator data. To do this, both protocols require the establishment of a topic and some data preparation before delivery:

MQTT

Within MQTT's world, a topic is a UTF-8 string that establishes logical filtering between payloads.

Topic Name     Payload
temperature    36
fuel           400

In Part 2 of this series, we break down the capabilities and differences of MQTT and Kafka topics in further detail. For now, we are going to establish one topic to send all of our emergency generator data (this is not best practice, but it is logical in the complexity build-up of this project).

message = json.dumps(data)
self.client.publish(topic="emergency_generator", payload=message)

MQTT has the benefit of being able to generate topics on demand during the delivery of a payload. If the topic already exists, the payload is simply sent to the established topic. If not, the topic is created. This makes our code relatively simple: we define our topic name and the JSON string we plan to send. MQTT payloads are extremely flexible by default, which has pros and cons. On the positive side, you do not need to define strict schema typing for your data. On the other hand, you rely on your subscribers being robust enough to handle incoming messages which fall outside the norm.
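To make that trade-off concrete, here is a sketch of the defensive parsing a subscriber ends up owning when the broker enforces no schema. The required field list mirrors the generator payload; everything else (broker address, callback wiring) is an assumption for illustration:

import json
import paho.mqtt.client as mqtt

# Fields the subscriber expects in every generator payload (assumed to mirror the demo).
REQUIRED_FIELDS = {"generatorID", "lat", "lon", "temperature", "pressure", "fuel"}

def on_message(client, userdata, msg):
    # The broker will happily deliver anything, so validation happens here.
    try:
        data = json.loads(msg.payload)
    except json.JSONDecodeError:
        print(f"Dropping non-JSON payload on {msg.topic}")
        return
    if not isinstance(data, dict) or REQUIRED_FIELDS - data.keys():
        print("Dropping payload with missing or malformed fields")
        return
    print(f"Accepted reading from {data['generatorID']}")

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883, keepalive=45)  # assumed broker details
client.subscribe("emergency_generator")
client.loop_forever()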
KAFKA

So I must admit, I came in with the foolish optimism that sending a JSON payload via Kafka would be as simple as publish(). How wrong I was! Let's walk through it:

self.schema_str = """{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Generator",
    "description": "A fuel engine's health data",
    "type": "object",
    "properties": {
        "generatorID": {"description": "Unique ID of the generator", "type": "string"},
        "lat": {"description": "latitude", "type": "number"},
        "lon": {"description": "longitude", "type": "number"},
        "temperature": {"description": "temperature", "type": "number"},
        "pressure": {"description": "pressure", "type": "number"},
        "fuel": {"description": "fuel", "type": "number"}
    },
    "required": ["generatorID", "lat", "lon", "temperature", "pressure", "fuel"]
}"""

schema_registry_conf = {'url': 'https://psrc-8vyvr.eu-central-1.aws.confluent.cloud',
                        'basic.auth.user.info': environ.get('SCHEMEA_REGISTRY_LOGIN')}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)

self.json_serializer = JSONSerializer(self.schema_str, schema_registry_client, engine_to_dict)
self.p = SerializingProducer({
    'bootstrap.servers': 'pkc-41wq6.eu-west-2.aws.confluent.cloud:9092',
    'sasl.mechanism': 'PLAIN',
    'security.protocol': 'SASL_SSL',
    'sasl.username': environ.get('SASL_USERNAME'),
    'sasl.password': environ.get('SASL_PASSWORD'),
    'error_cb': error_cb,
    'key.serializer': StringSerializer('utf_8'),
    'value.serializer': self.json_serializer
})

The first task on our list is to establish a JSON schema, which describes the expected structure of our data. In our example, we define our generator meter readings (temperature, pressure, fuel) and also our metadata (generatorID, lat, lon). Note that within the definition we declare each field's data type and which data points are required in every payload.

We have already discussed connecting to our schema registry earlier. Next, we want to register our JSON schema with the registry and create a JSON serializer. To do this we need three parameters:

1. schema_str: the schema design we just discussed
2. schema_registry_client: our client object connected to the registry
3. engine_to_dict: a custom function which builds a Python dictionary from our generator object; the JSONSerializer then converts that dictionary to JSON format

The json_serializer object is then included within the initialization of the SerializingProducer. Finally, to send data we call our producer object:

self.p.produce(topic=topic, key=str(uuid4()), value=data, on_delivery=delivery_report)

To send data to our Kafka cluster we:

1. Define our topic name (by default Kafka requires topics to be created manually, though you can enable auto-creation via settings within the broker/cluster).
2. Create a unique key for our data.
3. Provide the data we wish to publish (this will be run through our custom engine_to_dict function and the serializer).
4. Provide a delivery report: a function defined to give feedback on successful or unsuccessful delivery of the payload (see the sketch after this list).
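That delivery report is just a callback invoked with an error and the message. A minimal sketch follows; the body is my assumed example, as confluent-kafka only dictates the (err, msg) signature:

def delivery_report(err, msg):
    # Called once per produced record when the producer services events
    # (e.g. via self.p.poll(0) or self.p.flush()).
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Record delivered to {msg.topic()} [partition {msg.partition()}] at offset {msg.offset()}")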
My first impression of this strongly typed, schema-based design was: "Wow, this must leave system designers with a lot of code to maintain and a steep learning curve." But as I implemented the example, I realized you probably avert a lot of future technical debt this way. Schemas force new producers and consumers to conform to the currently intended data structure or to generate a new schema version, which allows the current system to continue unimpeded by a rogue producer connecting to your Kafka cluster. I am going to cover this in more detail within Part 2 of this blog series.

PERSPECTIVE AND CONCLUSION

So, what have we done? Well, in its most brutally simplistic form, we have created a Kafka producer and an MQTT publisher to transmit our generator data. At face value, Kafka may seem vastly more complex in its setup than MQTT for the same result; at this level, you would be correct. However, we have barely scraped the surface of what Kafka can do and how it should be deployed in a true IoT architecture. I plan to release two more blogs in this series:

* Part 2: I cover more of the features unique to Kafka, such as a deeper look into topics, scalability and third-party integrations (including InfluxDB).
* Part 3: We take what we have learned and apply best practices to a real IoT project. We will use Kafka's MQTT proxy and delve deeper into third-party integrations to get the most out of your Kafka infrastructure.

Until then, check out the code, run it, play with it, and improve it. In the next blog (Part 2 of this series), we cover topics in more detail.