MQTT VS KAFKA: AN IOT ADVOCATE’S PERSPECTIVE (PART 1 – THE BASICS)

By Jay Clifford / April 20, 2022 (updated April 25, 2022) / Community, Developer, IoT
9 minutes

With the Kafka Summit fast approaching, I thought it was time to get my hands
dirty and see what it's all about. As an IoT advocate, I had heard about Kafka
but was too embedded in protocols like MQTT to investigate further. To the
uninitiated (like me), both protocols seem extremely similar, if not outright
competing. However, I have learned this is far from the case; in many
scenarios they actually complement one another.

In this blog series, I hope to summarize what Kafka and MQTT are and how they
can both fit into an IoT architecture. To help explain some of the concepts, I
thought it would be practical to use a past scenario:



 

In the previous blog, we discussed a scenario where we wanted to monitor
emergency fuel generators. We created a simulator with the InfluxDB Python
Client library to send generator data to InfluxDB Cloud. For this blog, I
decided to reuse that simulator but replace the client library with an MQTT
publisher and Kafka producer to understand the core mechanics behind each.

You can find the code for this demo here.


UNDERSTANDING THE BASICS

So what is Kafka? Kafka is described as an event streaming platform. It conforms
to a publisher-subscriber architecture with the added benefit of data
persistence (to understand more of the fundamentals, check out this blog). Kafka
also offers some compelling benefits within the IoT sector:

 * High throughput
 * High availability
 * Connectors to well-known third-party platforms

So why would I not just build my entire IoT platform using Kafka? Well, it boils
down to a few key issues:

 1. Kafka is built for stable networks with good infrastructure
 2. It does not provide key data delivery features such as Keep-Alive and Last
    Will (illustrated in the sketch below)
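
To give a concrete feel for those MQTT delivery features, here is a minimal sketch using the paho-mqtt client. It is not part of the demo code; the broker address and status topic are placeholders for illustration.

import paho.mqtt.client as mqtt

client = mqtt.Client()

# Last Will: the broker publishes this message on the client's behalf if the
# client disconnects ungracefully (e.g. the generator loses power or network).
client.will_set(topic="emergency_generator/status", payload="offline",
                qos=1, retain=True)

# keepalive=60 tells the broker to expect traffic (or a ping) from this
# client at least every 60 seconds.
client.connect(host="broker.example.com", port=1883, keepalive=60)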

Having said this, let's go ahead and write a basic Kafka producer and compare it
to an MQTT publisher within the context of the emergency generator demo:



Assumptions: For the purposes of this demo, I will be making use of the
Mosquitto MQTT Broker and the Confluent platform (Kafka). We will not cover the
initial creation/setup here, but you can consult these instructions accordingly:

 1. Mosquitto Broker
 2. Confluent (I highly recommend using the free trial of Confluent Cloud to
    sense check if Kafka is right for you before bogging yourself down in an
    on-prem setup)


INITIALIZATION

Let’s start with the initialization of our MQTT publisher and Kafka producer:

MQTT

The minimum requirements for an MQTT Publisher (omitting security) are as
follows:

 1. Host: The address / IP of the platform hosting the Mosquitto server
 2. Port: The port the MQTT publisher talks to the broker on. Usually 1883 for
    basic connectivity, 8883 for TLS.
 3. Keep Alive Interval: The maximum amount of time in seconds allowed between
    communications with the broker.

self.client.connect(host=self.mqttBroker, port=self.port,
                    keepalive=MQTT_KEEPALIVE_INTERVAL)
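
For context, a complete version of that initialization with the paho-mqtt client might look like the following sketch; the host and port values are assumptions for illustration:

import paho.mqtt.client as mqtt

MQTT_KEEPALIVE_INTERVAL = 60  # seconds allowed between communications

client = mqtt.Client()
client.connect(host="broker.example.com",  # address/IP of the Mosquitto server
               port=1883,                  # 1883 for plain TCP, 8883 for TLS
               keepalive=MQTT_KEEPALIVE_INTERVAL)
client.loop_start()  # start the background network loop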

KAFKA

There was a little more background work when it came to Kafka. We had to
establish connectivity to two different Kafka entities:

 1. Kafka cluster: This is a given, since this is where we will be sending our
    payload.
 2. Schema registry: The registry lies outside the scope of the Kafka cluster.
    It handles the storage and delivery of topic schemas. In other words, it
    forces producers to deliver data in the format expected by the Kafka
    consumer. More on this later.

So let’s set up connectivity to both entities:

Schema registry

schema_registry_conf = {'url': 'https://psrc-8vyvr.eu-central-1.aws.confluent.cloud',
                        'basic.auth.user.info': '<USERNAME>:<PASSWORD>'}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)

The breakdown:

 * url: The address of your schema registry. Confluent supports the creation of
   hosted registries.
 * basic.auth.user.info: Like any repository, the registry has basic
   authentication to keep your schema designs secure.

Kafka cluster

self.json_serializer = JSONSerializer(self.schema_str, schema_registry_client, engine_to_dict)
self.p = SerializingProducer({
    'bootstrap.servers': 'pkc-41wq6.eu-west-2.aws.confluent.cloud:9092',
    'sasl.mechanism': 'PLAIN',
    'security.protocol': 'SASL_SSL',
    'sasl.username': '######',
    'sasl.password': '######',
    'error_cb': error_cb,
    'key.serializer': StringSerializer('utf_8'),
    'value.serializer': self.json_serializer
})

The breakdown:

 1. bootstrap.servers: In short, this address points to the Confluent Cloud
    instance hosting our Kafka cluster, or more specifically, a Kafka broker.
    (Kafka also has the notion of brokers, but on a per-topic basis.) Bootstrap
    refers to the producer establishing its presence globally within the
    cluster.
 2. sasl.*: Simple Authentication and Security Layer; these settings are a
    minimum requirement for connecting to Confluent Kafka. I won't cover them
    here, as they are of no interest to our overall comparison.
 3. error_cb: The callback that handles Kafka errors (a minimal sketch follows
    this list).
 4. key.serializer: This describes how the message key will be stored within
    Kafka. Keys are an extremely important part of how Kafka handles payloads.
    More on this in the next blog.
 5. value.serializer: We will cover this next; in short, we must describe what
    type of data our producer will be sending. This is why defining our schema
    registry is so important.
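
The error_cb function itself is defined in the demo code rather than shown above; a minimal sketch of what such a callback can look like with the confluent-kafka client is:

def error_cb(err):
    # Called by the producer for client-level errors (err is a KafkaError),
    # e.g. broker connection failures or authentication problems.
    print(f"Kafka client error: {err}")
    if err.fatal():
        # A fatal error means this producer instance can no longer be used.
        raise SystemExit(f"Fatal Kafka error: {err}")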


TOPICS AND DELIVERY

Now that we have initialized our MQTT publisher and Kafka producer, it's time to
send our emergency generator data. To do this, both protocols require us to
establish a topic and prepare the data before delivery:

MQTT

Within MQTT’s world, a topic is a UTF-8 string that establishes logical
filtering between payloads.

Topic Name  | Payload
temperature | 36
fuel        | 400

In Part 2 of this series, we break down the capabilities and differences of MQTT
and Kafka topics in further detail. For now, we are going to establish one topic
to send all of our emergency generator data (this is not best practice, but it
keeps things simple as we build up the complexity of this project).

message = json.dumps(data)
self.client.publish(topic="emergency_generator", payload=message)

MQTT has the benefit of being able to generate topics on demand during the
delivery of a payload. If the topic already exists, the payload is simply sent
to the established topic. If not, the topic is created. This makes our code
relatively simple. We define our topic name and the JSON string we plan to send.
MQTT payloads are extremely flexible by default, which has pros and cons. On the
positive side, you do not need to define strict schema typing for your data. On
the other hand, you rely on your subscribers being robust enough to handle
incoming messages that fall outside the norm (the sketch below illustrates one
defensive approach).
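
To make that trade-off concrete, here is a sketch of a defensive MQTT subscriber; the broker address is a placeholder and the field check simply mirrors the generator payload used in this demo:

import json
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # MQTT enforces no schema, so the subscriber validates each payload itself.
    try:
        data = json.loads(msg.payload)
        temperature = data["temperature"]
    except (json.JSONDecodeError, KeyError) as exc:
        print(f"Skipping malformed payload on {msg.topic}: {exc}")
        return
    print(f"{msg.topic}: temperature={temperature}")

subscriber = mqtt.Client()
subscriber.on_message = on_message
subscriber.connect(host="broker.example.com", port=1883, keepalive=60)
subscriber.subscribe("emergency_generator")
subscriber.loop_forever()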

KAFKA

So I must admit, I came in with foolish optimism that sending a JSON payload via
Kafka would be as simple as publish(). How wrong I was! Let’s walk through it:

self.schema_str = """{
            "$schema": "http://json-schema.org/draft-07/schema#",
            "title": "Generator",
            "description": "A fuel engines health data",
            "type": "object",
            "properties": {
                "generatorID": {
                "description": "UniqueID of generator",
                "type": "string"
                },
                "lat": {
                "description": "latitude",
                "type": "number"
                },
                "lon": {
                "description": "longitude",
                "type": "number"
                },
                "temperature": {
                "description": "temperature",
                "type": "number"
                },
                "pressure": {
                "description": "pressure",
                "type": "number"
                },
                "fuel": {
                "description": "fuel",
                "type": "number"
                }
            },
            "required": [ "generatorID", "lat", "lon", "temperature", "pressure", "fuel" ]
            }"""

        
        schema_registry_conf = {'url': 'https://psrc-8vyvr.eu-central-1.aws.confluent.cloud', 
                                'basic.auth.user.info': environ.get('SCHEMEA_REGISTRY_LOGIN')}
        schema_registry_client = SchemaRegistryClient(schema_registry_conf)

        self.json_serializer = JSONSerializer(self.schema_str, schema_registry_client, engine_to_dict)

        self.p = SerializingProducer({
        'bootstrap.servers': 'pkc-41wq6.eu-west-2.aws.confluent.cloud:9092',
        'sasl.mechanism': 'PLAIN',
        'security.protocol': 'SASL_SSL',
        'sasl.username': environ.get('SASL_USERNAME'),
        'sasl.password': environ.get('SASL_PASSWORD'),
        'error_cb': error_cb,
        'key.serializer': StringSerializer('utf_8'),
        'value.serializer': self.json_serializer
        })

The first task on our list is to establish a JSON schema. The JSON schema
describes the expected structure of our data. In our example, we define our
generator meter readings (temperature, pressure, fuel) and also our metadata
(generatorID, lat, lon). Note that within the definition we declare each
field's data type and which data points are required to be sent with each
payload.

We have already discussed connecting to our schema registry. Next, we want to
register our JSON schema with the registry and create a JSON serializer. To do
this we need three parameters:

 1. schema_str: the schema design we discussed
 2. schema_registry_client: our object connecting to the registry
 3. engine_to_dict: a custom function the JSON serializer calls to build a
    Python dictionary from our data, which is then converted to JSON format
    (a possible implementation is sketched after this list)
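
The engine_to_dict function lives in the demo code rather than in this excerpt; a possible implementation looks roughly like the sketch below. The JSONSerializer calls it with the object being produced plus a SerializationContext and expects a plain dictionary matching the registered schema (the engine object's attribute names are assumptions here):

def engine_to_dict(engine, ctx):
    # ctx is the SerializationContext supplied by the serializer; only the
    # generator readings are needed to build the dictionary.
    return {
        "generatorID": engine.generatorID,
        "lat": engine.lat,
        "lon": engine.lon,
        "temperature": engine.temperature,
        "pressure": engine.pressure,
        "fuel": engine.fuel,
    }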

The json_serializer object is then included within the initialization of the
SerializingProducer.

Finally, to send data, we call our producer object:

self.p.produce(topic=topic, key=str(uuid4()), value=data,
               on_delivery=delivery_report)

To send data to our Kafka cluster we:

 1. Define our topic name (Kafka by default requires topics to be created
    manually, though you can enable auto-creation via settings on the
    broker/cluster).
 2. Create a unique key for our data.
 3. Provide the data we wish to publish (this is processed through our custom
    engine_to_dict function).
 4. Attach a delivery report, a callback defined to provide feedback on
    successful or unsuccessful delivery of the payload (a minimal sketch
    follows this list).
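
The delivery_report callback is also defined in the demo code; a minimal sketch of it could look like this:

def delivery_report(err, msg):
    # Invoked once per produced message, either with an error or with
    # metadata describing where the record was written.
    if err is not None:
        print(f"Delivery failed for record {msg.key()}: {err}")
        return
    print(f"Record {msg.key()} delivered to {msg.topic()} "
          f"[partition {msg.partition()}] at offset {msg.offset()}")

Note that produce() queues the message asynchronously; the callback only fires when the producer is serviced, for example by a later poll() or flush() call.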

My first impression of strongly typed / schema-based design was: "Wow, this
must leave system designers with a lot of code to maintain and a steep learning
curve". As I implemented the example, I realized you would probably avert a lot
of future technical debt this way. Schemas force new producers/consumers to
conform to the current intended data structure or to generate a new schema
version. This allows the current system to continue unimpeded by a rogue
producer connecting to your Kafka cluster. I am going to cover this in more
detail within Part 2 of this blog series.


PERSPECTIVE AND CONCLUSION

So, what have we done? Well, in its most brutally simplistic form, we have
created a Kafka producer and an MQTT publisher to transmit our generator data.
At face value, it may seem that Kafka is vastly more complex to set up than MQTT
for the same result.

At this level, you would be correct. However, we have barely scraped the surface
of what Kafka can do and how it should be deployed in a true IoT architecture. I
plan to release two more blogs in this series:

 * Part 2: I cover more of the features unique to Kafka, such as a deeper look
   into topics, scalability and third-party integrations (including InfluxDB).
 * Part 3: We are going to take what we have learned and apply best practices to
   a real IoT project. We will use Kafka’s MQTT proxy and delve deeper into
   third-party integrations to get the most out of your Kafka infrastructure.

Until then, check out the code, run it, play with it, and improve it. In the
next blog (Part 2 of this series), we cover topics in more detail.
