MQTT VS KAFKA: AN IOT ADVOCATE'S PERSPECTIVE (PART 1 – THE BASICS)

By Jay Clifford / April 20, 2022 / Community, Developer, IoT / 9 minutes

With the Kafka Summit fast approaching, I thought it was time to get my hands dirty and see what it's all about. As an advocate for IoT, I had heard about Kafka but was too embedded in protocols like MQTT to investigate further. For the uninitiated (like me), the two protocols seem extremely similar, if not outright competing. However, I have learned this is far from the case: in many situations they actually complement one another. In this blog series, I hope to summarize what Kafka and MQTT are and how they can both fit into an IoT architecture. To help explain some of the concepts, I thought it would be practical to use a past scenario:

In the previous blog, we discussed a scenario where we wanted to monitor emergency fuel generators. We created a simulator with the InfluxDB Python Client library to send generator data to InfluxDB Cloud.
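For context, a client-library write of that kind looks roughly like the following. This is a minimal sketch only: the URL, token, org, bucket and measurement names are placeholders, not values from the original demo.

# Minimal sketch of a write with the InfluxDB Python Client library.
# All connection details below are placeholders (assumptions).
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

with InfluxDBClient(url="https://cloud2.influxdata.com", token="<TOKEN>", org="<ORG>") as client:
    write_api = client.write_api(write_options=SYNCHRONOUS)
    point = (Point("generator")                 # measurement (assumed name)
             .tag("generatorID", "generator1")  # metadata becomes a tag
             .field("temperature", 36.0)        # meter readings become fields
             .field("pressure", 70.5)
             .field("fuel", 400.0))
    write_api.write(bucket="generators", record=point)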
For this blog, I decided to reuse that simulator but replace the client library with an MQTT publisher and a Kafka producer to understand the core mechanics behind each. You can find the code for this demo here.

UNDERSTANDING THE BASICS

So what is Kafka? Kafka is described as an event streaming platform. It conforms to a publisher-subscriber architecture with the added benefit of data persistence (to understand more of the fundamentals, check out this blog). Kafka also promotes some pretty great benefits within the IoT sector:

* High throughput
* High availability
* Connectors to well-known third-party platforms

So why would I not just build my entire IoT platform using Kafka? Well, it boils down to a few key issues:

1. Kafka is built for stable networks with good infrastructure.
2. It does not provide key data delivery features such as Keep-Alive and Last Will.

Having said this, let's go ahead and compare the implementation of a basic Kafka producer to that of an MQTT publisher within the context of the emergency generator demo.

Assumptions: For the purposes of this demo, I will be making use of the Mosquitto MQTT broker and the Confluent platform (Kafka). We will not cover the initial creation/setup here, but you can consult these instructions accordingly:

1. Mosquitto Broker
2. Confluent (I highly recommend using the free trial of Confluent Cloud to sense-check whether Kafka is right for you before bogging yourself down in an on-prem setup)

INITIALIZATION

Let's start with the initialization of our MQTT publisher and Kafka producer:

MQTT

The minimum requirements for an MQTT publisher (omitting security) are as follows:

1. Host: The address/IP of the platform hosting the Mosquitto server.
2. Port: The port the MQTT publisher talks to; usually 1883 for basic connectivity, 8883 for TLS.
3. Keep Alive Interval: The amount of time in seconds allowed between communications.

self.client.connect(host=self.mqttBroker, port=self.port, keepalive=MQTT_KEEPALIVE_INTERVAL)
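In isolation, that one line really is the whole setup. Here is a self-contained sketch using paho-mqtt, whose Client.connect signature matches the call above; the broker address and keep-alive value are assumptions for illustration:

import paho.mqtt.client as mqtt

MQTT_KEEPALIVE_INTERVAL = 45  # seconds of silence before the broker assumes the client is gone (assumed value)

client = mqtt.Client()                      # paho generates a random client ID if none is given
client.connect(host="localhost", port=1883, keepalive=MQTT_KEEPALIVE_INTERVAL)
client.loop_start()                         # background network loop services keep-alive pings and acks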
KAFKA

There was a little more background work when it came to Kafka. We had to establish connectivity to two different Kafka entities:

1. Kafka cluster: This is a given, as we will be sending our payload here.
2. Schema registry: The registry lies outside the scope of the Kafka cluster. It handles the storing and delivery of topic schemas. In other words, it forces producers to deliver data in a format that is expected by the Kafka consumer. More on this later.

So let's set up connectivity to both entities:

Schema registry

schema_registry_conf = {'url': 'https://psrc-8vyvr.eu-central-1.aws.confluent.cloud',
                        'basic.auth.user.info': '<USERNAME>:<PASSWORD>'}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)

The breakdown:

* url: The address of your schema registry. Confluent supports the creation and hosting of registries.
* authentication: Like any repository, it includes basic security to keep your schema designs secure.

Kafka cluster

self.json_serializer = JSONSerializer(self.schema_str, schema_registry_client, engine_to_dict)
self.p = SerializingProducer({
    'bootstrap.servers': 'pkc-41wq6.eu-west-2.aws.confluent.cloud:9092',
    'sasl.mechanism': 'PLAIN',
    'security.protocol': 'SASL_SSL',
    'sasl.username': '######',
    'sasl.password': '######',
    'error_cb': error_cb,
    'key.serializer': StringSerializer('utf_8'),
    'value.serializer': self.json_serializer
})

The breakdown:

1. bootstrap.servers: In short, this address points to Confluent Cloud hosting our Kafka cluster; more specifically, to a Kafka broker (a cluster is made up of one or more brokers). Bootstrap refers to the producer using this initial broker to discover and establish its presence within the full cluster.
2. sasl.*: Simple Authentication and Security Layer (SASL); these settings are a minimum requirement for connecting to Confluent Kafka. I won't cover them here, as they are of no interest to our overall comparison.
3. error_cb: Registers a callback for handling Kafka errors.
4. key.serializer: This describes how the message key will be stored within Kafka. Keys are an extremely important part of how Kafka handles payloads. More on this in the next blog.
5. value.serializer: We will cover this next; in short, we must describe what type of data our producer will be sending. This is why defining our schema registry is so important.

TOPICS AND DELIVERY

Now that we have initialized our MQTT publisher and Kafka producer, it's time to send our emergency generator data. To do this, both protocols require the establishment of a topic and some data preparation before delivery:

MQTT

Within MQTT's world, a topic is a UTF-8 string that establishes logical filtering between payloads.

Topic Name     Payload
temperature    36
fuel           400

In Part 2 of this series, we break down the capabilities and differences of MQTT and Kafka topics in further detail. For now, we are going to establish one topic to send all of our emergency generator data (this is not best practice, but it is logical in the complexity build-up of this project).

message = json.dumps(data)
self.client.publish(topic="emergency_generator", payload=message)

MQTT has the benefit of being able to generate topics on demand during the delivery of a payload. If the topic already exists, the payload is simply sent to the established topic. If not, the topic is created. This makes our code relatively simple: we define our topic name and the JSON string we plan to send. MQTT payloads are extremely flexible by default, which has pros and cons. On the positive side, you do not need to define strict schema typing for your data. On the other hand, you rely on your subscribers being robust enough to handle incoming messages which fall outside the norm.
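To make that trade-off concrete, here is a sketch of the defensive parsing a subscriber ends up owning when the broker enforces no schema. The required field list mirrors the generator payload; everything else (broker address, callback wiring) is an assumption for illustration:

import json
import paho.mqtt.client as mqtt

# Fields the subscriber expects in every generator payload (assumed to mirror the demo).
REQUIRED_FIELDS = {"generatorID", "lat", "lon", "temperature", "pressure", "fuel"}

def on_message(client, userdata, msg):
    # The broker will happily deliver anything, so validation happens here.
    try:
        data = json.loads(msg.payload)
    except json.JSONDecodeError:
        print(f"Dropping non-JSON payload on {msg.topic}")
        return
    if not isinstance(data, dict) or REQUIRED_FIELDS - data.keys():
        print("Dropping payload with missing or malformed fields")
        return
    print(f"Accepted reading from {data['generatorID']}")

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883, keepalive=45)  # assumed broker details
client.subscribe("emergency_generator")
client.loop_forever()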
KAFKA

So I must admit, I came in with the foolish optimism that sending a JSON payload via Kafka would be as simple as publish(). How wrong I was! Let's walk through it:

self.schema_str = """{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Generator",
    "description": "A fuel engine's health data",
    "type": "object",
    "properties": {
        "generatorID": {"description": "Unique ID of the generator", "type": "string"},
        "lat": {"description": "latitude", "type": "number"},
        "lon": {"description": "longitude", "type": "number"},
        "temperature": {"description": "temperature", "type": "number"},
        "pressure": {"description": "pressure", "type": "number"},
        "fuel": {"description": "fuel", "type": "number"}
    },
    "required": ["generatorID", "lat", "lon", "temperature", "pressure", "fuel"]
}"""

schema_registry_conf = {'url': 'https://psrc-8vyvr.eu-central-1.aws.confluent.cloud',
                        'basic.auth.user.info': environ.get('SCHEMEA_REGISTRY_LOGIN')}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)

self.json_serializer = JSONSerializer(self.schema_str, schema_registry_client, engine_to_dict)
self.p = SerializingProducer({
    'bootstrap.servers': 'pkc-41wq6.eu-west-2.aws.confluent.cloud:9092',
    'sasl.mechanism': 'PLAIN',
    'security.protocol': 'SASL_SSL',
    'sasl.username': environ.get('SASL_USERNAME'),
    'sasl.password': environ.get('SASL_PASSWORD'),
    'error_cb': error_cb,
    'key.serializer': StringSerializer('utf_8'),
    'value.serializer': self.json_serializer
})

The first task on our list is to establish a JSON schema, which describes the expected structure of our data. In our example, we define our generator meter readings (temperature, pressure, fuel) and also our metadata (generatorID, lat, lon). Note that within the definition we declare each field's data type and which data points are required in every payload.

We have already discussed connecting to our schema registry earlier. Next, we want to register our JSON schema with the registry and create a JSON serializer. To do this we need three parameters:

1. schema_str: the schema design we just discussed
2. schema_registry_client: our client object connected to the registry
3. engine_to_dict: a custom function which builds a Python dictionary from our generator object; the JSONSerializer then converts that dictionary to JSON format

The json_serializer object is then included within the initialization of the SerializingProducer. Finally, to send data we call our producer object:

self.p.produce(topic=topic, key=str(uuid4()), value=data, on_delivery=delivery_report)

To send data to our Kafka cluster we:

1. Define our topic name (by default Kafka requires topics to be created manually, though you can enable auto-creation via settings within the broker/cluster).
2. Create a unique key for our data.
3. Provide the data we wish to publish (this will be run through our custom engine_to_dict function and the serializer).
4. Provide a delivery report: a function defined to give feedback on successful or unsuccessful delivery of the payload (see the sketch after this list).
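That delivery report is just a callback invoked with an error and the message. A minimal sketch follows; the body is my assumed example, as confluent-kafka only dictates the (err, msg) signature:

def delivery_report(err, msg):
    # Called once per produced record when the producer services events
    # (e.g. via self.p.poll(0) or self.p.flush()).
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Record delivered to {msg.topic()} [partition {msg.partition()}] at offset {msg.offset()}")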
My first impression of this strongly typed, schema-based design was: "Wow, this must leave system designers with a lot of code to maintain and a steep learning curve." But as I implemented the example, I realized you probably avert a lot of future technical debt this way. Schemas force new producers and consumers to conform to the currently intended data structure or to generate a new schema version, which allows the current system to continue unimpeded by a rogue producer connecting to your Kafka cluster. I am going to cover this in more detail within Part 2 of this blog series.

PERSPECTIVE AND CONCLUSION

So, what have we done? Well, in its most brutally simplistic form, we have created a Kafka producer and an MQTT publisher to transmit our generator data. At face value, Kafka may seem vastly more complex in its setup than MQTT for the same result; at this level, you would be correct. However, we have barely scraped the surface of what Kafka can do and how it should be deployed in a true IoT architecture. I plan to release two more blogs in this series:

* Part 2: I cover more of the features unique to Kafka, such as a deeper look into topics, scalability and third-party integrations (including InfluxDB).
* Part 3: We take what we have learned and apply best practices to a real IoT project. We will use Kafka's MQTT proxy and delve deeper into third-party integrations to get the most out of your Kafka infrastructure.

Until then, check out the code, run it, play with it, and improve it. In the next blog (Part 2 of this series), we cover topics in more detail.