URL: https://www.zillow.com/tech/how-to-pitch-apache-kafka/


Software Engineering


HOW TO PITCH APACHE KAFKA

Shahar Cizer Kobrinsky • Aug 27 2020

Imagine you are a senior engineer at a company running its tech stack on top of
AWS. Your tech organization is probably using a variety of AWS services,
including messaging services like SQS, SNS and Kinesis. As someone who reads
technical blog posts once in a while, you realize Apache Kafka is a pretty
popular technology for event streaming. You read that it supports lower
latencies, higher throughput and longer retention periods, that it is used in
the largest tech organizations, and that it is one of the most popular Apache
projects. You hop into your (now virtual) Architect's / CTO's / VP's office and
tell her you should use Kafka. Following a quick POC you report back that, yes,
the throughput is great, but you couldn't support it yourself because of its
complexity and the many knobs you need to turn to make it work properly. That's
pretty much where it stays, and you let it go.


BEING ON BOTH SIDES

That is pretty much what I went through from the architect side at my previous
company: I did not think it was justified to add a few engineers for better
technical performance. I simply did not see the ROI. When you focus your
argument solely on the technical benefits (of any technology) to decision makers
at a company, you are not doing yourself any favors and you will miss out. Ask
yourself what the impact on your organization would be, what challenges your
organization faces with data, and what people are investing their time on when
they should be innovating on data.

Working on Zillow Group's Data Platform for the past couple of years and looking
at the broader challenges of the Data Engineering group, it was time for me to
be on the other side, pitching the value of Kafka to managers and executives.
This time I researched it more thoroughly, including the business value it would
bring.


CLOUD PROVIDERS ARE ONLY HALF MAGIC

See, the democratization of infrastructure by cloud providers made it easy to
just spin up the service you need, winning over on-premises solutions not only
from a cost point of view but also from a developer-experience one. Consider a
case where my team needs to generate data about users' interactions with their
"Saved Homes" and send it to our push notification system. The team decides to
provision a Kinesis stream to do that. How would other people in the company
know that we did that? Where would they go to look for such data (especially
when using multiple AWS accounts)? How would they know the meta information
about the data (schema, description, interoperability, quality guarantees,
availability information, partitioning scheme and much more)?

Creating a Kinesis Stream with Terraform
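The discoverability questions above can be made concrete. Here is a minimal sketch of the kind of metadata record that a bare stream resource does not carry; the field names and values are illustrative, not an actual Zillow schema:

```python
from dataclasses import dataclass, field

@dataclass
class StreamMetadata:
    """Illustrative metadata that a bare cloud stream resource lacks."""
    name: str
    owner_team: str
    description: str
    schema_subject: str                  # where the schema lives, e.g. a registry subject
    partition_key: str                   # how the data is partitioned
    quality_guarantees: list = field(default_factory=list)

saved_homes = StreamMetadata(
    name="saved-homes-interactions",
    owner_team="push-notifications",
    description="User interactions with their Saved Homes",
    schema_subject="saved-homes-interactions-value",
    partition_key="user_id",
    quality_guarantees=["at-least-once delivery"],
)

# A downstream team can now answer "who owns this?" without tribal knowledge.
print(saved_homes.owner_team)
```

Without a standard place to register something like this, every one of those questions becomes a Slack thread or, worse, a guess.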

For a vast number of companies, data is the number one asset and source of
innovation. Democratizing data infrastructure without a standardized way of
defining metadata, common ingestion/consumption patterns and quality guarantees
can slow down innovation from data or make dependencies a nightmare.

Think about the poor team trying to find the data in the sea of Kinesis streams
(oh hello, AWS Console UI 🙁) and AWS accounts used in the company. Once they do
find it, how would they know the format of the data? How would they know what
the field "is_tyrd" means? What would their production service do once the
schema changes? Many RCAs have been born simply because of that. In reality, as
the company grows, so do the complexities of its data pipelines. It's no longer
a producer-consumer relationship, but rather many producers, many middle steps,
intermediary consumers who are also producers, joins, transformations and
aggregations, which may end up in a failing customer report (at best) or bad
data impacting the company's revenue.

Which team should be notified when the reporting database has corrupted data?


None of that really has much to do with either Kinesis or Kafka; it is mostly
about understanding that the cloud providers' level of abstraction and
"platform" ecosystem, as it stands, is simply not enough to help mid-size and
large companies innovate on data.


IT IS THE ECOSYSTEM, STUPID

With Kafka, first and foremost, you get an ecosystem, led by the Confluent
Schema Registry. The combination of validation-on-write and schema evolution has
a huge impact on data consumers. Using the Schema Registry (assuming a
compatibility mode is set) guarantees your consumers that they will not crash
due to deserialization errors. Producers can feel comfortable evolving their
schemas without the risk of impacting downstream consumers, and the registry
itself provides a way for all interested parties to understand the data. At
least at the schema level.

Schema Management using Confluent Schema Registry
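The registry enforces compatibility on write; to see why that keeps consumers from crashing, here is the core idea in plain Python (a sketch of the mechanism, not the Confluent client API): a backward-compatible change adds an optional field with a default, so a reader on the new schema can still decode records produced under the old one:

```python
# Old producer schema had: user_id, home_id.
# The new schema adds an optional field with a default -- a backward-compatible change.
NEW_SCHEMA = {
    "user_id": {"required": True},
    "home_id": {"required": True},
    "source":  {"required": False, "default": "web"},
}

def decode(record: dict, schema: dict) -> dict:
    """Fill defaults for missing optional fields; fail only on missing required ones."""
    out = {}
    for name, spec in schema.items():
        if name in record:
            out[name] = record[name]
        elif not spec["required"]:
            out[name] = spec["default"]
        else:
            raise ValueError(f"missing required field {name!r}")
    return out

old_record = {"user_id": 42, "home_id": 7}  # produced before the schema change
print(decode(old_record, NEW_SCHEMA))
```

An incompatible change, such as removing a required field, would be rejected by the registry before any producer could write it, which is exactly the guarantee consumers rely on.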


Kafka Connect is another important piece of the ecosystem. While AWS services
are great at integrating among themselves, they are less than great at
integrating with everything else. The walled garden simply doesn't allow the
community to come together and build those integrations. Kinesis has some
integration capabilities through DMS, but it falls well short of the Kafka
Connect ecosystem of integrations, and connecting your data streams to other
systems in an easy and common way is another key to getting more from your
data.

A more technical piece of the ecosystem I'll discuss is the client library. The
Kinesis Producer Library is a daemon C++ process: somewhat of a black box that
is harder to debug and maintain when it goes rogue. The Kinesis Consumer Library
is coupled with DynamoDB for offset management, which is another component to
worry about (getting throughput exceptions, for example). In my last two
companies we actually implemented our own thinner (simpler) version of the
Kinesis Producer Library. In that sense it is again the open source community
and the popularity of Kafka that help in having more mature clients (with the
bonus of offsets being stored within Kafka itself).

And then you get to a somewhat infamous point about AWS Kinesis: its read
limits. A Kinesis shard allows you to make up to 5 read transactions per second.
On top of the inherent latency that limit introduces, it is the coupling of
different consumers that is most bothersome. The entire premise of streaming is
about decoupling business areas in a way that reduces coordination and exposes
data as an API. Sharing throughput across consumers forces each one to be aware
of the others to make sure they are not eating away each other's throughput. You
can mitigate that challenge with Kinesis Enhanced Fan-Out, but it costs a fair
bit more. Kafka, on the other hand, is bound by resources rather than explicit
limits: if your network, memory, CPU and disk can handle additional consumers,
no such coordination is needed. It is worth noting that Kinesis, being a managed
service, has to be tuned to fit the majority of customer workloads (one size
fits all), while you can tailor your own Kafka deployment to fit your needs.
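The coupling is easy to quantify. If consumers share a shard's 5 read transactions per second evenly, every consumer added to the stream stretches everyone's polling interval; a rough back-of-the-envelope sketch (ignoring throttling and backoff overhead):

```python
READS_PER_SEC_PER_SHARD = 5  # the Kinesis per-shard read-transaction limit

def min_poll_interval(num_consumers: int) -> float:
    """Best-case seconds between polls for each consumer when they
    split one shard's read budget evenly."""
    return num_consumers / READS_PER_SEC_PER_SHARD

for n in (1, 3, 5, 10):
    print(f"{n} consumers -> each polls at best every {min_poll_interval(n):.1f}s")
```

A tenth consumer pushes everyone to a two-second best-case poll interval; with Kafka there is no equivalent per-partition read-transaction cap, only whatever the broker's hardware can sustain.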


GREAT, KAFKA. NOW WHAT?

But know (and let your executives know) that all this good stuff is not enough
to reach a nirvana of data innovation, which is why our investment in Kafka
included a Streaming Platform team.

The goal for the team is to make Kafka the central nervous system for events
across Zillow Group. We took inspiration from companies like PayPal and Expedia
and made these decisions:

 * We’ll delight our customers by meeting them where they are-Most of Zillow
   Group is using Terraform for its Infrastructure as Code solution. We have
   decided to build a Terraform provider for our platform. This also helps us to
   balance out between decentralization (not having a central team that needs to
   approve every production topic) and control (think how you prevent someone
   from creating a 10000 partitions topic).
   
 * We will invest heavily in metadata for discoverability The provisioning
   experience will include all necessary metadata to discover ownership,
   description, data privacy info, data lineage and schema information (Kafka
   only allows linking schemas by naming conventions). We will connect the
   metadata with our company’s Data Portal which helps people navigate through
   the entire catalog of data and removes tribal knowledge dependency.
 * We will help our customers adopt Kafka by removing the need to get into the
   complex details whenever possible – A configuration service that injects
   whatever producer/consumer configs you may require is helping achieve that,
   along with a set of client libraries for the most common use cases.
 * We will build company wide data ingestion patterns – mostly using Kafka
   Connect, but also by integrating with our Data Streaming Platform service
   which proxies Kafka.
 * We will connect with our Data Governance team as they build Data Contracts
   and Anomaly detection services – to be able to provide guarantees about the
   data within Kafka, and prevent the scenario of data engineers chasing
   upstream teams to understand what went wrong with the data.

Resource Provisioning System using Terraform
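The balance between decentralization and control described above comes down to validating requests at provisioning time. A minimal sketch of the kind of guardrail such a provider backend might apply; the limits and field names here are hypothetical, not Zillow's actual policy:

```python
MAX_PARTITIONS = 100  # hypothetical org-wide cap on partitions per topic
REQUIRED_METADATA = ("owner_team", "description", "schema_subject")

def validate_topic_request(request: dict) -> list:
    """Return a list of policy violations; an empty list means the topic
    can be provisioned without central-team approval."""
    errors = []
    if request.get("partitions", 0) > MAX_PARTITIONS:
        errors.append(f"partitions exceed cap of {MAX_PARTITIONS}")
    for key in REQUIRED_METADATA:
        if not request.get(key):
            errors.append(f"missing metadata: {key}")
    return errors

ok = validate_topic_request({
    "name": "saved-homes-interactions",
    "partitions": 12,
    "owner_team": "push-notifications",
    "description": "User interactions with Saved Homes",
    "schema_subject": "saved-homes-interactions-value",
})
bad = validate_topic_request({"name": "huge-topic", "partitions": 10000})
print(ok)   # an empty list: provisioning proceeds
print(bad)  # partition cap exceeded, plus three missing metadata fields
```

Teams self-serve through the IaC workflow they already know, while the platform rejects a 10,000-partition topic or an ownerless stream before it ever exists.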

Lastly, before your pitch, get to know the numbers. How much does your
organization spend on those AWS services? How much (time/effort/$$) does it
spend on the pain points I mentioned? Go ahead and research your different
deployment options, from vanilla Kafka to Confluent (on-premises and Cloud) and
the newer AWS MSK.

Good luck with your pitch! 

