URL:
https://www.zillow.com/tech/how-to-pitch-apache-kafka/
HOW TO PITCH APACHE KAFKA
Shahar Cizer Kobrinsky • Aug 27 2020

Imagine you are a senior engineer working at a company that runs its tech stack on top of AWS. Your tech organization is probably using a variety of AWS services, including messaging services like SQS, SNS and Kinesis. As someone who reads technical blog posts once in a while, you realize Apache Kafka is a pretty popular technology for event streaming. You read that it supports lower latencies, higher throughput and longer retention periods, that it is used in the largest tech organizations, and that it is one of the most popular Apache projects. You hop over to your (now virtual) Architect / CTO / VP’s seat and tell her you should use Kafka. Following a quick POC you report back that yes, the throughput is great, but you couldn’t support it yourself because of its complexity and the many knobs you need to turn to make it work properly. That’s pretty much where it stays, and you let it go.

BEING ON BOTH SIDES

That is pretty much what I went through, from the architect’s side, at my previous company: I did not think it was justified to add a few engineers for better technical performance. I simply did not see the ROI. When you focus your argument solely on the technical benefits (of any technology) in front of decision makers, you are not doing yourself any favors and you will miss out. Ask yourself what the impact on your organization is, what challenges your organization faces with data, and what people are spending their time on when they should be innovating on data.

Having worked on Zillow Group’s Data Platform for the past couple of years and looked at the broader challenges of the Data Engineering group, it was time for me to be on the other side, pitching the value of Kafka to managers and executives. This time I researched it more thoroughly, including the business value it would bring.

CLOUD PROVIDERS ARE ONLY HALF MAGIC

See, the democratization of infrastructure by cloud providers made it easy to just spin up the service you need, winning over on-premises solutions not only on cost but also on developer experience. Consider a case where my team needs to generate data about users’ interactions with their “Saved Homes” and send it to our push notification system. The team decides to provision a Kinesis stream to do that. How would other people in the company know that we did that? Where would they go to look for such data (especially when using multiple AWS accounts)? How would they know the meta information about the data (schema, description, interoperability, quality guarantees, availability information, partitioning scheme and much more)?

[Figure: Creating a Kinesis Stream with Terraform]

For a vast number of companies, data is the number one asset and source of innovation.
Democratizing data infrastructure without a standardized way of defining metadata, common ingestion/consumption patterns and quality guarantees can slow down innovation from data or turn dependencies into a nightmare. Think about the poor team trying to find the data in the sea of Kinesis streams (oh hello AWS Console UI 🙁) and AWS accounts used in the company. Once they do find it, how would they know the format of the data? How would they know what the field “is_tyrd” means? What would their production service do once the schema changes? Many RCAs have been born simply because of that.

In reality, as the company grows, so does the complexity of its data pipelines. It’s no longer a single producer-consumer relationship, but rather many producers, many middle steps, intermediary consumers that are also producers, joins, transformations and aggregations, any of which may end in a failing customer report (at best) or in bad data impacting company revenue. Which team should be notified when the reporting database has corrupted data?

None of that has much to do with either Kinesis or Kafka specifically; it is mostly about understanding that the cloud providers’ level of abstraction and “platform” ecosystem is, as it is, simply not enough to help mid-size and large companies innovate on data.

IT IS THE ECOSYSTEM STUPID

With Kafka, first and foremost, you have an ecosystem led by the Confluent Schema Registry. The combination of validation-on-write and schema evolution has a huge impact on data consumers. Using the Schema Registry (assuming a compatibility mode is enforced) guarantees your consumers that they will not crash due to deserialization errors. Producers can feel comfortable evolving their schemas without the risk of impacting downstream consumers, and the registry itself provides a way for all interested parties to understand the data, at least at the schema level.

[Figure: Schema Management using Confluent Schema Registry]

Kafka Connect is another important piece of the ecosystem.
While AWS services are great at integrating with each other, they are less than great at integrating with everything else. The closed garden simply doesn’t allow the community to come together and build those integrations. While Kinesis has integration capabilities through DMS, it is well shy of the Kafka Connect ecosystem of integrations, and connecting your data streams to other systems in an easy and common way is another key to getting more from your data.

A more technical piece of the ecosystem I’ll discuss is the client library. The Kinesis Producer Library runs as a C++ daemon process that is somewhat of a black box, which makes it harder to debug and to keep from going rogue. The Kinesis Client Library (KCL), used on the consumer side, is coupled with DynamoDB for offset management, which is another component to worry about (getting throughput exceptions, for example). In my last two companies we actually implemented our own thinner (simpler) version of the Kinesis Producer Library. In that sense it is again the open source community and the popularity of Kafka that help in having more mature clients (with the bonus of offsets being stored within Kafka).

And then you get to a somewhat infamous point about AWS Kinesis: its read limits. A Kinesis shard allows up to 5 read transactions per second. On top of the inherent latency that limit introduces, it is the coupling of different consumers that is most bothersome. The entire premise of streaming is to decouple business areas so as to reduce coordination and simply have data as an API. Sharing throughput across consumers forces each one to be aware of the others, to make sure consumers are not eating away at each other’s throughput. You can mitigate that challenge with Kinesis Enhanced Fan-Out, but it costs a fair bit more. Kafka, on the other hand, is bound by resources rather than explicit limits. If your network, memory, CPU and disk can handle additional consumers, no such coordination is needed.

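To put a rough number on the shared read limit described above, here is a minimal Python sketch. It takes the 5 read transactions per second per shard from the text and assumes, purely for illustration, that the budget is split evenly between consumers polling the same shard without Enhanced Fan-Out. The function name and the even-split model are hypothetical simplifications, not part of any AWS API.

```python
# Figure taken from the text above: a shard serves up to 5 read
# transactions per second, shared by all consumers polling it.
SHARD_READ_TRANSACTIONS_PER_SECOND = 5


def min_poll_interval_ms(num_consumers: int) -> float:
    """Shortest sustainable gap between two polls by the same consumer.

    Assumes (for illustration only) that the shard's read budget is split
    evenly between consumers and that no Enhanced Fan-Out is in use.
    """
    if num_consumers < 1:
        raise ValueError("num_consumers must be at least 1")
    polls_per_second = SHARD_READ_TRANSACTIONS_PER_SECOND / num_consumers
    return 1000.0 / polls_per_second


for consumers in (1, 2, 5, 10):
    interval = min_poll_interval_ms(consumers)
    print(f"{consumers:>2} consumer(s): at most one poll every {interval:.0f} ms each")
```

Under these assumptions a single consumer can poll every 200 ms, while five consumers sharing the shard can each poll only once per second. Every consumer added makes the existing ones slower, which is the kind of coupling between teams that streaming is supposed to remove.
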
It is worth noting that Kinesis, being a managed service, has to be tuned to fit the majority of customer workloads (one size fits all), while your own Kafka can be tailored to fit yours.

GREAT, KAFKA. NOW WHAT?

But know (and let your executives know) that all this good stuff is not enough to reach a nirvana of data innovation, which is why our investment in Kafka included a Streaming Platform team. The goal of the team is to make Kafka the central nervous system of events across Zillow Group. We drew inspiration from companies like PayPal and Expedia and made these decisions:

* We will delight our customers by meeting them where they are. Most of Zillow Group uses Terraform as its Infrastructure as Code solution, so we decided to build a Terraform provider for our platform. This also helps us balance decentralization (not having a central team that needs to approve every production topic) against control (think of how you would prevent someone from creating a topic with 10,000 partitions).
* We will invest heavily in metadata for discoverability. The provisioning experience will include all the metadata necessary to discover ownership, description, data privacy info, data lineage and schema information (Kafka only allows linking schemas by naming conventions). We will connect the metadata with our company’s Data Portal, which helps people navigate the entire catalog of data and removes the dependency on tribal knowledge.
* We will help our customers adopt Kafka by removing the need to get into the complex details whenever possible. A configuration service that injects whatever producer/consumer configs you may require helps achieve that, along with a set of client libraries for the most common use cases.
* We will build company-wide data ingestion patterns, mostly using Kafka Connect, but also by integrating with our Data Streaming Platform service, which proxies Kafka.
* We will connect with our Data Governance team as they build Data Contracts and Anomaly Detection services, to be able to provide guarantees about the data within Kafka and prevent the scenario of data engineers chasing upstream teams to understand what went wrong with the data.

[Figure: Resource Provisioning System using Terraform]

Lastly, before your pitch, get to know the numbers. How much does your organization spend on those AWS services? How much (time/effort/$$) does it spend on the pain points I mentioned? Go ahead and research your different deployment options, from vanilla Kafka to Confluent (on-premises and cloud) and the newer AWS MSK. Good luck with your pitch!