www.acryldata.io
Submitted URL: https://pages.acryl.io/e3t/Ctc/GF*113/d1KqKk04/MW9pb-fKMLcW8kbWrr3RFwS1W3V7XmG57HpqnN4HzmfH3lYMRW7Y8-PT6lZ3mJW4Qj1mM8bK...
Effective URL: https://www.acryldata.io/blog/the-what-why-and-how-of-data-contracts?utm_medium=email&_hsmi=288122898&_hsenc=p2ANqtz-9n_p...
Submission: On January 08 via manual from IN — Scanned from DE
THE WHAT, WHY, AND HOW OF DATA CONTRACTS

Data Contract, Data Engineering, Metadata, Data Quality, Data Practitioner

Maggie Hays, Mar 14, 2023

Ah, Data Contracts: one of the buzziest topics in the data world. Despite the topic flooding my LinkedIn/Reddit/Substack/Medium feeds, I found myself repeatedly scratching my head, trying to make sense of the hype.

I wanted to get to the bottom of this, so I crowd-sourced questions and hosted an AMA with Chad Sanderson (one of the biggest proponents of data contracts) and Shirshanka Das (co-founder at Acryl Data) to talk about all things data contracts:

* The What: What, exactly, is a data contract?
* The Why: Why do data contracts matter? What are the core use cases behind them? What problems do they solve?
* The How: How do we implement data contracts? How do we start building them into our data stack?

There’s a lot to unpack here, so let’s dig in!

FIRST THINGS FIRST: MEET THE EXPERTS

Chad Sanderson, one of the most prolific voices in the data platform and quality space, runs the Data Quality Camp community. Chad writes at length (https://dataproducts.substack.com/) about data, data products, data modeling, and the future of data engineering and architecture.

Shirshanka Das is the CEO and Co-Founder of Acryl Data (https://www.acryldata.io/), the company maintaining the open-source DataHub project.
He spent almost a decade at LinkedIn leading its data platform strategy and founded the DataHub project. He continues to lead the charge on DataHub’s developer-led approaches for modern data discovery, quality, and automated governance.

THE WHAT: DEFINING A DATA CONTRACT

Let’s start with the basics.

WHAT, EXACTLY, IS A DATA CONTRACT?

At its core, a data contract is an agreement between a producer and a consumer that clearly defines the following:

* what data needs to move from a (producer’s) source to a (consumer’s) destination
* the shape of that data, its schema, and semantics
* expectations around availability and data quality
* details about contract violation(s) and enforcement
* how (and for how long) the consumer will use the data

DATA CONTRACTS CLEARLY DEFINE ROLES & RESPONSIBILITIES

Data contracts are bi-directional: an effective data contract sets clear expectations for both the producer and consumer of data. Even more, it holds both producers and consumers accountable for adherence to the contract and is frequently revisited and renegotiated as use cases and/or relevant parties evolve.

This ensures the producer reliably generates high-quality and timely data while enforcing how that data is used downstream. This could mean auditing who has access, how it has been shared with others, or how it has been used/replicated for unforeseen use cases.

ISN’T A DATA CONTRACT JUST A ________?

DATA CONTRACTS VS. DATASET DDL (DATA DEFINITION LANGUAGE)

Dataset DDL defines the physical storage of data: what your technology will or will not accept as a new record within the storage layer. While dataset DDL is undoubtedly a part of the data contract, it fails to capture semantic detail (what the data represents), data retention policies (how long the data can be stored), SLA/SLO requirements (when the data will reliably be available for consumption), and more.

DATA CONTRACTS VS. DATA PRODUCTS

Look at contracts as inputs to data products: a mechanism on which actual data products can be constructed and fulfilled. A data product can have multiple data contracts, and multiple data products can rely on the same data contract(s).

THE WHY: WHY SHOULD WE CARE ABOUT DATA CONTRACTS?

Data practitioners’ workflows commonly include rapid iteration and prototyping to find the specific slices of data that address business needs. Whether building BI reporting tools, analyses, or training datasets for ML models, it’s expected that data practitioners prioritize speed of delivering business value over long-term scalability.

By the time a data asset/data product is deployed to production, it’s highly likely to be multiple steps of enrichment and transformation removed from its source. The numerous layers of abstraction make it difficult for original data producers to understand which fields/attributes are critical to driving business value.

Introducing a data contract for these prod-level assets is an effective way to align producers and consumers on the following:

* technical schema requirements to be enforced upstream to minimize the impact of dropped columns, changes in data types, etc.
* field- and dataset-level quality assertions to ensure high accuracy in output; no more “garbage-in, garbage-out”
* Service Level Objectives to set guarantees of when the data will be available for processing
* retention and masking policies to minimize compliance risk
* in-scope business use cases to provide line-of-sight to data producers of how their resources are driving revenue

THE HOW: WHERE DO DATA CONTRACTS FIT WITHIN OUR STACKS?

Don’t overthink this one. You can introduce a contract anywhere you see a handoff between a producer and consumer. Keep in mind that you & your team may act as the producer *and* the consumer in your ETL pipelines.
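
To make the handoff concrete, here is a minimal sketch of how the elements of a contract discussed so far (schema and semantics, quality assertions, an availability SLO, retention, and in-scope use cases) could be captured in one structure, along with a check of a single record against it. The class and field names (`DataContract`, `FieldSpec`, `freshness_slo`, and so on) and the example `orders` contract are hypothetical, chosen for illustration; this is not a DataHub API or a standard contract format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import timedelta
from typing import Any, Callable


@dataclass(frozen=True)
class FieldSpec:
    """One field in the contract: its name, type, and meaning (hypothetical)."""
    name: str
    dtype: type
    description: str          # semantics: what the value represents
    nullable: bool = False


@dataclass(frozen=True)
class DataContract:
    """Hypothetical container for what producer and consumer agree on."""
    name: str
    version: str
    producer: str
    consumers: tuple[str, ...]
    fields: tuple[FieldSpec, ...]
    freshness_slo: timedelta             # when the data should be available
    retention: timedelta                 # how long consumers may keep it
    use_cases: tuple[str, ...] = ()      # in-scope business use cases
    # Named quality assertions: each takes a record, returns True if it passes.
    assertions: dict[str, Callable[[dict], bool]] = field(default_factory=dict)

    def violations(self, record: dict[str, Any]) -> list[str]:
        """Return human-readable problems; an empty list means the record conforms."""
        problems = []
        for spec in self.fields:
            value = record.get(spec.name)
            if value is None:
                if not spec.nullable:
                    problems.append(f"{spec.name}: missing required value")
            elif not isinstance(value, spec.dtype):
                problems.append(
                    f"{spec.name}: expected {spec.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
        if not problems:  # run quality assertions only on well-formed records
            for label, check in self.assertions.items():
                if not check(record):
                    problems.append(f"assertion failed: {label}")
        return problems


# Illustrative contract; every value here is made up for the example.
orders = DataContract(
    name="orders",
    version="1.0.0",
    producer="checkout-service",
    consumers=("finance-reporting",),
    fields=(
        FieldSpec("order_id", str, "Unique identifier of the order"),
        FieldSpec("amount_usd", float, "Order total in US dollars"),
        FieldSpec("coupon", str, "Coupon code, if one was applied", nullable=True),
    ),
    freshness_slo=timedelta(hours=1),
    retention=timedelta(days=365),
    use_cases=("monthly revenue reporting",),
    assertions={"amount_usd is non-negative": lambda r: r["amount_usd"] >= 0},
)

good = {"order_id": "o-1", "amount_usd": 19.99}
bad = {"order_id": "o-2", "amount_usd": -5.0}

print(orders.violations(good))
print(orders.violations(bad))
```

Keeping the contract as plain, declarative data is a deliberate choice in this sketch: it can be reviewed and versioned like any other code, and the same definition can be reused wherever a check needs to run.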
No matter where that handoff happens, contracts should be version-controlled, easily discoverable, and programmatically enforced. One suggestion is to define your technical schema with Protobuf, Avro, or the like and store it within a registry. If you use Kafka or Confluent, the Confluent Schema Registry is a great starting point, but even GitHub works just fine to store contracts.

While you need a way to discover/catalog your contracts, you must also detect and flag violations and take action based on them. This means you must run monitors, programmatically prevent breaking changes, and isolate bad data for review. Here are three ways to take action against violations:

* The CI/CD workflow: for example, evaluate and prevent schema-breaking changes before they are deployed.
* On the data itself: if you’re using a stream processing system, you can check each data record to validate that it meets the contract’s expectations. Any contract violations are sent to an isolated queue for review, preventing low-quality records from entering the data product.
* Through a monitoring layer: after the data arrives, you can look at the statistical distributions of the data and detect any unexpected changes in the shape of the data.

MAKING A BUSINESS CASE FOR DATA CONTRACTS

> You manage the rest of your software as code. Why not your data?

This, Shirshanka shared, resonates with executive leaders, given that they are already bought in on the idea. Focus on the principle of ‘managing data using software engineering practices.’ The most effective way to secure funding for data contracts is to take advantage of existing initiatives and implement contracts iteratively on a subset of the data stack.

MANAGING DATA CONTRACTS AT SCALE

The big challenge in managing contracts is less a technical challenge and more a social and cultural one.
You need to get people who don’t think about downstream data use cases to change their approach and consider playing an active engineering role around the data. Here’s an approach Chad recommends based on his work at Convoy:

STEP 1: SPREAD AWARENESS

The first step is building awareness of how producers’ data is leveraged downstream. Convoy had a data contract mechanism for defining column-level dependencies between data sources. Any time an engineer went to change a data source, they could easily see what impact that would have on downstream assets: what would potentially break, the use case, and how important it was. That went a long way in helping engineers understand the impact of breaking the contract and in generating accountability.

STEP 2: MEET PEOPLE WHERE THEY ARE

At Convoy, a contract was implemented and defined through a schema registry and a schema serialization framework. Software engineers would use an SDK to define and push new versions of contracts. If backward-incompatible changes were detected, they surfaced in the engineers’ GitHub workflow.

Whenever possible, meet people where they are and introduce as little change to their existing workflows as possible. The more a contract process deviates from their current workflow, the harder it will be to scale.

DATA CONTRACTS AND THE MODERN DATA CATALOG

The cost of creating a single data contract is non-trivial, and managing a large volume of contracts can quickly become challenging; you must ensure that you’re creating contracts on the most valuable data assets.

The data catalog and its underlying metadata graph can help you prioritize which assets require a contract by using the following:

* data lineage to understand how often business-critical downstream assets reference a dataset
* data quality assertions and profiling results to determine a dataset’s reliability

Companies like Optum, Saxo Bank, Zendesk, etc., already use this approach.
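
The prioritization described above can be sketched as a simple ranking over catalog metadata. This is only an illustration under stated assumptions: the `DatasetStats` fields, the scoring formula (critical downstream references, weighted up by assertion failure rate), and the sample numbers are all hypothetical, and in practice the inputs would come from your catalog’s lineage and data quality results rather than hard-coded values.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatasetStats:
    """Per-dataset signals a catalog could provide (hypothetical shape)."""
    name: str
    critical_downstream_refs: int  # lineage: business-critical assets reading it
    assertions_run: int            # data quality: checks executed
    assertions_failed: int         # data quality: checks that failed


def failure_rate(stats: DatasetStats) -> float:
    """Share of failed assertions; no history is treated as fully unreliable
    (an assumption of this sketch)."""
    if stats.assertions_run == 0:
        return 1.0
    return stats.assertions_failed / stats.assertions_run


def contract_priority(stats: DatasetStats) -> float:
    """Assumed heuristic: importance (lineage) scaled up by unreliability."""
    return stats.critical_downstream_refs * (1.0 + failure_rate(stats))


def rank_for_contracts(datasets: list[DatasetStats]) -> list[DatasetStats]:
    """Order datasets so the strongest contract candidates come first."""
    return sorted(datasets, key=contract_priority, reverse=True)


# Made-up numbers, for illustration only.
candidates = [
    DatasetStats("clicks", critical_downstream_refs=3, assertions_run=50, assertions_failed=25),
    DatasetStats("users", critical_downstream_refs=12, assertions_run=100, assertions_failed=0),
    DatasetStats("scratch", critical_downstream_refs=0, assertions_run=0, assertions_failed=0),
    DatasetStats("orders", critical_downstream_refs=12, assertions_run=100, assertions_failed=20),
]

for stats in rank_for_contracts(candidates):
    print(stats.name, round(contract_priority(stats), 2))
```

With this heuristic, a dataset nobody critical depends on scores zero no matter how unreliable it is, which matches the advice to spend contract effort on the most valuable assets first.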
If you’re looking for inspiration, check out how Stripe uses DataHub to solve their observability challenges by encoding their data contracts in the Airflow DAGs.

STARTING THE DATA CONTRACT JOURNEY: ADVICE AND RECOMMENDATIONS

START SMALL

Start with valuable, revenue-generating use cases. Introduce constraints gradually: begin with one or two meaningful and easy-to-debug constraints and introduce more nuanced use cases over time.

LEVERAGE WHAT YOU HAVE

Don’t look at data contracts as a net-new phenomenon. Maybe you’re already using dbt Tests or encoding quality checks within your Airflow DAGs; treat that as your starting point and build from there.

Phew, we made it through. I hope this cleared up a concept or two to help you get started with data contracts. Best of luck on your data contract journey!