DATA RELIABILITY BEST PRACTICES



In this article:
Solving Increasingly Complex Data Supply Chain Issues
The Role of Data Reliability
Cost Implications of Poor Data
Best Practices for Data Reliability
Effective Use of Automation
Get the Entire Team Involved
Resulting Benefits From Data Reliability
Conclusion


Best practices provide the foundation on which great data teams can optimize
their platforms, processes, and operations. They are well-established in many
mature product categories and provide guardrails that enable development and
engineering teams to innovate, move quickly, and adapt to changing product and
market needs.

In emerging sectors such as data observability, best practices not only allow
data teams to optimize their efforts but also serve as a guide for what to do
and how to do it.

In this guide, we will outline some best practices for data reliability, which
is an essential component of data observability. As data engineering teams ramp
up their data reliability efforts, these best practices can show teams how to
effectively scale their efforts in ways that don’t require significant
investment in new resources.


SOLVING INCREASINGLY COMPLEX DATA SUPPLY CHAIN ISSUES

As analytics have become increasingly critical to an organization’s operations,
more data than ever is being captured and fed into analytics data stores, which
helps enterprises make decisions with greater accuracy. 

This data comes from a variety of sources: internally from applications and
repositories, and externally from service providers and independent data
producers. For companies that produce data products, an even greater percentage
of their data may come from external sources. And since the end product is the
data itself, reliably bringing that data together at a high level of quality is
critical. In essence, high-quality data helps an organization achieve
competitive advantages and continuously deliver innovative, market-leading
products. Poor-quality data delivers bad outcomes and creates bad products, and
that can break the business.

The data pipelines that feed and transform data for consumption are increasingly
complex. The pipelines can break at any point due to data errors, poor logic, or
the necessary resources not being available to process the data.


THE ROLE OF DATA RELIABILITY

The data within the pipelines that manage the data supply chain can generally
be broken down into three zones:

 * The landing zone, where source data is fed,
 * The transformation zone, where data is transformed into its final format, and
 * The consumption zone, where data is in its final format and is accessed by
   users.

In the past, most organizations would only apply data quality tests in the final
consumption zone due to resource and testing limitations.  The role of modern
data reliability is to check data in any of these three zones as well as to
monitor the data pipelines that are moving and transforming the data.


COST IMPLICATIONS OF POOR DATA

In software development, as in other processes, there is the 1 x 10 x 100 rule,
which describes the cost of fixing problems at different stages of the process.
In essence, it says that for every $1 it costs to detect and fix a problem in
development, it costs $10 to fix that problem when it is detected in the
QA/staging phase, and $100 to detect and fix it once the software is in
production.

The same rule can be applied to data pipelines and supply chains. For every $1
it costs to detect and fix a problem in the landing zone, it costs $10 to detect
and fix a problem in the transformation zone, and $100 to detect and fix it in
the consumption zone.

To effectively manage data and data pipelines, data incidents need to be
detected as early as possible in the supply chain. This helps data team managers
optimize resources, control costs, and produce the best possible data product.
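
To make the rule concrete, here is a back-of-the-envelope calculation in Python
(the base cost and incident counts are hypothetical numbers, used only to
illustrate the arithmetic):

    # Hypothetical illustration of the 1 x 10 x 100 rule for data pipelines.
    # The base cost and incident counts are made-up numbers for the example.
    BASE_COST = 100  # dollars to fix one incident caught in the landing zone
    MULTIPLIERS = {"landing": 1, "transformation": 10, "consumption": 100}

    # The same 30 incidents, caught at different stages:
    early = {"landing": 25, "transformation": 4, "consumption": 1}
    late = {"landing": 1, "transformation": 4, "consumption": 25}

    def total_cost(incidents_by_zone: dict[str, int]) -> int:
        return sum(BASE_COST * MULTIPLIERS[zone] * count
                   for zone, count in incidents_by_zone.items())

    print(total_cost(early))  # 16500: most incidents caught early
    print(total_cost(late))   # 254100: most incidents reach consumption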


BEST PRACTICES FOR DATA RELIABILITY

As with many other processes, both in the software world and other industries,
utilizing best practices for data reliability allows data teams to operate
effectively and efficiently. Following best practices helps teams produce
valuable, consumable data and deliver according to service level agreements
(SLAs) with the business.

Best practices also allow data teams to scale their data reliability efforts in
these ways: 

 * Scaling up to increase the number of quality tests on a data asset.
 * Scaling out to increase the number of data assets that are covered.
 * Scaling data incident management to correct issues quickly.

Let’s explore some areas of best practices for data reliability.


DATA RELIABILITY ACROSS THE ENTIRE SUPPLY CHAIN

We mentioned earlier how data supply chains have become increasingly complex.
This complexity manifests in things like:

 * The growing number of sources feeding in data.
 * The sophistication of the logic used to transform the data.
 * The amount of resources required to process the data.

We grouped data roughly into three zones – the landing zone, the transformation
zone, and the consumption zone. Our first best practice is to apply data
reliability checks across all three zones as well as over the data pipelines
themselves; a minimal sketch follows the list below. This allows us to detect
and remediate issues such as:

 * Erroneous or low-quality data from sources in the landing zone.
 * Poor-quality data in the transformation and consumption zones due to faulty
   logic or pipeline breakdowns.
 * Stale data in the consumption zone due to data pipeline failures.
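
As a rough illustration of what zone-aware checks can look like, consider the
following sketch (the column names, thresholds, and check functions are
invented for the example, and pandas is assumed to be available):

    # Illustrative sketch of zone-aware reliability checks; the column names,
    # thresholds, and check functions are invented, not a vendor API.
    from datetime import datetime, timedelta, timezone

    import pandas as pd  # assumed available for the example

    def check_landing_schema(df: pd.DataFrame) -> bool:
        """Landing zone: catch erroneous source data on arrival."""
        expected = {"order_id", "amount", "ordered_at"}  # hypothetical columns
        return expected.issubset(df.columns)

    def check_transform_nulls(df: pd.DataFrame) -> bool:
        """Transformation zone: faulty logic often shows up as null explosions."""
        return df["amount"].isna().mean() < 0.01  # tolerate <1% nulls

    def check_consumption_freshness(df: pd.DataFrame) -> bool:
        """Consumption zone: stale data usually means a pipeline failure."""
        newest = pd.to_datetime(df["ordered_at"].max(), utc=True)
        return datetime.now(timezone.utc) - newest < timedelta(hours=24)

    CHECKS = {
        "landing": [check_landing_schema],
        "transformation": [check_transform_nulls],
        "consumption": [check_consumption_freshness],
    }

    def failed_checks(zone: str, df: pd.DataFrame) -> list[str]:
        """Run every check registered for a zone; return the names of failures."""
        return [check.__name__ for check in CHECKS[zone] if not check(df)]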


SHIFT-LEFT DATA RELIABILITY

Consider that data pipelines flow data from left to right from sources into the
data landing zone, transformation zone, and consumption zone. Where data was
once only checked in the consumption zone, today’s best practices call for data
teams to “shift-left” their data reliability checks into the data landing zone.

The result of shift-left data reliability is earlier detection and faster
correction of data incidents. It also keeps bad data from spreading further
downstream, where it might be consumed by users and lead to poor, misinformed
decision-making.

The 1 x 10 x 100 rule applies here. Earlier detection means data incidents are
corrected quickly and efficiently at the lowest possible cost (the $1). If data
issues were to spread downstream, they would impact more data assets and become
far more costly to correct (the $10 or $100).
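
A minimal sketch of shift-left validation might look like the following, where
records are checked as they land and failures are quarantined rather than passed
downstream (the validation rules and file layout are hypothetical):

    # Sketch: validate records in the landing zone and quarantine failures so
    # bad rows never propagate downstream. Rules and paths are hypothetical.
    import csv

    def is_valid(row: dict) -> bool:
        # Example landing-zone rules: id present, amount parses, non-negative.
        try:
            return row["order_id"] != "" and float(row["amount"]) >= 0
        except (KeyError, ValueError):
            return False

    def ingest(src_path: str, clean_path: str, quarantine_path: str) -> None:
        with open(src_path, newline="") as src, \
             open(clean_path, "w", newline="") as ok, \
             open(quarantine_path, "w", newline="") as bad:
            reader = csv.DictReader(src)
            ok_writer = csv.DictWriter(ok, fieldnames=reader.fieldnames)
            bad_writer = csv.DictWriter(bad, fieldnames=reader.fieldnames)
            ok_writer.writeheader()
            bad_writer.writeheader()
            for row in reader:
                (ok_writer if is_valid(row) else bad_writer).writerow(row)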


EFFECTIVE USE OF AUTOMATION

With data environments becoming increasingly sophisticated, manually writing a
large number and variety of data checks is time-consuming and error-prone. A
third best practice is to make effective use of the automation features in a
data reliability solution.

The Acceldata Data Observability platform combines artificial intelligence,
metadata capture, data profiling, and data lineage to provide insight into the
structure and composition of your data assets and pipelines; a generic sketch
of this profiling-driven approach follows the list below. Using AI, Acceldata:

 * Scours the data looking for multiple ways in which it can be checked for
   quality and reliability issues.
 * Makes recommendations to the data team on what rules/policies to use and
   automates the process of putting the policies in place.
 * Automates the process of running the policies and constantly checks the data
   assets against the rules.
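
Acceldata's implementation is proprietary, but the general idea of
profiling-driven rule recommendation can be sketched generically (the
thresholds and suggested rules below are illustrative, not the platform's
actual logic):

    # Generic illustration of profiling-driven rule recommendation; this is
    # not Acceldata's API, and the thresholds are arbitrary examples.
    import pandas as pd

    def recommend_policies(df: pd.DataFrame) -> list[str]:
        """Profile each column and suggest candidate reliability rules."""
        suggestions = []
        for col in df.columns:
            series = df[col]
            if series.isna().mean() == 0:
                suggestions.append(f"{col}: enforce NOT NULL (no nulls observed)")
            if series.is_unique:
                suggestions.append(f"{col}: enforce uniqueness")
            if pd.api.types.is_numeric_dtype(series):
                lo, hi = series.min(), series.max()
                suggestions.append(
                    f"{col}: flag values outside [{lo}, {hi}] as potential drift")
        return suggestions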

The Data Observability platform also uses AI to automate more sophisticated
policies, such as detecting data drift and performing the data reconciliation
that keeps data consistent across data assets. Acceldata uses data lineage to
automate the tracking of data flow among assets during pipeline runs and
correlates it with performance data from the underlying data sources and
infrastructure, so data teams can identify the root cause of data incidents.


SCALING

As the number of data assets and pipelines continues to grow, there is a
corresponding growth in data volume. It is critical for data teams to use best
practices to scale their data reliability efforts, and, as we saw earlier,
there are three forms of scaling: scaling up, scaling out, and scaling your
incident response. Let's explore the first two here:

 * Scale Up: Using automation features such as those described above, data teams
   can scale up the number of tests and checks performed on a data asset and put
   in place more sophisticated checks such as schema drift and data drift. Other
   policies, such as data reconciliation, can also be automated.
 * Scale Out: Automation features help here as well, but the key is creating
   templated policies that bundle multiple rules and can be applied to many data
   assets in one sweep, giving data teams greater data reliability coverage
   across more assets (a minimal sketch follows the list below).
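
A minimal sketch of the templated-policy idea (the rule set and asset names are
invented for illustration):

    # Sketch of scale-out via templated policies: define rules once, apply
    # them to many assets in one sweep. Rules and asset names are illustrative.
    from dataclasses import dataclass, field
    from typing import Callable

    import pandas as pd

    @dataclass
    class PolicyTemplate:
        name: str
        rules: list[Callable[[pd.DataFrame], bool]] = field(default_factory=list)

        def apply(self, assets: dict[str, pd.DataFrame]) -> dict[str, bool]:
            """Run every rule against every asset; True means the asset passed."""
            return {asset: all(rule(df) for rule in self.rules)
                    for asset, df in assets.items()}

    # One template, many (hypothetical) tables:
    completeness = PolicyTemplate(
        name="baseline-completeness",
        rules=[lambda df: not df.empty,
               lambda df: df.isna().mean().max() < 0.05],  # no column >5% null
    )
    # results = completeness.apply({"orders": orders_df, "customers": customers_df})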


CONTINUOUS MONITORING

Our last form of scaling is incident management. With data pipelines running
more frequently, touching more data, and business teams depending on data more
than ever, continuous monitoring is needed to keep the data healthy and flowing
properly. A principal ingredient of that is effective incident management.

Having a consolidated incident management and troubleshooting control center
allows data teams to get continuous visibility into data health and enables
them to respond rapidly to incidents. Data teams can avoid being the "last to
know" when incidents occur and can respond proactively.

To enable continuous monitoring, your data observability platform should have a
scalable processing infrastructure. This facilitates the scale-up and scale-out
capabilities mentioned earlier and allows tests to be run frequently.

To support continuous monitoring, data reliability dashboards and control
centers should be able to:

 * Offer instantaneous, 360° insights into data health.
 * Provide alerts and information on incidents when they occur.
 * Integrate with popular IT notification channels such as Slack (a minimal
   sketch follows the list below).
 * Allow data teams to drill down into data about the incident to identify the
   root cause.
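
For example, a minimal alerting hook might post incidents to a Slack incoming
webhook (the webhook URL and incident fields below are placeholders):

    # Sketch: push a data-incident alert to a Slack incoming webhook.
    # The webhook URL is a placeholder; incident fields are illustrative.
    import json
    import urllib.request

    WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

    def alert_slack(asset: str, check: str, detail: str) -> None:
        payload = {"text": f"Data incident on {asset}: {check} failed. {detail}"}
        request = urllib.request.Request(
            WEBHOOK_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request)  # Slack responds with "ok" on success

    # alert_slack("orders", "freshness", "No new rows in 26 hours.")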


IDENTIFYING AND PREVENTING ISSUES

Quickly identifying the root cause of data incidents and remedying them is
critical to ensure data teams are responsive to the business and meet SLAs. To
meet these goals, data teams need as much information as possible about the
incident and what was happening at the time it occurred.

Acceldata provides correlated, multi-layer data on data assets, data pipelines,
data infrastructure, and the incidents at the time they happened. This data is
continuously captured over time, providing a rich history of information on
data health.

Armed with this information, data teams can implement practices such as:

 * Perform root cause analysis of any incident and adjust data assets, data
   pipelines, and data infrastructure accordingly.
 * Automatically re-run data pipelines when incidents occur to recover quickly.
 * Weed out bad or erroneous rows to keep data flowing without the low-quality
   records.
 * Compare execution, timeliness, and performance at different points in time to
   see what's changing.
 * Perform time-series analysis to determine whether data assets, pipelines, or
   infrastructure are fluctuating or deteriorating (see the sketch after this
   list).
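
As a simple illustration of such time-series analysis, the sketch below flags a
pipeline whose run time is trending upward (the history and slope threshold are
hypothetical; it uses statistics.linear_regression from Python 3.10+):

    # Sketch: flag a pipeline whose run time trends upward over recent runs.
    # Requires Python 3.10+ for statistics.linear_regression; the history and
    # slope threshold are hypothetical.
    from statistics import linear_regression

    def is_deteriorating(durations_sec: list[float],
                         max_slope: float = 5.0) -> bool:
        """True if run time grows by more than max_slope seconds per run."""
        run_index = list(range(len(durations_sec)))
        slope, _intercept = linear_regression(run_index, durations_sec)
        return slope > max_slope

    history = [610, 605, 640, 700, 745, 810, 905]  # made-up run times (seconds)
    print(is_deteriorating(history))  # True: roughly +50 seconds per run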


OPTIMIZATION

Data volumes are constantly growing and new data pipelines and data assets put
additional load and strain on the data infrastructure. Continuous optimization
is another data reliability best practice data teams should embrace.  

Multi-layer observability data can provide a great deal of detailed information
about incidents, execution, performance, timeliness, and cost. Not only can
this information help identify the root cause of problems, it can also point to
ways of optimizing your data assets, pipelines, and infrastructure.

Acceldata provides such detailed multi-layer insights and goes further by
recommending how to optimize your data, data pipelines, and infrastructure,
and in some cases automating the adjustments. These recommendations are highly
tuned and specific to the underlying data platforms in use, such as Snowflake,
Databricks, Spark, and Hadoop.


GET THE ENTIRE TEAM INVOLVED

Data teams know the technical aspects of the data and its supporting
infrastructure well. However, they may be less aware of the nuances of the
data's content and how business teams use it; that is more the domain of data
analysts and scientists.

Another best practice for data reliability is to get a wider team involved in
the process. Data analysts can contribute more business-oriented data quality
checks. They can also collaborate with data teams to determine tolerances on
data quality checks (e.g., the percentage of null values that is acceptable)
and the timing of data pipelines so that data freshness meets the needs of the
business.
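
One lightweight way to capture these agreed tolerances is a declarative config
that analysts can edit without touching pipeline code (the config shape, table
and column names, and values below are invented for illustration):

    # Sketch: analyst-editable tolerances kept separate from pipeline code.
    # The config shape and values are invented; in practice this might be
    # loaded from YAML or a no-code UI rather than inlined.
    import pandas as pd

    TOLERANCES = {
        "orders.discount_code": {"max_null_rate": 0.40},  # optional field
        "orders.customer_id": {"max_null_rate": 0.00},    # nulls never OK
    }

    def within_tolerance(table: str, column: str, df: pd.DataFrame) -> bool:
        limit = TOLERANCES[f"{table}.{column}"]["max_null_rate"]
        return df[column].isna().mean() <= limit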

Acceldata provides collaborative, easy-to-use low-code and no-code tools with
automated data quality checks and recommendations, so data analysts, who may
not have sophisticated programming skills, can easily set up their own data
quality and reliability checks. Acceldata also offers role-based security to
ensure the different members of the wider data team work securely.


RESULTING BENEFITS FROM DATA RELIABILITY

Implementing data reliability best practices will result in a number of
benefits, including:

 * Better deployment of your data engineering resources to maintain or lower
   data engineering costs.
 * Better management and visibility of the data infrastructure to keep those
   costs low.
 * Lower legal risk around your data.
 * Maintaining a strong reputation for your data products and increasing trust
   in the data.
 * Maintaining strong compliance around your data and eliminating potential
   fines for non-compliance.


CONCLUSION

Best practices are an essential part of every domain in the IT and data world,
and that now includes data reliability. Best practices not only allow teams to
optimize their efforts and eliminate problems; they also provide a faster
ramp-up in the solution area.

In this document we have described a number of key best practices for data
reliability that data teams can incorporate, including:

 * Apply tests across the entire supply chain.
 * Shift-left your data reliability to test data before it hits your
   warehouse/lakehouse.
 * Effectively use automation in data reliability solutions.
 * Scale up and scale out your data reliability to increase your coverage.
 * Continuously monitor and receive alerts to scale your data incident
   management.
 * Take full advantage of detailed multi-layer data to rapidly solve data
   incidents.
 * Continuously optimize your data assets, pipelines, and infrastructure using
   recommendations from your solution.
 * Get the wider team involved in your data reliability processes and efforts.

These best practices enable data teams to scale their efforts in ways that don’t
require significant investment in new resources while also allowing them to run
efficient and effective data operations.  Incorporate these into your planning
and everyday work for smooth data reliability processes.


