linkurious.com
51.159.75.186  Public Scan

Submitted URL: https://lemtrail.getlinkurious.com/api/t/c/usr_nTkDcQLDAz8hZyLXk/tsk_5qHMnoPH3FPcGQHuv/enc_U2FsdGVkX1_l2Y7lE7C698pFqPwVIbkLu69KZPNZ...
Effective URL: https://linkurious.com/blog/how-to-track-and-visualize-data-lineage/
Submission: On June 06 via api from CH — Scanned from FR

Product


HOW TO TRACK AND VISUALIZE DATA LINEAGE

April 30, 2019
6 min read


Data lineage is about tracking the flow of information. It is necessary to
guarantee the quality, usability and security of your data. For large
organizations, it is also a key compliance requirement. Unfortunately, many
organizations lack the ability to connect their data sources because of
regulatory constraints, complex technology and scattered data.

Webinar: how to track and visualize data lineage


WHAT IS DATA LINEAGE?

The success of an organization depends on the quality, usability and security of
its data. Want to provide amazing support to your customers? Create new products
and services? Comply with legal requirements? The best companies approach these
issues in a data-driven way.

But when your management looks at the quarterly sales report, do you know
exactly what data they are looking at? Sometimes bad data can be more dangerous
than no data. That’s why data lineage is so important.

Data lineage describes the life cycle of data: where it originates and where it
moves over time. For large organizations, that life cycle can be quite complex,
as data flows from files to databases and reports while going through various
transformation processes. Tracing the provenance of a specific data point is
therefore very challenging.

Example of a real-life data pipeline at Pinterest.


Part of the issue lies in the limitations of the tools organizations use to map
and track data lineage. Most of them are backed by Relational Database
Management Systems (RDBMS), database systems widely deployed since the 1980s to
power software applications. In an RDBMS, data is stored in tables made of rows
and columns. This is well suited to operations where data is consistent and not
highly connected. But for connected data, relational tools have some drawbacks.
For instance:

 * querying connected data through SQL is a hard and error-prone process;
 * long processing time and low performance for questions that require looking
   up multiple connections (like getting the full data lineage of a given
   property);
 * it’s hard to accommodate an evolving data model in a relational database.
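To make the first drawback concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. It assumes lineage is stored relationally in a hypothetical `flows_to` edge table (one row per flow) with made-up node names; only `Employee_Count`, `order_total` and `sales_db` echo names used later in this post. Even the full lineage of a single report already requires a recursive common table expression, and each new lineage question tends to need its own recursive query.

```python
import sqlite3

# Hypothetical relational storage of lineage: one row per "flows to" edge.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flows_to (src TEXT NOT NULL, dst TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO flows_to (src, dst) VALUES (?, ?)",
    [
        ("hr_db.employees.id", "etl_headcount"),    # column -> ETL job
        ("etl_headcount", "dwh.headcount.total"),   # ETL job -> column
        ("dwh.headcount.total", "Employee_Count"),  # column -> report
        ("sales_db.orders.order_total", "Sales_Report"),
    ],
)

# Full upstream lineage needs WITH RECURSIVE: seed with the direct inputs of
# the report, then keep joining back to find what flows into those inputs.
# UNION (not UNION ALL) drops duplicate rows, so a cycle cannot loop forever.
LINEAGE_SQL = """
WITH RECURSIVE upstream(node) AS (
    SELECT src FROM flows_to WHERE dst = ?
    UNION
    SELECT f.src FROM flows_to AS f JOIN upstream AS u ON f.dst = u.node
)
SELECT node FROM upstream ORDER BY node
"""


def upstream_of(report: str) -> list[str]:
    """Return every node that (transitively) flows into `report`, sorted."""
    return [row[0] for row in conn.execute(LINEAGE_SQL, (report,))]


print(upstream_of("Employee_Count"))
# ['dwh.headcount.total', 'etl_headcount', 'hr_db.employees.id']
```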

Graph databases are a perfect match for the challenges of data lineage. This
new type of database emerged in the early 2000s to address the shortcomings of
relational systems. It introduced a new way of storing data: as a graph of
connected entities. This approach has several advantages:

 * it’s easy to model the flow of data in a graph;
 * you can query relationships with ease and in real time;
 * a graph schema can evolve to accommodate new data and relationships.

In the next section, we detail how to use Linkurious Enterprise to build a
powerful and easy-to-use data lineage system on top of Neo4j, the leading graph
database system.


USING A GRAPH DATABASE TO POWER YOUR METADATA MANAGEMENT

To build an effective data lineage system, you need to map the various data
elements and the processes or algorithms they go through. To be thorough, we
would have to track files, database tables, views, columns and reports, ETL
jobs, and so on.

For the purpose of clarity, we have prepared a small dataset that focuses on
four types of entities: the metadata, the systems, the processes, and the
reports. We modeled our data as a graph, as depicted below.

Data lineage model.


Metadata (blue nodes) summarizes basic information about data, for example the
name of a column in a database and its type. Metadata can flow to other metadata
through a process (red node), such as an ETL job, a SQL query or program code.
It is stored in a system (yellow node), such as a database. Finally, it can be
used in a report (green node): a set of data made accessible to end users
through a visual interface.
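The model above can be sketched as a small typed graph in Python. This is a minimal illustration, not how Neo4j or Linkurious Enterprise store data: the labels mirror the four node types, `FLOWS_TO` is the relationship type used in this post's Cypher query, while `STORED_IN` and the exact set of allowed connections are assumptions. Keeping the schema as plain data also shows why a graph model is easy to evolve: supporting a new kind of connection only means adding one triple.

```python
from dataclasses import dataclass, field

# Allowed (source label, relationship type, target label) triples.
# FLOWS_TO matches the Cypher query in this post; STORED_IN and the exact
# set of triples are assumptions made for this sketch.
SCHEMA = {
    ("METADATA", "FLOWS_TO", "PROCESS"),  # metadata feeds an ETL job, query, ...
    ("PROCESS", "FLOWS_TO", "METADATA"),  # ... which produces other metadata
    ("METADATA", "FLOWS_TO", "REPORT"),   # metadata is used in a report
    ("METADATA", "STORED_IN", "SYSTEM"),  # metadata lives in a system
}


@dataclass
class LineageGraph:
    labels: dict[str, str] = field(default_factory=dict)             # node -> label
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (src, rel, dst)

    def add_node(self, name: str, label: str) -> None:
        self.labels[name] = label

    def add_edge(self, src: str, rel: str, dst: str) -> None:
        """Add an edge after checking it against SCHEMA (unknown nodes raise KeyError)."""
        triple = (self.labels[src], rel, self.labels[dst])
        if triple not in SCHEMA:
            raise ValueError(f"relationship not allowed by the schema: {triple}")
        self.edges.append((src, rel, dst))


g = LineageGraph()
g.add_node("order_total", "METADATA")
g.add_node("sales_db", "SYSTEM")
g.add_node("Sales_Report", "REPORT")
g.add_edge("order_total", "STORED_IN", "sales_db")
g.add_edge("order_total", "FLOWS_TO", "Sales_Report")
```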

Having the data in Neo4j allows us to ask questions via Linkurious Enterprise
such as “what is the data lineage of report y?”. For that kind of query, we can
use Cypher, the Neo4j query language. The query below, for example, helps us
understand where the data in the Employee_Count report comes from:

    // Data lineage of the "Employee_Count" report
    MATCH (a)-[:FLOWS_TO*]->(b:REPORT {name: 'Employee_Count'})
    RETURN a, b

This query returns the report together with every entity that has a FLOWS_TO
path leading into it, in other words every entity involved in producing it.
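For readers without a Neo4j instance at hand, the following Python sketch computes the same set of upstream entities as the variable-length `FLOWS_TO*` pattern, using a breadth-first search over reversed edges. The edge list and every node name other than `Employee_Count` are hypothetical sample data, and the sketch returns only the upstream nodes `a`, not the report `b` itself.

```python
from collections import deque

# Hypothetical sample data: each pair is a FLOWS_TO relationship (source, target).
FLOWS_TO = [
    ("hr_db.employees", "etl_headcount"),
    ("etl_headcount", "dwh.headcount"),
    ("dwh.headcount", "Employee_Count"),
    ("sales_db.order_total", "Sales_Report"),
]


def lineage(report: str, edges: list[tuple[str, str]]) -> set[str]:
    """Return every node with a FLOWS_TO path of length >= 1 into `report`."""
    incoming: dict[str, list[str]] = {}
    for src, dst in edges:
        incoming.setdefault(dst, []).append(src)

    seen: set[str] = set()
    queue = deque([report])
    while queue:  # breadth-first walk, following edges backwards
        node = queue.popleft()
        for parent in incoming.get(node, []):
            if parent not in seen:  # visiting each node once also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(lineage("Employee_Count", FLOWS_TO)))
# ['dwh.headcount', 'etl_headcount', 'hr_db.employees']
```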

Data lineage visualization


Here are a few other questions we can quickly answer using graph analytics:

 * is my database still being used in an important company process, or can I
   remove it?
 * what systems and reports would be impacted by a change in a particular
   process?
 * which data is used by whom?
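The first two questions are downstream traversals. Below is a minimal Python sketch, assuming the four node types of the model above, a hypothetical `STORED_IN` relationship linking metadata to systems, and made-up sample data. It walks `FLOWS_TO` edges forward from a process to collect the affected metadata and reports, then looks up the systems that store the affected metadata; a system counts as still in use if anything it stores eventually reaches a report.

```python
from collections import deque

# Hypothetical sample graph following the post's model.
LABELS = {
    "sales_db": "SYSTEM", "dwh": "SYSTEM", "legacy_db": "SYSTEM",
    "order_total": "METADATA", "revenue": "METADATA", "old_col": "METADATA",
    "etl_revenue": "PROCESS",
    "Sales_Report": "REPORT",
}
FLOWS_TO = [
    ("order_total", "etl_revenue"),
    ("etl_revenue", "revenue"),
    ("revenue", "Sales_Report"),
]
STORED_IN = [  # (metadata, system); the relationship name is an assumption
    ("order_total", "sales_db"),
    ("revenue", "dwh"),
    ("old_col", "legacy_db"),
]


def downstream(start: str) -> set[str]:
    """Every node reachable from `start` by following FLOWS_TO forwards."""
    children: dict[str, list[str]] = {}
    for src, dst in FLOWS_TO:
        children.setdefault(src, []).append(dst)
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def impact(process: str) -> tuple[set[str], set[str]]:
    """Systems and reports affected by a change to `process`."""
    affected = downstream(process)
    reports = {n for n in affected if LABELS[n] == "REPORT"}
    systems = {system for md, system in STORED_IN if md in affected}
    return systems, reports


def still_used(system: str) -> bool:
    """True if any metadata stored in `system` eventually feeds a report."""
    stored = [md for md, s in STORED_IN if s == system]
    return any(LABELS[n] == "REPORT" for md in stored for n in downstream(md))


print(impact("etl_revenue"))    # ({'dwh'}, {'Sales_Report'})
print(still_used("legacy_db"))  # False
```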


GRAPH VISUALIZATION CAN HELP BUSINESS USERS INVESTIGATE DATA LINEAGE

A graph visualization solution like Linkurious Enterprise sits on top of Neo4j.
It gives business users the ability to visualize and analyze data lineage and
find answers without needing programming skills.

Within Linkurious Enterprise, you get access to full-text search features to
look for any property or data element in the database through a search bar.

Within the interactive graph visualization interface, you can explore the graph
by expanding the relationships of your choice. It’s easy to drill down into the
data and find answers. That’s the difference between having a theoretical
capability to track data lineage and an analyst being able to quickly and
confidently answer a question about the provenance of their data.

For example, if I want to understand what data is used in my sales report, I can
simply look up the report via the search bar and add it as a node to my
visualization. I can then explore its connections. In a few seconds, I can find
out that my report originates from the order_total metadata stored in
sales_db.
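Each expand step in the interface amounts to fetching a node's direct neighbors, optionally restricted to chosen relationship types. The Python sketch below illustrates only that idea; it is not Linkurious Enterprise's API. The edge list is hypothetical (only `order_total`, `sales_db` and the sales report echo the example above), and `STORED_IN` is an assumed relationship name.

```python
from typing import Optional

# Hypothetical edges as (source, relationship type, target).
EDGES = [
    ("order_total", "STORED_IN", "sales_db"),
    ("order_total", "FLOWS_TO", "etl_sales"),
    ("etl_sales", "FLOWS_TO", "sales_amount"),
    ("sales_amount", "FLOWS_TO", "Sales_Report"),
]


def expand(node: str, rel_types: Optional[set[str]] = None) -> set[str]:
    """Neighbors one hop away in either direction, optionally filtered by relationship type."""
    neighbors: set[str] = set()
    for src, rel, dst in EDGES:
        if rel_types is not None and rel not in rel_types:
            continue  # skip relationship types the user did not choose
        if src == node:
            neighbors.add(dst)
        elif dst == node:
            neighbors.add(src)
    return neighbors


# Drill down from the report, one hop at a time, as a user would in the UI.
print(expand("Sales_Report"))                # {'sales_amount'}
print(expand("order_total", {"STORED_IN"}))  # {'sales_db'}
```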

In our example, we worked with a sample dataset, but users can visualize graphs
with billions of nodes and edges in Linkurious Enterprise. The platform offers
advanced filtering options, letting you slice and dice the data to focus on
relevant pieces of information and answer crucial data lineage questions.


TRACK AND VISUALIZE DATA LINEAGE TODAY WITH LINKURIOUS ENTERPRISE

Approaching data lineage from a graph perspective is one way of tackling these
challenges. By bringing data silos into a holistic view of connected entities,
graph technology such as Linkurious Enterprise helps analysts take control of
their data.

You can try Linkurious Enterprise now and extract new insights from your data!


Linkurious SAS © 2013-2022. All rights reserved.