
URL: https://foresightbi.com.ng/microsoft-power-bi/dirty-data-samples-to-practice-on/

DIRTY DATA SAMPLES – GET YOUR HANDS DIRTY CLEANING DATA


Published by Ahmed Oyelowo on May 10, 2020

Categories
 * MICROSOFT POWER BI

Tags
 * data sample
 * dirty data
 * Power BI



Update: I have created a playlist of suggested solutions to the dirty datasets
provided in this article. I used Microsoft Power Query in Excel. To learn more
about Power Query, get this excellent book, the best Power Query resource in
the world: Power Query for Power BI and Excel.

The only guaranteed way to become better at cleaning dirty data is to avoid
getting it. No, that is the biggest lie of the millennium. The opposite is the
case. The only way to get better at preparing and cleaning dirty data is to
clean a wide variety of it.

The problem, however, is finding a reliable source with lots of different
dirty data cases to practice on.

The objective of this article is to create a bank of different dirty data types,
mostly simulated from real-life scenarios I have encountered. I always say that
data can be dirty in millions of ways, and you will never see them all.

I generally classify dirty data into 2 categories: Structure Dirty and Content
Dirty. There can also be a third one, which is dirty in both structure and
content. A while ago, I wrote about how to clean dirty data. You should check
it out to pick up some tips before diving into this dirty ocean. The
compilation I have here can be downloaded as Excel files. Each workbook
contains a worksheet for the raw dirty data and a second worksheet with a
sample of the target solution.

Here you go:

Contents

 * 1. Badly Structured Sales Data 1
 * 2. Badly Structured Sales Data 2
 * 3. Badly Structured Sales Data 3
 * 4. Badly Structured Sales Data 4
 * 5. Jumbled Customer Details
 * 6. Medicine Data With Combined Quantity and Measure
 * 7. Hospital Data With Mixed Numbers and Characters
 * 8. Invoices With Merged Categories and Merged Amounts


1. BADLY STRUCTURED SALES DATA 1

Try re-arranging this data into the correct four columns. Rows and columns
have been mixed up everywhere. Also, watch out for Grand Totals and Sub Totals.
You do not need those in clean data.
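
As a rough illustration of the "drop the totals" step only, here is a minimal Python sketch (not the Power Query approach used in the playlist). The sample rows, the column order, and the helper name drop_total_rows are all hypothetical. The real workbook layout is not reproduced here, so treat this as the idea rather than a solution to the file.

```python
def drop_total_rows(rows, label_index=0):
    """Return only the rows whose label cell does not mention a total."""
    kept = []
    for row in rows:
        label = str(row[label_index]).strip().lower()
        # Matches "Sub Total", "Grand Total", "TOTAL", and similar labels.
        if "total" in label:
            continue
        kept.append(row)
    return kept


# Hypothetical sample: region, product, amount, with totals mixed in.
sample_rows = [
    ["North", "Widget", 120],
    ["North", "Gadget", 80],
    ["Sub Total", "", 200],
    ["South", "Widget", 50],
    ["Sub Total", "", 50],
    ["Grand Total", "", 250],
]

clean_rows = drop_total_rows(sample_rows)
```

A quick sanity check is that the remaining detail rows still add up to the Grand Total that was removed.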

Badly Structured Sales Data 1

Download this data here



2. BADLY STRUCTURED SALES DATA 2

This is much like number 1 above, with a different flavor. It has a date
column and does not include totals.

Badly Structured Sales Data 2

Download this data here


3. BADLY STRUCTURED SALES DATA 3

Try re-arranging this data into the correct five columns. Again, you should
watch out for totals.

Badly Structured Sales Data 3

Download this data here


4. BADLY STRUCTURED SALES DATA 4

Very similar to number 3 above, with a slightly different flavor.

Badly Structured Sales Data 4

Download this data here


5. JUMBLED CUSTOMER DETAILS

You will often see this one when you download or copy something from the web.
You should separate the different data categories into separate columns.

Jumbled Customer Details

Download this data here


6. MEDICINE DATA WITH COMBINED QUANTITY AND MEASURE

Going by clean data rules, every field/column should represent exactly one
thing. So split the combined Quantity and Measure in this data into separate
columns/fields. When you are done, your Quantity column should sum to
17,600.00. You will find this total on the clean worksheet once you download
the workbook.
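
To make the split concrete, here is a minimal Python sketch (again, not the Power Query route). It assumes each combined cell looks like a number followed by a unit, such as "500mg" or "1,200 tablets". That format, the sample values, and the helper name split_quantity_measure are assumptions, not taken from the workbook.

```python
import re

# A number (optional thousands separators and decimals), then a unit.
QUANTITY_MEASURE = re.compile(r"^\s*([\d,]*\.?\d+)\s*([A-Za-z%]+)\s*$")


def split_quantity_measure(text):
    """Split a combined cell such as '500mg' into (500.0, 'mg')."""
    match = QUANTITY_MEASURE.match(text)
    if match is None:
        # Surface unexpected cells instead of guessing at them.
        raise ValueError(f"Unrecognised quantity/measure cell: {text!r}")
    quantity = float(match.group(1).replace(",", ""))
    return quantity, match.group(2)


# Hypothetical sample cells, not taken from the downloadable workbook.
sample_cells = ["500mg", "1,200 tablets", "2.5 ml"]
split_cells = [split_quantity_measure(cell) for cell in sample_cells]
```

Raising on cells that do not match keeps odd entries visible, so you can decide how to handle them before comparing your Quantity total with the one on the clean worksheet.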


 

Medicine Data With Combined Quantity and Measure

Download this data here


7. HOSPITAL DATA WITH MIXED NUMBERS AND CHARACTERS

This data was collected by non-data-centric professionals, who sometimes used
letters in place of numbers, for example the letter S in place of the number 5.
When you are done, your numbers should sum to the total shown on the clean
worksheet in the download.
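
One way to sketch the repair in Python is a simple look-alike translation table. Only the S-for-5 swap comes from the description above. The other pairs (O for 0, l and I for 1) are assumptions about what else might appear, and the helper name repair_number is hypothetical.

```python
# Only S -> 5 is described in the article; the other pairs are assumptions.
LOOKALIKES = str.maketrans({
    "S": "5", "s": "5",
    "O": "0", "o": "0",
    "l": "1", "I": "1",
})


def repair_number(text):
    """Replace look-alike letters with digits and return an integer."""
    repaired = text.strip().translate(LOOKALIKES)
    if not repaired.isdecimal():
        # Something other than a known look-alike is still in the cell.
        raise ValueError(f"Cannot repair value: {text!r}")
    return int(repaired)


# Hypothetical sample values, not taken from the downloadable workbook.
sample_values = ["1S", "2O", "S0", "42"]
repaired_values = [repair_number(value) for value in sample_values]
```

Extend the table only after inspecting the actual data, since a letter that looks like a digit in one column may be a legitimate character in another.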

Hospital Data With Mixed Numbers and Characters

Download this data here


8. INVOICES WITH MERGED CATEGORIES AND MERGED AMOUNTS

Because a single transaction (identified by an order ID) has multiple items
purchased, whoever captured this data decided to create a single row for each
order, lumping the different items purchased into one field and their amounts
into another.

The better thing to do is to put each item purchased on its own row with its
amount. It is better to repeat the Order ID on different rows than to lump
amounts together in a single cell. We will be analyzing items bought and
amounts a lot, so we need them separated into rows.
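
The reshaping described above can be sketched in Python as follows. It assumes the items and amounts are comma-separated and listed in the same order within their cells. The delimiter, the sample order, and the helper name explode_order are assumptions, not details from the workbook.

```python
def explode_order(order_id, items_cell, amounts_cell, sep=","):
    """Turn one merged order row into one (order_id, item, amount) row per item."""
    items = [part.strip() for part in items_cell.split(sep)]
    amounts = [float(part) for part in amounts_cell.split(sep)]
    if len(items) != len(amounts):
        # A mismatch means the cells cannot be paired up safely.
        raise ValueError(f"Order {order_id}: {len(items)} items, {len(amounts)} amounts")
    return [(order_id, item, amount) for item, amount in zip(items, amounts)]


# Hypothetical merged row, not taken from the downloadable workbook.
order_rows = explode_order("ORD-001", "Pen, Notebook, Stapler", "150, 900, 2500")
```

The Order ID repeats on every output row, which is exactly the repetition recommended above. The length check guards against cells where an item or an amount is missing.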

Invoices With Merged Categories and Merged Amounts

Download this data here

If you have some samples, please send them to Info@foresightbi.com.ng and
we’ll add them to the bank. Just include a brief description of the data, like
I have done above. You can also include your social media links if you would
like a proper mention.

Up your Data Hygiene Skills. Cheers.

 

Join us for the next Power BI Bootcamp, where we teach using the DA100 Power BI
Certification curriculum.

 


AHMED OYELOWO
(MVP, MCSA, MCT, AFM)
