www.hackerfactor.com
2606:4700:3035::6815:43f9
Public Scan
URL:
https://www.hackerfactor.com/blog/index.php?/archives/552-Deep-Dive.html
Submission: On November 22 via manual from US — Scanned from US
Form analysis
1 form found in the DOM
POST /blog/index.php?archives/552-Deep-Dive.html#feedback
<form id="serendipity_comment" action="/blog/index.php?archives/552-Deep-Dive.html#feedback" method="post">
<div><input type="hidden" name="serendipity[entry_id]" value="552"></div> <br><b>Code of conduct</b>
<ul>
<li class="serendipity_commentsLabel">Name calling and anti-social comments will not be posted. </li>
<li class="serendipity_commentsLabel">Comments must be related to the topic. Unrelated comments will not be posted. Make sure you are submitting your comment to the correct blog entry; Yes, people have submitted great comments to the wrong blog
entries. </li>
<li class="serendipity_commentsLabel">Comments should be rational and logical, citing findings as appropriate. </li>
<li class="serendipity_commentsLabel">Opinions and speculations are desired and welcome, but if they are represented as fact then they may be moderated or censored. </li>
<li class="serendipity_commentsLabel">The moderator reserves the right to end tangential discussions and censor offensive or inappropriate content.</li>
</ul>
<table border="0" width="100%" cellpadding="3">
<tbody>
<tr>
<td class="serendipity_commentsLabel"><label for="serendipity_commentform_name">Name</label></td>
<td class="serendipity_commentsValue"><input type="text" id="serendipity_commentform_name" name="serendipity[name]" value="" size="30"></td>
</tr>
<tr>
<td class="serendipity_commentsLabel"><label for="serendipity_commentform_email">Email</label></td>
<td class="serendipity_commentsValue"><input type="text" id="serendipity_commentform_email" name="serendipity[email]" value=""></td>
</tr>
<tr>
<td class="serendipity_commentsLabel"><label for="serendipity_commentform_url">Homepage</label></td>
<td class="serendipity_commentsValue"><input type="text" id="serendipity_commentform_url" name="serendipity[url]" value=""></td>
</tr>
<tr>
<td class="serendipity_commentsLabel"><label for="serendipity_replyTo">In reply to</label></td>
<td class="serendipity_commentsValue">
<select id="serendipity_replyTo" onchange="" name="serendipity[replyTo]">
<option value="0">[ Top level ]</option>
<option value="2355">#1: Dr. Neal Krawetz on 2013-05-23 22:23</option>
<option value="4810"> #1.1: Carl Seibert on 2021-12-07 12:02</option>
<option value="4811"> #1.1.1: Dr. Neal Krawetz on 2021-12-07 14:33</option>
<option value="2357">#2: John H. on 2013-05-24 07:12</option>
<option value="2363">#3: Phil Harvey on 2013-05-28 05:41</option>
<option value="2364"> #3.1: Dr. Neal Krawetz on 2013-05-28 06:08</option>
<option value="2372">#4: Stephen Fischer on 2013-05-31 22:16</option>
<option value="2704">#5: FLSqueezed on 2014-03-29 08:33</option>
<option value="2953">#6: LH on 2015-01-17 02:15</option>
<option value="2954"> #6.1: Dr. Neal Krawetz on 2015-01-17 03:04</option>
<option value="3121"> #6.1.1: Greg on 2015-12-05 10:10</option>
<option value="3122"> #6.1.1.1: Dr. Neal Krawetz on 2015-12-05 11:03</option>
<option value="3260"> #6.1.1.1.1: Cameron on 2016-06-07 16:22</option>
<option value="3295">#7: DerAblichter on 2016-07-17 12:32</option>
<option value="3296"> #7.1: DerAblichter on 2016-07-18 06:25</option>
<option value="3907">#8: MOna on 2017-12-13 14:01</option>
<option value="3947">#9: Doug Carner on 2018-02-15 20:14</option>
<option value="4757">#10: James Jenbg on 2021-08-21 08:28</option>
<option value="4759"> #10.1: Dr. Neal Krawetz on 2021-08-21 09:07</option>
<option value="5042">#11: Adele Myers on 2023-03-05 02:37</option>
</select>
<script>
var V = '(/td)(/tr)(tr) \
(td class="serendipity_commentsLabel")(label for="serendipity_commentform_comment")Comment(/label)(/td) \
(td class="serendipity_commentsValue")';
V = V.replace(/[(]/g, unescape("%3c"));
V = V.replace(/[)]/g, unescape("%3e"));
document.write(V);
</script>
</td>
</tr>
<tr>
<td class="serendipity_commentsLabel"><label for="serendipity_commentform_comment">Comment</label></td>
<td class="serendipity_commentsValue">
<textarea rows="10" cols="40" id="serendipity_commentform_comment" name="serendipity[comment]"></textarea><br>
<div class="serendipity_commentDirection serendipity_comment_s9ymarkup">Enclosing asterisks marks text as bold (*word*), underscore are made via _word_.</div>
<div class="serendipity_commentDirection serendipity_comment_emoticate">Standard emoticons like :-) and ;-) are converted to images.</div>
<div class="serendipity_commentDirection serendipity_comment_spamblock">E-Mail addresses will not be displayed and will only be used for E-Mail notifications.</div>
<script>
var V = '(/td) \
(/tr) \
\
\
(tr) \
(td) (/td) \
(td class="serendipity_commentsLabel") \
(input id="checkbox_remember" type="checkbox" name="serendipity[remember]" /)(label for="checkbox_remember")Remember Information? (/label) \
\
(/td) \
(/tr) \
\
\
\
(tr) \
(td class="serendipity_commentsValue serendipity_msg_important" colspan="2")Submitted comments will be reviewed by moderators before being displayed.(/td) \
(/tr) \
\
\
(tr) \
(td) (/td) \
(td)(input type="submit" name="serendipity[submit]" value="Submit Comment" /) (input type="submit" id="serendipity_preview" name="serendipity[preview]" value="Preview" /)(/td) \
(/tr) \
(/table) \
(/form) \
(/div)';
V = V.replace(/[(]/g, unescape("%3c"));
V = V.replace(/[)]/g, unescape("%3e"));
document.write(V);
</script>
</td>
</tr>
<tr>
<td> </td>
<td class="serendipity_commentsLabel"> <input id="checkbox_remember" type="checkbox" name="serendipity[remember]"><label for="checkbox_remember">Remember Information? </label> </td>
</tr>
<tr>
<td class="serendipity_commentsValue serendipity_msg_important" colspan="2">Submitted comments will be reviewed by moderators before being displayed.</td>
</tr>
<tr>
<td> </td>
<td><input type="submit" name="serendipity[submit]" value="Submit Comment"> <input type="submit" id="serendipity_preview" name="serendipity[preview]" value="Preview"></td>
</tr>
</tbody>
</table>
</form>
Text Content
The Hacker Factor Blog
Dr. Neal Krawetz writes The Hacker Factor Blog.

DEEP DIVE
THURSDAY, 23 MAY 2013

I learned yesterday that, after writing my rebuttal, Dr. Hany Farid began to go on tour. He gave an interview with Wired in which he repeated his claim that I do not understand XMP metadata. He further mentioned a communication that FourAndSix had with me prior to their report, in which Kevin Connor repeatedly tried to convince me that I was wrong, but his samples failed to support his claim.

In this blog entry, I'm going to go over the XMP data that I summarized earlier in extreme detail and show how I reached my conclusion. I will follow it by showing how FourAndSix were unable to convince me that I am wrong.

NOTE: Compared to my other blog entries, this is an overly technical entry.
Regular readers may not be able to follow all of it, but I'm certain that techies will enjoy the detailed walk-through.

ACQUIRING THE XMP DATA

Before we begin, let's set a basic assumption: assume that the data has not been tampered with or edited. This assumption allows us to interpret everything at face value.

The image we are analyzing is at FotoForensics. FotoForensics does not alter the original uploaded data, and the filename consists of the file's sha1 checksum and length (7d72b2ba004477f4e45203770d7c08392f461a69.274701.jpg). To download the image, go to the bottom of the page and click on the 'Source' link. After you download the picture, verify the sha1 checksum and length.

Next, we need to extract the XMP data. There are automated tools for analyzing metadata, but most of them reformat the information or add/remove content. For example, ExifTool is a great analysis program, but it reformats the XMP information. Using ExifTool to extract the XMP data will rewrite hashes.

> exiftool -tagsfromfile 7d72b2ba004477f4e45203770d7c08392f461a69.274701.jpg > data.xmp

Update: ExifTool's Phil Harvey wrote in with the magic incantation to extract xmp data using ExifTool:

> exiftool -xmp -b 7d72b2ba004477f4e45203770d7c08392f461a69.274701.jpg > data.xmp

Since we want to see the original data, we will be doing the extraction by hand. To see the XMP data, you can use Photoshop, dd, or strings.

With Photoshop: load the image, then go to the File menu, select "File Info", and then open the 'Raw Data' tab. If you don't have a 'Raw Data' tab, then search around the window for an option to enable it. Keep in mind, Photoshop reformats the XMP data. The 'Raw' view isn't actually the raw XML; it is the XML after being formatted, potentially rearranged, and potentially altered.

For unix people, the 'dd' command is the best option for extracting the actual data. The command is 'dd bs=1 if=7d72b2ba004477f4e45203770d7c08392f461a69.274701.jpg of=data.xmp skip=718 count=3827'.
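As a rough sketch of the two manual steps above (checking the download against its filename, then carving out the XMP block), something like the following Python could be used. The function names `matches_name` and `carve_xmp` are hypothetical, and the carving looks for the `<?xpacket` markers instead of relying on the hard-coded dd offsets, which is an assumption about where the packet starts and ends in other files.

```python
import hashlib
import re
from pathlib import Path

# Assumed naming convention, taken from the example file: <sha1>.<length>.<ext>
NAME_RE = re.compile(r"^([0-9a-f]{40})\.(\d+)\.[A-Za-z0-9]+$")


def matches_name(path):
    """Return True if the file's SHA-1 and byte length match its filename."""
    path = Path(path)
    m = NAME_RE.match(path.name)
    if m is None:
        raise ValueError(f"not a <sha1>.<length>.<ext> filename: {path.name}")
    data = path.read_bytes()
    return (hashlib.sha1(data).hexdigest() == m.group(1)
            and len(data) == int(m.group(2)))


def carve_xmp(data):
    """Return the raw bytes from '<?xpacket begin' through the end marker.

    The bytes are returned untouched (no reformatting), or None if no
    complete packet is found.
    """
    start = data.find(b"<?xpacket begin")
    if start < 0:
        return None
    end_marker = data.find(b"<?xpacket end", start)
    if end_marker < 0:
        return None
    close = data.find(b"?>", end_marker)
    if close < 0:
        return None
    return data[start:close + 2]
```

Because the bytes are sliced directly out of the file, nothing is reordered or rewritten, which is the property the manual dd approach is after.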
However, my preference when doing this by hand is to just use 'strings' to extract the raw data. XMP is just an XML text block, so the 'strings' command properly extracts the data. We can then go in and delete everything before the first '<?xml' and after the last '<?xpacket end="r"?>'. In this case, the real XMP data (not formatted by Photoshop) has no newlines, so we can pretty up the format using:

> xmllint -format data.xmp > data-formatted.xmp

This alters the formatting for readability, but not the content or record ordering.

TRACKING XMP SOURCES

Usually when evaluating files, there's a basic belief that consistent tools generate consistent formats. However, that's not the case with Adobe. There is no consistent layout for an XMP record -- it all depends on the library that generated or appended to it. What's worse is that Adobe doesn't even know "which library" is used. This is because their code ships with multiple library versions.

For example, my Mac has CS5 installed. Bridge CS5 contains a shared library (/Applications/Adobe Bridge CS5/Adobe Bridge CS5.app/Contents/Frameworks/AdobeXMP.framework/Versions/A/AdobeXMP). Adobe Photoshop CS5 has two shared libraries for XMP (AdobeXMP and AdobeXMPFiles), Adobe Captivate 5 has three libraries, etc. What's worse is that these libraries don't even need to be the same. In my case, the "AdobeXMP" library for Bridge CS5 is different from the "AdobeXMP" library for Photoshop CS5. Depending on your installation path, software, and patches, everything can be different.

What this means: If you use Adobe Bridge and Adobe Photoshop, then the XMP data may be generated by any of three potentially different XMP libraries. This problem is actually a little worse because these are shared libraries. There is a chance that the first one loaded wins -- the order in which the applications are used may alter the XMP content's format. Different Adobe XMP libraries have different output formats and different bugs.
I've been slowly mapping format artifacts to versions, but that's a story for some other day.

The main thing to keep in mind is that all XMP formatting is effectively arbitrary. In general, the XML format permits key/value pairs to be listed per tag, or together as attributes in a tag. For example: <tag field="value" /> can also be written as <tag><field>value</field></tag>. In the world of XMP, these are functionally equivalent. Since we don't know the software and patch levels on the photographer's computer, we don't really care about the overall layout. The only important aspects are the fields, values, and XML nesting. Don't get caught up in the fact that the first block in the raw data uses lots of field=value attributes, and the other blocks use lots of <field>value</field> entries.

BEGINNING THE EVALUATION

All XMP records begin the same way: <?xpacket begin=".." id="W5M0MpCehiHzreSzNTczkc9d"?>. The ".." is some binary data used for determining endianness for multi-byte text, but there's no multi-byte text in this file. The "W5M0MpCehiHzreSzNTczkc9d" is a unique key used by every XMP file as a "magic signature"; it identifies this record as an XMP record.

After this header comes the data. In the raw XMP, all data is stored in an XMP "rdf:Description" block. It describes where the file came from and the sources that led to it. Some of this XMP data is inherited from other metadata fields in the file, including the original EXIF data. This record contains things like the type of lens (aux:Lens="EF16-35mm f/2.8L II USM") and information about the flash ('aux:FlashCompensation="0/1"' means that no flash exposure compensation was applied).
The full intro looks like:

> <rdf:Description
> xmlns:photomechanic="http://ns.camerabits.com/photomechanic/1.0/"
> xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
> xmlns:dc="http://purl.org/dc/elements/1.1/"
> xmlns:xmp="http://ns.adobe.com/xap/1.0/"
> xmlns:aux="http://ns.adobe.com/exif/1.0/aux/"
> xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
> xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"
> xmlns:stRef="http://ns.adobe.com/xap/1.0/sType/ResourceRef#"
> xmlns:xmpRights="http://ns.adobe.com/xap/1.0/rights/" rdf:about=""
> photomechanic:HasCrop="False" photomechanic:Prefs="1:0:0:005403"
> photomechanic:PMVersion="PM4"
> photoshop:LegacyIPTCDigest="435D74FBC12C4007083CF8F390DA6484"
> photoshop:Country="Palestinian Territories"
> photoshop:DateCreated="2012-11-20T09:39:38+01:00" dc:format="image/jpeg"
> xmp:ModifyDate="2013-02-15T11:55:30+01:00"
> xmp:CreateDate="2012-11-20T09:39:38"
> xmp:MetadataDate="2013-02-15T11:55:30+01:00" xmp:CreatorTool="Adobe Photoshop
> CS6 (Macintosh)" xmp:Rating="0" aux:SerialNumber="013021001346"
> aux:LensInfo="16/1 35/1 0/0 0/0" aux:Lens="EF16-35mm f/2.8L II USM"
> aux:LensID="246" aux:LensSerialNumber="0000400fe1" aux:ImageNumber="0"
> aux:ApproximateFocusDistance="163/100" aux:FlashCompensation="0/1"
> aux:Firmware="1.1.3"
> xmpMM:DocumentID="xmp.did:81D9BBB16F1211E2B21DD3F6B94651E8"
> xmpMM:OriginalDocumentID="11CD104525F505861ED0EC6DAC391558"
> xmpMM:InstanceID="xmp.iid:81D9BBB06F1211E2B21DD3F6B94651E8"
> xmpRights:Marked="False">

(For XML users, you'll notice that this is just the opening tag. The close is at the end of the file.)

The header identifies the creator tool as "Adobe Photoshop CS6 (Macintosh)". Since we have the basic assumption that the file has not been tampered with in an effort to throw off a forensic investigation, we can assume that all artifacts in the XMP record are specific to XMP libraries found on this platform.
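Since the attribute form and the nested-element form are functionally equivalent, one way to compare records without caring about layout is to flatten both into the same dictionary. The sketch below is only an illustration using Python's standard `xml.etree.ElementTree`; the helper name `description_properties` is hypothetical, and it only handles simple text values (not nested arrays or structures).

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def description_properties(xml_text):
    """Collect simple properties from every rdf:Description block.

    Properties written as attributes and properties written as child
    elements with plain text end up under the same '{namespace}name' key.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        # field="value" attribute form
        for name, value in desc.attrib.items():
            props[name] = value
        # <field>value</field> element form (leaf elements only)
        for child in desc:
            if len(child) == 0 and child.text and child.text.strip():
                props[child.tag] = child.text.strip()
    return props


ATTRIBUTE_FORM = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:aux="http://ns.adobe.com/exif/1.0/aux/">'
    '<rdf:Description rdf:about="" aux:Lens="EF16-35mm f/2.8L II USM"/>'
    '</rdf:RDF>'
)
ELEMENT_FORM = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:aux="http://ns.adobe.com/exif/1.0/aux/">'
    '<rdf:Description rdf:about="">'
    '<aux:Lens>EF16-35mm f/2.8L II USM</aux:Lens>'
    '</rdf:Description></rdf:RDF>'
)
```

With these two samples, `description_properties(ATTRIBUTE_FORM)` and `description_properties(ELEMENT_FORM)` return the same dictionary, which is the sense in which the layout does not matter.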
The most important records in this header are the IDs:

> xmpMM:DocumentID="xmp.did:81D9BBB16F1211E2B21DD3F6B94651E8"
> xmpMM:OriginalDocumentID="11CD104525F505861ED0EC6DAC391558"
> xmpMM:InstanceID="xmp.iid:81D9BBB06F1211E2B21DD3F6B94651E8"

Adobe's XMP format maintains two types of IDs: Document ID (DID) and Instance ID (IID). The DID is created once per file. Each time you use "Save As", a new DID is assigned. But simply hitting save (after the first save) does not alter the DID. In contrast, the IID is updated each time you hit "Save" -- indicating another instance of the file. If you save a picture, open it, and continue editing, then the IID will be updated but the DID will not. The DID only changes when you hit "Save As" (or Save For Web or Export... anything that creates a new file).

Every file should have a DID that identifies the direct base and an IID that reflects the saved instance. The XMP typically records the IID history as a series of timestamped events. (Notice that I say "typically" -- since XMP libraries differ, some don't timestamp.)

The other thing to notice is the right half of the long random hexadecimal value. CS6 for the Mac (Intel architecture) first generates the value when the program is started. Other than that, CS6 increments one byte. Usually this is the first byte, but sometimes this is the 4th byte. (It depends on which XMP library is called.) With Photoshop CS6 for the Mac, opening a new file will partially randomize the left half, but not the entire sequence.

Typically the initial IID and DID values differ by an incremental value, but sometimes they are the same (it just depends on which XMP library created them). In this case, the DID and IID are incremental at the 4th byte: DID=81D9BBB16F1211E2B21DD3F6B94651E8 and IID=81D9BBB06F1211E2B21DD3F6B94651E8. Since they are incremental, we know that they were created at the same time, during the first save of this file. In effect, we know the user did a "Save As" and not just a "Save".
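The "incremental at the 4th byte" observation can be checked mechanically. This is a minimal sketch, not a forensic tool: the helper names are hypothetical, and it simply decodes the hex part of each ID and lists the byte positions where the two values differ.

```python
def id_bytes(xmp_id):
    """Decode the hex portion of an ID such as 'xmp.did:81D9...'."""
    return bytes.fromhex(xmp_id.split(":")[-1])


def byte_differences(id_a, id_b):
    """Return (zero-based index, byte_a, byte_b) for every differing byte."""
    a, b = id_bytes(id_a), id_bytes(id_b)
    if len(a) != len(b):
        raise ValueError("IDs have different lengths")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


DID = "xmp.did:81D9BBB16F1211E2B21DD3F6B94651E8"
IID = "xmp.iid:81D9BBB06F1211E2B21DD3F6B94651E8"

if __name__ == "__main__":
    for index, did_byte, iid_byte in byte_differences(DID, IID):
        print(f"byte {index + 1}: DID={did_byte:02X} IID={iid_byte:02X}")
```

For the header values, the only difference is at index 3 (the 4th byte), B1 versus B0, so the two IDs are one increment apart.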
(Well, a "Save" for the first time may bring up the "Save As" dialogue window. But subsequent saves will just overwrite the file, retaining the DID and updating the IID.)

The other field is the "Original Document ID" (ODID). When you open a file that has an XMP record, it inherits the DID. Doing a "Save As" generates a new DID. The ODID holds the value of the previous DID. This is very explicit: it tells us that the user had edited the file, saved it, opened it, and then did a "Save As". (We'll see this same sequence in the History block in a moment.)

ANCESTORS

The next section in the XMP record is the Document Ancestor block:

> <photoshop:DocumentAncestors>
> <rdf:Bag>
> <rdf:li>xmp.did:068011740720681180A9CEE8487CF300</rdf:li>
> <rdf:li>xmp.did:0A8011740720681180A9CEE8487CF300</rdf:li>
> <rdf:li>xmp.did:8F19CA801520681180A9CEE8487CF300</rdf:li>
> <rdf:li>xmp.did:9119CA801520681180A9CEE8487CF300</rdf:li>
> </rdf:Bag>
> </photoshop:DocumentAncestors>

According to the XMP specifications (search Google for "XMP Specifications Part" -- there are three parts), the Document Ancestors denote "copy-and-paste or place" operations. These do not identify what was incorporated into the file -- it could be an entire picture or a portion of a picture. We only know that these four separate files were incorporated into an existing file.

These records identify other documents (DID) that were added to this document. This is explicitly the definition of a composition: a picture made from other pictures. I think it is safe to assume that the four documents are different -- either in coloring or content. This is a pretty safe assumption since it is unlikely that the artist would save four copies of the same document and then incorporate all four identical files.

Since the right sides of these hex sequences are identical, it implies that they were all from the same instantiation of the Adobe program.
We don't know what program created these, but we do have a strong reason to believe that the sequence of events was as follows:

1. An Adobe program was started and opened an image. This initialized the common DID bytes.

2. The Adobe program did a "Save As" operation. This generated the "0680117..." DID file. Since CS6 increments -- and we have no reason to suspect anything other than CS6 -- we even know that the IID for that file is likely "0780117...". (Could be "05...", depending on the library, but in this case, it is likely "07".)

3. The next DID begins with "0A80117...". So what happened to "08" and "09"? The user may have hit "Save" twice, or may have done a "Save As" (consuming two IDs) to a file that was not used as an ancestor to this file. (Foreshadowing: We'll actually see "09" in the next block; it's from a "Save As".)

4. The user did not close the program. He just did another "Save As", generating DID "0A80117..." and we can assume that it had IID "0B80117...". Keep in mind, we have no idea how much time has passed or what else the user did to the picture. We only know that there was another "Save As".

5. Then the left-hand sequence changed. As already mentioned, this means that the user opened a document. We don't know if the document represents the same picture. We don't even know if it was related. Seriously, just opening a document will randomize the left side. So we don't know what happened between 0A80117 and 8F19CA. We just know that the user did a "Save As", generating DID "8F19CA8..." and we know the IID would likely be "9019CA8...".

6. The user did one more "Save As", generating the next sequential IDs: DID "9119CA8..." with IID likely "9219CA8...".

(Technical note: The Document Ancestors is supposed to be an unsorted array. However, I've only seen it as sorted in the order of events. Assuming that it is unsorted, we still know that "0680117..." came before "0A80117..." and "8F19CA8..." came before "9119CA8..." due to incremental sequencing.)
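The grouping used in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "right half" is taken to mean the last 16 hex digits, the increment is assumed to land on the first byte, and the function names are hypothetical.

```python
from collections import defaultdict

ANCESTORS = [
    "xmp.did:068011740720681180A9CEE8487CF300",
    "xmp.did:0A8011740720681180A9CEE8487CF300",
    "xmp.did:8F19CA801520681180A9CEE8487CF300",
    "xmp.did:9119CA801520681180A9CEE8487CF300",
]
HEADER_DID = "xmp.did:81D9BBB16F1211E2B21DD3F6B94651E8"


def hex_part(xmp_id):
    """Strip the 'xmp.did:' / 'xmp.iid:' prefix and normalize case."""
    return xmp_id.split(":")[-1].upper()


def group_by_right_half(ids):
    """Group IDs by the right half of the hex value (assumed per program run)."""
    groups = defaultdict(list)
    for xmp_id in ids:
        h = hex_part(xmp_id)
        groups[h[len(h) // 2:]].append(xmp_id)
    return dict(groups)


def first_byte_runs(ids):
    """Group IDs that are identical except for the first byte.

    Returns {remaining hex digits: sorted first-byte values}, so gaps such
    as the missing 08 and 09 become visible.
    """
    runs = defaultdict(list)
    for xmp_id in ids:
        h = hex_part(xmp_id)
        runs[h[2:]].append(int(h[:2], 16))
    return {rest: sorted(values) for rest, values in runs.items()}


if __name__ == "__main__":
    for right, members in group_by_right_half(ANCESTORS + [HEADER_DID]).items():
        print(right, len(members))
    for rest, values in first_byte_runs(ANCESTORS).items():
        print(rest, [f"{v:02X}" for v in values])
```

Run against the four ancestors plus the header DID, the ancestors fall into one right-half group and the header DID into another, and the first-byte runs come out as 06/0A and 8F/91.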
The one thing the XMP record does not tell us is what was in these files. Each could be the entire original image. Each could be colorized differently. Each could be a selection of parts from the file. In fact, the user could have opened a completely different file and pasted from it. The only thing we do know is that (1) there are four independent documents (as defined by Adobe), and (2) they were combined into a picture to form the final image.

We also know one more thing: We know the order of events. The user started an Adobe product and created these four ancestors. He then closed the Adobe product (or ran a completely different Adobe product) and started creating a file. He then closed that application, generating the ODID (which, at the time, was assigned as the DID). He then opened the file and did a "Save As", generating the final DID and demoting the old DID to the Original Document ID. We know this because the right sides of the ancestor IDs are different from the header IDs -- and that only seems to happen when the program is restarted. In contrast, if the user had closed all files -- but not closed the program -- and opened a different file, then the right side would remain the same and the left side (at least the first 8 bytes) would be different.

HISTORY RECORDS

The next section is the "History" record. This identifies what happened with this specific document.
It's essentially a timestamped, ordered array:

> <rdf:li stEvt:action="saved"
> stEvt:instanceID="xmp.iid:A29730BC0A2068119EE9AF3C2BE2913F"
> stEvt:when="2012-11-20T17:19:09+01:00" stEvt:softwareAgent="Adobe Photoshop
> Camera Raw 7.1 (Macintosh)" stEvt:changed="/metadata"/>
> <rdf:li stEvt:action="saved"
> stEvt:instanceID="xmp.iid:098011740720681180A9CEE8487CF300"
> stEvt:when="2013-01-04T14:44+01:00" stEvt:softwareAgent="Adobe Photoshop
> Camera Raw 7.1 (Macintosh)" stEvt:changed="/metadata"/>
> <rdf:li stEvt:action="derived" stEvt:parameters="converted from
> image/x-canon-cr2 to image/tiff"/>
> <rdf:li stEvt:action="saved"
> stEvt:instanceID="xmp.iid:8F19CA801520681180A9CEE8487CF300"
> stEvt:when="2013-01-04T15:43:45+01:00" stEvt:softwareAgent="Adobe Photoshop
> Camera Raw 7.1 (Macintosh)" stEvt:changed="/"/>
> <rdf:li stEvt:action="saved"
> stEvt:instanceID="xmp.iid:525849A00F206811822A94D83E08B11E"
> stEvt:when="2013-01-04T16:08:44+01:00" stEvt:softwareAgent="Adobe Photoshop
> CS6 (Macintosh)" stEvt:changed="/"/>
> <rdf:li stEvt:action="converted" stEvt:parameters="from image/tiff to
> image/jpeg"/>
> <rdf:li stEvt:action="derived" stEvt:parameters="converted from image/tiff to
> image/jpeg"/>
> <rdf:li stEvt:action="saved"
> stEvt:instanceID="xmp.iid:A0AEE3D11C206811822A94D83E08B11E"
> stEvt:when="2013-01-04T16:08:44+01:00" stEvt:softwareAgent="Adobe Photoshop
> CS6 (Macintosh)" stEvt:changed="/"/>
> <rdf:li stEvt:action="saved"
> stEvt:instanceID="xmp.iid:048011740720681180839DD19BA24E58"
> stEvt:when="2013-02-15T11:23:04+01:00" stEvt:softwareAgent="Adobe Photoshop
> CS6 (Macintosh)" stEvt:changed="/"/>

Since the list is ordered, entries that are missing timestamps had to happen between the two dated elements. (I don't think it's documented, but I believe they are associated with the timestamp that comes after them.) This is the data that I previously, briefly summarized.

1. The first IID ends with "...2BE2913F".
This sequence doesn't match anything that we have previously seen. It did not come from any of the ancestor documents. It did not come from the header's DID or ODID. So we explicitly know that another document exists (or existed) that had a DID ending with "...2BE2913F".

So here's what happened: The user started a file. It was assigned a DID. He closed the program, opened it again and did a "Save As", demoting the DID to an ODID. Then he did it again -- "Save As" created a new DID, the old DID was demoted to an ODID, and the old ODID was lost. We have no XMP record identifying the original DID from the first time the file was created, but we have this IID that represents that first iteration.

The next thing this record tells us is that the IID was generated by "Adobe Photoshop Camera Raw 7.1". Camera Raw converts a deep-color image into an 8-bit deep image for Photoshop. This means that the first operation was a RAW image import into Photoshop. This means it is the whole picture, but XMP does not identify "which" picture.

There are different ways to incorporate the converted camera raw picture into Photoshop. Most methods identify the "changed" record as "/", meaning the picture changed. However, sometimes it only changes "/metadata". As Adobe describes it, "When you use Camera Raw, the adjustments (or 'instructions') you make are stored as metadata." Don't assume that he only changed metadata; he likely changed the color since it came from Camera Raw.

2. The second IID is "098011740720681180A9CEE8487CF300". We've seen this before. This is the same "09" that I previously identified as a missing ancestor. Now we know: it isn't listed as an ancestor to this file because it is this file. In my previous, brief write-up, I commented that this is "typically seen when a picture is spliced from two sources." We know that there are multiple sources because of the Document Ancestor section.
However, without pointing out the ancestors in my brief write-up, I can see how this would appear ambiguous.

3. The third IID is "8F19CA801520681180A9CEE8487CF300". This is exactly the same as the DID found in a document ancestor. However, now it is assigned to an IID instead of a DID.

Depending on how you save a Camera Raw converted image, Adobe may assign the DID and IID the same value. For example, if you open a RAW image in Camera Raw and click on "Open Image", then they are assigned incrementally different IID and DID values. However, if you modify a RAW image's colors and save the changes (by clicking on "Done"), then Adobe creates a separate ".xmp" file, which describes the changes without disrupting the original RAW file. This ".xmp" file does not contain a DID or IID, so one will be assigned when it is used. When the ".xmp" file is used, the same value is assigned to both the DID and IID. However, this may not be the only method for generating the same DID and IID values.

Although the DID and IID values are the same, implying a basic color adjustment to a RAW image, it does not identify the source RAW file. We cannot identify which file was color adjusted, only that some file was likely color adjusted.

Because this IID appears as an Ancestor, it means that it was included in this file. However, XMP doesn't identify when the ancestor was created or incorporated. Fortunately, this history record has a timestamp. Now we know: this file was saved on 2013-01-04 at 15:43:45 +01:00. Sometime after that timestamp, the file was re-incorporated into the file through a paste or place operation. We do not know if it was incorporated in whole or in part. In addition, since the change event is assigned to "/" (stEvt:changed="/"), we know that the picture changed.

4. The next IID is "525849A00F206811822A94D83E08B11E". We haven't seen the right-hand part before, so the user closed the program, started it, and hit "Save".
However, we don't know what was done to the image beyond opening and hitting "Save". (Foreshadowing: remember that it records when he closed the program and then restarted it. That comes up again at the end of the XMP record.)

5. Then comes a conversion/derived to JPEG, followed by IID "A0AEE3D11C206811822A94D83E08B11E". Since the right-hand side is the same as the previous operation, we know that he didn't close the program. Since the left side is very different, we know that he opened one or more other files. The History array is ordered, but the Ancestor list is not. We don't know when some of those paste operations happened, but since he opened other files, this seems like a great candidate for incorporating them.

We know a few more things. Since this is the first (and only) series of conversions to JPEG, we know that this is the first time it was saved as a JPEG. These conversions are the first time we see an action by "Adobe Photoshop CS6", so this is the first actual save. And this is the last timestamp that pre-dates the contest submission. This likely represents the JPEG that he submitted.

NOTE: I say "likely". We have no way of knowing if he had a completely different series of files that were actually submitted. But I'll get to why that is unlikely in a moment...

6. The final history is "048011740720681180839DD19BA24E58" and it happens after the winner was announced. Since the right side is different from anything previously seen, we know that he closed the program and then started it up again. (It makes sense that he would not need to do edits until after the contest ends.) This was likely when he did the final image for public release. (And since I received it as a representation of the final winning image, this makes sense.)

I had mentioned that the previous step likely represented the submitted content. This is because I don't think World Press Photo is stupid.
If the winner turned in a significantly different picture for distribution after the contest, the judges would have likely noticed.

We still have a few document ancestors that we cannot associate with any specific save operation. However, since the final image must look like the winning submission, we can assume that the ancestors were incorporated into the image no later than the conversion to JPEG.

To reiterate: We have at least seven files: the base image, four ancestors that were added to it (including one that was a variant of a previous stage), the first picture saved as a JPEG, and the final JPEG. Moreover, we can directly account for three combination steps (the base, work before the known ancestor, and the work after the known ancestor). We can also account for at least two JPEG files: the first conversion to JPEG that predates the contest, and the file we are analyzing which comes right after the contest.

DERIVED FROM

The final XMP section identifies the "Derived From" records. According to Adobe's XMP specification, this is "a reference to the original document from which this one is derived."

> <xmpMM:DerivedFrom stRef:instanceID="xmp.iid:048011740720681180839DD19BA24E58"
> stRef:documentID="xmp.did:8F19CA801520681180A9CEE8487CF300"
> stRef:originalDocumentID="11CD104525F505861ED0EC6DAC391558"/>

This leads to a nice closed circle regarding the IDs:

* The derived-from reference IID has been seen before -- it is the last history showing the final save.
* The reference DID is the same as the ancestor that was created as a variant of this file.
* The reference ODID matches the ODID seen in the header.

This "derived from" record tells us that the JPEG we just analyzed isn't some arbitrary JPEG. It is based directly on the last JPEG that was listed in the History section.

There is one little sticking point: why does the reference DID point to the saved DID seen in the history and in the document ancestor?
As far as I can tell, there is only one way this can happen (there might be other ways; XMP does not record a complete history). In the fifth history step (history array item 8), we noted that he opened up a file -- so he could have opened a different previously-saved file. He then managed to include the same file back into itself, creating the one ancestor record. Any other way that I can think of would not retain the same history sequence.

I fully expect critics to point out that I just confirmed: he copied the file back into itself. This is viewed as permitted HDR. However, that only accounts for one of four document ancestors. As I originally wrote in my brief report, he incorporated at least three other files.

ARMCHAIR QUARTERBACKS

A number of comments have voiced the opinion that there is nothing wrong with combining full versions of the entire image. This would be a global alteration and a manual step for performing high-dynamic range (HDR) imaging. However, there is nothing in the XMP data that identifies whole-picture incorporation. These could easily be partial picture overlays. The overlays could explain the difference in the compression ratios. It is also worth noting that a paste operation that contains different content would cause a compression difference, and even pasting the same content but having alignment off by a pixel (assuming a very large picture) would yield this result.

A few people also commented that this could easily be performed in a darkroom. If we assume that all five images (base + four ancestors) were included in their entirety, then this identifies five global, independent operations -- not one visit to the darkroom. The XMP identifies a complex series of operations in Photoshop, which would be even more complex if it were performed in a darkroom.

A few people claimed that my conservative view would have banned people like Ansel Adams. However, Ansel Adams is known for his art photography.
His works are on display in museums of fine art. In contrast, World Press Photo claims to be a contest for photo journalists. As journalists, they are not supposed to alter facts. If WPP is an art contest, then these modifications are fine. As a photo journalism contest, I have serious questions.

However, WPP has announced and validated their winner. At this point, I would question their credibility if they recanted their decision.

REGARDING FOURANDSIX

In his interview with Wired and in his expert report summary hosted at World Press Photo, Dr. Hany Farid claimed that I did not understand how XMP records work. However, there is no indication that he noticed that the XMP record explicitly identifies multiple source files.

Dr. Farid also mentioned a private communication (an email exchange). However, he was not included in the list of email recipients. The exchange was between his business partner, Kevin Connor, and myself. This exchange began the day before WPP announced the use of independent reviewers. As Dr. Farid said in the Wired interview, they privately tried to convince me of their position. Kevin Connor sent me some sample images, but the pictures failed to prove his point. In particular, he wrote:

> No, I'm afraid you're mistaken about this metadata. You will *not* see this
> happen if you open a new/different raw file. The portion of the metadata
> you're looking at doesn't communicate any information whatsoever related to
> potential compositing.

As shown in this deep analysis, XMP information can record information about compositing; Kevin Connor is wrong in his conclusion.

He also sent me two sample images that he claimed proved his point: NoEdits.jpg and SimpleComposite.jpg. He noted that there are ways to create a composite image that are not denoted in the XMP data. Each of his files only contains one "Adobe Photoshop Camera Raw 7.1 (Macintosh)" history record and no Document Ancestor records.
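For readers who want to repeat this kind of check, here is a rough Python sketch of counting History records and Document Ancestor records in an XMP packet. The `SAMPLE` packet is made up for illustration: it mimics the shape described above (one Camera Raw history record, no ancestors) and is not the content of NoEdits.jpg or SimpleComposite.jpg. The function name `summarize` is hypothetical, and the sketch assumes the history entries store `stEvt:softwareAgent` as an attribute on each `rdf:li`.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP and Photoshop metadata.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xmpMM": "http://ns.adobe.com/xap/1.0/mm/",
    "stEvt": "http://ns.adobe.com/xap/1.0/sType/ResourceEvent#",
    "photoshop": "http://ns.adobe.com/photoshop/1.0/",
}

# Made-up packet for illustration only: one history record, no ancestors.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
    xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"
    xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/">
  <rdf:Description rdf:about="">
    <xmpMM:History>
      <rdf:Seq>
        <rdf:li stEvt:softwareAgent="Adobe Photoshop Camera Raw 7.1 (Macintosh)"/>
      </rdf:Seq>
    </xmpMM:History>
  </rdf:Description>
</rdf:RDF>"""


def summarize(xmp_text):
    """Return (software agents in History order, list of ancestor document IDs)."""
    root = ET.fromstring(xmp_text)
    agent_attr = "{%s}softwareAgent" % NS["stEvt"]

    # xmpMM:History is an ordered rdf:Seq of events.
    events = root.findall(".//xmpMM:History/rdf:Seq/rdf:li", NS)
    agents = [li.get(agent_attr) for li in events]

    # photoshop:DocumentAncestors is an unordered rdf:Bag of document IDs.
    bag = root.findall(".//photoshop:DocumentAncestors/rdf:Bag/rdf:li", NS)
    ancestors = [li.text.strip() for li in bag if li.text]
    return agents, ancestors


agents, ancestors = summarize(SAMPLE)
print(len(agents), "history record(s):", agents)
print(len(ancestors), "document ancestor(s):", ancestors)
```

A packet like the one analyzed in this post would instead report several history records and a non-empty ancestor list.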
The problem is that their tests did not demonstrate the approach that the photographer used to create the final image.

(I typically keep private emails private. However, Dr. Hany Farid brought these up publicly in his interview with Wired.)

Then again, the time between when World Press Photo (WPP) announced that they were conducting an investigation and when they published their results was measured in hours (5 hours). The time from when FourAndSix's Kevin Connor first contacted me and when WPP posted their results was about 24 hours, but that was before they were selected as reviewers. Kevin Connor informed me that they were selected as a reviewer about an hour after WPP announced the independent review. As Kevin Connor wrote:

> Though I don't agree with your analysis of the World Press Photo winner, I was
> avoiding making any public statements about that, because I thought it was
> best to just share my concerns privately. However, we were contacted this
> morning by the World Press Photo organization to provide our own analysis of
> the photo. Of course, we have to share with them our honest opinion.

Considering that a forensic write-up takes about two to three times longer than the actual evaluation, I can only assume that FourAndSix spent no more than an hour or two evaluating the metadata, the RAW image, and the contest submission. I suspect that their expert report was based on a cursory glance at the evidence, and their own incomplete understanding of the XMP format. (In all honesty, most people haven't taken the time to look that closely at library artifacts.)

In his interview with Wired, Dr. Farid is also quoted as saying, "[Krawetz] claimed the date in the metadata showed it was morning. That's incorrect because he doesn't understand basic geometry." The metadata does not contain any geometry information. As seen in the header portion of the XMP data, the picture was reportedly taken on 2012-11-20 at 09:39:38+01:00.
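The timestamp arithmetic is easy to verify. The short Python sketch below parses the recorded time and shifts it to a fixed GMT+02:00 offset; using a fixed offset for Gaza (rather than a named time zone database entry) is an assumption made to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone

# Timestamp as recorded in the XMP header (offset +01:00).
taken = datetime.fromisoformat("2012-11-20T09:39:38+01:00")

# Assumption: represent Gaza local time as a fixed GMT+02:00 offset.
gaza_offset = timezone(timedelta(hours=2))
gaza = taken.astimezone(gaza_offset)

print(gaza.isoformat())                  # 2012-11-20T10:39:38+02:00
print("before noon:", gaza.hour < 12)    # before noon: True
```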
The last time I checked, 9:39am in GMT+01:00 was "morning" in Gaza (GMT+02:00).

Dr. Hany Farid has chosen to make their misunderstanding of the XMP analysis public. FourAndSix did not identify the separate files that were combined to form the final composition, and they generated sample images that failed to demonstrate the methods used by the photographer.

Usually Hany and Kevin do good work. I can only assume that a rushed schedule led to their oversight in identifying multiple source files and the composition method used by the photographer.

Comments

#1 Dr. Neal Krawetz on 2013-05-23 22:23
For people who are keeping track, I changed a few words for clarity. For example, it previously said "all of the IDs are different", but I forgot to say what I was comparing. I corrected it to say that the ancestors are different from the header IDs. I also changed one instance of "analysis" to "summary" to better reflect my initial blog entry on this topic. I corrected one ID: the bold "90" was corrected to "91". I added the morning/timestamp information that Hany Farid misunderstood. I decided to not put in strike notations since this is a technical write-up and difficult to read as-is.

#1.1 Carl Seibert on 2021-12-07 12:02
Ah, the internet. Where violation lives on forever. I had forgotten this incident. You attacked a man without evidence of wrongdoing and damaged his ability to make a living. Just for the damned sport of it. I guess by this time you have successfully avoided being cleaned out down to your BVDs in a defamation action. When I spoke to your victim some years ago, he stated that he wanted to "put the incident behind him", terms we have become familiar with hearing from indecent assault victims. A pity. I would have loved to have seen some justice. You should show some decency and take this page down.
The pedantic technical discussion of XMP metadata is fascinating (enough so that Google just returned it as I researched a technical issue) but it has nothing to do with the matter of whether the image in question was an honest depiction of events or the integrity of any of the real-world parties actually involved. This post's continued existence is just another example of the lasting hurt of cyberbullying.

#1.1.1 Dr. Neal Krawetz on 2021-12-07 14:33
Hello Carl Seibert,

You wrote, "You attacked a man without evidence of wrongdoing and damaged his ability to make a living."

Who are you alleging has been attacked?

The photographer? I didn't tell him to submit an altered photo to a contest that claimed to only accept unaltered images.

World Press Photo? They made a conscious decision to award their highest honor to an altered photo. When they learned of this mistake, they chose to not change the outcome. Instead, they introduced new steps to make sure this type of mistake doesn't happen again.

Hany Farid? I didn't tell him to give an interview with Wired, where he did a personal attack and misrepresented the analysis.

As far as I can tell, they are all still employed. Nobody has lost their "ability to make a living."

The issue here is that the photographer altered a picture and got caught. The contest permitted the alteration, amid a loud chorus that pointed out the edits gave the winner an unfair advantage. There were people already shouting about the edits before I got involved. The only difference is that I provided proof of the edits. Moreover, WPP brought in a direct contractor, direct service provider, and the contest chair's good friend -- while claiming that the reviewers were "independent" experts.

You also wrote: "Just for the damned sport of it."

I take photo analysis seriously. This entire controversy creates a great example for other people to learn how to do analysis.
We learn from their examples so that these mistakes hopefully won't happen again. It's not a "sport" as you claim; it's an industry.

#2 John H. on 2013-05-24 07:12
Great write-up. I never knew that Photoshop stored that much info in the XMP data.

#3 Phil Harvey on 2013-05-28 05:41
To extract the original XMP with exiftool, do this:

    exiftool -xmp -b > data.xmp

#3.1 Dr. Neal Krawetz on 2013-05-28 06:08
Hi Phil, EXCELLENT! I was missing a command-line flag. That's much easier!

#4 Stephen Fischer on 2013-05-31 22:16
A very informative write-up Dr. Krawetz. I appreciate your blog and find it highly educational. Your efforts to help expose fraud with the number photographers that have slipped across that fuzzy boundary of proper ethics is applauded. With the advent of more advanced photo-processing tools, it is becoming easier for to perpetuate the type of manipulations you have pointed out with the latest World Press Photo award. Keep up the good work and help keep this field honest.

#5 FLSqueezed on 2014-03-29 08:33
I landed on this blog because I was curious as to how "identifying" those Document ID's were once I noticed them. I didn't understand what they were intended to identify, but now I do... so thanks for that. Am I correct in thinking that a person who is concerned about their anonymity should always close out photoshop between projects so that images appearing online don't have a sort of "XMP genetic profile" in common with each other?

#6 LH on 2015-01-17 02:15
Note that the DocumentID 81D9BBB16F1211E2B21DD3F6B94651E8 is actually in UUID format 1: split as 81D9BBB1-6F12-11E2-B21D-D3F6B94651E8, the variant code is B (B21D), meaning "UUID standard format", and the version code is 1 (11E2), meaning that the node ID D3F6B94651E8 is actually the MAC address of the computer that created the UUID.
(11E => this image was created after 2010, see http://en.wikipedia.org/wiki/Globally_unique_identifier ). I don't know if identifying the Mac address will help your investigations, but maybe it's interesting. (InstanceIDs don't seem to follow this standard, they may be totally random, but I think document IDs or at least OriginalDocumentIDs do.)

#6.1 Dr. Neal Krawetz on 2015-01-17 03:04
Hi LH, Shhhhh! Don't make this public! Most people don't know about that and we don't want to tip off the bad guys on how to better hide their trail! I honestly thought that nobody would point that out. Or that the embedded MAC identifies a localhost multicast and not a specific hardware vendor -- which is indicative of an Adobe product.

#6.1.1 Greg on 2015-12-05 10:10
Hi Neal, What do you mean by "localhost multicast"? Do you have any resources about Adobe's UUID-generation algorithms? Thanks

#6.1.1.1 Dr. Neal Krawetz on 2015-12-05 11:03
Hi Greg, Shhh! We're still not making this public! See: https://en.wikipedia.org/wiki/MAC_address It's the last two bits of the first byte in the 6-byte MAC. Adobe usually sets the bits as 11 (locally administered, multicast).

#6.1.1.1.1 Cameron on 2016-06-07 16:22
so if for instance I wanted to glean the MAC address (if it were possible ;p) of this DocumentID: F236E0DF225811E69ABAE69C4975DBCA How would I go about that

#7 DerAblichter on 2016-07-17 12:32
The XMP DocumentID created by PS does not contain a MAC address, as stated in one comment, at least not on a PC. It might be true for other documents / apps, not for PS - or might be true when PS was used on a MAC, but I doubt. Also DocumentIDs are different when the same RAW once was saved as PSD, TIF or JPEG (from the same PS session) If the RAW was opened again and again was saved as PSD, again the DocID is different, unique as it should *be*.
What Hansen probably did (I have a JPG with metadata here, but no RDF) is saving several edited versions of the same RAW to blend them together (simple copy and paste). Which would explain the Ancestor Tags. If this is allowed I don't know. Anyway. You hardly get an effect like this just out of a RAW unless some is very good in using ACR, but what a different would it make, when it was achieved only by using ACR? See examples of DocIDs here http://bit.ly/29GcoPO

Regarding date/time this seems to be correct time (UTC). But of course it depends on what the camera was set to, if it was Gaza time, CET or UTC we don't know. To avoid trouble while travelling, most prof set it to UTC. The timezone offset shown in IPTC (+01:00) means nothing, because it's either the timezone of the processing computer or it was set intentional, regardless if the DateTime-Created value in EXIF tag was right.

Saying this I have to confess that I read all of this article. I will do later and see if your interpretation of Metadata is right.

@DerAblichter IT specialist and photographer

ps for those who like a GUI for ExifTool better, google for ExifToolGui

#7.1 DerAblichter on 2016-07-18 06:25
I wrote "Saying this I have to confess that I read all of this article." should be "that I didn't read all..." and with "I have a JPG with metadata here, but no RDF" was meant that the JPEG I have, has no LR/CRS Tags (lightroom or ACR) in XMP. Those tags show which modification were made directly in LR/ACR (saturation, sharpening, etc) But I was blind - of course they are there.

#8 MOna on 2017-12-13 14:01
Please tell me in simple word can you determine the origin computer where pdf file was created using XMP data? Can you determine the editors and comments on the basis of XMP data? In simple words what information can be obtained from PDF file. I can see author information but can data associated with PDF it tell more? Thank you.
M

#9 Doug Carner on 2018-02-15 20:14
Neal, Just came across this writing. Excellent metadata analysis and great introduction to the XMP structure. Header/footer data can tell an amazing story once you know how to read the language. Thank you for sharing this.

#10 James Jenbg on 2021-08-21 08:28
What is the purl section for?

#10.1 Dr. Neal Krawetz on 2021-08-21 09:07
Hi James Jenbg, See: https://archive.org/services/purl/help

#11 Adele Myers on 2023-03-05 02:37
hi, not sure that I am in the right place here, but thought I would give it a go? a general inquiry about possible plagiarism hacks or sharing/buying projects that students might engage with adobe products and if there is any way to trace the original author of an after effects project, Illustrator and photoshop files. What metadata might be hidden for example and where to find it

Copyright 2002-2024 Hacker Factor. All rights reserved.