URL: https://www.hackerfactor.com/blog/index.php?/archives/552-Deep-Dive.html

The Hacker Factor Blog

Dr. Neal Krawetz writes The Hacker Factor Blog.

DEEP DIVE


THURSDAY, 23 MAY 2013

I learned yesterday that, after writing my rebuttal, Dr. Hany Farid began to go
on tour. He gave an interview with Wired in which he repeated his claim that I
do not understand XMP metadata. He further mentioned a communication that
FourAndSix had with me prior to their report, in which Kevin Connor repeatedly
tried to convince me that I was wrong, but his samples failed to support his
claim.

In this blog entry, I'm going to go over, in extreme detail, the XMP data that I
summarized earlier, and show how I reached my conclusion. I will follow it by
showing how FourAndSix were unable to convince me that I am wrong.

NOTE: Compared to my other blog entries, this is an overly technical entry.
Regular readers may not be able to follow all of it, but I'm certain that
techies will enjoy the detailed walk-through.




ACQUIRING THE XMP DATA

Before we begin, let's set a basic assumption: assume that the data hasn't been
tampered with or edited. This assumption allows us to interpret everything at
face value.

The image we are analyzing is at FotoForensics. FotoForensics does not alter the
original uploaded data, and the filename is the file's SHA-1 checksum followed by
its length in bytes: 7d72b2ba004477f4e45203770d7c08392f461a69.274701.jpg. To
download the image, go to the bottom of the page and click on the 'Source' link.
After you download the picture, verify the SHA-1 checksum and length.
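If you want to script that check, here is a minimal Python sketch. It assumes the naming scheme described above (SHA-1 hex digest, then length in bytes, then the extension); the function name verify_fotoforensics_name is hypothetical and not part of any FotoForensics tool.

```python
import hashlib
import os


def verify_fotoforensics_name(path: str) -> bool:
    """Hypothetical helper: check a '<sha1>.<length>.<ext>' filename against the file.

    Assumes the first dot-separated field is the hex SHA-1 digest and the
    second is the file size in bytes.
    """
    parts = os.path.basename(path).split(".")
    if len(parts) < 3:
        raise ValueError(f"unexpected filename format: {path}")
    expected_sha1, expected_len = parts[0].lower(), int(parts[1])

    sha1 = hashlib.sha1()
    size = 0
    with open(path, "rb") as fh:
        # Hash in 64 KiB chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            sha1.update(chunk)
            size += len(chunk)
    return sha1.hexdigest() == expected_sha1 and size == expected_len
```

For an untouched download, calling it on 7d72b2ba004477f4e45203770d7c08392f461a69.274701.jpg should return True.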

Next, we need to extract the XMP data. There are automated tools for analyzing
metadata, but most of them reformat the information or add/remove content. For
example, ExifTool is a great analysis program, but it reformats the XMP
information. Using ExifTool to extract the XMP data will rewrite hashes.


> exiftool -tagsfromfile 7d72b2ba004477f4e45203770d7c08392f461a69.274701.jpg data.xmp


Update: ExifTool's Phil Harvey wrote in with the magic incantation to extract
XMP data using ExifTool:


> exiftool -xmp -b 7d72b2ba004477f4e45203770d7c08392f461a69.274701.jpg > data.xmp


Since we want to see the original data, we will be doing the extraction by hand.

To see the XMP data, you can use Photoshop, dd, or strings. With
Photoshop: load the image, then go to the File menu, select "File Info", and
then open the 'Raw Data' tab. If you don't have a 'Raw Data' tab, then search
around the window for an option to enable it. Keep in mind, Photoshop reformats
the XMP data. The 'Raw' view isn't actually the raw XML; it is the XML after
being formatted, potentially rearranged, and potentially altered.

For unix people, the 'dd' command is the best option for extracting the actual
data. The command is:


> dd bs=1 if=7d72b2ba004477f4e45203770d7c08392f461a69.274701.jpg of=data.xmp skip=718 count=3827
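The skip and count values above are specific to this file. As an alternative to hard-coding them, the Python sketch below walks the JPEG marker segments and reports where the standard XMP packet sits. It relies on the APP1 header 'http://ns.adobe.com/xap/1.0/' plus a NUL byte, which the XMP specification defines for JPEG files. It only finds the first standard XMP segment and ignores Extended XMP. find_xmp_packet is a hypothetical helper, and its result is not guaranteed to equal 718/3827, since those depend on exactly where the extraction starts and stops.

```python
import struct

# XMP namespace header that starts a standard XMP APP1 segment (NUL-terminated).
XMP_NS = b"http://ns.adobe.com/xap/1.0/\x00"


def find_xmp_packet(data: bytes) -> tuple[int, int]:
    """Hypothetical helper: return (offset, length) of the standard XMP packet.

    Walks JPEG marker segments from SOI up to SOS and stops at the first APP1
    segment whose payload starts with XMP_NS. Extended XMP is not handled.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # SOS or EOI: metadata segments are over
            break
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # markers without a length
            pos += 2
            continue
        # The 2-byte big-endian length counts itself but not the marker.
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = pos + 4
        if marker == 0xE1 and data[payload:payload + len(XMP_NS)] == XMP_NS:
            start = payload + len(XMP_NS)
            return start, seg_len - 2 - len(XMP_NS)
        pos += 2 + seg_len
    raise LookupError("no standard XMP APP1 segment found")
```

Slicing data[offset:offset + length] then yields the same bytes that dd bs=1 with skip=offset and count=length would copy.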

However, my preference when doing this by hand is to just use 'strings' to
extract the raw data. XMP is just an XML text block, so the 'strings' command
properly extracts the data. We can then go in and delete everything before the
first '<?xml' and everything after the last '<?xpacket end="r"?>'.
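The same trimming can be done in a few lines of Python. This sketch anchors on the '<?xpacket begin' header that opens every XMP packet rather than on '<?xml'; pass a different start_marker if your data begins differently. trim_to_xpacket is a hypothetical name.

```python
def trim_to_xpacket(blob: bytes, start_marker: bytes = b"<?xpacket begin") -> bytes:
    """Hypothetical helper: keep bytes from the first start_marker through the
    end of the last '<?xpacket end=...?>' trailer, dropping everything else."""
    start = blob.find(start_marker)
    if start < 0:
        raise LookupError("start marker not found")
    trailer = blob.rfind(b"<?xpacket end=")
    if trailer < start:  # also catches rfind() returning -1
        raise LookupError("no xpacket trailer after the start marker")
    close = blob.find(b"?>", trailer)
    if close < 0:
        raise LookupError("unterminated xpacket trailer")
    return blob[start:close + 2]
```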

In this case, the real XMP data (not formatted by Photoshop) has no newlines, so
we can pretty up the format using:


> xmllint -format data.xmp > data-formatted.xmp


This alters the formatting for readability, but not the content or record
ordering.
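If xmllint isn't available, Python's standard xml.dom.minidom module can do a similar re-indent. This is a sketch, not a byte-for-byte xmllint replacement: minidom prepends an XML declaration, keeps attribute order only on Python 3.8 and later, and adds stray blank lines if the input already contains whitespace between tags (the raw XMP here has none). pretty_print_xml is a hypothetical helper.

```python
from xml.dom import minidom


def pretty_print_xml(xml_text: str, indent: str = "  ") -> str:
    """Hypothetical helper: re-indent XML without reordering elements.

    Element order, namespace prefixes, and values are kept as written; an
    '<?xml version="1.0" ?>' declaration is added at the top.
    """
    return minidom.parseString(xml_text).toprettyxml(indent=indent)
```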




TRACKING XMP SOURCES

Usually when evaluating files, there's a basic belief that consistent tools
generate consistent formats. However, that's not the case with Adobe. There is
no consistent layout for an XMP record -- it all depends on the library that
generated or appended to it.

What's worse is that Adobe doesn't even know "which library" is used. This is
because their code ships with multiple library versions. For example, my Mac has
CS5 installed. Bridge CS5 contains a shared library (/Applications/Adobe Bridge
CS5/Adobe Bridge
CS5.app/Contents/Frameworks/AdobeXMP.framework/Versions/A/AdobeXMP). Adobe
Photoshop CS5 has two shared libraries for XMP (AdobeXMP and AdobeXMPFiles),
Adobe Captivate 5 has three libraries, etc. Worse still, these libraries don't
even need to be the same. In my case, the "AdobeXMP" library for Bridge
CS5 is different from the "AdobeXMP" library for Photoshop CS5. Depending on
your installation path, software, and patches, everything can be different.

What this means: If you use Adobe Bridge and Adobe Photoshop, then the XMP
data may be generated by any of three potentially different XMP libraries. This
problem is actually a little worse because these are shared libraries. There is
a chance that the first one loaded wins -- the order that the applications are
used may alter the XMP content's format.

Different Adobe XMP libraries have different output formats and different bugs.
I've been slowly mapping format artifacts to versions, but that's a story for
some other day. The main thing to keep in mind is that all XMP formatting is
effectively arbitrary. In general, the XML format permits key/value pairs to be
listed per tag, or together as attributes in a tag. For example: <tag
field=value /> can also be written as <tag><field>value</field></tag>. In the
world of XMP, these are functionally equivalent.
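To make the equivalence concrete, this Python sketch uses the standard ElementTree module to collect simple field/value pairs from an rdf:Description whether they are written as attributes or as child elements, so both spellings compare equal. It deliberately skips nested structures such as rdf:Bag or rdf:Seq; description_fields is a hypothetical name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def description_fields(xml_text: str) -> dict[str, str]:
    """Hypothetical helper: map '{namespace}field' -> value for simple fields,
    treating <tag field="v"/> and <tag><field>v</field></tag> alike."""
    desc = ET.fromstring(xml_text)
    # Attribute form; rdf:about is structural, not a field.
    fields = {k: v for k, v in desc.attrib.items() if k != f"{{{RDF}}}about"}
    for child in desc:
        if len(child) == 0:  # leaf element form: <ns:field>value</ns:field>
            fields[child.tag] = (child.text or "").strip()
    return fields
```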

Since we don't know the software and patch levels on the photographer's
computer, we don't really care about the overall layout. The only important
aspects are the fields, values, and XML nesting. Don't get caught up in the fact
that the first block in the raw data uses lots of field=value attributes, and
the other blocks use lots of <field>value</field> entries.




BEGINNING THE EVALUATION

All XMP records begin the same way: <?xpacket begin=".."
id="W5M0MpCehiHzreSzNTczkc9d"?>. The ".." is some binary data used for
determining the byte order (endianness) of multi-byte text, but there's no
multi-byte text in this file. The "W5M0MpCehiHzreSzNTczkc9d" is a unique key used
by every XMP file as a "magic signature"; it identifies this record as an XMP
record. After this header comes the data.
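A hedged sketch of checking that header in Python: it returns the raw bytes of the begin attribute and whether the id matches the fixed W5M0MpCehiHzreSzNTczkc9d signature. It assumes an ASCII-compatible packet (for example UTF-8), where the begin value is typically the UTF-8 byte-order mark EF BB BF or empty; parse_xpacket_header is a hypothetical helper.

```python
import re

XMP_PACKET_ID = b"W5M0MpCehiHzreSzNTczkc9d"
# begin= holds the (possibly empty) byte-order mark; id= holds the fixed signature.
HEADER_RE = re.compile(rb"""<\?xpacket\s+begin=(["'])(.*?)\1\s+id=(["'])(.*?)\3""")


def parse_xpacket_header(packet: bytes) -> tuple[bytes, bool]:
    """Hypothetical helper: return (raw begin bytes, signature matches?)."""
    m = HEADER_RE.search(packet)
    if m is None:
        raise ValueError("no <?xpacket begin=... id=...?> header found")
    return m.group(2), m.group(4) == XMP_PACKET_ID
```
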

In the raw XMP, all data is stored in an XMP "rdf:Description" block. It
describes where the file came from and the sources that led to it. Some of this
XMP data is inherited from other metadata fields in the file, including the
original EXIF data. This record contains things like the type of lens
(aux:Lens="EF16-35mm f/2.8L II USM") and information about the flash
('aux:FlashCompensation="0/1"' means that no flash exposure compensation was applied). The full intro
looks like:


> <rdf:Description
> xmlns:photomechanic="http://ns.camerabits.com/photomechanic/1.0/"
> xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
> xmlns:dc="http://purl.org/dc/elements/1.1/"
> xmlns:xmp="http://ns.adobe.com/xap/1.0/"
> xmlns:aux="http://ns.adobe.com/exif/1.0/aux/"
> xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
> xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"
> xmlns:stRef="http://ns.adobe.com/xap/1.0/sType/ResourceRef#"
> xmlns:xmpRights="http://ns.adobe.com/xap/1.0/rights/" rdf:about=""
> photomechanic:HasCrop="False" photomechanic:Prefs="1:0:0:005403"
> photomechanic:PMVersion="PM4"
> photoshop:LegacyIPTCDigest="435D74FBC12C4007083CF8F390DA6484"
> photoshop:Country="Palestinian Territories"
> photoshop:DateCreated="2012-11-20T09:39:38+01:00" dc:format="image/jpeg"
> xmp:ModifyDate="2013-02-15T11:55:30+01:00"
> xmp:CreateDate="2012-11-20T09:39:38"
> xmp:MetadataDate="2013-02-15T11:55:30+01:00" xmp:CreatorTool="Adobe Photoshop
> CS6 (Macintosh)" xmp:Rating="0" aux:SerialNumber="013021001346"
> aux:LensInfo="16/1 35/1 0/0 0/0" aux:Lens="EF16-35mm f/2.8L II USM"
> aux:LensID="246" aux:LensSerialNumber="0000400fe1" aux:ImageNumber="0"
> aux:ApproximateFocusDistance="163/100" aux:FlashCompensation="0/1"
> aux:Firmware="1.1.3"
> xmpMM:DocumentID="xmp.did:81D9BBB16F1211E2B21DD3F6B94651E8"
> xmpMM:OriginalDocumentID="11CD104525F505861ED0EC6DAC391558"
> xmpMM:InstanceID="xmp.iid:81D9BBB06F1211E2B21DD3F6B94651E8"
> xmpRights:Marked="False">


(For XML users, you'll notice that this is just the opening tag. The close is at
the end of the file.)

The header identifies the creator tool as "Adobe Photoshop CS6 (Macintosh)".
Since we have the basic assumption that the file has not been tampered with in an
effort to throw off a forensic investigation, we can assume that all artifacts
in the XMP record are specific to XMP libraries found on this platform.

The most important records in this header are the IDs:


> xmpMM:DocumentID="xmp.did:81D9BBB16F1211E2B21DD3F6B94651E8"
> xmpMM:OriginalDocumentID="11CD104525F505861ED0EC6DAC391558"
> xmpMM:InstanceID="xmp.iid:81D9BBB06F1211E2B21DD3F6B94651E8"


Adobe's XMP format maintains two types of IDs: Document ID (DID) and Instance ID
(IID). The DID is created once per file. Each time you use "Save As", a new DID
is assigned. But simply hitting save (after the first save) does not alter the
DID.

In contrast, the IID is updated each time you hit "Save" -- indicating another
instance of the file. If you save a picture, open it, and continue editing, then
the IID will be updated but the DID will not. The DID only changes when you hit
"Save As" (or Save For Web or Export... anything that creates a new file). Every
file should have a DID that identifies the direct base and an IID that reflects
the saved instance. The XMP typically records the IID history as a series of
timestamped events. (Notice that I say "typically" -- since XMP libraries
differ, some don't timestamp.)
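Pulling those three IDs out of the header can be sketched with Python's ElementTree. This assumes the IDs are stored as xmpMM attributes on rdf:Description, as in this file; other XMP libraries may write them as child elements, which this sketch does not handle. header_ids is a hypothetical helper, and it strips the 'xmp.did:' and 'xmp.iid:' prefixes so the hex values can be compared directly.

```python
import xml.etree.ElementTree as ET

XMP_MM = "http://ns.adobe.com/xap/1.0/mm/"


def header_ids(description_xml: str) -> dict[str, str]:
    """Hypothetical helper: read the xmpMM ID attributes from an rdf:Description."""
    desc = ET.fromstring(description_xml)
    ids = {}
    for name in ("DocumentID", "OriginalDocumentID", "InstanceID"):
        value = desc.get(f"{{{XMP_MM}}}{name}")
        if value is not None:
            # Strip an "xmp.did:" or "xmp.iid:" prefix; bare hex passes through.
            ids[name] = value.split(":", 1)[-1]
    return ids
```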

The other thing to notice is the right half of the long random hexadecimal
value. CS6 for the Mac (Intel architecture) first generates this value when the
program is started. Beyond that, CS6 just increments one byte. Usually this is
the first byte, but sometimes it is the 4th byte. (It depends on which XMP
library is called.)

With Photoshop CS6 for the Mac, opening a new file will partially randomize the
left-half, but not the entire sequence. Typically the initial IID and DID values
differ by an incremental value, but sometimes they are the same (it just depends
on which XMP library created them). In this case, the DID and IID are
incremental at the 4th byte: DID=81D9BBB16F1211E2B21DD3F6B94651E8 and
IID=81D9BBB06F1211E2B21DD3F6B94651E8. Since they are incremental, we know that
they were created at the same time, during the first save of this file. In
effect, we know the user did a "Save As" and not just a "Save". (Well, a "Save"
for the first time may bring up the "Save As" dialogue window. But subsequent
saves will just overwrite the file, retaining the DID and updating the IID.)
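Spotting a one-byte difference by eye is error-prone, so here is a small Python sketch that decodes two hex IDs and reports every byte position where they differ. Indexes are zero-based, so the 4th byte is index 3. byte_diffs is a hypothetical helper.

```python
def byte_diffs(id_a: str, id_b: str) -> list[tuple[int, int, int]]:
    """Hypothetical helper: (byte_index, byte_a, byte_b) for each differing byte."""
    a, b = bytes.fromhex(id_a), bytes.fromhex(id_b)
    if len(a) != len(b):
        raise ValueError("IDs have different lengths")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

For the header's DID and IID, byte_diffs("81D9BBB16F1211E2B21DD3F6B94651E8", "81D9BBB06F1211E2B21DD3F6B94651E8") returns [(3, 177, 176)]: a single difference at the 4th byte (0xB1 versus 0xB0).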

The other field is the "Original Document ID" (ODID). When you open a file that
has an XMP record, it inherits the DID. Doing a "Save As" generates a new DID.
The ODID holds the value of the previous DID. This is very explicit: it tells us
that the user had edited the file, saved it, opened it, and then did a "Save
As". (We'll see this same sequence in the History block in a moment.)




ANCESTORS

The next section in the XMP record is the Document Ancestor block:


> <photoshop:DocumentAncestors>
> <rdf:Bag>
> <rdf:li>xmp.did:068011740720681180A9CEE8487CF300</rdf:li>
> <rdf:li>xmp.did:0A8011740720681180A9CEE8487CF300</rdf:li>
> <rdf:li>xmp.did:8F19CA801520681180A9CEE8487CF300</rdf:li>
> <rdf:li>xmp.did:9119CA801520681180A9CEE8487CF300</rdf:li>
> </rdf:Bag>
> </photoshop:DocumentAncestors>


According to the XMP specifications (search Google for "XMP Specifications Part"
-- there are three parts), the Document Ancestors denote "copy-and-paste or
place" operations. These do not identify what was incorporated into the file --
it could be an entire picture or a portion of a picture. We only know that these
four separate files were incorporated into an existing file. These records
identify other documents (DID) that were added to this document. This is
explicitly the definition of a composition: a picture made from other pictures.

I think it is safe to assume that the four documents are different -- either in
coloring or content. This is a pretty safe assumption since it is unlikely that
the artist would save four copies of the same document and then incorporate all
four identical files.

Since the right halves of these hex sequences are identical, it implies that they
were all from the same instantiation of the Adobe program. We don't know what
program created these, but we do have a strong reason to believe that the
sequence of events was as follows:

 1. An Adobe program was started and opened an image. This initialized the
    common DID bytes.
    
    
 2. The Adobe program did a "Save As" operation. This generated the "0680117..."
    DID file. Since CS6 increments -- and we have no reason to suspect anything
    other than CS6 -- we even know that the IID for that file is likely
    "0780117...". (Could be "05...", depending on the library, but in this case,
    it is likely "07".)
    
    
 3. The next DID begins with "0A80117...". So what happened to "08" and "09"?
    The user may have hit "Save" twice, or may have done a "Save As" (consuming
    two IDs) to a file that was not used as an ancestor to this file.
    (Foreshadowing: We'll actually see "09" in the next block; it's from a "Save
    As".)
    
    
 4. The user did not close the program. He just did another "Save As",
    generating DID "0A80117..." and we can assume that it had IID "0B80117...".
    Keep in mind, we have no idea how much time has passed or what else the user
    did to the picture. We only know that there was another "Save As".
    
    
 5. Then the left-hand sequence changed. As already mentioned, this means that
    the user opened a document. We don't know if the document represents the
    same picture. We don't even know if it was related. Seriously, just opening
    a document will randomize the left side. So we don't know what happened
    between 0A80117 and 8F19CA. We just know that the user did a "Save As",
    generating DID "8F19CA8..." and we know the IID would likely be
    "9019CA8...".
    
    
 6. The user did one more "Save As", generating the next sequential IDs: DID
    "9119CA8..." with IID likely "9219CA8...".

(Technical note: The Document Ancestors is supposed to be an unsorted array.
However, I've only seen it as sorted in the order of events. Assuming that it is
unsorted, we still know that "0680117..." came before "0A80117..." and
"8F19CA8..." came before "9119CA8..." due to incremental sequencing.)

The one thing the XMP record does not tell us is what was in these files. Each
could be the entire original image. Each could be colorized differently. Each
could be a selection of parts from the file. In fact, the user could have opened
a completely different file and pasted from it.

The only thing we do know is that (1) there are four independent documents (as
defined by Adobe), and (2) they were combined into a picture to form the final
image.

We also know one more thing: We know the order of events. The user started an
Adobe product and created these four ancestors. He then closed the Adobe product
(or ran a completely different Adobe product) and started creating a file. He
then closed that application, generating the ODID (which, at the time, was
assigned as the DID). He then opened the file and did a "Save As", generating
the final DID and demoting the old DID to the Original Document ID. We know
this, because the right-side of the ancestor IDs are different from the header
IDs -- and that only seems to happen when the program is restarted. In
contrast, if the user had closed all files -- but not closed the program -- and
opened a different file, then the right-side would remain the same and the
left-side (at least the first 8 bytes) would be different.
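The ancestor sequencing described in this section can be sketched in Python. The grouping rule is an assumption drawn from this analysis, not from Adobe documentation: IDs that are identical except for their first byte are treated as consecutive saves of the same open document in the same program launch, and that first byte is treated as the incrementing counter (the 4th-byte variant mentioned earlier is not handled). group_save_runs and counter_gaps are hypothetical names.

```python
from collections import defaultdict


def group_save_runs(dids: list[str]) -> list[list[str]]:
    """Hypothetical helper: group 32-hex-digit IDs that differ only in their
    first byte, and sort each group by that byte (the assumed counter)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for did in dids:
        did = did.upper()
        groups[did[2:]].append(did)  # bytes 2..16 identify the run
    return [sorted(g, key=lambda d: int(d[:2], 16)) for g in groups.values()]


def counter_gaps(run: list[str]) -> list[int]:
    """Hypothetical helper: differences between consecutive first-byte counters."""
    counters = [int(d[:2], 16) for d in run]
    return [b - a for a, b in zip(counters, counters[1:])]
```

Run on the four ancestor DIDs, this yields two runs, "0680117..."/"0A80117..." and "8F19CA8..."/"9119CA8...", with counter gaps of 4 and 2. That matches the reading above: 07, 08, and 09 consumed between the first pair, and the likely IID 90 between the second.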




HISTORY RECORDS

The next section is the "History" record. This identifies what happened with
this specific document. It's essentially a timestamped, ordered array:


> <rdf:li stEvt:action="saved"
> stEvt:instanceID="xmp.iid:A29730BC0A2068119EE9AF3C2BE2913F"
> stEvt:when="2012-11-20T17:19:09+01:00" stEvt:softwareAgent="Adobe Photoshop
> Camera Raw 7.1 (Macintosh)" stEvt:changed="/metadata"/>
> <rdf:li stEvt:action="saved"
> stEvt:instanceID="xmp.iid:098011740720681180A9CEE8487CF300"
> stEvt:when="2013-01-04T14:44+01:00" stEvt:softwareAgent="Adobe Photoshop
> Camera Raw 7.1 (Macintosh)" stEvt:changed="/metadata"/>
> <rdf:li stEvt:action="derived" stEvt:parameters="converted from
> image/x-canon-cr2 to image/tiff"/>
> <rdf:li stEvt:action="saved"
> stEvt:instanceID="xmp.iid:8F19CA801520681180A9CEE8487CF300"
> stEvt:when="2013-01-04T15:43:45+01:00" stEvt:softwareAgent="Adobe Photoshop
> Camera Raw 7.1 (Macintosh)" stEvt:changed="/"/>
> <rdf:li stEvt:action="saved"
> stEvt:instanceID="xmp.iid:525849A00F206811822A94D83E08B11E"
> stEvt:when="2013-01-04T16:08:44+01:00" stEvt:softwareAgent="Adobe Photoshop
> CS6 (Macintosh)" stEvt:changed="/"/>
> <rdf:li stEvt:action="converted" stEvt:parameters="from image/tiff to
> image/jpeg"/>
> <rdf:li stEvt:action="derived" stEvt:parameters="converted from image/tiff to
> image/jpeg"/>
> <rdf:li stEvt:action="saved"
> stEvt:instanceID="xmp.iid:A0AEE3D11C206811822A94D83E08B11E"
> stEvt:when="2013-01-04T16:08:44+01:00" stEvt:softwareAgent="Adobe Photoshop
> CS6 (Macintosh)" stEvt:changed="/"/>
> <rdf:li stEvt:action="saved"
> stEvt:instanceID="xmp.iid:048011740720681180839DD19BA24E58"
> stEvt:when="2013-02-15T11:23:04+01:00" stEvt:softwareAgent="Adobe Photoshop
> CS6 (Macintosh)" stEvt:changed="/"/>

Since the list is ordered, entries that are missing timestamps had to happen
between the dated elements on either side of them. (I don't think it's
documented, but I believe they are associated with the timestamp that comes after
them.)
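The History array can also be read programmatically. This Python sketch uses ElementTree to list the stEvt attributes of each rdf:li in order, then applies the assumption stated above: an event without stEvt:when borrows the timestamp of the next dated event and is flagged as inferred. It takes the rdf:Seq element, reads attributes only (some XMP writers store these fields as child elements), and history_events is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ST_EVT = "http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"


def history_events(seq_xml: str) -> list[dict[str, str]]:
    """Hypothetical helper: stEvt fields of each rdf:li, in document order."""
    seq = ET.fromstring(seq_xml)
    events = []
    for li in seq.findall(f"{{{RDF}}}li"):
        events.append({
            key.split("}", 1)[1]: value  # '{ns}action' -> 'action'
            for key, value in li.attrib.items()
            if key.startswith(f"{{{ST_EVT}}}")
        })
    # Walk backwards so each undated event sees the next dated one.
    next_when = None
    for event in reversed(events):
        if "when" in event:
            next_when = event["when"]
        elif next_when is not None:
            event["when"] = next_when
            event["when_inferred"] = "true"
    return events
```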

This is the data that I previously, briefly summarized.

 1. The first IID ends with "...2BE2913F". This sequence doesn't match anything
    that we have previously seen. It did not come from any of the ancestor
    documents. It did not come from the header's DID or ODID. So we explicitly
    know that another document exists (or existed) that had a DID ending with
    "...2BE2913F". So here's what happened: The user started a file. It was
    assigned a DID. He closed the program, opened it again and did a "Save As",
    demoting the DID to an ODID. Then he did it again -- "Save As" created a new
    DID, the old DID is demoted to an ODID, and the old ODID is lost. We have no
    XMP record identifying the original DID from the first time the file was
    created, but we have this IID that represents that first iteration.
    
    The next thing this record tells us is that the IID was generated by "Adobe
    Photoshop Camera Raw 7.1". Camera Raw converts a deep-color image into an
    8-bit deep image for Photoshop. This means that the first operation was a
    RAW image import into Photoshop. This means it is the whole picture, but XMP
    does not identify "which" picture.
    
    There are different ways to incorporate the converted camera raw picture
    into Photoshop. Most methods identify the "changed" record as "/", meaning
    the picture changed. However, sometimes it only changes "/metadata". As
    Adobe describes it, "When you use Camera Raw, the adjustments (or
    'instructions') you make are stored as metadata." Don't assume that he only
    changed metadata; he likely changed the color since it came from Camera Raw.
    
    
 2. The second IID is "098011740720681180A9CEE8487CF300". We've seen this
    before. This is the same "09" that I previously identified as a missing
    ancestor. Now we know: it isn't listed as an ancestor to this file because
    it is this file.
    
    In my previous, brief write-up, I commented that this is "typically seen
    when a picture is spliced from two sources." We know that there are multiple
    sources because of the Document Ancestor section. However, without pointing
    out the ancestors in my brief write-up, I can see how this would appear
    ambiguous.
    
    
 3. The third IID is "8F19CA801520681180A9CEE8487CF300". This is the exact same
    as the DID found in a document ancestor. However, now it is assigned to an
    IID instead of a DID.
    
    Depending on how you save a Camera Raw converted image, Adobe may assign the
    DID and IID the same value. For example, if you open a RAW image in Camera
    RAW and click on "Open Image", then they are assigned incrementally
    different IID and DID values. However, if you modify a RAW image's colors and
    save the changes (by clicking on "Done"), then Adobe creates a separate ".xmp"
    file, which describes the changes without disrupting the original RAW file.
    This ".xmp" file does not contain a DID or IID, so one will be assigned
    when it is used. When the ".xmp" file is used, the same value is assigned to
    both the DID and IID. However, this may not be the only method for
    generating the same DID and IID values.
    
    Although the DID and IID values are the same, implying a basic color
    adjustment to a RAW image, it does not identify the source RAW file. We
    cannot identify which file was color adjusted, only that some file was
    likely color adjusted.
    
    Because this IID appears as an Ancestor, it means that it was included in
    this file. However, XMP doesn't identify when the ancestor was created or
    incorporated.
    
    Fortunately, this history record has a timestamp. Now we know: this file was
    saved on 2013-01-04 at 15:43:45 +01:00. Sometime after that timestamp, this
    version was incorporated back into the document through a paste or place
    operation.
    We do not know if it was incorporated in whole or in part. In addition,
    since the change event is assigned to "/" (stEvt:changed="/"), we know that
    the picture changed.
    
    
 4. The next IID is "525849A00F206811822A94D83E08B11E". We haven't seen the
    right-hand part before, so the user closed the program, started it, and hit
    "Save". However, we don't know what was done to the image beyond opening and
    hitting "Save". (Foreshadowing: remember that it records when he closed the
    program and then restarted it. That comes up again at the end of the XMP
    record.)
    
    
 5. Then comes a conversion/derived to JPEG, followed by IID
    "A0AEE3D11C206811822A94D83E08B11E". Since the right-hand side is the same as
    the previous operation, we know that he didn't close the program. Since the
    left-side is very different, we know that he opened one or more other files.
    The History array is ordered, but the Ancestor list is not. We don't know
    when some of those paste operations happened, but since he opened other
    files, this seems like a great candidate for incorporating them.
    
    We know a few more things. Since this is the first (and only) series of
    conversions to JPEG, we know that this is the first time it was saved as a
    JPEG. These conversions are the first time we see an action by "Adobe
    Photoshop CS6", so this is the first actual save. And this is the last
    timestamp that pre-dates the contest submission. This likely represents the
    JPEG that he submitted.
    
    NOTE: I say "likely". We have no way of knowing if he had a completely
    different series of files that were actually submitted. But I'll get to why
    that is unlikely in a moment...
    
    
 6. The final history entry is "048011740720681180839DD19BA24E58" and it
    happens after the winner was announced. Since the right-hand side is
    different from anything previously seen, we know that he closed the program
    and then started it up again. (It makes sense that he would not need to make
    edits until after the contest ended.) This was likely when he did the final
    image for public release. (And since I received it as a representation of
    the final winning image, this makes sense.)
    
    I had mentioned that the previous step likely represented the submitted
    content. This is because I don't think World Press Photo is stupid. If the
    winner turned in a significantly different picture for distribution after
    the contest, the judges would have likely noticed.
    
    We still have a few document ancestors that we cannot associate with any
    specific save operation. However, since the final image must look like the
    winning submission, we can assume that the ancestors were incorporated into
    the image no later than the conversion to JPEG.
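
The right-hand-side comparison used in steps 4 through 6 is mechanical enough to script. The following is a minimal Python (3.9+) sketch with hypothetical helper names. It assumes the "left" and "right" parts are simply the first and last 16 hex digits of the 32-digit IID, and it only flags an unseen right half as a likely program restart; it does not try to infer which files were opened.

```python
def split_iid(iid: str) -> tuple[str, str]:
    """Split a 32-hex-digit instance ID into (left, right) halves.

    Assumption: the halves are the first and last 16 hex digits.
    """
    iid = iid.removeprefix("xmp.iid:")
    if len(iid) != 32:
        raise ValueError(f"expected 32 hex digits, got {iid!r}")
    return iid[:16], iid[16:]


def classify_sessions(iids):
    """Label each IID by whether its right half has been seen earlier in the list."""
    seen = set()
    labels = []
    for iid in iids:
        _, right = split_iid(iid)
        labels.append("same session" if right in seen else "new session")
        seen.add(right)
    return labels


# The last three History instance IDs discussed above (steps 4, 5, and 6).
history = [
    "525849A00F206811822A94D83E08B11E",
    "A0AEE3D11C206811822A94D83E08B11E",
    "048011740720681180839DD19BA24E58",
]

for iid, label in zip(history, classify_sessions(history)):
    print(iid, label)
```

With only these three entries, step 4 is trivially "new"; in the full History list, its right half would be compared against every earlier entry as well.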

To reiterate, we have at least seven files: the base image, four ancestors that
were added to it (including one that was a variant of a previous stage), the
first picture saved as a JPEG, and the final JPEG. Moreover, we can directly
account for three combination steps (the base, work before the known ancestor,
and work after the known ancestor). We can also account for at least two JPEG
files: the first conversion to JPEG that predates the contest, and the file we
are analyzing, which comes right after the contest.




DERIVED FROM

The final XMP section identifies the "Derived From" records. According to
Adobe's XMP specification, this is "a reference to the original document from
which this one is derived."


> <xmpMM:DerivedFrom stRef:instanceID="xmp.iid:048011740720681180839DD19BA24E58"
> stRef:documentID="xmp.did:8F19CA801520681180A9CEE8487CF300"
> stRef:originalDocumentID="11CD104525F505861ED0EC6DAC391558"/>

This leads to a nice closed circle regarding the IDs:

 * The derived-from reference IID has been seen before -- it is the last history
   entry, showing the final save.
   
 * The reference DID is the same as the ancestor that was created as a variant
   of this file.
   
 * The reference ODID matches the ODID seen in the header.

This "derived from" record tells us that the JPEG we just analyzed isn't some
arbitrary JPEG. It is based directly on the last JPEG that was listed in the
History section.
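
The three comparisons above can be checked mechanically. This is a minimal sketch using Python's standard xml.etree.ElementTree parser, with hypothetical function names. It wraps the quoted DerivedFrom element in a root element that declares the standard xmpMM and stRef namespace URIs (in a real file they are declared elsewhere in the packet), strips "xmp.iid:"/"xmp.did:" prefixes before comparing, and takes the History, ancestor, and header values as inputs. In the usage lines, the ancestor DID and header ODID are filled in from the statements above that they match; the other ancestors are omitted.

```python
import xml.etree.ElementTree as ET

# Standard XMP namespace URIs for xmpMM and the ResourceRef structure (stRef).
XMPMM = "http://ns.adobe.com/xap/1.0/mm/"
STREF = "http://ns.adobe.com/xap/1.0/sType/ResourceRef#"

# The DerivedFrom element quoted above, wrapped in a root that declares the namespaces.
SNIPPET = f"""<root xmlns:xmpMM="{XMPMM}" xmlns:stRef="{STREF}">
<xmpMM:DerivedFrom stRef:instanceID="xmp.iid:048011740720681180839DD19BA24E58"
  stRef:documentID="xmp.did:8F19CA801520681180A9CEE8487CF300"
  stRef:originalDocumentID="11CD104525F505861ED0EC6DAC391558"/>
</root>"""


def bare(id_value: str) -> str:
    """Drop an 'xmp.iid:' or 'xmp.did:' style prefix and normalize to upper case."""
    return id_value.rsplit(":", 1)[-1].upper()


def read_derived_from(xml_text: str) -> dict:
    """Return the three stRef IDs from the first xmpMM:DerivedFrom element."""
    elem = ET.fromstring(xml_text).find(f"{{{XMPMM}}}DerivedFrom")
    if elem is None:
        raise ValueError("no xmpMM:DerivedFrom element found")
    return {
        name: bare(elem.get(f"{{{STREF}}}{name}", ""))
        for name in ("instanceID", "documentID", "originalDocumentID")
    }


def cross_check(ref: dict, last_history_iid: str, ancestor_dids, header_odid: str) -> dict:
    """Compare the DerivedFrom IDs against the History, DocumentAncestors, and header."""
    ancestors = {bare(d) for d in ancestor_dids}
    return {
        "iid_is_last_history_entry": ref["instanceID"] == bare(last_history_iid),
        "did_is_a_document_ancestor": ref["documentID"] in ancestors,
        "odid_matches_header": ref["originalDocumentID"] == bare(header_odid),
    }


ref = read_derived_from(SNIPPET)
result = cross_check(
    ref,
    last_history_iid="048011740720681180839DD19BA24E58",  # step 6 in the History list
    ancestor_dids=["8F19CA801520681180A9CEE8487CF300"],   # only the ancestor discussed here
    header_odid="11CD104525F505861ED0EC6DAC391558",       # header ODID, per the text
)
print(result)
```

Passing the inputs separately keeps the check honest: run against the full DocumentAncestors list, the same function would also show which ancestors the DerivedFrom record does not explain.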

There is one little sticking point: why does the reference DID point to the
saved DID seen in the history and in the document ancestor? As far as I can
tell, there is only one way this can happen (there might be other ways; XMP does
not record a complete history). In the fifth history step (history array item
8), we noted that he opened up a file -- so he could have opened a different
previously-saved file. He then managed to include the same file back into
itself, creating the one ancestor record. Any other way that I can think of
would not retain the same history sequence.

I fully expect critics to point out what I just confirmed: he copied the file
back into itself. This is viewed as permitted HDR. However, that only accounts
for one of four document ancestors. As I originally wrote in my brief report, he
incorporated at least three other files.




ARMCHAIR QUARTERBACKS

A number of comments have voiced the opinion that there is nothing wrong with
combining full versions of the entire image. This would be a global alteration
and a manual step for performing high-dynamic range (HDR) imaging. However,
there is nothing in the XMP data that identifies whole-picture incorporation.
These could easily be partial picture overlays. The overlays could explain the
difference in the compression ratios. It is also worth noting that a paste
operation that contains different content would cause a compression difference,
and even pasting the same content but having alignment off by a pixel (assuming
a very large picture) would yield this result.

A few people also commented that this could easily be performed in a darkroom.
If we assume that all five images (base + four ancestors) were included in their
entirety, then this identifies five global, independent operations -- not one
visit to the darkroom. The XMP identifies a complex series of operations in
Photoshop, which would be even more complex if it were performed in a darkroom.

A few people claimed that my conservative view would have banned people like
Ansel Adams. However, Ansel Adams is known for his art photography. His works
are on display in museums of fine art. In contrast, World Press Photo claims to
be a contest for photojournalists. As journalists, they are not supposed to
alter facts. If WPP is an art contest, then these modifications are fine. As a
photojournalism contest, I have serious questions. However, WPP has announced
and validated their winner. At this point, I would question their credibility if
they recanted their decision.




REGARDING FOURANDSIX

In his interview with Wired and in his expert report summary hosted at World
Press Photo, Dr. Hany Farid claimed that I did not understand how XMP records
work. However, there is no indication that he noticed that the XMP record
explicitly identifies multiple source files.

Dr. Farid also mentioned a private communication (an email exchange). However,
he was not included in the list of email recipients. The exchange was between
his business partner, Kevin Connor, and myself. This exchange began the day
before WPP announced the use of independent reviewers.

As Dr. Farid said in the Wired interview, they privately tried to convince me of
their position. Kevin Connor sent me some sample images, but the pictures failed
to prove his point. In particular, he wrote:


> No, I'm afraid you're mistaken about this metadata. You will *not* see this
> happen if you open a new/different raw file. The portion of the metadata
> you're looking at doesn't communicate any information whatsoever related to
> potential compositing.


As shown in this deep analysis, XMP data can record information about
compositing; Kevin Connor is wrong in his conclusion. He also sent me two sample
images that he claimed proved his point: NoEdits.jpg and SimpleComposite.jpg. He
noted that there are ways to create a composite image that are not denoted in
the XMP data. Each of his files only contains one "Adobe Photoshop Camera Raw
7.1 (Macintosh)" history record and no Document Ancestor records. The problem is
that their tests did not demonstrate the approach that the photographer used to
create the final image.

(I typically keep private emails private. However, Dr. Hany Farid brought these
up publicly in his interview with Wired.)

Then again, the time between when World Press Photo (WPP) announced that they
were conducting an investigation and when they published their results was
measured in hours (5 hours). The time from when FourAndSix's Kevin Connor first
contacted me and when WPP posted their results was about 24 hours, but that was
before they were selected as reviewers. Kevin Connor informed me that they were
selected as a reviewer about an hour after WPP announced the independent review.
As Kevin Connor wrote:


> Though I don't agree with your analysis of the World Press Photo winner, I was
> avoiding making any public statements about that, because I thought it was
> best to just share my concerns privately. However, we were contacted this
> morning by the World Press Photo organization to provide our own analysis of
> the photo. Of course, we have to share with them our honest opinion.


Considering that a forensic write-up takes about two to three times longer than
the actual evaluation, I can only assume that FourAndSix spent no more than an
hour or two evaluating the metadata, the RAW image, and the contest submission.
I suspect that their expert report was based on a cursory glance at the
evidence, and their own incomplete understanding of the XMP format. (In all
honesty, most people haven't taken the time to look that closely at library
artifacts.)

In his interview with Wired, Dr. Farid is also quoted as saying, "[Krawetz]
claimed the date in the metadata showed it was morning. That's incorrect because
he doesn't understand basic geometry." The metadata does not contain any
geometry information. As seen in the header portion of the XMP data, the picture
was reportedly taken on 2012-11-20 at 09:39:38+01:00. The last time I checked,
9:39am in GMT+01:00 was "morning" in Gaza (GMT+02:00), where it was 10:39am.
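
The offset arithmetic is easy to verify with Python's standard datetime module. This sketch takes the recorded +01:00 offset at face value and models Gaza as a fixed GMT+02:00 offset rather than looking it up in a time-zone database.

```python
from datetime import datetime, timedelta, timezone

# Capture time as recorded in the XMP header (ISO 8601 with a +01:00 offset).
taken = datetime.fromisoformat("2012-11-20T09:39:38+01:00")

# Gaza as a fixed GMT+02:00 offset (assumption: no DST lookup).
gaza = timezone(timedelta(hours=2))
local = taken.astimezone(gaza)

print(local.isoformat())  # 2012-11-20T10:39:38+02:00
print(local.hour < 12)    # True: still morning
```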

Dr. Hany Farid has chosen to make their misunderstanding of the XMP analysis
public. FourAndSix did not identify the separate files that were combined to
form the final composition, and they generated sample images that failed to
demonstrate the methods used by the photographer. Usually Hany and Kevin do good
work. I can only assume that a rushed schedule led to their oversight in
identifying multiple source files and the composition method used by the
photographer.
Read more about Forensics, Image Analysis, Mass Media | Comments (20)


Comments

#1 Dr. Neal Krawetz on 2013-05-23 22:23


For people who are keeping track, I changed a few words for clarity. For
example, it previously said "all of the IDs are different", but I forgot to say
what I was comparing. I corrected it to say that the ancestors are different
from the header IDs.

I also changed one instance of "analysis" to "summary" to better reflect my
initial blog entry on this topic.

I corrected one ID: the bold "90" was corrected to "91".

I added the morning/timestamp information that Hany Farid misunderstood.

I decided to not put in strike notations since this is a technical write-up and
difficult to read as-is.
#1.1 Carl Seibert on 2021-12-07 12:02


Ah, the internet. Where violation lives on forever. I had forgotten this
incident. You attacked a man without evidence of wrongdoing and damaged his
ability to make a living. Just for the damned sport of it.

I guess by this time you have successfully avoided being cleaned out down to
your BVDs in a defamation action. When I spoke to your victim some years ago, he
stated that he wanted to "put the incident behind him", terms we have become
familiar with hearing from indecent assault victims. A pity. I would have loved
to have seen some justice.

You should show some decency and take this page down. The pedantic technical
discussion of XMP metadata is fascinating (enough so that Google just returned
it as I researched a technical issue) but it has nothing to do with the matter
of whether the image in question was an honest depiction of events or the
integrity of any of the real-world parties actually involved.

This post's continued existence is just another example of the lasting hurt of
cyberbullying.
#1.1.1 Dr. Neal Krawetz on 2021-12-07 14:33


Hello Carl Seibert,

You wrote, "You attacked a man without evidence of wrongdoing and damaged his
ability to make a living."

Who are you alleging has been attacked?

The photographer? I didn't tell him to submit an altered photo to a contest that
claimed to only accept unaltered images.

World Press Photo? They made a conscious decision to award their highest honor
to an altered photo. When they learned of this mistake, they chose to not change
the outcome. Instead, they introduced new steps to make sure this type of
mistake doesn't happen again.

Hany Farid? I didn't tell him to give an interview with Wired, where he did a
personal attack and misrepresented the analysis.

As far as I can tell, they are all still employed. Nobody has lost their
"ability to make a living."

The issue here is that the photographer altered a picture and got caught. The
contest permitted the alteration, amid a loud chorus that pointed out the edits
gave the winner an unfair advantage. There were people already shouting about
the edits before I got involved. The only difference is that I provided proof of
the edits. Moreover, WPP brought in a direct contractor, direct service
provider, and the contest chair's good friend -- while claiming that the
reviewers were "independent" experts.

You also wrote: "Just for the damned sport of it."

I take photo analysis seriously. This entire controversy creates a great example
for other people to learn how to do analysis. We learn from their examples so
that these mistakes hopefully won't happen again. It's not a "sport" as you
claim; it's an industry.
#2 John H. on 2013-05-24 07:12


Great write-up. I never knew that Photoshop stored that much info in the XMP
data.
#3 Phil Harvey on 2013-05-28 05:41


To extract the original XMP with exiftool, do this:

exiftool -xmp -b FILE > data.xmp
#3.1 Dr. Neal Krawetz on 2013-05-28 06:08


Hi Phil,

EXCELLENT! I was missing a command-line flag. That's much easier!
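
For readers without exiftool, a rough equivalent can be sketched in Python by scanning the file for the <x:xmpmeta> ... </x:xmpmeta> element. This is a heuristic with a hypothetical function name, not what exiftool does internally: it drops the surrounding <?xpacket?> wrapper, does not reassemble JPEG Extended XMP that is split across multiple APP1 segments, and returns only the first packet it finds.

```python
import sys
from pathlib import Path
from typing import Optional

START = b"<x:xmpmeta"
END = b"</x:xmpmeta>"


def extract_xmp(data: bytes) -> Optional[bytes]:
    """Return the first <x:xmpmeta>...</x:xmpmeta> block in data, or None."""
    start = data.find(START)
    if start == -1:
        return None
    end = data.find(END, start)
    if end == -1:
        return None
    return data[start:end + len(END)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python extract_xmp.py photo.jpg data.xmp
    packet = extract_xmp(Path(sys.argv[1]).read_bytes())
    if packet is None:
        sys.exit("no XMP packet found")
    Path(sys.argv[2]).write_bytes(packet)
```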
#4 Stephen Fischer on 2013-05-31 22:16


A very informative write-up, Dr. Krawetz. I appreciate your blog and find it
highly educational. Your efforts to help expose fraud, given the number of
photographers that have slipped across that fuzzy boundary of proper ethics, are
applauded. With the advent of more advanced photo-processing tools, it is
becoming easier to perpetuate the type of manipulations you have pointed out
with the latest World Press Photo award. Keep up the good work and help keep
this field honest.
#5 FLSqueezed on 2014-03-29 08:33


I landed on this blog because I was curious as to how "identifying" those
Document IDs were once I noticed them. I didn't understand what they were
intended to identify, but now I do... so thanks for that.
Am I correct in thinking that a person who is concerned about their anonymity
should always close Photoshop between projects so that images appearing
online don't have a sort of "XMP genetic profile" in common with each other?
#6 LH on 2015-01-17 02:15


Note that the DocumentID 81D9BBB16F1211E2B21DD3F6B94651E8 is actually a version
1 UUID: split as 81D9BBB1-6F12-11E2-B21D-D3F6B94651E8, the variant code is B
(B21D), meaning "UUID standard format", and the version code is 1 (11E2),
meaning that the node ID D3F6B94651E8 is actually the MAC address of the
computer that created the UUID. (11E => this image was created after 2010, see
http://en.wikipedia.org/wiki/Globally_unique_identifier ). I don't know if
identifying the MAC address will help your investigations, but maybe it's
interesting. (InstanceIDs don't seem to follow this standard; they may be
totally random, but I think DocumentIDs, or at least OriginalDocumentIDs, do.)
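
The decoding described above can be reproduced with Python's standard uuid module. This is a minimal sketch with a hypothetical function name; it assumes the 32-hex-digit DocumentID is an ordinary UUID with the dashes removed, which is only meaningful if the software generated it as an RFC 4122 version-1 UUID. The creation time comes from the 60-bit timestamp field, which counts 100-nanosecond intervals since 1582-10-15 UTC.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the UUID version-1 clock (RFC 4122): 1582-10-15 00:00:00 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def decode_document_id(hex_id: str) -> dict:
    """Interpret a 32-hex-digit XMP ID as a UUID (assumption: it is one)."""
    u = uuid.UUID(hex=hex_id)
    info = {"uuid": str(u), "variant": u.variant, "version": u.version}
    if u.variant == uuid.RFC_4122 and u.version == 1:
        # u.time counts 100-nanosecond intervals since the Gregorian epoch.
        info["created"] = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
        info["node"] = f"{u.node:012x}"
    return info


print(decode_document_id("81D9BBB16F1211E2B21DD3F6B94651E8"))
```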
#6.1 Dr. Neal Krawetz on 2015-01-17 03:04


Hi LH,

Shhhhh! Don't make this public! Most people don't know about that and we don't
want to tip off the bad guys on how to better hide their trail!

I honestly thought that nobody would point that out. Or that the embedded MAC
identifies a localhost multicast and not a specific hardware vendor -- which is
indicative of an Adobe product.
#6.1.1 Greg on 2015-12-05 10:10


Hi Neal,
What do you mean by "localhost multicast"?
Do you have any resources about Adobe's UUID-generation algorithms?
Thanks
#6.1.1.1 Dr. Neal Krawetz on 2015-12-05 11:03


Hi Greg,

Shhh! We're still not making this public!

See: https://en.wikipedia.org/wiki/MAC_address

It's the last two bits of the first byte in the 6-byte MAC. Adobe usually sets
the bits as 11 (locally administered, multicast).
#6.1.1.1.1 Cameron on 2016-06-07 16:22


So if, for instance, I wanted to glean the MAC address (if it were possible ;p)
of this DocumentID: F236E0DF225811E69ABAE69C4975DBCA, how would I go about that?
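
Checking the two low-order bits described above takes only a few lines once the node field is extracted. This is a minimal Python sketch using the standard uuid module, with a hypothetical function name; it assumes both IDs are version-1 UUIDs whose node field is the last 12 hex digits. Bit 0 of the first node byte marks a multicast address and bit 1 marks a locally administered address; if either is set, the node is not a vendor-assigned hardware MAC, so it does not identify a hardware vendor.

```python
import uuid


def node_flags(hex_id: str) -> dict:
    """Report the node field of a version-1 UUID and its first-octet flag bits."""
    u = uuid.UUID(hex=hex_id)
    if u.version != 1:
        raise ValueError("only version-1 UUIDs carry a node field")
    first_octet = (u.node >> 40) & 0xFF
    return {
        "node": f"{u.node:012x}",
        "multicast": bool(first_octet & 0x01),             # I/G bit
        "locally_administered": bool(first_octet & 0x02),  # U/L bit
    }


# The DocumentID from comment #6 (first octet 0xd3) and the one asked about here (0xe6).
for doc_id in ("81D9BBB16F1211E2B21DD3F6B94651E8", "F236E0DF225811E69ABAE69C4975DBCA"):
    print(doc_id, node_flags(doc_id))
```

For the first ID both bits are set; for the second only the locally-administered bit is set, so neither node looks like a vendor-assigned hardware address.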
#7 DerAblichter on 2016-07-17 12:32


The XMP DocumentID created by PS does not contain a MAC address, as stated in
one comment, at least not on a PC. It might be true for other documents/apps,
but not for PS - or it might be true when PS was used on a Mac, but I doubt it.

Also, DocumentIDs are different when the same RAW was saved as PSD, TIF, or
JPEG (from the same PS session). If the RAW was opened again and saved as PSD
again, the DocID is again different, unique as it should *be*.
What Hansen probably did (I have a JPG with metadata here, but no RDF) was save
several edited versions of the same RAW and blend them together (simple copy
and paste), which would explain the Ancestor tags. Whether this is allowed, I
don't know.
Anyway, you hardly get an effect like this straight out of a RAW unless someone
is very good at using ACR, but what difference would it make if it was achieved
only by using ACR?
See examples of DocIDs here: http://bit.ly/29GcoPO
Regarding date/time, this seems to be the correct time (UTC). But of course it
depends on what the camera was set to; whether it was Gaza time, CET, or UTC we
don't know. To avoid trouble while travelling, most professionals set it to UTC.
The timezone offset shown in IPTC (+01:00) means nothing, because it's either
the timezone of the processing computer or it was set intentionally, regardless
of whether the DateTime-Created value in the EXIF tag was right.

Saying this I have to confess that I read all of this article. I will do so
later and see if your interpretation of the metadata is right.

@DerAblichter
IT specialist and photographer
P.S. For those who prefer a GUI for ExifTool, search for ExifToolGui.
#7.1 DerAblichter on 2016-07-18 06:25


I wrote "Saying this I have to confess that I read all of this article." It
should be "that I didn't read all...". And by "I have a JPG with metadata here,
but no RDF" I meant that the JPEG I have has no LR/CRS tags (Lightroom or ACR)
in XMP. Those tags show which modifications were made directly in LR/ACR
(saturation, sharpening, etc.).
But I was blind - of course they are there.
#8 MOna on 2017-12-13 14:01


Please tell me, in simple words: can you determine the computer where a PDF
file was created using XMP data? Can you determine the editors and comments on
the basis of XMP data?

In simple words, what information can be obtained from a PDF file? I can see
author information, but can the data associated with the PDF tell more?

Thank you.
M
#9 Doug Carner on 2018-02-15 20:14


Neal, Just came across this writing. Excellent metadata analysis and great
introduction to the XMP structure. Header/footer data can tell an amazing story
once you know how to read the language. Thank you for sharing this.
#10 James Jenbg on 2021-08-21 08:28


What is the purl section for?
#10.1 Dr. Neal Krawetz on 2021-08-21 09:07


Hi James Jenbg,

See: https://archive.org/services/purl/help
#11 Adele Myers on 2023-03-05 02:37


Hi, not sure that I am in the right place here, but thought I would give it a
go. This is a general inquiry about possible plagiarism hacks, or sharing/buying
projects, that students might engage in with Adobe products, and whether there
is any way to trace the original author of After Effects projects, Illustrator
files, and Photoshop files. What metadata might be hidden, for example, and
where can it be found?




Copyright 2002-2024 Hacker Factor. All rights reserved.