pentaROOT Information Security

Document Metadata Analysis and Extraction

by pentaROOT | Nov 6, 2017 | Recon, Tools

Metadata

Metadata is data that describes data. Document metadata is hidden from the standard user and includes details such as usernames, file system paths, email addresses, and many other useful bits of information. Most document types contain some amount of metadata, but some contain more than others. File types to look out for include:

  • doc, dot, docx
  • html, htm
  • jpg, jpeg
  • pdf
  • ppt, pot, pptx
  • xls, xlt, xlsx
  • See ‘Supported File Types’ in the ExifTool documentation for the full list

This tutorial demonstrates how to analyze and extract document metadata using ExifTool and the strings command.

 

Software

  • ExifTool
  • strings command

 

Install ExifTool

  • Linux
    • $sudo apt install libimage-exiftool-perl -y
  • Windows and Mac
    • Download the most recent version from the official ExifTool website

 

Install strings

No need! The Kali image previously installed comes with strings included. On other Linux distributions, strings is part of the GNU binutils package.
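
Before moving on, it can help to confirm that both tools are actually on your PATH. The following is a minimal sketch, assuming a POSIX shell; check_tool is a helper name made up for this example and is not part of either tool.

```shell
#!/bin/sh
# check_tool is a hypothetical helper: it succeeds if the named
# command can be found on the PATH.
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of the two tools used in this tutorial.
for tool in exiftool strings; do
    if check_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```

If either tool is reported as missing, revisit the install steps above before continuing.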

 

Using ExifTool

For this example, save this image of Bart Simpson. Move the image from your Downloads folder to Home and rename it bart.jpg. Run exiftool on the image.

  • $mv Downloads/Bart_Simpson_Season_25_Official.jpg bart.jpg

  • $exiftool bart.jpg

The -s option prints the short tag names that ExifTool commands expect instead of the human-readable descriptions. Compare the first line of the previous output, ‘ExifTool Version Number’, with the first line of the output from the command below, ‘ExifToolVersion’.

  • $exiftool -s bart.jpg

With the appropriate property names, you can pull specific information about the file.

  • $exiftool -ImageWidth -ImageHeight bart.jpg
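
When the goal is to feed a single value into another script, the ‘Name : value’ layout printed by -s can be parsed with awk. This is a rough sketch, not part of ExifTool itself: tag_value is a hypothetical helper, and the sample lines in the usage comment are invented for illustration.

```shell
#!/bin/sh
# tag_value is a hypothetical helper: it reads `exiftool -s` style output
# ("TagName   : value") on stdin and prints the value of the tag named in $1.
tag_value() {
    awk -v tag="$1" -F ' *: ' '
        $1 == tag {
            sub(/^[^:]*: /, "")   # drop the tag name and separator, keep the value
            print
            exit
        }'
}

# Usage (assumes exiftool is installed and bart.jpg exists):
#   exiftool -s bart.jpg | tag_value ImageWidth
#
# Usage with made-up sample lines instead of a real file:
#   printf 'ImageWidth   : 1280\nImageHeight  : 720\n' | tag_value ImageHeight
```

ExifTool also has its own -s3 form of the option that prints values without tag names, which may be simpler if no further filtering is needed.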

ExifTool can also write tags to a file. By default it keeps a copy of the unmodified file as bart.jpg_original.

  • $exiftool -Owner=pentaroot bart.jpg

It can also remove a tag from a file by assigning it an empty value.

  • $exiftool -Owner= bart.jpg

Combine ExifTool with grep to find path names.

  • $exiftool bart.jpg | grep /

  • $exiftool bart.jpg | grep '\\'
    • Inside single quotes, the pattern \\ matches any ONE literal backslash, as found in Windows paths. bart.jpg does not include a path of this type.
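
The two grep commands above can be folded into small filters. This is a sketch only: find_paths and find_windows_paths are hypothetical helper names, and the sample lines in the usage comment are invented rather than real ExifTool output.

```shell
#!/bin/sh
# find_paths is a hypothetical helper: keep lines on stdin that contain
# a forward slash or a backslash.
find_paths() {
    grep -E '/|\\'
}

# find_windows_paths is a hypothetical helper: keep lines that contain a
# drive letter followed by a colon and a backslash, such as C:\
find_windows_paths() {
    grep -E '[A-Za-z]:\\'
}

# Usage (assumes exiftool is installed and bart.jpg exists):
#   exiftool bart.jpg | find_paths
#
# Usage with made-up sample lines:
#   printf '%s\n' 'Creator : C:\Users\bart\report.docx' 'MIME Type : image/jpeg' | find_windows_paths
```

Note that a plain slash match also catches values such as MIME types, so the output still needs a quick read-through.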

Additional ExifTool commands to try

  • $exiftool -list and $exiftool -listw list all available and all writable tag names (no file argument is needed)
  • $exiftool -Owner=pentaroot bart.jpg marge.jpg homer.jpg to write to multiple files
  • $exiftool -Owner=pentaroot c:/images to write to all files in a directory
  • $exiftool -Owner=pentaroot -copyright="2017 pentaROOT" bart.jpg to write multiple tags
  • $exiftool -a -u -g1 bart.jpg prints all metadata, including duplicate and unknown tags, grouped by family 1 group name
  • $exiftool -common /home/pentaroot/ prints common metadata for all files in a directory
  • $exiftool bart.jpg > bart.txt saves all metadata to a text file

Read the man page for additional commands and details.

  • $man exiftool

 

Using the strings command

For this example, download the official Introduction to Kali Linux from docs.kali.org. Move the PDF from your Downloads folder to Home and rename it kali.pdf. Run the strings command on kali.pdf.

  • $mv Downloads/kali-book-en.pdf kali.pdf

  • $strings kali.pdf

While the output from running a basic strings command is usable, it is not in a convenient format. By default strings prints every run of four or more printable characters. Try running strings to only find strings with eight or more characters.

  • $strings -n 8 kali.pdf
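
To see what -n does without scrolling through a whole PDF, you can build a tiny binary file and run strings over it. This is a sketch assuming GNU strings; make_sample and long_strings are hypothetical helpers, and the file contents are invented for illustration.

```shell
#!/bin/sh
# make_sample is a hypothetical helper: it writes a small made-up binary
# file with short and long text runs separated by non-printable bytes.
make_sample() {
    printf 'abc\0\001/home/pentaroot/report.docx\0\002xy\0' > "$1"
}

# long_strings is a hypothetical wrapper: print runs of at least N
# printable characters from FILE (N defaults to 8).
long_strings() {
    strings -n "${2:-8}" "$1"
}

# Usage:
#   make_sample sample.bin
#   long_strings sample.bin      # only the path is long enough to be printed
#   long_strings sample.bin 3    # 'abc' is printed as well
```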

For this document, looking only for strings of eight or more characters is also not much help. Try different combinations using grep and see if you find anything interesting.

  • $strings kali.pdf | grep -i date
  • $strings kali.pdf | grep /
  • $strings kali.pdf | grep '\\'

Another useful option is -e, which makes strings look for 16-bit characters, the form in which Windows applications often store text (l for little-endian, b for big-endian). Try these out and see what you find.

  • $strings -n 8 -e l kali.pdf | grep /
  • $strings -n 8 -e b kali.pdf | grep '\\'
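
Why the encoding matters is easier to see on a file you control. The sketch below writes ASCII text as 16-bit little-endian characters, so every letter is followed by a NUL byte and a default strings run finds no run long enough to print. It assumes GNU strings; to_utf16le is a hypothetical helper and the path is invented.

```shell
#!/bin/sh
# to_utf16le is a hypothetical helper: it writes ASCII text to stdout as
# 16-bit little-endian characters by following each byte with a NUL byte.
to_utf16le() {
    printf '%s\n' "$1" | fold -w 1 | while IFS= read -r c; do
        printf '%s\0' "$c"
    done
}

# Usage:
#   to_utf16le 'C:\Users\bart\notes.docx' > wide.bin
#   strings -n 8 wide.bin         # finds nothing: no run of 8 single-byte characters
#   strings -n 8 -e l wide.bin    # finds the path
```

The helper only handles plain ASCII input, which is enough to show the effect of -e l.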

The grep / command finds a number of URLs throughout the document. Save these to a file. You can then sort the .txt file and remove the duplicates, saving the result to another file for easy access and visibility.

  • $strings -n 8 -e l kali.pdf | grep / > kali.txt
  • $sort -u kali.txt > kalisorted.txt
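
If only the URLs themselves are of interest, the save, sort, and de-duplicate steps can be combined into one filter. This is a sketch assuming a grep that supports -o (GNU and BSD grep do); unique_urls is a hypothetical helper, and the regular expression is a loose approximation of a URL rather than a full parser.

```shell
#!/bin/sh
# unique_urls is a hypothetical helper: pull URL-looking substrings out of
# the text on stdin, then sort them and drop duplicates.
# LC_ALL=C keeps the sort order the same on every system.
unique_urls() {
    grep -oE 'https?://[^[:space:]<>()"]+' | LC_ALL=C sort -u
}

# Usage (assumes kali.pdf from the steps above):
#   strings kali.pdf | unique_urls > kaliurls.txt
```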

Additional strings commands to try

  • $strings kali.pdf | more for reading one terminal screen at a time (press ‘q’ to quit)
  • $strings -n 8 -e s kali.pdf | grep / for 7-bit characters (the default for strings)
  • $strings -n 8 -e S kali.pdf | grep / for 8-bit characters
  • $strings -f kali.pdf prints the file name before each string
  • $strings -o kali.pdf prints the octal offset within the file before each string
  • $strings -t d kali.pdf prints the decimal offset within the file before each string (may also use ‘o’ for octal and ‘x’ for hex)
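
The offsets printed by -t are useful for jumping straight to the bytes around an interesting string. This is a sketch assuming GNU strings plus tail and head with -c support; first_offset and peek are hypothetical helpers.

```shell
#!/bin/sh
# first_offset is a hypothetical helper: print the decimal byte offset of
# the first string in FILE ($1) that contains TEXT ($2).
first_offset() {
    strings -t d "$1" | awk -v text="$2" 'index($0, text) { print $1; exit }'
}

# peek is a hypothetical helper: print COUNT bytes ($3, default 64) of
# FILE ($1) starting at the zero-based byte OFFSET ($2).
peek() {
    tail -c "+$(( $2 + 1 ))" "$1" | head -c "${3:-64}"
}

# Usage (assumes kali.pdf from the steps above):
#   off=$(first_offset kali.pdf Date)
#   peek kali.pdf "$off" 80
```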

Read the man page for additional commands and details.

  • $man strings

 

On Your Own

Try running ExifTool and strings on a personal document. Financial or legal documents are some of the best to test with, as you can extract Personally Identifiable Information (PII) from them. PII includes information such as social security numbers, bank account numbers, addresses, and telephone numbers.
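
As a starting point, the output of either tool can be filtered for values that look like PII. This is a rough sketch assuming a grep that supports -o; find_pii_candidates is a hypothetical helper, statement.pdf is a placeholder file name, and the patterns (US-style social security numbers and email addresses) will produce both false positives and misses.

```shell
#!/bin/sh
# find_pii_candidates is a hypothetical helper: print substrings on stdin
# that look like a US social security number (123-45-6789) or an email address.
find_pii_candidates() {
    grep -oE '[0-9]{3}-[0-9]{2}-[0-9]{4}|[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
}

# Usage (statement.pdf is a placeholder for your own document):
#   strings statement.pdf | find_pii_candidates
#   exiftool statement.pdf | find_pii_candidates
```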

pentaROOT Information Security, LLC © Copyright 2022