Metadata is data that describes data. Document metadata is usually hidden from the typical user and can include details such as usernames, file system paths, email addresses, and many other useful bits of information. Many document types contain some amount of metadata, but some document types contain more than others. File types to look out for include:
- doc, dot, docx
- html, htm
- jpg, jpeg
- ppt, pot, pptx
- xls, xlt, xlsx
- Full list under ‘Supported File Types’
This tutorial demonstrates how to analyze and extract document metadata using ExifTool and the strings command.
Software
- ExifTool
- strings command
Install ExifTool
- Linux
- $sudo apt install libimage-exiftool-perl -y
- Windows and Mac
- Download the most recent version from the ExifTool website
Install strings
No need! The Kali image previously installed comes with strings included.
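Before continuing, it can help to confirm that both tools are actually on your PATH. The following is a minimal sketch; the helper name have_tool is made up for illustration and is not part of either tool.

```shell
#!/bin/sh
# have_tool is a hypothetical helper: it succeeds if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the location of each tool this tutorial relies on.
for tool in exiftool strings; do
    if have_tool "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found, install it before continuing"
    fi
done
```

If ExifTool is installed, running exiftool -ver prints just its version number.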
Using ExifTool
For this example, save this image of Bart Simpson. Move the image from your Downloads folder to Home and rename it bart.jpg. Run exiftool on the image.
- $mv Downloads/Bart_Simpson_Season_25_Official.jpg bart.jpg
- $exiftool bart.jpg
To see the property (tag) names that ExifTool commands expect, use the -s option. Compare the two outputs: the first line previously read ‘ExifTool Version Number’, while the command below shows it as ‘ExifToolVersion’.
- $exiftool -s bart.jpg
With the appropriate property names, you can pull specific information about the file.
- $exiftool -ImageWidth -ImageHeight bart.jpg
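When the output feeds a script, it is often handy to pull out just the value of one property. The sketch below parses "Name : value" lines such as those printed by exiftool -s; the helper name tag_value and the sample values are made up for illustration. ExifTool's own -s3 option prints values only and is usually simpler when you can run ExifTool directly.

```shell
#!/bin/sh
# tag_value is a hypothetical helper: it reads "Name : value" lines on stdin
# and prints the value for the requested name.
tag_value() {
    awk -v tag="$1" '
        {
            sep = index($0, ":")              # the first colon separates name and value
            if (sep == 0) next
            name = substr($0, 1, sep - 1)
            gsub(/[ \t]+$/, "", name)         # drop the padding after the name
            if (name == tag) {
                value = substr($0, sep + 1)
                sub(/^[ \t]+/, "", value)     # drop the space after the colon
                print value
            }
        }'
}

# Demonstration on made-up sample output (not taken from a real image).
tag_value ImageWidth <<'EOF'
ExifToolVersion                 : 10.10
ImageWidth                      : 1024
ImageHeight                     : 768
EOF
```

With ExifTool installed, the same helper can be used as: exiftool -s bart.jpg | tag_value ImageWidth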
ExifTool can also add custom properties to a file. By default it keeps the unmodified original alongside the edited file as bart.jpg_original.
- $exiftool -Owner=pentaroot bart.jpg
It can also remove certain properties from a file.
- $exiftool -Owner= bart.jpg
Combine ExifTool with grep to find path names.
- $exiftool bart.jpg | grep /
- $exiftool bart.jpg | grep '\\'
- This matches any line containing at least one backslash. bart.jpg does not include a path of this type.
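Since bart.jpg contains no backslash paths, the effect of the backslash pattern is easier to see on sample text. The sketch below uses made-up lines in place of real ExifTool output; the function name sample_metadata is hypothetical.

```shell
#!/bin/sh
# sample_metadata prints made-up lines standing in for exiftool output.
sample_metadata() {
    printf '%s\n' \
        'Creator       : bart' \
        'Source File   : /home/pentaroot/bart.jpg' \
        'Original Path : C:\Users\pentaroot\Pictures\bart.jpg'
}

# Inside single quotes the shell passes \\ to grep unchanged,
# and grep reads that as one literal backslash.
sample_metadata | grep '\\'

# With -F the pattern is a fixed string, so a single backslash needs no escaping.
sample_metadata | grep -F '\'

# Forward slashes are not special and need no escaping at all.
sample_metadata | grep /
```

The first two commands print only the Windows-style path line; the last prints only the Unix-style path line.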
Additional ExifTool commands to try
- $exiftool -list and $exiftool -listw list every tag name ExifTool recognizes and every writable tag name, respectively (no file argument is needed)
- $exiftool -Owner=pentaroot bart.jpg marge.jpg homer.jpg to write to multiple files
- $exiftool -Owner=pentaroot c:/images to write to all files in a directory
- $exiftool -Owner=pentaroot -copyright="2017 pentaROOT" bart.jpg to write multiple tags
- $exiftool -a -u -g1 bart.jpg prints all metadata, including duplicate and unknown tags, grouped by family 1 group name
- $exiftool -common /home/pentaroot/ prints the common metadata for every file in a directory
- $exiftool bart.jpg > bart.txt saves all metadata to a text file
Read the man page for additional commands and details.
- $man exiftool
Using the strings command
For this example, download the official Introduction to Kali Linux from docs.kali.org. Move the pdf from your Downloads folder to Home and rename it kali.pdf. Run the strings command on kali.pdf.
- $mv Downloads/kali-book-en.pdf kali.pdf
- $strings kali.pdf
While the output from running a basic strings command is usable, it is not in a convenient format. Try running strings to find only strings of eight or more characters.
- $strings -n 8 kali.pdf
For this document, looking only for strings of eight or more characters is not much more helpful. Try different combinations using grep and see if you find anything interesting.
- $strings kali.pdf | grep -i date
- $strings kali.pdf | grep /
- $strings kali.pdf | grep '\\'
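Searching for "date" tends to surface PDF date strings, which are written in the form D:YYYYMMDDHHmmSS. As a rough sketch, the hypothetical helper pdf_dates below pulls those strings out of any text on stdin and prints each distinct calendar date once. It assumes the dates are stored uncompressed; in many PDFs they sit inside compressed streams that strings cannot read. The sample lines are made up.

```shell
#!/bin/sh
# pdf_dates is a hypothetical helper: it reads text on stdin and prints
# each distinct PDF-style date (D:YYYYMMDD...) as YYYY-MM-DD.
pdf_dates() {
    grep -oE 'D:[0-9]{8,14}' |
        sed -E 's/^D:([0-9]{4})([0-9]{2})([0-9]{2}).*/\1-\2-\3/' |
        sort -u
}

# Demonstration on made-up sample lines (not taken from kali.pdf).
printf '%s\n' \
    '/CreationDate (D:20170304123000Z)' \
    '/ModDate (D:20170304123000Z)' \
    '/Title (no date here)' | pdf_dates
```

Against a real file the helper would be used as: strings kali.pdf | pdf_dates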
Another useful option is -e, which selects the character encoding to search for, including 16-bit little-endian (l) and 16-bit big-endian (b) strings. Try these out and see what you find.
- $strings -n 8 -e l kali.pdf | grep /
- $strings -n 8 -e b kali.pdf | grep '\\'
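To see why the encoding option matters, you can build a tiny test file that holds one ordinary single-byte string and one string stored as 16-bit little-endian characters. This is only a sketch: the helper name make_wide_demo and the file contents are made up, and the -e option is assumed to behave as in GNU strings.

```shell
#!/bin/sh
# make_wide_demo is a hypothetical helper: it writes a small test file to $1.
make_wide_demo() {
    # "singlebyte!" stored one byte per character, followed by a newline.
    printf 'singlebyte!\n' > "$1"
    # "widechar" stored as 16-bit little-endian: each character is followed by a NUL byte.
    printf 'w\000i\000d\000e\000c\000h\000a\000r\000' >> "$1"
}

demo=$(mktemp)
make_wide_demo "$demo"

if command -v strings >/dev/null 2>&1; then
    echo "Default single-byte scan:"
    strings -n 8 "$demo"
    echo "16-bit little-endian scan:"
    strings -n 8 -e l "$demo"
fi

rm -f "$demo"
```

With GNU strings, the default scan should report only the single-byte text, and the -e l scan only the 16-bit text. Windows programs commonly store text in the 16-bit little-endian form.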
The grep / command finds a number of web addresses throughout the document. Save these to a file. You can then sort the .txt file and remove the duplicates, saving to another file for easy access and visibility.
- $strings -n 8 -e l kali.pdf | grep / > kali.txt
- $sort -u kali.txt > kalisorted.txt
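Matching on / keeps the whole line around each hit. If you only want the addresses themselves, one possible refinement is to extract just the URL portion before removing duplicates. The helper name extract_urls and the pattern are illustrative assumptions; the pattern stops at whitespace, quotes, angle brackets, and parentheses, because PDF files usually wrap link targets in parentheses.

```shell
#!/bin/sh
# extract_urls is a hypothetical helper: it reads text on stdin and prints
# each distinct http or https URL once, sorted.
extract_urls() {
    grep -oE 'https?://[^[:space:]<>()"]+' | sort -u
}

# Demonstration on made-up sample lines.
printf '%s\n' \
    '/URI (http://docs.kali.org/intro)' \
    'visit https://www.kali.org/ today' \
    '/URI (http://docs.kali.org/intro)' | extract_urls
```

Against the tutorial file this would be used as: strings -n 8 kali.pdf | extract_urls > kalisorted.txt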
Additional strings commands to try
- $strings kali.pdf | more for reading one terminal screen at a time (press 'q' to quit)
- $strings -n 8 -e s kali.pdf | grep / searches for single 7-bit byte characters (the default for strings)
- $strings -n 8 -e S kali.pdf | grep / searches for single 8-bit byte characters
- $strings -f kali.pdf prints the file name before each string
- $strings -o kali.pdf prints each string's offset within the file before it (in octal with GNU strings)
- $strings -t d kali.pdf prints the decimal offset before each string (may also use 'o' for octal and 'x' for hex)
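The decimal offset is useful because it lets you go back to the file and look at the raw bytes around an interesting string. A minimal sketch follows; the helper name bytes_at is made up, and the offset 123456 in the closing comment is a placeholder rather than a real offset in kali.pdf.

```shell
#!/bin/sh
# bytes_at is a hypothetical helper: print COUNT bytes of FILE starting
# at decimal OFFSET (counted from 0, as strings -t d reports it).
bytes_at() {
    # $1 = file, $2 = offset, $3 = count
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null
}

# Demonstration on a tiny made-up file.
demo=$(mktemp)
printf 'abcdefghij' > "$demo"
bytes_at "$demo" 3 4    # prints defg
echo
rm -f "$demo"

# With a real file, take an offset reported by strings -t d, for example:
#   strings -t d kali.pdf | grep -i creator
#   bytes_at kali.pdf 123456 200
```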
Read the man page for additional commands and details.
- $man strings
On Your Own
Try using a personal document with the ExifTool and strings tools. Financial or legal documents are some of the best to test with as you can extract Personally Identifiable Information (PII) from them. PII includes information such as social security numbers, bank account numbers, addresses, and telephone numbers.
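As a starting point for that exercise, the extracted strings can be filtered for values that look like PII. The sketch below is deliberately rough: the helper name find_pii_candidates is made up, the patterns assume US-style social security and telephone number formats plus email addresses, and they will produce both false positives and misses. The sample values are invented.

```shell
#!/bin/sh
# find_pii_candidates is a hypothetical helper: it reads text on stdin and
# prints lines that look like they contain an SSN, a phone number, or an email.
find_pii_candidates() {
    grep -E \
        -e '[0-9]{3}-[0-9]{2}-[0-9]{4}' \
        -e '\(?[0-9]{3}\)?[ .-][0-9]{3}[ .-][0-9]{4}' \
        -e '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}'
}

# Demonstration on invented sample lines.
printf '%s\n' \
    'SSN: 123-45-6789' \
    'Phone: 555-867-5309' \
    'Contact: bart@example.com' \
    'Nothing sensitive on this line' | find_pii_candidates
```

With a document of your own (statement.pdf is a placeholder name), this would be used as: strings -n 8 statement.pdf | find_pii_candidates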