General Approach to Document Analysis

  1. Inspect the document manually for anomalies
  2. Locate embedded code (shellcode, macros, JavaScript, and so on)
  3. Extract the suspicious code or objects
  4. Deobfuscate the payload if required
  5. If required, emulate, disassemble, or debug the extracted payload
  6. Reverse engineer the malware

File Format

Binary Microsoft Office files (.doc, .xls) are in the OLE2 format.

OOXML Office files (.docx, .xlsm) are compressed .zip archives.
- VBA Macros are stored in an OLE2 binary file within the archive
- Excel allows XLM macros without the OLE2 binary file
- RTF documents cannot contain macros, but can contain embedded files and objects
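
The three containers above can usually be told apart by their first bytes, regardless of the file extension. The following is a minimal sketch of that check, not a replacement for the tools listed later; the function name identify_container and the returned labels are made up for illustration.

```python
# Leading bytes ("magic numbers") of the container formats described above.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 / Compound File Binary header
ZIP_MAGIC = b"PK\x03\x04"                        # local file header of a ZIP archive (OOXML)
RTF_MAGIC = b"{\\rtf"                            # RTF documents begin with "{\rtf"


def identify_container(data: bytes) -> str:
    """Return a coarse container label based on the first bytes of a document.

    The labels are hypothetical names chosen for this sketch.
    """
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        # Any ZIP archive matches here; OOXML is only one possibility.
        return "zip-or-ooxml"
    if data.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the header of the file at `path` and classify it."""
    with open(path, "rb") as handle:
        return identify_container(handle.read(16))
```

A file named invoice.doc that comes back as "rtf" or "zip-or-ooxml" is worth a closer look, since the extension and the real format disagree.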

Useful Commands

Command                                         Description
zipdump.py file.pptx                            Examine contents of OOXML file file.pptx.
zipdump.py file.pptx -s 3 -d                    Extract file with index 3 from file.pptx to STDOUT.
olevba file.xlsm                                Locate and extract macros from file.xlsm.
oledump.py file.xls -i                          List all OLE2 streams present in file.xls.
oledump.py file.xls -s 3 -v                     Extract VBA source code from stream 3 in file.xls.
xmldump.py pretty                               Format XML file supplied via STDIN for easier analysis.
oledump.py file.xls -p plugin_http_heuristics   Find obfuscated URLs in file.xls macros.
olevba file.doc                                 Extract VBA macros in clear text with deobfuscation and analysis.
oletimes file.doc                               Extract creation and modification timestamps of the OLE streams in file.doc.
oleid file.doc                                  Extract high-level indicators from file.doc; a good first place to look.
vmonkey file.doc                                Emulate the execution of macros in file.doc to analyze them.
evilclippy -uu file.ppt                         Remove the password prompt from macros in file.ppt.
msoffcrypto-tool infile.docm outfile.docm -p    Decrypt infile.docm using specified password to create outfile.docm.
pcodedmp file.doc                               Disassemble VBA-stomped p-code macro from file.doc.
pcode2code file.doc                             Decompile VBA-stomped p-code macro from file.doc.
rtfobj.py file.rtf                              Extract objects embedded into RTF file.rtf.
rtfdump.py file.rtf                             List groups and structure of RTF file file.rtf.
rtfdump.py file.rtf -O                          Examine objects in RTF file file.rtf.
rtfdump.py file.rtf -s 5 -H -d                  Extract hex contents from group in RTF file file.rtf.
xlmdeobfuscator --file file.xlsm                Deobfuscate XLM (Excel 4) macros in file.xlsm.
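
To show roughly what listing an OOXML archive involves, here is a small standard-library sketch that enumerates the members of the archive and flags any member that starts with the OLE2 header, which is where VBA macros would normally live (for example a vbaProject.bin entry). It is an illustration only and does not reproduce zipdump.py's output or options; the function name list_ooxml_members and the 1-based index are assumptions of this sketch.

```python
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 / Compound File Binary header


def list_ooxml_members(data: bytes):
    """Return (index, name, size, is_ole2) for every member of a ZIP/OOXML archive.

    `index` is a 1-based counter chosen for this sketch; `is_ole2` is True when
    the member's content begins with the OLE2 magic bytes.
    """
    rows = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for index, info in enumerate(archive.infolist(), start=1):
            head = b""
            if not info.is_dir():
                # Read only the first bytes of the decompressed member.
                with archive.open(info) as member:
                    head = member.read(len(OLE2_MAGIC))
            rows.append((index, info.filename, info.file_size, head == OLE2_MAGIC))
    return rows


def print_listing(path: str) -> None:
    """Print one line per archive member of the file at `path`."""
    with open(path, "rb") as handle:
        for index, name, size, is_ole2 in list_ooxml_members(handle.read()):
            flag = "OLE2" if is_ole2 else ""
            print(f"{index:3d} {size:10d} {flag:4s} {name}")
```

Any member flagged as OLE2 is a natural candidate to extract and pass on to oledump.py or olevba.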

Useful Websites

Often you can upload a malicious document to a site such as virustotal.com and find that it already has a detailed report of the file's decomposition.
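
Such sites can typically also be searched by file hash, which lets you check for an existing report without uploading the sample itself. A minimal sketch for computing the SHA-256 of a document follows; the function name sha256_of_file is made up for illustration.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string can be pasted into the site's search box.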

IOC Keywords (PDF)

/OpenAction and /AA specify the script or action to run automatically.
/JavaScript, /JS, /AcroForm, and /XFA can specify JavaScript to run.
/URI accesses a URL, perhaps for phishing.
/SubmitForm and /GoToR can send data to a URL.
/ObjStm can hide objects inside an object stream.
/XObject can embed an image for phishing.

Be mindful of obfuscation with hex codes, such as /JavaScript vs. /J#61vaScript. (See examples.)
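
The hex obfuscation works because a PDF name may write any character as # followed by two hex digits, so #61 is simply the letter a. Below is a minimal sketch, assuming the raw bytes of the PDF are available, that decodes those escapes before counting the keywords listed above. The names decode_name and count_keywords are made up for illustration, and the scan only sees uncompressed data, so names hidden inside compressed object streams (/ObjStm) will not be counted.

```python
import re
from collections import Counter

# Keywords from the list above.
KEYWORDS = {
    "/OpenAction", "/AA", "/JavaScript", "/JS", "/AcroForm", "/XFA",
    "/URI", "/SubmitForm", "/GoToR", "/ObjStm", "/XObject",
}

# "#" followed by two hex digits encodes a single byte inside a PDF name.
HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")
# A name starts with "/" and runs until whitespace or a PDF delimiter.
NAME = re.compile(rb"/[^\x00\t\n\x0c\r ()<>\[\]{}/%]+")


def decode_name(raw: bytes) -> str:
    """Replace #xx escapes in a PDF name with the byte they encode."""
    decoded = HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_keywords(data: bytes) -> Counter:
    """Count the listed keywords in raw PDF bytes after decoding hex escapes."""
    counts = Counter()
    for match in NAME.finditer(data):
        name = decode_name(match.group(0))
        if name in KEYWORDS:
            counts[name] += 1
    return counts
```

For example, count_keywords would report one /JavaScript for a file that only ever spells it /J#61vaScript, whereas a plain text search for "/JavaScript" would find nothing.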

Sources

Examples