General Approach to Document Analysis

  1. Inspect document manually for anomalies
  2. Locate embedded code (shellcode, macros, JavaScript, and so on)
  3. Extract the suspicious code or objects
  4. Deobfuscate the payload if required
  5. If required, emulate, disassemble, or debug the extracted payload
  6. Reverse engineer the malware

File Format

Binary Microsoft Office files (.doc, .xls) are in the OLE2 format.

OOXML Office files (.docx, .xlsm) are compressed .zip archives.

  • VBA Macros are stored in an OLE2 binary file within the archive
  • Excel allows XLM macros without the OLE2 binary file
  • RTF documents cannot contain macros, but can contain embedded files and objects
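
The three container formats above can usually be told apart by their leading bytes, regardless of the file extension. The following is a minimal sketch of that check; the function name identify_container and its return labels are illustrative assumptions, not part of any tool listed here.

```python
# Leading bytes ("magic") for each container format.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file header
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP local file header (OOXML is a ZIP)
RTF_MAGIC = b"{\\rtf"                            # start of an RTF document


def identify_container(data: bytes) -> str:
    """Guess the container format from the first bytes of a file (hypothetical helper)."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        # Any ZIP matches here; OOXML is confirmed only by inspecting the members.
        return "ooxml-or-zip"
    if data.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"
```

A mismatch between the detected format and the extension (for example, an RTF file named .doc) is itself worth noting during manual inspection.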

Useful Commands

Command                                        Description
zipdump.py file.pptx                           Examine contents of OOXML file file.pptx.
zipdump.py file.pptx -s 3 -d                   Extract file with index 3 from file.pptx to STDOUT.
olevba file.xlsm                               Locate and extract macros from file.xlsm.
oledump.py file.xls -i                         List all OLE2 streams present in file.xls.
oledump.py file.xls -s 3 -v                    Extract VBA source code from stream 3 in file.xls.
xmldump.py pretty                              Format XML file supplied via STDIN for easier analysis.
oledump.py file.xls -p plugin_http_heuristics  Find obfuscated URLs in file.xls macros.
olevba file.doc                                Extract VBA macros in clear text with deobfuscation and analysis.
oletimes file.doc                              Extract creation and modification timestamps of streams and storages in file.doc.
oleid file.doc                                 High-level IOC extraction, good first place to look.
vmonkey file.doc                               Emulate the execution of macros in file.doc to analyze them.
evilclippy -uu file.ppt                        Remove the password prompt from macros in file.ppt.
msoffcrypto-tool infile.docm outfile.docm -p   Decrypt infile.docm using specified password to create outfile.docm.
pcodedmp file.doc                              Disassemble VBA-stomped p-code macro from file.doc.
pcode2code file.doc                            Decompile VBA-stomped p-code macro from file.doc.
rtfobj.py file.rtf                             Extract objects embedded into RTF file.rtf.
rtfdump.py file.rtf                            List groups and structure of RTF file file.rtf.
rtfdump.py file.rtf -O                         Examine objects in RTF file file.rtf.
rtfdump.py file.rtf -s 5 -H -d                 Extract hex contents from group 5 in RTF file file.rtf.
xlmdeobfuscator --file file.xlsm               Deobfuscate XLM (Excel 4) macros in file.xlsm.
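
When the tools above are not at hand, the listing step for an OOXML file can be approximated with Python's standard zipfile module. This is only a sketch: the helper names are hypothetical, the index shown is not guaranteed to match zipdump.py's numbering, and the OLE2 check simply compares each member's first 8 bytes with the OLE2 header.

```python
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file header


def list_ooxml_members(data: bytes):
    """Return (index, name, uncompressed size) for every member of the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [
            (index, info.filename, info.file_size)
            for index, info in enumerate(archive.infolist(), start=1)
        ]


def find_ole2_members(data: bytes):
    """Return names of members that begin with the OLE2 header (candidate VBA containers)."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            with archive.open(info) as member:
                if member.read(8) == OLE2_MAGIC:
                    hits.append(info.filename)
    return hits
```

Checking content rather than the member name is a deliberate choice here, since a name such as vbaProject.bin is only a convention. Any member flagged this way can then be passed to oledump.py or olevba for actual macro extraction.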

Useful Websites

Often, you can upload a malicious document to sites like virustotal.com, and they will already have a detailed report breaking it down.
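
Sites such as VirusTotal can also be searched by file hash, which avoids uploading a sample that may be sensitive. A minimal sketch for computing the SHA-256 of a file follows; the function names are illustrative assumptions.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large samples are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string can be pasted into the site's search box to see whether a report already exists.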

IOC Keywords

In PDF files, the following keywords deserve attention:

  • /OpenAction and /AA specify the script or action to run automatically.
  • /JavaScript, /JS, /AcroForm, and /XFA can specify JavaScript to run.
  • /URI accesses a URL, perhaps for phishing.
  • /SubmitForm and /GoToR can send data to a URL.
  • /ObjStm can hide objects inside an object stream.
  • /XObject can embed an image for phishing.

Be mindful of obfuscation with hex codes, such as /JavaScript vs. /J#61vaScript. (See examples.)
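
The hex-code trick works because a PDF name may write any character as # followed by two hex digits, so /J#61vaScript decodes to /JavaScript (0x61 is "a"). The sketch below normalizes names before counting the keywords listed above. The helper names are hypothetical, and the scan only covers raw bytes: names inside compressed streams or object streams are not seen.

```python
import re

RISKY_NAMES = [
    "/OpenAction", "/AA", "/JavaScript", "/JS", "/AcroForm", "/XFA",
    "/URI", "/SubmitForm", "/GoToR", "/ObjStm", "/XObject",
]

# "#" followed by two hex digits encodes one byte inside a PDF name.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")
# A name starts with "/" and runs until PDF whitespace or a delimiter.
_NAME = re.compile(rb"/[^\x00\t\n\x0c\r ()<>\[\]{}/%]+")


def decode_pdf_name(raw: bytes) -> str:
    """Replace #xx escapes with the bytes they encode."""
    decoded = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_risky_names(data: bytes) -> dict:
    """Count risky keywords after normalizing hex-escaped names."""
    counts = {}
    for match in _NAME.finditer(data):
        name = decode_pdf_name(match.group(0))
        if name in RISKY_NAMES:
            counts[name] = counts.get(name, 0) + 1
    return counts
```

A keyword that appears only in its escaped form is a stronger signal than the keyword alone, since legitimate PDF producers rarely escape ordinary letters.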

Sources

Examples