General Approach to Document Analysis
- Inspect document manually for anomalies
- Locate embedded code (shellcode, macros, JavaScript, and so on)
- Extract the suspicious code or objects
- Deobfuscate the payload if required
- If required, emulate, disassemble, or debug the extracted payload
- Reverse engineer the malware
File Format
Binary Microsoft Office files (.doc, .xls) are in the OLE2 format.
OOXML Office files (.docx, .xlsm) are compressed .zip archives.
- VBA Macros are stored in an OLE2 binary file within the archive
- Excel allows XLM macros without the OLE2 binary file
- RTF documents cannot contain macros, but can contain embedded files and objects
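The format notes above can be turned into a quick first check on the leading bytes of a sample. This is a minimal sketch, not a replacement for the tools listed below: the function name guess_container is hypothetical, and the loose "{\rt" prefix for RTF is an assumption based on the malformed headers often seen in malicious samples.

```python
# Leading-byte signatures for the three container types discussed above.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file header
ZIP_MAGIC = b"PK\x03\x04"                        # local file header of a ZIP (OOXML)
RTF_PREFIX = b"{\\rt"                            # assumption: loose match, not strictly "{\rtf1"


def guess_container(data: bytes) -> str:
    """Guess the container type of a document from its first bytes (hypothetical helper)."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    # Leading whitespace is tolerated here; this is a heuristic, not a parser.
    if data.lstrip().startswith(RTF_PREFIX):
        return "rtf"
    return "unknown"
```

The result only says which toolchain to reach for next (oledump.py, zipdump.py, or rtfdump.py); a file extension alone is not reliable, because a .doc file may turn out to be RTF or OOXML.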
Useful Commands
Command | Description |
---|---|
zipdump.py file.pptx | Examine contents of OOXML file file.pptx. |
zipdump.py file.pptx -s 3 -d | Extract file with index 3 from file.pptx to STDOUT. |
olevba file.xlsm | Locate and extract macros from file.xlsm. |
oledump.py file.xls -i | List all OLE2 streams present in file.xls. |
oledump.py file.xls -s 3 -v | Extract VBA source code from stream 3 in file.xls. |
xmldump.py pretty | Format XML file supplied via STDIN for easier analysis. |
oledump.py file.xls -p plugin_http_heuristics | Find obfuscated URLs in file.xls macros. |
olevba file.doc | Extract VBA macros in clear text with deobfuscation and analysis. |
oletimes file.doc | Extract creation and modification timestamps of the streams and storages in file.doc. |
oleid file.doc | High-level IOC extraction, good first place to look. |
vmonkey file.doc | Emulate the execution of macros in file.doc to analyze them. |
evilclippy -uu file.ppt | Remove the password prompt from macros in file.ppt. |
msoffcrypto-tool infile.docm outfile.docm -p | Decrypt infile.docm using specified password to create outfile.docm. |
pcodedmp file.doc | Disassemble VBA-stomped p-code macro from file.doc. |
pcode2code file.doc | Decompile VBA-stomped p-code macro from file.doc. |
rtfobj.py file.rtf | Extract objects embedded into RTF file.rtf. |
rtfdump.py file.rtf | List groups and structure of RTF file file.rtf. |
rtfdump.py file.rtf -O | Examine objects in RTF file file.rtf. |
rtfdump.py file.rtf -s 5 -H -d | Extract hex contents from group in RTF file file.rtf. |
xlmdeobfuscator --file file.xlsm | Deobfuscate XLM (Excel 4) macros in file.xlsm. |
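When the tools above are not at hand, a rough standard-library sketch can approximate the first zipdump.py step: walk the members of an OOXML archive and flag any that begin with the OLE2 signature, since that is where VBA macros are stored (typically a part named vbaProject.bin). The function name find_ole_parts is hypothetical, and the sketch assumes the archive is not encrypted.

```python
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file header


def find_ole_parts(ooxml_bytes: bytes) -> list:
    """Return names of archive members that start with the OLE2 signature."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as archive:
        for info in archive.infolist():
            # Read only the first 8 bytes; nothing is written to disk.
            with archive.open(info) as member:
                if member.read(len(OLE2_MAGIC)) == OLE2_MAGIC:
                    hits.append(info.filename)
    return hits
```

A typical use is `find_ole_parts(open("file.docm", "rb").read())`; any member it reports can then be extracted with zipdump.py and passed to oledump.py or olevba.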
Useful Websites
Often, you can upload a malicious document to sites like virustotal.com
and they'll already have a detailed report breaking down its contents.
IOC Keywords
/OpenAction and /AA specify the script or action to run automatically.
/JavaScript, /JS, /AcroForm, and /XFA can specify JavaScript to run.
/URI accesses a URL, perhaps for phishing.
/SubmitForm and /GoToR can send data to a URL.
/ObjStm can hide objects inside an object stream.
/XObject can embed an image for phishing.
Be mindful of obfuscation with hex codes, such as /JavaScript vs. /J#61vaScript. (See examples.)
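The hex-code obfuscation mentioned above can be undone before counting keywords. The sketch below, offered as an illustration only, decodes #xx escapes in PDF name objects and then tallies the keywords from this list. The names normalize_name and count_keywords are hypothetical, and the scan covers raw bytes only: it does not decompress streams, so anything hidden inside /ObjStm is not seen.

```python
import re

# Keywords taken from the list above.
SUSPICIOUS = {
    b"/OpenAction", b"/AA", b"/JavaScript", b"/JS", b"/AcroForm", b"/XFA",
    b"/URI", b"/SubmitForm", b"/GoToR", b"/ObjStm", b"/XObject",
}

# A name starts with "/" and runs until whitespace or a PDF delimiter.
_NAME = re.compile(rb"/[^\x00\t\n\x0c\r \(\)<>\[\]\{\}/%]+")
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(name: bytes) -> bytes:
    """Decode #xx escapes, so /J#61vaScript becomes /JavaScript."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), name)


def count_keywords(pdf_bytes: bytes) -> dict:
    """Count suspicious names in raw PDF bytes after normalization."""
    counts = {}
    for match in _NAME.finditer(pdf_bytes):
        name = normalize_name(match.group(0))
        if name in SUSPICIOUS:
            key = name.decode("ascii")
            counts[key] = counts.get(key, 0) + 1
    return counts
```

Matching whole names rather than substrings avoids false hits such as /AA inside a longer name. A nonzero count is only a lead for manual review, not proof of malice.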