My problem occurs when I read an embedded OLE .docx object from an Access database. Using C#/.NET, I look for the hex header "50 4B 03 04 14 00 06 00". I have extracted PDF, DOC, PNG, and TIFF files this way without any problems.
1 Answer
DOCX files are Open Packaging Conventions (OPC) packages: XML (WordprocessingML) and other formats zipped together. (The overall DOCX / OOXML standard is described here.) Because zipping compresses the files, it changes their binary content. Try repeating your method on whichever OPC part you want after unzipping; the main WordprocessingML part, word/document.xml, would be a good place to start.
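
As a rough illustration of the unzip-first approach, here is a minimal C# sketch. `DocxExtractor`, `FindZipStart`, and `ReadDocumentXml` are hypothetical names, not from the question. It assumes the OLE blob has already been read into a byte array and that the bytes from the ZIP signature onward form a complete package; any OLE data trailing the package may need to be trimmed first. It searches only for the four-byte ZIP local file header signature (50 4B 03 04), because the following bytes (14 00 06 00 in the question) are version and flag fields that can differ between files.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration; not part of the original post.
public static class DocxExtractor
{
    // Signature that starts every ZIP local file header: "PK" 03 04.
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns the index of the first ZIP signature in the blob, or -1 if absent.
    public static int FindZipStart(byte[] blob)
    {
        for (int i = 0; i + ZipSignature.Length <= blob.Length; i++)
        {
            bool match = true;
            for (int j = 0; j < ZipSignature.Length; j++)
            {
                if (blob[i + j] != ZipSignature[j]) { match = false; break; }
            }
            if (match) return i;
        }
        return -1;
    }

    // Opens the package found in the blob and returns word/document.xml as text.
    public static string ReadDocumentXml(byte[] blob)
    {
        int start = FindZipStart(blob);
        if (start < 0) throw new InvalidDataException("No ZIP signature found.");

        // Start the stream at the signature so ZIP offsets are relative to the package.
        using var stream = new MemoryStream(blob, start, blob.Length - start, writable: false);
        using var zip = new ZipArchive(stream, ZipArchiveMode.Read);
        ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
            ?? throw new InvalidDataException("word/document.xml not found.");
        using var reader = new StreamReader(entry.Open());
        return reader.ReadToEnd();
    }
}
```

Calling `DocxExtractor.ReadDocumentXml(oleBytes)` (where `oleBytes` is whatever byte array you read from the Access field) would return the uncompressed WordprocessingML, which you can then search or parse as text.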

kjhughes