
I wanted to learn about what I was looking at when I open a DOCX file in a hexadecimal viewer.

For example:

[Screenshot: the DOCX file opened in a hex viewer]

Hexadecimal is base 16 on a 32-bit (DWORD) file? So I was assuming that, starting from right to left, you would do:

0*16^0 + 0*16^1 + 6*16^2 + 14*16^3 ...... all the way to the 504B.

But then I end up with this huge number, and it means nothing to me.

So I guess I really don't understand what I'm looking at. Why is the section on the right, under the 0->F columns, displaying funny characters for each value - PK........!.ae?

Any information would be so helpful. I started with bitmaps, and now I thought I'd have a play with DOCX to see if I could write a search tool for the files. But if I don't understand this simple concept, I have no possibility of cracking it on its head.

Jimmyt1988

1 Answer


.docx files are .ZIP files. Run unzip on the file for your first step.
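
As a rough sketch of that first step done in code rather than with the unzip tool: the "PK" at the start of the hex dump is the pair of bytes 50 4B, which begins the ZIP signature, and the main text normally sits in word/document.xml inside the archive. The helper names looks_like_zip and read_document_xml below are made up for illustration, and the sketch assumes document.xml is UTF-8 encoded.

```python
import zipfile

# Bytes 50 4B 03 04: the "PK.." shown at the start of the hex dump.
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_zip(data: bytes) -> bool:
    """Return True if the data starts with the ZIP local-file-header signature."""
    return data[:4] == ZIP_MAGIC


def read_document_xml(docx_file) -> str:
    """Return the main document XML from a .docx (path or binary file object)."""
    with zipfile.ZipFile(docx_file) as archive:
        # word/document.xml holds the main body text of the document.
        # Assumption: it is UTF-8 encoded, which is the usual case.
        return archive.read("word/document.xml").decode("utf-8")
```

Something like read_document_xml("example.docx") (a hypothetical file name) would then give you XML text that can be searched with ordinary string or XML tools.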

Gereon
  • Beautiful! I'm gonna take a further look, then probably end up marking this as the answer. – Jimmyt1988 Jan 07 '13 at 14:00
  • Awww, okay so it reads fine in text format from document.xml, there I was getting excited that I'd need to open in binary mode for file reading. Oh well. :) – Jimmyt1988 Jan 07 '13 at 14:07
  • sorry to disappoint :-) See [here](http://en.wikipedia.org/wiki/Office_Open_XML) for a few more details on what's inside these .ZIPs – Gereon Jan 07 '13 at 22:40