I'm in the process of writing a PE file parser, and I've reached the point where I'd like to parse and interpret the actual code within PE files, which I'm assuming is stored as x86 opcodes.
As an example, each of the exports within a DLL points to an RVA (Relative Virtual Address) of where the function will be located in memory, and I've written a function to convert these RVAs to physical file offsets.
The question is, are the bytes at those offsets really opcodes, or are they something else?
Does how the functions are stored within the file depend on the compiler/linker, or are they always one- or two-byte x86 opcodes?
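For reference, here is a minimal sketch of the kind of RVA-to-offset conversion I mean. It assumes the section headers have already been parsed; the `Section` type, its field names, and the example values are made up for illustration and are not taken from any real file.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    """Hypothetical holder for the four section-header fields the lookup needs."""
    virtual_address: int  # RVA at which the section starts once loaded
    virtual_size: int     # size of the section in memory
    raw_pointer: int      # PointerToRawData: file offset of the section's data
    raw_size: int         # SizeOfRawData: number of bytes stored in the file


def rva_to_offset(rva: int, sections: Sequence[Section]) -> Optional[int]:
    """Return the file offset for an RVA, or None if no file bytes back it."""
    for section in sections:
        delta = rva - section.virtual_address
        # Some linkers leave virtual_size as 0, so fall back to raw_size.
        span = section.virtual_size or section.raw_size
        if 0 <= delta < span:
            if delta >= section.raw_size:
                return None  # zero-filled tail that exists only in memory
            return section.raw_pointer + delta
    return None


# Made-up section: loaded at RVA 0x1000, stored at file offset 0x400.
sections = [Section(0x1000, 0x2000, 0x400, 0x2000)]
print(hex(rva_to_offset(0x1234, sections)))  # 0x634
```

The idea is just to find the section whose virtual range contains the RVA, then add the distance from the start of that section to the section's position in the file.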
As an example, the Windows 7 DLL 'BWContextHandler.dll' contains four functions that are loaded into memory, making them available within the system. The first exported function is 'DllCanUnloadNow', and it is located at offset 0x245D within the file. The first four bytes of this data are: 0xA1 0x5C 0xF1 0xF2
So are these one- or two-byte opcodes, or are they something else entirely?
If anyone can provide any information on how to examine these, it would be appreciated.
Thanks!
After a bit of further reading, and running the file through the demo version of IDA, I think I'm correct in saying that the first byte, 0xA1, is a one-byte opcode meaning mov eax. I got that from here: http://ref.x86asm.net/geek32.html#xA1 and I'm assuming it is correct for the time being.
However, I'm a bit confused as to how the following bytes make up the rest of the instruction. From the x86 assembly that I know, a move instruction requires two operands, the destination and the source, so the instruction is to move (something) into the eax register, and I'm assuming that the something comes in the following bytes. However, I don't know how to read that information yet :)
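If I'm reading the opcode table correctly, 0xA1 in 32-bit code is followed by a 4-byte little-endian memory address, and the instruction loads the dword at that address into eax. Below is a rough sketch under that assumption, which also assumes there are no prefix bytes in front of the opcode. The function name and the example bytes are made up: the four bytes quoted above contain only three of the four address bytes, so I haven't tried to decode them.

```python
import struct


def decode_mov_eax_moffs32(code: bytes):
    """Decode opcode 0xA1 as 'mov eax, [addr]'; assumes 32-bit code, no prefixes.

    Returns the disassembly text and the instruction length in bytes.
    """
    if len(code) < 5:
        raise ValueError("need 5 bytes: the opcode plus a 4-byte address")
    if code[0] != 0xA1:
        raise ValueError("first byte is not opcode 0xA1")
    # The 32-bit address follows the opcode, least significant byte first.
    (address,) = struct.unpack_from("<I", code, 1)
    return f"mov eax, dword ptr [0x{address:08X}]", 5


# Made-up bytes for illustration, not the bytes from the DLL.
example = bytes([0xA1, 0x78, 0x56, 0x34, 0x12])
text, length = decode_mov_eax_moffs32(example)
print(text, length)  # mov eax, dword ptr [0x12345678] 5
```

So the destination (eax) would be implied by the opcode itself, and only the source address is encoded in the bytes that follow.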