
I'm currently studying the PE structure, and I'm trying to parse it using Perl instead of C.

This is not an important thing, but when you read a binary file, you have to jump to a certain offset (for example, to read `e_lfanew`).
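For context, `e_lfanew` is the 32-bit little-endian field at offset 0x3C of the DOS header, and it holds the file offset of the `PE\0\0` signature. Below is a minimal sketch, assuming the first 0x40 bytes of the file are already in a buffer. The helper name `read_e_lfanew` and the fake header are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: extract e_lfanew from a buffer holding the DOS header.
# e_lfanew lives at offset 0x3C (60 decimal) and is a 32-bit little-endian
# unsigned integer, hence the 'V' template.
sub read_e_lfanew {
    my ($buf) = @_;
    die "buffer too short for a DOS header\n" if length($buf) < 0x40;
    die "missing MZ signature\n" unless substr($buf, 0, 2) eq 'MZ';
    # 'x60' skips 60 bytes; pack/unpack repeat counts are decimal, so 0x3C is written as 60.
    return unpack('x60 V', $buf);
}

# Build a fake 64-byte DOS header whose e_lfanew is 0x80.
my $header = 'MZ' . ("\0" x 58) . pack('V', 0x80);
printf "e_lfanew = 0x%X\n", read_e_lfanew($header);
```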

I want to read the data at offset 0x78, using `$buf`, which holds 0x200 bytes read from the file.

Here are two ways I thought of to extract the data at 0x78:

    my ($dummy, $data) = unpack("A120 A*", $buf);    # 120 == 0x78; repeat counts are decimal

or

    seek(F, 0x78, 0);
    read(F, $buf, 0x200);
    # ... then process $buf

I want to know which of the two methods is more efficient: skipping over the leading dummy bytes of a buffer I've already read, or using `seek` to jump to the offset and reading new data from there.

  • (1) [Benchmark](https://perldoc.perl.org/Benchmark.html) and [Time::HiRes](https://perldoc.perl.org/Time/HiRes.html) (2) What you show is incomparable: in one case you use `unpack`, in the other you `read` data; (regardless of buffering) you are comparing in-memory work with disk read? The last sentence is unclear. – zdim Feb 16 '19 at 00:12
  • Tip: Use `x` instead of `A` to skip over bytes without returning them – ikegami Feb 16 '19 at 01:29
  • Tip: Use `a` instead of `A` for binary data. `A` will remove trailing `0x20` bytes – ikegami Feb 16 '19 at 01:30
  • This leaves you with `my $data = unpack("x120 a*", $buf);` (0x78 = 120; repeat counts are decimal) – ikegami Feb 16 '19 at 02:44
  • Wow, this is the first time I've noticed the `x` feature of `unpack`. Thanks for letting me know! – SAnji Holic Feb 16 '19 at 06:46
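A short sketch of the tips above, using a made-up buffer. One extra pitfall worth knowing: `pack`/`unpack` repeat counts are decimal digits only, so a template such as `"x0x78"` is parsed as `x0` followed by `x78`, which skips 78 bytes rather than 0x78 (120). Interpolating the numeric value, or writing `x120` directly, avoids this.

```perl
use strict;
use warnings;

# Made-up buffer: 0x78 (120) filler bytes, then binary data that happens to
# end in a space (0x20) and a NUL, both meaningful in binary data.
my $buf = ("\xAA" x 0x78) . "\x01\x02\x20\x00";

# Repeat counts are decimal, so interpolate the numeric offset ("x120").
my $skip   = 0x78;
my $with_a = unpack("x$skip a*", $buf);   # 'a' keeps every byte
my $with_A = unpack("x$skip A*", $buf);   # 'A' strips trailing whitespace and NULs
my $substr = substr($buf, $skip);         # same bytes as the 'a*' result

# Pitfall: "x0x78" parses as x0 then x78, skipping only 78 bytes.
my $wrong = unpack("x0x78 a*", $buf);

printf "a*: %d bytes, A*: %d bytes, x0x78: %d bytes\n",
    length $with_a, length $with_A, length $wrong;
```

For a plain "everything after offset N", `substr` is arguably the simplest choice; `unpack` pays off once you pull several fields out of the same header.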

1 Answer


The minimum storage unit of a drive is called a sector. For hard drives, sectors are usually 512 bytes in size (though you can also find drives with 4096-byte sectors).

The block you want (0x200 bytes starting at 0x78) spans two sectors.

    000  078       200   278      400
    +--------------+--------------+---...
    |    ****************
    +--------------+--------------+---...

Since the block of interest starts partway into the first sector, both of the approaches you described need to read the same number of sectors (two).
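The sector count can be checked with integer arithmetic: the first sector touched is `int(offset / size)` and the last is `int((offset + length - 1) / size)`. Below is a minimal sketch assuming 512-byte sectors and a file that starts on a sector boundary; `sectors_touched` is a hypothetical helper name. The same arithmetic reproduces the 4 KiB and 8 KiB buffered-chunk figures given in the next paragraph.

```perl
use strict;
use warnings;

# Hypothetical helper: number of sectors touched by reading $len bytes
# starting at byte $offset, for a drive with $sector_size-byte sectors.
sub sectors_touched {
    my ($offset, $len, $sector_size) = @_;
    return 0 if $len <= 0;
    my $first = int($offset / $sector_size);
    my $last  = int(($offset + $len - 1) / $sector_size);
    return $last - $first + 1;
}

# The block of interest, 0x78..0x278, reached either way.
printf "seek to 0x78, read 0x200:  %d sectors\n", sectors_touched(0x78, 0x200, 512);
printf "read 0x278 from 0, skip:   %d sectors\n", sectors_touched(0,    0x278, 512);

# Whole buffered chunks, starting at 0 vs. starting at 0x78.
for my $chunk (4096, 8192) {
    printf "%d-byte chunk: %d sectors from 0, %d sectors from 0x78\n",
        $chunk, sectors_touched(0, $chunk, 512), sectors_touched(0x78, $chunk, 512);
}
```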

Since actually reading the data from disk is the slow part, there's no real difference between the two approaches.


Oh, but you're using buffered I/O (`read`) instead of `sysread`. With buffered I/O, Perl reads from the OS in 4 KiB or 8 KiB chunks (depending on your version of Perl). So 8 or 16 sectors are loaded from the disk if you start reading at position 0, and 9 or 17 sectors are loaded if you seek to 0x78 first. So by trying to read less, you are actually reading more!
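If you want to bypass PerlIO's buffering, `sysseek` and `sysread` go straight to the OS. Below is a minimal sketch that reads exactly 0x200 bytes at 0x78 from a throwaway test file; the helper `read_at` and the test file contents are hypothetical. Don't mix these unbuffered calls with buffered `read`/`seek` on the same handle, since `perlfunc` warns this can cause confusion.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read exactly $len bytes at $offset using unbuffered I/O.
sub read_at {
    my ($fh, $offset, $len) = @_;
    defined sysseek($fh, $offset, 0) or die "sysseek failed: $!\n";
    my $buf = '';
    # sysread may return fewer bytes than requested, so loop until done or EOF.
    while (length($buf) < $len) {
        my $n = sysread($fh, $buf, $len - length($buf), length($buf));
        die "sysread failed: $!\n" unless defined $n;
        last if $n == 0;    # EOF
    }
    return $buf;
}

# Throwaway 1 KiB test file whose byte at offset i is i % 256.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} join '', map { chr($_ % 256) } 0 .. 1023;
close $out or die "close failed: $!\n";

open my $fh, '<:raw', $path or die "open $path: $!\n";
my $data = read_at($fh, 0x78, 0x200);
printf "read %d bytes, first byte 0x%02X\n", length $data, ord $data;
close $fh;
```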

That said, the difference is small enough that it should be lost in the noise.
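To check whether any difference is measurable on your own system, zdim's suggestion of the core `Benchmark` module can be applied. Below is a rough sketch against a throwaway test file; the sub names are hypothetical. Because the same small file is read repeatedly, the timings will mostly reflect the OS page cache and Perl overhead rather than physical sector reads.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Throwaway 4 KiB test file.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} join '', map { chr($_ % 256) } 0 .. 4095;
close $out or die "close failed: $!\n";

# Approach 1: read from offset 0, then skip 0x78 bytes in memory.
sub via_unpack {
    open my $fh, '<:raw', $path or die "open $path: $!\n";
    read($fh, my $buf, 0x278) == 0x278 or die "short read\n";
    return unpack('x120 a*', $buf);    # 120 == 0x78
}

# Approach 2: seek to 0x78, then read only the bytes of interest.
sub via_seek {
    open my $fh, '<:raw', $path or die "open $path: $!\n";
    seek($fh, 0x78, 0) or die "seek failed: $!\n";
    read($fh, my $buf, 0x200) == 0x200 or die "short read\n";
    return $buf;
}

die "approaches disagree\n" unless via_unpack() eq via_seek();

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, { unpack => \&via_unpack, seek => \&via_seek });
```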

ikegami