0

I'm creating a PHP app that at some point will download an SFX archive from a website and needs to extract the data from it.

Since I am running this on a Linux box, I need to chop off the SFX executable portion of the file and save the compressed archive to the filesystem, where I will then run a program to unzip/extract it. (SFX archives are basically an EXE file with the compressed archive tacked on after it. I have tried this manually with a hex editor and it works just fine.)

The file type of the compressed archive within the SFX archive will always be the same, and I know what the magic number is for that file type.

What I need to do in PHP, then, is this: after downloading the file (let's assume a simple file_get_contents() with a URL argument) so that it is sitting in memory, extract the data from the contents starting at the magic number of the compressed archive.

I was thinking I could maybe use some sort of regex; however, I need to process this as binary data (the magic number will need to be expressed as hex), not character data. The magic number itself contains byte values that are non-printing, i.e. they do not show up as any readable character.

hakre
jzimmerman2011
  • I assume you have much data appended. You probably want to make use of [`stream_copy_to_stream`](http://php.net/stream_copy_to_stream) instead of loading all into the memory. Also please note that strings in PHP *are* binary: http://php.net/string – hakre Nov 03 '12 at 14:12
  • `stream_copy_to_stream` could be helpful, however, in my case, the whole file is always going to be just a few MB (and it isn't running on a production server, so it doesn't really need to be super-efficient). Good suggestion though. – jzimmerman2011 Nov 05 '12 at 21:59

2 Answers

2

Regexes are binary-safe. However, you might be better off with `strpos`.

$magicpos = strpos($downloaded_data,"\x1a\x09\x01");

That assumes the magic number is 0x1A 0x09 0x01; replace it with whatever the number actually is. The double quotes matter, because PHP only interprets `\x` escapes in double-quoted strings. Then:

$archive = substr($downloaded_data,$magicpos);

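This will get the archive data from the magic number (inclusive) onwards. Note that `strpos` returns `false` when the magic number is not found, so test the result with `=== false` before passing it to `substr`.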
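The two steps can be combined into one function that also handles the not-found case. This is a minimal sketch, not a definitive implementation: the function name `extractArchive` and the sample data are hypothetical, the magic number is the same placeholder as above, and the type hints assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns everything from the first occurrence of
// $magic onwards, or null if $magic does not occur in $data.
function extractArchive(string $data, string $magic): ?string
{
    $pos = strpos($data, $magic);
    if ($pos === false) {   // strict check: a match at offset 0 is valid
        return null;
    }
    return substr($data, $pos);
}

// Placeholder magic number from the answer above; substitute the real one.
$magic = "\x1a\x09\x01";

// Stand-in for the downloaded SFX file: an EXE stub followed by the archive.
$downloaded = "MZ\x90\x00stub-bytes" . $magic . "archive-payload";

$archive = extractArchive($downloaded, $magic);
if ($archive === null) {
    fwrite(STDERR, "Magic number not found\n");
} else {
    echo bin2hex(substr($archive, 0, 3)), "\n"; // prints 1a0901
}
```

In real use, `$downloaded` would come from `file_get_contents()` and `$archive` would be written out with `file_put_contents()` before running the extraction program.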

Niet the Dark Absol
  • Thanks for this! I ended up using this over a `preg_match()` as I was able to get it all onto one line (I put the `$magicpos` code right into the `substr` function, as I had no reason to use that value again). – jzimmerman2011 Nov 02 '12 at 18:20
  • @jzimmerman2011: Can you add your code somewhere so the question has some lasting value? E.g. the magic number you use is not part of the question, but you have tagged it SFX and so on. – hakre Nov 03 '12 at 14:14
1

You can match binary data with preg_match using the \xXX syntax:

preg_match('/\x00/', chr(0));
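Applied to the question, the same idea can locate the archive by asking `preg_match` for the byte offset of the match. This is a rough sketch under assumptions: the magic number 0x1A 0x09 0x01 is only a placeholder and the sample data is made up. The pattern deliberately omits the `u` modifier, since that would make PCRE treat the data as UTF-8 rather than raw bytes.

```php
<?php
// Placeholder pattern; the \xXX escapes are interpreted by PCRE itself,
// so single quotes are fine here.
$pattern = '/\x1a\x09\x01/';

// Stand-in for the downloaded SFX file: a 7-byte stub, then the archive.
$data = "MZ-stub" . "\x1a\x09\x01" . "payload";

// PREG_OFFSET_CAPTURE makes each match an array of [text, byte offset].
if (preg_match($pattern, $data, $m, PREG_OFFSET_CAPTURE) === 1) {
    $offset  = $m[0][1];
    $archive = substr($data, $offset);
    echo $offset, "\n"; // prints 7
} else {
    fwrite(STDERR, "Magic number not found\n");
}
```

For a fixed byte sequence this does the same job as `strpos`; a regex only pays off if the signature has variable parts.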
deceze
  • I didn't know you could use binary data in regular expressions; I always thought they were strictly for character-based data (but it makes sense once I really thought about it). I ended up not using this solution, but I don't doubt that it is one way of doing this. Thanks! – jzimmerman2011 Nov 02 '12 at 18:21