4

I'm trying to extract packed hexadecimal numbers from a string. My application communicates with a server that sends a string consisting of a header followed by 2-byte packed hexadecimal numbers. There are thousands of numbers in this string.

What I want to do is extract each 2-byte packed number and convert it into a number I can use in calculations.

Example: string = "info:\x00\x00\x11\x11\x22\x22" will produce three numbers: 0x0000 (decimal 0), 0x1111 (decimal 4369), and 0x2222 (decimal 8738).

I have a working solution (see below), but it is too slow when processing the several thousand numbers that the server sends. Please recommend ways to speed up my approach.

//Works but is too slow!
//$string has the data from the server
$arrayIndex = 0;
for($index = [start of data]; $index < strlen($string); $index+=2){
    $value = getNum($string, $index, $index+1);
    $array[$arrayIndex++] = $value;
}
function getNum($string, $start, $end){
    //get the substring we're interested in transforming
    $builder = substr($string, $start, $end-$start+1);  

    //convert into hex string
    $array = unpack("H*data", $builder);
    $answer = $array["data"];

    //return the value as a number
    return hexdec($answer);
}

I've also been trying to extract the numbers with a single unpack call, but that is not working (I'm having trouble understanding which format string to use):

//Not working alternate method
//discard the header (in this case 18 bytes) and put the rest of the
//number values I'm interested in into an array
$unpacked = unpack("c18char/H2*data", $value);
for($i = 0; $i < $size; $i+=1){
    $data = $unpacked["data".$i];
    $array[$i] = $data;
}
Mike Mackintosh
Gregory Peck

3 Answers

2
$array = array();
$len = strlen($string);
for($index = [start of data]; $index < $len; $index+=2){
    $d = unpack("H*data", substr($string, $index, 2));
    $array[] = hexdec($d["data"]);
}

The only significant changes were caching the value of strlen and reducing the number of function calls.

You could also try this:

foreach (str_split(substr($string, [start of data]), 2) as $chunk) {
    $d = unpack("H*data", $chunk);
    $array[] = hexdec($d["data"]);
}
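
Both loops above still call unpack once per number. As a minimal sketch of the single-call approach the question was attempting (not part of the original answer), the "n" format code reads an unsigned 16-bit big-endian value, which gives the same byte order as hexdec on an "H*" string, and "*" repeats it to the end of the data. The function name decodePackedShorts and the $headerLength parameter are hypothetical, and the scalar type declarations assume PHP 7 or later:

```php
<?php
// Sketch: decode every 2-byte value after the header with one unpack() call.
// $headerLength is an assumed parameter (5 for "info:", 18 in the question's real data).
function decodePackedShorts(string $payload, int $headerLength): array
{
    // "n*" = unsigned 16-bit big-endian, repeated until the data runs out.
    $values = unpack('n*', substr($payload, $headerLength));

    // unpack() returns a 1-based array (or false on failure); reindex from 0.
    return $values === false ? [] : array_values($values);
}

$string = "info:\x00\x00\x11\x11\x22\x22";
print_r(decodePackedShorts($string, 5)); // values 0, 4369, 8738
```

Because the whole conversion happens inside one built-in call, there is no per-number substr, unpack, or hexdec call left in PHP code; whether that is fast enough would still need to be timed against the real data.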
goat
  • I timed your suggestions: my original code ran in 0.099 sec, your first example ran in 0.066 seconds, and the second example ran in 0.070 seconds. So overall about a 33% improvement! Thanks. – Gregory Peck Jun 29 '12 at 18:49
  • I have the numbers in the previous comment reversed, example 1 took 0.070 seconds, and example 2 took 0.066 seconds. – Gregory Peck Jun 29 '12 at 19:17
1

One thing I can suggest is passing the string containing thousands of hexadecimal numbers by reference rather than by value. If there are, say, 3k numbers, the string is 12k characters long, and 3k function calls would result in ~36M (with one byte per character; ~72M if UTF-8) of unnecessarily allocated memory on the stack:

$arrayIndex = 0;
for($index = [start of data]; $index < strlen($string); $index+=2){
    $value = getNum($string, $index, $index+1);
    $array[$arrayIndex++] = $value;
}
//pass by reference rather than value
function getNum(&$string, $start, $end){
    //get the substring we're interested in transforming
    //$builder = substr($string, $start, $end-$start+1);
    //not sure if substr takes reference or value, so implementing this way, just in case it's by value
    $builder = $string[$start] . $string[$start + 1];
    //convert into hex string
    $array = unpack("H*data", $builder);
    $answer = $array["data"];

    //return the value as a number
    return hexdec($answer);
}

I'm not sure how much this speeds things up (it should at least help with memory allocation), but it's definitely worth a shot.

toske
  • PHP uses copy-on-write, so passing the string by value doesn't actually copy the whole thing. – goat Jun 29 '12 at 18:34
  • Ran some tests using your suggestions; here is what I got: my code ran in 0.099 sec, &$string w/ substr ran in 0.280 sec, and &$string w/ addition (your suggestion) ran in 0.097 seconds. So this is slightly better but still isn't good enough. Thanks for the help though! – Gregory Peck Jun 29 '12 at 18:40
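
The copy-on-write point raised in the comments can be checked with a rough measurement. The sketch below is an illustration rather than a benchmark: it passes a string of about 1 MB by value to a function that only reads it, and reports how much memory usage grew while the call was active. The function names readOnly and growthWhenPassedByValue are hypothetical, and the type declarations assume PHP 7 or later:

```php
<?php
// Sketch: does passing a large string by value copy it?
// Hypothetical helper that receives the string by value and never modifies it.
function readOnly(string $s): int
{
    // Memory in use while the by-value parameter is alive inside the call.
    return memory_get_usage();
}

function growthWhenPassedByValue(string $payload): int
{
    $before = memory_get_usage();
    $during = readOnly($payload);

    // If the string had been copied, this would be roughly strlen($payload).
    return $during - $before;
}

$payload = str_repeat("\x11\x22", 500000); // about 1 MB of packed data
echo growthWhenPassedByValue($payload), " bytes of growth\n";
```

If the reported growth is far below the string's length, the by-value call did not duplicate the data, which is consistent with the timings above showing no gain from the reference version.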
0

Why not try something like:

$string = "info:\x00\x00\x11\x11\x22\x22";

$ret = array();
preg_match_all('#\\x(\d{2})#', $string, $items);
if(isset($items[1]) && count($items[1])>0)
{
     for($i=0;$i<count($items[1]);$i+=2)
     {
            if(isset($items[1][$i]) && isset($items[1][$i+1]))
            {
                    $ret[] = '0x' . $items[1][$i] . $items[1][$i+1];
                    unset($items[1][$i]);
                    unset($items[1][$i+1]);
            }
     }
}
Stephane
  • For whatever reason, I can't get this to output the data into the array correctly. Maybe because the data is packed hex (so \d won't match). I've modified the code a little bit to be like this `$string = substr($string, [start position]); preg_match_all('#(.{2})#', $value, $items);` This runs, but the output isn't correct. With my modifications, this runs in 0.132 seconds, versus 0.099 for my original example. So it's a little slower, but thanks for the help! – Gregory Peck Jun 29 '12 at 19:10