Here is the short version (scroll down to see the problem): I am implementing Huffman encoding in PHP to compress a file (for a project). I have built the code map and encoded everything into a string like this:

00101010001100001110011101001101111011111011

Now I need to convert that into an actual binary string; in its current state, it is only a string of 1s and 0s.

Here is the problem:

The string of 1s and 0s is 17,747,595 characters long, and the conversion slows down badly at around 550,000 characters.

This is the code I have:

<?php

$i = 0;
$out = '';
$len = strlen($binaryString);

while ($i < $len) {
    // substr() takes a length as its third argument, not an end offset.
    $section = substr($binaryString, $i, 8);
    $out .= chr(bindec($section));
    $i += 8;
}

?>

How can I make this efficient enough to run the 17 million character string?
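A point worth checking in any loop like this: PHP's substr() takes a length as its third argument, not an end offset, so each slice should ask for exactly 8 characters. A minimal sketch under that reading follows. The helper name bitsToBytes and the block size are assumptions for illustration, not part of the original code, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: converts a string of '0'/'1' characters to raw bytes.
// $blockBits should be a multiple of 8 so every block ends on a byte boundary.
function bitsToBytes(string $bits, int $blockBits = 8192): string
{
    // Right-pad with zeros so the final byte is complete.
    $padded = str_pad($bits, (int) ceil(strlen($bits) / 8) * 8, '0');
    $out = '';
    for ($offset = 0, $len = strlen($padded); $offset < $len; $offset += $blockBits) {
        // Take a fixed-size block; the third argument is a length.
        $block = substr($padded, $offset, $blockBits);
        // Split the block into 8-character groups and convert each to one byte.
        foreach (str_split($block, 8) as $byte) {
            $out .= chr(bindec($byte));
        }
    }
    return $out;
}

echo bin2hex(bitsToBytes('0010101000110000')), "\n"; // 2a30
```

Working block by block keeps the temporary array from str_split() small. The zero padding on the last byte means a decoder needs to know the original bit count separately.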

Thanks very much for any support!

Addo Solutions
  • Did you take a look at http://stackoverflow.com/questions/6382738/convert-string-to-binary-then-back-again-using-php? – MagePal Extensions Nov 06 '12 at 19:52
  • Yes, base_convert won't accept it because it is too long :P – Addo Solutions Nov 06 '12 at 19:55
  • Don't write it all to one variable; write it to a file cache after every X bytes. That way, the whole string is not loaded on each iteration just to append the next few bytes. – feeela Nov 06 '12 at 19:58
  • Yes, the original file being encoded is 4MB, and Huffman encoding then expands it to the 17m… I know there has to be an efficient way of doing this, I just don't know what it is lol. – Addo Solutions Nov 06 '12 at 19:59
  • @feeela, I actually am, I just didn't want to clutter the code with that :) It writes at $i % 2000 – Addo Solutions Nov 06 '12 at 20:00
  • Can't you convert it to decimal or hex first, do what you need to, and then convert it back to binary? Also, what are you trying to do with the binary string? – Zappa Nov 06 '12 at 20:02
  • Normally PHP should run the garbage collector to free unused data. I would try adding `unset( $out );` before the end of the loop block and see if that matters. Or use `fopen` and related functions to read in X bytes of the input, perform your actions, and write to another file. Only X bytes of memory should be used on each iteration. – feeela Nov 06 '12 at 20:03
  • Why didn't you build a bit stream instead of a bit string? I mean, just fill the 8 bits of each byte from the beginning. It's because of locality of reference. – morteza kavakebi Nov 06 '12 at 20:06
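The bit-stream and write-in-chunks suggestions in the comments above can be sketched roughly as follows. This is only an illustration: the BitWriter class, its method names, and the 8192-byte flush threshold are all hypothetical, and the typed properties assume PHP 7.4 or later.

```php
<?php
// Hypothetical sketch: packs bits into bytes as codes are produced,
// so the long string of '0'/'1' characters is never built.
final class BitWriter
{
    private int $buffer = 0;     // bits accumulated for the current byte
    private int $count = 0;      // number of bits currently in $buffer
    private string $pending = ''; // finished bytes not yet written out
    private $handle;             // stream resource to write to

    public function __construct($handle)
    {
        $this->handle = $handle;
    }

    // Appends one code given as a string of '0'/'1' characters.
    public function write(string $code): void
    {
        for ($i = 0, $n = strlen($code); $i < $n; $i++) {
            $this->buffer = ($this->buffer << 1) | ($code[$i] === '1' ? 1 : 0);
            if (++$this->count === 8) {
                $this->pending .= chr($this->buffer);
                $this->buffer = 0;
                $this->count = 0;
            }
        }
        // Write finished bytes out in chunks rather than one at a time.
        if (strlen($this->pending) >= 8192) {
            fwrite($this->handle, $this->pending);
            $this->pending = '';
        }
    }

    // Pads the last partial byte with zero bits and writes everything out.
    public function flush(): void
    {
        if ($this->count > 0) {
            $this->pending .= chr($this->buffer << (8 - $this->count));
            $this->buffer = 0;
            $this->count = 0;
        }
        fwrite($this->handle, $this->pending);
        $this->pending = '';
    }
}

$handle = fopen('php://memory', 'w+b');
$writer = new BitWriter($handle);
foreach (['001', '01010', '0011', '0000'] as $code) {
    $writer->write($code);
}
$writer->flush();
rewind($handle);
echo bin2hex(stream_get_contents($handle)), "\n"; // 2a30
```

In a real encoder the handle would be the output file, and the number of padding bits in the last byte would have to be stored somewhere so the decoder can ignore them.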

1 Answer

You don't need a loop; you can use GMP with pack:

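$file = "binary.txt";
$string = file_get_contents($file);
$start = microtime(true);

// Convert the string
$string = simpleConvert($string);
//echo $string ;

var_dump(number_format(filesize($file),2),microtime(true)- $start);

function simpleConvert($string) {
    // Right-pad to a whole number of bytes so the last byte is complete.
    $string = str_pad($string, (int) ceil(strlen($string) / 8) * 8, '0');
    // Parse the bits as one big base-2 integer and print it in base 16.
    $hex = gmp_strval(gmp_init($string, 2), 16);
    // GMP drops leading zeros; restore them so the bytes stay aligned.
    $hex = str_pad($hex, strlen($string) / 4, '0', STR_PAD_LEFT);
    return pack('H*', $hex);
}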

Output

string '25,648,639.00' (length=13) <---- Length greater than 17,747,595
float 1.0633520126343  <---------------- Total Conversion Time 

Note: this solution requires the GMP functions (PHP's gmp extension).
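Where GMP may not be installed, one option is to check for the extension and fall back to plain PHP. The following is a sketch only; the function name convertBits is hypothetical, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical wrapper: uses GMP when available, otherwise plain PHP.
function convertBits(string $bits): string
{
    if ($bits === '') {
        return '';
    }
    // Right-pad to a whole number of bytes.
    $bits = str_pad($bits, (int) ceil(strlen($bits) / 8) * 8, '0');

    if (extension_loaded('gmp')) {
        $hex = gmp_strval(gmp_init($bits, 2), 16);
        // GMP drops leading zeros; restore them so the bytes stay aligned.
        $hex = str_pad($hex, intdiv(strlen($bits), 4), '0', STR_PAD_LEFT);
        return pack('H*', $hex);
    }

    // Fallback: convert one byte (8 characters) at a time.
    $out = '';
    foreach (str_split($bits, 8) as $byte) {
        $out .= chr(bindec($byte));
    }
    return $out;
}

echo bin2hex(convertBits('0000101011110000')), "\n"; // 0af0
```

The example input starts with zero bits on purpose: without the left padding, the GMP path would lose the leading zero nibble and shift every byte.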

Baba
  • Wow! I like that approach, but I seem to be getting a "Segmentation fault" upon initializing GMP in gmp_init($string, 2); any ideas what that is about? (Yes, I have GMP installed :) – Addo Solutions Nov 06 '12 at 23:24
  • What version of PHP & GMP ? – Baba Nov 06 '12 at 23:25
  • Ahh… I was running it on PHP/5.2.10 GD/4.1.4, but I moved it to my server with PHP/5.4.7 GMP/4.3.2 and it works like a charm :) Well done! Thanks @Baba – Addo Solutions Nov 07 '12 at 00:02