
I want to extract only the junk data from the free space of a raw partition image (ext4). My idea is to zero out the free space and then use the result as a mask.

I have a raw partition image (14 GB) containing data and free space, and a copy of the same image with the free space zeroed.

I want to perform the following operation in Perl on each byte of these two files, so that the processed raw partition image contains only the junk data from the free space.

RPM  - raw partition image
RPMz - raw partition image with free space zeroed
RPMp - processed raw partition image, containing only the junk data from the free space

for each byte: RPM & !RPMz => RPMp

Can someone help me out with a Perl script or a starting point for this?

Borodin
Nick
  • a bit. I have in mind the chunk reading, but in this moment I don't know how to extract each byte from the chunk, for example. Actually this would extract it, but it's slow: unpack("x[$pos]H2", $buffer); I've used it in a different scenario. – Nick Nov 09 '14 at 21:24
  • So if I told you that the logical operators `&`, `|`, `~` and `^` [work bitwise on entire strings](http://perldoc.perl.org/perlop.html#Bitwise-String-Operators) that would get you on your way? – Borodin Nov 09 '14 at 21:28
  • Yes. I think I can do it myself. :) – Nick Nov 09 '14 at 21:30
  • To split a string into individual bytes use `unpack 'C*', $string`, but there doesn't seem to be a need to do that. – Borodin Nov 09 '14 at 21:30
  • Okay. Give it a try and come back if you get stuck and can show your code. – Borodin Nov 09 '14 at 21:31
  • Although, one more thing, they should work byte-wise. :) anything that is different from 0x00, should become 0x00 in !RPMz. – Nick Nov 09 '14 at 21:32
  • I would start by changing your *mask* file so that any non-zero byte becomes 0xFF. Indeed, you may want to create a positive mask from the negative one, changing zero to 0xFF and anything else to zero. – Borodin Nov 09 '14 at 21:33
  • @Borodin: If the mask is, indeed, the same as the input, except with some bytes zeroed out, then simply XORing the two strings will give you the data that was zeroed out. But yes, more generally, you have three possible cases to consider for each byte (or, perhaps more relevantly, for each disk block): not just a) input and mask are equal and b) mask is zero but input is not, but also c) mask is neither zero nor equal to the input. The XOR method will return (essentially) garbage in the last case. – Ilmari Karonen Nov 09 '14 at 22:13
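
The string-wise operators mentioned in the comments above can be sketched as follows. This is only a minimal illustration on in-memory chunks, not a finished tool: `free_space_bytes` is a hypothetical helper name, and it assumes both chunks were read from the same offset and have the same length. The positive mask is built with `tr///` (0x00 becomes 0xFF, everything else becomes 0x00) and then applied with the string form of `&`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the bytes of $data that sit where $zeroed is 0x00.
# Assumes both arguments are byte strings of equal length.
sub free_space_bytes {
    my ($data, $zeroed) = @_;
    die "chunks differ in length" unless length($data) == length($zeroed);

    # Positive mask: 0x00 -> 0xFF, every other byte -> 0x00.
    # The last replacement character (0x00) is reused for 0x02..0xFF.
    (my $mask = $zeroed) =~ tr/\x00\x01-\xFF/\xFF\x00/;

    # With two string operands, & works byte by byte on the whole string.
    return $data & $mask;
}

my $rpm  = "AB\x00junk!";                    # data followed by leftover bytes
my $rpmz = "AB\x00\x00\x00\x00\x00\x00";     # same chunk with free space zeroed
my $rpmp = free_space_bytes($rpm, $rpmz);

print unpack('H*', $rpmp), "\n";             # prints 0000006a756e6b21
```

When RPMz really is identical to RPM outside the zeroed bytes, `$rpm ^ $rpmz` gives the same result; the masked `&` is the safer choice if that assumption might not hold. Note that the string behaviour of `&` and `^` applies only when both operands are strings and the `bitwise` feature is not enabled.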

1 Answer


This is what I wrote to invert the bytes and obtain !RPMz. But it is slow, and with 100 MB chunks I run out of memory. I need some help.

use strict;
use warnings;

my $path1 = "F:/data-lost-workspace/partition-for-zerofree/mmcblk0p12.raw";
my $path2 = "F:/data-lost-workspace/partition-for-zerofree/mmcblk0p12_invert.raw";

open(my $in, '<', $path1) or die "Cannot open $path1: $!";
binmode($in);

# Open the output once, truncating any previous result
open(my $out, '>', $path2) or die "Cannot open $path2: $!";
binmode($out);

my $length = 1024 * 1024 * 100;   # chunk size: 100 MB
my $index  = 1;
my $buffer = "";

# read() returns the number of bytes actually read, and 0 at end of file
while (my $count = read($in, $buffer, $length)) {
    print "$index $count\n";
    $index++;

    # One array element per byte: this is what makes the loop slow and memory-hungry
    my @c = split('', $buffer);

    # Loop only over the bytes actually read; the last chunk is usually shorter
    for (my $i = 0; $i < $count; $i++) {
        # 0x00 becomes 0xFF, anything else becomes 0x00
        $c[$i] = ord($c[$i]) == 0x00 ? chr(0xFF) : chr(0x00);
    }

    print {$out} join('', @c);
}

close($in);
close($out) or die "Cannot close $path2: $!";
Nick
  • Finally someone suggested this, which is quite fast: cat mmcblk0p12.raw | tr '\377' '\001' | tr '\000' '\377' | tr '\001-\376' '\000' > mmcblk0p12_invert.raw – Nick Nov 10 '14 at 08:01
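
The same byte mapping as the `tr` pipeline above (0x00 becomes 0xFF, every other byte becomes 0x00) can also be done in Perl with the `tr///` operator, which rewrites a whole chunk in place without splitting it into an array. The following is a rough sketch only: `invert_mask_file` is a hypothetical helper name, and the 1 MB default chunk size is an arbitrary assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: writes an inverted copy of $in_path to $out_path,
# mapping 0x00 -> 0xFF and every other byte -> 0x00.
sub invert_mask_file {
    my ($in_path, $out_path, $chunk_size) = @_;
    $chunk_size ||= 1024 * 1024;    # 1 MB per read; an arbitrary choice

    open(my $in,  '<:raw', $in_path)  or die "Cannot open $in_path: $!";
    open(my $out, '>:raw', $out_path) or die "Cannot open $out_path: $!";

    my $buffer;
    while (read($in, $buffer, $chunk_size)) {
        # tr/// transliterates the chunk in place; the final replacement
        # character (0x00) is reused for 0x02..0xFF.
        $buffer =~ tr/\x00\x01-\xFF/\xFF\x00/;
        print {$out} $buffer;
    }

    close($in);
    close($out) or die "Cannot close $out_path: $!";
}

# Usage: perl invert.pl input.raw output.raw
invert_mask_file(@ARGV) if @ARGV >= 2;
```

Because only one chunk is held in memory at a time, memory use stays close to the chunk size regardless of how large the image is.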