
I have two questions about the Tie::File module:

  1. I used Tie::File to search a 55 MB file, with the module's memory limit set to 20 MB. When I grep the tied array for a search string, it takes a very long time. Is there a workaround?

  2. Can Tie::File be used to read a binary file? The tied array is delimited by "\n". How do I use Tie::File to read a binary file? Could you please post some sample code?

/home/a814899> perl -e 'print "x\n"x27 for 1..1024*1024;' >a

/home/a814899> echo "hello world" >> a

Using Unix grep

/home/a814899> time grep "hello " a
hello world

real    0m8.280s
user    0m8.129s
sys     0m0.139s

Using the regex

/home/a814899> (time perl -e 'while (<>) { if (/hello/) { print "hello world"} }' a)
hello world
real    0m51.316s
user    0m51.087s
sys     0m0.189s


Using Perl Grep

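#!/usr/bin/perl
use strict;
use warnings;

print "executing\n";
my $outputFileDir = "/home/a814899";
my $sFileName     = "a";
# Stop on failure instead of carrying on with an unopened handle.
open my $fh, "<", "$outputFileDir/$sFileName"
    or die "Could not open the file: $!";
print "success in open\n";
# Note: <$fh> in list context reads every line into memory before grep runs.
my @out = grep { /hello world/ } <$fh>;
print "@out";
close($fh);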
Arav
  • Tie::File can be [notoriously slow on larger files](http://perlmonks.org/index.pl?node_id=1000412). Are there records of a specific length in your binary file? Perhaps it would be helpful to share what you're trying to do. – Kenosis Jan 18 '13 at 00:03
  • Why do you think you need Tie::File? – ikegami Jan 18 '13 at 01:07
  • You may want to read the [module documentation](https://metacpan.org/module/Tie::File): it includes sections on what is considered a “record separator” and on possible optimizations. Beyond that, `tie`ing files is often unnecessary. – amon Jan 18 '13 at 01:07

1 Answer

  1. Yes.

    This is how you probably did it using Tie::File:

    $ (
        time perl -MTie::File -e'
           tie @a, "Tie::File", $ARGV[0];
           for (@a) { if (/y/) { } }
        ' a
    ) 2>&1 | grep real
    real    2m44.333s
    

    This is the "workaround":

    $ (
        time perl -e'
            while (<>) { if (/y/) { } }
        ' a
    ) 2>&1 | grep real
    real    0m0.644s
    

    The data file was created using

    $ perl -E'say "x"x54 for 1..1024*1024;' >a
    
  2. Tie::File doesn't read files; Tie::File provides a means of mapping lines of a file to array elements. Since "binary" files have no lines, accessing one using Tie::File wouldn't make any sense.
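For binary data, one common alternative is to open the file in raw mode and read fixed-size chunks with `read`. The following is only a minimal sketch of that idea, not something taken from the answer above: the helper name `count_bytes_in_file`, the 64 KB default chunk size, and the command-line usage are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how often a byte string occurs in a binary
# file, reading fixed-size chunks instead of "\n"-delimited lines.
sub count_bytes_in_file {
    my ($path, $needle, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;    # assumed default; tune as needed
    die "Search string must not be empty" unless length $needle;

    # ':raw' turns off newline translation and encoding layers.
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    my $count = 0;
    my $tail  = '';                      # end of the previous buffer
    my $keep  = length($needle) - 1;     # bytes to carry across chunks

    while (1) {
        my $got = read($fh, my $chunk, $chunk_size);
        die "Read error on $path: $!" unless defined $got;
        last if $got == 0;               # end of file

        # Prepend the carried-over bytes so that a match spanning two
        # chunks is still found. The tail is shorter than the needle,
        # so no match is counted twice.
        my $buf = $tail . $chunk;
        my $pos = 0;
        while (($pos = index($buf, $needle, $pos)) >= 0) {
            $count++;
            $pos++;
        }
        $tail = length($buf) > $keep
              ? substr($buf, length($buf) - $keep)
              : $buf;
    }
    close $fh;
    return $count;
}

# Example usage: perl script.pl FILE STRING
if (@ARGV == 2) {
    print count_bytes_in_file($ARGV[0], $ARGV[1]), "\n";
}
```

Only one chunk, plus a few carried-over bytes, is held in memory at a time, so the file size does not drive memory use.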

ikegami
  • Thanks a lot for the info. If I am going to search a huge file multiple times, it is going to take a lot of time. I thought searching the array multiple times would be faster using the Tie::File module. Tie::File has a memory option, which I set to 20 MB. I was thinking Tie::File reads 20 MB of the text file, puts it in memory, and links it to an array. Not sure why it's taking so long. Also, for reading a binary file, I was wondering whether Tie::File would be able to put each byte of the large file into an array. I will use the approach you suggested. – Arav Jan 20 '13 at 22:41
  • I was reading the manual, and it says to use the Tie::File module for large files. In what cases do I need to use Tie::File for a large file? grep is taking a lot of time on the tied array. Also, thanks a lot for the commands you posted; they were quite useful. – Arav Jan 20 '13 at 23:39
  • "Searching the array multipe times will be faster using the Tie::file module." Huh???? 2:44.0 * $many_times is not going to be faster than 0:00.6 * $many_times. – ikegami Jan 22 '13 at 00:49
  • "Was thinking File Module reads the text file of 20 MB" You said you had a 55MB file. – ikegami Jan 22 '13 at 00:50
  • "Was reading the manual is was saying for large files use Tie::File Module" Wow, you find out that Tie::File doesn't work for you, but you want to use it anyway because some manual told you to? Wow! I thought you said you didn't want to take three minutes to do 1 second worth of work, but suit yourself. – ikegami Jan 22 '13 at 00:51
  • "Was thinking whether Tie::File module will be able to put each byte in the large file into a array." The overhead would be freakishly insane. – ikegami Jan 22 '13 at 00:57
  • Thanks a lot for the info. I am not going to use the Tie::File module; I was trying to understand in what cases I would need it. I have edited the question and pasted the various approaches I tried. I created the 54 MB sample file a, with newline-separated records. Unix grep was faster, but Perl's grep failed with an out-of-memory error, and I am not sure why. The approach you suggested took more time than Unix grep. Does that mean calling Unix grep via backticks is the fastest option inside a Perl program? – Arav Jan 22 '13 at 06:24
  • I will use backticks and Unix grep. Thanks. But I am not sure why Perl's grep is failing with an out-of-memory error. – Arav Jan 22 '13 at 07:12
  • It's not. Loading the entire file into an array is. Or potentially dumping that array on the stack. – ikegami Jan 22 '13 at 07:14
  • You seem to have very little memory available to your process. Perhaps you are doing this on a machine where you shouldn't? Or perhaps you are imposing a limit that's too low? – ikegami Jan 22 '13 at 07:17
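To illustrate the memory point from the comments: `grep { ... } <$fh>` reads every line into a list before filtering, while a `while` loop holds one line at a time. If several searches are needed, they can also share a single pass over the file. The sketch below assumes a hypothetical helper named `grep_file_once`; it is not code from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: scan a file once, line by line, and collect the
# lines matching each of several patterns. Apart from the matches that
# are kept, only the current line is held in memory.
sub grep_file_once {
    my ($path, @patterns) = @_;
    my @compiled = map { qr/$_/ } @patterns;      # compile each pattern once
    my %hits     = map { $_ => [] } @patterns;    # pattern => matching lines

    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        for my $i (0 .. $#patterns) {
            push @{ $hits{ $patterns[$i] } }, $line
                if $line =~ $compiled[$i];
        }
    }
    close $fh;
    return \%hits;
}

# Example usage: perl script.pl FILE PATTERN [PATTERN ...]
if (@ARGV >= 2) {
    my ($file, @pats) = @ARGV;
    my $found = grep_file_once($file, @pats);
    print "$_: ", scalar @{ $found->{$_} }, " matching line(s)\n" for @pats;
}
```

Whether this beats shelling out to Unix grep depends on the machine and the patterns, so it is worth timing both on the real data.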