
I'm writing a script that reads a large file (>10 GB) and appends data from an array to the end of each line in that file. Here is my code:

   my $count=0;
   while(my $lines = <$FILE>){
        seek $FILE, length($lines), 1;
        print $FILE "\t", $array[$count];
        $count++;
        }

But I think I'm using seek incorrectly to find the end of each line, and I can't get my head around it. Can anyone please see what's wrong with this code?

Before processing:

my 1st line
my 2nd line
my 3rd line

After processing:

my 1st line data1
my 2nd line data2
my 3rd line data3

data1, data2, and data3 are the elements of @array.

Details on the code:

  • FILE is opened in +< mode (read/write)
  • FILE lines are tab delimited.
  • @array holds data1, data2, ...

Issues:

  • Moving the pointer to end of each line

Thanks,

Robin

  • Why not just create a new file with the data appended on each line? – Kenosis Dec 10 '13 at 01:00
  • Remember that the data you print isn't "inserted"; it simply overwrites whatever is already there. In a binary file with fixed-length records, that might be okay. In your case, it looks like you're overwriting current data (including newlines). It'd probably be easier to write out a second file. – rutter Dec 10 '13 at 01:00
  • Thanks for the suggestions, _@Kenosis_ & _@rutter_. The problem is that the file I'm writing will have millions of lines and thousands of columns, though I might start with a blank file. The file would have to be rewritten (lines * cols) times to cover all the data arrays, which would be a performance bottleneck. So I'm trying to do it the other way around to optimize performance. I'll try _@Borodin_'s suggestion and update the thread soon! – Robin Dec 11 '13 at 00:36

1 Answer

You can't do that. Seeking to a location in a file and then printing to it overwrites the data at that position.
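To make the overwriting visible, here is a minimal sketch that is not part of the original code. It assumes a small scratch file created with File::Temp and holding the three sample lines from the question; it seeks to the newline at the end of the first line and prints to that position.

```perl
use strict;
use warnings;
use File::Temp 'tempfile';

# Assumption: a scratch file with the three sample lines from the question.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "my 1st line\nmy 2nd line\nmy 3rd line\n";
close $fh or die "close: $!";

open $fh, '+<', $path or die "open: $!";
my $first = <$fh>;                 # reading leaves the position at the start of line 2
seek $fh, length($first) - 1, 0;   # absolute seek back onto the newline ending line 1
print {$fh} "\tdata1";             # six bytes replace "\nmy 2n"; nothing is shifted
close $fh or die "close: $!";

# Read the whole file back to inspect the result.
open $fh, '<', $path or die "open: $!";
my $content = do { local $/; <$fh> };
close $fh;

print $content;
```

The file keeps its original size, and the first two lines are fused into one because the newline between them was overwritten.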

I suggest you use Tie::File, which lets you access the contents of a file as an array, so appending to the end of a line of the file is done by simply adding a string to one of the elements of the array.

The code would look like this. Note that the line that creates @newdata is there just for testing. It creates an array that is the same length as the file, with lines like data1, data2 etc. as you have in your question.

You should test this on a smaller file initially as it will take a while to process the 15GB file, and it also overwrites it, so if you have any bugs you will destroy your data.

use strict;
use warnings;

use Tie::File;
use Fcntl 'O_RDWR';

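# Tie the file to an array: each element is one line, without its newline.
tie my @file, 'Tie::File', 'myfile', mode => O_RDWR or die $!;

# Test data only: one value per line of the file (data1, data2, ...).
my @newdata = map sprintf('data%d', $_ + 1), 0 .. $#file;

# $line aliases the tied element, so appending to it rewrites that line on disk.
my $i = 0;
for my $line (@file) {
  $line .= "\t" . $newdata[$i];
  ++$i;
}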

untie @file;
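
For comparison, the alternative raised in the comments, writing a second file, could be sketched roughly as follows. The helper name append_column and its arguments are hypothetical and chosen only for illustration; the sketch assumes the new values are already in an array with one element per line.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, adding a tab and the
# matching element of @$extra to the end of every line.
sub append_column {
    my ($in_path, $out_path, $extra) = @_;

    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";

    my $i = 0;
    while (my $line = <$in>) {
        chomp $line;                          # drop the newline so the value lands before it
        my $value = $extra->[$i++] // '';     # empty string if the array runs short
        print {$out} $line, "\t", $value, "\n";
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return $i;                                # number of lines written
}
```

A call such as append_column('myfile', 'myfile.new', \@array) followed by rename 'myfile.new', 'myfile' would then replace the original. This reads and writes each line exactly once, sequentially, at the cost of temporarily needing disk space for a second copy.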
Borodin
  • Thanks a lot, _@Borodin_. It works! I just have a quick question: when we tie a file to an array, are the file's lines stored in memory? – Robin Dec 11 '13 at 01:05