I have a text file. I read the first line from it to find out how many bytes it takes:

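open(my $fh, "<:raw", $file) or die "Can't open $file: $!";
my $len;
while (my $row = <$fh>) {
  $len = length $row;  # length in bytes, including the newline
  last;                # only the first line is needed
}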

Now I want to read 100 bytes from that file, starting at the middle of the first row. How do I do that?

Something like

read ($fh, 100, $len/2)

I can't really figure out the proper syntax.

zdim
DisplayMyName
  • Do you by "_from midrow_" mean from the middle of the first row? – zdim Jan 29 '18 at 23:45
  • Possible duplicate of [Perl seek function](https://stackoverflow.com/questions/16556332/perl-seek-function) – Ken Y-N Jan 29 '18 at 23:46
  • Yes, I want to half the first row in bytes – DisplayMyName Jan 29 '18 at 23:46
  • OK, so that's 100 bytes after the half of the first line ... then what? – zdim Jan 29 '18 at 23:57
  • Sorry for not making myself clear, i want to "read", that is get the string from byte x to byte y from the file. – DisplayMyName Jan 30 '18 at 00:42
  • Right, I got that, sorry if my query was unclear. I meant to ask what you want to do next, after you've read those 100 bytes past the half of the first line. (I assumed you want to keep reading from the file in some fashion.) Never mind, I posted what you asked (edited in the meanwhile). Clarify if needed. – zdim Jan 30 '18 at 06:29
  • Since you have already read the whole of the first line, why not just delete the first half of `$row` instead of re-reading the part you need. – Borodin Jan 30 '18 at 09:06

1 Answer

After you get the length of the line

my $row_len = length <$fh>;  # with newline, or (read then) chomp first

position the handle where you need it using seek

use Fcntl qw(:seek);

seek $fh, int($row_len/2), SEEK_SET;  # int(): a byte offset is a whole number

where Fcntl provides the constants SEEK_SET, SEEK_CUR, and SEEK_END, which make seek interpret the position in its second argument relative to the beginning of the file, the current position, or the end of the file (where a negative position is normally used), respectively. Their numeric values 0, 1, and 2 can be used instead.
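To make the three constants concrete, here is a minimal sketch. It assumes a small in-memory file (the `$content` string is made-up sample data, not from the question) and uses `tell` to report where each `seek` lands.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Hypothetical sample data in an in-memory file, so the sketch is self-contained
my $content = "0123456789";
open my $fh, '<:raw', \$content or die "Can't open in-memory file: $!";

seek $fh, 3, SEEK_SET or die "Can't seek: $!";    # 3 bytes from the beginning
my $pos_set = tell $fh;

seek $fh, 2, SEEK_CUR or die "Can't seek: $!";    # 2 bytes past the current position
my $pos_cur = tell $fh;

seek $fh, -4, SEEK_END or die "Can't seek: $!";   # 4 bytes before the end of file
my $pos_end = tell $fh;

print "$pos_set $pos_cur $pos_end\n";             # prints "3 5 6"
close $fh;
```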

Then read $bytes bytes into $data using read

my $bytes = 100;
my $data;

my $rb = read $fh, $data, $bytes;

where $rb is the number of bytes actually read, which may be fewer than the $bytes requested. read returns 0 at end of file and undef on error.
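Put together, the steps so far might look like the sketch below. The in-memory `$content` is invented sample data standing in for the real `$file`, and the first line is read into a hypothetical `$first_line` variable before taking its length.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Hypothetical sample content; a real program would open its own $file instead
my $content = "abcdefghij\n" . ("x" x 200) . "\n";
open my $fh, '<:raw', \$content or die "Can't open in-memory file: $!";

my $first_line = <$fh>;
my $row_len    = length $first_line;    # 11: ten letters plus the newline
my $offset     = int($row_len / 2);     # 5: the middle of the first row

seek $fh, $offset, SEEK_SET or die "Can't seek: $!";

my $bytes = 100;
my $data;
my $rb = read $fh, $data, $bytes;
die "Can't read: $!" if not defined $rb;

print "Got $rb bytes starting at offset $offset\n";
close $fh;
```

In this sample, `$data` starts with the second half of the first row, `fghij` plus its newline, and continues into the second row.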


For some filehandles (sockets, for one) read may not return as much as requested in one call, so you need to keep reading. For example, use the OFFSET argument (see docs), which sets the position in the string at which read writes

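use bytes qw();

my ($data, $requested, $total_read) = ('', 100, 0);

while ($total_read < $requested) {
    my $bytes_data = bytes::length $data;
    my $rb = read $fh, $data, $requested - $bytes_data, $bytes_data;
    die "Error reading: $!" if not defined $rb;
    last if $rb == 0;  # end of file; without this check the loop never ends
    $total_read += $rb;
}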

where read now writes to $data at position $bytes_data. Without that offset each read overwrites $data, so each chunk would have to be appended to another string holding all the data (or accumulated in some other way).

While bytes::length is fine, the bytes pragma is in general "strongly discouraged".
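As an alternative that avoids the pragma, each chunk can be read into its own buffer and appended. This is only a sketch: `read_up_to` is a hypothetical helper name, and it assumes the handle was opened with `:raw`, so that plain `length` counts bytes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): read up to $requested bytes,
# tolerating short reads, by appending each chunk to the result.
# Assumes a :raw handle, so that length() counts bytes.
sub read_up_to {
    my ($fh, $requested) = @_;
    my $data = '';
    while ((my $remaining = $requested - length($data)) > 0) {
        my $rb = read $fh, my $chunk, $remaining;
        die "Error reading: $!" if not defined $rb;
        last if $rb == 0;    # end of file before $requested bytes arrived
        $data .= $chunk;
    }
    return $data;
}

# Usage with made-up in-memory sample data
my $content = "abcdefghij";
open my $fh, '<:raw', \$content or die "Can't open in-memory file: $!";
my $head = read_up_to($fh, 4);      # the first four bytes
my $rest = read_up_to($fh, 100);    # fewer than 100 bytes: the file ends first
close $fh;
```

The caller can compare the length of the returned string with the requested count to detect that the file ended early.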


Thanks to ikegami for comments.

Note that read doesn't treat newlines in any special way, so a read may well continue into the next line(s) of the file. The newline bytes still count and thus affect your position in the file.
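A small sketch of that behavior, again on invented in-memory data: the read below starts inside the first line and runs straight through two newlines.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Hypothetical sample data: three short lines
my $content = "ab\ncd\nef\n";
open my $fh, '<:raw', \$content or die "Can't open in-memory file: $!";

seek $fh, 1, SEEK_SET or die "Can't seek: $!";   # skip the first byte
my $rb = read $fh, my $data, 5;                  # 5 bytes, newlines included
die "Can't read: $!" if not defined $rb;

my $pos = tell $fh;   # 6: both newline bytes counted toward the position
close $fh;
```

Here `$data` holds `b`, a newline, `cd`, and another newline, so the handle ends up at the start of the third line.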

It is not specified what you want to do next but you can keep (repositioning and) reading.

See this post for a crystal clear explanation of moving in a file with seek and read.

zdim
  • Also, keep in mind that for some handles (sockets, at least), `read` might not read as many bytes as requested. A loop is needed. – ikegami Jan 30 '18 at 01:07
  • Note that it's probably better to use named constants for the last (`WHENCE`) parameter of `seek`. So `use Fcntl ':seek'` and `seek $fh, $row_len/2, SEEK_SET` – Borodin Jan 30 '18 at 09:03
  • @Borodin Right ... I considered that and thought that it may spread the answer too wide. I will add a bit on it, thank you. – zdim Jan 30 '18 at 09:26
  • @zdim: I can usually remember what zero does, but I can't use 1 or 2 correctly without checking the documentation! – Borodin Jan 30 '18 at 09:27
  • `use Fcntl qw(:seek);` is not in the seek docs. It is not required and I don't accept that extra, un-rememberable lines of code and symbols are good, even if they are more orderly in your opinion. That's what your downvote is for. – felwithe Mar 30 '21 at 23:03
  • @felwithe Well, thanks for stating your reason for downvote, much appreciated. (1) It **is** in the docs -- did you look? -- right in the first paragraph of [seek](https://perldoc.perl.org/functions/seek), loud and clear, and with the link to the module where constants (and the `:seek` tag) are given (2) Module [Fcntl](https://perldoc.perl.org/Fcntl) loads C's `Fcntl.h` defines, of the `fcntl(2)` fame, a standard utility for working with file descriptors (3) This way `seek` can use named constants of the `fseek(3)` syscall (4) This is widely considered good practice – zdim May 04 '21 at 01:35
  • @felwithe For future reference: fcntl == file control. I suggest to read through `Fcntl` docs to appreciate just how useful the module is, and for `fcntl` in general just type it in google and read down the page. – zdim May 04 '21 at 01:36