In Perl, I want to seek to the nth bit (not byte) of a file and then read the next m bits, returned as a list of 0s and 1s.

Is there any easy way to do this?

I realize I can write a subroutine wrapping regular seek and read, but was wondering if there's an easier solution.

  • seek() is byte-based, so either CPAN or write your own function. – mpapec Aug 26 '14 at 16:01
  • There is no simpler method other than dividing the bit-offset by 8 and seeking to that position. This is a few lines of code, which you should try writing yourself. If you have problems, then post a specific question. – Jim Garrison Aug 26 '14 at 16:02
  • @JimGarrison Well, you also have to do bit shifting to discard unwanted bits. I didn't say it was super-difficult, I just wondered if someone had already solved this problem. Tried CPAN first, couldn't find anything. –  Aug 26 '14 at 16:42
  • Just out of curiosity, what is this for? – ThisSuitIsBlackNot Aug 26 '14 at 17:38
  • Files and memory are byte-addressable, not bit-addressable. – ikegami Aug 26 '14 at 18:05
  • @ThisSuitIsBlackNot I'm trying to store a structured array of integers very compactly by using a variable number of bits per integer (the structure lets me do this unambiguously). Gorier details on request. –  Aug 26 '14 at 18:07
  • @ikegami Well, I know, but surely there are tools that let me pretend that files/memory are bit-addressable? That's what I'm asking. I guess I'm looking for an "abstraction layer". –  Aug 26 '14 at 18:07
  • No, that would be very inefficient to fetch a bit at a time. I don't think I've even read a file a byte at a time before. Besides, you said yourself you don't want single bits. So the existing functions that read multiple bytes are perfectly adequate. – ikegami Aug 26 '14 at 18:10
  • Well, not fetch one bit a time. Just like seek gets a group of bytes at a time, bitseek would grab a group of bits at one time. I'm looking for bit addressing, not bit-by-bit reading. –  Aug 26 '14 at 18:12
  • `seek` doesn't get anything. `read` gets a number of bits/bytes at a time. `vec($str, $offset, 1)` allows accessing bits of a string. – ikegami Aug 26 '14 at 18:14
  • OK. I give up trying to explain why this question is interesting :) –  Aug 26 '14 at 18:16

3 Answers

bitseek would grab a group of bits at one time.

use POSIX qw(ceil);
use Fcntl qw(SEEK_SET);

seek($fh, int($bit_num/8), SEEK_SET);
my $offset = $bit_num % 8;
read($fh, my $buf, ceil(($offset+$num_bits)/8));

I'm looking for bit addressing, not bit-by-bit reading.

vec($buf, $offset+$i, 1);   # bit $i of the requested range, for $i in 0 .. $num_bits-1
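
The two fragments above can be combined into one routine. What follows is only a sketch, not code from the answer: the name `read_bits_vec` and the in-memory test data are made up for illustration, and it assumes bits are numbered the way `vec` numbers them, with the least significant bit of each byte first.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper (not from the answer): return $num_bits bits starting at
# bit $bit_num of $fh, numbering bits as vec() does (LSB of each byte first).
sub read_bits_vec {
    my ($fh, $bit_num, $num_bits) = @_;
    my $offset = $bit_num % 8;                          # bits to skip in the first byte
    my $nbytes = int(($offset + $num_bits + 7) / 8);    # integer ceiling of bits/8
    seek($fh, int($bit_num / 8), SEEK_SET) or die "seek failed: $!";
    my $got = read($fh, my $buf, $nbytes);
    die "read failed: $!" unless defined $got;
    die "short read" if $got < $nbytes;
    # Pull each requested bit out of the buffer with vec().
    return map { vec($buf, $offset + $_, 1) } 0 .. $num_bits - 1;
}

# Demo on an in-memory filehandle holding the bytes 0x01 0x80 (assumed sample data).
my $bytes = "\x01\x80";
open(my $fh, '<', \$bytes) or die "open failed: $!";
print join('', read_bits_vec($fh, 0, 16)), "\n";   # prints 1000000000000001
```

If the file's bits are meant to be read most-significant first, `vec` gives them in the wrong order within each byte, and `unpack "B*"` is the better fit.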
ikegami

If n is a multiple of m, and m is one of 1, 2, 4, 8, 16, 32, and on some platforms, 64, you can read the whole file into a string and use vec for this.

(Admittedly a fairly constraining case, but a common one.)
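
As a rough illustration of that constrained case, here is a minimal sketch with made-up sample data: the four bytes in `$data` stand in for the file contents, and the values of `$n` and `$m` are chosen arbitrarily. For widths below 8, `vec` takes the low-order field of each byte first.

```perl
use strict;
use warnings;

# Assumed sample data standing in for a file slurped into a string.
my $data = "\x21\x43\x65\x87";

# vec($data, $i, 4) returns the $i-th 4-bit field, low nibble of each byte first.
my @nibbles = map { vec($data, $_, 4) } 0 .. 7;
print "@nibbles\n";          # prints 1 2 3 4 5 6 7 8

# Bit offset $n must be a multiple of the field width $m for this to work.
my ($n, $m) = (8, 4);
my $field = vec($data, $n / $m, $m);
print "$field\n";            # prints 3
```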

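Barring that, you'll just have to do the math:

use Fcntl qw(SEEK_SET);

my $discard   = $n % 8;                   # unwanted bits at the start of the first byte
my $startbyte = ($n - $discard) / 8;      # byte that contains bit n
my $bits      = $m + $discard;            # bits to unpack, including the unwanted ones
my $bytes     = int( ($bits + 7) / 8 );   # bytes needed to cover those bits
seek($fh, $startbyte, SEEK_SET);
read($fh, my $string, $bytes);
# the count goes after the format letter: "b$bits", not "${bits}b"
my @list = split //, unpack "b$bits", $string;
splice( @list, 0, $discard );             # drop the leading unwanted bits
splice( @list, $m );                      # drop anything past the m wanted bits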
ysth
  • Unfortunately, I need the completely general case. EG, seek to bit 6930084 and read 1173 bits. –  Aug 26 '14 at 17:12

I ended up writing something like what @ikegami and @ysth suggested. For reference:

=item seek_bits($fh, $start, $num) 

Seek to bit (not byte) $start in filehandle $fh, and return the next 
$num bits (as a list). 

=cut 

use POSIX qw(floor ceil);
use Fcntl qw(SEEK_SET);

sub seek_bits {
  my($fh, $start, $num) = @_;
  # the byte where this bit starts and the offset within that byte
  my($fbyte, $offset) = (floor($start/8), $start%8);
  # the number of bytes to read ($offset does affect this)
  my($nbytes) = ceil(($num+$offset)/8);

  seek($fh, $fbyte, SEEK_SET);
  read($fh, my($data), $nbytes);
  # "B*" yields each byte's bits most-significant first
  my(@ret) = split(//, unpack("B*", $data));
  # return exactly the $num requested bits
  return @ret[$offset..$offset+$num-1];
}
  • ah, unpack B; I was wondering if you wanted that or unpack b. – ysth Aug 26 '14 at 22:42
  • 75KB (not counting memory allocation overhead) to store 1173 bits ...not what most people want when they say they want to work with bits. That's why our answers differ. (`perl -MDevel::Size=total_size -E'say total_size([split //, unpack "B*", "0"x(1176/8)])'`) – ikegami Aug 27 '14 at 00:08
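
One way to ease the memory concern raised in the last comment is to return the bits as a single string of '0' and '1' characters instead of a list, and split it only where a list is really needed. This is just a sketch under that assumption: `read_bit_string` is a made-up name, not something from the thread, and it uses the same most-significant-bit-first order as `unpack "B*"`.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical variant (not from the thread): like seek_bits, but returns the
# bits as one string of '0'/'1' characters rather than a list of scalars.
sub read_bit_string {
    my ($fh, $start, $num) = @_;
    my $offset = $start % 8;                        # bits to skip in the first byte
    my $nbytes = int(($offset + $num + 7) / 8);     # integer ceiling of bits/8
    seek($fh, int($start / 8), SEEK_SET) or die "seek failed: $!";
    my $got = read($fh, my $data, $nbytes);
    die "read failed: $!" unless defined $got;
    die "short read" if $got < $nbytes;
    # "B*" lists each byte's bits most-significant first.
    return substr(unpack('B*', $data), $offset, $num);
}

# Demo on an in-memory filehandle (assumed sample data): 10100101 00001111
my $bytes = "\xA5\x0F";
open(my $fh, '<', \$bytes) or die "open failed: $!";
print read_bit_string($fh, 4, 8), "\n";   # prints 01010000
```

A string costs roughly one byte per bit, which is still eight times the raw size but far less than one Perl scalar per bit.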