102

On a Linux desktop (RHEL4) I want to extract a range of bytes (typically less than 1000) from within a large file (>1 Gig). I know the offset into the file and the size of the chunk.

I can write code to do this but is there a command line solution?

Ideally, something like:

magicprogram --offset 102567 --size 253 < input.binary > output.binary
DanM

6 Answers

150

Try dd:

dd skip=102567 count=253 if=input.binary of=output.binary bs=1

The option bs=1 sets the block size, making dd read and write one byte at a time. The default block size is 512 bytes.

The value of bs also affects the behavior of skip and count since the numbers in skip and count are the numbers of blocks that dd will skip and read/write, respectively.
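If you only know the start and end offsets, the count can be computed by the shell. A minimal sketch, assuming a POSIX shell with arithmetic expansion; the function name `extract_range` is made up for illustration, and `end` is treated as exclusive:

```shell
#!/bin/sh
# Hypothetical helper: write bytes [start, end) of a file to stdout.
# start is a zero-based byte offset; end is exclusive.
extract_range() {
    file=$1 start=$2 end=$3
    # With bs=1 one block is one byte, so skip and count are byte values.
    # dd prints its statistics on stderr; discard them.
    dd if="$file" bs=1 skip="$start" count="$((end - start))" 2>/dev/null
}
```

For example, `extract_range input.binary 102567 102820 > output.binary` would copy the same 253 bytes as the command above.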

Matthias Braun
Thomas Padron-McCarthy
  • 3
    Optionally add `status=none` to suppress outputting to stderr. – kenorb Oct 06 '15 at 10:05
  • 24
    Here is an example using hex offsets: `dd if=in.bin bs=1 status=none skip=$((0x88)) count=$((0x80)) of=out.bin`. – kenorb Oct 06 '15 at 10:06
  • @kenorb: I believe the hex syntax is part of Bash, so it doesn't necessarily work with other shells. I myself use tcsh (don't hit me!) and your example doesn't work there. – Thomas Padron-McCarthy Oct 06 '15 at 11:20
  • 2
    Is there a specific reason why you use bs=1 and count=253 and not the other way round? Would the larger block size make the command more efficient? – rexford Jun 13 '17 at 08:14
  • 1
    @rexford: The skip number is also given in blocks, and is not a multiple of 253. And given that the OS does its own buffering when reading from a normal file on a file system, in this case efficiency will not be as bad as when reading from a device. – Thomas Padron-McCarthy Jun 13 '17 at 12:00
  • But I don't want to calculate the `count`; I only know the start and end offsets. – Jiang YD Mar 14 '19 at 04:07
  • Could you explain the parameters? – Lin Jian Jun 12 '20 at 19:42
  • 2
    bs=1 is loads slower than, say, bs=1000 though. I actually saw a factor of 500 in a short test. – Stefan Reich Aug 05 '20 at 20:25
84

This is an old question, but I'd like to add another version of the dd command that is better-suited for large chunks of bytes:

dd if=input.binary of=output.binary skip=$offset count=$bytes iflag=skip_bytes,count_bytes

where $offset and $bytes are numbers in byte units.

The difference from Thomas's accepted answer is that bs=1 does not appear here. bs=1 sets the input and output block size to 1 byte, which makes dd terribly slow when the number of bytes to extract is large.

Instead, we leave the block size (bs) at its default of 512 bytes. With iflag=skip_bytes,count_bytes, we tell dd to treat the values after skip and count as byte counts instead of block counts. Note that these flags are GNU dd extensions.
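Because the byte flags decouple the units of skip and count from the block size, bs can also be raised for throughput. A rough sketch, assuming GNU dd; the function name `extract_bytes` and the 64 KiB block size are arbitrary choices for illustration, not tuned values:

```shell
#!/bin/sh
# Hypothetical helper: write `size` bytes starting at zero-based `offset` to stdout.
# Assumes GNU dd (iflag=skip_bytes,count_bytes is a GNU extension).
extract_bytes() {
    file=$1 offset=$2 size=$3
    # bs only controls how much is read per call here;
    # skip and count stay in bytes because of the iflag values.
    dd if="$file" bs=65536 skip="$offset" count="$size" \
       iflag=skip_bytes,count_bytes 2>/dev/null
}
```

Usage would look like `extract_bytes input.binary 102567 253 > output.binary`.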

Matthias Braun
ChronoTrigger
15

head -c + tail -c

Not sure how it compares to dd in efficiency, but it is fun:

printf "123456789" | tail -c+2 | head -c3

picks 3 bytes, starting at the 2nd one:

234
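Applied to a file with a zero-based offset, the same idea might be wrapped like this. It is only a sketch: the name `extract_pipe` is invented here, and it assumes a `head` that supports `-c` (GNU and BSD versions do):

```shell
#!/bin/sh
# Hypothetical helper: write `size` bytes starting at zero-based `offset` to stdout.
extract_pipe() {
    file=$1 offset=$2 size=$3
    # tail -c +N starts output at byte N counting from 1, hence the + 1.
    tail -c "+$((offset + 1))" "$file" | head -c "$size"
}
```

For the question's numbers that would be `extract_pipe input.binary 102567 253 > output.binary`.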


Matthias Braun
Ciro Santilli OurBigBook.com
  • @elvis.dukaj yes, there should be no difference. Just give it a try with `printf '\x01\x02' > f` and `hd`. – Ciro Santilli OurBigBook.com Jul 23 '19 at 10:40
  • 3
    Much faster than dd with bs=1, thank you! Please note that tail counts bytes from 1, not from 0. Also, tail exits with error code 1 when its output is closed prematurely by head. Make sure to ignore that error when using "set -e". – proski Aug 25 '19 at 19:55
2

Even faster

dd bs=<req len> count=1 skip=<req offset> if=input.binary of=output.binary 
  • 3
    The problem here is that `skip` is in units of `bs`. – Arkku Jul 18 '19 at 12:32
  • That is a detail for whoever runs the command, and it is still better than the above. True, you'd need to recalculate the offset, e.g. `req_offset=$(bc <<< "$offset/$bs")`, and make sure the offset is an exact multiple of the block size. – Tchakabam May 10 '20 at 19:35
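The alignment check discussed in these comments could be sketched as follows. The helper name `extract_aligned` is made up, and the fallback to bs=1 is just one possible choice for the misaligned case:

```shell
#!/bin/sh
# Hypothetical helper: use one big block when the offset is a multiple of the
# requested size, otherwise fall back to byte-sized blocks.
extract_aligned() {
    file=$1 offset=$2 size=$3
    if [ "$size" -gt 0 ] && [ "$((offset % size))" -eq 0 ]; then
        # skip is counted in blocks of bs bytes, so divide the byte offset.
        dd if="$file" bs="$size" skip="$((offset / size))" count=1 2>/dev/null
    else
        # Misaligned: slow but correct.
        dd if="$file" bs=1 skip="$offset" count="$size" 2>/dev/null
    fi
}
```

Keep in mind that dd allocates a buffer of bs bytes, so a very large requested length also means a very large buffer.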
2

I have had the same problem, trying to cut parts of a RAW disk image. dd with bs=1 is unusable, therefore I have made a simple C program for the task.

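// usage:
//  ./cutfile srcfile destfile offset length
//  ./cutfile my.image movie.avi 4524 20412452
// compile, presuming it is saved as cutfile.c:
//  gcc cutfile.c -o cutfile -std=c11 -pedantic -W -Wall -Werror
#include <stdio.h>
#include <stdlib.h>

#define BLOCKSIZE (16 * 512)  // copy buffer size in bytes; can adjust

int main(int argc, char *argv[])
{
  if(argc != 5) {
      fprintf(stderr, "error, need 4 arguments!\n");
      return 1;
  }

  static unsigned char buffer[BLOCKSIZE];

  // note: long is 32 bits on some platforms, which limits offsets to 2 GiB there
  long offset = atol(argv[3]);
  long length = atol(argv[4]);
  if(offset < 0 || length < 0) {
      fprintf(stderr, "error, offset and length must not be negative!\n");
      return 1;
  }

  FILE *f = fopen(argv[1], "rb");
  if(f == NULL) {
      perror("cannot open source file");
      return 1;
  }
  FILE *fout = fopen(argv[2], "wb");
  if(fout == NULL) {
      perror("cannot open destination file");
      fclose(f);
      return 1;
  }
  if(fseek(f, offset, SEEK_SET) != 0) {
      perror("cannot seek to offset");
      fclose(fout);
      fclose(f);
      return 1;
  }

  int status = 0;
  while(length > 0) {
      // copy at most one buffer per iteration
      size_t want = length > BLOCKSIZE ? (size_t)BLOCKSIZE : (size_t)length;
      size_t got = fread(buffer, 1, want, f);
      if(got == 0) {  // EOF before length bytes were copied, or read error
          fprintf(stderr, "error, short read: %ld bytes missing\n", length);
          status = 1;
          break;
      }
      if(fwrite(buffer, 1, got, fout) != got) {
          perror("cannot write");
          status = 1;
          break;
      }
      length -= (long)got;
  }

  if(fclose(fout) != 0) {
      perror("cannot close destination file");
      status = 1;
  }
  fclose(f);
  return status;
}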
karna7
  • Note that in C++ you're kind of expected to use `std::ifstream` and `std::ofstream`... – Alexis Wilke Oct 17 '21 at 00:37
  • yes, but actually it's pure C if I change includes like cstdio to stdio.h . I don't know why I chose to start with C++ headers, maybe I have thought I will use more C++ stuff at first?! – karna7 Nov 29 '21 at 22:50
  • 1
    Yeah, you may want to edit your answer and make it C since the OP asked about C. – Alexis Wilke Nov 30 '21 at 00:18
1

The dd command can do all of this. Look at the seek and/or skip parameters as part of the call.
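To illustrate the difference between the two parameters: skip applies to the input, seek to the output. A small sketch under those semantics; the name `patch_bytes` is hypothetical, and conv=notrunc is added so the destination is not truncated:

```shell
#!/bin/sh
# Hypothetical helper: copy `size` bytes at `offset` from src into the same
# position in dst, leaving the rest of dst untouched.
patch_bytes() {
    src=$1 dst=$2 offset=$3 size=$4
    # skip = blocks to skip in the input, seek = blocks to skip in the output.
    # With bs=1 both are byte offsets.
    dd if="$src" of="$dst" bs=1 skip="$offset" seek="$offset" \
       count="$size" conv=notrunc 2>/dev/null
}
```

For plain extraction, as in the question, only skip is needed.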

Joe
  • But dd can be very slow when you need block-misaligned access, and using bs=1 is super slow. – karna7 Apr 01 '22 at 22:18