How do I extract a single chunk of bytes from within a file?

Question

On a Linux desktop (RHEL4) I want to extract a range of bytes (typically less than 1000) from within a large file (>1 Gig). I know the offset into the file and the size of the chunk.

I can write code to do this but is there a command line solution?

Ideally, something like:

magicprogram --offset 102567 --size 253 < input.binary > output.binary

score 150 · Accepted Answer · edited Oct 14 '21 at 11:02

Try dd:

dd skip=102567 count=253 if=input.binary of=output.binary bs=1

The option bs=1 sets the block size, making dd read and write one byte at a time. The default block size is 512 bytes.

The value of bs also affects the behavior of skip and count since the numbers in skip and count are the numbers of blocks that dd will skip and read/write, respectively.
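
To make the block arithmetic concrete, here is a small sketch. The 20-byte sample file and the `sample` variable are made up for illustration; both commands skip the same 5 bytes, but `count` means something different in each because of `bs`.

```shell
# Hypothetical 20-byte sample file so the offsets are easy to see.
sample=$(mktemp)
printf 'ABCDEFGHIJKLMNOPQRST' > "$sample"

# bs=1: skip and count are in bytes, so this skips 5 bytes and copies 3.
dd if="$sample" bs=1 skip=5 count=3 2>/dev/null      # FGH

# bs=5: skip and count are in 5-byte blocks, so this skips 1 block
# (5 bytes) and copies 2 blocks (10 bytes).
dd if="$sample" bs=5 skip=1 count=2 2>/dev/null      # FGHIJKLMNO
```

Without `of=`, dd writes to standard output, which keeps the demonstration short.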

Matthias Braun

answered Sep 14 '09 at 19:12

Thomas Padron-McCarthy

3

Optionally add `status=none` to suppress outputting to stderr. – kenorb Oct 06 '15 at 10:05
24

Here is an example using hex offsets: `dd if=in.bin bs=1 status=none skip=$((0x88)) count=$((0x80)) of=out.bin`. – kenorb Oct 06 '15 at 10:06
@kenorb: I believe the hex syntax is part of Bash, so it doesn't necessarily work with other shells. I myself use tcsh (don't hit me!) and your example doesn't work there. – Thomas Padron-McCarthy Oct 06 '15 at 11:20
2

Is there a specific reason why you use bs=1 and count=253 and not the other way round? Would the larger block size make the command more efficient? – rexford Jun 13 '17 at 08:14
1

@rexford: The skip number is also given in blocks, and is not a multiple of 253. And given that the OS does its own buffering when reading from a normal file on a file system, in this case efficiency will not be as bad as when reading from a device. – Thomas Padron-McCarthy Jun 13 '17 at 12:00
But I don't want to calculate the `count`; I only know the start and end offsets. – Jiang YD Mar 14 '19 at 04:07
Could you explain the parameters? – Lin Jian Jun 12 '20 at 19:42
2

bs=1 is loads slower than, say, bs=1000 though. I actually saw a factor of 500 in a short test. – Stefan Reich Aug 05 '20 at 20:25
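
For the case raised in the comments where only the start and end offsets are known, the count can be computed by the shell. This is a minimal sketch, assuming a POSIX-style shell (it will not work as-is in tcsh); the function name `extract_range` and the convention that the end offset is exclusive are choices made here for illustration.

```shell
# extract_range FILE START END
# Copies bytes START..END-1 (0-based, END exclusive) of FILE to stdout.
extract_range() {
    file=$1
    start=$2
    end=$3
    count=$((end - start))      # dd wants a length, not an end offset
    dd if="$file" bs=1 skip="$start" count="$count" 2>/dev/null
}
```

For example, `extract_range input.binary 102567 102820 > output.binary` would copy the same 253 bytes as the command in the answer.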

score 84 · Answer 2 · edited Oct 14 '21 at 11:14

This is an old question, but I'd like to add another version of the dd command that is better-suited for large chunks of bytes:

dd if=input.binary of=output.binary skip=$offset count=$bytes iflag=skip_bytes,count_bytes

where $offset and $bytes are numbers in byte units.

The difference from Thomas's accepted answer is that bs=1 does not appear here. bs=1 sets the input and output block size to 1 byte, which makes dd terribly slow when the number of bytes to extract is large.

This means we leave the block size (bs) at its default of 512 bytes. Using iflag=skip_bytes,count_bytes, we tell dd to treat the values after skip and count as byte counts instead of block counts.
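
One way to convince yourself that both forms select the same bytes is to run them side by side and compare the results. The sketch below assumes GNU dd (the iflag values are not available in every dd); the test file, its size, and the offsets are arbitrary values chosen for illustration.

```shell
# Hypothetical 100 KB test file; any file at least offset+bytes long would do.
sample=$(mktemp)
head -c 100000 /dev/urandom > "$sample"

offset=12345
bytes=50000

# Byte-at-a-time extraction, as in the accepted answer.
dd if="$sample" of="$sample.slow" bs=1 skip="$offset" count="$bytes" 2>/dev/null

# Same range with the default block size; needs GNU dd for the iflag values.
dd if="$sample" of="$sample.fast" skip="$offset" count="$bytes" \
   iflag=skip_bytes,count_bytes 2>/dev/null

# cmp exits 0 and prints nothing when the two extracts are identical.
cmp "$sample.slow" "$sample.fast" && echo "same bytes"
```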

Matthias Braun

answered Nov 24 '16 at 18:17

ChronoTrigger

6

This is indeed very much faster than my answer. – Thomas Padron-McCarthy May 09 '18 at 06:38
1

Doesn't work on Mac - `iflag` is an unknown operand and without it you get an entire block. – Timmmm May 14 '19 at 07:37
3

@Timmmm GNU `dd` can be used for `iflag` support (`brew install coreutils`). Note: by default the utilities are installed with a `g` prefix (e.g. `gdd` instead of `dd`) – Shakil May 16 '20 at 06:02

score 15 · Answer 3 · edited Oct 14 '21 at 11:46

head -c + tail -c

Not sure how it compares to dd in efficiency, but it is fun:

printf "123456789" | tail -c+2 | head -c3

picks 3 bytes, starting at the 2nd one:

234

Matthias Braun

answered May 10 '17 at 08:41

Ciro Santilli OurBigBook.com

@elvis.dukaj yes, there should be no difference. Just give it a try with `printf '\x01\x02' > f` and `hd`. – Ciro Santilli OurBigBook.com Jul 23 '19 at 10:40
3

Much faster than dd with bs=1, thank you! Please note that tail counts bytes from 1, not from 0. Also, tail exits with error code 1 when its output is closed prematurely by head. Make sure to ignore that error when using "set -e". – proski Aug 25 '19 at 19:55
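
Putting the two points from this comment together (tail counts from 1, and tail may exit non-zero when head closes the pipe), the pipeline could be wrapped as below. This is only a sketch: the function name `extract_bytes` and the 0-based offset convention are assumptions made here, and `head -c` is available in GNU and BSD userlands but is not guaranteed everywhere.

```shell
# extract_bytes OFFSET SIZE [FILE]
# OFFSET is 0-based; reads standard input when FILE is omitted.
extract_bytes() {
    offset=$1
    size=$2
    shift 2
    # tail -c +N is 1-based, hence the +1. The "|| true" hides tail's
    # non-zero exit status when head stops reading early.
    { tail -c "+$((offset + 1))" "$@" || true; } | head -c "$size"
}
```

With this, `printf "123456789" | extract_bytes 1 3` prints the same `234` as the pipeline in the answer.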

Albert Burbea · Answer 4 · 2019-06-06T10:48:14.013

2

Even faster

dd bs=<req len> count=1 skip=<req offset> if=input.binary of=output.binary

edited Jun 06 '19 at 10:48

answered Jun 06 '19 at 10:45

Albert Burbea

3

The problem here is that `skip` is in units of `bs`. – Arkku Jul 18 '19 at 12:32
It is a detail for whoever runs the command, and still better than the above. True, you'd need to recalculate, e.g. `req_offset=$(bc <<< "$offset/$bs")`, and make sure it comes out as a whole number. – Tchakabam May 10 '20 at 19:35
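
The restriction discussed in these comments can be checked before dd is run. The following is a rough sketch, not part of the answer: the function name `extract_one_block` is hypothetical, and it simply refuses to run when the offset is not a whole multiple of the length, since `skip` could not express that offset in units of `bs`.

```shell
# extract_one_block FILE OFFSET LENGTH
# Reads LENGTH bytes at OFFSET as a single block; only valid when
# OFFSET is a whole multiple of LENGTH, because skip is counted in blocks.
extract_one_block() {
    file=$1
    offset=$2
    length=$3
    if [ "$length" -le 0 ] || [ $((offset % length)) -ne 0 ]; then
        echo "offset $offset is not a multiple of length $length" >&2
        return 1
    fi
    dd if="$file" bs="$length" skip=$((offset / length)) count=1 2>/dev/null
}
```

When the check fails, one of the byte-oriented approaches from the other answers is needed instead.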

karna7 · Answer 5 · 2021-11-30T23:06:41.490

I have had the same problem, trying to cut parts of a RAW disk image. dd with bs=1 is unusable, therefore I have made a simple C program for the task.

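// usage:
//  ./cutfile srcfile destfile offset length
//  ./cutfile my.image movie.avi 4524 20412452
// compile, presuming it is saved as cutfile.c:
//  gcc cutfile.c -o cutfile -std=c11 -pedantic -W -Wall -Werror
#include <stdio.h>
#include <stdlib.h>

enum { BLOCKSIZE = 16 * 512 };  // copy buffer size in bytes; can adjust

int main(int argc, char *argv[])
{
  if (argc != 5) {
      fprintf(stderr, "error, need 4 arguments!\n");
      return 1;
  }

  static unsigned char buffer[BLOCKSIZE];

  long offset = atol(argv[3]);
  long length = atol(argv[4]);
  if (offset < 0 || length < 0) {
      fprintf(stderr, "error, offset and length must be non-negative\n");
      return 1;
  }

  FILE *f = fopen(argv[1], "rb");
  if (f == NULL) {
      perror("cannot open source file");
      return 1;
  }
  FILE *fout = fopen(argv[2], "wb");
  if (fout == NULL) {
      perror("cannot open destination file");
      fclose(f);
      return 1;
  }

  // note: where long is 32 bits, fseek cannot reach offsets of 2 GiB or more
  if (fseek(f, offset, SEEK_SET) != 0) {
      perror("cannot seek");
      return 1;
  }

  // copy block by block; stop early at end of file or on a read error
  while (length > 0) {
      size_t want = sizeof buffer;
      if ((unsigned long)length < want)
          want = (size_t)length;  // last, partial block
      size_t got = fread(buffer, 1, want, f);
      if (got == 0)
          break;
      if (fwrite(buffer, 1, got, fout) != got) {
          perror("write failed");
          return 1;
      }
      length -= (long)got;
  }

  fclose(fout);
  fclose(f);
  return 0;
}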

Note that in C++ you're kind of expected to use `std::ifstream` and `std::ofstream`... — Alexis Wilke, Oct 17 '21 at 00:37
Yes, but actually it's pure C if I change includes like cstdio to stdio.h. I don't know why I chose to start with C++ headers; maybe I thought I would use more C++ stuff at first?! — karna7, Nov 29 '21 at 22:50
Yeah, you may want to edit your answer and make it C since the OP asked about C. — Alexis Wilke, Nov 30 '21 at 00:18

score 1 · Answer 6 · answered Sep 14 '09 at 19:12

The dd command can do all of this. Look at the seek and/or skip parameters as part of the call.

Joe

But dd can be very slow when you want block-misaligned access, and doing bs=1 is super slow. – karna7 Apr 01 '22 at 22:18
