I want to extract part of a binary file, from byte #480161397 to byte #480170447 (inclusive, 9051 bytes in total).

I used cut -b and expected trunk1.gz to be 9051 bytes, but I get a different size.

$ wget https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2016-07/segments/1454701152097.59/warc/CC-MAIN-20160205193912-00264-ip-10-236-182-209.ec2.internal.warc.gz

$ cut -b480161397-480170447 CC-MAIN-20160205193912-00264-ip-10-236-182-209.ec2.internal.warc.gz >trunk1.gz

$ echo $((480170447-480161397+1))
9051

$ ls -l trunk1.gz
-rw-r--r--  1 david  staff     3400324 Sep  8 10:28 trunk1.gz

What is wrong?

David Portabella
  • What do you get if you do a `wc -c trunk1.gz`? – Chem-man17 Sep 08 '16 at 08:48
  • 3400324 trunk1.gz – David Portabella Sep 08 '16 at 08:54
  • This could help http://stackoverflow.com/questions/1423346/how-do-i-extract-a-single-chunk-of-bytes-from-within-a-file – Inian Sep 08 '16 at 08:55
  • That means cut is not doing what you thought it would. I tried `cut -b` on some `.gz` files of my own and also got files larger than the byte range specified. `cut -b` picks the given bytes out of each line of the input, not out of the file as a whole, hence the larger size. So `cut -b` is probably not what you need here. – Chem-man17 Sep 08 '16 at 08:57

2 Answers

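cut -bN-M copies bytes N through M of every line of the input, not of the file as a whole. A binary file contains newline bytes throughout, so cut treats it as many "lines" and emits a slice of each one.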

Example:

$ cut -b4-7 <<END
0123456789
abcdefghij
ABCDEFGHIJ
END

Output:

3456
defg
DEFG

Consider using dd for your purposes.
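
To see the difference on something small, here is a minimal sketch that compares cut with a tail/head pipeline, which treats the input as one byte stream. The file names sample.bin and range.bin and the values of first and last are illustrative, not from the question. tail -c +N is 1-based like cut -b; head -c is assumed to be available (it is in GNU and BSD/macOS userlands, but is not required by POSIX).

```shell
# Small sample file containing newlines, standing in for binary data (33 bytes).
printf '0123456789\nabcdefghij\nABCDEFGHIJ\n' > sample.bin

# 1-based inclusive byte range, numbered the same way cut -b numbers bytes.
first=4
last=17

# cut applies the range to each line separately, so the byte count
# is not last - first + 1.
cut -b"$first"-"$last" sample.bin | wc -c

# tail -c +N starts output at byte N (1-based);
# head -c keeps the next last - first + 1 bytes.
tail -c +"$first" sample.bin | head -c "$((last - first + 1))" > range.bin
wc -c < range.bin
```

The first count covers a slice of every line, while range.bin holds exactly last - first + 1 bytes, newline included.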

Leon

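If you work with binary data, I advise you to use the dd command, reading from the downloaded file and writing the extracted range to trunk1.gz:

dd if=CC-MAIN-20160205193912-00264-ip-10-236-182-209.ec2.internal.warc.gz bs=1 skip=480161397 count=9051 of=trunk1.gz

bs is the block size; it is set to 1 byte so that skip and count are both counted in bytes.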
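
One detail worth checking before running dd on the real file: skip is the number of blocks to skip, so with bs=1 it is a 0-based byte offset, whereas cut -b numbers bytes from 1. A minimal sketch, assuming the range is 1-based and inclusive as with cut; sample.bin, part.bin, first and last are illustrative names, not from the question.

```shell
# 20-byte sample file; byte #1 is '0', byte #4 is '3'.
printf '0123456789abcdefghij' > sample.bin

# 1-based inclusive range, as cut -b4-7 would interpret it.
first=4
last=7

# Skip first - 1 bytes so that output starts at byte #first,
# then copy last - first + 1 bytes. dd's statistics go to stderr.
dd if=sample.bin of=part.bin bs=1 \
   skip="$((first - 1))" count="$((last - first + 1))" 2>/dev/null

cat part.bin
```

If the byte numbers are already 0-based offsets, pass the start offset to skip unchanged instead of subtracting 1.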

oliv