
I used truncate on a file to check how my software behaves on incomplete files, and the result turned out to be totally unexpected. I started to dig and ended up here.

>cp full truncated
>truncate -s 2485129215 truncated
>ls -la
-rw-r--r-- 1 yuki yuki 2485129215 Mar  6 14:30 truncated
-rw-r--r-- 1 yuki yuki 2485129216 Mar  6 14:24 full

I expected this to make truncated identical to full minus the last byte. But it is not: I ran xxd on both files and compared the output with diff, and got gigabytes of differences, which is hard to comprehend.

I wrote a dumb byte comparison program in Python:

import os
f1 = open('./full', 'r')
f2 = open('./truncated', 'r')

s1 = os.stat('./full').st_size
s2 = os.stat('./truncated').st_size

print('size1 {}'.format(s1))
print('size2 {}'.format(s2))

s = min(s1, s2)

print('min={}'.format(s))

i = 0
while i < s:
    b1 = f1.read(1)
    b2 = f2.read(1)
    if b1 != b2:
        print("{} {:02x} != {:02x}".format(i, ord(b1[0]), ord(b2[0])))
        break
    i += 1

returns

size1 2485129216
size2 2485129215
min=2485129215
1288204320 44 != e8

OK, I tried to look at that offset with xxd:

>xxd -o 1288204320 -l 5 ./full
4cc87020: 2a00 0000 00                             *....

That does not look like it... So I ran this:

>xxd -o 1288204319 -l 5 ./full
4cc8701f: 2a00 0000 00                          

Seriously? Why does it look identical to the previous?

What is wrong with the Linux tools, or with the way I am using them? I am on regular Ubuntu 18.04. Can it be that truncate and xxd only accept smallish numbers and overflow otherwise? Any ideas?

Btw, running head -c 2485129215 full > truncated gives the expected result. Hooray! At least something does work.

Nate Eldredge
Yuki
  • To rule out some variables, if you start again from the beginning (`cp full truncated ; truncate -s 2485129215 truncated`) and then do `cmp full truncated`, what do you get? `cmp` is a standard program that is roughly the equivalent of your "dumb byte comparison" program. – Nate Eldredge Mar 06 '20 at 16:48
  • I removed the [tag:terminal] tag as this has nothing to do with terminals per se. – Nate Eldredge Mar 06 '20 at 16:49
  • When I tried your test, starting with a random file `full` of the same size, `cmp full truncated` outputs `cmp: EOF on truncated after byte 2485129215`, and your compare program runs to completion without reporting any differences. So I think either you accidentally modified one of your files during your tests, or there is something unusual or broken about your system. – Nate Eldredge Mar 06 '20 at 17:10
  • Oh, here is a guess. What happens if you change your python program to open the files in binary mode (`open(..., 'rb')`)? If the files really are the same up to the end, they should compare the same in either text or binary mode, but something funny could happen at the very end. On a similar note, what does `echo $LANG` display, and does anything change if you do `export LANG=C` before beginning? – Nate Eldredge Mar 06 '20 at 17:15
  • The `xxd -o ...` result isn't what you expected because `-o ...` does not mean what you think it means. That option does not specify an offset into the file. It specifies a value that is added to the position reported in the output. So both of those `xxd -o ...` commands show the first 5 bytes of the file, but they report those bytes as being at the position you specified to `-o ...` instead of reporting them at position 0. The `xxd` option that skips to a specified offset in the file is `-s ...`. – ottomeister Mar 07 '20 at 07:12
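
The two suggestions from the comments (compare the files in binary mode, roughly as `cmp` does, and read at a real file offset, which is what `xxd -s` does) can be sketched in Python. This is only a minimal sketch, not the asker's program: it assumes Python 3, and the names `first_difference` and `peek`, the 1 MiB chunk size, and the small random test file are all made up for illustration.

```python
import os
import tempfile

CHUNK = 1 << 20  # assumed chunk size: read 1 MiB at a time, not one byte


def first_difference(path_a, path_b, chunk_size=CHUNK):
    """Return the offset of the first differing byte in the common prefix
    of two files, or None if the shorter file is a prefix of the longer."""
    offset = 0
    # 'rb' matters: binary mode returns raw bytes with no decoding.
    with open(path_a, 'rb') as fa, open(path_b, 'rb') as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            n = min(len(a), len(b))
            if n == 0:
                return None  # at least one file is exhausted
            if a[:n] != b[:n]:
                # Locate the exact byte inside this chunk.
                for i in range(n):
                    if a[i] != b[i]:
                        return offset + i
            if len(a) != len(b):
                return None  # one file ended; the common prefix matched
            offset += n


def peek(path, skip, length):
    """Return `length` bytes starting at file offset `skip`.
    This seeks into the file, like `xxd -s`, rather than relabelling
    the displayed position, like `xxd -o`."""
    with open(path, 'rb') as f:
        f.seek(skip)
        return f.read(length)


if __name__ == '__main__':
    # Hypothetical small-scale rerun of the experiment from the question.
    with tempfile.TemporaryDirectory() as d:
        full = os.path.join(d, 'full')
        truncated = os.path.join(d, 'truncated')
        data = os.urandom(4096)
        for p in (full, truncated):
            with open(p, 'wb') as f:
                f.write(data)
        os.truncate(truncated, len(data) - 1)  # drop the last byte
        print(first_difference(full, truncated))   # None
        print(peek(full, 100, 5) == data[100:105])  # True
```

If `first_difference` reports an offset on the real files, `peek(path, offset, 5)` on each of them shows the bytes that actually sit there, which is the check the two `xxd -o` commands were meant to perform.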

0 Answers