
My Python code looks like this:

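import os
import subprocess as sp

def test():
    pipe = sp.Popen(["test.sh"], stdin=sp.PIPE)
    # one byte per value s % 17, for s in range(0, 33)
    data = bytes(s % 17 for s in range(0, 33))
    os.write(pipe.stdin.fileno(), data)
    pipe.stdin.write(b"endoffile")
    pipe.stdin.close()  # flush the buffered write and signal EOF to test.sh
    pipe.wait()

if __name__ == "__main__":
    test()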

It calls the following simple bash script, test.sh, which writes its stdin to a file:

#!/bin/bash
VALUE=$(cat)

echo "$VALUE" >> /tmp/test.txt

When I run the Python code, I expect test.txt to contain the values 0x01..0x10 twice, followed by the string "endoffile".

However, here's a hexdump of the file:

0000000: 0102 0304 0506 0708 090a 0b0c 0d0e 0f10  ................
0000010: 0102 0304 0506 0708 090a 0b0c 0d0e 0f65  ...............e
0000020: 6e64 6f66 6669 6c65 0a                   ndoffile.

It appears that a byte is missing (0x10).

What am I missing here?

--- Update

Changing the test() function to:

def test():
    pipe = sp.Popen(["test.sh"], stdin=sp.PIPE)
    data = bytes(s % 16 + 1 for s in range(0, 32))
    os.write(pipe.stdin.fileno(), data)
    pipe.stdin.write(b"endoffile")
    pipe.stdin.close()
    pipe.wait()

Seems to solve that, so the problem appears to be related to sending chr(0) through the pipe.

oferlivny

1 Answer


range() is right-side exclusive.

range(0, 33) is [0, ..., 32], probably because this way you can write range(0, len(sequence)) without off-by-one errors.

Since 32 % 17 == 15 == 0x0f, the second '\x10' you are expecting was never part of the list in the first place.
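The arithmetic can be checked against the hexdump with a short Python 3 sketch. It rebuilds the bytes from the question's expression and then assumes, as discussed in the edits, that the shell drops every NUL byte and that echo appends a newline; under those assumptions the result matches the dumped file byte for byte.

```python
# Rebuild the byte string from the question: one byte per s % 17, s = 0..32.
data = bytes(s % 17 for s in range(0, 33))

print(len(data))      # 33, because range(0, 33) stops at 32
print(hex(data[16]))  # 0x10, from s = 16, the only s that yields 0x10
print(hex(data[-1]))  # 0xf, because the last value is 32 % 17 == 15
print(data.count(0))  # 2, from s = 0 and s = 17

# File contents copied from the hexdump in the question.
observed = bytes.fromhex(
    "0102030405060708090a0b0c0d0e0f10"
    "0102030405060708090a0b0c0d0e0f65"
    "6e646f6666696c650a"
)

# Assumption: the shell drops NUL bytes, then echo adds a trailing newline
# after "endoffile". Under that assumption the hexdump is reproduced exactly.
reconstructed = data.replace(b"\x00", b"") + b"endoffile\n"
print(reconstructed == observed)  # True
```

So both effects are visible in the dump: the second 0x10 was never generated, and the two 0x00 bytes were generated but lost on the shell side.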

Edit 1: The null characters '\x00' are also missing from the output. With VALUE=$(cat), the output of cat is subject to processing by the shell.

The Single UNIX Specification/POSIX seems to be silent on the matter. It is clear, however, that you cannot have '\0' as part of a shell variable's value (or name, for that matter), since the Unix environment requires both to be C-style, zero-terminated strings. I would actually have expected VALUE to be an empty string, since the very first byte sent is '\0'.

Edit 2: After some digging, I can say that at least the ash implementation ignores '\0' when processing backtick-supplied (command substitution) input. Input is read until EOF, and null characters are explicitly skipped.

bash does the same, and its source even contains an explicit (albeit commented-out) warning for this case.
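To watch the NUL bytes disappear, one can feed the question's bytes through a shell doing what test.sh does. This is a Python 3 sketch that assumes a POSIX `sh` on PATH (dash or bash, both of which skip NUL bytes in command substitution). The script string is a simplified stand-in for test.sh: it prints the variable with printf '%s' instead of appending it to /tmp/test.txt with echo, so no newline is added.

```python
import subprocess

# Simplified stand-in for test.sh: capture stdin via command substitution,
# then write the variable back out unchanged (no trailing newline).
SCRIPT = 'VALUE=$(cat); printf "%s" "$VALUE"'

# Same 33 bytes as in the question: s % 17 for s in range(0, 33).
data = bytes(s % 17 for s in range(0, 33))

result = subprocess.run(
    ["sh", "-c", SCRIPT],
    input=data,
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,  # recent bash versions warn about ignored null bytes
    check=True,
).stdout

print(len(data), len(result))
print(result == data.replace(b"\x00", b""))
```

With bash or dash as sh, result should be the input minus its two NUL bytes (31 of the 33 bytes). Other shells are not guaranteed to behave the same way, so this is not something to rely on.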

dhke
  • @dhke You are correct; however, the off-by-one error goes the other way: an entry is missing. I suspect this is due to sending chr(0) through the pipe, where it gets discarded. Any idea why? – oferlivny Sep 21 '15 at 16:36
  • @sferic I did not even realize that the `\0` was also missing, thanks for reminding me. See edit. – dhke Sep 21 '15 at 16:45
  • In the contrived example, a solution would be for test.sh to contain the single line of code: `cat >> /tmp/test.txt`. – Robᵩ Sep 21 '15 at 16:54
  • The problem is not that the output of `cat` is subject to word-splitting (it isn't), but that the null byte terminates the string prematurely. Shell simply isn't designed to handle arbitrary binary data. – chepner Sep 21 '15 at 17:52
  • @chepner Ah right, `=` doesn't do word splitting. This actually makes me more sure that this is simply an undocumented side effect that you cannot rely on in any way. – dhke Sep 21 '15 at 19:15
  • I can't find an exact, definitive statement, but I'm pretty sure the POSIX specification requires the null byte to be treated as a string terminator in a parameter value. `cat` can produce a string containing a null byte, but the shell is incapable of storing it in a string. Compare with something like `printf 'abc\0def' | { read -d ''; echo $REPLY; read -d ''; echo $REPLY; }`, which never subjects the null byte to shell expansion or storage; it is only read via standard input and discarded as a separator. – chepner Sep 21 '15 at 19:34
  • That's why I would expect $VALUE to be `''`, but it isn't. It rather looks like the string is read until EOF and `strcpy()`ied into the variable value. Since `strcpy(target, "\0")` does exactly nothing, the null seems to simply fall off a cliff. I still think this is a side effect of the read-till-EOF. However, that could also explain random cut-offs due to buffer size, unless the shell always reads char-by-char. – dhke Sep 21 '15 at 19:46
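Robᵩ's suggestion avoids the shell variable entirely, so the bytes never pass through command substitution. A minimal Python 3 sketch of that fix, assuming a POSIX `sh` and `cat` on PATH; the temporary file stands in for /tmp/test.txt, and passing its path as $1 is a change made only for this illustration.

```python
import os
import subprocess
import tempfile

# Stand-in for a test.sh reduced to Robᵩ's one-liner; the target file is
# passed as $1 here instead of being hard-coded as /tmp/test.txt.
SCRIPT = 'cat >> "$1"'

# The question's bytes followed by the "endoffile" marker.
data = bytes(s % 17 for s in range(0, 33)) + b"endoffile"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    # "sh" fills $0, so path becomes $1 inside the script.
    subprocess.run(["sh", "-c", SCRIPT, "sh", path], input=data, check=True)
    with open(path, "rb") as f:
        written = f.read()

print(written == data)
```

Because cat copies its input byte for byte and never stores it in a shell string, both NUL bytes survive, and no trailing newline is added since there is no echo.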