
I have a file like this:

A\r
B\n
C\r\n.

(By \r I'm referring to CR, and \n is LF)
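To reproduce this, a file like the one above can be created with a small sketch such as the following. The helper name write_sample is only illustrative (it is not part of my original script); the point is that writing in binary mode keeps the CR and LF bytes exactly as typed.

```python
# Sample content from the question: "A" ends in CR, "B" in LF, "C" in CRLF,
# followed by a final "." with no line terminator.
SAMPLE = b"A\rB\nC\r\n."


def write_sample(path="myfile.txt"):
    """Write SAMPLE to `path` in binary mode so no newline translation happens."""
    with open(path, "wb") as f:
        f.write(SAMPLE)
```

Calling write_sample() once produces myfile.txt in the current directory, which can then be passed to the script or piped into it.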

And this script:

import fileinput
for line in fileinput.input(mode='rU'):
    print(line)

When I call python script.py myfile.txt I get the correct output:

A
B
C

But when I call it like this: type myfile.txt|python script.py, I get this:

B
C

You see? No more "A".

What is happening? I thought the mode='rU' would take care of every newline problem...

EDIT: In Python 3 there is no such problem! Only in Python 2. But that does not solve the problem.

Thanks

EDIT:

Just for the sake of completeness: it also happens on Linux.

  • Python 3 handles every newline type (\n, \r or \r\n) transparently to the user. It doesn't matter which one your file uses; you don't have to worry about it.
  • Python 2 needs the parameter mode='rU' passed to fileinput.input to handle every newline type transparently. The thing is, in Python 2 this does not work correctly when content is piped to the script. I tried piping a file like this:

    CR: \r
    LF: \n
    CRLF: \r\n
    

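Python 2 treats the first two lines as a single line, and if you try to print every line with this code:

from __future__ import print_function  # print(..., end='') needs this on Python 2
import fileinput
for i, line in enumerate(fileinput.input(mode='rU')):
    print("Line {}: {}".format(i, line), end='')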

It outputs this:

Line 0: CR:
LF:
Line 1: CRLF:

This doesn't happen in Python 3, where these are two separate lines. When the same text is passed as a file argument instead of being piped, Python 2 also works fine.

Piping data like this:

LF: \n    
CR: \r
CRLF: \r\n

Gives me a similar result:

Line 0: LF: 
Line 1: CR:
CRLF:
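Since \r is invisible on a terminal (and can even overwrite earlier text), the outputs above do not show which characters each "line" really contains. A minimal sketch for making them visible is to print the repr of every line; the function name describe is just an illustrative choice.

```python
def describe(lines):
    """Return one 'Line N: <repr>' string per item so \\r and \\n stay visible."""
    return ["Line {}: {!r}".format(i, line) for i, line in enumerate(lines)]
```

In the script this could be used as `for text in describe(fileinput.input(mode='rU')): print(text)`, so each line is shown with its terminator spelled out instead of being interpreted by the terminal.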

My conclusion is the following:

For some reason, when data is piped in, Python 2 seems to look for the first newline symbol it encounters and from then on treats only that specific character as a newline. In the first example, Python 2 encounters \r as the first newline character, and all the others (\n or \r\n) are treated as ordinary characters.
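A possible workaround, sketched here under the assumption that the whole input fits in memory, is to skip fileinput for piped data, read standard input as bytes, and split on all three terminators manually. The names split_lines and read_stdin_lines are hypothetical helpers, not anything from fileinput.

```python
import sys


def split_lines(data):
    """Split a byte string on \\r\\n, \\r or \\n and drop the terminators."""
    return data.splitlines()


def read_stdin_lines():
    """Read all of standard input and return its lines as byte strings."""
    # Python 3 exposes the underlying byte stream as sys.stdin.buffer;
    # on Python 2, sys.stdin itself already returns byte strings.
    stream = getattr(sys.stdin, "buffer", sys.stdin)
    return split_lines(stream.read())
```

With the sample data, split_lines(b"A\rB\nC\r\n.") gives the four pieces A, B, C and the final dot, whichever terminator comes first. Note that the terminators are removed, unlike the lines yielded by fileinput.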

Manuel
  • I think it has something to do with the `type` command because simply executing `type myfile.txt` produces: `B C .` On the other hand you said it worked on Python3 so that's really puzzling. – asherbret Jan 05 '17 at 10:32
  • I thought exactly that, and then the fact that it works in Python 3 turned everything upside down... now I don't know what to think. – Manuel Jan 05 '17 at 10:36
  • I couldn't find an answer; I just ended up deleting every CR and LF with re.sub. – Manuel Jan 05 '17 at 19:22
  • The last `C` is not deleted; `\r` moves the cursor to the start of the line, which hides `C`. Try `... | python script.py | od -c` to see what is going on. – JJoao Mar 29 '19 at 09:10
