I was calling dos2unix from within Python this way:

call("dos2unix " + file1, shell=True, stdout=PIPE)

However, to silence the command's output, I did this:

f_null = open(os.devnull, 'w')
call("dos2unix " + file1, shell=True, stdout=f_null, stderr=subprocess.STDOUT)

This doesn't seem to work. The command no longer appears to run: when I diff file1 against file2 (diff -y file1 file2 | cat -t), I can see that the line endings haven't changed.

file2 is the file I am comparing file1 against. It has Unix line endings as it is generated on the box. However, there is a chance that file1 doesn't.

methuselah

1 Answer

Not sure why, but I would try to get rid of the "noise" around your command and check the return code:

check_call(["dos2unix", file1], stdout=f_null, stderr=subprocess.STDOUT)
  • pass the command as a list of arguments, not a command-line string (this supports file names containing spaces!)
  • remove shell=True, as dos2unix isn't a shell built-in
  • use check_call so it raises an exception instead of failing silently

At any rate, it is possible that dos2unix detects that its output is no longer a tty and decides to write the converted content there instead of modifying the file in place (dos2unix can read from standard input and write to standard output). I'd go with that explanation. You could check it by redirecting to a real file instead of os.devnull and seeing whether the converted content ends up there.
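The check suggested above can be sketched as follows. This is only a minimal illustration: run_and_capture and the log file name are hypothetical names, not part of the question's code, and the sketch merely shows how to send the output to a real file and read it back.

```python
import subprocess


def run_and_capture(cmd, log_path):
    """Run cmd (a list of arguments), sending stdout and stderr to log_path.

    Returns the bytes the command wrote, so they can be inspected.
    (Hypothetical helper, named here for illustration only.)
    """
    with open(log_path, "wb") as log:
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(cmd, stdout=log, stderr=subprocess.STDOUT)
    with open(log_path, "rb") as log:
        return log.read()
```

For example, output = run_and_capture(["dos2unix", file1], "dos2unix.log") would leave whatever dos2unix printed in dos2unix.log. If the converted file contents show up there, the hypothesis above would be confirmed.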

But I would do a pure Python solution instead (with a backup for safety), which is portable and doesn't need the dos2unix command (so it works on Windows as well):

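import os

# Read the whole file, convert CRLF to LF, and write the result to a backup file.
with open(file1, "rb") as f:
    contents = f.read().replace(b"\r\n", b"\n")
with open(file1 + ".bak", "wb") as f:
    f.write(contents)
# Swap the converted copy in for the original.
os.remove(file1)
os.rename(file1 + ".bak", file1)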

Reading the whole file at once is fast, but could choke on really big files. A line-by-line solution is also possible (still in binary mode):

import os

# Convert line by line so the whole file is never held in memory.
with open(file1, "rb") as fr, open(file1 + ".bak", "wb") as fw:
    for line in fr:
        fw.write(line.replace(b"\r\n", b"\n"))
os.remove(file1)
os.rename(file1 + ".bak", file1)
Jean-François Fabre
  • Your sample Python code will not work with big files. You might use `iter` to read in chunks: `for chunk in iter(f.read, b''): out.write(chunk.replace(b'\r\n', b'\n'))`. Obviously this is simplified, because a chunk might end right in the middle of a Windows newline, and you'd have to check for that. Another alternative is to open the file in text mode with universal newline support and then write it out in binary using the correct newline character, which should be easier. – Giacomo Alzetta Apr 09 '18 at 09:25
  • you're right, if the file is big, it doesn't work. I could also use `readline()` in binary mode. – Jean-François Fabre Apr 09 '18 at 09:26
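The chunked approach from the comments could be sketched roughly like this, including the check for a Windows newline split across two chunks. The function name crlf_to_lf_chunked, the ".tmp" suffix, and the default chunk size are assumptions made for illustration. Note that `f.read` called without a size reads the entire file in one go, so this sketch passes an explicit size.

```python
import os


def crlf_to_lf_chunked(path, chunk_size=64 * 1024):
    """Rewrite path in place, turning CRLF into LF, one fixed-size chunk at a time.

    (Hypothetical helper; the name and the temporary-file suffix are illustrative.)
    """
    tmp_path = path + ".tmp"
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        carry = b""  # a trailing CR held back from the previous chunk
        for chunk in iter(lambda: src.read(chunk_size), b""):
            chunk = carry + chunk
            # Hold back a trailing CR: its LF may be the first byte of the next chunk.
            if chunk.endswith(b"\r"):
                chunk, carry = chunk[:-1], b"\r"
            else:
                carry = b""
            dst.write(chunk.replace(b"\r\n", b"\n"))
        # A lone CR at the very end of the file is written back unchanged.
        dst.write(carry)
    # os.replace overwrites the destination in one step (Python 3.3+).
    os.replace(tmp_path, path)
```

Calling crlf_to_lf_chunked(file1) would then convert file1 without ever holding more than about one chunk in memory.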