14

I have a bunch of files. Some have Unix line endings, many have DOS line endings. I'd like to test each file to see if it is DOS formatted before I switch the line endings.

How would I do this? Is there a flag I can test for? Something similar?

Eric O. Lebigot
chiggsy
  • Same question as http://stackoverflow.com/questions/121392/how-to-determine-the-line-ending-of-a-file (except this one's tagged 'python' :-) – Jonik May 09 '10 at 18:32

7 Answers

33

Python can automatically detect what newline convention is used in a file, thanks to the "universal newline mode" (U), and you can access Python's guess through the newlines attribute of file objects:

f = open('myfile.txt', 'U')
f.readline()  # Reads a line
# The following now contains the newline ending of the first line:
# It can be "\r\n" (Windows), "\n" (Unix), "\r" (Mac OS pre-OS X).
# If no newline is found, it contains None.
print repr(f.newlines)

This gives the newline ending of the first line (Unix, DOS, etc.), if any.

As John M. pointed out, if you happen to have a pathological file that mixes several newline conventions, f.newlines becomes a tuple of all the newline conventions found so far as more lines are read.
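The 'U' flag is deprecated in Python 3, but universal newlines mode is the default for text files there, so the same attribute can be inspected. The following is only a minimal sketch: detect_newlines is a hypothetical helper name, and the UTF-8 encoding with errors="replace" is an assumption about the files being tested.

```python
def detect_newlines(path):
    # newline=None (the default) turns on universal newlines mode.
    # Assumption: the file is UTF-8 compatible; undecodable bytes are replaced.
    with open(path, newline=None, encoding="utf-8", errors="replace") as f:
        for _ in f:
            pass  # Read every line so that every line ending is seen
        # None, a single string, or a tuple of strings for mixed files
        return f.newlines


if __name__ == "__main__":
    print(repr(detect_newlines("myfile.txt")))
```

Reading the whole file, rather than only the first line, is what lets the tuple case show up for files that mix conventions.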

Reference: http://docs.python.org/2/library/functions.html#open

If you just want to convert a file, you can simply do:

with open('myfile.txt', 'U') as infile:
    text = infile.read()  # Automatic ("Universal read") conversion of newlines to "\n"
with open('myfile.txt', 'w') as outfile:
    outfile.write(text)  # Writes newlines for the platform running the program
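In Python 3, a similar conversion can be sketched with the newline parameter of open, which also lets you force Unix endings regardless of the platform. This is an illustration under assumptions: convert_to_unix is a hypothetical name, the files are assumed to be UTF-8, and the file is overwritten in place, so try it on a copy first.

```python
def convert_to_unix(path, encoding="utf-8"):
    # Universal newlines mode: "\r\n" and "\r" are both read as "\n"
    with open(path, newline=None, encoding=encoding) as infile:
        text = infile.read()
    # newline="\n" writes "\n" unchanged, even on Windows
    with open(path, "w", newline="\n", encoding=encoding) as outfile:
        outfile.write(text)
```

Passing newline="\r\n" in the second open call would instead produce DOS line endings.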
Eric O. Lebigot
  • 2
    -1 It's called `newlines` (plural) and it's not an encoding. What you have shown is how to find what (if anything) terminates the first line (if any). Your comment is incorrect: it doesn't include the case where the first line and only line is not terminated (and so `newlines` refers to `None`). Further, it assumes that all lines are terminated the same way. Concatenations of files of different line endings are not unknown. In the OP's application of standardising on one line ending, he will need to read ALL the input file (and ALL the docs, especially where it mentions `tuple`). – John Machin May 10 '10 at 12:18
  • 5
    @John: Come on: -1 for an answer that mentions the useful `newlines`, but only with a typo? Or for pathological files concatenated from files with different newline conventions? The original poster mentioned "files from Unix or DOS", not such strange files! – Eric O. Lebigot May 10 '10 at 15:51
  • @John: Your information about f.newlines returning a tuple in the case of a mixed newline convention is interesting. I added it to the response. – Eric O. Lebigot May 10 '10 at 15:58
  • I upvoted it. It was a useful answer to me. @John makes a very good point though, concerning corner cases. – chiggsy May 10 '10 at 20:33
  • Thank you! I did cite John's corner case in the answer, because I also found it interesting. :) – Eric O. Lebigot May 11 '10 at 07:02
  • The file objects' `newlines` attribute comes from [io.TextIOBase](https://docs.python.org/3/library/io.html#io.TextIOBase.newlines) (Python 3): A string, a tuple of strings, or None, indicating the newlines translated so far. Depending on the implementation and the initial constructor flags, this may not be available. – handle Jul 26 '16 at 09:30
  • Interestingly, the official reference given in the answer indicates that `newlines` is always available, though… – Eric O. Lebigot Aug 23 '18 at 13:27
  • 1
    `DeprecationWarning: 'U' mode is deprecated`. I wonder which version started that? – Mark Ransom Sep 08 '22 at 22:09
9

You could search the file's contents for \r\n. That is the DOS-style line ending.

EDIT: Take a look at this

nc3b
3

(Python 2 only:) If you just want to read text files, either DOS or Unix-formatted, this works:

print open('myfile.txt', 'U').read()

That is, Python's "universal" file reader will automatically recognize all the different end-of-line markers, translating them to "\n".

http://docs.python.org/library/functions.html#open

(Thanks handle!)

johntellsall
  • 1
    Well, I'll want to edit them in vim. I'd like to make that line ending change once and commit it, vs per file. – chiggsy May 09 '10 at 22:20
  • 2
    This will destructively change DOS CRLF to Unix LF on all files in the current directory: `perl -p0i -e 's/\r\n/\n/g' *` (I've typed this so many times my fingers have memorized it :) – johntellsall May 10 '10 at 21:53
  • @chiggsy install the dos2unix package, and run the dos2unix command on the files rather. – nos Apr 07 '14 at 12:59
  • 2
    The `U` mode is obsolete in Python 3. – handle Jul 26 '16 at 09:36
2

As a complete Python newbie & just for fun, I tried to find some minimalistic way of checking this for one file. This seems to work:

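with open("/path/file.txt", "rb") as f:  # Binary mode: no newline translation
    if b"\r\n" in f.read():  # Bytes literal, so this runs on Python 2.6+ and 3
        print("DOS line endings found")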

Edit: simplified as per John Machin's comment (no need to use regular expressions).
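If you also want to know whether a file mixes conventions, the same binary-mode idea can be extended to count each kind of ending. This is only a sketch: count_line_endings is a hypothetical name, and it reads the whole file into memory, which is assumed to be acceptable here.

```python
def count_line_endings(path):
    with open(path, "rb") as f:  # Binary mode: bytes are read untranslated
        data = f.read()
    crlf = data.count(b"\r\n")
    # Every CRLF contains one CR and one LF, so subtract them
    # to get the number of lone CRs and lone LFs.
    return {
        "\r\n": crlf,
        "\r": data.count(b"\r") - crlf,
        "\n": data.count(b"\n") - crlf,
    }
```

A file is purely DOS formatted when only the "\r\n" count is non-zero.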

Jonik
  • Shouldn't you open the file with "rb"? – President James K. Polk May 09 '10 at 19:17
  • Hmm, my first thought was no, because we're dealing with *text* files... But are you referring to this: "The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading." (http://docs.python.org/library/functions.html#open)? I wasn't aware of such conversions – maybe "rb" should indeed be used for this to work on non-Unix systems too. – Jonik May 09 '10 at 20:15
  • 2
    `re.search()` is not minimalist; it's OVERKILL; use `"\r\n" in open(...).read()`. There's no "maybe" about using `"rb"`; it's a must. – John Machin May 09 '10 at 22:20
1

You can use the following function (which should work in Python 2 and Python 3) to get the newline representation used in an existing text file. All three possible kinds ("\r\n", "\r" and "\n") are recognized. The function reads the file only up to the first newline to decide. This is faster and uses less memory when you have larger text files, but it does not detect mixed newline endings.

In Python 3, you can then pass the output of this function to the newline parameter of the open function when writing the file. This way you can alter the content of a text file without changing its newline representation.

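def get_newline(filename):
    # Binary mode, so that Python performs no newline translation
    with open(filename, "rb") as f:
        while True:
            c = f.read(1)
            # End of file or a bare LF: fall through to the Unix default
            if not c or c == b'\n':
                break
            if c == b'\r':
                # CR followed by LF is DOS; a lone CR is old Mac OS
                if f.read(1) == b'\n':
                    return '\r\n'
                return '\r'
    # Also returned when the file contains no newline at all
    return '\n'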
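The Python 3 usage described above might look roughly like the following sketch. get_newline is repeated so that the example runs on its own; append_line is a hypothetical helper, and UTF-8 is an assumed encoding.

```python
def get_newline(filename):
    # Same logic as the function above, repeated for self-containment
    with open(filename, "rb") as f:
        while True:
            c = f.read(1)
            if not c or c == b'\n':
                break
            if c == b'\r':
                if f.read(1) == b'\n':
                    return '\r\n'
                return '\r'
    return '\n'


def append_line(filename, text, encoding="utf-8"):
    # Hypothetical helper: append one line using the file's own convention
    newline = get_newline(filename)
    with open(filename, "a", newline=newline, encoding=encoding) as f:
        # On writing, each "\n" is translated to the `newline` string
        f.write(text + "\n")
```

Because the detected string is handed straight to open, the appended line ends the same way as the first line of the file.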
Cito
0

DOS line breaks are \r\n; Unix uses only \n. So just search for \r\n.

Femaref
0

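Using grep and bash. The first command stops at the first line ending in a carriage return and prints 1 if there is one, 0 otherwise: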

grep -c -m 1 $'\r$' file

echo $'\r\n\r\n' | grep -c $'\r$'     # test

echo $'\r\n\r\n' | grep -c -m 1 $'\r$'  
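For readers who would rather stay in Python, a rough equivalent of the grep check is sketched below. has_dos_line is a hypothetical name. Like grep -m 1, it stops at the first matching line instead of reading the whole file.

```python
def has_dos_line(path):
    with open(path, "rb") as f:
        for line in f:  # In binary mode, lines are split on b"\n" only
            # Drop the LF (if any), then test for a trailing CR
            if line.rstrip(b"\n").endswith(b"\r"):
                return True
    return False
```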
shallo