readlines() python 2.7 vs 3.10

Question

I wrote a script in Python 2.7 but want to switch to Python 3.10. The only problem is that, for some reason, readlines() isn't producing the same results, which is causing problems with my list comprehension. Below are the two versions and their results:

Python 2.7

file_to_open = open('file.csv', 'r') 
f = file_to_open.readlines()
print(len(f))

The result is 2001

Python 3.10

file_to_open = open('file.csv', 'r') 
f = file_to_open.readlines()
print(len(f))

The result is 10401

The CSV file does have 2001 rows, so that is the correct number. There must be some characters that create new lines or otherwise confuse the Python 3 version. Has anyone encountered this before?

This could be a universal newlines thing. Try opening the file in `'rU'` mode in Python 2 and see if you get 10401. If you do, the difference is caused by universal newlines. It'd be a very weird CSV file if that's the case, though. — user2357112, Jan 21 '22 at 23:18
`len` does not give enough information here. What does `f[:10]` give? That's a better way to probe when these two results are different. — Kraigolas, Jan 22 '22 at 01:10
Unfortunately I can't post a file because it has to do with work. @user2357112supportsMonica it does have to do with \n characters found in the csv. Looking through all the fields in the csv there are several fields that have \n characters in there. You were right. But how do I get python 3 to ignore those and only do the end of the line? — Bob_Loblaw2342, Jan 24 '22 at 20:33
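
One way to follow up on the comments above without posting the file is to count the kinds of line endings in its raw bytes. This is only a rough sketch: `count_line_endings` and the `sample` bytes are made up for illustration and are not taken from the original file.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, bare CRs, and bare LFs in raw file bytes."""
    crlf = data.count(b'\r\n')
    return {
        'crlf': crlf,
        # Every CRLF pair contains one CR and one LF, so subtract the pairs.
        'bare_cr': data.count(b'\r') - crlf,
        'bare_lf': data.count(b'\n') - crlf,
    }


# Hypothetical sample standing in for the real file:
# two CRLF-terminated rows, with one stray CR inside a quoted field.
sample = b'id,note\r\n1,"first\rsecond"\r\n'
print(count_line_endings(sample))
```

To check a real file, read it in binary mode first, for example `count_line_endings(open('file.csv', 'rb').read())`. A nonzero `bare_cr` count alongside regular row endings would be consistent with the universal newlines explanation.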

score 0 · Answer 1 · answered Jan 24 '22 at 22:47

It has to do with universal newlines and how Python 2 and 3 handle them. Python 3 opens text files with universal newlines enabled by default, so a bare '\r' ends a line just as '\n' or '\r\n' does, whereas Python 2's plain 'r' mode does not split on a bare '\r'. In the CSV file there were extra '\r' characters within the fields. So I had to use the 'b' option when opening the file to bypass universal newlines. But then each line was read as bytes, so I had to convert each line back to a str and then use re.sub to remove the '\r' characters. Below is the list comprehension that ended up working.

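import re

# Binary mode splits only on '\n', so a stray '\r' inside a field no longer starts a new line.
# Assumption: the file is UTF-8 encoded; pass a different encoding to str() if yours differs.
with open('file.csv', 'rb') as file_to_open:
    f = [re.sub('\r', '', str(line, 'utf-8')) for line in file_to_open]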
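
As a possible alternative to reading bytes, the `newline` argument of Python 3's built-in `open()` can restrict line splitting in text mode, and the `csv` module can parse quoted fields that contain line breaks. The sketch below is not the approach from the answer above. The helper names `read_lines_lf_only` and `read_rows` are hypothetical, UTF-8 is an assumed encoding, and `read_rows` only keeps a stray '\r' inside its field if that field is quoted in the file.

```python
import csv


def read_lines_lf_only(path, encoding='utf-8'):
    """Return lines split only on LF; a bare CR stays inside its line."""
    # newline='\n' means only '\n' terminates a line when reading,
    # and line endings are returned untranslated.
    with open(path, 'r', newline='\n', encoding=encoding) as fh:
        return fh.readlines()


def read_rows(path, encoding='utf-8'):
    """Parse the file with csv.reader and return a list of rows."""
    # newline='' is what the csv docs recommend: the reader sees the raw
    # line endings and can keep line breaks that sit inside quoted fields.
    # Assumption: fields containing CR or LF are quoted in the file.
    with open(path, 'r', newline='', encoding=encoding) as fh:
        return list(csv.reader(fh))
```

With either helper, `len(read_lines_lf_only('file.csv'))` or `len(read_rows('file.csv'))` should match the number of rows rather than the number of '\r' and '\n' characters, provided the assumptions above hold for the file.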
