
I am trying to parse gzip files line by line:

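import csv
import gzip
from io import StringIO

# obj is presumably a boto3 S3 Object; obj.get()['Body'] is its readable body
with gzip.open(obj.get()['Body']) as f:
    for line in f:
        line = StringIO(line.decode("utf-8"))
        line = csv.reader(line, delimiter=',')

        for line1 in line:
            pass  # some logic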

But for some of the files I get this error:

new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
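The error can be reproduced without S3 or gzip. This is a minimal sketch that assumes the offending '\r' sits in an unquoted field, which is a guess because no sample file was posted, and the sample bytes are made up. Each line is wrapped in a StringIO whose default newline='\n' does not treat '\r' as a line break, so csv.reader takes the '\r' as the end of the record, then finds more characters after it and raises csv.Error. The same value inside double quotes parses fine. parse_one_line is a hypothetical helper that mimics the question's loop body.

```python
import csv
import io


def parse_one_line(raw: bytes) -> list:
    """Parse a single line the way the question's loop does."""
    # Default newline="\n": only "\n" separates lines, "\r" is passed through.
    text = io.StringIO(raw.decode("utf-8"))
    return list(csv.reader(text, delimiter=","))


# Made-up record: the second field contains a bare "\r" and is not quoted.
error_message = None
try:
    parse_one_line(b"1,ab\rcd,3\n")
except csv.Error as exc:
    error_message = str(exc)
print(error_message)

# The same field wrapped in double quotes is accepted and keeps its "\r".
print(parse_one_line(b'1,"ab\rcd",3\n'))  # [['1', 'ab\rcd', '3']]
```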

When I try to open it in universal-newline mode:

csv.reader(open(line, 'rU'), delimiter=',')

I get:

expected str, bytes or os.PathLike object, not _io.StringIO

I want every field that contains '\r' to keep it as part of the field's string value. How can this be resolved?
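A sketch of the usual fix, following the csv module documentation's advice to open files with newline='': decompress in text mode and hand the whole stream to a single csv.reader instead of creating one reader per line. The reader, not the line loop, then decides where a record ends, so '\r' or '\n' inside a quoted field stays in the value and a quoted field may span several physical lines. open() only accepts a path, which is why passing it a StringIO raised the TypeError; the newline setting belongs on the text stream wrapped around the gzip data instead. Assumptions: the fields that contain '\r' are enclosed in double quotes, the data is UTF-8, iter_csv_rows is a hypothetical helper name, and an in-memory sample stands in for obj.get()['Body'].

```python
import csv
import gzip
import io


def iter_csv_rows(fileobj, encoding="utf-8"):
    """Yield parsed rows from a readable binary file object holding gzip data."""
    # Text mode with newline="" as the csv docs recommend: line endings are not
    # translated, so "\r" and "\n" inside quoted fields reach csv unchanged.
    with gzip.open(fileobj, mode="rt", encoding=encoding, newline="") as f:
        # One reader for the whole stream, so a quoted field may span lines.
        yield from csv.reader(f, delimiter=",")


# Made-up sample compressed in memory instead of downloaded from S3.
sample = b'id,note\r\n1,"line one\rline two"\r\n2,"a, b"\r\n'
rows = list(iter_csv_rows(io.BytesIO(gzip.compress(sample))))
print(rows)
```

In the question's code this would become `for row in iter_csv_rows(obj.get()['Body']):` followed by the per-row logic.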

Liz Hi

2 Answers


Something like this, which avoids csv.reader and StringIO altogether:

import gzip

with gzip.open(obj.get()['Body']) as f:
    for line in f:
        line = line.strip()
        line = line.decode("utf-8").split(',')

        for line1 in line:
            pass  # some logic
Alex
  • Thank you for your suggestion, but I use csv.reader because it parses my data correctly. Some of my quoted fields contain commas, which is why .split(',') doesn't work for me. – Liz Hi Apr 10 '19 at 17:30
  • A minimal, complete, and verifiable example with a complete sample file would help us understand the problem. – bracco23 Apr 10 '19 at 17:37
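Liz Hi's objection is easy to check: str.split(',') knows nothing about CSV quoting, so a comma inside a quoted field splits that field in two, while csv.reader keeps it whole and strips the quotes. The record below is made up for illustration.

```python
import csv

# Made-up record with a comma inside a quoted field.
line = '1,"Smith, John",42\n'

naive = line.strip().split(",")  # splits on every comma, quotes included
parsed = next(csv.reader([line]))  # honours the double quotes
print(naive)   # ['1', '"Smith', ' John"', '42']
print(parsed)  # ['1', 'Smith, John', '42']
```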

According to https://docs.python.org/3.7/library/io.html?highlight=io#io.StringIO, if you pass None as the second parameter (newline), StringIO should recognize all newline conventions ('\n', '\r' and '\r\n').

bracco23
  • Thank you! I thought it worked, but it turned out that rows whose fields contained '\r' were removed after I set the newline parameter to None, and I need all rows in my result set. – Liz Hi Apr 10 '19 at 17:28
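This matches what the comment describes: with newline=None, StringIO translates a lone '\r' (and '\r\n') into '\n' when it stores the text, so csv.reader ends the record at that point and the row comes out split. newline='' avoids the translation but still ends a line at a lone '\r'. If the '\r' characters sit in unquoted fields, no newline setting makes csv keep them. One workaround, a preprocessing trick rather than a csv feature, is sketched here under these assumptions: records end with '\n' or '\r\n', a lone '\r' never ends a record, and the placeholder character U+E000 (a hypothetical choice) never appears in the data. Lines are split on '\n' only, each lone '\r' is masked before parsing, and the mask is undone in every field afterwards. iter_rows_keeping_cr is a hypothetical helper name.

```python
import csv
import gzip
import io
import re

# Hypothetical placeholder from the Unicode private-use area; assumed never to
# occur in the real data.
CR_PLACEHOLDER = "\ue000"
# A "\r" that is not the first half of a "\r\n" line ending.
LONE_CR = re.compile(r"\r(?!\n)")


def iter_rows_keeping_cr(fileobj, encoding="utf-8"):
    """Yield CSV rows from gzip data, keeping lone carriage returns in field values."""
    # newline="\n": lines end only at "\n" and nothing is translated, so every
    # "\r" reaches the masking step unchanged.
    with gzip.open(fileobj, mode="rt", encoding=encoding, newline="\n") as f:
        # Hide lone "\r" from csv, which would otherwise treat it as a record break.
        masked_lines = (LONE_CR.sub(CR_PLACEHOLDER, line) for line in f)
        for row in csv.reader(masked_lines, delimiter=","):
            # Put the original "\r" characters back into each field.
            yield [field.replace(CR_PLACEHOLDER, "\r") for field in row]


# What the comment ran into: with newline=None, StringIO turns the lone "\r" into "\n".
print(io.StringIO("1,ab\rcd,3\n", newline=None).getvalue() == "1,ab\ncd,3\n")  # True

# Made-up sample: one unquoted and one quoted field containing a lone "\r".
sample = b'id,note\r\n1,ab\rcd\r\n2,"x\ry"\r\n'
rows = list(iter_rows_keeping_cr(io.BytesIO(gzip.compress(sample))))
print(rows)
```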