0

I have a comma delimited file which also contains commas in the actual field values, something like this:

foo,bar,"foo, bar"

This file is very large so I am wondering if there is a way in python to either put double quotes around ever field:

eg: "foo","bar","foo, bar"

or just change the delimeter overall?

eg: foo|bar|foo, bar

End goal: The goal is to ultimately load this file into sql server. Given the size of the file bulk insert is only feasible approach for loading but I cannot specify a text qualifier/field quote due to the version of ssms I have.

This leads me to believe the only remaining approach is to do some preprocessing on the source file.

  • 1
    I'm sure either could be done with Python. – Scott Hunter Dec 07 '21 at 17:25
  • 1
    csv.reader has a [quote character](https://docs.python.org/3/library/csv.html#csv.Dialect.quotechar) parameter. If you specify `quotechar='"'` it should treat `"foo,bar"` as a single column. – wwii Dec 07 '21 at 17:27
  • 2
    Why do you care about inconsistent quoting? Your file is only using them where necessary; adding more just bloats the file size. – chepner Dec 07 '21 at 17:27
  • the goal is to ultimately load this file into sql server. Given the size of the file bulk insert is only feasible approach for loading but I cannot specify a text qualifier/field quote due to the version of ssms I have. – JimmyBuffet_Express Dec 07 '21 at 17:29
  • 1
    Does [Python parse CSV ignoring comma with double-quotes](https://stackoverflow.com/questions/21527057/python-parse-csv-ignoring-comma-with-double-quotes) answer your question? – wwii Dec 07 '21 at 17:31

2 Answers2

2

Changing the delimiter just requires parsing and re-encoding the data.

with open("data.csv") as input, open("new_data.csv", "w") as output:
    r = csv.reader(input, delimiter=",", quotechar='"')
    w = csv.writer(output, delimiter="|")
    w.writerows(r)

Given that your input file is a fairly standard version of CSV, you don't even need to specify the delimiter and quote arguments to reader; the defaults will suffice.

r = csv.reader(input)
chepner
  • 497,756
  • 71
  • 530
  • 681
0

It is not an inconsistent quotes. If a value in a CSV file has comma or new line in it quotes are added to it. It shoudn't be a problem, because all standard CSV readers can read it properly.

  • I think the OP is referring to the inconsistency of not quoting *every* field, only those that require quoting. – chepner Dec 07 '21 at 17:29