I have a CSV file whose delimiter is a comma (,), but in one of the columns an extra comma can appear because of a mistake made at user registration. The file looks like this:

francisco, 17, fran-mail@gmail,com, blue
kai, 10, pont_193@yahoo.com, red
Deive, 19, deveper@hotmail,com, black

How can I parse this file in Python 2.7 without the stray comma breaking the read?

In my scenario I don't have permission to upgrade Python.

  • Is the erroneous comma always in the email column, or can it appear in any column? – Sören May 20 '22 at 18:34
  • Updating to a different version of Python won't help. This issue is caused by an improperly quoted CSV file. The file also has what appear to be superfluous spaces in the values; that isn't a format issue, just an indication that the source of this CSV file doesn't understand CSV. The best solution is to request that the source send you a properly quoted CSV file. If this isn't an option, then you're stuck either writing a custom parser or cleaning up the data in a post-processing step after parsing with `csv`. – Michael Ruth May 20 '22 at 18:40
  • The way I would handle this: when you get any invalid record (you can do some level of validation, such as checking that there is at least one dot after the @ symbol), emit a warning message saying an invalid record was detected and skipped, and then skip the CSV file. It's not on you to fix every CSV error. If your file has failures like this, which are format errors, it's not on you to fix them or adapt; the file simply does not conform to the CSV format. – Thomas Ward May 20 '22 at 18:46
  • Couldn't I use a regex to quote the text between the @ and the ,com? If so, how could I do it? –  May 20 '22 at 18:57
  • @Felipe you can, but, as Michael said, you have to write a custom CSV parser to do that properly, and you won't be able to use the built-in CSV libraries. – Thomas Ward May 20 '22 at 19:19
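
The two suggestions in the comments (clean up after parsing with `csv`, and warn about and skip records that still fail validation) can be combined in a rough sketch like the one below. It rests on assumptions that are not confirmed in the question: every record should have exactly four fields, the email is the third field, and a single stray comma always stands in for the dot in the email. The names `EXPECTED_FIELDS`, `EMAIL_INDEX`, `repair_row` and `read_rows` are made up for illustration. It only uses features available in Python 2.7.

```python
import csv
import sys

EXPECTED_FIELDS = 4  # name, age, email, color (assumed from the sample rows)
EMAIL_INDEX = 2      # assumed position of the email column


def repair_row(row):
    """Return a cleaned 4-field row, or None if the row cannot be repaired."""
    # Drop the superfluous spaces around each value.
    row = [field.strip() for field in row]
    if len(row) == EXPECTED_FIELDS + 1:
        # Assumption: the one extra comma replaced the dot in the email,
        # so the email was split into two neighbouring fields.
        email = row[EMAIL_INDEX] + "." + row[EMAIL_INDEX + 1]
        row = row[:EMAIL_INDEX] + [email] + row[EMAIL_INDEX + 2:]
    if len(row) != EXPECTED_FIELDS:
        return None
    # Minimal validation: there must be an @ with at least one dot after it.
    _, at, domain = row[EMAIL_INDEX].partition("@")
    if not at or "." not in domain:
        return None
    return row


def read_rows(lines):
    """Parse an iterable of CSV lines, warning about and skipping bad records."""
    rows = []
    for number, row in enumerate(csv.reader(lines), 1):
        fixed = repair_row(row)
        if fixed is None:
            sys.stderr.write("skipping invalid record %d: %r\n" % (number, row))
            continue
        rows.append(fixed)
    return rows


sample = [
    "francisco, 17, fran-mail@gmail,com, blue",
    "kai, 10, pont_193@yahoo.com, red",
    "Deive, 19, deveper@hotmail,com, black",
]
rows = read_rows(sample)
```

For a real file, an open file object could be passed to `read_rows` instead of the list; on Python 2.7 the `csv` documentation recommends opening the file in `'rb'` mode. If the stray comma can show up in other columns, this repair is not safe, and skipping the record is the more defensible choice.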

0 Answers