0

I'm working with some CSV files which have been created incorrectly. There are quotations and commas interconnected, and I keep getting parsing errors from pd.read_csv, even after replacing all column-separating commas with tabs.

Nevertheless, Numbers (Apple's Excel) can read the file perfectly, and, after re-saving it as csv, Pandas can generate data frames seamlessly. Thus, I wanted to know if there was a way, preferentially through Python, to automate this import-export in Numbers/Excel (maybe an API?) to fix my CSVs, or maybe find out what they do to correct them.

EDIT: The CSV rows look as following:

"id","lastVisitTimeLocal","lastVisitTimeUTC","title","url","typedCount","visitCount",""[]"_id","_id" 8986,"06/03/2018, 20:00:48","3/6/2018 2:30:48 PM","","https://chrome.google.com",0,1,3000001,2000001

Although some titles contain commas and some links contain quotation marks, so I keep getting parsing errors, despite Numbers/Excel parsing them seamlessly.

EDIT2: I'm looking for a pipeline that does the following:

file.csv --excel_engine--> file.xlsx --excel_engine--> file2.csv

Nico
  • 311
  • 4
  • 11

1 Answers1

1

Have you tried setting quoting and doublequote in pd.read_csv()? It's odd to me that Pandas can't read a csv that Excel can (i usually have problems with Excel instead; the only issue i've had with Pandas is NUL characters).

Alternatively you can also run this in VBA:

Sub openCsvAndSave()
    Dim csv_paths, path
    csv_paths = Array(path1, path2, ...) ' Set your csv paths here '
    For Each path in csv_paths
        Dim NewWb As Workbook: Set NewWb = Workbooks.Open(path)
        NewWb.SaveAs Left(path, Len(path) - 4) & "_2.csv", xlCSV
    Next path
End 
iamanigeeit
  • 784
  • 1
  • 6
  • 11