-1

I received extracted data from a server, the problem is the extract has the delimiter ";" in the csv file.

I read the folder with the following command:

files = glob.glob(r"path/*.csv")
dfs = [pd.read_csv(f, sep=";", engine='c') for f in files]
df2 = pd.concat(dfs,ignore_index=True)

and the output is:


columnA    columnB .... columnT columnU
2000        A      ....  I wish  NaN
1000        B     ....   that    NaN
this ends   NaN   ....    NaN    NaN
3000        A     .....    I      DUU
...

the text in row 3 belongs to the columnT in the second row. So far i am only possible to delete all weirds rows like row 4 but i am not able to keep that information.

df2.dropna(subset=['columnB'], how='all', inplace=True)

How can i read the files correctly? The Problem is, that in the text field columnT in the text it also use ";" as normal character.

the original text is (in csv):

columnA;    columnB; .... columnT;          columnU:
2000;        A;      ....  I wish;            NaN;
1000;        B;     ....   that; this ends;    NaN;
3000;        A;     .....    I;               DUU;
PV8
  • 5,799
  • 7
  • 43
  • 87
  • I'd say that's very difficult, as for any process, if the delimiter is purposely set to `;`, then a `;` in a text version of the `.csv` will always be seen as a new column indicator. You'll most likely need to clean the file before conversion. Do you have access to the conversion/extraction process from the server? See if you can get the `.csv` with a different delimiter. – davidverweij Feb 04 '22 at 13:30
  • 1
    Could you show **as plain text** an extract (with fake data if your want) of your csv file with at least the header row and one of the row containing the problematic data. Without that I cannot guess whether it is only a config problem or whether the file is irremediably broken or whether a possible workaround could exist. And please do not show what happens in a spreadsheet but the raw content as it happens in notepad, vi or any other text editor. – Serge Ballesta Feb 04 '22 at 13:44
  • i added it as a small extract – PV8 Feb 07 '22 at 07:13
  • No you did not. Your code reads the csv with `sep=';'` and what you show as the csv does not contain any `;`. I asked you to give the raw content not to bother you, but because, to be able to help you, I need to see exactly how the delimiters and quotes are used in the file. – Serge Ballesta Feb 07 '22 at 07:29
  • got you, i added it now – PV8 Feb 09 '22 at 13:38

1 Answers1

0

I wasn't aware of a programmatic approach to solve this (see my comment), but out of interest, a quick search led me to Escaping quotes and delimiters in CSV files with Excel. Perhaps you could try the same. I.e., either manually or programmatically, replace all single quotes for double quotes, and try your code again.

Dharman
  • 30,962
  • 25
  • 85
  • 135
davidverweij
  • 328
  • 1
  • 15