I am trying to create a DataFrame for each file in a list of more than 3,000 files. My code works fine with a small number of files, but with larger batches (>300 files) I keep getting the same error:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 5

This is the script:

import pandas as pd

all_files_df = [pd.read_table("/data/lab/datasets/Drug_CyTOF_screening/" + x, sep='\t') for x in all_files]

Does anyone know what is causing this issue?

Thank you!

1 Answer

The error means pandas inferred one column from the first lines of a file but then hit a row with five tab-separated fields, so at least one file in the batch is malformed. To find out which ones, wrap each read in a try/except:

import pandas as pd

data = []
for x in all_files:
    try:
        # Read each file and collect the resulting DataFrames.
        df = pd.read_table("/data/lab/datasets/Drug_CyTOF_screening/" + x, sep='\t')
        data.append(df)
    except pd.errors.ParserError as err:  # note: ParserError, not ParseError
        print(f"'{x}' contains errors, skipped")
        print(err)
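
Once the failing files are identified, printing their first few lines usually reveals the problem (extra metadata rows before the header, a different delimiter, and so on). A minimal sketch for that inspection, assuming plain-text files under the same directory as the question (show_head is a hypothetical helper, not part of the original script):

def show_head(filename, n=5):
    # Print the first n raw lines of a file so the structure
    # that confused the parser can be seen directly.
    with open("/data/lab/datasets/Drug_CyTOF_screening/" + filename) as f:
        for _ in range(n):
            print(f.readline().rstrip("\n"))

If the malformed rows can simply be dropped instead, pandas 1.3+ also accepts on_bad_lines="skip" in read_table, e.g. pd.read_table(path, sep='\t', on_bad_lines="skip").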
Corralien