
I have a text file that I would like to load into a pandas DataFrame. It looks like this:

SESS  UID       DBASE                 TYPE  TRANS____  ELAPSE______  RESP________  CPU__________
------------------------------------------------------------------------------------------------
0000  ALLUSERS  PARIS                 0107        753       235.717       235.712     158.995343
0000  ALLUSERS  NewYork               0107         15         3.262         3.262       1.292182
... # thousands of lines 
0000 SEP

Following some answers on SO, I am using:

f = pd.read_csv("DATA/e201910.txt", skiprows=2, header=None, delimiter=r"\s+")

but it does not work.

The error:

ParserError: Error tokenizing data. C error: Expected 7 fields in line 327684, saw 8

It looks like the last line causes problems. How can I skip the last line together with the first two lines?

thanks

JFerro
  • A recent version of pandas should read the file correctly, which version did you use? – mozway Sep 15 '22 at 07:29
  • If it's not a versioning thing, then maybe try parsing to a list of dicts as an intermediate step, which will allow you more control to edit out invalid rows, and then dataframe.from_dict? – bn_ln Sep 15 '22 at 07:34
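
The intermediate-parsing idea from the comment above could be sketched roughly as follows. This is only an illustration under assumptions: the column names in `COLUMNS` are taken from the header in the question with the trailing underscores dropped, `parse_report` is a hypothetical helper name, and every valid data row is assumed to contain exactly eight whitespace-separated fields. Lines that do not match are collected with their line numbers rather than silently discarded, so the row that triggers the ParserError can be inspected.

```python
import io

# Assumed column names: the header from the question, minus trailing underscores.
COLUMNS = ["SESS", "UID", "DBASE", "TYPE", "TRANS", "ELAPSE", "RESP", "CPU"]


def parse_report(lines, n_header=2):
    """Split each line on whitespace; keep rows with the expected field count.

    Returns (rows, rejected): rows is a list of dicts with string values,
    rejected is a list of (line_number, text) for lines that did not fit.
    """
    rows, rejected = [], []
    for lineno, line in enumerate(lines, start=1):
        # Skip the header block and any blank lines.
        if lineno <= n_header or not line.strip():
            continue
        fields = line.split()
        if len(fields) == len(COLUMNS):
            rows.append(dict(zip(COLUMNS, fields)))
        else:
            rejected.append((lineno, line.rstrip("\n")))
    return rows, rejected


# Small in-memory sample built from the lines shown in the question.
sample = io.StringIO(
    "SESS  UID       DBASE    TYPE  TRANS____  ELAPSE______  RESP________  CPU__________\n"
    "-----------------------------------------------------------------------------------\n"
    "0000  ALLUSERS  PARIS    0107        753       235.717       235.712     158.995343\n"
    "0000  ALLUSERS  NewYork  0107         15         3.262         3.262       1.292182\n"
    "0000 SEP\n"
)
rows, rejected = parse_report(sample)
```

For the real file, an open file handle can be passed in place of `sample`, and `pd.DataFrame(rows)` should then build the frame. All values are still strings at that point, so the numeric columns would need converting, for example with `pd.to_numeric`. If the trailing line really is the only problem, passing `skipfooter=1` together with `engine="python"` to `pd.read_csv` is a shorter alternative, since `skipfooter` is not supported by the C engine.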

0 Answers