
I'm having the following error:

"ParserError: Error tokenizing data. C error: out of memory"

This happens when I try to read a large CSV file (5 GB) into a dataframe. I am selecting only the columns that interest me and setting the parameters I thought were necessary, and even so it does not work. I've also tried the chunksize parameter.

```python
df = pd.read_csv('file.csv', encoding='ISO-8859-1', usecols=names_columns, low_memory=False, nrows=10000)
```

The strange thing is that when I set the parameter `nrows=1000` it works.

I've read files with many more rows than that and they worked perfectly, but this one gives this error.

Does anyone have any suggestions?

BrenoShelby
  • A DataFrame with 1000 rows, many columns, and large data types can be a larger object than a DataFrame with 10000 rows, a few columns, and small data types. Perhaps you would benefit from specifying `dtype`? (see the `dtype` argument in the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)) – soyapencil Jan 20 '20 at 22:32
  • Hi and welcome to SO! Consider removing `low_memory=False` to prevent OOM errors. – hongsy Jan 21 '20 at 07:18

1 Answer


From this answer:

  1. There should not be a need to mess with `low_memory`. Remove that parameter.

  2. Specify dtypes (this should always be done).

Consider the example of a file which has a column called `user_id`. It contains 10 million rows where the `user_id` is always numbers. Adding `dtype={'user_id': int}` to the `pd.read_csv()` call will let pandas know, when it reads the file, that this column contains only integers.

hongsy