
I'm trying to open 10 CSV files in pandas as DataFrames using the read_csv() function, but I keep getting the following error: "MemoryError: Unable to allocate 207. MiB for an array with shape (10, 2718969) and data type int64".

8 of the CSV files are around 1-3 KB, one is 11,819 KB, and the other is 99,694 KB. The 8 small files are essentially lookup tables, while the 99,694 KB file is the main file.
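
One thing worth trying before anything else is shrinking the per-column memory footprint at read time, since the error mentions int64 arrays. A minimal sketch, assuming hypothetical column names (`SId`, `Value`) because the real schema isn't shown:

```python
import pandas as pd

# Hypothetical column names/types -- adjust to the actual schema.
# Downcasting from the default int64/float64 halves the memory
# footprint of each numeric column.
dtypes = {
    "SId": "int32",      # assumption: the ids fit in 32 bits
    "Value": "float32",  # assumption: reduced precision is acceptable
}

main_df = pd.read_csv(
    "table1.csv",          # hypothetical file name
    dtype=dtypes,
    usecols=list(dtypes),  # load only the columns actually needed
)
print(main_df.memory_usage(deep=True).sum() / 1024**2, "MiB")
```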

I also have to merge/join these files into one file based on a few conditions. For example, the 99,694 KB file (let's call it Table 1) has the following rows:

[screenshot: sample rows of Table 1]

One of the smaller lookup files (Table 2) has this information:

[screenshot: sample rows of Table 2]

I'm trying to merge the files on the SId column of Table 1 and the SId column of Table 2. I tried to do this in MS Access and got an "Overflow" error.

Is there any better way to do this?
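
For the join itself in pandas, a minimal sketch; the file names and the int32 dtype are assumptions, only the SId join key comes from the question:

```python
import pandas as pd

# Hypothetical file names; SId is the join key described above.
table1 = pd.read_csv("table1.csv", dtype={"SId": "int32"})
table2 = pd.read_csv("table2.csv", dtype={"SId": "int32"})

# A left join keeps every row of the main table and attaches the
# matching lookup columns; small lookup tables are cheap to merge in.
merged = table1.merge(table2, on="SId", how="left")
```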

I was able to use Dask to join the multiple tables, but the main file has more than 2 million rows. I tried df.head(1) to see just the first row of the final combined file, and Dask threw a MemoryError. I also tried saving it as a CSV and again got a MemoryError.
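
If the merged result never has to fit in memory at once, one hedged approach is to let Dask stream it to disk partition by partition; the file names and blocksize below are assumptions, not from the question:

```python
import dask.dataframe as dd

# blocksize controls how much of the big file each partition holds;
# smaller partitions trade speed for a lower peak memory footprint.
table1 = dd.read_csv("table1.csv", blocksize="64MB", dtype={"SId": "int32"})
table2 = dd.read_csv("table2.csv", dtype={"SId": "int32"})

merged = table1.merge(table2, on="SId", how="left")

# Writing one CSV per partition avoids materialising the whole result;
# forcing a single output file would concatenate everything and can
# reproduce the MemoryError.
merged.to_csv("merged-*.csv", index=False)
```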

I'm trying to use this dataset for some EDA and, hopefully, classification, but I don't think I'll be able to do that with the full dataset.

In such cases, is it better to take a sample of the data to perform EDA and ML, or is there a better way?
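
If sampling turns out to be the pragmatic route, one sketch using chunked reading so the full file never sits in memory; the chunk size and sample fraction are arbitrary choices, not from the question:

```python
import pandas as pd

# Read the big file in manageable chunks and keep a ~5% random sample;
# that is often enough for EDA and a first classification baseline.
sample_parts = []
for chunk in pd.read_csv("table1.csv", chunksize=100_000):
    sample_parts.append(chunk.sample(frac=0.05, random_state=42))

sample_df = pd.concat(sample_parts, ignore_index=True)
```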

  • might be better to work in chunks or use `Dask`; have a look at the `chunksize` parameter in `pd.read_csv`, and also look up memory-efficient methods for working with large dataframes. [this post](https://realpython.com/python-pandas-tricks/) has a few good examples. – Umar.H Jun 10 '20 at 13:15
  • Could you try passing explicit column types via the [`dtype` argument](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv)? Looks like you're getting a DataFrame object with extremely large types. – joebeeson Jul 08 '20 at 21:32

0 Answers