I'm trying to run the following code:

import pandas as pd
import numpy as np
df = pd.read_csv('E:/test.csv', low_memory=False)
mat = df.as_matrix(columns=['pageTitle','deviceCategory','eventCategory','eventAction'])
values, counts = np.unique(mat.astype(str), return_counts=True)
for x in values:
  df[x]=df.isin([x]).any(1).astype(int)

grouped = df.groupby('Session_ID')
grouped.sum().to_csv('E:/test2.csv')
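For reference, a minimal, self-contained sketch of the same pipeline (the tiny sample frame below is a hypothetical stand-in for test.csv) that collects the unique values column by column instead of casting the whole matrix to str in one go, so no single huge array has to be allocated:

```python
import pandas as pd
import numpy as np

# Hypothetical stand-in for the real E:/test.csv contents.
df = pd.DataFrame({
    'Session_ID': [1, 1, 2],
    'pageTitle': ['home', 'about', 'home'],
    'deviceCategory': ['mobile', 'desktop', 'mobile'],
    'eventCategory': ['click', 'click', 'scroll'],
    'eventAction': ['open', 'close', 'open'],
})

cols = ['pageTitle', 'deviceCategory', 'eventCategory', 'eventAction']

# Gather uniques per column, then merge: each intermediate array is
# only one column long, unlike mat.astype(str) over the full matrix.
values = np.unique(np.concatenate([df[c].astype(str).unique() for c in cols]))

# One indicator column per observed value, restricted to the four
# event columns so Session_ID is never matched against strings.
for x in values:
    df[x] = df[cols].isin([x]).any(axis=1).astype(int)

# Sum only the indicator columns per session.
grouped = df.groupby('Session_ID')[list(values)].sum()
```

The per-column `unique()` call also deduplicates before concatenation, so the array passed to `np.unique` is bounded by the number of distinct values rather than the number of rows.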

but I get the following error:

Traceback (most recent call last):
  File "C:\Users\User\Desktop\flat_seminar.py", line 5, in <module>
    values, counts = np.unique(mat.astype(str), return_counts=True)
ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.

I tried to use numpy.memmap, but memmap doesn't work with functions like as_matrix.
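Instead of memmap, one hedged alternative is to read the CSV in chunks so no single array ever gets large; a sketch (the in-memory CSV below is a hypothetical stand-in for E:/test.csv, and the chunk size is arbitrary):

```python
import io
import pandas as pd

# Hypothetical stand-in for E:/test.csv; replace with the real path.
csv_data = io.StringIO(
    "Session_ID,pageTitle,deviceCategory,eventCategory,eventAction\n"
    "1,home,mobile,click,open\n"
    "1,about,desktop,click,close\n"
    "2,home,mobile,scroll,open\n"
)

cols = ['pageTitle', 'deviceCategory', 'eventCategory', 'eventAction']

# Reading in chunks keeps every intermediate array small, so no single
# allocation can trigger the "array is too big" ValueError.
uniques = set()
for chunk in pd.read_csv(csv_data, usecols=cols, chunksize=2):
    for c in cols:
        uniques.update(chunk[c].astype(str).unique())
```

The resulting `uniques` set can then drive the same indicator-column loop, either per chunk (accumulating group sums as you go) or on a column subset of the full frame.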

Idan Str
elor
  • FYI `low_memory=False` doesn't actually do anything: https://stackoverflow.com/questions/24251219/pandas-read-csv-low-memory-and-dtype-options – FHTMitchell May 14 '18 at 11:30
  • Possible duplicate of [ValueError: array is too big - cannot understand how to fix this](https://stackoverflow.com/questions/21666976/valueerror-array-is-too-big-cannot-understand-how-to-fix-this) – FHTMitchell May 14 '18 at 11:31
